MODIFICATION AND PARALLELIZATION OF GENETIC ALGORITHM FOR SYNTHESIS OF ARTIFICIAL NEURAL NETWORKS

Context. The problem of the automated synthesis of artificial neural networks for further use in diagnosis, forecasting and pattern recognition is solved. The object of the study was the process of synthesis of an ANN using a modified genetic algorithm. Objective. The goals of the work are to reduce the synthesis time and to improve the accuracy of the resulting neural network. Method. A method for the synthesis of artificial neural networks on the basis of a modified genetic algorithm, which can be implemented sequentially or in parallel using MIMD- and SIMD-systems, is proposed. The use of a high probability of mutation increases diversity within the population and prevents premature convergence of the method. The choice of a new best specimen, as opposed to a complete restart of the algorithm, significantly saves system resources and ensures the exit from the area of local extrema. The use of new criteria for the adaptive selection of mutations, firstly, does not limit the number of hidden neurons and, secondly, prevents unbounded growth of the network. The use of uniform crossover significantly increases efficiency and allows emulating other crossover operators without problems; moreover, it increases the flexibility of the genetic algorithm. The parallel approach significantly reduces the number of iterations and significantly speeds up the synthesis of artificial neural networks. Results. Software has been developed which implements the proposed method of synthesis of artificial neural networks and allows performing the synthesis of networks both sequentially and in parallel on the cores of a CPU or GPU. Conclusions. The experiments have confirmed the efficiency of the proposed method of synthesis of artificial neural networks and allow us to recommend it for use in practice in the processing of data sets for further diagnosis, prediction or pattern recognition.
Prospects for further research include introducing the possibility of using the genetic information of several parents to form a new individual, and modifying the synthesis method for recurrent network architectures for big data processing.


ABBREVIATIONS
ANN is an artificial neural net; EA is an evolutionary algorithm; ESP is enforced subpopulations; MGA is a modified genetic algorithm; NEAT is neuroevolution of augmenting topologies; PMGA is a parallel modified genetic algorithm; RAM is random access memory; RV is a random value; SANE is symbiotic adaptive neuroevolution. P is a population of individuals (neural nets); p_convergence is the probability of early convergence of the method; p_mut is the probability of mutation; w is a connection between neurons.

INTRODUCTION
The choice of the topology and the configuration of the connection weights of an ANN are the most important stages in the use of neural network technologies for solving practical problems [1-7]. The quality (adequacy) of the obtained neural network models, control systems, etc. depends on these stages.
Synthesis of ANNs by the traditional method is performed, in fact, through trial and error. The researcher sets the number of layers of neurons, as well as the structure of the connections between them (the presence or absence of recurrent connections), and then analyzes the result. That is, the ANN is trained using some method and then tested on a test sample. If the results of the synthesis meet the specified criteria, the task of building the ANN is considered to be completed successfully; otherwise, the process is repeated with other values of the initial parameters [8-15].
Of course, the rapid development of the theory and practice of genetic algorithms has forced researchers to look for ways to apply them to the problem of searching for the optimal structure of an ANN (the evolution of neural networks, or neuroevolution). This decision becomes even more logical if one draws a parallel with the real world: if the idea of ANNs is borrowed from nature, then the evolution of the nervous system, with the subsequent formation and development of the brain, is an example of solving such a problem [16, 17].
The object of study is the process of synthesis of ANN using a modified genetic algorithm.
Rigorous methods for choosing the topology of an ANN and setting the weights of all its neurons do not exist to date. The proposed solutions are aimed at solving local problems, as a result of which the structure of the ANN is unsatisfactory and the training time is long. In this case, it is necessary to create a network and perform the calculations again. Even less attention is paid to the construction of multilayer asymmetric ANNs, characterized by complexity and multivariance.
The subject of study is the sequential and parallel method of synthesis of ANNs.
To date, there are several methods for the synthesis of ANNs based on the use of evolutionary algorithms; however, it should be noted that most of these methods share similar disadvantages: considerable synthesis time and a highly iterative nature. Therefore, the paper proposes two approaches for the synthesis of ANNs: an MGA and a PMGA.
The purpose of the work is to reduce the synthesis time and improve the accuracy of the resulting ANN and, additionally, to determine the feasibility of using a parallel implementation of the MGA.

PROBLEM STATEMENT
The basis of ANNs is neurons with a structure similar to their biological analogues. Each neuron can be represented as a microprocessor with several inputs and one output. When neurons are joined together, a structure is formed which is called a neural network [18]. Vertically aligned neurons form layers: input, hidden and output. The number of layers determines the complexity and, at the same time, the functionality of the network, which is not fully investigated.
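As a minimal illustration of the neuron model just described, the following sketch computes a neuron's output as a weighted sum of its inputs passed through an activation function. The sigmoid activation and the concrete weights are illustrative assumptions, not values taken from the paper:

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of inputs passed through a sigmoid activation."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# A two-input neuron with weights 0.5 and -0.25 and zero bias:
# the weighted sum is 1.0*0.5 + 2.0*(-0.25) = 0, so the output is sigmoid(0) = 0.5.
y = neuron_output([1.0, 2.0], [0.5, -0.25])
```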
For researchers, the first stage of creating a network is the most difficult task. The following recommendations are given in the literature [10]:
1) the number of neurons in the hidden layer is determined empirically, but in most cases an empirical rule is used;
2) increasing the number of inputs and outputs of the network leads to the need to increase the number of neurons in the hidden layer;
3) for ANNs modeling multistage processes, an additional hidden layer is required, but, on the other hand, the addition of hidden layers may lead to overfitting and wrong decisions at the output of the network.
Based on this, we state the problem as follows: for the synthesis of an ANN (NN) it is necessary to determine the set of neurons, which consists of subsets of input, hidden and output neurons, and the set of weights of the connections between the neurons. Having determined the values of the elements of these sets, we can consider the synthesis of the ANN complete.

REVIEW OF THE LITERATURE
The combination of ANNs and EAs makes it possible to combine the flexibility of configuring ANNs and the adaptability of EAs, which allows implementing a largely unified approach to solving a wide range of problems of classification, approximation and modeling [19-29].
The first works on the use of EAs for training and setting up ANNs appeared about 20 years ago [30, 31]. Research in this area is usually associated with the following tasks:
– searching for the values of the connection weights of an ANN with a fixed structure;
– setting the structure of the ANN without first finding the connection weights;
– setting the parameters of the training algorithm;
– setting the parameters of the neuronal activation functions;
– filtering training data;
– various combinations of the above tasks.
The neuroevolutionary approach to the simultaneous solution of the two main tasks of ANN synthesis, setting the connection weights and the structure of the ANN, allows compensating to some extent for the disadvantages inherent in each of them separately and combining their advantages [32-35]. On the other hand, the price of this is a huge search space, as well as the combination of a number of disadvantages caused by the use of the evolutionary approach. Summing up, we list the advantages and disadvantages.
The advantages include: 1) independence from the structure of the ANN and the characteristics of the neuronal activation functions; 2) the ability to automatically search for the ANN topology and obtain a more accurate neural network model.
As noted, the simultaneous solution of the two problems avoids some difficulties. Thus, the appearance in the population of individuals that correspond to ANNs with different topologies reduces the importance of the problem of competing solutions, and the availability of information about the connection weights allows bypassing the problem of the subjective assessment of the ANN structure [33], due to the fact that it is not the structure of the neural network that is estimated, but the entire ANN.
However, there are also disadvantages: 1) the complexity of fine-tuning the connection weights in the later stages of the evolutionary search; 2) larger requirements, compared with gradient algorithms, for the amount of RAM, owing to the use of a population of ANNs; 3) the complexity of organizing the search for the ANN topology.
Despite the fact that most of the works devoted to the neuroevolutionary approach propose only a theoretical approach to solving the problems of neural network optimization, several methods can be found that are recognized as promising and worthy of attention [32-39].
Among the early works, the cellular method of Frederick Gruau [40-42] is noteworthy; it uses a special grammar for the representation of neural network structures. One individual represents an entire neural network, with each neuron considered as a biological cell, and the growth of the network is determined through mechanisms of sequential and parallel "division" of neurons, i.e. cells. However, this method involves the implementation of a large number of specific operators that simulate cell activity.
The SANE method [43, 44] uses a different approach. It considers the development of two independent populations: in one, the individuals are separate neurons, while the other contains information about the structures of the artificial neural network. The disadvantages of this method include the fact that the number of hidden neurons and connections is limited.
The ESP method [45, 46] is a development of the SANE method. Its main difference is that the network structure is fixed and given a priori. The population of neurons is divided into subpopulations, in each of which evolution proceeds independently. Due to the parallelization of the solution search, as well as the simplification of the problem resulting from the rejection of evolving the network structure, ESP works much faster than SANE, sometimes by an order of magnitude; however, the successful operation of the method requires choosing an appropriate structure of the neural network [47].
One of the most potentially successful attempts to get rid of the disadvantages of direct coding while preserving all its advantages is the method proposed in 2002 called NEAT [48, 49]. Designed by Kenneth Stanley, the NEAT method allows customizing the structure of the network without restrictions on its complexity. The solution proposed by the authors is based on the biological concept of homologous genes (alleles), as well as on the existence in nature of the synapsis process: the alignment of homologous genes before the crossover. The technique assumes that two genes (in two different individuals) are homologous if they are the result of the same mutation in the past. In other words, with each structural mutation (gene addition), the new gene is assigned a unique number, which then does not change during evolution. The method uses a number of techniques, such as historical labels and specialization of individuals, to make the process of evolution significantly more efficient [50].
Summing up, it can be noted that the joint use of evolutionary methods and artificial neural networks allows solving the problems of configuration and training of artificial neural networks both individually and simultaneously. One of the advantages of this synthesized approach is a largely unified approach to solving a variety of problems of classification, approximation, control and modeling. The use of a qualitative evaluation of the functioning of artificial neural networks allows applying neuroevolutionary methods to the problems of studying the adaptive behavior of intelligent agents, the search for game strategies, and signal processing. Despite the fact that the number of problems and open questions concerning the development and application of neuroevolutionary methods (coding methods, genetic operators, methods of analysis, etc.) is large, an adequate understanding of the problem and of the neuroevolutionary approach is often sufficient for its successful solution with a neuroevolutionary method, as evidenced by a large number of interesting and successful works in this direction [33-36].

MATERIALS AND METHODS
The paper proposes a sequential implementation of the MGA for the synthesis of ANNs.
In the proposed method, the solution is sought using a population P of neural networks, that is, each individual is a separate ANN. After initialization, all individuals have networks without hidden neurons coded in their genes, and all input neurons are connected to each output neuron. That is, at first, all the presented ANNs differ only in the weights of the interneuron connections. In the process of evaluation, a neural network is first built on the basis of the genetic information of the individual under consideration, and then its performance is checked, which determines the fitness f of the individual [51-53]. After evaluation, all individuals are sorted in order of decreasing fitness, and the more successful half of the sorted population is allowed to cross, with the best individual immediately moving to the next generation. In the process of reproduction, each individual is crossed with a randomly selected individual from among those selected for crossing. The resulting two descendants are added to the new generation. Once a new generation is formed, the mutation operator starts working. However, it is important to note that truncation selection significantly reduces the diversity within the population, leading to early convergence of the algorithm, so the probability of mutation is chosen to be rather large [51]. If the best individual in the population does not change within a certain number of generations (by default, it is proposed to set this number to seven), a new best individual is selected from the queue. This approach significantly saves time and system resources, in contrast to a complete restart of the method, and also allows escaping from the area of local extrema caused by the relief of the objective function, as well as by a large degree of similarity of the individuals in one generation.
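The loop just described (evaluation, truncation selection with elitism, crossover of the better half, high-probability mutation, and replacement of a stagnant best individual) can be sketched as follows. This is a simplified illustration: the fitness function is a placeholder, and the concrete mutation probability and real-valued gene encoding are assumptions, since in the actual method an ANN is built from the genes and evaluated on training data:

```python
import random

MUT_PROB = 0.8          # high mutation probability (assumed value)
STAGNATION_LIMIT = 7    # generations without improvement before the best is replaced

def fitness(genes):
    # Placeholder objective: the real method builds an ANN from the genes
    # and measures its performance on the training sample.
    return -sum((g - 0.3) ** 2 for g in genes)

def crossover(a, b):
    # Uniform crossover: each gene is taken from either parent at random.
    mask = [random.random() < 0.5 for _ in a]
    c1 = [x if m else y for m, x, y in zip(mask, a, b)]
    c2 = [y if m else x for m, x, y in zip(mask, a, b)]
    return c1, c2

def mutate(genes):
    return [g + random.uniform(-0.5, 0.5) if random.random() < MUT_PROB else g
            for g in genes]

def evolve(pop_size=20, gene_len=6, generations=50):
    pop = [[random.uniform(-1, 1) for _ in range(gene_len)]
           for _ in range(pop_size)]
    best, stagnant = None, 0
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)           # decreasing fitness
        if best is not None and fitness(pop[0]) <= fitness(best):
            stagnant += 1
        else:
            stagnant = 0
        best = pop[0]
        if stagnant >= STAGNATION_LIMIT:
            best = pop[1]    # take the next individual instead of restarting
            stagnant = 0
        parents = pop[:pop_size // 2]                 # truncation selection
        nxt = [best]                                  # elitism
        while len(nxt) < pop_size:
            c1, c2 = crossover(random.choice(parents), random.choice(parents))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

winner = evolve()
```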
It should be noted that the number of hidden neurons is theoretically unlimited. To regulate the size of the resulting networks, three criteria are used, which regulate the size and direction of the development of the network and allow, at the mutation stage, adaptively choosing which type of structure transformation is more suitable for a given network.
Obviously, the chosen method of coding requires special genetic operators that implement crossover and mutation.
The uniform crossover operator is one of the most efficient recombination operators in the standard genetic algorithm [54-56].
Uniform crossover is performed according to a randomly selected pattern that indicates which genes should be inherited from the first parent (the other genes are taken from the second parent). It has long been known that tuning the probability of transmitting a parent gene to the offspring in uniform crossover can significantly improve its efficiency [54, 55], and it also allows emulating other crossover operators (single-point, two-point). It is also known that the use of the uniform crossover operator permits so-called multi-parent recombination, when more than two parents are used to generate one offspring. Despite this, most studies use only two parents and a fixed gene-transfer probability of 0.5 [54].
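A minimal sketch of uniform crossover with an adjustable gene-transfer probability, showing how an explicit mask can emulate a one-point crossover (the function and parameter names are illustrative assumptions):

```python
import random

def uniform_crossover(p1, p2, p=0.5, mask=None):
    """Uniform crossover: gene i comes from parent p1 with probability p.
    Passing an explicit mask emulates other operators (e.g. one-point)."""
    if mask is None:
        mask = [random.random() < p for _ in p1]
    return [a if m else b for m, a, b in zip(mask, p1, p2)]

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]

# A contiguous True/False mask reproduces one-point crossover at position 3.
one_point = uniform_crossover(p1, p2, mask=[True, True, True, False, False, False])
# one_point == [1, 1, 1, 0, 0, 0]
```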
Uniform crossover gives more flexibility when combining strings, which is an important advantage when working with genetic algorithms.
When using the proposed method, the following types of mutation operator can be used:
1) adding a hidden neuron with assignment of an index. The new neuron is added along with input and output connections; in this case, the output connection of the neuron cannot bind it to an input neuron;
2) removal of a randomly selected hidden neuron along with all its input and output connections. If a gap is formed in the remaining indices of the neurons, the indices are corrected in accordance with the above algorithm. The input and output neurons of the network cannot be removed;
3) adding a connection. The starting and ending indices of the neurons in the ANN represented by the mutating individual are determined randomly; the connection cannot end at an input neuron. The connection weight is also determined randomly. If the ANN already has a connection with the same input and output neurons, its weight is replaced by a random one;
4) deleting a randomly selected connection. A situation may arise when the last connection of a hidden neuron is removed; in this case the neuron is removed and, if necessary, the neuron indices of the network are corrected;
5) changing the weight of a randomly selected connection to a random value from the range [-0.5; 0.5].
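Three of the listed mutation types can be sketched on a toy network representation. The dictionary-based encoding and the function names are assumptions for illustration, not the paper's actual genome format:

```python
import random

# Toy representation: neurons are integer indices, connections map
# (source, destination) pairs to weights.
net = {
    "inputs": [0, 1], "outputs": [2], "hidden": [],
    "conns": {(0, 2): 0.1, (1, 2): -0.2},
}

def mutate_add_neuron(net):
    """Type 1: add a hidden neuron with one input and one output connection.
    The output connection may not lead to an input neuron."""
    idx = max(net["inputs"] + net["outputs"] + net["hidden"]) + 1
    net["hidden"].append(idx)
    src = random.choice(net["inputs"])
    dst = random.choice(net["outputs"])
    net["conns"][(src, idx)] = random.uniform(-0.5, 0.5)
    net["conns"][(idx, dst)] = random.uniform(-0.5, 0.5)

def mutate_add_connection(net):
    """Type 3: add (or re-weight) a random connection; it must not end
    at an input neuron."""
    src = random.choice(net["inputs"] + net["hidden"])
    dst = random.choice(net["hidden"] + net["outputs"])
    net["conns"][(src, dst)] = random.uniform(-0.5, 0.5)

def mutate_change_weight(net):
    """Type 5: replace a random connection weight with a value in [-0.5, 0.5]."""
    key = random.choice(list(net["conns"]))
    net["conns"][key] = random.uniform(-0.5, 0.5)

mutate_add_neuron(net)
mutate_change_weight(net)
```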
Thus, using point mutations it is possible to change the parameters of the ANN structure.
Chaotic addition (or removal) of neurons and connections can lead to situations where, for example, the network has many neurons and few connections. It would be more logical to use different types of mutations depending on the characteristics of the network architecture represented by the mutating individual. For this purpose, three criteria were introduced that regulate the size and direction of the network development [57, 58].
The first of them, f_con, characterizes the degree of connectedness of the neurons in the network [57]. It is worth noting that connections from hidden neurons to the output can appear in any case. Thus, the smaller f_con is, the more likely it is that a new connection will be added as a result of the mutation.
The use of the second coefficient is based on the assumption that the more elements there are in total in the input and output vectors of the training sample (i.e., the greater the total number of input and output neurons), the more complex, most likely, the ANN required to solve the problem should be [57]. That is, the more neurons in the network, the lower the value of the criterion f_diff_top and the less likely it is that the mutation adding a new hidden neuron will be chosen. The third criterion is also based on the assumption that a more complex network should be used to solve more complex problems; however, this criterion characterizes the conditional complexity of the network. It is based on the concept of cyclomatic complexity [59, 60].
In any of the described cases, the method uses the criteria in combination, since the degree of connectedness of the existing neurons must be taken into account.
Removing connections in an ANN has a side effect: hanging neurons, which have no incoming connections, may appear, as well as dead-end neurons, i.e., neurons without output connections. In cases where the neuronal activation function is such that its value at a zero weighted sum of inputs is not equal to zero, the presence of hanging neurons makes it possible to adjust the neuron bias. It is worth noting that, in addition to ensuring the diversity of the population, the removal of connections can contribute to the removal of some uninformative and low-informative input features.
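Detecting the hanging and dead-end neurons described above is a simple scan over the connection list. The following sketch (an illustrative representation, not the paper's data structure) finds hidden neurons with no incoming or no outgoing connections:

```python
def hanging_and_dead_end(hidden, conns):
    """Return hidden neurons with no incoming connections (hanging) and
    with no outgoing connections (dead-end), e.g. after removal mutations."""
    hanging = [n for n in hidden if not any(dst == n for _, dst in conns)]
    dead_end = [n for n in hidden if not any(src == n for src, _ in conns)]
    return hanging, dead_end

# Hidden neurons 3 and 4: neuron 4 has lost its incoming connection,
# neuron 3 has lost its outgoing one.
conns = [(0, 3), (4, 2)]
result = hanging_and_dead_end([3, 4], conns)
# result == ([4], [3])
```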
In the developed method, it is proposed to use an adaptive mutation mechanism [57, 59, 60], which provides for the choice of the mutation type depending on the values of the criteria f_con, f_diff_top and f_diff_comp. The choice of the mutation type is determined on the basis of the value of the product of these criteria. This approach, on the one hand, does not limit the number of hidden neurons and, on the other hand, prevents unbounded growth of the network, because the addition of each new neuron to the network becomes less likely. A mutation of the weight of a randomly selected existing connection occurs for all mutating individuals with a probability of 0.5. Fig. 2 shows a schematic representation of the mutation type selection process.
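Since the paper's exact combination rule is not reproduced here, the following sketch only illustrates the idea of adaptive mutation selection: mutation types are drawn with probabilities weighted by the criteria, so that a sparsely connected network is more likely to receive a new connection and a small network a new neuron. The weighting scheme and names are assumptions:

```python
import random

def choose_mutation(f_con, f_diff_top, f_diff_comp):
    """Adaptively pick a structural mutation type. Assumed illustration:
    lower connectedness favours adding a connection, while higher
    topology/complexity criteria favour adding a neuron."""
    weights = {
        "add_connection": 1.0 - f_con,
        "add_neuron": f_diff_top * f_diff_comp,
        "remove_connection": f_con,
    }
    kinds = list(weights)
    return random.choices(kinds, weights=[weights[k] for k in kinds])[0]

# A sparse network (low f_con): adding a connection is the most likely choice.
kind = choose_mutation(0.2, 0.9, 0.8)
```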
Given the features of the proposed MGA for the synthesis of neural networks, its parallel form can be represented as in Fig. 3. The method can be divided into three stages separated by points of barrier synchronization. At the first stage, the main core initializes the population P and adjusts the initial parameters of the method, namely the stopping criterion, the population size, and the criteria for the adaptive selection of mutations. Next, equal parts of the population (subpopulations) and the initial parameters are distributed to the cores of the computer system. Initialization of the initial population cannot be carried out in parallel on the cores of the system, because independently generated populations would intersect, thus increasing the search for solutions. The second stage of the proposed method is performed in parallel by the cores of the system: all cores perform the same sequence of operations on their subpopulations. After the barrier synchronization, the main core receives the best solutions from the other cores and checks the stopping criterion. If it is met, the next generation G is formed. Otherwise, after changing the initial parameters, which allows the cores of the system to obtain other solutions, the method returns to the distribution of the initial parameters to the cores of the system, and the cores then perform the parallel calculations of the second stage of the method.
The proposed parallel method for ANN synthesis can be applied both on MIMD-systems [61] (clusters and supercomputers) and on SIMD-systems (for example, GPUs programmed with CUDA technology).
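The three-stage structure (main-core initialization and distribution, parallel per-core evolution, barrier synchronization with collection of the best solutions) can be sketched as below. The paper's implementation uses Java threads and CUDA; this Python threading version is only a structural illustration with a placeholder fitness function:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evolve_subpopulation(subpop, generations=10):
    """Stage 2: each core evolves its own subpopulation independently and
    returns its best individual (the objective here is a placeholder)."""
    fitness = lambda g: -sum((x - 0.3) ** 2 for x in g)
    for _ in range(generations):
        subpop.sort(key=fitness, reverse=True)
        parent = subpop[0]
        # Keep the leader and refill the subpopulation with its mutants.
        subpop = [parent] + [
            [g + random.uniform(-0.1, 0.1) for g in parent]
            for _ in range(len(subpop) - 1)
        ]
    return max(subpop, key=fitness)

# Stage 1: the main core initialises the population and splits it into
# equal subpopulations, one per core.
population = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(32)]
n_cores = 4
parts = [population[i::n_cores] for i in range(n_cores)]

# Stage 2 runs in parallel; stage 3 is the barrier at which the main core
# collects the per-core best solutions and checks the stopping criterion.
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    bests = list(pool.map(evolve_subpopulation, parts))
```

Note that CPython threads do not give true CPU parallelism for this workload; the sketch only mirrors the master-worker structure of the method.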

EXPERIMENTS
The proposed MGA and PMGA were compared with existing analogues: ESP, SANE and NEAT.
Also note that the testing of the MGA, ESP, SANE and NEAT was performed using the following hardware and software: the computing system of the Department of Software Tools of the National University "Zaporizhzhia Polytechnic" (NUZP), Zaporizhzhia: a Xeon E5-2660 v4 processor (14 cores), 4x16 GB DDR4 RAM, and the Java threads programming model.
The experimental verification of the proposed PMGA was additionally performed using an Nvidia GTX 960 GPU with 1024 cores, programmed using CUDA technology.
This testing methodology further makes it possible to compare the speed and performance of the PMGA on MIMD- and SIMD-systems.
For testing, the Physical Unclonable Functions Data Set from the open UCI Machine Learning Repository was used as the training sample [62, 63]. General information about the sample is given in Table 1. For ANN training, 5 million instances were used, and the testing of the resulting ANN was performed on 1 million instances from the sample.

RESULTS
Table 2 presents the overall results of the proposed MGA in comparison with the results of the existing analogues. Particular attention is paid to determining the time needed for the synthesis of the ANN, the value of the average error at the training stage and the value of the average error when working in test mode.
Tables 3-5 show the results of testing the PMGA using different hardware and different numbers of CPU and GPU cores during operation.
Table 3 shows how the time spent on the synthesis of the ANN changes when the number of CPU or GPU cores is increased.
Table 4 shows how the speedup changes depending on the number of CPU or GPU cores used.
Table 5 shows how the communication overhead changes with an increase in the number of CPU or GPU cores used.
For more clarity, the dependence of the speedup on the number of CPU cores used is shown in the form of a diagram in Fig. 4, and for the GPU in Fig. 5.
Fig. 6 shows the efficiency graph of the NUZP computing systems when executing the proposed method.

DISCUSSION
As can be seen from the results in Table 2, the sequential implementation of the MGA is inferior in ANN synthesis time to two of the existing analogues, namely ESP and SANE, but is far ahead of NEAT. If we compare the value of the average error in the synthesis of the ANN, then using the MGA it was possible to minimize it to 1.01%, which is significantly ahead of the results of the analogues. It should also be noted that when the synthesized ANN is tested, in the case of the MGA the results are much better than those of the analogues; thus, the average error value at the output of the ANN is 2.3 times less than, for example, that of the ANN synthesized by the SANE method. Therefore, it is possible to conclude that the proposed MGA significantly exceeds the existing methods in the accuracy of the synthesized neural network.
As already noted, the testing of the PMGA was carried out under a different scenario for a more complete study of the applicability and feasibility of the method on different parallel computing systems.
As can be seen from Table 3, the proposed method has an acceptable degree of parallelism and is effective on both MIMD- and SIMD-systems. Thus, when using CPU cores, it was possible to reduce the execution time of the method from 86132.26 seconds (on one core) to an acceptable 8053.90 seconds on 16 cores. However, it should be noted that when using a somewhat different MIMD-system, such as a cluster, there would be significant performance differences due to architectural features. In a cluster, the cores are connected using an InfiniBand interconnect, whereas in the multi-core computer they are located on a single chip, which explains the smaller impact of overhead (transfers and synchronizations). In addition, the processor model in the multi-core computer supports Turbo Boost technology [64], so that the execution time of the method on one such core is much less than the execution time on a cluster core that does not support this technology.
On the GPU, with 960 cores involved in the execution, the time became 20192.78 seconds, which can be adequately compared with four cores of the computer.
From Table 4 and the graphs in Figs. 4 and 5 it can be seen that the speedup, though not linear, approaches linear. This is due to the fact that the share of communication overhead (Table 5) in the execution of the proposed method on the computer systems is relatively small, and the number of parallel operations significantly exceeds the number of sequential operations and synchronizations. By communication overhead we mean the ratio of the time spent by the system on transfers and synchronization between cores to the time of the target calculations on a given number of cores.
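For instance, taking the CPU timings quoted above from Table 3 (86132.26 s on one core versus 8053.90 s on 16 cores), the speedup S = T1/Tp and the efficiency E = S/p work out as:

```python
t1, t16 = 86132.26, 8053.90   # execution times from Table 3, seconds

speedup = t1 / t16            # S = T1 / Tp
efficiency = speedup / 16     # E = S / p for p = 16 cores

print(round(speedup, 2), round(efficiency, 2))   # → 10.69 0.67
```

That is, 16 cores give roughly a 10.7x speedup at about 67% parallel efficiency, consistent with the sub-linear but near-linear behaviour described above.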
The graph of the efficiency of the NUZP computer systems is presented in Fig. 6. It shows that using even 16 cores of the computer systems for the implementation of the proposed method keeps the efficiency at a relatively acceptable level and indicates the potential, if necessary and possible, to use even more cores. Thus, the proposed method maps well onto modern computer architectures, which can significantly reduce the time for the task of ANN synthesis. The parallel approach significantly increases the efficiency of the sequential MGA and makes it even more suitable for the synthesis of ANNs, through a significant reduction in time costs while maintaining the high accuracy of the obtained neural networks.

CONCLUSIONS
The urgent problem of the synthesis of ANNs used for diagnosis and future forecasting has been solved.
The scientific novelty lies in the fact that a modification of the classical GA is proposed for the synthesis of ANNs. Thus, the introduction of a high probability of mutation allows increasing the diversity within the population and preventing early convergence of the method. The choice of a new best individual, as opposed to a complete restart of the method, significantly saves system resources and ensures the exit from the area of local extrema. The use of new criteria for the adaptive selection of mutations, firstly, does not limit the number of hidden neurons and, secondly, prevents unbounded growth of the network. The use of uniform crossover significantly increases efficiency and allows emulating other crossover operators without problems; moreover, it increases the flexibility of the GA. The parallel approach significantly reduces the number of iterations and significantly accelerates the synthesis of ANNs.
The practical significance of the obtained results lies in the fact that the practical problems of the synthesis of ANNs are solved, and the synthesized networks can later be used for diagnosis and pattern recognition. The experimental results showed that the proposed synthesis methods allow obtaining an accurate ANN based on the input data and can be used in practice to solve practical problems of diagnosis, prediction and pattern recognition.
f_diff_comp is a criterion which characterizes the conditional complexity of the network; f_con is a criterion which characterizes the degree of connectedness of neurons in the network; f_fitness is the fitness function of an individual; f_diff_top is a criterion which characterizes the complexity of the network; G is a generation of individuals; g_Ind is the genes (genetic information) of an individual; Ind is an individual from the population (generation); N_h is a hidden neuron; N_i is an input neuron; N_o is an output neuron; NN is a neural net, an individual from the population (generation).

The genes of the second half of the population are defined as the inversion of those of the first half. This allows a uniform distribution of one and zero bits in the population, minimizing the probability of early convergence of the method; an example of a uniform crossover is shown in Fig. 1.

Figure 1 - Example of a uniform crossover

Figure 2 - The choice of the type of mutation

Table 1 - General information about the data set

Table 2 - General results of the testing

Table 3 - Dependence of the execution time of the proposed method on the number of involved cores

Table 4 - Dependence of the speedup on the number of involved cores