INFORMATION TECHNOLOGY OF DIAGNOSIS MODELS SYNTHESIS BASED ON PARALLEL COMPUTING

Context. The problem of synthesizing diagnosis models from big data using parallel computing is solved. The object of the research is the process of diagnosis model synthesis. The subject of the research is the methods and information technologies for diagnosis model synthesis. Objective. The research objective is to develop an information technology for diagnosis model synthesis. Method. The paper presents an information technology of diagnosis model synthesis, described by a set of diagrams that graphically represent the structural elements of the system and the behavioral aspects of their interaction at the various stages of constructing models of diagnostics objects. The developed information technology makes it possible to build distributed diagnostics systems in which the computationally complex stages of diagnosis model synthesis are performed on high-performance server equipment. This significantly raises the practical threshold for using diagnostics systems on big data sets when solving the tasks of training sample data reduction, rule extraction, and diagnosis model construction and retraining. Results. Software that implements the proposed information technology and synthesizes diagnosis models from given data samples has been developed. Conclusions. The experiments confirmed the operability of the proposed information technology and allow it to be recommended for practical big data processing problems in technical and biomedical diagnostics. Further research may include modifying the developed information technology by introducing other methods of diagnosis model synthesis.


INTRODUCTION
Solving technical and medical diagnostics tasks, as well as the task of nondestructive product quality control, involves constructing diagnosis models [1][2][3][4][5][6]. Such models make it possible to classify the diagnosed samples with high accuracy, while the target classes remain sufficiently convenient for perception and analysis by experts in the application areas.
Methods for solving the problems of training sample data reduction, rule extraction, and diagnosis model construction and retraining are presented in [7][8][9][10][11][12][13][14][15][16]. However, such methods are based on sequential computing and require significant time, which makes them difficult to use in practice for automating diagnostic decision making. It is therefore necessary to develop an information technology that can accelerate diagnosis model synthesis by means of parallel computing [17][18][19][20].
The research objective is to develop an information technology of diagnosis model synthesis that makes it possible to build distributed diagnostics systems in which the computationally complex stages of model synthesis are performed on high-performance server equipment. This significantly raises the practical threshold for using diagnostic systems in the processing of big data sets.

REVIEW OF THE LITERATURE
As stated above, methods for solving problems of diagnosis model synthesis are proposed in [7][8][9][10][11][12][13][14][15][16]. However, such methods are based on sequential computing and require significant time, which makes them difficult to use in practice for automating diagnostic decision making.
Feature selection methods proposed in [7][8][9] select informative feature combinations using evolutionary [24][25][26] and multi-agent [27] technologies of computational intelligence. These methods use a priori information about the individual informativeness of features, which reduces the search space and shortens the time needed to select an informative feature combination. Nevertheless, they still require significant time when processing high-dimensional data.
Production rule extraction methods [10][11][12] find the most significant implications $X \to Y$ in the given data samples. This improves the generalizing properties of the synthesized diagnosis models, increases their interpretability, and reduces their dimensionality (structural and parametric complexity), required storage capacity and response time.
The method of parametric identification of neuro-fuzzy networks based on parallel random search, proposed in [13][14][15][16], uses stochastic optimization to tune the parameters of the synthesized models (parameters of membership functions and weight coefficients of neuroelements). It forms the initial solution set using information from the training sample: the significance of feature terms according to how compactly the training samples are arranged within the corresponding term and the degree of the term's effect on the output parameter value. This makes it possible to place the initial search points near the optimal values and to accelerate the optimization process.
It is necessary to develop an information technology that allows the methods proposed in [7][8][9][10][11][12][13][14][15][16] to be used in practice for constructing distributed diagnostics systems in which the computationally complex stages of diagnosis model synthesis are performed on high-performance server equipment, significantly raising the practical threshold for using diagnostic systems.
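The idea of running several random-search chains concurrently from informed starting points can be illustrated with a minimal Python sketch. It is not the method of [13]: the toy objective `sphere`, the greedy acceptance rule, the step size `sigma` and all function names are assumptions chosen for illustration. Worker threads are used only to keep the example self-contained; in CPython they do not accelerate CPU-bound work, so a process pool or a cluster would be needed for a real gain.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(x):
    """Toy objective (sum of squares); stands in for a model's training error."""
    return sum(v * v for v in x)


def random_search(objective, start, steps, sigma, seed):
    """One chain: keep a Gaussian perturbation only if it lowers the objective."""
    rng = random.Random(seed)
    best, best_val = list(start), objective(start)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val


def parallel_random_search(objective, starts, steps=200, sigma=0.1, workers=4):
    """Run one chain per start point in a worker pool and return the best result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(random_search, objective, s, steps, sigma, seed)
            for seed, s in enumerate(starts)
        ]
        results = [f.result() for f in futures]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Hypothetical start points; in the text they come from training-sample analysis.
    starts = [[1.0, -1.0], [0.5, 0.5], [-2.0, 1.5], [0.1, -0.2]]
    point, value = parallel_random_search(sphere, starts)
    print(point, value)
```

Because each chain only accepts improvements, the returned value is never worse than the best start point, which is why well-placed initial points shorten the search.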

MATERIALS AND METHODS
As mentioned above, constructing diagnosis models from the available data sets generally requires considerable computational resources. The developed information technology is therefore implemented on a client-server architecture [28][29][30]. Computationally complex processes such as training sample processing, big data storage and model synthesis are implemented on the server side, and clients are given access according to their access permissions. Clients are the people and computer systems that solve practical tasks involving the construction and application of diagnosis models.
The diagram describing the main functions of the front end and back end of the information technology is presented in Fig. 1.
As shown in Fig. 1, the main functions of the information system are divided between the server and client sides. The client side implements the user interface, processes the data entered by the user (verifying that the data format is correct), and calculates the output parameter value of the diagnostics object using the synthesized model. The server side should use high-performance equipment to solve the tasks of diagnosis model construction. The computationally complex processes of training sample dimensionality reduction, rule extraction, and synthesis and retraining of diagnosis models run on the server side. In addition, a database is used to store and process the training samples of different users, as well as the synthesized models, extracted rules, reduced samples and other results of system operation.
To support the design and development of a software system for diagnosis model synthesis, the corresponding information technology is proposed as a set of UML models [31][32][33]. The UML models are represented as diagrams (Fig. 2-7) that graphically describe the structural and behavioral aspects of constructing distributed diagnostics systems capable of solving the tasks of training sample data reduction, rule extraction, and diagnosis model construction and retraining.
Based on the main functions of the system (Fig. 1) and on the chosen architecture, the general configuration and topology of the system can be defined as the model presented in the deployment diagram (Fig. 2).
The system consists of three nodes: the server, the client computer and the database server. The server hosts the software implementation of the intelligent methods used at the various stages of diagnosis model synthesis (data reduction, rule extraction, model construction, model retraining). A set of client computers with installed software can interact with the server. The database server operates through the interaction between the database management system and the database. Users (client computers) interact with the database indirectly, through the server, where their access permissions for the corresponding data sets are verified.
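The indirect access described above, where the server verifies permissions before any data set is returned, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `DiagnosisServer`, the exception `AccessDenied` and the in-memory dictionaries are hypothetical stand-ins for the real server and database, and group-based permissions are assumed.

```python
class AccessDenied(Exception):
    """Raised when a user requests a data set outside their groups."""


class DiagnosisServer:
    """Hypothetical server facade: the only component that touches the data store."""

    def __init__(self, user_groups, sample_group, samples):
        self._user_groups = user_groups    # user id -> set of group ids
        self._sample_group = sample_group  # sample id -> owning group id
        self._samples = samples            # sample id -> stored data

    def fetch_sample(self, user_id, sample_id):
        """Return a sample only if the user belongs to the group that owns it."""
        groups = self._user_groups.get(user_id, set())
        if self._sample_group.get(sample_id) not in groups:
            raise AccessDenied(f"user {user_id} may not read sample {sample_id}")
        return self._samples[sample_id]


if __name__ == "__main__":
    server = DiagnosisServer(
        user_groups={"alice": {1}},
        sample_group={10: 1, 11: 2},
        samples={10: [[0.1, 0.2]], 11: [[0.3, 0.4]]},
    )
    print(server.fetch_sample("alice", 10))
```

Clients never hold a database connection in this arrangement, so every read passes through the same permission check.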
To represent the different interactions of users with the system, a model was built as a use case diagram (Fig. 3).
As can be seen from Fig. 3, two kinds of users interact with the system: the user and the administrator. They interact with each other in the use case "Registration", in which the administrator approves the user's permissions for access to the other system functions.
A system user is a person who uses the information technology to solve practical diagnostics tasks. The user should be registered in the system to provide access of
p-ISSN 1607-3274. Радіоелектроніка, інформатика, управління. 2017. № 3 e-ISSN 2313-688X. Radio Electronics, Computer Science, Control. 2017. № 3

[Figure residue, apparently from the deployment diagram: "Software implementation of data reduction methods", "Database server", "Client computer", "Nondestructive diagnosing application"]

The system administrator performs user registration by entering and editing information about users and confirming their registration. The administrator also has access to the database to remove or archive information that has not been used for a long time, thereby reducing the disk space in use.
To model the logic of operation and the actions performed in the information system for diagnosis model synthesis, a corresponding model was created; it is presented in Fig. 4 as an activity diagram.
As shown in Fig. 4, the user enters account data at the start of a session. The system then offers a choice of one of the following operating modes:
- diagnosis model synthesis - a computationally complex process realized on the server, which performs the stages of training sample data reduction, rule extraction and diagnosis model construction;
- retraining of the synthesized model;
- diagnosing using the synthesized model - calculation of the output parameter value of the diagnostics object by the model from the measured input parameters.

Figure 4 - Activity diagram of diagnosis models synthesis information technology
The user can then continue to use the system or quit. The model of how the interaction between information system objects and users is distributed in time is represented as sequence diagrams (Fig. 5-7). The diagrams show the interaction between system components through calls of the procedures that realize the corresponding use cases.
The sequence diagram in Fig. 5 represents the system use cases of user registration, input and editing of a training sample, and removal of irrelevant data (Fig. 3). As can be seen, not only users but also the system administrator takes part in these processes.
The processes of preliminary training sample processing for diagnosis model synthesis (data sample reduction and rule extraction) are performed on the server at the user's request (through the client computer), and the results are saved in the database (Fig. 6). The processes in the sequence diagram (Fig. 6) correspond to the system use cases "Sample reduction" and "Rules extraction" (Fig. 3), which the user performs without administrator participation. The computationally complex processes of model synthesis and retraining are also performed on the server at the user's request (Fig. 7), with the results saved to the database. The result of these processes (the synthesized diagnosis model) is transmitted to the client computer, where it can then be used to calculate the output parameter value of the diagnosed object or process. With this approach the diagnostics process runs on the client computer without access to the server.
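The hand-off described above, in which the server transmits the synthesized model once and the client then evaluates it locally, can be sketched in Python. This is an illustration only: the JSON payload, the names `export_model` and `LocalModel`, and the simple weighted-sum classifier are assumptions standing in for the neuro-fuzzy models and transfer format actually used by the system.

```python
import json


def export_model(weights, bias):
    """Server side: serialize the synthesized model for transmission."""
    return json.dumps({"weights": weights, "bias": bias})


class LocalModel:
    """Client side: holds the received model and evaluates it without the server."""

    def __init__(self, payload):
        data = json.loads(payload)
        self.weights = data["weights"]
        self.bias = data["bias"]

    def diagnose(self, features):
        """Return class 1 if the weighted sum exceeds zero, otherwise class 0."""
        if len(features) != len(self.weights):
            raise ValueError("feature count does not match the model")
        score = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1 if score > 0 else 0


if __name__ == "__main__":
    payload = export_model([0.5, -1.0], 0.25)  # produced once on the server
    model = LocalModel(payload)                # stored on the client
    print(model.diagnose([1.0, 0.5]))          # evaluated with no server call
```

Once the payload has been received, every call to `diagnose` uses only local data, which mirrors the statement that diagnostics can run without server access.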

ПРОГРЕСИВНІ ІНФОРМАЦІЙНІ ТЕХНОЛОГІЇ
Thus the developed information technology of diagnosis model synthesis is represented by a set of diagrams (Fig. 3-8) that graphically describe the structural elements of the system and the behavioral aspects of their interaction at the various stages of constructing models of diagnostics objects. The proposed information technology makes it possible to construct distributed diagnostics systems in which the computationally complex stages of diagnosis model synthesis are performed on high-performance server equipment. This significantly raises the practical threshold for using diagnostic systems that solve the tasks of training sample data reduction, rule extraction, and diagnosis model construction and retraining.
As mentioned above, the developed information technology uses a database to store and process information about the objects or processes under study (training samples) and the results of system operation (synthesized models, extracted rules, reduced samples etc.). The database contains a set of related tables that hold information about users and about the data samples representing the objects and processes under study [34][35][36]. The ER-model of the database includes the following entities:
- Users - contains information about system users. Fields: id - user number (unique identifier); userLogin - login of the user; userPass - password of the user; info - information about the user;
- Groups - reflects user groups and group data. Fields: id - primary key; name - group name; info - group description;
- UserGroups - describes the correspondence between users and their groups. idGroup and idUser are foreign keys pointing to the corresponding entities;
- Samples - [...] with the identifier id; info - description of the training sample; idGroups - reference to the user group that has permission to access the sample id;
- SamplesRed - contains information about the samples obtained by reducing training samples from the entity Samples with the methods implemented in the system. Fields: id - unique identifier of the reduced data sample; ref - reference to the file with the table containing the reduced sample that corresponds to the identifier id; info - description of the reduced sample (in particular, the reduction method); idSample - identifier (foreign key) of the training sample from which the reduced sample was obtained;
- Rules - contains information about the production rules extracted by processing training samples from the entity Samples with the methods implemented in the system. Fields: id - unique identifier of the rule set; ref - reference to the file containing the set of rules $P \to T$ that corresponds to the identifier id; info - description of the rule set (in particular, the rule extraction method); idSample - identifier (foreign key) of the training sample $S = \langle P, T \rangle$ from which the set of rules $P \to T$ was obtained;
- Models - contains information about the diagnosis models synthesized with the methods implemented in the system from the given (idSample) or reduced (idSampleRed) samples and from the extracted rules (idRules). Fields: id - unique identifier of the diagnosis model; ref - reference to the file containing the structure and parameters of the model with the identifier id; info - description of the synthesized model (for example, the synthesis method); idSample, idSampleRed, idRules - identifiers (foreign keys) of the training sample, reduced sample and rule set, respectively, on which the synthesis of the model id was based. If only one data set is used in the synthesis of the model id, the other foreign keys must refer to a dummy record in the other entities among Samples, SamplesRed and Rules; idParentModel - the model used as the base for synthesizing the model id (this field is optional, because it is needed only when a synthesized model is retrained); idGroups - reference to the group of users who have access to the synthesized model.
The database schema is presented in Fig. 8.
As shown in Fig. 8, the connections between entities run from foreign keys to primary keys, and integrity control is provided by the tools of the database management system.
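The entity descriptions can be turned into a small schema sketch using Python's standard `sqlite3` module. This is an assumption-laden illustration, not the authors' database: the paper does not name the DBMS, the column types are guesses, the `ref` column of Samples is assumed by analogy with SamplesRed because that part of the description is incomplete, and the unused foreign keys of Models are left nullable here for brevity, whereas the text prescribes references to dummy records.

```python
import sqlite3

# Hypothetical DDL derived from the entity list; types and constraints are assumed.
SCHEMA = """
CREATE TABLE Users      (id INTEGER PRIMARY KEY, userLogin TEXT UNIQUE NOT NULL,
                         userPass TEXT NOT NULL, info TEXT);
CREATE TABLE "Groups"   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, info TEXT);
CREATE TABLE UserGroups (idUser  INTEGER NOT NULL REFERENCES Users(id),
                         idGroup INTEGER NOT NULL REFERENCES "Groups"(id));
CREATE TABLE Samples    (id INTEGER PRIMARY KEY, ref TEXT, info TEXT,
                         idGroups INTEGER REFERENCES "Groups"(id));
CREATE TABLE SamplesRed (id INTEGER PRIMARY KEY, ref TEXT, info TEXT,
                         idSample INTEGER NOT NULL REFERENCES Samples(id));
CREATE TABLE Rules      (id INTEGER PRIMARY KEY, ref TEXT, info TEXT,
                         idSample INTEGER NOT NULL REFERENCES Samples(id));
CREATE TABLE Models     (id INTEGER PRIMARY KEY, ref TEXT, info TEXT,
                         idSample      INTEGER REFERENCES Samples(id),
                         idSampleRed   INTEGER REFERENCES SamplesRed(id),
                         idRules       INTEGER REFERENCES Rules(id),
                         idParentModel INTEGER REFERENCES Models(id),
                         idGroups      INTEGER REFERENCES "Groups"(id));
"""


def open_database(path=":memory:"):
    """Create the schema and switch on SQLite's foreign-key enforcement."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = open_database()
    conn.execute('INSERT INTO "Groups" (id, name) VALUES (1, ?)', ("cardiology",))
    conn.execute("INSERT INTO Samples (id, ref, idGroups) VALUES (1, 'sample1.csv', 1)")
    try:
        # Sample 99 does not exist, so the DBMS rejects the reduced sample.
        conn.execute("INSERT INTO SamplesRed (id, ref, idSample) VALUES (1, 'r.csv', 99)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

With enforcement enabled, the database itself rejects a reduced sample, rule set or model that points at a missing parent record, which is the integrity control the text assigns to the DBMS.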
The developed database supports storage and processing of information about objects and processes which are researched (training samples) and also system

EXPERIMENTS
To examine the efficiency of the proposed information technology, a corresponding software system for diagnosis model synthesis was developed.
The software was developed in the Java programming language using the MVC architecture pattern. The graphical part (View) was implemented with the SWING package. The application has the following class structure. The Model classes describe the entities of the database ER-model (Fig. 8). The fields of these classes are identical to the fields of the database tables, which makes it possible to apply modern programming frameworks such as Hibernate. Every row of a database table can be obtained in the application as one class instance, and several table rows form a collection. The View classes are frames (forms, dialog boxes) that contain the graphical user interface and allow the user to interact with the application. The Controller classes connect to the server and handle events and user actions.
The class LoginController connects to the server with a login and password, or registers the user (creates a user account, which is inserted into the database). MainController handles user instructions that call the appropriate forms: it sends the instruction to the server and receives permission (according to the user's access permissions) to call the appropriate function, for example diagnostics or synthesis. The class WorkWithSampleController provides methods for loading a sample from a file and then sending it to the server through ConnectionController; it also makes it possible to edit a sample and to reduce it. The class SynthesisController downloads from the database the samples and synthesis methods available to the user and performs model synthesis. The class AdditionalTrainingController provides methods that obtain from the database the models and retraining methods available to the user; it also makes it possible to extend a sample and to retrain a model. DiagnosisController provides access to diagnosis models, accepts the model input parameters and performs the diagnosis, after which the result is saved to a file.
Numerical experiments on the efficiency of the proposed information technology and the corresponding software system were performed by solving different practical diagnostics problems [37][38][39], in particular the task of predicting the health status of hypertensive patients [37]. Using the methods of [7][8][9][10][11][12][13][14][15][16], the proposed information technology and the software, the tasks of training sample reduction (informative attribute identification and production rule extraction) and diagnosis model synthesis were solved sequentially. Numerical results are presented for the application of the developed information technology to the construction of neuro-fuzzy diagnosis models using the proposed parallel method based on stochastic computation [24][25][26][27][40] and the island method of evolutionary search (Island Genetic Algorithm, IGA) [24][25][26]. The experiments were executed on 1, 2, 4, 8 and 16 cores of a CPU cluster [41] and on a GPU [42].

RESULTS
The experimental results obtained on the CPU cluster are presented in Table 1.
The results of the experiments on the GPU NVIDIA GTX 285+, which was programmed using CUDA technology [43], are presented in Table 2. The speedup of the computational process was measured relative to one CPU.

DISCUSSION
As shown in Tables 1 and 2, the proposed technology synthesizes diagnosis models with performance similar to that of the models described in [3,10,13]. Owing to modified solution search operators, the method proposed in [13] reduces the number of processor operations, including communication costs, so the random search runs faster than the IGA method [24][25][26] (for example, model synthesis on 16 CPU cores took 11.45 s for the proposed method and 15.34 s for the IGA method). At the same time, the proposed technology constructs neuro-fuzzy models with acceptable accuracy; that is, the gain in speed is not obtained at the expense of the approximating and generalizing properties of the diagnosis models.
The efficiency of the CPU cluster with the method proposed in [13] and with the IGA method is acceptable (in particular, on 8 CPU cores the parallel efficiency reaches 0.7 for the proposed method and 0.68 for the IGA method). Using more than 8 CPU cores is not justified, because data transmission and synchronization greatly reduce system efficiency. If the number of GPU threads rises above 140, the speedup of the computing process decreases, because overheads rise considerably and threads begin to idle.
Thus the diagnosis model synthesis technology proposed in the paper makes efficient use of modern parallel computing architectures to obtain results of appropriate accuracy in acceptable time. The use of a cross-platform language (Java) considerably extends the scope of the proposed technology.
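The efficiency figures quoted above can be related to speedup with a short Python calculation. It assumes the standard definitions, speedup S = T1 / Tp and efficiency E = S / p for p cores; the paper does not restate its definitions, so this is an interpretation, and the function names are illustrative. Under that assumption, an efficiency of 0.7 on 8 cores corresponds to a speedup of about 5.6, and 0.68 to about 5.44; the reported 16-core times differ by a factor of about 1.34.

```python
def speedup(t_one_core, t_parallel):
    """Speedup S = T1 / Tp of a parallel run relative to one core."""
    return t_one_core / t_parallel


def efficiency(speedup_value, cores):
    """Parallel efficiency E = S / p."""
    return speedup_value / cores


def speedup_from_efficiency(eff, cores):
    """Invert E = S / p to recover the speedup S = E * p."""
    return eff * cores


if __name__ == "__main__":
    # Efficiencies reported in the text for 8 CPU cores.
    print(round(speedup_from_efficiency(0.70, 8), 2))  # method of [13]
    print(round(speedup_from_efficiency(0.68, 8), 2))  # IGA
    # Ratio of the 16-core synthesis times reported in the text (IGA / method of [13]).
    print(round(15.34 / 11.45, 2))
```

An efficiency well below 1 means that part of each core's time goes to communication and synchronization, which is the effect the text cites for runs on more than 8 cores.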

CONCLUSIONS
In this paper the topical problem of automating the process of diagnosis model synthesis was solved.
The scientific novelty of the paper lies in the proposed information technology of diagnosis model synthesis. It consists of a set of methods and of diagrams that connect the methods with each other and graphically describe the structural elements of diagnostics systems and the behavioral aspects of their interaction at the various stages of constructing models of diagnostics objects. The proposed information technology makes it possible to construct distributed diagnostics systems in which the computationally complex stages of diagnosis model synthesis are performed on high-performance server equipment. This significantly raises the practical threshold for using diagnostic systems in the processing of big data sets when solving the tasks of training sample data reduction, rule extraction, and building and retraining of diagnosis models.
The practical significance of the paper lies in the solution of practical problems. The experimental results showed that the proposed information technology significantly increases the speed of diagnosis model synthesis and can be used in practice for tasks of diagnostics and nondestructive product quality control.