INFORMATION-EXTREME MACHINE TRAINING SYSTEM OF FUNCTIONAL DIAGNOSIS SYSTEM WITH HIERARCHICAL DATA STRUCTURE

Context. The problem of information-extreme machine learning of a functional diagnostics system is considered using the example of recognizing the technical state of a laser printer from typical defects of the printed material. The object of the research is the process of hierarchical machine learning of the functional diagnostics system of an electromechanical device. Objective. The main objective is to improve the functional efficiency of machine learning during retraining of the functional diagnostics system by automatically forming a new hierarchical data structure for an expanded alphabet of recognition classes. Method. A method of information-extreme hierarchical machine learning of the functional diagnostics system of a laser printer based on typical defects of the printed material is proposed. The method was developed within the functional approach to modeling the cognitive processes of natural intelligence, which gives the diagnostic system adaptability under arbitrary initial conditions for the formation of images of printing defects and flexibility during retraining when the power of the alphabet of recognition classes increases. The method is based on the principle of maximizing the amount of information in the process of machine learning. Information-extreme machine learning is treated as an iterative procedure for optimizing the operating parameters of the functional diagnostics system according to an information criterion. A modified Kullback information measure, which is a functional of the accuracy characteristics of classification decisions, is used as the criterion for optimizing machine learning parameters. According to the proposed categorical functional model, an information-extreme machine learning algorithm has been developed based on a hierarchical data structure in the form of a binary decursive tree.
Such a data structure makes it possible to split a large number of recognition classes into pairs of nearest neighbors, for which the machine learning parameters are optimized by a linear algorithm of the required depth. Results. Information and algorithmic software for the functional diagnostics system of a laser printer based on images of typical defects in printed material has been developed. The influence of machine learning parameters on the functional efficiency of this system has been investigated. Conclusions. The results of physical modeling have confirmed the efficiency of the proposed method of information-extreme machine learning of the functional diagnostics system of a laser printer based on typical defects in printed material, and the method can be recommended for practical use. The prospect for increasing the functional efficiency of information-extreme learning of the functional diagnostics system is to increase the depth of machine learning by optimizing additional operating parameters of the system, including the parameters of the formation of the input training matrix.


IEI-technology is an information-extreme intellectual technology;
SCD is a system of control tolerances; SFD is a system of functional diagnostics.

NOMENCLATURE
M is a set of recognition classes;
m is a number of the recognition class;
N is a set of recognition features in the structured vector;
i is a number of the recognition feature;
J is a set of structured vectors of recognition features;
j is a number of the structured vector;
H is a set of tiers of the decursive tree;
h is a number of the tier of the decursive tree;
S is a set of strata of the decursive tree;
s is a number of the stratum of the decursive tree;
G is a set of input factors;
T is a set of moments of time of reading information;
[...] is a space of recognition features;
Z is a set of technical conditions of the object of diagnosis;
Y is an input training matrix;
X is a working binary training matrix;
g is a decursive tree construction operator;
L is a set of steps of the machine algorithm for sequential optimization of control tolerances;
l is a number of the step of the machine algorithm;
[...] is a repeat operation symbol;
γ is an operator of formation of the set of accuracy characteristics for the given system of decision estimates;
φ is an operator for calculating the information criterion for optimizing the parameters of machine learning;
$y_{m,i}$ is a value of the i-th diagnostic feature of the averaged training-matrix vector $y_m$ of the recognition class $X_m^o$;
U is an operator that regulates the process of machine learning;
$M_{h,s}$ is a number of recognition classes of the s-th stratum of the h-th tier.

INTRODUCTION
No matter how reliable a laser printer is, over time it loses its initial stability. Defects can be caused by individual pieces of equipment, consumables, printing materials, internal or external software, and environmental conditions. Therefore, creating an SFD of a laser printer that analyzes images of the printed material is an urgent task. The main way to solve this problem is to apply data mining ideas and methods based on machine learning and pattern recognition.
A method of hierarchical information-extreme machine learning is proposed for the information synthesis of an SFD of a laser printer based on defects in printed material.
The object of research is the process of SFD hierarchical machine learning.
Expanding the alphabet of recognition classes increases the degree of their intersection in a fixed space of diagnostic features and reduces the total probability of a correct diagnosis. One way to increase the functional efficiency of SFD machine learning is to move from linear data structures to hierarchical ones. However, existing data mining methods, including artificial neural networks, have difficulty retraining the system: solving this problem requires reprogramming the intelligent system by changing its structure or functional model.
The subject of research is the method of SFD information-extreme hierarchical machine learning.
The functional efficiency of the known divisive and agglomerative hierarchical machine learning methods depends significantly on the power of the recognition class alphabet. Therefore, an important task is to reduce the influence of the alphabet's power so that diagnostic decisions remain both reliable and fast.
The purpose of the work is to increase the functional efficiency of SFD machine learning during its retraining after expanding the recognition classes alphabet by automatically forming a new hierarchical data structure.

PROBLEM STATEMENT
Consider the formalized statement of the problem of information synthesis of a learnable SFD of a laser printer based on images of defects in printed material.
Let the alphabet $\{X_m^o \mid m = \overline{1, M}\}$ of recognition classes, which characterize the different technical states of the laser printer, be given. Based on the results of scanning the images of defects in the printed material of the laser printer, the input training matrix of brightness $Y$ is formed. According to the concept of IEI-technology, the input training matrix is transformed into a working binary matrix $X$, which is adapted in the process of machine learning to the maximum reliability of diagnostic decisions. Let the depth of machine learning be equal to two levels. At the first level, the optimal (hereinafter in the informational sense) geometric parameters of the hyperspherical containers of recognition classes are determined, and at the second level, the system of control tolerances for recognition features is determined. In this case, the vector of operating parameters that affect the functional efficiency of machine learning of the system to recognize the feature vectors of class $X_m^o$ [...]. Restrictions are imposed on the parameters of the system, which will be called machine learning parameters: the radius [...]. It is necessary:
1) [...];
2) using the optimal geometric parameters of the containers of recognition classes obtained in the process of machine learning, to build decision rules for each stratum of the hierarchical structure that guarantee a high total probability of making correct diagnostic decisions;
3) at the examination stage, to make a classification decision on whether the recognized realization (feature vector) belongs to one of the classes of the alphabet of the corresponding final stratum;
4) to automatically form a decursive hierarchical data structure that contains the training matrix of the new recognition class and to retrain the SFD.
Thus, the task of information-extreme synthesis of a learnable SFD is to optimize its machine learning parameters by bringing the global maximum of the information criterion (2) as close as possible to its maximum limit value.

REVIEW OF THE LITERATURE
A detailed analysis of the causes of possible defects of the material printed on a typical laser printer is given in [1,2]. In practice, diagnosing a laser printer from defects in printed material requires a high level of professionalism and experience from the person performing the repair. Moreover, the search for the cause of a defect usually involves examining the technical condition of the components and devices of the laser printer and testing the system software. At the current level of development of information technology, the efficiency of troubleshooting machines and complex devices is increased through computer-integrated systems of functional diagnostics (SFD) [3,4]. The main approach to the information synthesis of SFD is the use of intelligent information technologies of data analysis [5-7], among which methods of machine learning and pattern recognition are the most widespread [8,9]. Machine learning algorithms based on neural networks [10-12] and support vector machines [13,14] are known, but because of the high dimensionality of the feature dictionary and the significant intersection of recognition classes, they do not achieve sufficiently high image recognition reliability. In [15-17], the application of fuzzy neural networks to functional diagnostics is considered, but the problem of multidimensionality remains there as well and significantly limits the capability of fuzzy logic.
In [18,19], to reduce the impact of multidimensionality, it is proposed to use input data extractors built on artificial neural networks, but this approach can lead to loss of information. A promising direction is the use of the ideas and methods of the so-called IEI-technology of data analysis, which is based on maximizing the information capacity of the system in the process of its machine learning [20-22]. The main paradigm of information-extreme machine learning, as in neural-like structures, is the adaptation of the input mathematical description of the system to the maximum reliability of pattern recognition. But in contrast to neural-like structures, the decision rules constructed within the framework of the geometric approach are practically invariant to the dimensionality of the feature dictionary. Since functional diagnostics is expedient when the alphabet of recognition classes, which characterize the possible technical conditions of the device, has high power, the SFD must be retrained in automatic mode. To this end, [23-25] consider the functioning of the SFD in the mode of information-extreme hierarchical machine learning, which makes it possible to retrain the system automatically when the alphabet of recognition classes is expanded. However, these works do not explore the problem of building a new hierarchical data structure, which inevitably arises when the SFD is retrained.
The article considers the problem of increasing the functional efficiency of information-extreme machine learning by automatically forming a new hierarchical data structure when the SFD is retrained after the alphabet of recognition classes is expanded.

MATERIALS AND METHODS
The method of information synthesis of SFD is considered within IEI-technology, which is based on maximizing the information capacity of the system in the process of information-extreme machine learning. It is known that, as the power of the alphabet of recognition classes increases while the space of diagnostic features remains constant, the degree of intersection of recognition classes increases. Since in [21] the degree of intersection of recognition classes is characterized by the ratio of the total probability $P_f$ of making erroneous diagnostic decisions to the total probability $P_t$ of making correct diagnostic decisions, the reliability of diagnosis decreases as $P_f$ grows. A recognized way to reduce the impact of the multidimensionality of the alphabet of recognition classes on the functional efficiency of machine learning is the transition from a linear data structure to a hierarchical one.
Consider the possibility of automating the formation of the input training matrix when the alphabet of recognition classes is expanded, by implementing information-extreme machine learning on a hierarchical data structure in the form of a binary decursive information tree. A binary tree data structure will be called decursive if, in contrast to a recursive one, the attribute of a vertex of the upper tier is passed to a vertex of its stratum in the lower tier. In our case, the training matrices of the corresponding recognition classes are the attributes of the vertices. Strata from which attributes are not passed further will be called final. Thus, as the power of the recognition class alphabet increases, the decursive hierarchical structure is divided into strata, each of which consists of the two recognition classes that are closest in the binary Hamming feature space. This makes it possible to classify them with a linear algorithm of information-extreme machine learning of the required depth. In contrast to neural-like structures, the depth of information-extreme machine learning is determined not by the number of hidden layers but by the number of machine learning parameters that are optimized by the information criterion.
The incidence matrix $A = \{a_{\pi,\varsigma}\}$ of a decursive tree is defined as follows: $a_{\pi,\varsigma} = 1$ if the beginning of the edge ς is connected to the vertex π and the edge is directed from the vertex π; $a_{\pi,\varsigma} = -1$ if the end (arrow) of the edge ς is connected to another vertex and the edge is directed from the vertex π; $a_{\pi,\varsigma} = 0$ if the beginning of the edge ς is not connected to the vertex π; $a_{\pi,\varsigma} = *$ if the beginning of the edge ς is connected to the vertex π and the edge is directed from the vertex π to the vertex of the lower-tier stratum with the same attribute.
The following lemma establishes the specific difference between the incidence matrix of a decursive tree and that of a directed graph.
Lemma. For a decursive graph with ς edges, the number of columns of the incidence matrix that have a zero sum of elements is equal to ς - π*, where π* is the number of vertices that pass on their attributes.
The functional categorical model of information-extreme machine learning on the hierarchical data structure is presented as a directed graph of mappings of the corresponding sets onto one another by machine learning operators. The categorical model of information-extreme machine learning of SFD on the decursive hierarchical data structure is shown in Fig. 1.
[...] The operator γ determines the set of accuracy characteristics $\Im^{|Q|}$, where $Q = C^2$, and the operator φ calculates the set E of values of the information optimization criterion, which is a functional of the accuracy characteristics. The control tolerance optimization loop is closed through the term set D, whose elements are the values of the control tolerances on the recognition features. The operator $u_H$ regulates the process of machine learning. Thus, the proposed categorical model of information-extreme machine learning allows the SFD to be retrained automatically, directly in the operating mode, when the alphabet of recognition classes is expanded.
Information-extreme machine learning on the hierarchical data structure in the form of a binary decursive tree is carried out according to the following scheme:
1) the averaged vectors $\{y_m\}$ of structured diagnostic features are determined from the input training matrices of the recognition classes of the initial alphabet;
2) for a given parameter δ of the field of control tolerances, the lower control tolerance $A_{HK,m,i}$ and the upper control tolerance $A_{BK,m,i}$ are calculated for each i-th feature of the vector $y_m$ according to the formulas $A_{HK,m,i} = y_{m,i} - \delta$; $A_{BK,m,i} = y_{m,i} + \delta$;
3) the vectors of the set $\{x_m\}$ are ordered by increasing code distance from the zero binary vector;
4) the ordered binary vectors of diagnostic features are divided into two approximately equal groups, which determine the two branches of the binary decursive tree;
5) the training matrices of the recognition classes whose averaged feature vectors are adjacent for each group are selected as the attributes of the vertices of the first (upper, according to the dendrographic classification) tier of the decursive tree, which contains one stratum;
6) the strata of the lower tiers of each branch of the tree contain, in addition to the training matrix passed from the upper tier, the training matrix of the nearest neighboring recognition class in its group;
7) the construction of the tree continues until the final strata are formed, which contain the training matrices of all recognition classes of the initial complete alphabet.
Thus, the binary decursive tree constructed according to the above scheme divides the given high-power alphabet into strata, each of which contains the two nearest neighboring classes. As a result, the necessary condition is created for constructing highly reliable decision rules for each stratum by information-extreme machine learning with a linear algorithm.
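Steps 2 to 4 of the scheme above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the authors' software: the control tolerances are taken as symmetric with half-width δ around an averaged reference vector, the bounds are treated as inclusive, the selection level is 0.5, and the function names `binarize`, `averaged_binary_vector`, and `order_and_split` are hypothetical.

```python
import numpy as np

def binarize(Y, y_ref, delta):
    """Turn a (J, N) brightness matrix into a binary training matrix.

    A feature is coded 1 when it lies inside the tolerance field
    [y_ref - delta, y_ref + delta] (inclusive bounds are an assumption).
    """
    lower = y_ref - delta
    upper = y_ref + delta
    return ((Y >= lower) & (Y <= upper)).astype(np.uint8)

def averaged_binary_vector(X, rho=0.5):
    """Average a binary training matrix column-wise and threshold at rho."""
    return (X.mean(axis=0) > rho).astype(np.uint8)

def order_and_split(x_avg):
    """Order classes by code distance from the zero vector and split in two.

    x_avg maps a class id to its averaged binary vector. The code distance
    from the zero vector is the number of ones (Hamming weight).
    """
    ordered = sorted(x_avg, key=lambda m: int(x_avg[m].sum()))
    half = (len(ordered) + 1) // 2
    return ordered[:half], ordered[half:]
```

The two returned groups correspond to the two branches of the binary tree; how the strata are then filled inside each branch follows steps 5 to 7 and is not covered by this sketch.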
According to the categorical model (Fig. 1), the information-extreme algorithm of SFD machine learning on the hierarchical data structure is presented as a procedure for finding the global maximum of the information criterion averaged over the alphabet of recognition classes [...] (3). Thus, in contrast to the linear algorithm, in which the optimal value of the parameter δ is determined for the entire alphabet of recognition classes, in information-extreme machine learning on a hierarchical decursive data structure the parameter δ is determined for each stratum separately.
The internal cycle of procedure (3) implements the basic algorithm, whose functions are to calculate criterion (2) at each step of machine learning, find its global maximum, and determine the optimal geometric parameters of the containers of recognition classes.
The input data of the basic algorithm are an array of realizations, the system of control tolerances $\{\delta_{K,i}\}$ for diagnostic features, and the selection levels $\{\rho_{m,i}\}$ of the coordinates of the binary averaged feature vectors, which in our case are equal by default to $\rho_{m,i} = 0.5$. The geometric parameters of the containers of recognition classes are optimized in the following main stages of the basic machine learning algorithm:
1) formation of the input structured training matrix;
2) determination of the averaged realizations of recognition classes;
3) formation of the binary training matrix for a given system of control tolerances on diagnostic features;
4) determination of the averaged binary feature vectors, whose coordinates are calculated by statistical averaging of the corresponding binary training samples;
5) determination of the center-to-center distances for the given alphabet of recognition classes by calculating the code distances between the averaged feature vectors of recognition classes;
6) calculation, at each step of learning, of the average information criterion for optimizing the machine learning parameters;
7) search for the global maximum of the average information criterion in the working (permissible) area of definition of the criterion function;
8) determination of the optimal radii of the containers of recognition classes, which at each step of machine learning are restored in the radial basis of the space of diagnostic features, by the iterative procedure
$$d^{*}_{h,s,m} = \arg\max_{G_E \cap G_d} E_{h,s,m}(d), \quad m = \overline{1, M_{h,s}}; \quad (4)$$
9) STOP.
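Stages 5 to 8 of the basic algorithm can be sketched for one class of a two-class stratum as follows. This is a hedged illustration, not the authors' implementation: the function name `optimize_radius` is hypothetical, the criterion is passed in as an arbitrary callable that stands in for the paper's information criterion, and the only restriction applied is the working area in which the first reliability exceeds 0.5 and the error of the second kind is below 0.5. A limit on the radius by the center-to-center distance is not enforced here.

```python
import numpy as np

def optimize_radius(center, own, other, criterion):
    """Search the container radius that maximizes the criterion.

    center    : (N,) averaged binary vector of the class
    own       : (J, N) binary training vectors of the class itself
    other     : (J, N) binary training vectors of the neighboring class
    criterion : callable(d1, beta) -> float (hypothetical stand-in)
    Returns (best_radius, best_value); best_radius is None when no radius
    falls into the working area.
    """
    # Code (Hamming) distances of every training vector to the center.
    dist_own = np.count_nonzero(own != center, axis=1)
    dist_other = np.count_nonzero(other != center, axis=1)

    best_radius, best_value = None, -np.inf
    for d in range(center.size + 1):
        d1 = np.mean(dist_own <= d)      # estimate of the first reliability
        beta = np.mean(dist_other <= d)  # estimate of the second-kind error
        if d1 <= 0.5 or beta >= 0.5:     # outside the working area
            continue
        value = criterion(d1, beta)
        if value > best_value:
            best_radius, best_value = d, value
    return best_radius, best_value
```

Because ties keep the first maximum found, the sketch returns the smallest radius among equally good candidates; that tie-breaking choice is an assumption of this example.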
In the external cycle of procedure (3), the parameter $\delta_{h,s}$ of the field of control tolerances is changed until the information criterion for optimizing the machine learning parameters reaches its maximum value.
As the criterion for optimizing the SFD machine learning parameters for each stratum of the decursive hierarchical data structure, a modified Kullback information measure was used, which for two equally probable alternative hypotheses has the form [...] (5). When the information optimization criterion (5) is calculated during machine learning, estimates of the accuracy characteristics are used instead of their exact values, because the random samples in the training matrix are limited in size.
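Since the expression for the criterion is not recoverable from the text above, the following Python sketch only illustrates what a modified Kullback-type measure for two equally probable hypotheses can look like. The specific form used here, $E = 0.5 \log_2[(D_1 + D_2 + 10^{-r}) / (\alpha + \beta + 10^{-r})] \cdot [(D_1 + D_2) - (\alpha + \beta)]$ with $D_1 = 1 - \alpha$ and $D_2 = 1 - \beta$, is an assumption and may differ from the authors' expression; the function name `kullback_criterion` and the default `r = 2` are likewise hypothetical.

```python
import math

def kullback_criterion(alpha, beta, r=2):
    """Assumed form of a modified Kullback measure for two hypotheses.

    alpha : estimate of the error of the first kind
    beta  : estimate of the error of the second kind
    r     : small-number exponent that keeps the logarithm finite
    """
    eps = 10.0 ** (-r)
    errors = alpha + beta                    # alpha + beta
    correct = (1.0 - alpha) + (1.0 - beta)   # D1 + D2
    return 0.5 * math.log2((correct + eps) / (errors + eps)) * (correct - errors)
```

Under this assumed form the measure is zero when correct and erroneous decisions are equally likely and grows as both errors approach zero, which is the behavior an optimization criterion of this kind needs.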
According to the optimal geometric parameters of the hyperspherical containers of recognition classes, decision rules in the form of an implication are constructed [...] (6). In expression (6), $\mu_m$ is the membership function of the vector $x^{(j)}$ [...].
Thus, the feature vector $x^{(j)}$ belongs to the class of the given alphabet of the corresponding stratum for which the membership function (6) is positive and maximal. In addition, the decision rules (6), built within the geometric approach, make it possible to take diagnostic decisions in real time, which is important in functional diagnosis.
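The decision rule described above (choose the class whose membership function is positive and maximal) can be sketched in Python. The membership function used here, one minus the code distance to the container center divided by the container radius, is an assumed form for a hyperspherical container; the names `membership` and `classify` are hypothetical and the paper's expression may differ.

```python
import numpy as np

def membership(x, center, radius):
    """Assumed membership function: positive strictly inside the container."""
    return 1.0 - np.count_nonzero(x != center) / radius

def classify(x, containers):
    """Pick the class with the largest positive membership value.

    containers maps a class id to a (center, radius) pair for one stratum.
    Returns the class id, or None when no membership value is positive.
    """
    scores = {m: membership(x, c, d) for m, (c, d) in containers.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning `None` for a vector that falls outside every container is a design choice of this sketch: it makes a refusal to classify explicit instead of forcing a decision.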

EXPERIMENTS
As an example of the implementation of information-extreme machine learning of SFD, images of seven printing defects were considered; they characterize the corresponding recognition classes, are arranged according to the above scheme for constructing the variation series, and are shown in Fig. 2. The input training matrix was formed by reading the brightness of the pixels of 100 × 100 pixel fragments of the images of the printed material that contained the defects shown in Fig. 2. Since the images are considered stationary in brightness, they were scanned in the Cartesian coordinate system. In addition, the transposed matrix was appended to the input training matrix, which doubled the space of diagnostic features and thus, in accordance with the maximum-distance principle of recognition theory, created the necessary conditions for increasing the average interclass code distance.
In order to test the functional efficiency of the proposed method, information-extreme machine learning of the SFD was first implemented on the hierarchical decursive structure for the first five and the seventh of the recognition classes shown in Fig. 2. Fig. 3 shows the initial decursive data structure, built according to the above algorithm. Analysis of Fig. 3 shows that the alphabet of six recognition classes was divided into four final strata, each consisting of the two nearest neighboring classes. If one class is included in two final strata, then, when constructing the decision rules (6), the membership function (7) is chosen with the optimal geometric parameters of the class for which the radius determined by procedure (4) is minimal.
In order to test the algorithm of information-extreme machine learning of the SFD of the printer, the sixth recognition class (Fig. 2f) was added to the alphabet. Since the fifth class was the nearest neighbor of the new recognition class in the variation series, the two classes formed a new final stratum. Fig. 4 shows the decursive tree of the new hierarchical data structure. As a result of SFD retraining, new decision rules were built for the extended alphabet of recognition classes. In the operating mode, the printer is diagnosed from the image of the defect of the printed material by sequentially applying the decision rules (6) built at the stage of machine learning.
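The retraining step described above, in which a new class forms a final stratum with its nearest neighbor, can be sketched as follows. The sketch assumes that "nearest" means the smallest Hamming distance between averaged binary feature vectors; the function names `hamming` and `attach_new_class` and the representation of strata as pairs of class ids are hypothetical.

```python
import numpy as np

def hamming(a, b):
    """Code distance between two binary vectors."""
    return int(np.count_nonzero(a != b))

def attach_new_class(x_avg, final_strata, new_id, new_vec):
    """Add a new class and pair it with its nearest existing neighbor.

    x_avg        : dict class id -> averaged binary vector (not modified)
    final_strata : list of (class id, class id) pairs
    Returns the extended dictionary and the extended list of final strata.
    """
    nearest = min(x_avg, key=lambda m: hamming(x_avg[m], new_vec))
    extended = {**x_avg, new_id: new_vec}
    return extended, final_strata + [(nearest, new_id)]
```

Only the new stratum then needs to go through the two-class learning procedure, while the strata built earlier keep their decision rules.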

RESULTS
Information-extreme machine learning of the SFD of a laser printer based on defects in printed material was initially carried out for six recognition classes according to the hierarchical structure shown in Fig. 3. As an example, consider the results of implementing the machine learning algorithm for the stratum of the first tier (according to the dendrographic classification, the tiers are counted from the top) and the first stratum of the second tier. Fig. 5 shows a graph of the dependence of the average information criterion (5) on the parameter of the field of control tolerances for recognition features, obtained by procedure (3) with parallel optimization of control tolerances for the stratum of the first tier of the decursive tree. In Fig. 5 and further, double hatching indicates the working area of definition of the function of criterion (5), in which the first reliability is greater than 0.5 and the error of the second kind is less than 0.5. Analysis of Fig. 5 shows that the optimal value of the parameter of the field of control tolerances is $\delta^{*}_{1,1} = 10$ (hereinafter in gradations of brightness) at the maximum value of the information criterion $E^{*}_{1,1} = 2.78$. To build the decision rules (6), it is necessary to know the optimal geometric parameters of the containers of recognition classes. Fig. 6 shows graphs of the dependence of the information criterion (5) on the radius of the hyperspherical containers of the recognition classes of the first-tier stratum. Analysis of Fig. 6 shows that the optimal radii of the containers of the recognition classes are: $d^{*} = 21$ for class [...]. Fig. 7 shows a graph of the dependence of the average information criterion (5) on the parameter of the field of control tolerances on the recognition features for the first stratum of the second tier of the decursive tree. Similarly, in the process of machine learning, the geometric parameters of the hyperspherical containers of the other recognition classes that are part of the hierarchical structure shown in Fig. 4 were optimized.
After the geometric parameters of the containers of all seven recognition classes were optimized, the average value of the information criterion was $\bar{E}^{*} = 3.28$. Since the maximum limit value of criterion (5) [...]. The control tolerances defined at the parallel optimization stage are accepted as starting points for the sequential optimization algorithm. Since the other features still have suboptimal control tolerances while the i-th diagnostic feature is being optimized, sequential optimization requires iterative runs until the value of the information optimization criterion stops changing. Table 1 shows the results of parallel-sequential optimization of machine learning parameters for the recognition classes of all strata of the decursive tree (Fig. 4). Analysis of Table 1 shows that the average value of the information criterion after additional sequential optimization of control tolerances according to procedure (8) was $\bar{E}^{*} = 3.91$, which exceeds the value of this criterion obtained by parallel optimization.
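The sequential stage described above (start from the parallel result, tune one tolerance at a time, and repeat the runs until the criterion stops changing) behaves like a coordinate-wise search. A minimal Python sketch follows; it is not the paper's procedure (8), the function name `sequential_optimize` is hypothetical, and `objective` is any callable that returns the criterion value for a list of per-feature tolerances.

```python
def sequential_optimize(objective, start, candidates, tol=1e-9, max_sweeps=100):
    """Coordinate-wise search over per-feature control tolerances.

    objective  : callable(list of tolerances) -> criterion value
    start      : tolerances found at the parallel optimization stage
    candidates : values tried for every feature in turn
    Sweeps over all features are repeated until the criterion no longer
    improves by more than tol.
    """
    deltas = list(start)
    best = objective(deltas)
    for _ in range(max_sweeps):
        previous = best
        for i in range(len(deltas)):
            for c in candidates:
                trial = deltas[:i] + [c] + deltas[i + 1:]
                value = objective(trial)
                if value > best:
                    best, deltas = value, trial
        if best - previous <= tol:
            break
    return deltas, best
```

Because each accepted change can only raise the criterion, the value returned is never lower than the value at the parallel starting point, which matches the qualitative behavior reported for Table 1.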

DISCUSSION
The obtained results of information-extreme machine learning on a hierarchical data structure in the form of a decursive tree open a promising direction for solving the problem of the multidimensionality of the alphabet of recognition classes. The possibility of automatically retraining the system as the power of the alphabet of recognition classes, which characterize the relevant technical conditions of the device, increases is shown on the example of the information synthesis of a learnable SFD of a laser printer based on typical defects of printed material. It is known that applying a linear machine learning algorithm to a high-power alphabet significantly reduces the reliability of recognition, because the degree of intersection of recognition classes increases while the dimensionality of the space of recognition features stays constant. In contrast to the linear algorithm of information-extreme machine learning, which determines the optimal system of control tolerances for all recognition classes, the proposed method determines optimal control tolerances only for the nearest neighboring classes. Building a hierarchical data structure in the form of a binary multi-tiered decursive tree divides the high-power alphabet into strata that consist of the nearest neighboring recognition classes. As a result, an optimal system of control tolerances is determined for each stratum in the process of machine learning, as shown in Figures 4 and 6. It is shown that machine learning should be carried out by parallel-sequential optimization of control tolerances. In this case, both the reliability of recognition and the efficiency of machine learning increase, because in sequential optimization the search for the global maximum of the information criterion is carried out in the working area determined by parallel optimization.

CONCLUSIONS
1. A functional categorical model is proposed, on the basis of which an algorithm of information-extreme machine learning on the hierarchical data structure is developed and implemented in software. Building a hierarchical data structure in the form of a binary decursive tree makes it possible to divide a high-power set of recognition classes into pairs of nearest neighbors. As a result, the machine learning parameters are optimized by a linear algorithm of sufficient depth for the two nearest neighboring recognition classes, which provides high recognition reliability.
2. The decision rules built in the example of information-extreme machine learning of the SFD of a laser printer on images of defects of printed material are not error-free on the training matrix, which requires increasing the depth of machine learning by optimizing other parameters of the system, including the parameters of the input information description of the system.

$m_s$ is a serial number of the recognition class in the s-th stratum [...];
$\delta_{h,s}$ is a parameter equal to half of the control field of tolerances on the features of the recognition classes of the s-th stratum of the h-th tier;
$\delta_H$ is a parameter equal to half of the normalized field of tolerances for recognition features;
$G_E$ is a working (permissible) area of definition of the function of the information optimization criterion;
$G_d$ is an allowable range of values of the radius of the containers of the recognition classes;
$D_1^{*}$ is an extreme value of the first reliability;
$\beta^{*}$ is an extreme value of the error of the second kind;
$M_{h,s}$ is a set of recognition classes in the s-th stratum of the h-th tier;
$m_{h,s}$ is a number of the recognition class in the s-th stratum of the h-th tier;
$\{k\}$ is a set of steps of machine learning;
$\tilde{\Re}^{|M|}$ is a fuzzy partition of the feature space into M recognition classes;
$Y^{|S|}$ is an input training matrix of the recognition classes of the S strata of the decursive tree;
$X^{|S|}$ is a binary training matrix of the recognition classes of the S strata of the decursive tree;
E is a term set of values of the information criterion;
R is an operator of construction of the partition $\tilde{\Re}^{|2|}$ of the feature space into recognition classes;
Ψ is an operator for testing the basic statistical hypothesis about the membership of the vector [...]

Figure 1 - Categorical model of SFD machine learning
The operator g shown in Fig. 1 forms a decursive binary tree H from the source of information, which is given by the Cartesian product of the sets Z, G, T [...],

Figure 3 - Hierarchical data structure for six recognition classes

Figure 4 - Hierarchical data structure for seven recognition classes

Figure 5 - Graph of the dependence of the information criterion on the parameter of the field of control tolerances for the stratum of the first tier

Figure 6 - Graph of the dependence of the information criterion on the radius of the containers of the recognition classes of the first-tier stratum: a - class $X_3^o$; b - class $X_4^o$

Figure 7 - Graph of the dependence of the information criterion on the parameter of the field of control tolerances for the first stratum of the second tier
Analysis of Fig. 7 shows that the optimal value of the parameter of the field of control tolerances is $\delta^{*}_{2,1} = 47$ at the maximum value of the information criterion $E^{*}_{2,1} = 3.48$. The result of optimizing the geometric parameters of the containers of the recognition classes of the first stratum of the second tier by criterion (5) is shown in Fig. 8. Analysis of Fig. 8 shows that the optimal radii of the containers of the recognition classes are: $d^{*}_{2} = 65$ for [...] $X_3^o$. Since the class $X_3^o$ [...] to the hyperspherical container of the [...]
[...] of parallel-sequential optimization of control tolerances in the form of a procedure [...] was used to increase the functional efficiency of SFD machine learning.
Figure 8 - [...] $X_2^o$

Table 1 - The results of machine learning