LOGICAL RECOGNITION TREE CONSTRUCTION ON THE BASIS OF A STEP-BY-STEP ELEMENTARY ATTRIBUTE SELECTION

Context. The general problem of constructing logical recognition/classification trees is analyzed. Logical classification trees are the object of the present study; the subject of the study is the relevant methods and algorithms for constructing them.  Objective. The goal of this work is to develop a simple and efficient method of constructing logical tree-like classification models from a discrete training selection, in which the tree is built from elementary attributes evaluated by an informativeness functional.  Method. A general method of constructing logical classification trees is suggested that builds a tree-like structure for a given initial training selection from a set of elementary attributes, each evaluated at every step of model construction against that selection. In other words, the main idea of the suggested method is to approximate an initial selection of arbitrary volume by a set of elementary attributes: when forming the current vertex (node) of the logical tree, the most informative (highest-quality) elementary attribute is selected from the initial set. This approach to constructing the resulting classification tree allows one to substantially reduce the tree size and complexity (i.e., the total number of branches and structural layers) and to increase the quality of its further analysis (interpretability). The suggested method of constructing logical classification trees enables one to construct tree-like models for a wide class of artificial intelligence problems.  Results. The method developed and presented in this work has been implemented in software and studied on a problem of classifying geological data characterized by a high-dimensional attribute space.  Conclusions. 
The experiments carried out in this work have confirmed the efficiency of the suggested software and demonstrate the possibility of its use for solving a wide spectrum of applied recognition and classification problems. Further studies may be related to creating a restricted version of the logical classification tree method by introducing a stopping criterion for the tree construction procedure based on structure depth, to optimizing the software realization, and to experimental study of the method on a wider circle of applied problems.


NOMENCLATURE
G is a certain initial signal set; R is a certain partition of the initial set G into classes H_i; W_M(f) is the information quality (informativeness) value of a generalized attribute f; r_1, ..., r_n are the fixed attribute values that define a certain path in the LCT structure; T_i is a path in the logical tree structure that corresponds to a fixed training pair of the TS; k is the total number of classes in the set G; n is the total number of problem attributes (the attribute space dimension); m_i^j is the RF value that ends a certain path T_i, where j is the number of the logical tree construction step; M is the total number of training pairs (objects) in the initial TS; M_ts is the total number of test pairs (objects) in the test selection; S is the total number of TS training pairs for which a certain relation holds true; a dedicated parameter characterizes the training efficiency estimate for the current problem; a dedicated set comprises all the pairs of the initial TS that belong to a fixed path; V_tr is the total number of vertices of the constructed LCT; N_tr is the total number of attributes used in the constructed LCT structure; C_tr is the total number of transitions (links) in the LCT model structure; I_Main is the integral quality index of the LCT model. Abbreviations: LCT stands for logical classification tree; ACT for algorithmic classification tree; TS for training selection; RF for recognition function; BAS for branched attribute selection; RS for recognition system.

INTRODUCTION
Information technologies based on mathematical image recognition models in the form of LCTs (i.e., tree-like models) are being widely used in social, economic, environmental and other information processing systems. This is explained by the fact that such an approach allows a set of shortcomings of the classical methods to be eliminated and a principally new result to be achieved through effective and rational use of computing system capacities [1].
More than 3,500 recognition algorithms (based on different approaches and concepts) are known today, each having certain limitations in use (accuracy, speed, memory, versatility, reliability, etc.). In addition, each algorithm is limited by the specific character of its application problems, and this is, indisputably, the most severe bottleneck not only of the above algorithms but also of the recognition systems based on the relevant concepts [2].
The classification (decision) trees are the object of the present study.
It is known that the presentation of large training selections (i.e., discrete information) in the form of logical tree structures has essential advantages from the viewpoint of economical data description and efficient mechanisms of their application [3]. In other words, covering the training selection by a set of elementary attributes (in the LCT case) or by a fixed set of autonomous recognition and classification algorithms (in the ACT case) gives rise to a fixed tree-like data structure that, to some extent, compresses the initial TS data, and thus allows the hardware resources of the information system to be substantially optimized and saved [4].
Note that the field of use of the LCT concept is now extremely extensive, whereas the set of problems and tasks solved by the above apparatus can be reduced to three basic segments: description of the data structure, recognition/classification, and regression [5].
Thus, the ability of the LCT to perform one-dimensional branching in order to analyze the influence (importance, quality) of certain variables allows one to work with variables of various types in the form of predicates (in the ACT case, with the relevant autonomous classification and recognition algorithms) [6]. In this case, the logical tree structure is presented in the form of branches and nodes: certain marks (attributes, values) placed on the tree branches determine the target function (the recognition function in the LCT case), while the RF values or the extended transition attributes are located at the nodes. Note that when constructing an LCT, the issues of choosing the attribute (the LCT vertex) at which the initial TS partition occurs, the criterion of stopping the training (the LCT structure construction), and the criterion of pruning the logical tree branches (the LCT sub-trees) remain crucial. Just at this stage, a principal question of the LCT theory arises: the problem of constructing all the variants of logical trees that correspond to the initial TS and selecting the logical tree that is minimal with respect to depth (number of layers) [7]. It should be noted here that this problem is NP-complete (as established by L. Hyafil and R. Rivest) and, therefore, has no simple and efficient methods of solution.
The methods and algorithms of constructing the logical classification trees (the decision trees) are the subject of this study.
The principal available methods of training selection processing for recognition function construction allow neither the required level of recognition system accuracy to be achieved nor the complexity of the constructed systems to be regulated [8]. The methods of constructing recognition systems based on logical classification trees (decision trees) are free of this shortcoming. In this case, a specific feature of the logical tree (algorithmic classification tree) method is the possibility of combined use of a number of known recognition algorithms (methods) to solve a particular problem of constructing a recognition scheme. It is based on a single methodology: the optimal approximation of the training selection by a set of generalized attributes (autonomous algorithms) included into a certain scheme (operator) constructed in the course of the training process [9].
The objective of this work is to develop an efficient and versatile method of constructing classification (recognition) models on the basis of the LCT concept for discrete information arrays. The recognition system (RS) schemes obtained have a tree-like structure with elementary attributes as their structural elements.

PROBLEM STATEMENT
Let the TS be given in the following form: (x_1, f_R(x_1)), ..., (x_M, f_R(x_M)). (1) Note that here x_i belongs to G (G is a certain set), and f_R is the RF. Here f_R is a certain finitely many-valued function that assigns the partition R of the set G into subsets (images, classes). Thus, the TS is a manifold (more strictly, a sequence) of certain sets, and each set is a manifold of certain attribute values together with the value of the function on this set. One may add that the manifold of attribute values is a certain image, whereas the function value correlates this image with the corresponding class label [10].
The task is to build the LCT construction L based on the initial TS array (1) and to determine the values of its structural parameters.

REVIEW OF THE LITERATURE
Analyzing the problems of tree-like classification and recognition models, one may note a certain lack of current studies in this field, since the main emphasis is placed on the neural-network recognition concept [11]. To a great extent, this can be explained by the specific features of the LCT models, the difficulties of realizing the ACT concept (i.e., the highest level of abstraction of the LCT concept), and the set of severe rules and restrictions concerning practical work with such data structures [12].
The present study continues a cycle of works dedicated to the problem of tree-like discrete object recognition/classification schemes [13][14][15], covering the issues of constructing, using and optimizing logical trees. It is known [13] that the resulting classification rule (scheme) constructed by an arbitrary method or by means of a branched attribute selection algorithm has a tree-like logical structure. The logical tree consists of vertices (attributes) grouped in layers and obtained at a certain step (stage) of recognition tree construction [2]. The problem of synthesizing recognition trees actually represented by an algorithm tree (graph) is an important one [6]. Contrary to the available methods, the main peculiarity of tree-like recognition systems is the fact that the importance of certain attributes (attribute/algorithm groups) is defined with respect to the function that sets the partition of objects into classes [16].
For instance, in [17], the principal issues of decision tree generation for the case of less informative attributes have been analyzed. The ability of the LCT to perform one-dimensional branching in order to analyze the influence (importance, quality) of certain variables makes it possible to operate with variables of various types in the form of predicates (in the ACT case, with the relevant autonomous classification/recognition algorithms). Such a concept of logical trees is being used actively in intellectual data analysis, where the final goal lies in synthesizing a model that predicts the target variable value on the basis of a set of initial data at the system input [18][19][20].
Note that today there is a considerable number of algorithms that realize the decision tree concept (CART, C4.5, Sparc, NewID, ITrule, CHAID, CN2, Oris and others), the first two being the most widely used and popular. It should be noted that the C4.5 logical tree algorithm uses an information-theoretic criterion for node (vertex) selection, whereas the CART algorithm is based on calculating the Gini index, which takes into account the relative distances between the class distributions [21].
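The two node-selection criteria named above can be sketched in a few lines. The following is a minimal illustration, not the exact formulas of the cited implementations: `entropy` stands in for the information-theoretic criterion of the C4.5 family, `gini` for the CART impurity, and `split_gain` measures the impurity reduction achieved by a candidate split (all function names are ours).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label multiset (C4.5-style criterion)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gini(labels):
    """Gini impurity of a class-label multiset (CART-style criterion)."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def split_gain(labels, groups, impurity=entropy):
    """Impurity reduction achieved by splitting `labels` into `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * impurity(g) for g in groups)
    return impurity(labels) - weighted
```

For a balanced two-class node, `entropy` gives 1 bit and `gini` gives 0.5; a split that separates the classes perfectly earns the full impurity as gain, while an uninformative split earns none.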
In [2], the important elements of branched attribute selection (BAS) are proposed and the LCT construction scheme is analyzed on the basis of a logical tree algorithm with step-by-step estimation of the importance of discrete attributes according to the TS data. A modified BAS with a one-off estimation of the attribute set was suggested in [9], while in [22] an algorithm of generating a set (manifold) of random logical classification trees with a final optimal-selection stage was proposed.
Because the principal idea of the BAS methods and algorithms can be defined as the optimal approximation of a certain initial TS by a set of elementary attributes (object attributes), the central problem of efficiently selecting the branching criterion (vertices, attributes, discrete object attributes) becomes the most important one. Just these principal problems are considered in [11], where the questions of high-quality estimation of certain discrete attributes, their sets and fixed associations are raised, allowing an effective branching mechanism to be implemented.
The LCT structure is characterized by compactness, on the one hand, and by inhomogeneity of layer filling (sparsity), on the other hand, as compared to regular trees (the algorithm with a one-off attribute estimation) [23]. Note that the problems of convergence of the BAS-method-based LCT construction process and of selecting the stopping criterion of the logical tree synthesis (e.g., restrictions on tree depth or complexity, accuracy, or the number of structural errors) remain essential [2].
Note that the logical tree concept does not contradict the possibility of using, as the LCT attributes (vertices), certain object attributes, their combinations and sets (the generalized attribute idea considered in [24]). However, if one goes further and selects as branches not object attributes but certain independent recognition algorithms (estimated in accordance with the TS data), a new structure (the ACT) is obtained at the output.

MATERIALS AND METHODS
The principal scheme of the BAS-method algorithms based on the LCT concept lies in maximizing the quantity W_M(f) [2]. The latter means that one has to find, among the logical tree algorithms, such a generalized attribute f for the training selection (1) for which this quantity is as large as possible. Note that this quantity serves as the attribute significance (2); the significance of other attributes is estimated in a similar way. The attribute for which this information value is maximal is considered the most important with respect to f_R(x). Note that selection (1) may have a probabilistic character, i.e., the training pairs may be random, whereas the generalized attribute is deterministic. Thus, the problem of optimal approximation of the probabilistic selection (1) is, in general, posed via a certain deterministic function represented in the general case by a generalized attribute f. Obviously, this problem makes sense when the character of the images (classes) is rather close to deterministic: the main part is occupied by points (objects) x for which the class-membership value is close to unity, and this value may vary substantially only at points (objects) that lie on the boundary of several classes. In case (a), selection (1) is the data of some experiment (e.g., computer measurements) stored in read-only memory, and the training algorithm is a multiple processing of selection (1). Note that selection (1) may have an extremely large volume; therefore, the processing algorithms must be able to operate without loading the whole of selection (1) into memory.
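The role of the informativeness quantity W_M(f) can be illustrated with a small sketch. The paper's actual functional (2) is not reproduced here; as an assumed stand-in with the same role, we score a binary elementary attribute by how much it reduces the majority-class error of the partition, and pick the maximizer (the names `attribute_informativeness` and `best_attribute` are ours).

```python
from collections import Counter

def attribute_informativeness(xs, labels, attr):
    """Illustrative informativeness score of a binary elementary attribute
    `attr` with respect to the class partition `labels`: reduction of the
    majority-class error after splitting the selection by attribute value.
    A stand-in for the paper's functional (2), which may differ."""
    def errors(group):
        # objects misclassified if the group is labeled by its majority class
        if not group:
            return 0
        return len(group) - Counter(group).most_common(1)[0][1]
    split = {0: [], 1: []}
    for x, y in zip(xs, labels):
        split[attr(x)].append(y)
    return errors(labels) - (errors(split[0]) + errors(split[1]))

def best_attribute(xs, labels, attrs):
    """Select the most informative elementary attribute from the set."""
    return max(attrs, key=lambda a: attribute_informativeness(xs, labels, a))
```

An attribute that separates the classes perfectly earns the full error count as its score, while a constant attribute earns zero, so `best_attribute` realizes the "find the maximizing f" step of the scheme.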
If case (a) does not occur, and there is no need to store the data in read-only memory, we deal with case (b). Here the pairs processed at step d_i are not stored and, therefore, at step d_{i+1} another series of training pairs of the form (1) is fed.
For definiteness, we shall further consider case (a), i.e., the same TS is fed at each step d_i.
In this LCT construction scheme, a certain elementary attribute is selected whose informativeness with respect to selection (1) is as large as possible. Note once more that the informativeness is calculated in accordance with the method specified in [2, 10, 25, 26]. The next steps of the logical tree method are conveniently interpreted using a tree (Fig. 1).
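The step-by-step scheme just described can be sketched as a greedy recursive procedure: at every vertex the most informative elementary attribute is chosen anew on the pairs that reach that vertex, and the selection is split by its value. This is a minimal illustration under assumed conventions (binary attributes, a majority-error scoring rule in place of the paper's functional, a depth limit as the stopping criterion); all names are ours.

```python
from collections import Counter

def _majority_error(labels):
    # objects misclassified when the group is labeled by its majority class
    return len(labels) - Counter(labels).most_common(1)[0][1] if labels else 0

def build_lct(pairs, attrs, depth=0, max_depth=10):
    """Greedy step-by-step LCT construction sketch: at every vertex the
    elementary attribute that best reduces the classification error of
    the pairs reaching this vertex is selected anew."""
    labels = [y for _, y in pairs]
    if len(set(labels)) <= 1 or not attrs or depth >= max_depth:
        # finished path: the leaf stores the resulting RF value
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    def gain(a):
        groups = {0: [], 1: []}
        for x, y in pairs:
            groups[a(x)].append(y)
        return _majority_error(labels) - sum(_majority_error(g) for g in groups.values())
    best = max(attrs, key=gain)
    branches = {0: [], 1: []}
    for x, y in pairs:
        branches[best(x)].append((x, y))
    if not branches[0] or not branches[1]:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    rest = [a for a in attrs if a is not best]
    return {"attr": best,
            0: build_lct(branches[0], rest, depth + 1, max_depth),
            1: build_lct(branches[1], rest, depth + 1, max_depth)}

def classify(tree, x):
    """Follow the path defined by the attribute values of the object x."""
    while "leaf" not in tree:
        tree = tree[tree["attr"](x)]
    return tree["leaf"]
```

Re-evaluating the attribute set at every vertex is exactly what distinguishes this scheme from the one-off estimation variant mentioned in the literature review.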
Note that each LCT vertex corresponds to an attribute obtained at a certain step (stage) of the LCT construction: the attributes located at the n-th layer are those derived at the n-th step (stage) of the logical tree construction process.
Assume that only three steps of the LCT construction have been realized, yielding the attributes of the first three layers. The logical tree obtained in these three steps has the form shown in Fig. 2. Each training pair of selection (1) corresponds to a relevant LCT path (Fig. 2); a path T_i may end with an arrow going from a third-layer vertex and denoted by the symbol 0 (Fig. 2).
We shall call the paths corresponding to case (a) unfinished, whereas those of case (b) will be called finished.
If the path T_i that corresponds to a pair of selection (1) is finished and has a definite RF value at its end, one may say that for the object x_i corresponding to the finished path T_i the complete tree-based recognition is realized (Fig. 2); in other words, the pair belongs to the relevant path T_i. At the next stages of classification tree construction, only the unfinished paths shall be considered. We shall denote each path of the tree under construction by a binary set. For instance, the binary set 010 at the tree of Fig. 2 indicates the path that ends with a final arrow going from the second vertex of the third layer and denoted as 0. Obviously, the set {000, 001, 010, 011, 100, 101} is the manifold of all the unfinished paths of the LCT (Fig. 2). Extending these paths of the LCT of Fig. 2, we obtain the LCT of Fig. 3.
Note that for the tree of Fig. 3 the process of construction continues. When constructing the recognition tree, we first perform the attribute selection at the LCT of Fig. 3; in this case, we obtain the tree of Fig. 4, in which the paths 000, 100 and 101 are finished.

Figure 4 -Final LCT
Then we apply to the tree of Fig. 1 the same process as in the case of the tree of Fig. 2.
It is worth noting that there is no need to form a separate set of training pairs to realize each selection; all the selections are realized over the same initial TS.
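The binary path encoding used in the example above can be made concrete. The following sketch enumerates all paths of a fixed-depth binary logical tree as bit strings read from the root downward and partitions them into finished and unfinished ones; the encoding and function names are illustrative, and the predicate marking which paths are finished is supplied by the caller.

```python
from itertools import product

def path_sets(is_finished, depth=3):
    """Partition all binary paths of a fixed-depth logical tree into
    finished and unfinished ones; each path is a bit string such as
    '010', read from the root downward (illustrative encoding)."""
    finished, unfinished = [], []
    for bits in product("01", repeat=depth):
        path = "".join(bits)
        (finished if is_finished(path) else unfinished).append(path)
    return finished, unfinished
```

With two of the eight three-step paths marked finished, the six remaining bit strings reproduce the unfinished set {000, 001, 010, 011, 100, 101} from the example in the text.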

EXPERIMENTS
It is obvious that one may construct a set of LCTs on the basis of one initial TS using the relevant methods and algorithms. For example, using the functional (2) as the branching criterion, we can construct at least two LCTs depending on whether the quality of the elementary attributes is evaluated at each step of logical tree generation or only once at the beginning of construction (thus saving system hardware resources). Therefore, a pressing problem arises of comparing the models obtained, with the aim of choosing the best LCT for the current problem.
Determining the indices that characterize the basic properties of the models obtained is an important stage of the process of comparing the constructed tree-like models. Note that the models are compared on the basis of an integral quality index, i.e., the central criterion of LCT model comparison.
Similarly to [17, 26, 27, 28], note that the question of reducing the LCT structure complexity remains a principal issue. Here we deal with the following parameters: N_tr, the number of attributes in the LCT structure; V_tr, the number of the LCT model vertices; C_tr, the total number of transitions in the LCT structure; and the memory and processor-time loss parameters.
Note that it is reasonable to optimize these structural parameters of the LCT: this allows the time of making decisions according to the logical tree model to be reduced and processor time to be saved. It is necessary to maximize the parameter I_Main (the LCT model generalization index), which allows one to reach the most compact LCT structure and provides maximal compression of the initial TS data (i.e., representation of the initial data array by a minimally structurally complex logical tree), as well as the parameter Q_avg (the average number of TS objects per resulting RF value, i.e., per LCT leaf) [29][30][31][32][33].
An important LCT model index that takes the above parameters into account is the integral model quality index. It should be noted that this integral LCT model quality index has a sense only when the corresponding condition holds true; obviously, in the opposite case it will be zero. An increase of this index characterizes growth of the LCT model quality and, vice versa, its decrease testifies to worsening of the classification quality.
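Since the exact form of the integral index is not reproduced above, the following is only an assumed stand-in that mirrors the stated behavior: it rewards classification quality, penalizes the structural complexity parameters N_tr, V_tr and C_tr, and is taken as zero when the validity condition fails (the function name, the validity condition and the penalty weight are all ours).

```python
def integral_quality(accuracy, n_attrs, n_vertices, n_links, penalty=0.01):
    """Illustrative integral LCT quality index: grows with classification
    accuracy and shrinks with structural complexity (N_tr, V_tr, C_tr).
    A stand-in for the paper's index, whose exact formula is not given."""
    if accuracy <= 0:          # assumed validity condition
        return 0.0
    complexity = n_attrs + n_vertices + n_links
    return accuracy / (1.0 + penalty * complexity)
```

Under this stand-in, of two models with equal accuracy the structurally simpler one scores higher, which is the ranking behavior the text ascribes to the integral index.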
Thus, taking as the basis the classification tree method and the modularity principle, the software complex Orion III has been developed at the Uzhgorod National University; it is used to generate autonomous recognition systems. The algorithmic library of this system includes 11 recognition algorithms, among them the algorithmic realization of the LCT construction suggested above.
The basic problem here was the construction of an autonomous recognition system on the basis of geological data (the oil-bearing bed partitioning problem). Twelve basic elementary attributes and ten additional ones have been used to recognize the objects.
Information concerning two classes of objects is presented in the TS. At the examination stage, the constructed classification system has to provide efficient recognition of unknown objects with respect to these two classes. Before starting the work, the training selection was automatically checked for correctness (searching for and removing similar objects of different class membership, i.e., errors of the first kind). Although the system realizes a training and error-correcting scheme for the classification tree (the TEC algorithm), generation was carried out in the automatic mode, so this algorithm was not used.
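The correctness check mentioned above can be sketched as follows: objects that occur in the TS with more than one class label (the errors of the first kind) are dropped, and exact duplicates are collapsed. This is a minimal illustration of the idea, not the system's actual procedure, and the function name is ours.

```python
def remove_contradictions(pairs):
    """TS correctness check sketch: drop objects that occur with more
    than one class label (contradictory pairs) and collapse exact
    duplicates; objects must be hashable (e.g., attribute tuples)."""
    seen = {}
    for x, y in pairs:
        seen.setdefault(x, set()).add(y)
    return [(x, labels.pop()) for x, labels in seen.items() if len(labels) == 1]
```

Removing such contradictions before training matters for tree construction, since a contradictory pair can never be covered by any deterministic path of the logical tree.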

RESULTS
Note that the training selection consisted of 1,250 objects (756 of them being oil-bearing), and the efficiency of the constructed recognition system was estimated on a test selection comprising 240 objects. The data of the training and test selections were obtained from geological surveys on the territory of the Transcarpathian province during the period from 2001 to 2011.
It should be noted that a fragment of the main results of the above experiments is presented in Table 1. The LCT models constructed provided the level of accuracy, speed and operating memory consumption required by the problem statement.

DISCUSSION
Note that the suggested LCT model quality estimates capture the most important characteristics of logical trees and can be used as the optimality criterion in the procedure of LCT construction and of final selection from a set of LCT models.
It should be noted that this LCT construction scheme was compared to the ACT method and showed an acceptable result. The main idea of the ACT is the approximation of the initial TS by a set of algorithms. The ACT structure obtained is characterized by high versatility and a relatively compact model structure; however, it requires large hardware costs to store the generalized attributes and to provide the initial estimation of the quality of the TS classification algorithms. By contrast, the LCT features a high speed of the classification rule, low hardware costs for storing and operating the tree structure, and high classification quality.

CONCLUSIONS
In this work, the problem of automating LCT construction on the basis of approximating the TS by a set of elementary attributes has been solved.
The scientific novelty of the results obtained consists in the fact that a simple method of LCT construction has been suggested, for the first time, on the basis of elementary attribute selection with a permanent estimation of attribute importance at each step of classification tree generation. Here, at each branching step, the influence of each attribute value on the resulting RF value in the tree structure is taken into account. Note that the proposed functional can be used not only to estimate the informativeness of individual elementary attributes, but also to calculate the importance of attribute sets and their combinations, which in future will allow a more optimal structure of the synthesized LCT to be achieved according to the initial TS data.
We have suggested in the present work a set of general indices that allows the general characteristics of LCT models to be presented effectively. This set can be used to select the best LCT from a number of constructed ones (e.g., in the case of the random LCT construction algorithms [22]).
The applied value of the results obtained consists in the fact that the suggested LCT construction method has been realized in the algorithm library of the universal software system ORION III for solving different applied problems of discrete object array classification/recognition.
Note that the practical testing carried out by us has confirmed the performance of the suggested LCT models and software. This enables recommendations to be elaborated on the use of the above approach and its software realization for a wide spectrum of applied problems of discrete object array classification/recognition.