THE FRACTAL ANALYSIS OF SAMPLE AND DECISION TREE MODEL

Context. The problem of decision tree model synthesis using fractal analysis is considered in the paper. The object of study is decision trees. The subject of study is methods of decision tree model synthesis and analysis. Objective. The objective of the paper is the creation of methods and fractal indicators that allow jointly solving the problem of decision tree model synthesis and the task of reducing the dimension of the training data, from a unified approach based on the principles of fractal analysis. Method. The fractal dimension for a decision tree based model is defined both for the whole training sample and for specific classes. A method for estimating the fractal dimension of a model based on a decision tree, taking the model error into account, is proposed. It allows building a model with an acceptable error value but with an optimized level of fractal dimensionality. This makes it possible to reduce decision tree model complexity and to make the model more interpretable. A set of indicators characterizing the complexity of a decision tree model is proposed. It contains the complexity of node checking, the complexity of node achieving, the average model complexity, and the worst-case computational complexity of the tree model. On the basis of the proposed set of indicators, a complex criterion for model building is proposed. The indicators of the fractal dimension of the decision tree model error can be used to find and remove the non-informative features in the model. Results. The developed indicators and methods are implemented in software and studied in solving practical problems. As results of the experimental study of the proposed indicators, graphs of their dependencies were obtained.
They include graphs of the dependence of the number of hyperblocks covering the sample in the feature space on the block side length: for the whole sample, for each class, for different set error values and obtained error values, and for varied numbers of resulting features and instances, as well as graphs of the dependencies between the average and worst tree complexities, between the decision tree fractal dimensionality and the average tree complexity, between the joint criterion and the indicator of feature set reduction, and between the joint criterion and the tree fractal dimensionality. Conclusions. The conducted experiments confirmed the operability of the proposed mathematical support and allow recommending it for use in practice for solving the problems of building models from precedents.


NOMENCLATURE
ε is a maximum acceptable error value; ω is a set of model parameters; F_tree is a joint multiplicative criterion for the decision tree model; I_N is a coefficient of feature reduction; j is a number of a feature; K is a number of classes; L is a number of intervals into which the ranges of feature values are separated; l is a hypercube side length (the length of an interval); N is a number of input features; N' is a feature subset size; n(l) is a number of hyperblocks of side l covering the sample; n_{i,q} is a number of instances belonging to a rectangular hyperblock formed by feature intervals; n_{i,q,k} is a number of exemplars of the k-th class in a rectangular hyperblock formed by feature intervals; opt is a symbol of optimum; Q is a number of clusters; r is a heuristically defined cut-off radius; r_k is a Euclidean distance between a pair of points; S' is a subsample size; S is a number of precedents; tree is a tree recognizing model; U is a total number of tree nodes; u_i is a type of the i-th node of the tree; X is a data sample; x_j is the j-th input feature; x^s is the s-th instance of a sample; x_j^s is the value of the j-th input for the s-th instance; max x_j is the maximal value of x_j; min x_j is the minimal value of x_j; Y_i is the class of the majority of instances that hit the i-th node, which is a leaf; y is an output feature vector; y^s is the value of the output feature for the s-th instance.

INTRODUCTION
Decision trees are a popular tool for solving problems of building models on precedents in diagnostics, pattern recognition, and forecasting in various practical areas [1][2][3][4]. One of the most significant advantages of models based on decision trees is their interpretability (convenience for human perception and analysis).
The object of study is decision trees.
A large number of methods for synthesizing models based on decision trees are now known [5][6][7][8][9][10][11]. However, as a rule, the known methods do not take the characteristics of the training sample into account in their goal functions (the criteria of training quality). In practice this can lead to the construction of non-optimal models.
On the other hand, model synthesis for big data sets usually requires a preliminary reduction of the data dimensionality, which is explained by the high iterativity of the known training methods, as well as by the need to obtain a model that provides good generalization of the data. At the same time, the traditionally used methods of informative feature selection [12][13][14][15] and of sample formation [16][17][18][19][20][21] share a common disadvantage: they are not directly related to each other and proceed from different points of view on the informativeness of features or instances.
The subject of study is methods of decision tree model synthesis and analysis.
One of the promising areas of data analysis is fractal analysis [22][23][24][25][26][27][28][29][30][31]. There are various approaches to the definition of fractal parameters for data [25, 27]. However, they too are not interconnected with each other or with the decision tree model training process.
The objective of the paper is the creation of methods and fractal indicators that allow jointly solving the problem of decision tree model synthesis and the task of reducing the dimension of the training data, from a unified approach based on the principles of fractal analysis.

PROBLEM STATEMENT
Let us have an original data sample X = <x, y>, a set of S precedents (instances, exemplars, observations) characterizing a dependence y(x), where x = {x^s}, y = {y^s}, s = 1, 2, ..., S, characterized by the set of N input features {x_j}, j = 1, 2, ..., N, and the output feature y. Each s-th precedent can be denoted as <x^s, y^s>, x^s = {x_j^s}, y^s ∈ {1, 2, ..., K}, K > 1. Then the problem of synthesizing a model of the dependence y(x) will be considered as a search for such a structure tree and such a set of parameters ω that the model error E(tree, ω, <x, y>) does not exceed the maximum acceptable value ε.

REVIEW OF THE LITERATURE
The key concept in fractal analysis is the fractal dimension, which is defined as a coefficient describing a fractal structure or set on the basis of a quantitative assessment of its complexity, as the coefficient of variation in detail with a change of scale.
The Hausdorff-Besicovitch dimension according to [25, 28] is defined as D = lim_{l→0} log n(l) / log(l^{-1}). One of the most affordable ways to determine the Hausdorff-Besicovitch dimension is the box-counting method [29, 30], which consists in repeatedly covering the fractal object with hypercubes of equal size and counting the minimum number of hypercubes that contain points of the object.
By consistently reducing the hypercube size l we obtain a set of points with coordinates (log n(l), log(l^{-1})), which define a curve whose slope, determined by linear regression, is the fractal dimension. Takens' method [31] is used to determine the correlation dimension: D = |R| / Σ_{r_k ∈ R} ln(r / r_k), where R = {r_k | r_k < r}, |R| is the cardinality of the set R, r > 0.
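The box-counting estimate described above can be sketched in a few lines of Python. This is only an illustration of the slope-fitting idea, not an implementation from the paper; the function name and the least-squares fit are ours, and points are assumed to lie in the unit hypercube [0, 1]^N.

```python
import math

def box_counting_dimension(points, sides):
    """Estimate the fractal dimension of a point set in [0, 1]^N by box
    counting: count non-empty boxes n(l) for each side length l, then fit
    the slope of log n(l) against log(1/l) by least squares."""
    xs, ys = [], []
    for l in sides:
        grid = round(1.0 / l)  # number of boxes per axis
        # keep only the indices of occupied boxes in a set
        boxes = {tuple(min(int(c * grid), grid - 1) for c in p) for p in points}
        xs.append(math.log(grid))        # log(1/l)
        ys.append(math.log(len(boxes)))  # log n(l)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
```

For a sanity check, points sampled along a straight line in the plane should yield a dimension close to 1.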
The common disadvantage of the considered methods [22][23][24][25][26][27][28][29][30][31] for determining the fractal dimension is that the dimension of the set must satisfy the inequality N < 2 log_10 S, which shows that the number of data points S required for an accurate dimension estimate of an N-dimensional set must be at least 10^{N/2}. This leads to very large required sample sizes S even for sets of moderate dimension N.
The common feature of all the methods described above for determining the fractal dimension is that the dimension of the sample and the dimension of the model trained on its basis are defined with no connection to each other. This limits their practical application.
In [32], methods for estimating the fractal dimension that allow characterizing the properties of the sample were proposed. The sample instances are represented as points in the feature space. Clusters then correspond to compact areas in the feature space, which are combined into classes. Different geometric shapes can describe clusters. Fractal analysis of the sample in the feature space can be performed by setting the elemental form for clustering and varying the cluster size for partitioning the sample into fragments. For sample fractal dimension analysis the method [32] contains the following stages.
Initialization stage. Set a learning sample <x, y> and the number L of intervals into which the ranges of feature values will be separated.
Sample normalization stage. If feature values are non-normalized, they should be normalized by mapping to the interval [0, 1]: x_j^s = (x_j^s − min x_j) / (max x_j − min x_j).
Clustering stage. Divide the range of each feature's values into L intervals of length l = 1/L. Form clusters as rectangular blocks at the intersections of the intervals of the different features.
Data analysis stage. Determine the number n_{i,q} of instances belonging to each rectangular hyperblock formed by feature intervals. Determine the number n_{i,q,k} of exemplars of the k-th class in each rectangular hyperblock formed by feature intervals.
Determine the number of hyperblocks with side l covering the k-th class of the sample in the space of N features: n_k(l) is the number of hyperblocks for which n_{i,q,k} > 0. Determine the number of hyperblocks with side l covering the sample in the space of N features: n(l) is the number of hyperblocks for which n_{i,q} > 0.
Stage of fractal dimension estimation. Determine at a given l the fractal dimension of the k-th class, k = 1, 2, ..., K: D_k(l) = log n_k(l) / log(l^{-1}). Determine the fractal dimension of the sample at a given l: D(l) = log n(l) / log(l^{-1}). This method operates with rectangular blocks of the same size, covering the feature space with them. The single controlled parameter of the method is the number L of intervals into which the ranges of feature values are divided.
It is obvious that the number of clusters Q ≥ K, Q = L^N, and for each feature L ≥ 2. To provide generalization properties of clusters we impose the restriction Q ≤ NS.
Thus, we obtain K ≤ L^N ≤ NS, L ≥ 2. Taking logarithms, log(K) ≤ N log(L) ≤ log(NS), we obtain after transformations: K^{1/N} ≤ L ≤ (NS)^{1/N}. Note that the minimum step for varying the L values is 1. If the upper limit value (NS)^{1/N} is less than 2, it can be replaced with S. This is due to the fact that each feature axis will contain no more than S points, and partitioning a feature axis into more than S intervals will obviously lead to the occurrence of empty intervals. For large N values, a given number of partitions S on each feature will lead to the forming of a huge number of blocks, equal to S^N, which makes the computation very hard and in some cases practically unrealizable. Therefore, it is reasonable in this case to set the upper limit of L to round(log(S)), where round is the function of rounding to the nearest integer. Evaluation of the indicator D for small values of L requires lower costs of computing and memory resources than for large L values. However, the analysis accuracy for small L values will be lower, while the generalization level will be higher than for large L values.
Consider possible ways to implement this method. If we assume that a data structure will be created containing counters of the numbers of instances belonging to each rectangular hyperblock in the feature space, it will require at least 2L^N memory cells, where 2 bytes are given to represent each of the L^N integers. In turn, for each hyperblock we need to evaluate the membership of the sample instances, which would require about 2SL^N comparisons. This approach, obviously, is practically applicable only for small N. Since to determine the fractal dimension it is not important how many instances hit each block, but only how many blocks contain instances, it is advisable, to reduce the computational and memory costs, to store only the indices of the non-empty blocks (for example, in a hash set).
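The sparse-storage idea can be sketched as follows. This is a minimal illustration (the function names are ours, not the paper's) of counting only the non-empty hyperblocks with hash sets, for the whole sample and per class, assuming features are already normalized to [0, 1]:

```python
import math

def class_fractal_counts(X, y, L):
    """Count the non-empty hyperblocks of an L x ... x L grid over [0, 1]^N,
    for the whole sample and per class, storing only occupied block indices
    in hash sets instead of a dense array of L**N counters."""
    blocks_all, blocks_per_class = set(), {}
    for instance, label in zip(X, y):
        idx = tuple(min(int(v * L), L - 1) for v in instance)
        blocks_all.add(idx)
        blocks_per_class.setdefault(label, set()).add(idx)
    return len(blocks_all), {k: len(s) for k, s in blocks_per_class.items()}

def fractal_dimension(n_l, L):
    """D(l) = log n(l) / log(1/l), with l = 1/L so that log(1/l) = log L."""
    return math.log(n_l) / math.log(L)
```

Memory and time now scale with the number of occupied blocks (at most S) rather than with L^N.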
The advantage of the described method, and of the sample quality indicator determined on its basis, is that they do not depend on the model synthesis method or on the results of its work, and allow evaluating the properties of the sample alone.
The disadvantages of this method are the uncertainty in the choice of the value of the parameter L, and the absence of a relation between the method and the quality of the synthesized model.

MATERIALS AND METHODS
The decision tree model consists of nodes connected by links. A node can be a root (having no parent), a leaf (having no successors), or an internal node (having both a parent and successors). Each node of the tree (excluding leaves) contains a check on one of the features. As a result of checking the recognized instance at this node, it is redirected to one of the node's successors, depending on which interval of the checked feature's values it falls into.
For a decision tree based model, we define the fractal dimension through the minimum number of rectangular blocks in the feature space needed to cover the training data set. Since the leaf nodes of a model based on a decision tree correspond to rectangular areas in the feature space, and the model assigns the instances of the training sample only to these areas, the number of leaf nodes in the tree serves as the covering number from which the fractal dimension of the decision tree is estimated.
Let u_i be the type of the i-th node of the tree (u_i = 1 if the i-th node is a leaf; u_i = 0 otherwise), and let U be the total number of tree nodes. Then the number of hyperblocks of side l covering the sample in the normalized feature space can be evaluated as: n(l) = Σ_{i=1..U} u_i. By analogy, for the k-th class in the sample we can define: n_k(l) = Σ_{i: Y_i = k} u_i, where Y_i is the class of the majority of instances that hit the i-th node, which is a leaf, and K is the number of classes.
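These counts can be computed directly from an explicit tree structure. The following is a minimal sketch; the `Node` class and the function name are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)  # empty => leaf (u_i = 1)
    majority_class: Optional[int] = None  # Y_i, meaningful only for leaves

def tree_box_counts(root: Node, K: int):
    """n(l) is the total leaf count (the sum of u_i over all nodes);
    n_k(l) counts the leaves whose majority class Y_i equals k."""
    n_total, n_class = 0, [0] * K
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:          # a leaf: u_i = 1
            n_total += 1
            n_class[node.majority_class] += 1
        else:                          # internal node: u_i = 0
            stack.extend(node.children)
    return n_total, n_class
```

By construction, the total count equals the sum of the per-class counts.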
It is obvious that n(l) = Σ_{k=1..K} n_k(l). To estimate the fractal dimension of a model based on a decision tree, we will use an approach similar to that for neural networks [33][34][35].
Initialization stage. Set the training sample <x, y>, the model synthesis method, the model training quality criterion in the form of an error function E, and the maximum acceptable error value ε.
Sample normalization stage. If feature values are nonnormalized, they should be normalized by mapping to the interval [0, 1].
Formation and analysis of data partition stage. Sequentially changing the value of L = 2, ..., S:
– determine the length of the interval, l = 1/L;
– quantize the sample features, partitioning their ranges of values into L intervals;
– determine the number n(l) of hyperblocks of side l covering the sample in the space of the N features;
– prune the recognizing model tree by the given method, minimizing the error function E to achieve the acceptable level ε;
– estimate the error E of the constructed recognizing model tree.
The fractal dimension determining stage. For every l for which the model error E is acceptable (E ≤ ε), determine the fractal dimension of the data relative to the accuracy (error) of the synthesized model tree: D(l) = log n(l) / log(l^{-1}). This method operates with rectangular blocks of equal size, covering the feature space with them. The single controlled parameter of the method is the given threshold value of the model error ε.
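The stages above can be sketched as a loop over L. Here `synthesize` and `error` are assumed interfaces standing in for the chosen tree-building/pruning method and the error function E; they are placeholders, not a fixed API from the paper:

```python
import math

def model_fractal_dimensions(sample, synthesize, error, eps, S):
    """For each L = 2..S, quantize the features, build and prune a tree via
    the user-supplied `synthesize(sample, L)` callable (assumed to return the
    pruned model and its leaf count n(l)), and keep D(l) only where the
    model error E = error(tree, sample) is acceptable (E <= eps)."""
    dims = {}
    for L in range(2, S + 1):
        tree, n_l = synthesize(sample, L)  # quantize, build, prune
        E = error(tree, sample)
        if E <= eps:
            # l = 1/L, so log(1/l) = log(L)
            dims[L] = math.log(n_l) / math.log(L)
    return dims
```

With stub callables that always return an acceptable error and n(l) = L, every retained D(l) equals 1.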
Obviously, the smaller the given ε value, the more detailed the model should be, i.e., it will need to form a larger number of clusters Q, and hence the greater the L value should be. Accordingly, with a decrease of the given ε, the costs of computing and memory resources for the sample analysis will increase.
The advantage of the proposed method, and of the sample quality indicator determined on its basis, is that they are related to the quality indicator of the synthesized model, and the method automatically sets the optimal value of L.
The disadvantages of the proposed method are the uncertainty in the choice of the ε parameter value and its dependence on the training quality criterion and on the functioning principles of the model for which it is defined. It should also be noted that the error function used in the method is only one of the characteristics of the synthesized model: it does not take the model dimension and generalizing properties into account. Therefore, it is proposed to determine the fractal dimension of the trained model on the basis of the method below, which takes the model dimension into account.
In addition to the tree fractal dimensionality, we can take the complexity of calculations into account.
For the i-th node, its checking complexity c_i can be obtained as the number of the i-th node's successors.
For the i-th leaf node, the achieving complexity c_a_i can be evaluated as the sum of the checking complexities of all nodes on the path from the tree root to the i-th leaf node. For the tree model, the worst computational complexity can be estimated as the maximal complexity of the leaf nodes: c_w^tree = max_{i: u_i=1} c_a_i. For the tree model, the average computational complexity can be estimated as the average complexity of the leaf nodes: c_a^tree = (1 / n(l)) Σ_{i: u_i=1} c_a_i. Generally, when the model error is acceptable, we need to minimize the average computational complexity of reaching the leaf nodes, as well as the worst complexity of reaching the leaf nodes.
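These complexity indicators can be sketched as a single traversal. The dict-based tree representation is ours, chosen only to keep the example self-contained:

```python
def tree_complexities(root):
    """Compute the worst (c_w) and average (c_a) complexity of reaching a
    leaf: each internal node adds its checking complexity c_i (its number
    of successors) to every path passing through it. Trees are plain dicts
    with a 'children' list, empty for leaves."""
    leaf_costs = []
    stack = [(root, 0)]  # (node, accumulated cost of the path from the root)
    while stack:
        node, path_cost = stack.pop()
        if not node["children"]:
            leaf_costs.append(path_cost)
        else:
            c_i = len(node["children"])  # checking complexity of this node
            for child in node["children"]:
                stack.append((child, path_cost + c_i))
    return max(leaf_costs), sum(leaf_costs) / len(leaf_costs)
```

For a root with one leaf child and one internal child with three leaves, the leaf costs are 2, 5, 5, 5, giving c_w = 5 and c_a = 4.25.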
For decision tree model synthesis, the proposed set of fractal indicators allows defining a system of criteria. Since in the general case the number of features in the original set and the number of features used by the model based on the decision tree may differ, it is advisable to consider this as a characteristic of the model quality. Then it is possible to determine on its basis the coefficient of feature reduction: I_N = N / N'. Obviously, the greater the coefficient of feature reduction, the simpler the model, provided that acceptable accuracy is achieved. In the best case I_N = N, in the worst case I_N = 1.
It is also possible to define a single joint multiplicative criterion F_tree for decision tree model synthesis based on fractal analysis, combining the indicators above.
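The paper's exact multiplicative formula is not reproduced in this text, so the following is a purely hypothetical form that only mirrors the monotonic behavior reported in the Discussion (F_tree grows with D_tree, c_a^tree and c_w^tree, and decreases as I_N grows):

```python
def joint_criterion(d_tree, c_a, c_w, i_n):
    """Hypothetical multiplicative combination of the fractal indicators.
    NOT the paper's formula: it only reproduces the reported monotonicity
    (larger D_tree, c_a, c_w => larger F; larger I_N => smaller F)."""
    return d_tree * c_a * c_w / i_n
```

Any form with the same monotonic behavior would serve equally well for illustration.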

EXPERIMENTS
To study the complex of proposed sample and model fractal indicators, they were implemented in software. The developed software was used in computational experiments to study the applicability of the proposed indicators for solving the problems of automatic classification.
Several datasets for different tasks [4, 36, 37, 38], characterized in Table 1, were used for the experimental study. For these datasets several series of experiments were conducted.
The first series of experiments was devoted to studying the methods of data dimensionality reduction using fractal indicators for model synthesis. Here it is necessary to evaluate the fractal dimensionalities of the original datasets and their classes. Then it is possible to study the dependencies of n(l) on l^{-1} for the entire sample and the classes, for different given ε values and obtained error values E, as well as the dimensions of the formed data subsample: the subsample size S' and the feature subset size N'.
The second series of experiments concerned the methods of decision tree model synthesis using fractal indicators. For each task we need to build a tree model and study the dependencies between sample properties and the proposed indicators.

RESULTS
For each data set, as a result of the experiments, the fractal dimensions of the data and of the decision tree models constructed on their basis were calculated.
Graphs of the dependencies of n(l) on l^{-1} in a logarithmic coordinate system for the entire sample and the classes are shown in Fig. 1 and Fig. 2. Fig. 5 shows the schematic graph of the generalized dependency between c_a^tree and c_w^tree. Fig. 6 presents the schematic graph of the generalized dependency between D_tree and c_a^tree. Fig. 7 shows the schematic graph of the generalized dependency between F_tree and I_N. Fig. 8 shows the schematic graph of the generalized dependency between F_tree and D_tree.

DISCUSSION

As can be seen from Fig. 1 and Fig. 2, the proposed indicators of the fractal dimension reveal the differences between classes. These indicators can be used in methods of sample selection, defining quality criteria of the formed subsamples on the basis of the proposed indicators of the fractal dimension.
If a formed subsample or its classes differ significantly from the original sample in the indicators of the fractal dimension, it is possible that the subsample is not representative of the original sample. The proposed indicators can also be used as quality measures when comparing several candidate subsamples: among the candidate subsamples, the one whose fractal dimension indicator values are closest to those of the original sample should be preferred.
As can be seen from Fig. 3, a change of the specified components of the formed subsample dimension (the number of features N' and the number of instances S'), as well as of E and ε, affects the position of the straight line connecting the points of the dependence n(l). The greater the ε value, the greater n(l); and the smaller the E value, the greater n(l).
From Fig. 4 we can see that the smaller the S' value and the bigger the N' value, the bigger the n(l) value. As can be seen from Fig. 5, the bigger c_w^tree, the bigger c_a^tree; the smaller c_a^tree is compared with c_w^tree, the slower c_a^tree grows. Fig. 6 indicates that the bigger the c_a^tree value, the bigger the D_tree value; when l, n(l), U or c_w^tree decrease, the D_tree value grows more slowly than when l, n(l), U or c_w^tree increase. From Fig. 7 we can conclude that the bigger the I_N value, the smaller the F_tree value; if the computational complexity of the model (c_a^tree and/or c_w^tree) increases, then F_tree becomes bigger, and vice versa.
As can be seen from Fig. 8, the F_tree indicator receives a greater value the greater D_tree, l, n(l), N', c_a^tree and c_w^tree are, and the smaller the I_N indicator value is.

CONCLUSIONS
The urgent problem of decision tree model synthesis using fractal analysis is considered in the paper.
The scientific novelty of the obtained results is that the fractal dimension for a decision tree based model is defined both for the whole training sample and for specific classes. A method for estimating the fractal dimension of a model based on a decision tree, taking the model error into account, is proposed. It allows building a model with an acceptable error value but with an optimized level of fractal dimensionality. This makes it possible to reduce decision tree model complexity and to make the model more interpretable.
A set of indicators characterizing the complexity of a decision tree model is proposed. It contains the complexity of node checking, the complexity of node achieving, the average model complexity, and the worst-case computational complexity of the tree model. On the basis of the proposed set of indicators, a complex criterion for model building is proposed. The indicators of the fractal dimension of the decision tree model error can be used to find and remove the non-informative features in the model.
The practical significance of the obtained results is that the developed indicators and methods are implemented in software and studied in solving practical problems. The conducted experiments confirmed the operability of the proposed software and allow recommending it for use in practice for solving the problems of building models from precedents.
The prospects for further study may include the optimization of the software implementation of the proposed methods and indicators, as well as an experimental study of the proposed indicators on a larger complex of practical problems of different nature and dimension.

ACKNOWLEDGEMENTS
The work was conducted in the framework of the state budget scientific project "Development and research of intelligent methods and software for diagnosing and nondestructive quality control of military and civilian equipment" (State register No. 0119U100360) of National University "Zaporizhzhia Polytechnic", under partial support of the international project "Innovative Multidisciplinary Curriculum in Artificial Implants for Bio-Engineering BSc/MSc Degrees" (BIOART, Ref. no. 586114-EPP-1-2017-1-ES-EPPKA2-CBHE-JP) co-funded by the Erasmus+ Programme of the European Union, and the project "Virtual Master Cooperation on Data Science" (ViMaCs) funded by the DAAD.