THE SYSTEM OF CRITERIA FOR FEATURE INFORMATIVENESS

⌈A⌉ (ceil) is the function that gives the least integer greater than or equal to the given value A;
M is the number of features in the sample of observations S;
P is the set of features (attributes) of observations in the given sample;
P* is the feature set being estimated;

NEUROINFORMATICS AND INTELLIGENT SYSTEMS

… combination, which characterizes the researched objects, processes or systems [1–4]. The known feature selection methods [5–7] make it possible to extract combinations of informative features from initial data samples, removing insignificant and redundant characteristics. This simplifies the synthesis of diagnosis and recognition models and also improves their generalization and approximation abilities.
Feature selection methods generally use the prediction or classification error of a model constructed on the estimated feature set as the criterion of feature set informativeness when searching for the most informative feature combination [3, 8–16]. This approach requires significant computational and time resources, because it involves the computationally complex procedure of model synthesis, which must be performed for every estimated feature set. Moreover, the feature selection result, i.e. the combination of features with the largest informativeness, depends on the type of model used for the estimation.
Informational criteria (information gain, Gini index, entropy, etc.) [3–6] do not require the computationally complex procedure of mathematical model synthesis to estimate feature set informativeness. However, such criteria assume that the features of the initial data sample are independent [17, 18]. Therefore they are difficult to use in practice and are unsuitable when the features in initial samples are interdependent and redundant. These shortcomings make it topical to develop a criteria system for feature informativeness estimation that is free from them.
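A minimal Python sketch of why the independence assumption matters (function names are hypothetical, for illustration only): it computes class entropy, the Gini index and information gain from relative frequencies. If a perfectly informative feature is duplicated, the sum of the individual information gains doubles, while the gain of the two features taken jointly stays the same, so summing individual scores overstates the value of redundant features.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    # Shannon entropy (bits) of the empirical distribution of `values`
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gini(values: Sequence[Hashable]) -> float:
    # Gini index of the empirical class distribution
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def information_gain(columns: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> float:
    """IG = H(y) - H(y | features); the features in `columns` are treated jointly,
    i.e. each observation's tuple of feature values is one condition."""
    keys = list(zip(*columns))
    n = len(y)
    conditional = 0.0
    for key in set(keys):
        ys = [t for k, t in zip(keys, y) if k == key]
        conditional += len(ys) / n * entropy(ys)
    return entropy(y) - conditional
```

For `y = [0, 0, 1, 1]` and a feature `f1` equal to `y`, each copy of `f1` scores 1 bit on its own (2 bits in total), while `information_gain([f1, f1_copy], y)` is still 1 bit.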
The research objective is to develop a system of criteria for feature informativeness estimation that makes it possible to compute the informativeness of interdependent feature sets.
The pair correlation coefficient [7] is widely used to estimate individual significance when the values of the investigated feature and of the output parameter are continuous. However, this criterion detects the presence and closeness of a connection between two parameters only when the connection is linear.
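A short Python sketch (function name hypothetical) illustrates the limitation: for a symmetric input x and the purely nonlinear dependence y = x², the pair correlation coefficient is exactly zero, whereas a linear dependence gives 1.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # sample pair (Pearson) correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
r_nonlinear = pearson(x, [v * v for v in x])   # y fully determined by x, yet r = 0
r_linear = pearson(x, [3 * v + 1 for v in x])  # r = 1
```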
The informational criterion [1, 6, 7] and the criterion based on computing feature entropy use the informational approach to estimate individual feature significance. Unlike the pair correlation coefficient, such criteria can also estimate the closeness of a nonlinear connection between features [6, 7]. However, information theory assumes that the probabilities of the system states are known. In practical tasks these probabilities have to be estimated from statistical data and are therefore random quantities. Consequently, the estimated values can be considered accurate only for input data samples S = <P, T> of infinitely large size [6, 7].
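The Python sketch below (names hypothetical) estimates mutual information with the plug-in rule, replacing the unknown state probabilities by relative frequencies in the sample; this substitution is why such estimates are exact only in the large-sample limit. On data with y = x² for a symmetric x, where the pair correlation coefficient is zero, the estimate is clearly positive, and because y is a function of x it equals the sample entropy of y.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information (bits): probabilities are replaced by
    relative frequencies observed in the sample."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
mi = mutual_information(x, y)  # nonlinear dependence is detected
```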
It is significant that criteria based on the informational approach assume that the features of the sample S = <P, T> are independent. That is why such criteria are hard to apply to practical tasks in which training samples contain interdependent features [1, 6, 7].
In [17, 18] it was proposed to compute feature significance based on the Relief method, which estimates the informativeness of interdependent features from the geometrical location of observations in the sample S = <P, T>. However, those criteria estimate only the individual significance of features and cannot be used to estimate the informativeness of feature sets.
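For reference, a minimal Python sketch of the classic two-class Relief weighting (Kira and Rendell) follows; it is not the criteria of [17, 18] themselves, and, to stay deterministic, it iterates over every observation instead of sampling observations at random. A feature gains weight when it separates an observation from its nearest miss (nearest observation of the other class) and loses weight when it separates it from its nearest hit (nearest observation of the same class). The output is one weight per feature, which is why such a criterion yields only individual significance.

```python
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Two-class Relief over all observations (deterministic variant)."""
    n, m = len(X), len(X[0])
    # per-feature range, used to scale differences to [0, 1]
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(m)]

    def diff(j: int, a: int, b: int) -> float:
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a: int, b: int) -> float:
        return sum(diff(j, a, b) for j in range(m))

    w = [0.0] * m
    for q in range(n):
        others = [r for r in range(n) if r != q]
        hit = min((r for r in others if y[r] == y[q]), key=lambda r: dist(q, r))
        miss = min((r for r in others if y[r] != y[q]), key=lambda r: dist(q, r))
        for j in range(m):
            # reward separation from the other class, penalize it within the class
            w[j] += (diff(j, q, miss) - diff(j, q, hit)) / n
    return w
```

With `X = [[0, 0], [0, 1], [1, 0], [1, 1]]` and `y = [0, 0, 1, 1]`, the first feature (which determines the class) gets weight 1 and the second gets weight −1.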
Criteria based on the informational approach (the information-theoretical criterion, feature set entropy, etc.) are applicable for evaluating group feature informativeness [1, 7, 8]. However, such criteria have the same disadvantages as the criteria used for individual informativeness estimation. Furthermore, their use rests on the assumption that the patterns defining the classes of the sample S = <P, T> are normally distributed. Errors of models synthesized using the estimated feature set P* ⊆ P are also often used to estimate group informativeness in feature selection. This approach is characterized by significant computational and time costs during feature selection. These costs are caused by the high computational complexity of the model synthesis procedure, which must be performed for every estimated feature set P* ⊆ P, and they make the known feature selection methods difficult to use in practice [1, 7, 9].
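To make the cost concrete: an exhaustive search that synthesizes one model per candidate set P* ⊆ P needs one synthesis for every nonempty subset of P. For M = 26, the number of characteristics used to encode the vehicle images in the experiments, this gives:

```latex
N_{\text{subsets}} = \sum_{k=1}^{M} \binom{M}{k} = 2^{M} - 1,
\qquad M = 26:\quad 2^{26} - 1 = 67\,108\,863 .
```

Heuristic search methods visit only a fraction of these subsets, but with an error-based criterion each visited subset still costs a full model synthesis.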
Thus, the disadvantages of the known criteria of individual and group significance estimation make it topical to develop a criteria system for feature informativeness estimation that is free from these drawbacks.

MATERIALS AND METHODS
As mentioned above, informational criteria [1, 5, 7, 8] are difficult to use in practice for estimating individual and group feature informativeness, because they assume that the features of the initial data sample are mutually independent. At the same time, practical data samples as a rule contain interdependent features. If such features are used to synthesize a diagnosis or recognition model, the model's approximation and generalization properties and its interpretability deteriorate, while its structural complexity increases. Moreover, the applicability of such criteria rests on the assumption that the patterns forming the classes of the sample S = <P, T> are normally distributed. In the developed criteria system, feature informativeness is estimated from the spatial location of observations of different classes (the magnitude of change of the output parameter). In contrast to the criteria proposed in [17, 18], which estimate only individual feature informativeness for classification problems, the developed criteria system also estimates group feature informativeness for both classification and regression problems.
Let us define the concepts of individual and group informativeness.
… where the values of the quantities … After the partial individual informativeness … Here os_q is the nearest observation to the observation s_q with a different or the same output parameter value.
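The exact formulas of the criteria are not given in this text. As a purely hypothetical illustration of the idea of scoring a whole subset P* ⊆ P by where observations lie relative to their nearest neighbours, with the Hamming distance used in the experiments, the Python sketch below counts how often the nearest observation os_q of s_q in the subspace P* has the same output value. It is not the authors' formula; skipping missing values when computing distances is one simple assumption among several possible.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def hamming(a: Row, b: Row, subset: Sequence[int]) -> int:
    # number of features in the subset on which the two observations differ;
    # a feature missing (None) in either observation is skipped
    return sum(1 for j in subset
               if a[j] is not None and b[j] is not None and a[j] != b[j])


def group_informativeness(X: Sequence[Row], y: Sequence[int], subset: Sequence[int]) -> float:
    """Hypothetical score of the feature subset given by column indices `subset`:
    the share of observations s_q whose nearest observation os_q (by Hamming
    distance over the subset) has the same output value. Ties go to the lowest index."""
    n = len(X)
    same = 0
    for q in range(n):
        os_q = min((r for r in range(n) if r != q),
                   key=lambda r: hamming(X[q], X[r], subset))
        if y[os_q] == y[q]:
            same += 1
    return same / n
```

The whole subset is scored at once and no model is synthesized; the work per subset is of order n²·|P*| comparisons.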
Formula (5) is used instead of formulas (2) and/or (3), where the probability p is calculated from the sample data S = <P, T> using formula (7). … In expression (9) … Then the partial individual informativeness of a feature can be calculated using formula (10). … This formula divides the range of the output parameter T so that the normalized width of every interval ΔT does not exceed ε_n. … for the construction of diagnosis and recognition models and also for feature informativeness estimation.
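The text states that the range of T is divided so that the normalized width of every interval ΔT does not exceed ε_n, and the nomenclature defines ⌈A⌉. A hypothetical equal-width reading of that rule (the paper's own formula is not shown here) uses k = ⌈1/ε_n⌉ intervals; each observation then gets an interval index that can serve as a class label, so class-based criteria can also be applied to regression problems.

```python
import math
from typing import List, Sequence


def discretize_output(t: Sequence[float], eps_n: float) -> List[int]:
    """Hypothetical sketch: split [min T, max T] into k = ceil(1 / eps_n) equal
    intervals, so that each interval width divided by (max T - min T) is at most
    eps_n, and return the interval index of every observation."""
    if not 0.0 < eps_n <= 1.0:
        raise ValueError("eps_n must lie in (0, 1]")
    t_min, t_max = min(t), max(t)
    k = math.ceil(1.0 / eps_n)
    width = (t_max - t_min) / k
    if width == 0.0:
        return [0] * len(t)  # constant output: a single interval
    # the maximum value falls on the upper boundary and goes to the last interval
    return [min(int((v - t_min) / width), k - 1) for v in t]
```

For example, `discretize_output([0.0, 1.0, 2.0, 3.0, 4.0], 0.25)` uses four intervals of normalized width 0.25.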
Thus, the developed criteria system evaluates feature significance from the spatial location of observations of different classes (the magnitude of change of the output parameter). It makes it possible to estimate individual and group feature informativeness for classification and regression problems when the initial data samples contain redundant and interdependent features as well as observations with missing values. The proposed criteria do not require constructing models on the estimated feature combinations, which considerably reduces the time and computing costs of informative feature selection. Using the proposed criteria to estimate and select informative features reduces the structural complexity of the synthesized diagnosis and recognition models and raises their interpretability and generalization ability, because insignificant, interdependent and redundant features are removed in diagnostics and pattern recognition problems.

EXPERIMENTS
The proposed system of feature informativeness estimation for pattern recognition was integrated into the diagnosis model synthesis software system as a corresponding module [21]. This module performs informative feature selection at the data sample reduction stage.
Numerical experiments on informative feature selection for vehicle recognition [22] were carried out to investigate the practical efficiency of the proposed criteria system.
Every vehicle was represented by images obtained from highway video cameras. All images were grayscale. The whole training sample contained 10,000 images. The image areas where a vehicle was located were identified by the recognition system, and each area obtained in this way was mapped to a 128×128 matrix. The graphic information of an object was encoded using 26 characteristics. Besides that, to form the training sample, the image data were classified by experts in … At the beginning of applying the considered mechanism of diagnosis model synthesis, it was necessary to estimate the group informativeness of the training sample feature subsets during data reduction. The feature informativeness estimation criteria system proposed in the paper was applied for this purpose, with the Hamming distance used to calculate group informativeness. After that, the subset of training sample features used by the software system for recognition was selected.
For the other considered methods (Principal Component Analysis (PCA), the Group Method of Data Handling (GMDH), the Canonical Method of Evolutionary Search (CMES), the Method of Alternately Adding and Removing Features (MARF), and the multiagent methods with indirect (MMICA) and direct connection between agents), the feature selection stage was carried out according to each method's own mechanism. The maximum size of the feature sets obtained by these methods was limited to 12.
Using the developed mathematical support and software, the tasks of informative feature selection and diagnosis model synthesis were solved sequentially.
Recognition was performed by a feedforward neural network with 3 neurons in the first layer and 1 neuron in the second layer, using the logistic sigmoid activation function of the neurons and weighted sums as discriminant functions.
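A minimal Python sketch of the forward pass of such a 3–1 network follows. The input size (one input per selected feature), the weights and the 0.5 decision threshold are illustrative assumptions, and training is omitted.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # numerically stable logistic sigmoid
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: Sequence[float], w1: Sequence[Sequence[float]], b1: Sequence[float],
            w2: Sequence[float], b2: float) -> float:
    # first layer: 3 neurons, each applies the sigmoid to a weighted sum of the inputs
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    # second layer: 1 neuron combining the 3 hidden outputs
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def recognize(x: Sequence[float], w1: Sequence[Sequence[float]], b1: Sequence[float],
              w2: Sequence[float], b2: float, threshold: float = 0.5) -> int:
    # hypothetical decision rule: class 1 if the output reaches the threshold, else class 2
    return 1 if forward(x, w1, b1, w2, b2) >= threshold else 2
```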
A numerical study was carried out of the developed software system based on the proposed criteria system and of the traditional feature selection methods. To compare the results, the following set of comparison criteria was developed:
- the number k of features selected as informative and retained in the sample after reduction;
- the recognition error E, computed as the ratio of the number of incorrectly recognized observations to the total number of observations in the sample;
- the operating time T_c needed by a method to reach an acceptable solution.
The first criterion is meaningful for each specific version of the informative feature selection problem (for example, when only passenger cars are recognized). When recognition results are averaged over several classes (for example, vehicles of several types), this criterion should be excluded from the results.
In recognition tasks it is important to evaluate not only the overall recognition error E but also the recognition errors for the observations of each class. Suppose there are two classes: class 1, observations belonging to images of passenger cars, and class 2, observations that do not belong to images of passenger cars. After recognition, the following subsets can be formed: C_11, observations of the first class recognized as belonging to the first class; C_12, observations of the first class recognized as belonging to the second class; C_21, observations of the second class recognized as belonging to the first class; C_22, observations of the second class recognized as belonging to the second class. The recognition errors for the observations of the first class, E_1, and of the second class, E_2, can then be calculated as follows:

$$E_1 = \frac{|C_{12}|}{|C_{11}| + |C_{12}|}, \qquad (16)$$

$$E_2 = \frac{|C_{21}|}{|C_{21}| + |C_{22}|}. \qquad (17)$$

Because most of the researched methods rely on probabilistic optimization, their results had to be averaged: the search was run 100 times during the numerical study, and the averaged values of the comparison criteria were calculated. Table 1 presents the computed values of the comparison criteria for the proposed and known informative feature selection methods in the vehicle recognition task; these results refer to recognition of the passenger car class.
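The overall error E and the per-class errors of formulas (16) and (17) can be computed from the sizes of the four subsets C_11, C_12, C_21, C_22, as in the Python sketch below (function names and example counts are hypothetical). Averaging over repeated runs is a plain arithmetic mean of each criterion.

```python
from statistics import mean
from typing import List, Tuple


def recognition_errors(c11: int, c12: int, c21: int, c22: int) -> Tuple[float, float, float]:
    """Return (E, E1, E2) from the sizes of C11, C12, C21, C22."""
    total = c11 + c12 + c21 + c22
    e = (c12 + c21) / total   # share of incorrectly recognized observations
    e1 = c12 / (c11 + c12)    # formula (16): errors among first-class observations
    e2 = c21 / (c21 + c22)    # formula (17): errors among second-class observations
    return e, e1, e2


def averaged_errors(runs: List[Tuple[int, int, int, int]]) -> Tuple[float, float, float]:
    # average each criterion over independent runs (the study used 100 runs)
    per_run = [recognition_errors(*r) for r in runs]
    return (mean(v[0] for v in per_run),
            mean(v[1] for v in per_run),
            mean(v[2] for v in per_run))
```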

RESULTS
The distribution of the recognition error depending on the number of features (feature subset dimensionality) selected for recognition is presented in Figure 1.
Estimates of the group informativeness of subsets with the corresponding numbers of features for the passenger car class are presented in Figure 2.
Table 1. Values of the comparison criteria (columns: №, feature selection method, E, T_c, k)
Estimates of the group informativeness of the feature subsets selected for recognition by the methods listed in Table 1 are presented in Figure 3.
The distribution of the recognition error over the classes, defined by formulas (16) and (17), for passenger cars is presented in Figure 4.
The recognition results obtained by all investigated methods for recognition of all types of vehicles are presented in Figure 5.
Averaged comparison criteria values computed for the proposed (CSFIEFS) and the known informative feature selection methods in recognition of all vehicle classes are presented in Table 2.

DISCUSSION
The comparison results presented in Tables 1 and 2 show that the proposed method gave the lowest recognition error, 0.0181 on average over all vehicles. The diagram in Figure 4 also shows an acceptable recognition error level when the error is divided into two components corresponding to each output feature value (0.0176 and 0.017).
PCA (618 s) and MARF (2262 s) operated considerably faster (by a factor of 4.5 or more) than the other traditional methods studied; however, their recognition errors were also the largest. The proposed method showed a speed (799 s) comparable with these methods. This is because it does not require synthesizing models on the data sample, which is a computationally complex and long process. The proposed feature informativeness estimation criteria system can therefore be used specifically under conditions of limited time and resources, and also when feature selection is a separate stage of decision making and its operating time becomes especially important.
As shown in Table 1, the smallest number of features (10) for passenger car recognition was selected by CMES, MMICA and the CSFIEFS proposed in the paper. Each of these methods reduced the sample by 61.5 %.
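The reduction figure follows directly from the sizes involved, with M = 26 initial characteristics and k = 10 retained features:

```latex
\frac{M - k}{M} = \frac{26 - 10}{26} = \frac{16}{26} \approx 0.615 \quad (61.5\,\%)
```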
At the same time, as the recognition errors in Table 1 show (0.0219, 0.0198 and 0.0172), each method selected a different informative feature subset from the overall set, although the subsets had the same size. The data in Figure 3 show that the estimated group informativeness of the feature subsets selected by each method correlates with the recognition error. It can therefore be stated that the criteria system proposed in the paper is informative and can be used for decision making.
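One way to quantify the agreement between informativeness estimates and recognition errors across methods is a rank correlation. The Python sketch below computes Spearman's coefficient under the simplifying assumption that there are no tied values; the names are hypothetical and no data from the paper are used. A value close to −1 would mean that subsets with higher estimated informativeness consistently have lower recognition error.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    # rank 1 for the smallest value; assumes all values are distinct
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation for samples without ties."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two samples of equal length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```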
CSFIEFS gave the lowest recognition error (0.0172) among all 10-element feature subsets proposed by the researched methods, which corresponds to effective feature set selection and an effective solution of the recognition problem.
The dependence in Figure 1 is built from feature subsets of different sizes; each subset gave the lowest recognition error among all subsets of the same size. This dependence shows that the optimal solution is a subset of 10 elements.
The graph in Figure 2 shows the dependence between group informativeness and the size of the sets, defined in the same way as for the previous dependence. It shows that the proposed informativeness criteria estimate subsets consistently with the recognition error.
Thus, the proposed criteria system for feature informativeness estimation in pattern recognition makes it possible to solve the informative feature selection problem efficiently, leading to an effective solution of the pattern recognition task. Compared with traditional informative feature selection approaches based on error criteria, this process is faster, has lower resource requirements and gives the lowest recognition error.

CONCLUSIONS
In this paper, the topical task of automating feature informativeness estimation in diagnostics and pattern recognition problems was solved.
The scientific novelty of the paper lies in the proposed criteria system for feature informativeness estimation. It is based on the idea that feature significance is computed from the spatial location of observations of different classes (the magnitude of change of the output parameter). The developed criteria system makes it possible to estimate individual and group feature informativeness in classification and regression problems when input data samples contain redundant and interdependent features as well as observations with missing values. The proposed criteria do not require constructing models on the estimated feature combinations, which considerably reduces the time and computing costs of informative feature selection. Using the proposed criteria to estimate and select informative features reduces the structural complexity of the synthesized diagnosis and recognition models and raises their interpretability and generalization ability, because insignificant, interdependent and redundant features are removed in diagnostics and pattern recognition problems.
The practical significance of the paper consists in solving practical pattern recognition problems. The experimental results showed that the proposed criteria system makes it possible to estimate the individual and group informativeness of features and can be used in practice to solve diagnostics and pattern recognition tasks.