METHODS AND CHARACTERISTICS OF LOCALITY-PRESERVING TRANSFORMATIONS IN THE PROBLEMS OF COMPUTATIONAL INTELLIGENCE

The problem of developing mathematical support for data dimensionality reduction is solved. The results can be used to automate the construction of diagnostic and recognizing models by precedents. A set of rapid transformations from the original multidimensional space to a one-dimensional axis is proposed for the first time. These transformations solve both the feature extraction and the feature selection problems. A complex of indicators characterizing the properties of the transformations is also proposed for the first time, and on its basis a set of criteria is defined. The criteria facilitate the comparison and selection of the best transformations, and of the results of their work, when diagnosis and recognition problems are solved with computational intelligence methods. Software implementing the proposed transformations and the indicators characterizing their properties has been developed. The proposed transformations and indicators have been studied experimentally, and the results allow us to recommend the transformations for practical use.


INTRODUCTION
When diagnostic and recognizing models are constructed by precedents characterized by a large number of features, the problem of reducing model complexity and increasing the speed of model construction often arises [1]. One way to solve this problem is to reduce data dimensionality by a transformation from the multidimensional space of the initial features to a one-dimensional axis [2,3].
There are various transformation methods for data dimensionality reduction [2][3][4][5][6][7][8][9][10][11]. However, they require computing distances between instances or correlation coefficients between features, so for large-scale problems they are hardly applicable in practice: both determining the transformation parameters and executing the transformation demand too much time and computer memory. The situation is further complicated by the fact that the number of known transformations and their modifications is very large, and there are no formal criteria for analyzing their quality or for selecting the best available transformation for a particular task [3].
Therefore, an urgent problem is to increase the speed of data dimensionality reduction transformations and to develop criteria for selecting the transformation to use in solving a particular problem.
The purpose of this work is to develop rapid transformations from a multidimensional feature space to a one-dimensional axis, to create a set of indicators characterizing the properties of the transformations, and to study these properties experimentally in solving practical problems.

PROBLEM STATEMENT
Suppose we have an initial (original) sample $X = \langle x, y \rangle$, a set of S precedents describing the dependence $y(x)$, $x = \{x^s\}$, $y = \{y^s\}$, $s = 1, 2, \ldots, S$, characterized by a set of N input features $\{x_j\}$, $j = 1, 2, \ldots, N$, where j is the feature number, and by an output feature y. Every s-th precedent can be represented as $\langle x^s, y^s \rangle$, $x^s = \{x^s_j\}$, where $x^s_j$ is the value of the j-th input feature and $y^s$ is the value of the output feature for the s-th precedent (instance) of the sample, $y^s \in \{1, 2, \ldots, K\}$, where K is the number of classes, $K > 1$.
Then the problem of reducing the dimensionality of the sample X can be formally stated as follows: find a transformation $H: X \to I$ that, for each instance $x^s = \{x^s_j\}$, determines the coordinate $I_s$ on the generalized axis I and thereby maps instances of different classes to different intervals of the generalized axis.
Since known transformations, as a rule, do not guarantee an exact solution of this problem, a further problem arises: designing indicators that quantify the quality of a transformation and allow the results of various transformations to be compared with one another, so that the best transformation can be chosen from the set.

TRANSFORMATIONS OF INSTANCES FROM THE MULTIDIMENSIONAL SPACE TO THE GENERALIZED AXIS
For large-scale problems it is advisable to construct transformations that map individual instances without loading the whole initial sample, take feature informativeness into account during the transformation, and provide generalization of the data.
ISSN 1607-3274.Радіоелектроніка, інформатика, управління.2014.№ 1
To ensure generalization of closely located data points (instances), we propose replacing feature values with the numbers of the intervals into which they fall. For this, the features must first be discretized by partitioning their ranges into intervals of values.
For this partitioning, the number of the interval (term) into which the s-th instance falls on the j-th feature is determined by a formula in which $x_j^{\min}$ and $x_j^{\max}$ are the minimum and maximum values of the j-th feature, respectively.
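Because the interval-number formula itself is not reproduced here, the following is only a sketch that assumes equal-width intervals: the range $[x_j^{\min}, x_j^{\max}]$ of each feature is split into a hypothetical number `k[j]` of intervals, numbered from 1, and the maximum value is assigned to the last interval. The names `interval_number` and `discretize` are illustrative, not taken from the paper.

```python
import math


def interval_number(x, x_min, x_max, k):
    """1-based number of the equal-width interval (out of k) that contains x.

    Equal-width partitioning is an assumption made for illustration.
    """
    if x_max == x_min:
        return 1                      # constant feature: everything in one interval
    num = math.floor((x - x_min) / (x_max - x_min) * k) + 1
    return min(max(num, 1), k)        # x == x_max would give k + 1, clamp it to k


def discretize(sample, k):
    """Replace every feature value by its interval number; k[j] intervals for feature j."""
    n = len(sample[0])
    mins = [min(row[j] for row in sample) for j in range(n)]
    maxs = [max(row[j] for row in sample) for j in range(n)]
    return [[interval_number(row[j], mins[j], maxs[j], k[j]) for j in range(n)]
            for row in sample]
```

For example, `discretize([[0.0, 10.0], [0.5, 20.0], [1.0, 30.0]], [4, 2])` maps the first feature to intervals 1, 3, 4 and the second to intervals 1, 2, 2.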
To map instances from the original multidimensional feature space to the one-dimensional generalized axis, we suggest the following transformations.
Transformation 1. For the interval number of each j-th feature, obtain its binary representation (padded with zeros on the left to $c_j$ digits, where $c_j$ is the number of binary digits in $k_j$). Set the coordinate of the s-th instance on the generalized axis $I_s = 0$ and the bit position of the generalized axis coordinate $p = 1$. Going through the features j in descending order of their rank and, for each feature, through the digits of its interval number $c = 1, 2, \ldots, c_j$, perform in a loop: if $p \le d$, where d is the number of bits in the computer bit grid, write the value of the c-th digit (numbered from the left) of the interval number into which the s-th instance falls on the j-th feature to the p-th position (numbered from the left) of the binary representation of the generalized feature $I_s$, and set $p = p + 1$. As a result, we obtain the generalized axis coordinate of the instance with implicit ranking and selection of features: the digits of low-ranked features that do not fit into the d bits are discarded.
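A minimal sketch of Transformation 1, assuming the interval numbers are already computed as non-negative integers, that `order` lists feature indices in descending rank, that `c[j]` is the bit width of feature j, and that the coordinate is held in a d-bit unsigned integer whose p-th bit from the left is bit d - p from the right. The function name is hypothetical.

```python
def transformation1(intervals, order, c, d=64):
    """Concatenate the c[j]-bit codes of the interval numbers, features taken in
    `order` (descending rank), and truncate the result to a d-bit coordinate."""
    coord = 0
    p = 1                                    # 1-based bit position, counted from the left
    for j in order:
        code = intervals[j]
        for pos in range(c[j]):              # digits of the interval number, left to right
            if p > d:
                return coord                 # remaining digits do not fit: implicit selection
            bit = (code >> (c[j] - 1 - pos)) & 1
            coord |= bit << (d - p)          # write the p-th bit from the left
            p += 1
    return coord
```

With `intervals = [3, 1, 2]`, `order = [2, 0, 1]` and 2-bit codes, an 8-bit grid receives the bits `10 11 01` followed by zero padding (the value 180), while a 4-bit grid keeps only the two highest-ranked features (the value 11).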
Transformation 2 is an alternative way of constructing the generalized feature of Transformation 1. It applies when the values $c_j$ are equal for all features and the total number of bits needed to represent the interval numbers of all features, $\sum_{j=1}^{N} c_j$, does not exceed the number of bits d in the bit grid. For the interval number of each j-th feature, obtain its binary representation (padded with zeros on the left to $c_j$ digits, the number of binary digits in $k_j$); set the coordinate of the s-th instance on the generalized axis $I_s = 0$ and the position number on the generalized axis $p = 1$. Then loop over the digit positions of the interval numbers $c = 1, 2, \ldots, c_j$ and, within each, over the features j in descending order of rank: write the c-th bit (numbered from the left) of the interval number into which the s-th instance falls on the j-th feature to the p-th bit position (numbered from the left) of the binary representation of the generalized feature $I_s$, and set $p = p + 1$. The bits of all features are thus interleaved. As a result, we obtain the generalized axis coordinate with implicit ranking of features.
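The interleaving loop of Transformation 2 can be sketched as follows, under the same assumptions as for Transformation 1 (hypothetical function name, integer interval numbers, `order` in descending rank), plus a single common bit width `c` for all features.

```python
def transformation2(intervals, order, c, d=64):
    """Interleave the bits of the interval numbers: the first digit of every feature,
    then the second digit of every feature, and so on. Requires len(order) * c <= d."""
    if len(order) * c > d:
        raise ValueError("interval codes do not fit into the d-bit grid")
    coord = 0
    p = 1                                 # 1-based bit position, counted from the left
    for pos in range(c):                  # c-th digit of every interval number ...
        for j in order:                   # ... for features in descending rank
            bit = (intervals[j] >> (c - 1 - pos)) & 1
            coord |= bit << (d - p)
            p += 1
    return coord
```

Interleaving makes the most significant bits of the coordinate encode the coarse position of the instance on every feature at once, which resembles a Z-order (Morton) curve, so instances close on all features tend to receive close coordinates.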
Transformation 3. The generalized feature is formed on the basis of locality-preserving hashing [12][13][14][15]. The initial feature space is divided into $2^k$ equal hypercubes, each identified by a k-bit key $I_s$, where k is the number of partitions of the feature space. After the i-th partition the initial feature space is split into $2^i$ N-dimensional cells, the i-th partition being carried out along the j-th dimension, $j = i \bmod N$, so that the features are split in turn, cyclically. At the i-th partition, if a cell lies in the upper half of the partitioned range, the i-th bit of its key is set to one; otherwise it is set to zero (the key is a k-bit identifier padded with zeros on the left if it is shorter than k). The key $I_s$ can be generated algorithmically as follows: set $I_s = 0$, $x_j^{\min\prime} = x_j^{\min}$, $x_j^{\max\prime} = x_j^{\max}$; then for $i = 1, 2, \ldots, k$, with j the feature split at step i, set $x_j^{mid} = (x_j^{\min\prime} + x_j^{\max\prime})/2$; if $x^s_j \ge x_j^{mid}$, set $x_j^{\min\prime} = x_j^{mid}$ and $I_s = I_s + 2^{k-i}$, otherwise set $x_j^{\max\prime} = x_j^{mid}$.
Transformation 4. The transformations described above map instances to a discrete generalized axis. If the total number of bits needed to represent the interval numbers of all features exceeds the number of bits in the computer bit grid, a transformation to a real-valued generalized axis with partial loss of information can be used: the c-th bit (numbered from the left) of the interval number into which the s-th instance falls on the j-th feature is added to the real coordinate $I_s$ on the generalized feature according to a formula that uses $S_{q,k,j}$, the number of instances of the q-th class located in the k-th interval of the j-th feature, and $r_j$, the rank of the j-th feature (the position of the j-th feature in decreasing order of individual feature importance).
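The key-generation loop of Transformation 3 can be sketched as follows. The sketch indexes features from 0 and splits them cyclically, and it sends a value equal to the midpoint into the upper half; both the tie-breaking rule and the function name `lsh_key` are assumptions.

```python
def lsh_key(x, x_min, x_max, k):
    """k-bit key of instance x obtained by halving the current cell k times,
    cycling over the features (features indexed from 0 here)."""
    lo, hi = list(x_min), list(x_max)      # current bounds x_min', x_max'
    n = len(x)
    key = 0
    for i in range(1, k + 1):
        j = (i - 1) % n                    # feature split at step i
        mid = (lo[j] + hi[j]) / 2
        key <<= 1                          # shifting first makes bit i the i-th from the left
        if x[j] >= mid:                    # upper half: the i-th bit is one
            lo[j] = mid
            key += 1
        else:                              # lower half: the i-th bit stays zero
            hi[j] = mid
    return key
```

For the point (0.8, 0.2) in the unit square and k = 4 the successive bits are 1, 0, 1, 0, giving the key 10; nearby points share the leading bits of their keys.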
Transformation 5. Define the distance from the s-th instance to the unit vector in the normalized coordinate system, and the angle between the instance, considered as a vector, and the unit vector. In this way the s-th instance is mapped from the N-dimensional space into a two-dimensional space. Next, by analogy with Transformation 1, the coordinates of the s-th instance in this two-dimensional space are used to obtain its coordinate $I_s$ on the generalized axis.
Transformation 6. Generate Q support vectors, the centers of pseudo-clusters $C^q = \{C^q_j\}$, $q = 1, 2, \ldots, Q$, $K \le Q \ll S$, $j = 1, 2, \ldots, N$. In the simplest case their coordinates can be set randomly, taking into account the dimensionality and feature scales ($x_j^{\min} \le C^q_j \le x_j^{\max}$), or, with $Q = K$, each center can be set to the center of its class, $C^q_j = \frac{1}{S_q} \sum_{s:\, y^s = q} x^s_j$, where $S_q$ is the number of instances of the q-th class. After this, order the clusters by their proximity to one another and by their position in the feature space relative to the point with the lowest feature values, $x^{\min} = \{x_j^{\min}\}$:
- find the distances $\rho(C^q, x^{\min})$ from the cluster centers to the point with the lowest feature values, where $\rho$ denotes the distance in the feature space;
- find the distances $\rho(C^q, C^p)$ between the cluster centers;
- find the cluster center closest to the point with the lowest feature values: $q = \arg\min_{q = 1, 2, \ldots, Q} \rho(C^q, x^{\min})$;
- make this center the current one, set the new number of the current cluster $t = 1$, put the current cluster into the set of renumbered centers ($C^* = C^* \cup C^{*1}$, $C^{*1} = C^q$) and delete it from the set of centers not yet renumbered ($C = C \setminus C^q$);
- while at least one cluster has not been renumbered (i.e. $C \ne \varnothing$): among the remaining clusters in C find the one closest to the current cluster, $p = \arg\min_{p:\, C^p \in C} \rho(C^{*t}, C^p)$, then increase $t = t + 1$, make this cluster the current one, put it into the set of renumbered centers ($C^* = C^* \cup C^{*t}$, $C^{*t} = C^p$) and remove it from the set of centers not yet renumbered ($C = C \setminus C^p$).
As a result, we obtain $C^*$, a set of cluster centers numbered according to their proximity to the point with the lowest feature values, which also allows the proximity of the cluster centers to one another to be judged qualitatively.
Further, for each instance of the initial sample $x^s$, $s = 1, \ldots, S$:
- compute the distance from it to each cluster center, $\rho(x^s, C^{*q})$, $q = 1, 2, \ldots, Q$;
- find the index of the nearest cluster center: $p = \arg\min_{q} \rho(x^s, C^{*q})$;
- find the angle $\varphi$ between the vectors $x^s$ and $C^{*p}$ taken relative to the point with the lowest feature values;
- assign to the s-th instance the coordinate on the generalized axis $I_s = p + \varphi / \pi$. Since $0 \le \varphi \le \pi$, the coordinates of instances attached to the p-th center lie in the interval $[p, p + 1]$.
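The ordering of the pseudo-cluster centers and the mapping of an instance in Transformation 6 can be sketched as below. The sketch assumes Euclidean distance, uses class centroids as the centers (the $Q = K$ option), and takes the vector of per-feature minima as the point with the lowest feature values; all function names are hypothetical.

```python
import math


def class_centers(X, y):
    """Centroid of each class (the Q = K option), classes in sorted label order."""
    centers = []
    for q in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == q]
        centers.append([sum(col) / len(rows) for col in zip(*rows)])
    return centers


def order_centers(centers, origin):
    """Renumber centers: start from the one nearest to `origin`, then repeatedly
    append the remaining center nearest to the current one."""
    remaining = [list(c) for c in centers]
    current = min(remaining, key=lambda c: math.dist(c, origin))
    ordered = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining, key=lambda c: math.dist(c, current))
        ordered.append(current)
        remaining.remove(current)
    return ordered


def angle(u, v):
    """Angle in [0, pi] between vectors u and v (0 if either is the zero vector)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))   # clamp rounding errors


def map_instance(x, ordered, origin):
    """Coordinate p + phi / pi, with p the 1-based number of the nearest center."""
    p = min(range(len(ordered)), key=lambda q: math.dist(x, ordered[q]))
    u = [a - o for a, o in zip(x, origin)]        # vectors taken relative to `origin`
    v = [c - o for c, o in zip(ordered[p], origin)]
    return (p + 1) + angle(u, v) / math.pi
```

The greedy ordering needs on the order of $Q^2$ distance computations, which is cheap because $Q \ll S$; mapping a single instance then needs only Q distances and one angle, so instances can be transformed one at a time.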

CHARACTERISTICS OF TRANSFORMATIONS TO THE GENERALIZED AXIS
To characterize the process of mapping instances by the transformations introduced above, we propose the following characteristics:
- $t_s$, the time of transforming one instance from the original feature space to the generalized axis with sequential computations;
- $m_s$, the computer memory used by the transformation method to process one instance;
- $\lambda$, the number of adjustable parameters the transformation needs for its implementation;
- t, the time of computing the transformation parameters from the training sample;
- m, the computer memory used to compute the transformation parameters from the training sample.
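The per-instance characteristics $t_s$ and $m_s$ can be estimated empirically. The following sketch, assuming a hypothetical callable `transform` that maps one instance, times the transformations in one pass and, in a second pass, uses the standard `tracemalloc` module to record the largest peak of Python memory allocated above the baseline during a single call. It requires Python 3.9+ for `tracemalloc.reset_peak` and measures only allocations made through Python's allocator, not the total process memory.

```python
import time
import tracemalloc


def measure_per_instance(transform, instances):
    """Estimate t_s (mean seconds per instance) and m_s (bytes, largest per-call peak)."""
    if not instances:
        raise ValueError("need at least one instance")

    # Pass 1: timing without tracing overhead.
    start = time.perf_counter()
    for x in instances:
        transform(x)
    t_s = (time.perf_counter() - start) / len(instances)

    # Pass 2: memory, as the peak allocated above the baseline during one call.
    tracemalloc.start()
    m_s = 0
    for x in instances:
        tracemalloc.reset_peak()
        base = tracemalloc.get_traced_memory()[0]
        transform(x)
        m_s = max(m_s, tracemalloc.get_traced_memory()[1] - base)
    tracemalloc.stop()
    return t_s, m_s
```

Multiplying the two values gives the combined per-instance criterion $F_1 = t_s m_s$ used later to compare transformations.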
Several instances may have equal coordinates both in the original and in the synthesized feature spaces. Such situations are called collisions, and a point of the feature space at which a collision occurs is called a collision point.
A collision is admissible and even desirable in automatic classification problems, provided that all instances located at the collision point belong to the same class. However, if the instances at a collision point belong to different classes, the feature set used does not provide good separability of instances.
Denote the set of collision points by $\{g_v\}$, $v = 1, 2, \ldots, V$, where $g_v$ is the set of instances belonging to the v-th collision point and V is the number of collision points, which cannot exceed $0.5S$, since each collision point contains at least two instances.
To estimate the quality of the results of the considered transformations, we propose the following indicators.
The number of collision points in which instances belong to different classes after the transformation of the sample to the generalized axis, $E^*_{\langle I,y\rangle}$, is the first such indicator. In the best case, when there are no collision points, it equals zero; in the worst case its value does not exceed $0.5S$.
The probability (frequency) estimate of collision points in which instances belong to different classes after the transformation of the sample to the generalized axis is also used.
The corrected number of collision points in which instances belong to different classes after the transformation of the sample to the generalized axis, $E^{*\prime}_{\langle I,y\rangle}$, is determined relative to $E^*_{\langle x,y\rangle}$, the number of collision points in which the instances belong to different classes in the initial sample. The indicator $E^{*\prime}_{\langle I,y\rangle}$ characterizes the quality of the transformation to the generalized axis more accurately because it eliminates the errors already present in the sample. In the best case, when there are no collisions, it equals zero; in the worst case its value does not exceed $0.5S$.
The corrected probability (frequency) estimate of collision points in which the instances belong to different classes after the transformation of the sample to the generalized axis is defined correspondingly. The total number of instances located in collision points in which the instances belong to different classes after the transformation of the sample to the generalized axis is also proposed as an indicator.
The larger this indicator, the worse the separability of instances on the generalized axis. In the best case it equals zero, and in the worst case it does not exceed the number of instances in the sample, S.
The probability estimate of an instance falling into a collision point in which instances belong to different classes after the transformation of the sample to the generalized axis is another indicator.
The total number of instances located in collision points of the initial sample in which the instances belong to different classes is defined in the same way for the initial sample.
The larger this indicator, the worse the separability of instances of the initial sample. In the best case it equals zero, and in the worst case it does not exceed S.
The probability estimate of an instance collision in the initial sample, in which the instances belong to different classes, is obtained correspondingly. The number of pairwise collisions of instances of different classes after the transformation of the sample to the generalized axis, $E_{\langle I,y\rangle}$, is also proposed. In the best case, when there are no collisions, this indicator is zero; in the worst case its value does not exceed $S(S-1)$.
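A sketch of how the collision-based counts can be computed once the coordinates are available. It assumes that a collision point is any coordinate value shared by two or more instances (compared by exact equality) and counts each unordered pair of instances once; the exact normalizations of the frequency estimates are not reproduced. Passing tuples of original feature values instead of generalized coordinates gives the corresponding counts for the initial sample $\langle x, y \rangle$, which the corrected indicators need. The function and key names are hypothetical.

```python
from collections import Counter, defaultdict


def collision_indicators(coords, labels):
    """Collision statistics of a sample given hashable instance coordinates and class labels."""
    points = defaultdict(list)              # coordinate -> class labels of its instances
    for coord, label in zip(coords, labels):
        points[coord].append(label)

    collision_points = mixed_points = mixed_instances = mixed_pairs = 0
    for members in points.values():
        n = len(members)
        if n < 2:                           # a single instance is not a collision
            continue
        collision_points += 1
        per_class = Counter(members)
        if len(per_class) > 1:              # instances of different classes coincide
            mixed_points += 1
            mixed_instances += n
            # all unordered pairs minus the pairs drawn from a single class
            mixed_pairs += n * (n - 1) // 2 - sum(m * (m - 1) // 2 for m in per_class.values())
    return {
        "collision_points": collision_points,  # corresponds to V
        "mixed_points": mixed_points,          # collision points with mixed classes
        "mixed_instances": mixed_instances,    # instances in such points
        "mixed_pairs": mixed_pairs,            # pairwise collisions of different classes
    }
```

For coordinates `[1, 1, 1, 2, 2, 3]` with classes `[1, 1, 2, 1, 1, 2]` there are two collision points, one of them mixed, holding three instances and two different-class pairs.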
The probability (frequency) estimate of pairwise collisions of instances of different classes after the transformation of the sample to the generalized axis is computed from this count.
The corrected number of pairwise collisions of instances of different classes after the transformation of the training and (or) test sample to the generalized axis, $E'_{\langle I,y\rangle}$, is determined relative to $E_{\langle x,y\rangle}$, the number of pairwise collisions of instances of different classes in the original sample. In comparison with the previous indicator, $E'_{\langle I,y\rangle}$ characterizes the quality of the transformation to the generalized axis more accurately, because it eliminates the errors present in the original sample. In the best case, when there are no collisions, it equals zero, and in the worst case its value does not exceed $S(S-1)$.
The corrected probability (frequency) estimate of pairwise collisions of instances of different classes after the transformation of the sample to the generalized axis is defined correspondingly.
The average number of clusters per class on the generalized axis can be calculated by the formula $\bar{k} = k / K$, where k is the number of clusters of different classes on the generalized axis. To determine k, order the instances $\langle I_s, y^s \rangle$ in ascending order of their coordinates on the generalized axis. Then, scanning from left to right, identify the clusters: intervals of the one-dimensional axis all of whose instances belong to a single class.
The fewer such clusters, the simpler the partition of the generalized axis.
In the best case, when the classes are compact, i.e. $k = K$, this indicator equals one.
The larger its value, the worse the separability of instances on the generalized axis.
In the worst case, where each instance forms a cluster of its own, its value is $\bar{k} = S / K$.
The minimum distance between instances of different classes on the generalized axis is also proposed as an indicator: the larger its value, the better the classes are separated on the generalized axis.
The maximum distance between instances of the same class on the generalized axis is proposed as well: the smaller its value, the more compactly the instances of each class are positioned on the generalized axis.
The average ratio of distances on the generalized axis and in the original feature space is another indicator.
The larger its value, the better, on average, the transformation to the generalized axis reflects the location of instances in the original feature space and the better the separability of instances on the generalized axis. The average of the products of relative distances on the generalized axis and in the original feature space varies from zero to one: the larger its value, the better, on average, the transformation to the generalized axis reflects the location of instances in the original feature space.
The indicator of the feasibility of establishing a generalized axis is defined in terms of k and $k_j$, the number of intervals of different classes on the axis of feature $x_j$. In the best case this indicator equals $S / K$, and in the worst case $K / S$. If it is greater than one, using the generalized axis is feasible; otherwise the generalized axis can be replaced by the original feature that has the smallest number of intervals of different classes.

THE COMPARISON CRITERIA OF GENERALIZED AXIS TRANSFORMATIONS
On the basis of the indicators introduced in the previous section, which characterize the basic properties of transformations to the generalized axis, it is possible to define criteria for comparing transformations: criteria for evaluating the transformation process and criteria for evaluating the quality of its results.
The criteria for evaluating the transformation process are proposed as follows:
- the combined criterion of minimum time and memory for transforming one instance: $F_1 = t_s m_s \to \min$;
- the combined criterion of minimum time and memory for determining the transformation parameters from the training sample: $F_2 = \lambda t m \to \min$;
- the integral criterion of the transformation process.
The criteria for evaluating the quality of the results of transformations:
- the criterion of the minimum probability of instance group collisions;
- the integral criterion of the minimum of collisions and compactness-separability of classes;
- the criterion of the minimum of collisions, the maximum of compactness-separability of classes and the maximum of the average of relative distance products on the generalized axis and in the original feature space;
- the integral criterion of the minimum of collisions, the maximum of the feasibility of establishing a generalized axis, the maximum of compactness-separability of classes and the maximum of the average of relative distance products on the generalized axis and in the original feature space.

EXPERIMENTS AND RESULTS
The proposed transformations to the generalized axis and the indicators characterizing their properties have been implemented in software and studied experimentally in solving practical problems of technical and medical diagnosis and of automatic classification, whose characteristics are given in Table 1 [3]. A fragment of the results of the experiments studying the transformations to the generalized axis is shown in Table 2.
The experiments confirmed the efficiency and practical suitability of the developed mathematical tools and showed that the proposed transformations make it possible to significantly reduce the dimensionality of data samples.
The developed indicators of transformation quality make it possible to select the best transformation for a given task, thereby providing data dimensionality reduction together with improved class separability.
The proposed transformations can be recommended for use in constructing diagnostic and recognizing models by precedents, as well as for forming training samples from large source samples.

CONCLUSION
The urgent problem of developing mathematical support for data dimensionality reduction has been solved in this paper. The results can be used to automate the construction of diagnostic and recognizing models by precedents.
The scientific novelty of the results is as follows:
- a set of rapid transformations from the original multidimensional space to a one-dimensional axis is proposed for the first time. The transformations are based on the principles of hashing, take into account the locations of instances in the feature space with respect to the class centers of gravity, and make it possible to determine and account for feature weights, thereby implicitly solving the feature selection problem. Thus, the proposed transformations solve both the problem of constructing artificial features (feature extraction) and the problem of selecting the most significant features (feature selection);
- a complex of indicators characterizing the properties of transformations from a multidimensional space to a generalized axis is proposed for the first time. On the basis of these indicators a set of criteria is defined, which facilitates the comparison and selection of the best transformations and of the results of their work in solving diagnosis and recognition problems by precedents.
The practical significance of the results is as follows:
- software implementing the proposed transformations and the indicators characterizing their properties has been developed. Its use makes it possible to automate data dimensionality reduction and the analysis of its results;
- the proposed transformations and the indicators characterizing them have been investigated experimentally in solving practical problems. The results of the research allow us to recommend the proposed transformations for use in practice.

Table 1. Characteristics of the initial data samples

Table 2. A fragment of the experimental results of studying the transformations to the generalized axis