SAMPLE FORMATION AND REDUCTION FOR DATA MINING
In data mining problem solving it has to operate with a large amount of data samples. This entails a significant amount of time to process the data. Therefore, an urgent task is to reduce the dimensionality of the data samples. The aim of paper is to provide a method for the formation and reduction of samples, allowing to handle a large amount of the original sample. The problem of sample formation and reduction for data mining was solved. The scientific novelty of the work lies in the fact that the method of sample formation and reduction is firstly proposed. It provides a saving of the most important topological properties of original sample in the formed sub-sample without the need for downloading the original sample to the computer memory, and without numerous passages of the original sample. It allows to reduce the size of the sample and to reduce the resource requirements of a computer. The practical significance of the work lies in the development of software, which implements the proposed method of sample formation and reduction, also as conducting of experiments on research of proposed method to solve practical problems, the results of which allows to recommend the developed method for use in practice in solving problems of data mining. Using the proposed method one can significantly reduce the amount of a sample (in 7,7–12,5 times), without the need to download the original sample into computer memory, providing preservation in the generated sub-sample the most important for analysis of the topological properties of the original sample.
sample, example selection, data reduction, data mining, data dimensionality reduction
Copyright (c) 2014 S. A. Subbotin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
Zaporizhzhya National Technical University,
Zhukovskiy street, 64, Zaporizhzhya, 69063, Ukraine.
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
The reference to the journal is obligatory in the cases of complete or partial use of its materials.