SAMPLE FORMATION AND REDUCTION FOR DATA MINING
Keywords: sample, example selection, data reduction, data mining, data dimensionality reduction
Abstract: Solving data mining problems requires operating on large data samples, which makes processing time-consuming. Reducing the dimensionality of data samples is therefore an urgent task. The aim of the paper is to provide a method for sample formation and reduction that makes it possible to handle a large original sample. The problem of sample formation and reduction for data mining is solved. The scientific novelty of the work is that a method of sample formation and reduction is proposed for the first time. It preserves the most important topological properties of the original sample in the formed sub-sample without loading the original sample into computer memory and without multiple passes over it. This reduces both the sample size and the computer resource requirements. The practical significance of the work lies in the development of software implementing the proposed method, and in experiments applying the method to practical problems, whose results allow the method to be recommended for practical use in data mining. With the proposed method, the sample size can be reduced significantly (by a factor of 7.7–12.5) without loading the original sample into computer memory, while the generated sub-sample preserves the topological properties of the original sample that are most important for analysis.
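The abstract does not describe the algorithm itself, so the following is only a generic illustration of the constraint it states: forming a sub-sample in a single pass, without holding the original sample in memory. The sketch uses classical reservoir sampling as a stand-in. It is not the method proposed in the paper and makes no attempt to preserve topological properties. The names `reservoir_subsample`, `reduction_factor`, and `synthetic_rows`, as well as the synthetic data, are assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_subsample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw a uniform random sub-sample of at most k items in one pass.

    Only the k retained items are held in memory; the stream itself is
    never materialized. This is plain reservoir sampling, used here as
    a stand-in, NOT the method proposed in the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def reduction_factor(original_size: int, reduced_size: int) -> float:
    """How many times smaller the sub-sample is than the original."""
    return original_size / reduced_size


def synthetic_rows(n: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical data source: yields (feature, label) rows lazily."""
    for i in range(n):
        yield (i, i % 3)


subsample = reservoir_subsample(synthetic_rows(10_000), k=1_000)
factor = reduction_factor(10_000, len(subsample))
```

Here the target size `k` was chosen so that the reduction factor falls inside the 7.7–12.5 range reported in the abstract. A uniform random draw like this treats every instance alike, whereas the paper's method is stated to select instances so that the important topological properties of the original sample survive.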
Copyright (c) 2014 S. A. Subbotin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with the Creative Commons CC BY-SA license.
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.