OPTIMIZATION OF INFORMATION PREPROCESSING IN CLUSTERING SYSTEMS OF HIGH DIMENSION DATA
DOI:
https://doi.org/10.15588/1607-3274-2014-2-19
Keywords:
clustering, the feature space dimension, normalization, entropy
Abstract
A technique is presented for choosing the optimal normalization method when building the cluster structure of objects with a high-dimensional feature space. The Shannon entropy criterion and the relative change of entropy were used as the main criteria for estimating data preprocessing quality during data transformation. The dimension of the feature space of the tested objects was reduced by component analysis. A model of the clustering system based on the fuzzy C-means algorithm was constructed and used to estimate clustering quality under different data preprocessing methods. It is shown that the best normalization method for the tested data is decimal scaling, for which the entropy of the processed signal reaches its minimum value and the relative change of entropy stays within permissible limits during data transformation by component analysis.
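The preprocessing criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not state how entropy was estimated, so the histogram-based estimator, the bin count, and the function names `decimal_scaling`, `shannon_entropy`, and `relative_entropy_change` are assumptions made here for illustration only.

```python
import numpy as np


def decimal_scaling(x):
    """Decimal-scaling normalization, applied per feature (column).

    Each column is divided by 10**j, where j is the smallest non-negative
    integer that brings the largest absolute value in the column below 1.
    Restricting j to non-negative values is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x), axis=0)
    safe = np.where(max_abs > 0, max_abs, 1.0)  # avoid log10(0) for all-zero columns
    j = np.where(max_abs > 0, np.floor(np.log10(safe)) + 1, 0)
    j = np.maximum(j, 0)  # never scale values up
    return x / 10.0 ** j


def shannon_entropy(values, bins=32):
    """Shannon entropy in bits of a histogram estimate (assumed estimator)."""
    counts, _ = np.histogram(np.ravel(values), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def relative_entropy_change(h_before, h_after):
    """Relative change of entropy between two processing stages."""
    return abs(h_after - h_before) / h_before
```

In such a setup, `shannon_entropy` would be evaluated on the normalized data before and after the component-analysis step, and `relative_entropy_change` would compare the two values for each candidate normalization method. Note that a histogram built over the data's own range does not change under pure rescaling, so a comparison between normalization methods would need a shared bin range or a different estimator.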
License
Copyright (c) 2015 S. A. Babichev
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.