OPTIMIZATION OF INFORMATION PREPROCESSING IN CLUSTERING SYSTEMS OF HIGH DIMENSION DATA

Authors

  • S. A. Babichev, Kherson National Technical University, Kherson, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2014-2-19

Keywords:

clustering, feature space dimension, normalization, entropy

Abstract

A methodology is presented for choosing the optimal normalization method when forming the cluster structure of objects described in a high-dimensional feature space. The Shannon entropy and the relative change in entropy during data transformation were used as the main criteria for assessing the quality of data preprocessing. The dimension of the feature space of the test objects was reduced by component analysis. A model of the clustering system based on the fuzzy C-means algorithm was constructed and used to evaluate clustering quality for different data preprocessing methods. It is shown that the best normalization method for the test data is decimal scaling: with it, the entropy of the processed signal reaches its minimum value, and the relative change in entropy during transformation by component analysis does not exceed the permissible limits.
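The two preprocessing criteria named in the abstract can be illustrated with a minimal Python sketch, assuming a single numeric feature at a time. It normalizes the feature by decimal scaling (dividing by the smallest power of 10 that brings every absolute value below 1), with min-max scaling included only for comparison, then estimates the Shannon entropy H = -Σ p_i log2 p_i from a histogram and computes the relative change in entropy between two processing stages. The estimator is an assumption: the abstract does not say how entropy was computed, so the sketch uses equal-width bins on a fixed interval [-1, 1] (the `n_bins`, `lo` and `hi` parameters are hypothetical), and the relative change is assumed to be |H_after - H_before| / H_before. Component analysis and the fuzzy C-means model are not reproduced here.

```python
import math
from collections import Counter


def decimal_scaling(values):
    """Divide every value by 10**j, where j is the smallest integer
    such that max(|x / 10**j|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def min_max(values):
    """Map values linearly onto [0, 1]; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def shannon_entropy(values, n_bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy in bits of a histogram with n_bins equal-width
    bins on the fixed interval [lo, hi] (assumed estimator)."""
    counts = Counter()
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} outside [{lo}, {hi}]")
        # The right edge (v == hi) is folded into the last bin.
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    # p * log2(1/p) summed over non-empty bins equals -sum(p * log2 p).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def relative_entropy_change(h_before, h_after):
    """Relative change |H_after - H_before| / H_before (assumed definition)."""
    return abs(h_after - h_before) / h_before


if __name__ == "__main__":
    feature = [150, -42, 7, 88, -120]  # hypothetical raw feature values
    for name, normalize in [("decimal scaling", decimal_scaling),
                            ("min-max", min_max)]:
        print(name, round(shannon_entropy(normalize(feature)), 3))
```

The fixed grid is a deliberate choice: if the bins were instead spread over each signal's own [min, max] range, any positive rescaling, decimal scaling and min-max included, would leave the bin counts and hence the entropy unchanged, so the comparison between normalization methods would tell nothing.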

Published

2014-11-04

How to Cite

Babichev, S. A. (2014). OPTIMIZATION OF INFORMATION PREPROCESSING IN CLUSTERING SYSTEMS OF HIGH DIMENSION DATA. Radio Electronics, Computer Science, Control, (2). https://doi.org/10.15588/1607-3274-2014-2-19

Issue

No. 2 (2014)

Section

Progressive information technologies