CLUSTERIZATION OF DATA ARRAYS BASED ON COMBINED OPTIMIZATION OF DISTRIBUTION DENSITY FUNCTIONS AND THE EVOLUTIONARY METHOD OF CAT SWARM

Authors

  • Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • I. P. Pliss, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2022-4-5

Keywords:

fuzzy clustering, density peak of dataset, evolutionary method

Abstract

Context. Clustering arrays of observations of an arbitrary nature is an integral part of Data Mining and, more generally, of Data Science. A huge number of approaches have been proposed for solving it; they differ both in their a priori assumptions about the physical nature of the data and the problem and in the mathematical apparatus they use. From a computational point of view, the clustering problem becomes a problem of finding the local extrema of a multiextremal density function of a vector argument, using gradient procedures that are launched repeatedly from different points of the initial data array. This search can be sped up with ideas from evolutionary optimization, which includes nature-inspired, swarm, and population algorithms, among others.
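The search for density peaks described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a Gaussian kernel density estimate with a hypothetical bandwidth `h`, uses the mean-shift fixed-point update as the gradient-type climb, and merges end points closer than a hypothetical `merge_radius` into one peak.

```python
import math

def gaussian_density(x, data, h):
    """Gaussian kernel density estimate at point x (bandwidth h is an assumption)."""
    d = len(x)
    norm = len(data) * (h * math.sqrt(2 * math.pi)) ** d
    total = 0.0
    for p in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq / (2 * h * h))
    return total / norm

def climb(start, data, h, tol=1e-6, max_iter=200):
    """Hill climbing on the density via the mean-shift update
    (a gradient ascent step with an automatically chosen step size)."""
    x = list(start)
    for _ in range(max_iter):
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h * h))
            for p in data
        ]
        s = sum(weights)
        new_x = [sum(w * p[k] for w, p in zip(weights, data)) / s
                 for k in range(len(x))]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x

def density_clusters(data, h, merge_radius):
    """Launch the climb from every observation; observations whose climbs
    end near the same local maximum receive the same cluster label."""
    peaks, labels = [], []
    for p in data:
        top = climb(p, data, h)
        for i, q in enumerate(peaks):
            if math.dist(top, q) < merge_radius:
                labels.append(i)
                break
        else:
            # No existing peak is close enough: register a new one.
            peaks.append(top)
            labels.append(len(peaks) - 1)
    return peaks, labels
```

Calling `density_clusters(data, h, merge_radius)` on a list of coordinate tuples returns the detected peaks and one label per observation. Because every observation starts its own climb, the cost grows quadratically with the array size, which is the expense that an evolutionary search is meant to reduce.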

Objective. The purpose of the work is to introduce a data clustering procedure based on the peaks of the data distribution density and the evolutionary cat swarm method. The procedure combines the main advantages of methods for working with data under overlapping classes and is characterized by high clustering quality, high speed, and accurate results.

Method. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary cat swarm method is proposed. Its advantage is a reduction in the time needed to solve the optimization problems when clusters overlap.
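The abstract does not give the details of the combined procedure, so the following is only a simplified sketch of generic cat swarm optimization, written to maximize an arbitrary fitness function such as a density estimate. The function name `cat_swarm_maximize` and all parameter names and defaults are assumptions. Two simplifications are made relative to the usual formulation: seeking mode picks the best candidate greedily instead of by fitness-proportional selection, and tracing mode clips positions to the search box instead of limiting velocity.

```python
import random

def cat_swarm_maximize(fitness, bounds, n_cats=20, n_iter=50,
                       mixture_ratio=0.2, smp=5, srd=0.2, c=2.0, seed=0):
    """Simplified cat swarm optimization maximizing `fitness` inside `bounds`.

    bounds        -- list of (low, high) pairs, one per dimension
    mixture_ratio -- probability that a cat is in tracing mode on a given step
    smp           -- number of perturbed copies tried in seeking mode
    srd           -- perturbation size as a fraction of each dimension's range
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    cats = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_cats)]
    vels = [[0.0] * dim for _ in range(n_cats)]
    best = max(cats, key=fitness)[:]
    best_fit = fitness(best)
    history = [best_fit]  # best fitness after initialization and each iteration

    for _ in range(n_iter):
        for i in range(n_cats):
            if rng.random() < mixture_ratio:
                # Tracing mode: accelerate toward the best position found so far.
                vels[i] = [v + rng.random() * c * (b - x)
                           for v, b, x in zip(vels[i], best, cats[i])]
                cats[i] = clip([x + v for x, v in zip(cats[i], vels[i])])
            else:
                # Seeking mode: try local perturbations and keep the best one;
                # the current position is itself one of the candidates.
                candidates = [cats[i]]
                for _ in range(smp):
                    candidates.append(clip([
                        x + rng.uniform(-srd, srd) * (hi - lo)
                        for x, (lo, hi) in zip(cats[i], bounds)
                    ]))
                cats[i] = max(candidates, key=fitness)
            f = fitness(cats[i])
            if f > best_fit:
                best, best_fit = cats[i][:], f
        history.append(best_fit)
    return best, best_fit, history
```

With a density estimate as `fitness`, the swarm explores the whole search box at once rather than restarting a gradient climb from each observation. A single run returns only one best position, so locating several peaks would need repeated runs or a niching scheme, neither of which is shown here.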

Results. The experimental results confirm the effectiveness of the proposed approach in clustering problems with overlapping classes and allow the proposed method to be recommended for practical use in the automatic clustering of big data.

Conclusions. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary cat swarm method is proposed. Its advantage is a reduction in the time needed to solve the optimization problems when clusters overlap. The method is quite simple to implement numerically and is not sensitive to the choice of optimization procedure. The experimental results confirm its effectiveness in clustering problems with overlapping clusters.

Author Biographies

Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

Dr. Sc., Professor at the Department of Artificial Intelligence

I. P. Pliss, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Leading Researcher at Control Systems Research Laboratory

A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Associate Professor at the Department of Informatics

Published

2022-12-05

How to Cite

Bodyanskiy, Y. V., Pliss, I. P., & Shafronenko, A. Y. (2022). CLUSTERIZATION OF DATA ARRAYS BASED ON COMBINED OPTIMIZATION OF DISTRIBUTION DENSITY FUNCTIONS AND THE EVOLUTIONARY METHOD OF CAT SWARM. Radio Electronics, Computer Science, Control, (4), 61. https://doi.org/10.15588/1607-3274-2022-4-5

Issue

Section

Neuroinformatics and intelligent systems