CLUSTERIZATION OF DATA ARRAYS BASED ON COMBINED OPTIMIZATION OF DISTRIBUTION DENSITY FUNCTIONS AND THE EVOLUTIONARY METHOD OF CAT SWARM
DOI: https://doi.org/10.15588/1607-3274-2022-4-5
Keywords: fuzzy clustering, density peak of dataset, evolutionary method
Abstract
Context. Clustering arrays of observations of arbitrary nature is an integral part of Data Mining and, more generally, of Data Science. A large number of approaches have been proposed for this task; they differ both in their a priori assumptions about the physical nature of the data and the problem and in their mathematical apparatus. From a computational point of view, the clustering problem becomes one of finding the local extrema of a multiextremal density function of a vector argument using gradient procedures that are launched repeatedly from different points of the initial data array. This search for extrema can be accelerated by using ideas from evolutionary optimization, which includes nature-inspired algorithms, swarm algorithms, population algorithms, etc.
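The computational view described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a Gaussian kernel with a hand-picked bandwidth `h`, uses a mean-shift-style update (gradient ascent on the density estimate with an adaptive step size), and merges end points with a hypothetical `merge_radius`. All function names and the toy data are illustrative.

```python
import math

def kernel_weights(x, data, h):
    """Gaussian kernel value between x and every observation."""
    return [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data]

def density(x, data, h):
    """Unnormalized kernel density estimate at x (values in (0, 1])."""
    return sum(kernel_weights(x, data, h)) / len(data)

def hill_climb(start, data, h, tol=1e-6, max_iter=200):
    """Climb the density surface from `start` with mean-shift updates."""
    x = list(start)
    for _ in range(max_iter):
        w = kernel_weights(x, data, h)
        total = sum(w)
        # The weighted mean of the observations is the next iterate.
        new_x = [sum(wi * p[k] for wi, p in zip(w, data)) / total
                 for k in range(len(x))]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x

def find_density_peaks(data, h, merge_radius):
    """Launch the ascent from every observation and merge nearby end points."""
    peaks = []
    for p in data:
        end = hill_climb(p, data, h)
        if all(math.dist(end, q) > merge_radius for q in peaks):
            peaks.append(end)
    return peaks

# Toy data (assumed for illustration): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
peaks = find_density_peaks(data, h=0.5, merge_radius=0.5)
```

Each observation triggers its own ascent, so the cost grows with the number of start points; this repeated launching is the expense that an evolutionary search is meant to reduce.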
Objective. The purpose of the work is to introduce a data clustering procedure based on the peaks of the data distribution density and the evolutionary cat swarm method. The procedure combines the main advantages of methods for working with data under overlapping classes and is characterized by high clustering quality, high speed, and accurate results.
Method. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary cat swarm method is proposed. Its advantage is a reduction in the time needed to solve the optimization problem when clusters overlap.
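As a rough illustration of how a cat swarm search could be pointed at a density function, the sketch below implements a simplified, generic cat swarm optimizer and uses it to maximize a Gaussian kernel density estimate. It should not be read as the method proposed in the paper. The assumptions are: cats start at randomly chosen observations, seeking mode keeps the fittest of a few additively perturbed copies (the original algorithm selects a copy probabilistically), tracing mode uses a clamped velocity toward the best cat, and every parameter name and default value is hypothetical.

```python
import math
import random

def density(x, data, h):
    """Unnormalized Gaussian kernel density estimate at x (values in (0, 1])."""
    return sum(math.exp(-math.dist(x, p) ** 2 / (2 * h * h))
               for p in data) / len(data)

def cat_swarm_maximize(fitness, starts, n_cats=10, n_iter=50,
                       mixture_ratio=0.3, n_copies=5, step=0.2,
                       c1=2.0, v_max=0.5, seed=0):
    """Simplified cat swarm search; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    dim = len(starts[0])
    # Assumption: cats are initialized at randomly chosen observations.
    cats = [list(rng.choice(starts)) for _ in range(n_cats)]
    vels = [[0.0] * dim for _ in range(n_cats)]
    best = max(cats, key=fitness)[:]
    best_fit = fitness(best)
    for _ in range(n_iter):
        for i in range(n_cats):
            if rng.random() < mixture_ratio:
                # Tracing mode: accelerate toward the best cat found so far.
                for k in range(dim):
                    v = vels[i][k] + rng.random() * c1 * (best[k] - cats[i][k])
                    vels[i][k] = max(-v_max, min(v_max, v))
                    cats[i][k] += vels[i][k]
            else:
                # Seeking mode: try perturbed copies, keep the fittest one.
                copies = [[xk + rng.uniform(-step, step) for xk in cats[i]]
                          for _ in range(n_copies)]
                copies.append(cats[i])
                cats[i] = list(max(copies, key=fitness))
            fit = fitness(cats[i])
            if fit > best_fit:
                best, best_fit = cats[i][:], fit
    return best, best_fit

# Toy data (assumed for illustration): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
h = 0.5
best, best_fit = cat_swarm_maximize(lambda x: density(x, data, h), data)
```

This sketch returns only the single best point found. Recovering every density peak, as clustering requires, would need an additional step such as running the search per region or keeping several local bests; that step is left out here.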
Results. The experimental results confirm the effectiveness of the proposed approach for clustering problems with overlapping classes and allow the proposed method to be recommended for practical use in the automatic clustering of big data.
Conclusions. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary cat swarm method has been proposed. Its advantage is a reduction in the time needed to solve the optimization problem when clusters overlap. The method is simple to implement numerically and is not sensitive to the choice of optimization procedure. The experimental results confirm the effectiveness of the proposed approach for clustering problems with overlapping clusters.
License
Copyright (c) 2022 Є. В. Бодянський, І. П. Плісс, А. Ю. Шафроненко
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.