CREDIBILISTIC FUZZY CLUSTERING BASED ON ANALYSIS OF DATA DISTRIBUTION DENSITY AND THEIR PEAKS

Authors

  • Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • I. P. Pliss, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • O. V. Kalynychenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2022-3-6

Keywords:

fuzzy clustering, credibilistic clustering, density peak of dataset.

Abstract

Context. Clustering, that is, the unsupervised classification of data arrays, occupies an important place in Data Mining. Many approaches to this problem have been proposed; they differ in their a priori assumptions about the analyzed arrays and in the mathematical apparatus underlying each method. Clustering is complicated by the high dimensionality of the observation vectors and by distortions of various types in them.

Objective. The purpose of the work is to introduce a fuzzy clustering procedure that combines the advantages of methods based on the analysis of data distribution densities and their peaks, which are fast and can work effectively when classes overlap.

Method. A method for fuzzy clustering of data arrays is introduced, based on analyzing the distribution densities of the data and their peaks together with a credibilistic fuzzy approach. Its advantage is a reduction in the time spent solving the optimization problems of finding attractors of the density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array but by the number of density peaks in that array.
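The idea in the Method paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every choice in it is an assumption made for illustration: a Gaussian kernel with bandwidth h, peak selection by the product of local density and distance to the nearest denser point, mean-shift hill climbing as the "optimization block", and one commonly cited form of credibilistic membership. All function names are hypothetical. The point to notice is that the hill-climbing routine is called once per density peak (twice here), not once per observation (400 here).

```python
import numpy as np

def gaussian_density(data, h):
    """Unnormalized Gaussian kernel density at every data point (assumed kernel)."""
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * h * h)).sum(axis=1)

def density_peaks(data, h, n_peaks):
    """Return the n_peaks points that are dense and far from any denser point."""
    rho = gaussian_density(data, h)
    dist = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2))
    delta = np.empty(len(data))
    for i in range(len(data)):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets the max distance.
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return data[np.argsort(rho * delta)[-n_peaks:]]

def climb(start, data, h, steps=50):
    """Mean-shift hill climbing from one start point toward a density attractor."""
    x = start.copy()
    for _ in range(steps):
        w = np.exp(-((data - x) ** 2).sum(axis=1) / (2.0 * h * h))
        x = (w[:, None] * data).sum(axis=0) / w.sum()
    return x

def credibility(data, centers):
    """Credibilistic memberships (assumed form, not taken from the abstract)."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    mu = 1.0 / (1.0 + d2)                      # possibility-like membership
    mu = mu / mu.max(axis=1, keepdims=True)    # normalize so the best cluster gets 1
    cr = np.empty_like(mu)
    for j in range(mu.shape[1]):
        others = np.delete(mu, j, axis=1).max(axis=1)
        cr[:, j] = 0.5 * (mu[:, j] + 1.0 - others)
    return cr

# Synthetic data: two overlapping Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(200, 2)),
])
h = 0.7  # assumed bandwidth

peaks = density_peaks(data, h, n_peaks=2)
# The optimizer runs once per peak, not once per observation.
centers = np.array([climb(p, data, h) for p in peaks])
cr = credibility(data, centers)
labels = cr.argmax(axis=1)
```

In this sketch the cost of the hill-climbing stage scales with the number of peaks, while the credibility values in cr give each observation a graded membership in each cluster, which is what makes overlapping clusters tractable.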

Results. The method is simple to implement numerically and is not sensitive to the choice of optimization procedure. The experimental results confirm the effectiveness of the proposed approach in clustering problems where clusters overlap and allow us to recommend the method for practical use in the automatic clustering of large data volumes.

Conclusions. The method is simple to implement numerically and is not sensitive to the choice of optimization procedure. Its advantage is a reduction in the time spent solving the optimization problems of finding attractors of the density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array but by the number of density peaks in that array. The experimental results confirm the effectiveness of the proposed approach in clustering problems with overlapping clusters.

Author Biographies

Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

Dr. Sc., Professor at the Department of Artificial Intelligence

I. P. Pliss, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Leading Researcher at Control Systems Research Laboratory

A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Associate Professor at the Department of Informatics

O. V. Kalynychenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Associate Professor at the Department of Software Engineering

Published

2022-10-16

How to Cite

Bodyanskiy, Y. V., Pliss, I. P., Shafronenko, A. Y., & Kalynychenko, O. V. (2022). CREDIBILISTIC FUZZY CLUSTERING BASED ON ANALYSIS OF DATA DISTRIBUTION DENSITY AND THEIR PEAKS. Radio Electronics, Computer Science, Control, (3), 58. https://doi.org/10.15588/1607-3274-2022-3-6

Section

Neuroinformatics and intelligent systems