A COMPARATIVE STUDY OF CLUSTER VALIDITY INDICES
DOI: https://doi.org/10.15588/1607-3274-2019-4-6
Keywords: Cluster validity indices, cluster, clustering.
Abstract
Context. Cluster analysis is a method of unsupervised classification, that is, classification performed without prior information on the number of clusters. Therefore, determining the optimal number of clusters and evaluating the results of partitioning data sets is a complex task that requires further research.
Objective. The aim of the paper is to study how effectively crisp and fuzzy cluster validity indices find the natural structure of data when the partition is produced by the clustering method based on fuzzy binary relations, and to carry out a comparative analysis of these indices.
Method. The data sets were partitioned with the method based on fuzzy binary relations, which allows crisp and fuzzy grouping of objects to be carried out simultaneously using different types of similarity measures. The research uses the distance similarity measure, which divides data into ellipsoidal clusters. Two synthetic two-dimensional Gaussian data sets of a special type were generated, each of which admits two natural clusterings. The most effective and frequently used groups of crisp and fuzzy cluster validity indices for finding the optimal structure of a data set are described.
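The abstract does not name the six indices it uses. As an illustration of the two index families it describes, the sketch below implements one widely used crisp index, the Dunn index, and one fuzzy index, the Xie–Beni index, in plain Python. It is not the paper's implementation: the function names, the fuzzifier m = 2 and the toy data are assumptions, and the partitions are given by hand instead of being produced by the fuzzy binary relation method.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Point = Sequence[float]


def dunn_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Crisp Dunn index: the smallest distance between points of different
    clusters divided by the largest cluster diameter (larger is better)."""
    clusters: Dict[int, List[Point]] = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    # Largest diameter: maximum pairwise distance inside any one cluster.
    max_diameter = max(
        (math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    # Smallest separation: single-linkage distance between any two clusters.
    min_separation = min(
        math.dist(a, b)
        for g1, g2 in combinations(groups, 2)
        for a in g1
        for b in g2
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


def xie_beni_index(
    points: Sequence[Point],
    memberships: Sequence[Sequence[float]],
    centers: Sequence[Point],
    m: float = 2.0,
) -> float:
    """Fuzzy Xie-Beni index (smaller is better).

    memberships[i][k] is the degree to which point k belongs to cluster i;
    m is the fuzzifier (m = 2 is assumed here)."""
    n = len(points)
    # Fuzzy within-cluster compactness: sum of u_ik^m * ||x_k - v_i||^2.
    compactness = sum(
        memberships[i][k] ** m * math.dist(points[k], centers[i]) ** 2
        for i in range(len(centers))
        for k in range(n)
    )
    # Separation: smallest squared distance between two cluster centres.
    separation = min(math.dist(v1, v2) ** 2 for v1, v2 in combinations(centers, 2))
    return compactness / (n * separation)


# Hypothetical toy data: two unit squares far apart on the x axis.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]        # the natural split
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]    # a split that mixes the two squares
memberships = [[1, 1, 1, 1, 0, 0, 0, 0],  # crisp memberships for the natural split
               [0, 0, 0, 0, 1, 1, 1, 1]]
centers = [(0.5, 0.5), (10.5, 0.5)]

print(f"Dunn, natural split: {dunn_index(points, labels):.3f}")      # 6.364
print(f"Dunn, mixed split:   {dunn_index(points, bad_labels):.3f}")  # 0.091
print(f"Xie-Beni, natural:   {xie_beni_index(points, memberships, centers):.3f}")  # 0.005
```

A large Dunn value and a small Xie–Beni value both indicate compact, well-separated clusters, so the two indices have to be optimized in opposite directions when they are compared.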
Results. The quality of clustering by the fuzzy binary relation method was estimated with six indices on two data sets. The validity indices were then compared on how effectively they determine the cluster and sub-cluster structures of the data.
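A minimal sketch of this kind of comparison follows. It assumes hand-labelled partitions of a hypothetical toy data set in place of the paper's Gaussian sets and its fuzzy binary relation method, and it uses Rousseeuw's silhouette index only as one example of a crisp validity index; the helper names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette width of a crisp partition (larger is better)."""
    clusters: Dict[int, List[int]] = {}
    for idx, c in enumerate(labels):
        clusters.setdefault(c, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i: int, members: List[int]) -> float:
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, c in enumerate(labels):
        own = [j for j in clusters[c] if j != i]
        if not own:  # singleton cluster: silhouette is set to 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within its own cluster
        # separation: mean distance to the nearest other cluster
        b = min(mean_dist(i, members) for other, members in clusters.items() if other != c)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def unit_square(x0: float, y0: float) -> List[Tuple[float, float]]:
    """Corners of a unit square with lower-left corner (x0, y0)."""
    return [(x0, y0), (x0, y0 + 1), (x0 + 1, y0), (x0 + 1, y0 + 1)]


# Hypothetical toy data: sub-clusters A, B, C, D; A and B form one cluster, C and D another.
points = unit_square(0, 0) + unit_square(4, 0) + unit_square(20, 0) + unit_square(24, 0)
partitions = {
    "cluster level, k=2 ({A,B}, {C,D})": [0] * 8 + [1] * 8,
    "sub-cluster level, k=4": [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4,
    "mixed, k=2 ({A,C}, {B,D})": [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4,
}
for name, labels in partitions.items():
    print(f"{name}: {silhouette(points, labels):.3f}")
```

On this data the cluster-level partition gets the highest silhouette, the sub-cluster partition still scores clearly positive, and the mixed partition scores far lower, so one index can rank the coarser structure first while still rating the finer one as reasonable.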
Conclusions. In practice, for some cluster validity indices it is important to find not only the global extremum but also local extrema, which can reveal an optimal sub-cluster structure whose sub-clusters are less separated. To estimate clustering quality effectively and obtain objective results, it is advisable to consider several indices rather than only one. Future studies are expected to create a combined criterion that joins the most effective cluster validity indices for the method based on fuzzy binary relations with a distance similarity measure, to implement a generalized cluster validity index for any similarity measure of the fuzzy binary relation method, and to develop a software system that automatically groups objects into clusters by concentric spheres, cones and ellipses without preliminary determination of the clustering threshold.
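The two recommendations above, looking at local as well as global extrema and consulting several indices, can be sketched as follows, assuming each index has already been computed for the same range of cluster counts k. The helper names `extrema` and `rank_average`, the rank-averaging rule and the curve values are all hypothetical: the values are invented only to exercise the code and are not results from the paper, and rank averaging is just one simple way to combine indices, not the combined criterion the author plans to develop.

```python
from typing import Dict, List, Tuple

Curve = Dict[int, float]  # cluster count k -> index value


def extrema(curve: Curve, larger_is_better: bool = True) -> Tuple[int, List[int]]:
    """Return the globally optimal k and any other interior local optima of an index curve."""
    sign = 1.0 if larger_is_better else -1.0
    ks = sorted(curve)
    score = {k: sign * curve[k] for k in ks}
    best = max(ks, key=score.__getitem__)
    # Interior points that beat both neighbours on the ordered k grid.
    local = [
        ks[i]
        for i in range(1, len(ks) - 1)
        if score[ks[i]] > score[ks[i - 1]] and score[ks[i]] > score[ks[i + 1]]
    ]
    return best, [k for k in local if k != best]


def rank_average(curves: Dict[str, Curve], larger_is_better: Dict[str, bool]) -> Dict[int, float]:
    """Hypothetical combined criterion: mean rank of each k over several indices (1 = best).
    All curves are assumed to cover the same k values."""
    ks = sorted(next(iter(curves.values())))
    totals = {k: 0.0 for k in ks}
    for name, curve in curves.items():
        ordered = sorted(ks, key=curve.__getitem__, reverse=larger_is_better[name])
        for rank, k in enumerate(ordered, start=1):
            totals[k] += rank
    return {k: totals[k] / len(curves) for k in ks}


# Invented index values for k = 2..6, used only to exercise the functions.
dunn = {2: 1.9, 3: 0.6, 4: 1.2, 5: 0.4, 6: 0.5}          # larger is better
xie_beni = {2: 0.05, 3: 0.30, 4: 0.09, 5: 0.40, 6: 0.35}  # smaller is better

print(extrema(dunn))                              # (2, [4])
print(extrema(xie_beni, larger_is_better=False))  # (2, [4])
combined = rank_average({"dunn": dunn, "xie_beni": xie_beni}, {"dunn": True, "xie_beni": False})
print(min(combined, key=combined.__getitem__))    # 2
```

In this invented example both curves have their global optimum at k = 2 and a local optimum at k = 4, the kind of pattern that would point to a less separated sub-cluster structure beneath the main clusters.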
License
Copyright (c) 2020 N. E. Kondruk
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.