DOI: https://doi.org/10.15588/1607-3274-2019-4-6

A COMPARATIVE STUDY OF CLUSTER VALIDITY INDICES

N. E. Kondruk

Abstract


Context. Cluster analysis is a form of unsupervised classification, that is, classification performed when the number of clusters is not known in advance. Determining the optimal number of clusters and validating the resulting partition of a data set is therefore a complex task that requires further research.
Objective. The aim of the paper is to study how effectively crisp and fuzzy cluster validity indices recover the natural structure of data when the partition is produced by the clustering method based on fuzzy binary relations, and to compare these indices.
Method. Data sets are partitioned with the method based on fuzzy binary relations, which makes it possible to perform crisp and fuzzy grouping of objects simultaneously under different types of similarity measures. The study uses the distance similarity measure, which divides data into ellipsoidal clusters. Two synthetic two-dimensional Gaussian data sets of a special type are generated, each of which admits two natural clusterings. The most effective and frequently used groups of crisp and fuzzy cluster validity indices for finding the optimal structure of a data set are described.
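The setup in the Method paragraph can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the data layout (two well-separated pairs of Gaussian blobs), the function names `make_nested_gaussians` and `dunn_index`, and the choice of the Dunn index as a representative crisp validity index. The abstract does not name the six indices used, and the labels here come from the generator rather than from the fuzzy binary relations method.

```python
import math
import random


def make_nested_gaussians(points_per_blob=50, sigma=0.3, seed=0):
    """Four 2-D Gaussian blobs forming two well-separated pairs (illustrative layout)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (3.0, 0.0), (20.0, 0.0), (23.0, 0.0)]
    points, coarse, fine = [], [], []
    for blob, (cx, cy) in enumerate(centers):
        for _ in range(points_per_blob):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            fine.append(blob)         # sub-cluster labels: 4 groups
            coarse.append(blob // 2)  # cluster labels: 2 groups
    return points, coarse, fine


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    if len(set(labels)) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = 0.0
    min_separation = math.inf
    # Brute-force scan over all pairs; adequate for small data sets.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_diameter = max(max_diameter, d)
            else:
                min_separation = min(min_separation, d)
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; the diameter is zero")
    return min_separation / max_diameter


points, coarse, fine = make_nested_gaussians()
print("Dunn index, 2 clusters:    ", dunn_index(points, coarse))
print("Dunn index, 4 sub-clusters:", dunn_index(points, fine))
```

Larger values of the Dunn index indicate more compact, better separated clusters, so on this layout the two-cluster partition scores higher than the four-sub-cluster one, even though both are natural groupings of the same data.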
Results. The quality of clustering produced by the method of fuzzy binary relations was evaluated with six indices on two data sets. The validity indices were compared by how effectively they identify the cluster and sub-cluster structures of the data.
Conclusions. In practice, for some cluster validity indices it is important to find not only the global extremum but also the local ones, since local extrema can indicate an optimal sub-cluster data structure with weaker separation. To estimate the quality of clustering effectively and to obtain objective results, several indices should be taken into account rather than a single one. Future work is expected to include: creating a combined criterion that joins the most effective cluster validity indices for the method based on fuzzy binary relations with a distance similarity measure; implementing a generalized cluster validity index for any similarity measure of the fuzzy binary relations method; and developing a software system that automatically groups objects into clusters by concentric spheres, cones, and ellipses without preliminary determination of the clustering threshold.
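The point about local extrema can be sketched as follows. This is an illustration under assumptions, not the procedure from the paper: the function name `index_maxima` is hypothetical, the index is assumed to be one where larger values are better, and the scores in the usage example are made up for demonstration only.

```python
def index_maxima(scores):
    """Return the best cluster count and the other local maxima of a validity index.

    scores: dict mapping a number of clusters to the index value (larger is better).
    """
    ks = sorted(scores)
    local = []
    for i, k in enumerate(ks):
        # Treat the ends of the scanned range as bounded by minus infinity.
        left = scores[ks[i - 1]] if i > 0 else float("-inf")
        right = scores[ks[i + 1]] if i < len(ks) - 1 else float("-inf")
        if scores[k] > left and scores[k] > right:
            local.append(k)
    best = max(ks, key=lambda k: scores[k])
    return best, [k for k in local if k != best]


# Made-up index values for 2..6 clusters, for illustration only.
example_scores = {2: 0.9, 3: 0.4, 4: 0.7, 5: 0.3, 6: 0.2}
best, secondary = index_maxima(example_scores)
print("global maximum at k =", best)
print("other local maxima at k =", secondary)
```

For an index that is minimized rather than maximized, the same function can be applied to the negated values. Running it for several indices and comparing the reported cluster counts is one simple way to take more than one index into account.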

Keywords


Cluster validity indices, cluster, clustering.


References


Kondruk N. E. Decision Support System for automated diets, Management of Development of Complex Systems, 2015, Issue 23 (1), pp. 110–114.

Kondruk N. Clustering method based on fuzzy binary relation, Eastern-European Journal of Enterprise Technologies, 2017, Vol. 2, No. 4 (86), pp. 10–16. DOI: 10.15587/1729-4061.2017.94961

Ghosh A., De Rajat K. Identification of certain cancer-mediating genes using Gaussian fuzzy cluster validity index, Journal of biosciences, 2015, Vol. 40, No. 4, pp. 741–754. DOI: 10.1007/s12038-015-9557-x

Vendramin L., Campello R. J. G. B., Hruschka E. R. Relative clustering validity criteria: A comparative overview, Statistical analysis and data mining: the ASA data science journal, 2010, Vol. 3, No. 4, pp. 209–235. DOI: 10.1002/sam.10080

Meroufel H., Mahi H., Farhi N. Comparative Study between Validity Indices to Obtain the Optimal Cluster, International Journal of Computer Electrical Engineering, 2017, Vol. 9, No. 1, pp. 1–8. DOI: 10.17706/IJCEE.2017.9.1.343-350

Tomasini C. A Study on the Relationship between Internal and External Validity Indices Applied to Partitioning and Density-based Clustering Algorithms, ICEIS (1), 2017, pp. 89–98. DOI: 10.5220/0006317000890098

Rousseeuw P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of computational and applied mathematics, 1987, Vol. 20, pp. 53–65. DOI: 10.1016/0377-0427(87)90125-7

Luna-Romera J. M., García-Gutiérrez J., Martínez-Ballesteros M., Santos J. An approach to validity indices for clustering techniques in Big Data, Progress in Artificial Intelligence, 2018, Vol. 7, No. 2, pp. 1–14. DOI: 10.1007/s13748-017-0135-3

Sivogolovko E. V. Methods for assessing the quality of clear clustering, Computer tools in education, 2011, No. 4 (96), pp. 14–31.

Dunn J. C. Well-separated clusters and optimal fuzzy partitions, Journal of cybernetics, 1974, Vol. 4, No. 1, pp. 95–104. DOI: 10.1080/01969727408546059

Estiri H., Omran B. A., Murphy S. N. Kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning, Big data research, 2018, Vol. 13, pp. 38–51. DOI: 10.1016/j.bdr.2018.05.003

Zhao Q., Hautamaki V., Fränti P. Knee point detection in BIC for detecting the number of clusters, International conference on advanced concepts for intelligent vision systems: ACIVS 2008, LNCS 5259, 2008, pp. 664–673. DOI: 10.1007/978-3-540-88458-3_60

Fraley C., Raftery A. E. How many clusters? Which clustering method? Answers via model-based cluster analysis, The computer journal, 1998, Vol. 41, No. 8, pp. 578–588. DOI: 10.1093/comjnl/41.8.578

Gamarra D. F. T. Fuzzy image segmentation using validity indexes correlation, International Journal of Computer Science and Information Technology, 2015, Vol. 7, No. 3, pp. 15–26.

Zhou K., Ding S., Fu C., Yang S. L. Comparison and weighted summation type of fuzzy cluster validity indices, International Journal of Computers Communications & Control, 2014, Vol. 9, No. 3, pp. 370–378. DOI: 10.15837/ijccc.2014.3.237

Xie X. L., Beni G. A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis & Machine Intelligence, 1991, Vol. 13, No. 8, pp. 841–847. DOI: 10.1109/34.85677

Bensaid A. M., Hall L. O., Bezdek J. C. Validity-guided (re)clustering with applications to image segmentation, IEEE Transactions on Fuzzy Systems, 1996, Vol. 4, No. 2, pp. 112–123. DOI: 10.1109/91.493905

Kondruk N. E. Some methods of automatic grouping of objects, Eastern-European Journal of Enterprise Technologies, 2014, Vol. 2, No. 4 (68), pp. 20–24.

Kondruk N. Use of length-based similarity measure in clustering problems, Radio Electronics, Computer Science, Control, 2018, No. 3 (46), pp. 98–105. DOI: 10.15588/1607-3274-2018-3-11

Teklehaymanot F. K., Muma M., Zoubir A. M. Novel Bayesian cluster enumeration criterion for cluster analysis with finite sample penalty term, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 15–20 April 2018, Calgary, ICASSP, 2018, pp. 4274–4278. DOI: 10.1109/ICASSP.2018.8462172

Ren M., Peiyu L., Zhihao W., Jing Y. A self-adaptive fuzzy c-means algorithm for determining the optimal number of clusters, Computational intelligence and neuroscience, 2016, Vol. 2016, pp. 1–12.








Copyright (c) 2020 N. E. Kondruk

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
National University "Zaporizhzhia Polytechnic", 
Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.