DATA CLUSTERING BASED ON INDUCTIVE LEARNING OF NEURO-FUZZY NETWORK WITH DISTANCE HASHING

Authors

  • S. A. Subbotin, National University “Zaporizhzhia Polytechnic”, Zaporizhzhia, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2022-4-6

Keywords:

cluster analysis, neuro-fuzzy network, hash, fuzzy inference, data analysis

Abstract

Context. Cluster analysis is widely used to analyze data of various natures and dimensionalities. However, the known methods of cluster analysis are slow and demand a lot of computer memory, because they must calculate pairwise distances between instances in a multidimensional feature space. In addition, when the number of features is large, the results of the known methods are difficult for humans to perceive and analyze.

Objective. The purpose of the work is to increase the speed of cluster analysis and the interpretability of the resulting partition into clusters, and to reduce the memory requirements of cluster analysis.

Method. A method for cluster analysis of multidimensional data is proposed. For each instance, the method calculates a hash based on the distance to a conditional center of coordinates and uses the one-dimensional coordinate along the hash axis to determine distances between instances. The resulting hash is treated as a pseudo-output feature and split into intervals, which are matched to pseudo-class labels, that is, clusters. Having obtained this rough crisp partition of the feature space and of the sample instances, the method automatically partitions the input features into fuzzy terms, determines the rules for assigning instances to clusters, and thereby forms a fuzzy inference system of the Mamdani-Zadeh classifier type. This system is then trained as a neuro-fuzzy network to reach acceptable values of the clustering quality functional. The method makes it possible to reduce the number of terms and features used, to evaluate their contribution to decisions about assigning instances to clusters, to increase the speed of cluster analysis, and to improve the interpretability of the resulting partition into clusters.
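The pipeline described in the Method paragraph can be illustrated with a minimal Python sketch. It is not the author's implementation: taking the per-feature minimum as the "conditional center of coordinates", cutting the hash axis at its widest gaps, building Gaussian fuzzy terms from per-cluster means and standard deviations, and evaluating rules with the min t-norm are all assumptions made for illustration, and the helper names (`distance_hash`, `split_hash_axis`, `build_terms`, `classify`) are hypothetical.

```python
import math
from statistics import mean, pstdev


def distance_hash(X, center=None):
    """Hash each instance to one number: its Euclidean distance to a reference point.

    ASSUMPTION: the 'conditional center of coordinates' is taken to be the
    per-feature minimum of the sample unless a center is passed explicitly.
    """
    if center is None:
        center = [min(col) for col in zip(*X)]
    return [math.dist(row, center) for row in X]


def split_hash_axis(h, n_clusters):
    """Turn the 1-D hash into pseudo-class (cluster) labels, n_clusters >= 1.

    ASSUMPTION: the hash axis is cut at the n_clusters - 1 widest gaps between
    consecutive sorted hash values; labels grow with the hash value.
    """
    order = sorted(range(len(h)), key=lambda i: h[i])
    widest = sorted(range(len(order) - 1),
                    key=lambda k: h[order[k + 1]] - h[order[k]],
                    reverse=True)[: n_clusters - 1]
    cuts = sorted(widest)
    labels = [0] * len(h)
    cluster = 0
    for pos, i in enumerate(order):
        labels[i] = cluster
        # Move to the next cluster right after a cut position.
        if cluster < len(cuts) and pos == cuts[cluster]:
            cluster += 1
    return labels


def build_terms(X, labels, eps=1e-6):
    """One Gaussian fuzzy term per (cluster, feature): center = mean, width = std.

    ASSUMPTION: terms are derived directly from the crisp partition; eps keeps
    widths positive for clusters whose feature values are constant.
    """
    terms = {}
    for k in sorted(set(labels)):
        members = [row for row, lab in zip(X, labels) if lab == k]
        terms[k] = [(mean(col), max(pstdev(col), eps)) for col in zip(*members)]
    return terms


def gaussian(x, c, s):
    """Gaussian membership degree of x in a term with center c and width s."""
    return math.exp(-((x - c) ** 2) / (2 * s * s))


def classify(x, terms):
    """Evaluate one rule per cluster:
    IF x_1 is T_k1 AND ... AND x_N is T_kN THEN cluster k.
    AND is the min t-norm; the cluster with the strongest rule wins.
    """
    strengths = {k: min(gaussian(v, c, s) for v, (c, s) in zip(x, ts))
                 for k, ts in terms.items()}
    return max(strengths, key=strengths.get), strengths


# Usage on a toy two-group sample.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = split_hash_axis(distance_hash(X), n_clusters=2)
terms = build_terms(X, labels)
print(labels)                          # [0, 0, 0, 1, 1, 1]
print(classify([5.1, 5.0], terms)[0])  # 1
```

In this sketch, distances between instances are read off a single sorted axis, so no n × n matrix of pairwise distances is stored. The neuro-fuzzy training step, which in the described method tunes the system against a clustering quality functional, is deliberately omitted; here the term centers and widths are simply taken from the crisp partition.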

Results. Mathematical support for solving the problem of cluster data analysis under conditions of high data dimensionality has been developed. Experiments have been carried out that confirm the operability of the developed mathematical support.

Conclusions. The developed method and its software implementation can be recommended for practical use in problems of analyzing data of various natures and dimensionalities.

Author Biography

S. A. Subbotin, National University “Zaporizhzhia Polytechnic”, Zaporizhzhia, Ukraine

Dr. Sc., Professor, Head of the Department of Software Tools

Published

2022-12-09

How to Cite

Subbotin, S. A. (2022). DATA CLUSTERING BASED ON INDUCTIVE LEARNING OF NEURO-FUZZY NETWORK WITH DISTANCE HASHING. Radio Electronics, Computer Science, Control, (4), 71. https://doi.org/10.15588/1607-3274-2022-4-6

Section

Neuroinformatics and intelligent systems