DATA CLUSTERING BASED ON INDUCTIVE LEARNING OF NEURO-FUZZY NETWORK WITH DISTANCE HASHING
DOI: https://doi.org/10.15588/1607-3274-2022-4-6
Keywords: cluster analysis, neuro-fuzzy network, hash, fuzzy inference, data analysis
Abstract
Context. Cluster analysis is widely used to analyze data of various nature and dimensions. However, known cluster analysis methods are slow and demanding on computer memory because they need to calculate pairwise distances between instances in a multidimensional feature space. In addition, when the number of features is large, the results of known methods are difficult for humans to perceive and analyze.
Objective. The purpose of the work is to increase the speed of cluster analysis and the interpretability of the resulting partition into clusters, and to reduce the computer memory requirements of cluster analysis.
Method. A method for cluster analysis of multidimensional data is proposed. For each instance, it calculates a hash based on the distance to the conditional center of coordinates and uses the one-dimensional coordinate along the hash axis to determine the distances between instances. The resulting hash is treated as a pseudo-output feature and is broken into intervals, which are matched with the labels of pseudo-classes (clusters), giving a rough crisp partition of the feature space and of the sample instances. The method then automatically generates a partition of the input features into fuzzy terms, determines the rules for assigning instances to clusters and, as a result, forms a fuzzy inference system of the Mamdani-Zadeh classifier type, which is further trained in the form of a neuro-fuzzy network to ensure acceptable values of the clustering quality functional. This makes it possible to reduce the number of terms and features used, to evaluate their contribution to decisions about assigning instances to clusters, to increase the speed of cluster analysis, and to increase the interpretability of the resulting partition of the data into clusters.
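The pipeline described above can be illustrated with a minimal sketch; it is not the author's implementation. Several choices are assumptions made only for illustration: the per-feature minimum stands in for the conditional center of coordinates, the hash axis is split into equal-width intervals, each fuzzy term is a Gaussian fitted to one cluster, and the final neuro-fuzzy training step is omitted. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def conditional_center(data: Sequence[Vector]) -> List[float]:
    # Assumption: use the per-feature minimum as the reference point.
    return [min(column) for column in zip(*data)]


def distance_hash(x: Vector, center: Vector) -> float:
    # One-dimensional hash: Euclidean distance from the instance to the center.
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def crisp_partition(hashes: Sequence[float], n_intervals: int) -> List[int]:
    # Assumption: equal-width intervals on the hash axis; the interval index
    # serves as the pseudo-class (cluster) label.
    lo, hi = min(hashes), max(hashes)
    width = (hi - lo) / n_intervals
    if width == 0.0:
        return [0] * len(hashes)
    return [min(int((h - lo) / width), n_intervals - 1) for h in hashes]


def fit_terms(data: Sequence[Vector],
              labels: Sequence[int]) -> Dict[int, List[Tuple[float, float]]]:
    # Assumption: one Gaussian term (mean, spread) per feature and cluster.
    terms: Dict[int, List[Tuple[float, float]]] = {}
    for label in sorted(set(labels)):
        members = [x for x, l in zip(data, labels) if l == label]
        params = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            params.append((mean, max(math.sqrt(var), 1e-3)))  # floor the spread
        terms[label] = params
    return terms


def assign_cluster(x: Vector, terms: Dict[int, List[Tuple[float, float]]]) -> int:
    # One rule per cluster: firing strength is the minimum membership over the
    # features; the instance goes to the rule with the highest strength.
    def firing(params: List[Tuple[float, float]]) -> float:
        return min(math.exp(-((xi - m) ** 2) / (2.0 * s * s))
                   for xi, (m, s) in zip(x, params))
    return max(terms, key=lambda label: firing(terms[label]))


# Toy data: two well-separated groups in a two-dimensional feature space.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
center = conditional_center(data)
hashes = [distance_hash(x, center) for x in data]
labels = crisp_partition(hashes, n_intervals=2)
terms = fit_terms(data, labels)
```

On this toy sample the two groups fall into different hash intervals, and only one distance per instance is computed rather than all pairwise distances. A single distance to one reference point can also map distant instances to similar hash values, so the crisp partition is only a rough starting point that the subsequent training is meant to refine.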
Results. Mathematical support for solving the problem of cluster analysis of data under conditions of large data dimensions has been developed. Experiments confirming the operability of the developed mathematical support have been carried out.
Conclusions. The developed method and its software implementation can be recommended for practical use in problems of analyzing data of various nature and dimensions.
License
Copyright (c) 2022 S. A. Subbotin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.