CREDIBILISTIC ROBUST ONLINE FUZZY CLUSTERING IN DATA STREAM MINING TASKS
DOI:
https://doi.org/10.15588/1607-3274-2023-3-10
Keywords:
fuzzy clustering, distorted data, credibilistic fuzzy clustering, Data Stream Mining, robust function
Abstract
Context. Clustering of data arrays (classification without a teacher, i.e. unsupervised classification) occupies an important place in the general problem of Data Mining, and many approaches, methods, and algorithms currently exist for solving it. In many situations, the real data to be clustered are corrupted by anomalous outliers or by disturbances with non-Gaussian distributions. "Classical" methods of artificial intelligence (both batch and online) are ineffective in such situations. The goal of the paper is to develop a credibilistic robust online fuzzy clustering method that combines the advantages of the credibilistic and robust approaches in fuzzy clustering tasks.
Objective. The goal of the work is online credibilistic fuzzy clustering of distorted data using credibility theory in Data Stream Mining.
Method. A procedure for fuzzy clustering of data using the credibilistic approach is proposed. It is based on robust goal functions of a special type that are insensitive to outliers, and it is designed to work both in batch mode and in a recurrent online version intended for Data Stream Mining problems, where data are fed for processing sequentially in real time.
Results. An analysis of the overall clustering accuracy of the compared methods and algorithms shows that the proposed method gives results similar to those of the credibilistic fuzzy clustering method but is faster, regardless of the number of observations fed to the clustering process.
Conclusions. The problem of fuzzy clustering of data streams contaminated by anomalous outliers with non-Gaussian distributions is considered. A recurrent credibilistic online algorithm based on an objective function of a special form is introduced; it suppresses these outliers by means of the hyperbolic tangent function, which is used not only in neural networks but also in robust estimation tasks. The proposed algorithm is quite simple to implement numerically and generalizes some well-known online fuzzy clustering procedures intended for solving Data Stream Mining problems.
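The abstract does not give the recurrences themselves, so the following is only a rough sketch of how a recurrent update with hyperbolic-tangent suppression of outliers might look. The membership function `1 / (1 + d^2)`, the credibility formula (borrowed from the general credibilistic clustering literature), the step size `step`, the fuzzifier `beta`, and both function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def credibility_levels(x: Sequence[float], centers: List[List[float]]) -> List[float]:
    """Credibility of x belonging to each cluster (assumed form, needs >= 2 clusters)."""
    # Assumed possibility-type membership that decays with squared distance.
    raw = [1.0 / (1.0 + sum((xi - ci) ** 2 for xi, ci in zip(x, c))) for c in centers]
    top = max(raw)
    mu = [r / top for r in raw]  # normalise so the closest cluster has mu = 1
    cred = []
    for j, m in enumerate(mu):
        # Strongest membership among all the other clusters.
        rival = max(v for l, v in enumerate(mu) if l != j)
        cred.append(0.5 * (m + 1.0 - rival))
    return cred


def online_step(x: Sequence[float], centers: List[List[float]],
                step: float = 0.1, beta: float = 2.0) -> List[float]:
    """Process one observation and update the centroids in place."""
    cred = credibility_levels(x, centers)
    for c, cr in zip(centers, cred):
        for i, xi in enumerate(x):
            # tanh saturates at +/-1, so a single outlier can shift a
            # centroid coordinate by at most `step`.
            c[i] += step * (cr ** beta) * math.tanh(xi - c[i])
    return cred


if __name__ == "__main__":
    centers = [[0.0, 0.0], [5.0, 5.0]]
    stream = [[0.2, -0.1], [5.1, 4.8], [1000.0, 1000.0], [0.1, 0.3]]
    for obs in stream:
        online_step(obs, centers)
    print(centers)
```

Because the correction term is bounded by `step`, the outlier `[1000.0, 1000.0]` in the toy stream moves each centroid only slightly, whereas an unbounded update proportional to `x - c` would drag the centroids far from the data.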
License
Copyright (c) 2023 А. Ю. Шафроненко, Н. B. Касаткіна, Є. В. Бодянський, Є. О. Шафроненко
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with the Creative Commons license CC BY-SA.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.