IMPLEMENTATION OF DBSCAN CLUSTERING ALGORITHM WITHIN THE FRAMEWORK OF THE OBJECTIVE CLUSTERING INDUCTIVE TECHNOLOGY BASED ON R AND KNIME TOOLS
DOI:
https://doi.org/10.15588/1607-3274-2019-1-8
Keywords:
Objective clustering, clustering quality criteria, inductive modelling, DBSCAN clustering algorithm
Abstract
Context. The problem of data clustering within the framework of the objective clustering inductive technology is considered. The resulting hybrid model is implemented in practice through the combined use of R and KNIME tools. The object of the study is a hybrid data clustering model that combines the DBSCAN clustering algorithm with the objective clustering inductive technology.
Objective. The aim of the work is to create a hybrid objective clustering model based on the DBSCAN clustering algorithm and to implement it in practice using R and KNIME tools together.
Method. Inductive methods of complex systems modelling are used as the basis for determining the optimal parameters of the DBSCAN clustering algorithm within the framework of the objective clustering inductive technology. The practical implementation of this technology involves: the use of two equal-power subsets, which contain the same number of pairwise similar objects; calculation of the internal and external clustering quality criteria; and calculation of the complex balance criterion, whose maximum value corresponds to the best clustering in terms of the criteria used. The process has two main stages. First, the optimal value of the EPS parameter is determined for each value of minPts within its range of variation; this stage yields a chart of the complex balance criterion versus EPS for each minPts value. Then the intermediate results are analysed to determine the optimal solution, which corresponds both to the maximum value of the complex balance criterion and to the aims of the current clustering task.
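The two-stage search described in the Method paragraph can be sketched roughly as follows. This is only an illustration in Python, not the R/KNIME implementation of the paper. The abstract does not give the formulas of the criteria, so `internal_quality`, the agreement term used as an external criterion, the geometric-mean `balance`, and the alternate-split of the sorted data into two equal-power subsets are all hypothetical stand-ins chosen for brevity.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # only core points expand the cluster
                queue.extend(nb_j)
    return labels

def internal_quality(points, labels):
    """Hypothetical internal criterion: separation / (separation + compactness)."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    members = list(clusters.values())
    centroids = [tuple(sum(c) / len(m) for c in zip(*m)) for m in members]
    within = sum(math.dist(p, c) for m, c in zip(members, centroids) for p in m)
    within /= sum(len(m) for m in members)
    between = min(math.dist(a, b)
                  for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return between / (between + within)

def balance(points_a, points_b, eps, min_pts):
    """Hypothetical balance criterion: geometric mean of both internal scores
    and their agreement (a stand-in for the external criterion)."""
    q_a = internal_quality(points_a, dbscan(points_a, eps, min_pts))
    q_b = internal_quality(points_b, dbscan(points_b, eps, min_pts))
    if max(q_a, q_b) == 0.0:
        return 0.0
    agreement = 1.0 - abs(q_a - q_b) / max(q_a, q_b)
    return (q_a * q_b * agreement) ** (1.0 / 3.0)

def search(points, eps_grid, min_pts_grid):
    # Assumed split: alternate sorted points so the subsets hold similar objects.
    ordered = sorted(points)
    a, b = ordered[0::2], ordered[1::2]
    # Stage 1: for each minPts, scan EPS and keep (best balance, best eps).
    stage1 = {m: max((balance(a, b, eps, m), eps) for eps in eps_grid)
              for m in min_pts_grid}
    # Stage 2: pick the minPts whose best balance is highest.
    best_min_pts = max(stage1, key=lambda m: stage1[m][0])
    return stage1, best_min_pts

# Toy data: two well-separated 4x4 grids of points.
blob = [(0.5 * x, 0.5 * y) for x in range(4) for y in range(4)]
points = blob + [(x + 10.0, y + 10.0) for x, y in blob]
stage1, best_min_pts = search(points, [0.3, 0.6, 1.1, 20.0], [2, 3, 4])
print(stage1, best_min_pts)
```

In this sketch the second stage simply takes the largest balance value; the Method paragraph also weighs the aims of the current clustering task at that point, which would normally require inspecting the per-minPts curves rather than an automatic maximum.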
Results. The developed hybrid model has been implemented in KNIME using plugins written in R. The efficiency of the model was tested on different data: low-dimensional data from the School of Computing of the University of Eastern Finland; Fisher’s iris data; and gene expression profiles of patients examined for lung cancer.
Conclusions. The simulation results have shown the high efficiency of the proposed method. The studied objects were distributed into clusters correctly in all cases. The proposed method reduces the reproducibility error, since the optimal parameters of the clustering algorithm are chosen based on both the clustering results obtained on each equal-power subset separately and the difference between the clustering results obtained on the two subsets.
Copyright (c) 2019 S. Babichev, S. Vyshemyrska, V. Lytvynenko
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.