IMPLEMENTATION OF DBSCAN CLUSTERING ALGORITHM WITHIN THE FRAMEWORK OF THE OBJECTIVE CLUSTERING INDUCTIVE TECHNOLOGY BASED ON R AND KNIME TOOLS
Context. The problem of data clustering within the framework of the objective clustering inductive technology is considered. A practical implementation of the resulting hybrid model, based on the combined use of R and KNIME tools, is presented. The object of the study is a hybrid data clustering model that combines the DBSCAN clustering algorithm with the objective clustering inductive technology.
Objective. The aim of the work is to create a hybrid model of objective clustering based on the DBSCAN clustering algorithm and to implement it in practice through the combined use of R and KNIME tools.
Method. Inductive methods of complex systems modelling are used to determine the optimal parameters of the DBSCAN clustering algorithm within the framework of the objective clustering inductive technology. The practical implementation of this technology involves: the use of two equal-power subsets, which contain the same number of pairwise similar objects; calculation of the internal and external clustering quality criteria; and calculation of the complex balance criterion, whose maximum value corresponds to the best clustering in terms of the criteria used. The process has two main stages. First, the optimal value of the EPS parameter is determined for each minPts value within its range of variation; this stage yields a chart of the complex balance criterion versus the EPS value for each minPts value. Then the intermediate results are analysed to determine the optimal solution, which corresponds both to the maximum value of the complex balance criterion and to the aims of the current clustering task.
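The two-stage search described in the Method paragraph can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' R and KNIME implementation. The abstract does not give the formulas, so the greedy nearest-pair split, the silhouette-based internal criterion, the normalised-difference external criterion and the geometric-mean balance score are all assumed stand-ins, and every function name is hypothetical.

```python
import math
from itertools import combinations


def split_pairwise(points):
    """Assumed split: greedily pair the nearest objects, one of each pair to A, the other to B."""
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    used, a, b = set(), [], []
    for i, j in pairs:
        if i not in used and j not in used:
            used.update((i, j))
            a.append(points[i])
            b.append(points[j])
    return a, b


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            elif labels[j] is None:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:  # expand only from core points
                    queue.extend(neighbours[j])
    return labels


def internal_quality(points, labels):
    """Stand-in internal criterion in [0, 1]: mean silhouette times share of non-noise points."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for k, other in clusters.items() if k != lab)
            scores.append((b - a) / (max(a, b) or 1.0))
    coverage = len(scores) / len(points)
    return max(0.0, sum(scores) / len(scores)) * coverage


def balance(points_a, points_b, eps, min_pts):
    """Stand-in complex balance criterion: larger is better."""
    qa = internal_quality(points_a, dbscan(points_a, eps, min_pts))
    qb = internal_quality(points_b, dbscan(points_b, eps, min_pts))
    if qa == 0.0 or qb == 0.0:
        return 0.0
    external = abs(qa - qb) / (qa + qb)  # 0 when both subsets agree
    return (qa * qb * (1.0 - external)) ** (1.0 / 3.0)


def sweep(points, eps_grid, min_pts_grid):
    """Stage 1: one balance-versus-EPS curve per minPts value; also report the maximum."""
    a, b = split_pairwise(points)
    curves = {m: [(eps, balance(a, b, eps, m)) for eps in eps_grid]
              for m in min_pts_grid}
    best = max(((score, eps, m) for m, curve in curves.items()
                for eps, score in curve), key=lambda t: t[0])
    return curves, best


# Toy data (an assumption for illustration): two 4x4 integer grids far apart.
points = [(x + s, y + s) for s in (0, 10) for x in range(4) for y in range(4)]
curves, best = sweep(points, [0.5, 1.0, 2.0, 3.0, 12.0], [2, 3, 4])
print("best (score, EPS, minPts):", best)
```

Here `curves` plays the role of the first-stage charts of the balance criterion versus EPS for each minPts value. `best` is only the numerical maximum; the second stage described above would still weigh it against the aims of the clustering task.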
Results. The developed hybrid model has been implemented in the KNIME software using plugins written in R. The efficiency of the model was tested on different data sets: low-dimensional data from the School of Computing of the University of Eastern Finland; Fisher’s iris data; and gene expression profiles of patients examined for lung cancer.
Conclusions. The simulation results have shown the high efficiency of the proposed method. The studied objects were distributed into clusters correctly in all cases. The proposed method reduces the reproducibility error, because the decision on the optimal parameters of the clustering algorithm is based both on the clustering results obtained on each equal-power subset separately and on the difference between the clustering results obtained on the two subsets.
Copyright (c) 2019 S. Babichev, S. Vyshemyrska, V. Lytvynenko
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
Zaporizhzhya National Technical University,
Zhukovskiy street, 64, Zaporizhzhya, 69063, Ukraine.
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
The reference to the journal is obligatory in the cases of complete or partial use of its materials.