DOI: https://doi.org/10.15588/1607-3274-2019-1-8

### IMPLEMENTATION OF DBSCAN CLUSTERING ALGORITHM WITHIN THE FRAMEWORK OF THE OBJECTIVE CLUSTERING INDUCTIVE TECHNOLOGY BASED ON R AND KNIME TOOLS

#### Abstract

Context. The paper considers data clustering within the framework of the objective clustering inductive technology. A hybrid model is proposed and implemented in practice through the combined use of R and KNIME tools. The object of the study is a hybrid data clustering model that combines the DBSCAN clustering algorithm with the objective clustering inductive technology.

Objective. The aim of the work is to create a hybrid objective clustering model based on the DBSCAN clustering algorithm and to implement it in practice through the combined use of R and KNIME tools.

Method. Inductive methods of complex systems modelling are used to determine the optimal parameters of the DBSCAN clustering algorithm within the framework of the objective clustering inductive technology. The practical implementation of this technology involves: the use of two equal power subsets, which contain the same number of pairwise similar objects; calculation of the internal and external clustering quality criteria; and calculation of the complex balance criterion, whose maximum value corresponds to the best clustering in terms of the criteria used. The process has two main stages. First, the optimal value of the EPS parameter is determined at each step within the range of minPts values; this stage yields charts of the complex balance criterion versus the EPS value for each minPts value. Then the intermediate results are analysed to determine the optimal solution, which corresponds both to the maximum value of the complex balance criterion and to the aims of the current clustering.
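The two-stage procedure described in the Method can be illustrated with a minimal sketch. It is written in plain Python for self-containment, whereas the authors' implementation uses R plugins inside KNIME, and it is not the authors' code. The abstract does not give the formulas, so every criterion here is an assumed stand-in: the greedy closest-pair split for the two equal power subsets, a scatter-ratio internal criterion, the Rand index over paired objects as the external criterion, and a geometric mean as the complex balance criterion. All function names (`split_equal_power`, `internal_quality`, `external_agreement`, `balance`, `sweep`) are hypothetical. Points are assumed to be tuples of coordinates.

```python
import math
from itertools import combinations


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:    # j is a core point, so keep expanding
                queue.extend(reach)
    return labels


def split_equal_power(points):
    """Assumed greedy split: the closest remaining pair is divided between A and B."""
    remaining = list(range(len(points)))
    a, b = [], []
    while len(remaining) >= 2:
        i, j = min(combinations(remaining, 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
        a.append(points[i])
        b.append(points[j])
        remaining.remove(i)
        remaining.remove(j)
    return a, b


def internal_quality(points, labels):
    """Stand-in internal criterion in [0, 1]: (1 - within/total scatter) * coverage."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:                # all noise or a single cluster scores zero
        return 0.0

    def scatter(group):
        centre = tuple(sum(c) / len(group) for c in zip(*group))
        return sum(math.dist(p, centre) ** 2 for p in group)

    within = sum(scatter(g) for g in clusters.values())
    coverage = sum(len(g) for g in clusters.values()) / len(points)
    return max(0.0, 1.0 - within / scatter(points)) * coverage


def external_agreement(labels_a, labels_b):
    """Stand-in external criterion: Rand index over paired objects (noise is one group)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum((labels_a[k] == labels_a[l]) == (labels_b[k] == labels_b[l])
               for k, l in pairs)
    return same / len(pairs)


def balance(points_a, points_b, eps, min_pts):
    """Stand-in complex balance criterion: geometric mean of the three partial scores."""
    la = dbscan(points_a, eps, min_pts)
    lb = dbscan(points_b, eps, min_pts)
    parts = (internal_quality(points_a, la),
             internal_quality(points_b, lb),
             external_agreement(la, lb))
    return math.prod(parts) ** (1 / 3)


def sweep(points, eps_grid, min_pts_grid):
    """Stage 1: one balance-versus-EPS curve per minPts value, plus each curve's maximum."""
    a, b = split_equal_power(points)
    curves = {m: [(eps, balance(a, b, eps, m)) for eps in eps_grid]
              for m in min_pts_grid}
    best = {m: max(curve, key=lambda t: t[1]) for m, curve in curves.items()}
    return curves, best
```

Calling `sweep(points, eps_grid, min_pts_grid)` returns the curves that correspond to the charts of the first stage and the per-minPts maxima. The second stage, choosing among those maxima according to the aims of the current clustering, is left to the analyst, as in the text. The greedy split is cubic in the number of objects and is meant only for small illustrative data sets.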

Results. The developed hybrid model has been implemented in KNIME using plugins written in R. The efficiency of the model was tested on different data: low-dimensional data sets of the School of Computing of the University of Eastern Finland; Fisher’s iris data; and gene expression profiles of patients examined for lung cancer.

Conclusions. The simulation results have shown high efficiency of the proposed method. The studied objects were distributed into clusters correctly in all cases. The proposed method decreases the reproducibility error, since the optimal parameters of the clustering algorithm are chosen based on both the clustering results obtained on each equal power subset separately and the difference between the clustering results obtained on the two subsets.


Copyright (c) 2019 S. Babichev, S. Vyshemyrska, V. Lytvynenko

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

**Address of the journal editorial office:**

Editorial office of the journal «Radio Electronics, Computer Science, Control»,

National University "Zaporizhzhia Polytechnic",

Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine.

Telephone: +38-061-769-82-96 – the Editing and Publishing Department.

E-mail: rvv@zntu.edu.ua

**The reference to the journal is obligatory in the cases of complete or partial use of its materials.**