METHODS FOR THE QUANTITATIVE SOLUTION OF THE CLASS IMBALANCE PROBLEM

D. A. Kavrin; S. A. Subbotin

doi:10.15588/1607-3274-2018-1-10

Authors

D. A. Kavrin, Zaporizhzhya National Technical University, Zaporizhzhya, Ukraine
S. A. Subbotin, Zaporizhzhya National Technical University, Zaporizhzhya, Ukraine


Keywords:

sample, example, quality metric, cluster, classifier, majority class, minority class.

Abstract

Context. The problem of restoring class balance in imbalanced samples is addressed in order to improve the efficiency of diagnostic and recognition models.
Objective. The purpose of the work is to modify an existing method for restoring class balance and to compare its performance indicators with those of several modern methods.
Method. The proposed data preprocessing method combines undersampling with cluster analysis. It restores the class balance and reduces the sample size while preserving important topological properties of the sample, high accuracy, and acceptable running time.
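The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic cluster-based undersampling, not the authors' modified method. It assumes plain Lloyd's k-means run on the majority class with k equal to the minority-class size, and keeps, for each cluster, the real majority instance nearest to its centroid. The function names `kmeans_centroids` and `cluster_undersample` are hypothetical.

```python
import numpy as np


def kmeans_centroids(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on `points`; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def cluster_undersample(X, y, majority_label, minority_label, seed=0):
    """Hypothetical helper: shrink the majority class to the minority size via k-means."""
    X_maj = X[y == majority_label]
    X_min = X[y == minority_label]
    k = len(X_min)  # one majority representative per minority instance -> 1:1 balance
    centroids = kmeans_centroids(X_maj, k, seed=seed)
    d2 = ((X_maj[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    chosen, used = [], set()
    for j in range(k):
        # Keep the real majority instance nearest to centroid j, skipping rows
        # already picked for another centroid, so exactly k distinct rows survive.
        for i in np.argsort(d2[:, j]):
            if i not in used:
                used.add(i)
                chosen.append(i)
                break
    X_bal = np.vstack([X_maj[chosen], X_min])
    y_bal = np.concatenate([np.full(k, majority_label), np.full(k, minority_label)])
    return X_bal, y_bal
```

Keeping observed instances rather than the centroids themselves means the reduced sample contains only real data points; substituting the centroids is another common variant of the same idea.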
Results. Software implementing the proposed method has been developed and used in computational experiments to study the method's properties and to compare it with other methods for restoring class balance.
Conclusions. The experiments confirmed the efficiency of the proposed method and of the software implementing it. The method reduces the majority class to the size of the minority class, thereby shrinking the training sample (a sample is considered imbalanced if the minority class makes up less than 10% of the original sample), while showing the best model accuracy and comparable sampling speed. It can be recommended for practical use in handling imbalanced data for diagnostic and recognition models.
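As a small illustration of the imbalance criterion and the size reduction described in the conclusions, the sketch below checks whether a labelled sample is imbalanced (minority share below 10%) and reports how many instances remain once the majority class is cut to the minority size. The helper names, the `threshold` parameter, and the two-class assumption are hypothetical, chosen only for this example.

```python
from collections import Counter


def minority_share(labels):
    """Fraction of the sample occupied by the smallest class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


def is_imbalanced(labels, threshold=0.10):
    # Criterion from the abstract: minority class below 10% of the original sample.
    return minority_share(labels) < threshold


def balanced_size(labels):
    # Two-class case: after the majority class is reduced to the minority size,
    # the training sample holds twice the minority count.
    counts = Counter(labels)
    return 2 * min(counts.values())


labels = [0] * 950 + [1] * 50
print(minority_share(labels))  # 0.05
print(is_imbalanced(labels))   # True
print(balanced_size(labels))   # 100
```

In this example the balanced sample keeps 100 of the original 1000 instances, which is where the training-time savings of majority-class undersampling come from.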

