DECISION TREE CONSTRUCTION FOR THE CASE OF LOW-INFORMATIVE FEATURES

Authors

  • S. A. Subbotin, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2019-1-12

Keywords:

decision tree, pattern recognition, classification, feature, informativeness.

Abstract

Context. The problem of automating decision tree construction is addressed. The object of study is a decision tree. The subject of study is methods for decision tree construction.

Objective. The purpose of the work is to create a method for constructing models based on decision trees for data samples that are characterized by sets of individually low-informative features.

Method. A method for decision tree construction is proposed. For a given sample, it determines the individual informativeness of the features relative to the output feature and also evaluates the relationship of the input features with each other, as their pairwise individual informativeness relative to each other. At the step of forming the next node, the method selects as a candidate the feature that gives the best partition over the whole set of features; it then sequentially searches, among all the features not yet selected for this node, for the one that is individually most closely related to the selected candidate. For the resulting set of selected features, the method iterates through the available transformations from a given set, determines the quality of the partition for each transformation, selects the best transformation, and adds it to the node. In forming each node, the method thus tends to single out a group of the most closely interrelated features whose conversion into a scalar value provides the best partitioning of the subsample of instances that reach this node. This makes it possible to reduce the size of the model and the branching of the tree, to speed up the calculations when recognizing instances with the model, and to improve the generalizing properties of the model and its interpretability. The proposed method also allows using the constructed decision tree to assess feature significance.

Results. The developed method is implemented as software and investigated in solving the problem of classifying a signal represented by a set of individually low-informative readings.

Conclusions. The experiments have confirmed the efficiency of the proposed software and allow recommending it for use in practice in solving problems of diagnostics and automatic classification by features. The prospects for further research may consist in the creation of parallel methods for constructing decision trees based on the proposed method, optimization of its software implementations, and an experimental study of the proposed method on a wider set of practical problems.
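
The abstract gives no pseudocode, so the node-forming step is illustrated below with a minimal Python sketch. It rests on several assumptions not stated in the source: mutual information stands in for the unspecified informativeness measure, the training accuracy of a depth-one tree stands in for the partition-quality criterion, the TRANSFORMS set of group-to-scalar conversions is hypothetical, and only a single partner feature is grouped with the candidate, whereas the described method grows the group sequentially. It is a sketch of the scheme, not the author's implementation.

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical set of transformations that collapse a feature group into a
# single scalar; the paper's actual transformation set is not given here.
TRANSFORMS = {
    "sum":  lambda G: G.sum(axis=1),
    "mean": lambda G: G.mean(axis=1),
    "max":  lambda G: G.max(axis=1),
}

def split_quality(z, y):
    """Quality of the best threshold split on scalar z, approximated by the
    training accuracy of a depth-1 tree (a stand-in criterion)."""
    z = z.reshape(-1, 1)
    return DecisionTreeClassifier(max_depth=1).fit(z, y).score(z, y)

def form_node(X, y):
    """One node-forming step of the scheme described in the abstract."""
    # 1. Individual informativeness of every feature w.r.t. the output.
    info = mutual_info_classif(X, y, random_state=0)
    candidate = int(np.argmax(info))  # feature giving the best partition
    # 2. Among the remaining features, pick the one most closely related
    #    to the candidate (pairwise informativeness, here mutual info).
    others = [j for j in range(X.shape[1]) if j != candidate]
    pair_mi = mutual_info_regression(X[:, others], X[:, candidate], random_state=0)
    partner = others[int(np.argmax(pair_mi))]
    group = X[:, [candidate, partner]]
    # 3. Try every transformation of the group into a scalar and keep the
    #    one whose scalar yields the best partition of the subsample.
    name, best = max(TRANSFORMS.items(),
                     key=lambda kv: split_quality(kv[1](group), y))
    return {"features": (candidate, partner), "transform": name,
            "quality": split_quality(best(group), y)}

# Usage on synthetic data with individually low-informative readings:
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 6))
X[:, 0] += 0.4 * y                   # two weakly informative,
X[:, 1] += 0.4 * y + 0.5 * X[:, 0]   # mutually correlated readings
print(form_node(X, y))

In the full method this step would recurse on the subsamples induced by the chosen threshold, and the per-node choices of feature groups could later be aggregated to assess feature significance, as the abstract notes.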

References

Amit Y., Geman D., Wilder K. Joint induction of shape features and tree classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, V. 19, No. 11, pp. 1300–1305.

Breiman L., Friedman J. H., Olshen R. A., Stone C. J. Classification and regression trees. Boca Raton, Chapman and Hall/CRC, 1984, 368 p.

Dietterich T. G., Kong E. B. Machine learning bias, statistical bias, and statistical variance of decision tree algorithms [Electronic resource]. Corvallis, Oregon State University, 1995, 14 p. Access mode: http://www.cems.uwe.ac.uk/~irjohnso/coursenotes/uqc832/trbias.pdf

Geurts P., Irrthum A., Wehenkel L. Supervised learning with decision tree-based methods in computational and systems biology, Molecular BioSystems, 2009, V. 5, No. 12, pp. 1593–1605.

Heath D., Kasif S., Salzberg S. Induction of oblique decision trees [Electronic resource]. Baltimore, Johns Hopkins University, 6 p. Access mode: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.9&rep=rep1&type=pdf

Hothorn T., Hornik K., Zeileis A. Unbiased recursive partitioning: A conditional inference framework, Journal of Computational and Graphical Statistics, 2006, V. 15, No. 3, pp. 651–674.

Hyafil L., Rivest R. L. Constructing optimal binary decision trees is NP-complete, Information Processing Letters, 1976, V. 5, No. 1, pp. 15–17.

Kim H., Loh W.-Y. Classification trees with unbiased multiway splits, Journal of the American Statistical Association, 2001, V. 96, No. 454, pp. 589–604.

Kufrin R. Decision trees on parallel processors, Machine Intelligence and Pattern Recognition, 1997, V. 20, pp. 279–306.

Kwok S. W., Carter C. Multiple decision trees, Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI '88), –12 July 1988, Minneapolis: proceedings. Amsterdam, North-Holland Publishing Co., 1990, pp. 327–338.

Mingers J. An empirical comparison of pruning methods for decision tree induction, Machine Learning, 1989, V. 4, No. 2, pp. 227–243.

Quinlan J. R. Induction of decision trees, Machine Learning, 1986, V. 1, No. 1, pp. 81–106.

Strobl C., Boulesteix A., Augustin T. Unbiased split selection for classification trees based on the Gini index, Computational Statistics & Data Analysis, 2007, V. 52, No. 1, pp. 483–501.

Utgoff P. E. Incremental induction of decision trees, Machine Learning, 1989, V. 4, No. 2, pp. 161–186. DOI: 10.1023/A:1022699900025

Breiman L. Bagging predictors, Machine Learning, 1996, V. 24, No. 2, pp. 123–140.

Chawla N. V., Hall L. O., Bowyer K. W., Kegelmeyer W. P. Learning ensembles from bites: A scalable and accurate approach, Journal of Machine Learning Research, 2004, V. 5, pp. 421–451.

Dietterich T. G. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning, 2000, V. 40, No. 2, pp. 139–157.

Efron B. Bootstrap methods: another look at the jackknife, The Annals of Statistics, 1979, V. 7, No. 1, pp. 1–26.

Altmann A., Toloşi L., Sander O., Lengauer T. Permutation importance: a corrected feature importance measure, Bioinformatics, 2010, V. 26, No. 10, pp. 1340–1347. DOI: 10.1093/bioinformatics/btq134

De Mántaras R. L. A distance-based attribute selection measure for decision tree induction, Machine Learning, 1991, V. 6, No. 1, pp. 81–92.

Deng H., Runger G., Tuv E. Bias of importance measures for multi-valued attributes and solutions, 21st International Conference on Artificial Neural Networks (ICANN), Espoo, 14–17 June 2011: proceedings. Berlin, Springer-Verlag, 2011, V. 2, pp. 293–300.

Mingers J. An empirical comparison of selection measures for decision-tree induction, Machine Learning, 1989, V. 3, No. 4, pp. 319–342.

Mitchell T. Machine learning. New York, McGraw-Hill, 1997, 414 p.

Painsky A., Rosset S. Cross-validated variable selection in tree-based methods improves predictive performance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, V. 39, No. 11, pp. 2142–2153. DOI: 10.1109/tpami.2016.2636831

Miyakawa M. Criteria for selecting a variable in the construction of efficient decision trees, IEEE Transactions on Computers, 1989, V. 38, No. 1, pp. 130–141.

Tolosi L., Lengauer T. Classification with correlated features: unreliability of feature ranking and solutions, Bioinformatics, 2011, V. 27, No. 14, pp. 1986–1994. DOI: 10.1093/bioinformatics/btr300

Alpaydin E. Introduction to Machine Learning. London, The MIT Press, 2010, 400 p.

Subbotin S., Oleynik An. Entropy based evolutionary search for feature selection, The experience of designing and application of CAD systems in microelectronics: IX International conference CADSM-2007, Lviv–Polyana, 20–24 February 2007: proceedings. Lviv, NU "Lviv Polytechnic", 2007, pp. –443.

Subbotin S., Oliinyk A. The dimensionality reduction methods based on computational intelligence in problems of object classification and diagnosis, Recent Advances in Systems, Control and Information Technology, Eds.: R. Szewczyk, M. Kaliczyńska. Cham, Springer, 2017, pp. 11–19. (Advances in Intelligent Systems and Computing, vol. 543).

Subbotin S. A. Methods and characteristics of locality-preserving transformations in the problems of computational intelligence, Radio Electronics, Computer Science, Control, No. 1, pp. 120–128.

Boguslayev A. V., Oleynik Al. A., Oleynik An. A., Pavlenko D. V., Subbotin S. A. Progressivnyye tekhnologii modelirovaniya, optimizatsii i intellektual'noy avtomatizatsii etapov zhiznennogo tsikla aviatsionnykh dvigateley [Progressive technologies of modeling, optimization and intelligent automation of aircraft engine life cycle stages]: monograph, Eds.: D. V. Pavlenko, S. A. Subbotin. Zaporozh'ye, OAO "Motor Sich", 2009, 468 p.

Dubrovin V., Subbotin S., Morshchavka S., Piza D. The plant recognition on remote sensing results by the feed-forward neural networks, International Journal of Smart Engineering System Design, 2001, V. 3, No. 4, pp. 251–256.

How to Cite

Subbotin, S. A. (2019). DECISION TREE CONSTRUCTION FOR THE CASE OF LOW-INFORMATIVE FEATURES. Radio Electronics, Computer Science, Control, (1). https://doi.org/10.15588/1607-3274-2019-1-12

Section

Neuroinformatics and intelligent systems