THE FRACTAL ANALYSIS OF SAMPLE AND DECISION TREE MODEL
Keywords: decision tree, sample, fractal dimension, indicator, tree complexity.
Context. The paper considers the problem of decision tree model synthesis using fractal analysis. The object of study is decision trees. The subject of study is methods of decision tree model synthesis and analysis.
Objective. The objective of the paper is the creation of methods and fractal indicators that jointly solve the problem of decision tree model synthesis and the task of reducing the dimension of the training data within a unified approach based on the principles of fractal analysis.
Method. The fractal dimension of a decision-tree-based model is defined both for the whole training sample and for specific classes. A method for estimating the fractal dimension of a decision-tree-based model that takes the model error into account is proposed. It allows building a model with an acceptable error value but an optimized level of fractal dimensionality. This makes it possible to reduce the complexity of the decision tree model and to make it more interpretable. A set of indicators characterizing the complexity of a decision tree model is proposed. It comprises the complexity of node checking, the complexity of reaching a node, the average model complexity, and the worst-case computational complexity of the tree model. On the basis of the proposed set of indicators, a complex criterion for model building is constructed. The indicators of the fractal dimension of the decision tree model error can be used to find and remove non-informative features from the model.
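The fractal dimension of a sample referred to above can be estimated by box counting: covering the feature space with hyperblocks of decreasing side length and fitting the slope of log N(r) against log(1/r). The sketch below is a minimal illustration of this general principle, not the authors' implementation; the function name and the normalization step are assumptions.

```python
import numpy as np

def box_counting_dimension(X, sides):
    """Estimate the box-counting (fractal) dimension of a sample X.

    For each hyperblock side r, count the number N(r) of distinct
    hyperblocks of side r that contain at least one point, then fit
    the slope of log N(r) versus log(1/r).
    """
    X = np.asarray(X, dtype=float)
    # Normalize every feature to [0, 1) so one side length applies to all axes.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = np.clip((X - lo) / span, 0.0, 1.0 - 1e-12)
    counts = []
    for r in sides:
        # Each point falls into the cell given by its integer block indices.
        cells = np.floor(Xn / r).astype(int)
        counts.append(len({tuple(c) for c in cells}))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sides)), np.log(counts), 1)
    return slope, counts
```

For points lying on a one-dimensional manifold (e.g. a line in a two-dimensional feature space), the estimated dimension is close to 1, while the covering counts N(r) give the per-side data for the dependency graphs mentioned in the Results.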
Results. The developed indicators and methods were implemented in software and studied on practical problems. The experimental study of the proposed indicators produced graphs of their dependencies. These include graphs of the number of hyperblocks covering the sample in the feature space as a function of the block side length: for the whole sample, for each class, for different set and obtained error values, and for varied numbers of features and instances; as well as graphs of the dependencies between the average and worst-case tree complexities, between the decision tree fractal dimensionality and the average tree complexity, between the joint criterion and the indicator of feature set reduction, and between the joint criterion and the tree fractal dimensionality.
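The average and worst-case tree complexities compared in the graphs above can be illustrated on a small tree: each leaf's path cost is the sum of the checking costs of the nodes on the way to it, the average complexity weights these costs by the number of instances reaching each leaf, and the worst-case complexity is the maximum path cost. This is a minimal sketch under those assumptions; the `Node` structure and unit checking costs are illustrative, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    n_instances: int                 # training instances reaching this node
    check_cost: int = 1              # cost of evaluating this node's test
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def path_costs(node, acc=0, out=None):
    """Collect (path cost, instance count) for every leaf of the tree."""
    if out is None:
        out = []
    if node.left is None and node.right is None:
        out.append((acc, node.n_instances))
        return out
    for child in (node.left, node.right):
        if child is not None:
            path_costs(child, acc + node.check_cost, out)
    return out

def tree_complexity(root):
    """Instance-weighted average and worst-case path cost of the tree."""
    pcs = path_costs(root)
    total = sum(n for _, n in pcs)
    avg = sum(c * n for c, n in pcs) / total
    worst = max(c for c, _ in pcs)
    return avg, worst
```

For a root with 100 instances that sends 70 to a leaf at depth 1 and 30 to a subtree with leaves at depth 2, the average complexity is (1·70 + 2·30)/100 = 1.3 checks and the worst case is 2.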
Conclusions. The conducted experiments confirmed the operability of the proposed mathematical support and allow recommending it for practical use in solving problems of model building from precedents.
Copyright (c) 2020 S. A. Subbotin, Ye. A. Gofman
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.