DOI: https://doi.org/10.15588/1607-3274-2020-1-11

THE FRACTAL ANALYSIS OF SAMPLE AND DECISION TREE MODEL

S. A. Subbotin, Ye. A. Gofman

Abstract


Context. The problem of decision tree model synthesis using the fractal analysis is considered in the paper. The object of study is a decision trees. The subject of study is a methods of decision tree model synthesis and analysis.

Objective. The objective of the paper is a creation of methods and fractal indicators allowing jointly solving the problem of decision tree model synthesis and the task of reducing the dimension of training data from a unified approach based on the principles of fractal analysis. 

Method. The fractal dimension for a decision tree based model is defined as for whole training sample as for specific classes. The method of the fractal dimension of a model based on a decision tree estimation taking into account model error is proposed. It allows to built model with an acceptable error value, but with optimized level of fractal dimensionality. This makes possibility to reduce decision tree model complexity and to make it mo interpretable. The set of indicators characterizing complexity of decision tree model is proposed. The set of indicators characterizing complexity of decision tree model is proposed. It contains complexity of node checking, complexity of node achieving, an average model complexity and worst tree model complexity of computations. On the basis of proposed set of indicators a complex criterion for model building is proposed. The indicators of the fractal dimension of the decision tree model error can be used to find and remove the non-informative features in the model.

Results. The developed indicators and methods are implemented in software and studied at practical problem solving. As results of experimental study of proposed indicators the graphs of their dependences were obtained. They include graphs of dependencies of number of hyperblocks covering the sample in the features space from size of block side: for whole sample, for each class, for different set error values and obtained error values, for varied values of resulted number of features and instances, also as graphs of dependencies between average and worst tree complexities, decision tree fractal dimensionality and tree average complexity, joint criterion and indicator of feature set reduction, and between joint criterion and tree fractal dimensionality/

Conclusions. The conducted experiments confirmed the operability of the proposed mathematical support and allow recommending it for use in practice for solving the problems of model building by the precedents.


Keywords


Decision tree, sample, fractal dimension, indicator, tree complexity.

Full Text:

PDF

References


Geurts P., Irrthum A., Wehenkel L. Supervised learning with decision tree-based methods in computational and systems biology, Molecular Biosystems, 2009, Vol. 5, No. 12, pp. 1593– 1605.

Breiman L., Friedman J. H., Olshen R. A., Stone C. J. Classification and regression trees. Boca Raton, Chapman and Hall/CRC, 1984, 368 p.

Heath D., Kasif S., Salzberg S. Induction of oblique decision trees [Electronic resource]. Baltimor, Johns Hopkins University, 1993, 6 p. Access mode: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.92 08&rep=rep1&type=pdf

Rabcan J., Levashenko V., Zaitseva E., Kvassay M., Subbotin S. Non-destructive diagnostic of aircraft engine blades by Fuzzy Decision Tree, Engineering Structures. – Vol. 197, 109396.

Quinlan J. R. Induction of decision trees, Machine learning, 1986, Vol. 1, No. 1, pp. 81– 106.

Breiman L. Bagging predictors , Machine Learning, 1996, Vol. 24, No. 2, pp. 123–140.

Utgoff P. E. Incremental induction of decision trees, Machine learning, 1989, Vol. 4, No. 2, pp. 161–186. DOI:10.1023/A:1022699900025

Hyafil L., Rivest R. L. Constructing optimal binary decision trees is np-complete, Information Processing Letters. – 1976, Vol. 5, No. 1, pp. 15–17.

Subbotin S. A. Postroyeniye derev’yev resheniy dlya sluchaya maloinformativnykh priznakov, Radío Electronics, Computer Science, Control, 2019, No. 1, pp. 122–131.

Amit Y., Geman D., Wilder K. Joint induction of shape features and tree classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, No. 11. pp. 1300–1305.

Subbotin S. A. Metody sinteza modeley kolichest-vennykh zavisimostey v bazise derev’yev regressii, realizuyushchikh klaster-regressionnuyu approksimatsiyu po pretsedentam, Radío Electronics, Computer Science, Control, 2019, No. 3, pp. 76-85.

Jensen R., Shen Q. Computational intelligence and feature selection: rough and fuzzy approaches. Hoboken, John Wiley & Sons, 2008, 300 p.

Subbotin S. The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis, Applications of Computational Intelligence in Biomedical Technology. Cham, Springer, 2016, pp. 215–228. DOI: 10.1007/978-3-31919147-8_13

Miyakawa M. Criteria for selecting a variable in the construction of efficient decision trees, IEEE Transactions on Computers, 1989, Vol. 38, № 1, pp. 130–141.

Tolosi L., Lengauer T. Classification with correlated features: unreliability of feature ranking and solutions, Bioinformatics, 2011, Vol. 27, No. 14, pp. 1986–1994. DOI:10.1093/bioinformatics/btr300

Chaudhuri A., Stenger H.. Survey sampling theory and methods. New York, Chapman & Hall, 2005, 416 p. DOI: 10.1201/9781420028638

Subbotin S.A. Methods of sampling based on exhaustive and evolutionary search, Automatic Control and Computer Sciences, 2013, Vol. 47, No. 3, pp. 113–121. DOI: 10.3103/s0146411613030073

Subbotin S.A. The sample properties evaluation for pattern recognition and intelligent diagnosis, Digital Technologies : 10th International Conference, Zilina, 9–11 July 2014 : proceedings. Los Alamitos, IEEE, 2014, pp. 332–343. DOI: 10.1109/dt.2014.6868734

Lavrakas P.J. Encyclopedia of survey research methods. – Thousand Oaks: Sage Publications, 2008, Vol. 1–2, 968 p. DOI: 10.4135/9781412963947.n159

Łukasik S., Kulczycki P. An algorithm for sample and data dimensionality reduction using fast simulated annealing, Advanced Data Mining and Applications, Lecture Notes in Computer Science. Berlin: Springer, 2011, Vol. 7120, pp. 152–161. DOI: 10.1007/978-3-642-25853-4_12

Subbotin S. A. The training set quality measures for neural network learning, Optical Memory and Neural Networks (Information Optics), 2010, Vol. 19, No. 2, pp. 126–139. DOI: 10.3103/s1060992x10020037

Cheng Q. Multifractal Modeling and Lacunarity Analysis, Mathematical Geology, 1997, Vol. 29, No. 7, pp. 919–932. DOI:10.1023/A:1022355723781

Eftekhari A. Fractal Dimension of Electrochemical Reactions, Journal of the Electrochemical Society, 2004, Vol. 151, No. 9, pp. E291–E296. DOI:10.1149/1.1773583.

Dubuc B., Quiniou J., Roques-Carmes C., Tricot C., Zucker S. Evaluating the fractal dimension of profiles, Physical Review,

– Vol. 39, No. 3. – P. 1500–1512. DOI:10.1103/PhysRevA.39.1500

Camastra F. Data Dimensionality Estimation Methods: A survey, Pattern Recognition, 2003, Vol. 36, No. 12, pp. 2945– 2954. DOI: 10.1016/S0031-3203(03)00176-6

de Sousa P. M., Traina C., Traina A. J. M., Wu L., Faloutsos C. A fast and effective method to find correlations among attributes in databases, Data Mining and Knowledge Discovery, 2007, Vol. 14, Issue 3, pp. 367–407. DOI: 10.1007/s10618-0060056-4

Roberts A., Cronin A. Unbiased estimation of multi-fractal dimensions of finite data sets, Physica A: Statistical Mechanics and its Applications, 1996, Vol. 233, No. 3–4, pp. 867–878. DOI:10.1016/s0378-4371(96)00165-3

Kumaraswamy K. Fractal Dimension for Data Mining [Electronic resource]. Access mode: https://www.ml.cmu.edu/research/dap-papers/skkumar_kdd _project.pdf.

Li J. Du Q., Sun C. An improved box-counting method for image fractal dimension estimation, Pattern Recognition, 2009, Vol. 42, No. 11, pp. 2460–2469. DOI:10.1016/j.patcog.2009.03.001.

Popescu D. P., Flueraru C., Mao Y., Chang S., Sowa M.G.,Signal attenuation and box-counting fractal analysis of optical coherence tomography images of arterial tissue, Biomedical Optics Express, 2010, Vol. 1, No. 1, pp. 268–277. DOI:10.1364/boe.1.000268

Takens F. On the numerical determination of the dimension of an attractor, Dynamical Systems and Bifurcations Workshop Groningen, 16–20 April 1984 : proceedings. Berlin, Springer, 1985, pp. 99–106. DOI: 10.1007/bfb0075637

Subbotin S. A. Metriki kachestva vyborok dannykh i modeley zavisimostey, osnovannyye na fraktal’noy razmernosti, Radío Electronics, Computer Science, Control, 2017, No. 2, pp. 70– 81.

Zong-Chang Y. Establishing structure for artificial neural networks based-on fractal, Journal of Theoretical and Applied Information Technology, 2013, Vol. 49, No. 1, pp. 342–347.

Crişan D. A., Dobrescu R. Fractal dimension spectrum as an indicator for training neural networks, Universitatea Politehnica Bucuresti Sci. Bull. Series C, 2007, Vol. 69, No. 1, pp. 23–32.

Subbotin S. A. The neural network model synthesis based on the fractal analysis, Optical Memory and Neural Networks, 2017, Vol. 26, No. 4, pp. 257–273.

Fisher Iris dataset [Electronic resource]. Access mode: https://archive.ics.uci.edu/ml/datasets/Iris

Dubrovin V., Subbotin S., Morshchavka S., Piza D. The plant recognition on remote sensing results by the feed-forward neural networks, International Journal of Smart Engineering System Design, 2001, Vol. 3, No. 4, pp. 251–256.

Arrhythmia Data Set [Electronic resource]. Access mode: http://archive.ics.uci.edu/ml/datasets/arrhythmia


GOST Style Citations


1. Geurts P. Supervised learning with decision tree-based methods in computational and systems biology / P. Geurts, A. Irrthum, L. Wehenkel // Molecular Biosystems. – 2009. –Vol. 5, № 12. – P. 1593–1605.

2. Classification and regression trees / [L. Breiman J. H. Friedman, R. A. Olshen, C. J. Stone]. – Boca Raton: Chapman and Hall/CRC, 1984. – 368 p.

3. Heath D. Induction of oblique decision trees [Electronic resource] / D. Heath, S. Kasif, S. Salzberg. – Baltimor : Johns Hopkins University, 1993. – 6 p. – Access mode:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.92 08&rep=rep1&type=pdf

4. Non-destructive diagnostic of aircraft engine blades by Fuzzy Decision Tree / [J. Rabcan, V. Levashenko, E. Zaitseva et al.] // Engineering Structures. – Vol. 197, 109396.

5. Quinlan J. R. Induction of decision trees / J. R. Quinlan // Machine learning. – 1986. – Vol. 1, № 1. – P. 81–106.

6. Breiman L. Bagging predictors / L. Breiman // Machine Learning. – 1996. – Vol. 24, № 2. – P. 123–140.

7. Utgoff P. E. Incremental induction of decision trees / P. E. Utgoff // Machine learning, 1989. – Vol. 4, № 2. – P. 161– 186. DOI:10.1023/A:1022699900025

8. Hyafil L. Constructing optimal binary decision trees is npcomplete / L. Hyafil, R. L. Rivest // Information Processing Letters. – 1976. – Vol. 5, № 1. – P.15–17.

9. Субботин С. А. Построение деревьев решений для случая малоинформативных признаков / С. А. Субботин // Радіоелектроніка, інформатика, управління. – 2019. – № 1. – С. 122–131.

10. Amit Y. Joint induction of shape features and tree classifiers / Y. Amit, D. Geman, K. Wilder // IEEE Transactions on Pattern Analysis and Machine Intelligence. – 1997. – Vol. 19, № 11. – P.1300–1305.

11. Субботин С. А. Методы синтеза моделей количественных зависимостей в базисе деревьев регрессии, реализующих кластер-регрессионную аппроксимацию по прецедентам / С. А. Субботин // Радіоелектроніка, інформатика, управління. – 2019. – № 3. – С. 76–85.

12. Jensen R. Computational intelligence and feature selection: rough and fuzzy approaches / R. Jensen, Q. Shen. –Hoboken: John Wiley & Sons, 2008. – 300 p.

13. Subbotin S. The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis / S. Subbotin // Applications of Computational Intelligence in Biomedical Technology. – Cham : Springer, 2016. – P. 215–228. DOI: 10.1007/978-3-319-19147-8_13

14. Miyakawa M. Criteria for selecting a variable in the construction of efficient decision trees / M. Miyakawa // IEEE Transactions on Computers. – 1989. – Vol. 38, № 1. – P. 130–141.

15. Tolosi L. Classification with correlated features: unreliability of feature ranking and solutions / L. Tolosi, T. Lengauer // Bioinformatics. – 2011. – Vol. 27, № 14. – P. 1986–1994. DOI:10.1093/bioinformatics/btr300

16. Chaudhuri A. Survey sampling theory and methods / A. Chaudhuri, H. Stenger. – New York : Chapman & Hall, 2005. – 416 p. DOI: 10.1201/9781420028638

17. Subbotin S.A. Methods of sampling based on exhaustive and evolutionary search / S. A. Subbotin // Automatic Control and Computer Sciences. – 2013. – Vol. 47, No. 3. – P. 113–121. DOI: 10.3103/s0146411613030073

18. Subbotin S.A. The sample properties evaluation for pattern recognition and intelligent diagnosis / S. A. Subbotin // Digital Technologies : 10th International Conference, Zilina, 9–11 July 2014 : proceedings. – Los Alamitos: IEEE, 2014. – P. 332–343. DOI: 10.1109/dt.2014.6868734

19. Lavrakas P.J. Encyclopedia of survey research methods. – Thousand Oaks: Sage Publications, 2008. – Vol. 1–2. – 968 p. DOI: 10.4135/9781412963947.n159

20. Łukasik S. An algorithm for sample and data dimensionality reduction using fast simulated annealing / S. Łukasik, P. Kulczycki // Advanced Data Mining and Applications, Lecture Notes in Computer Science. – Berlin : Springer, 2011. – Vol. 7120. – P. 152–161. DOI: 10.1007/978-3-642-25853-4_12

21. Subbotin S.A. The training set quality measures for neural network learning / S. A. Subbotin // Optical Memory and Neural Networks (Information Optics). – 2010. – Vol. 19, No. 2. – P. 126–139. DOI: 10.3103/s1060992x10020037

22. Cheng Q. Multifractal Modeling and Lacunarity Analysis / Q. Cheng // Mathematical Geology. – 1997. – Vol. 29, No. 7. – P. 919–932. DOI:10.1023/A:1022355723781

23. Eftekhari A. Fractal Dimension of Electrochemical Reactions / A. Eftekhari // Journal of the Electrochemical Society. – 2004. – Vol. 151, No. 9. – P. E291–E296. DOI:10.1149/1.1773583.

24. Evaluating the fractal dimension of profiles / [B. Dubuc, J. Quiniou, C. Roques-Carmes et al.] // Physical Review. – 1989. – Vol. 39, No. 3. – P. 1500–1512. DOI:10.1103/PhysRevA.39.1500

25. Camastra F. Data Dimensionality Estimation Methods: A survey / F. Camastra // Pattern Recognition. – 2003. – Vol. 36, No. 12. – P. 2945–2954. DOI: 10.1016/S0031-3203(03)00176-6

26. A fast and effective method to find correlations among attributes in databases / [P. M. de Sousa, C. Traina, A. J. M. Traina et al.] // Data Mining and Knowledge Discovery. – 2007. – Vol. 14, Issue 3. – P. 367–407. DOI: 10.1007/s10618-006-0056-4

27. Roberts A. Unbiased estimation of multi-fractal dimensions of finite data sets / A. Roberts, A. Cronin // Physica A: Statistical Mechanics and its Applications. – 1996. – Vol. 233, No. 3–4. – P. 867–878. DOI:10.1016/s0378-4371(96)00165-3

28. Kumaraswamy K. Fractal Dimension for Data Mining [Electronic resource] / K. Kumaraswamy. – Access mode: https://www.ml.cmu.edu/research/dap-papers/skkumar_kdd _project.pdf.

29. Li J. An improved box-counting method for image fractal dimension estimation / J. Li, Q. Du, C. Sun // Pattern Recognition. – 2009. – Vol. 42, No. 11. – P. 2460–2469. DOI:10.1016/j.patcog.2009.03.001.

30. Signal attenuation and box-counting fractal analysis of optical coherence tomography images of arterial tissue / [D. P. Popescu, C. Flueraru, Y. Mao et al.] // Biomedical Optics Express. – 2010. – Vol. 1, No. 1. – P. 268–277. DOI:10.1364/boe.1.000268

31. Takens F. On the numerical determination of the dimension of an attractor / F. Takens // Dynamical Systems and Bifurcations Workshop Groningen, 16–20 April 1984 : proceedings. – Berlin: Springer, 1985. – P. 99–106. DOI: 10.1007/bfb0075637

32. Субботин С. А. Метрики качества выборок данных и моделей зависимостей, основанные на фрактальной размерности / С. А. Субботин // Радіоелектроніка, інформатика, управління. – 2017. – № 2. – С. 70–81.

33. Zong-Chang Y. Establishing structure for artificial neural networks based-on fractal / Y. Zong-Chang // Journal of Theoretical and Applied Information Technology. – 2013. – Vol. 49, No. 1. – P. 342–347.

34. Crişan D.A. Fractal dimension spectrum as an indicator for training neural networks / D. A. Crişan, R. Dobrescu, // Universitatea Politehnica Bucuresti Sci. Bull. Series C. – 2007. – Vol. 69, No. 1. – P. 23–32.

35. Subbotin S. A. The neural network model synthesis based on the fractal analysis / S. A. Subbotin // Optical Memory and Neural Networks. – 2017. – Vol. 26, No. 4. – P. 257–273.

36. Fisher Iris dataset [Electronic resource].– Access mode: https://archive.ics.uci.edu/ml/datasets/Iris

37. The plant recognition on remote sensing results by the feedforward neural networks / [V. Dubrovin, S. Subbotin, S. Morshchavka, D. Piza] // International Journal of Smart Engineering System Design. – 2001. – Vol. 3, No. 4. – P. 251–256.

38. Arrhythmia Data Set [Electronic resource]. – Access mode: http://archive.ics.uci.edu/ml/datasets/arrhythmia







Copyright (c) 2020 S. A. Subbotin, Ye. A. Gofman

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
National University "Zaporizhzhia Polytechnic", 
Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.