THE ALGORITHMIC TREE METHOD FOR CLASSIFYING HYDROGRAPHIC DATA
Keywords: classification tree, algorithmic classification tree, discrete object, feature, recognition function, recognition algorithm, branching criterion.
Context. This work identifies a simple and effective mechanism for building algorithmic classification trees (algorithmic tree models) from fixed initial information given as a discrete training sample. The constructed algorithmic classification tree classifies (recognizes) the entire training sample on which the model is built without error, has minimal structural complexity, and consists of autonomous classification and recognition algorithms as the vertices (attributes) of the tree structure.
Objective. The aim of this work is to create a simple, effective and universal method for constructing classification (recognition) models, based on the concept of algorithmic trees, for arrays of real hydrographic data. The resulting classification schemes are characterized by a tree structure whose building blocks are autonomous classification algorithms (sets of generalized features).
Method. A general scheme is suggested for synthesizing classification trees in the form of algorithmic trees, based on approximating an array of discrete data by a set of elementary classifiers: for a given initial training sample it builds a tree-like structure, i.e., an algorithmic tree model. The constructed scheme consists of a set of autonomous classification and recognition algorithms evaluated at each step of building the classification tree for this initial sample. A method for constructing an algorithmic classification tree has been developed whose main idea is to approximate, step by step, an initial sample of arbitrary volume and structure by a set of elementary classification algorithms. When forming the current vertex (node, generalized feature) of the algorithmic tree, the method selects the most effective, high-quality elementary classifiers from the initial set and extends only those paths in the tree structure where the largest number of errors (failures) occurs. The structural complexity of the algorithmic tree is estimated from the number of transitions, vertices and tiers of the model structure, which improves the quality of its subsequent analysis, provides an effective decomposition mechanism, and allows algorithmic tree structures to be built under fixed sets of constraints. The algorithmic tree synthesis method makes it possible to build different types of tree-like recognition models, with different initial sets of elementary classifiers and with predetermined accuracy, for a wide class of problems in artificial intelligence theory.
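The step-by-step approximation described above can be sketched in Python. This is only a minimal illustration of the greedy idea, not the authors' implementation: the function names, the toy discrete sample and the set of threshold classifiers are all assumptions. At each step the elementary classifier with the fewest errors on the current subsample becomes the next vertex, and only the path carrying the remaining errors (failures) is extended.

```python
# Hypothetical sketch of greedy algorithmic-tree construction: each vertex is
# an autonomous elementary classifier; only the error-carrying path is refined.

def build_algorithmic_tree(X, y, classifiers, depth=0, max_depth=10):
    """Approximate the sample (X, y) by a chain of elementary classifiers."""
    # Select the elementary classifier with the fewest misclassifications
    # on the current subsample.
    best = min(classifiers, key=lambda c: sum(c(x) != t for x, t in zip(X, y)))
    misses = [(x, t) for x, t in zip(X, y) if best(x) != t]
    node = {"clf": best, "miss": {tuple(x) for x, _ in misses}, "refine": None}
    # Extend only the branch where errors (failures) remain.
    if misses and depth < max_depth:
        node["refine"] = build_algorithmic_tree([x for x, _ in misses],
                                                [t for _, t in misses],
                                                classifiers, depth + 1, max_depth)
    return node

def classify(node, x):
    # Objects the current vertex fails on are handed to the refining vertex.
    if node["refine"] is not None and tuple(x) in node["miss"]:
        return classify(node["refine"], x)
    return node["clf"](x)

# Toy discrete sample: no single threshold classifier is error-free,
# but a two-vertex algorithmic tree recognizes the whole sample.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 1, 0, 0]
classifiers = ([lambda v, t=t: int(v[0] >= t) for t in range(7)] +
               [lambda v, t=t: int(v[0] < t) for t in range(7)])
tree = build_algorithmic_tree(X, y, classifiers)
assert all(classify(tree, x) == t for x, t in zip(X, y))
```

In this sketch, the structural complexity mentioned above would correspond to counting the vertices and tiers of the resulting chain; a bound such as `max_depth` plays the role of a fixed constraint on the tree structure.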
Results. The developed method of building algorithmic tree models works with large training samples of different types of information (discrete data), provides high speed and economical use of hardware resources while generating the final classification scheme, and builds classification trees with predetermined accuracy.
Conclusions. An approach to the synthesis of new recognition algorithms (schemes) based on a library (set) of already known algorithms (methods) and schemes has been developed: an effective scheme for recognizing discrete objects based on step-by-step evaluation and selection of classification algorithms (generalized features) at each step of the scheme synthesis. Based on the suggested concept of algorithmic classification trees, an algorithmic tree model was built that classifies flood situations for the Uzh river basin.
Copyright (c) 2022 І. Ф. Повхан, О. В. Міца, О. Ю. Мулеса, В. В. Поліщук
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.