THE FRACTAL DIMENSION BASED QUALITY METRICS OF DATA SAMPLES AND DEPENDENCE MODELS
Keywords: sample, fractal dimension, quality metric, cluster, sample formation.
Context. The paper addresses the problem of automating the formation of samples from a large original sample for building models from precedents. The object of the study is a model of sample quality for building precedent-based models.
Objective. The goal of the work is to create a set of indicators, based on the principles of fractal analysis, for assessing the quality of samples of the same nature.
Method. A set of indicators is proposed that characterizes, from a unified point of view based on the principles of fractal analysis, the quality of a subsample with respect to the original sample. Methods for evaluating the fractal dimension of a sample are proposed. They operate with rectangular blocks of equal size that cover the feature space. Three variants are given: a method that does not take the characteristics of the synthesized model into account, a method that takes the error (accuracy) of the synthesized model into account, and a method that takes both the accuracy and the complexity of the synthesized model into account. Along with the fractal dimension, a method is provided for determining sample quality indicators based on the principle of mass dimension as applied to data analysis. The proposed method divides the feature space into clusters of the same size and shape, and varying the cluster size yields different levels of sample detail. The method makes it possible to determine the class centers of mass in the sample; the average distance between instances of a cluster; the normalized mean deviation of the distances between instances from their average; the mass and density of the instances of a cluster; the volume and surface area of a rectangular cluster and the ratio of its volume to its surface area; the weighted average evenness of instance location in the clusters of a class; the mass and density of the instances of a class; and the weighted average of the location of sample instances.
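The block-covering idea described above can be illustrated with the classical box-counting estimate. The following is a minimal sketch under stated assumptions, not the paper's own formulae: features are min-max scaled to [0, 1], each axis is cut into k equal parts, and the dimension is taken as the least-squares slope of log N(k) against log k, where N(k) is the number of occupied blocks. The function names (normalize, cell_masses, box_counting_dimension) and the default grid sizes are hypothetical choices made for illustration. The per-block instance counts returned by cell_masses correspond to the "mass" of a cluster in the sense used above.

```python
import math
from collections import Counter


def normalize(points):
    """Min-max scale every feature to [0, 1]; constant features map to 0."""
    dims = len(points[0])
    lo = [min(p[j] for p in points) for j in range(dims)]
    hi = [max(p[j] for p in points) for j in range(dims)]
    return [
        tuple((p[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
              for j in range(dims))
        for p in points
    ]


def cell_masses(unit_points, k):
    """Count instances per occupied block when each axis is cut into k parts.

    Expects points already scaled to [0, 1]; the value 1.0 is assigned
    to the last block on its axis.
    """
    return Counter(
        tuple(min(int(x * k), k - 1) for x in p) for p in unit_points
    )


def box_counting_dimension(points, ks=(2, 4, 8, 16)):
    """Least-squares slope of log N(k) versus log k (assumed grid sizes ks)."""
    unit = normalize(points)
    xs = [math.log(k) for k in ks]
    ys = [math.log(len(cell_masses(unit, k))) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic data: instances lying on a diagonal line in a 2-D feature space.
    line = [(i / 999, i / 999) for i in range(1000)]
    # Synthetic data: instances filling the unit square on a regular grid.
    square = [(i / 31, j / 31) for i in range(32) for j in range(32)]
    print(round(box_counting_dimension(line), 3))    # close to 1
    print(round(box_counting_dimension(square), 3))  # close to 2
```

For real samples the estimate depends on the chosen grid sizes and on the number of instances, so the values of ks should be treated as a tuning parameter rather than a fixed setting.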
Results. The developed indicators have been implemented in software and investigated on the Fisher's Iris classification problem.
Conclusions. The experiments have confirmed that the proposed software is operable and allow it to be recommended for practical use in solving feature-based diagnosis and automatic classification problems. Prospects for further research include creating parallel methods for calculating the proposed set of indicators, optimizing their software implementations, and experimentally studying the proposed indicators on more complex practical problems of different nature and dimensionality.
Copyright (c) 2017 S. A. Subbotin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.