THE FRACTAL DIMENSION BASED QUALITY METRICS OF DATA SAMPLES AND DEPENDENCE MODELS

S. A. Subbotin

Abstract


Context. The paper addresses the problem of automating the formation of subsamples from a large original sample for building models from precedents. The object of the study is a model of sample quality for building models from precedents.

Objective. The goal of the work is the creation of a set of indicators to assess the quality of samples having a single nature, based on the principles of fractal analysis.

Method. A set of indicators is proposed that characterizes the quality of a subsample relative to the original sample from a unified point of view, based on the principles of fractal analysis. Methods for evaluating the fractal dimension of a sample are proposed. They operate with rectangular blocks of equal size that cover the feature space. They comprise a method that does not take into account the characteristics of the synthesized model, a method that takes into account the error (accuracy) of the synthesized model, and a method that takes into account both the accuracy and the complexity of the synthesized model. Along with the fractal dimension, a method is provided for determining sample quality indicators based on the principle of mass dimension as applied to data analysis. The proposed method divides the feature space into clusters of the same size and shape, and varying the cluster size yields different levels of sampling detail. The method makes it possible to determine the mass of the class center in the sample; the average distance between instances of the cluster; the normalized mean deviation of the distances between instances from their average; the mass and density of the instances of the cluster; the volume and surface area of a rectangular cluster; the ratio of the volume to the surface area of the cluster; the weighted average evenness of instance location in the clusters of a class; the mass and density of the instances of the class; and the weighted average of the sample instance location.
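As an illustration of the block-covering idea, the following is a minimal sketch of a generic box-counting estimate of a sample's fractal dimension. It is not the author's exact indicator set: the function names `occupied_cells` and `box_counting_dimension`, the choice of grid resolutions, and the least-squares fit are all assumptions made for this example. Each feature range is split into the same number of equal parts, so the blocks are rectangular and of equal size, and the dimension is taken as the slope of log N(k) against log k, where N(k) is the number of blocks containing at least one instance.

```python
import math
from collections import Counter


def occupied_cells(points, cells_per_axis):
    """Count instances per block when every feature range is split into
    `cells_per_axis` equal parts (hypothetical helper for this sketch)."""
    dims = len(points[0])
    lows = [min(p[j] for p in points) for j in range(dims)]
    highs = [max(p[j] for p in points) for j in range(dims)]
    counts = Counter()
    for p in points:
        key = []
        for j in range(dims):
            span = highs[j] - lows[j]
            # A constant feature collapses to a single block along that axis.
            idx = 0 if span == 0 else int((p[j] - lows[j]) / span * cells_per_axis)
            # The maximum value falls on the upper edge; keep it in the last block.
            key.append(min(idx, cells_per_axis - 1))
        counts[tuple(key)] += 1
    return counts


def box_counting_dimension(points, resolutions=(2, 4, 8, 16)):
    """Least-squares slope of log N(k) versus log k over the given resolutions."""
    xs = [math.log(k) for k in resolutions]
    ys = [math.log(len(occupied_cells(points, k))) for k in resolutions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Instances lying on a diagonal line in a two-dimensional feature space.
diagonal = [(i / 1000, i / 1000) for i in range(1001)]
print(box_counting_dimension(diagonal))
```

For the diagonal sample the estimate is close to 1, while a sample that fills the plane evenly gives a value close to 2. The per-block counts returned by `occupied_cells` could serve as a starting point for mass-style indicators, although the exact definitions used in the paper are not given in the abstract.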

Results. The developed indicators have been implemented in software and investigated on the Fisher's Iris classification problem.

Conclusions. The conducted experiments have confirmed the operability of the proposed software and allow recommending it for practical use in solving problems of diagnosis and automatic classification by features. Prospects for further research include the creation of parallel methods for calculating the set of proposed indicators, the optimization of their software implementations, and an experimental study of the proposed indicators on more complex practical problems of different nature and dimensionality.


Keywords


Sample; fractal dimension; quality metric; cluster; sample formation.








DOI: https://doi.org/10.15588/1607-3274-2017-2-8



Copyright (c) 2017 S. A. Subbotin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
Zaporizhzhya National Technical University, 
Zhukovskiy street, 64, Zaporizhzhya, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.