EVALUATION OF INFORMATIVITY AND SELECTION OF INSTANCES BASED ON HASHING

S. A. Subbotin

doi:10.15588/1607-3274-2020-3-12

Authors

S. A. Subbotin, National University “Zaporizhzhia Polytechnic”, Zaporizhzhia, Ukraine

Keywords:

Instance, attribute, informativeness, hashing, hash, sample size reduction.

Abstract

Context. When constructing diagnostic and recognition models, reducing the data dimensionality requires selecting both the most informative instances and the most informative features. Performing these two procedures separately is time-consuming because they are iterative and interdependent.

Objective. The purpose of this work is to reduce the time spent on data dimensionality reduction by creating a hashing-based method for selecting the most informative instances.

Method. A method is proposed for calculating the weights used to compute instance hashes. It determines the feature weights deterministically from the feature ranks; the ranks, in turn, are determined from the number of equal partitions of each feature's range, taken as the minimum number sufficient to distinguish clusters along the feature axis with acceptable accuracy. This eliminates the need to iteratively enumerate combinations of features, to generate random projections of features, or to solve iterative optimization problems in search of the best projection. As a result, the time spent on calculating the weights is significantly reduced, while the local sensitivity of the hash is preserved. The resulting hashes can be used for both instance selection and feature selection.
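The abstract does not give the weight formula, so the following Python sketch shows only one possible reading of the idea. It assumes that the number of partitions per feature is already known, that features with fewer partitions receive a higher rank, and that the weights are positional (mixed-radix) multipliers, so the hash is a weighted sum of interval indices. The names `quantize`, `rank_weights` and `instance_hash` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def quantize(value: float, lo: float, hi: float, parts: int) -> int:
    """Return the index of the equal-width interval of [lo, hi] holding value."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * parts)
    # Clamp so that out-of-range values and value == hi map to a valid interval.
    return min(max(idx, 0), parts - 1)


def rank_weights(parts: Sequence[int]) -> List[int]:
    """Assumed rule: fewer partitions -> higher rank -> larger positional weight."""
    # Visit features from the lowest rank (most partitions) to the highest.
    order = sorted(range(len(parts)), key=lambda j: (-parts[j], j))
    weights = [0] * len(parts)
    radix = 1
    for j in order:
        weights[j] = radix
        radix *= parts[j]
    return weights


def instance_hash(x: Sequence[float], lows: Sequence[float],
                  highs: Sequence[float], parts: Sequence[int],
                  weights: Sequence[int]) -> int:
    """Weighted sum of interval indices; computed in one pass, no iteration."""
    return sum(w * quantize(v, lo, hi, k)
               for v, lo, hi, k, w in zip(x, lows, highs, parts, weights))


# Illustrative data: two features on [0, 1] with 4 and 2 partitions.
lows, highs, parts = [0.0, 0.0], [1.0, 1.0], [4, 2]
weights = rank_weights(parts)
near_a = instance_hash([0.1, 0.1], lows, highs, parts, weights)
near_b = instance_hash([0.3, 0.1], lows, highs, parts, weights)
far = instance_hash([0.9, 0.9], lows, highs, parts, weights)
```

Under these assumptions the weights are obtained in a single deterministic pass, and instances that fall into neighbouring intervals of a low-ranked feature receive close hash values.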

A method for determining the individual and group significance of sample instances is proposed. It uses the distance between instance hashes as a similarity measure and, by analogy with the potential method, finds the potentials induced by the classes at each instance. From these potentials it derives instance significance indicators, on the premise that an instance in the feature space is the more informative, the smaller the minimum difference between the class potentials induced on it.
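A minimal sketch of the individual significance estimate is given below. The potential function 1 / (1 + d²), the mapping of the potential gap to a score 1 / (1 + gap), and the function names are assumptions made for illustration; the abstract specifies only that distances between hashes serve as the similarity measure and that a smaller minimum difference of class potentials means a more informative instance. Group significance is not covered.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def class_potentials(hashes: Sequence[int], labels: Sequence[Hashable],
                     i: int) -> Dict[Hashable, float]:
    """Potential induced at instance i by every class (instance i itself excluded)."""
    pots = {c: 0.0 for c in set(labels)}
    for j, (h, c) in enumerate(zip(hashes, labels)):
        if j != i:
            # Assumed potential function: decays with the squared hash distance.
            pots[c] += 1.0 / (1.0 + (hashes[i] - h) ** 2)
    return pots


def significance(hashes: Sequence[int], labels: Sequence[Hashable], i: int) -> float:
    """Larger when the smallest gap between two class potentials is smaller."""
    pots = class_potentials(hashes, labels, i)
    gap = min(abs(a - b) for a, b in combinations(pots.values(), 2))
    return 1.0 / (1.0 + gap)  # assumed mapping of the gap to a score in (0, 1]


def select_instances(hashes: Sequence[int], labels: Sequence[Hashable],
                     k: int) -> List[int]:
    """Indices of the k instances with the highest significance."""
    scores = [significance(hashes, labels, i) for i in range(len(hashes))]
    return sorted(range(len(hashes)), key=lambda i: -scores[i])[:k]


# Illustrative data: two classes (at least two are required) on the hash axis.
hashes = [0, 1, 4, 5]
labels = ["a", "a", "b", "b"]
selected = select_instances(hashes, labels, 2)
```

In this toy sample the two instances closest to the class boundary (hashes 1 and 4) are selected, which matches the intuition that boundary instances carry the most information for separating classes.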

A method for estimating the informativeness of features is proposed. It normalizes the weights obtained during hash formation to produce feature informativeness indicators, giving preference to features with a smaller number of partitions.
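The normalization step can be illustrated as follows. The sketch assumes sum-to-one normalization and takes hypothetical hash weights as input; the exact normalization used in the paper is not stated in the abstract.

```python
from typing import List, Sequence


def feature_informativeness(weights: Sequence[float]) -> List[float]:
    """Scale hash weights so that the informativeness indicators sum to 1."""
    total = float(sum(weights))
    if total == 0.0:
        # Degenerate case: no weight information, treat all features equally.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical weights for two features with 4 and 2 partitions,
# where the feature with fewer partitions received the larger weight.
scores = feature_informativeness([1, 4])
```

With these illustrative weights, the feature with fewer partitions receives the larger share of the total, consistent with the stated preference.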

Results. An experimental study confirmed the efficiency of the proposed methods in solving practical problems.

Conclusions. The developed software can be recommended for solving data dimensionality reduction problems.

Author Biography

S. A. Subbotin, National University “Zaporizhzhia Polytechnic”, Zaporizhzhia

Dr. Sc., Professor, Head of the Department of Software Tools
