THE POLAR COORDINATES BASED HASHING FOR DATA DIMENSIONALITY REDUCTION
Keywords:Hashing, hash, sample size reduction, polar coordinates.
Context. To reduce the data dimensionality of in recognition and diagnostics problems based on hashing, it becomes necessary to reduce the time spent on generating a hashing transformation.
Objective. The purpose of the work is to reduce the time spent on reducing the dimension of data by creating a hashing method that does not require solving the optimization problem of finding the best random transformation, as well as reducing the loss of local properties of the feature space.
Method. A hash generation method is proposed. It converts the instance coordinates from the original feature system into a multidimensional polar coordinate system, on which basis discretize polar coordinates using heuristics, in various ways encodes and combines the values of the discretized polar coordinates, forming hashes of instances, from which as the resulting transformation selects the best one in the system of given criteria based on minimizing the number of collisions in which instances of different classes and different values of the original features receive the same hashes. This makes possible to automate the formation of hashing transformations, eliminate the need to solve optimization problems of enumerating random projections, ensuring a reduction in time consumption, and also makes the hashing transformation freer from imposing the data on the partitioning of the feature space, of a non-inherent nature, which allows increase the generalizing properties and accuracy of transformations. Criteria for evaluating the quality of hashing transformations are proposed, including determining the number of positive and negative collisions, as well as evaluating the probabilities of the corresponding collisions on their basis. This makes it possible to automate the analysis and selection of hashing transformations to reduce the dimension of the data in the problems of recognition and diagnosis.
Results. An experimental study has been carried out, which has confirmed the efficiency of the proposed methods in solving practical problems.
Conclusions. The developed mathematical support can be recommended for solving problems of data dimension reduction.
Subbotin S., Oliinyk A. Eds.: R. Szewczyk, M. Kaliczyńska The Dimensionality Reduction Methods Based on Computational Intelligence in Problems of Object Classification and Diagnosis, Recent Advances in Systems, Control and Information Technology. Cham, Springer, 2017, pp. 11–19. DOI: 10.1007/978-3-319-48923-0_2
Jensen R., Shen Q. Computational intelligence and feature selection: rough and fuzzy approaches. Hoboken, John Wiley & Sons, 2008, 300 p.
Subbotin S. The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis, Applications of Computational Intelligence in Biomedical Technology. Cham, Springer, 2016, pp. 215–228. DOI: 10.1007/978-3-319-19147-8_13
Łukasik S., Kulczycki P. An algorithm for sample and data dimensionality reduction using fast simulated annealing, Advanced Data Mining and Applications, Lecture Notes in Computer Science. Berlin, Springer, 2011, Vol. 7120, pp. 152–161. DOI: 10.1007/978-3-642-25853-4_12
Weinberger K., Dasgupta A., Langford J., Smola A., Attenberg J. Feature Hashing for Large Scale Multitask Learning, 26th Annual International Conference on Machine Learning (ICML '09) Montreal, June 2009 : proceedings. New York, ACM, 2009, pp. 1113–1120. DOI: 10.1145/1553374.1553516
Wolfson H. J., Rigoutsos I. Geometric Hashing: An Overview, IEEE Computational Science and Engineering, 1997, Vol. 4, № 4, pp. 10–21.
Indyk P., Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality, The 30th annual ACM symposium on Theory of computing (STOC'98), Dallas, 23–26 of May 1998 : proceedings, 1998, pp. 604– 613. DOI:10.1145/276698.276876
Gui J., Liu T., Sun Z., Tao D., Tan T. Fast supervised discrete hashing, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, Vol. 40, No. 2, pp. 490–496. DOI: 10.1109/TPAMI.2017.2678475
Zhao K., Lu H., Mei J. Locality Preserving Hashing, Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14), Québec, 27–31 of July 2014 : proceedings, Palo Alto, AAAI Press, 2014, pp. 2874–2880.
Tsai Y.-H., Yang M.-H. Locality preserving hashing, 2014 IEEE International Conference on Image Processing (ICIP), Paris, 27–30 of October 2014: proceedings. Los Alamitos: IEEE, 2014, pp. 2988–2992. DOI: 10.1109/ICIP.2014.7025604.
Subbotin S. A. Otsenka informativnosti i otbor ekzemplyarov na osnove kheshirovaniya, Radio Electronics, Computer Science, Control, 2020, No. 3. pp. 129–137. DOI: 10.15588/1607-3274-2020-3-12
Blumenson L. E. A Derivation of n-Dimensional Spherical Coordinates, The American Mathematical Monthly, 1960, Vol. 67, No. 1, pp. 63–66. DOI:10.2307/2308932. JSTOR 2308932.
Methods and characteristics of locality-preserving transformations in the problems of computational intelligence, Radio Electronics, Computer Science, Control, 2014, No. 1, pp. 120–128. DOI: 10.15588/1607-3274-2014-1-17
Fisher Iris dataset [Electronic resource]. Access mode: https://archive.ics.uci.edu/ml/datasets/Iris
Arrhythmia dataset [Electronic resource]. Access mode: http://archive.ics.uci.edu/ml/datasets/Arrhythmia
Acute inflammations data set [Electronic resource]. Access mode: https://archive.ics.uci.edu/ml/datasets/ Acute+Inflammations
How to Cite
Copyright (c) 2020 S. A. Subbotin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows to reuse and remixing of its content, in accordance with a Creative Commons license СС BY -SA.
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.