DOI: https://doi.org/10.15588/1607-3274-2020-4-12

THE POLAR COORDINATES BASED HASHING FOR DATA DIMENSIONALITY REDUCTION

S. A. Subbotin

Abstract


Context. To reduce the data dimensionality of in recognition and diagnostics problems based on hashing, it becomes necessary to reduce the time spent on generating a hashing transformation.

Objective. The purpose of the work is to reduce the time spent on reducing the dimension of data by creating a hashing method that does not require solving the optimization problem of finding the best random transformation, as well as reducing the loss of local properties of the feature space.

Method. A hash generation method is proposed. It converts the instance coordinates from the original feature system into a multidimensional polar coordinate system, on which basis discretize polar coordinates using heuristics, in various ways encodes and combines the values of the discretized polar coordinates, forming hashes of instances, from which as the resulting transformation selects the best one in the system of given criteria based on minimizing the number of collisions in which instances of different classes and different values of the original features receive the same hashes. This makes possible to automate the formation of hashing transformations, eliminate the need to solve optimization problems of enumerating random projections, ensuring a reduction in time consumption, and also makes the hashing transformation freer from imposing the data on the partitioning of the feature space, of a non-inherent nature, which allows increase the generalizing properties and accuracy of transformations. Criteria for evaluating the quality of hashing transformations are proposed, including determining the number of positive and negative collisions, as well as evaluating the probabilities of the corresponding collisions on their basis. This makes it possible to automate the analysis and selection of hashing transformations to reduce the dimension of the data in the problems of recognition and diagnosis.

Results. An experimental study has been carried out, which has confirmed the efficiency of the proposed methods in solving practical problems.

Conclusions. The developed mathematical support can be recommended for solving problems of data dimension reduction. 


Keywords


Hashing, hash, sample size reduction, polar coordinates.

References


Subbotin S., Oliinyk A. Eds.: R. Szewczyk, M. Kaliczyńska The Dimensionality Reduction Methods Based on Computational Intelligence in Problems of Object Classification and Diagnosis, Recent Advances in Systems, Control and Information Technology. Cham, Springer, 2017, pp. 11–19. DOI: 10.1007/978-3-319-48923-0_2

Jensen R., Shen Q. Computational intelligence and feature selection: rough and fuzzy approaches. Hoboken, John Wiley & Sons, 2008, 300 p.

Subbotin S. The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis, Applications of Computational Intelligence in Biomedical Technology. Cham, Springer, 2016, pp. 215–228. DOI: 10.1007/978-3-319-19147-8_13

Łukasik S., Kulczycki P. An algorithm for sample and data dimensionality reduction using fast simulated annealing, Advanced Data Mining and Applications, Lecture Notes in Computer Science. Berlin, Springer, 2011, Vol. 7120, pp. 152–161. DOI: 10.1007/978-3-642-25853-4_12

Weinberger K., Dasgupta A., Langford J., Smola A., Attenberg J. Feature Hashing for Large Scale Multitask Learning, 26th Annual International Conference on Machine Learning (ICML '09) Montreal, June 2009 : proceedings. New York, ACM, 2009, pp. 1113–1120. DOI: 10.1145/1553374.1553516

Wolfson H. J., Rigoutsos I. Geometric Hashing: An Overview, IEEE Computational Science and Engineering, 1997, Vol. 4, № 4, pp. 10–21.

Indyk P., Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality, The 30th annual ACM symposium on Theory of computing (STOC'98), Dallas, 23–26 of May 1998 : proceedings, 1998, pp. 604– 613. DOI:10.1145/276698.276876

Gui J., Liu T., Sun Z., Tao D., Tan T. Fast supervised discrete hashing, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, Vol. 40, No. 2, pp. 490–496. DOI: 10.1109/TPAMI.2017.2678475

Zhao K., Lu H., Mei J. Locality Preserving Hashing, Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14), Québec, 27–31 of July 2014 : proceedings, Palo Alto, AAAI Press, 2014, pp. 2874–2880.

Tsai Y.-H., Yang M.-H. Locality preserving hashing, 2014 IEEE International Conference on Image Processing (ICIP), Paris, 27–30 of October 2014: proceedings. Los Alamitos: IEEE, 2014, pp. 2988–2992. DOI: 10.1109/ICIP.2014.7025604.

Subbotin S. A. Otsenka informativnosti i otbor ekzemplyarov na osnove kheshirovaniya, Radio Electronics, Computer Science, Control, 2020, No. 3. pp. 129–137. DOI: 10.15588/1607-3274-2020-3-12

Blumenson L. E. A Derivation of n-Dimensional Spherical Coordinates, The American Mathematical Monthly, 1960, Vol. 67, No. 1, pp. 63–66. DOI:10.2307/2308932. JSTOR 2308932.

Methods and characteristics of locality-preserving transformations in the problems of computational intelligence, Radio Electronics, Computer Science, Control, 2014, No. 1, pp. 120–128. DOI: 10.15588/1607-3274-2014-1-17

Fisher Iris dataset [Electronic resource]. Access mode: https://archive.ics.uci.edu/ml/datasets/Iris

Arrhythmia dataset [Electronic resource]. Access mode: http://archive.ics.uci.edu/ml/datasets/Arrhythmia

Acute inflammations data set [Electronic resource]. Access mode: https://archive.ics.uci.edu/ml/datasets/ Acute+Inflammations


GOST Style Citations


1. Subbotin S. The Dimensionality Reduction Methods Based on Computational Intelligence in Problems of Object Classification and Diagnosis / S. Subbotin, A. Oliinyk // Recent Advances in Systems, Control and Information Technology / Eds.: R. Szewczyk, M. Kaliczyńska .  – Cham: Springer, 2017. – P. 11–19. DOI: 10.1007/978-3-319-48923-0_2

2. Jensen R. Computational intelligence and feature selection: rough and fuzzy approaches / R. Jensen, Q. Shen. – Hoboken: John Wiley & Sons, 2008. – 300 p.

3. Subbotin S. The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis / / S. Subbotin // Applications of Computational Intelligence in Biomedical Technology. – Cham: Springer, 2016. – P. 215– 228. DOI: 10.1007/978-3-319-19147-8_13

4. Łukasik S. An algorithm for sample and data dimensionality reduction using fast simulated annealing / S. Łukasik, P. Kulczycki // Advanced Data Mining and Applications, Lecture Notes in Computer Science. – Berlin : Springer, 2011. – Vol. 7120. – P. 152–161. DOI: 10.1007/978-3-64225853-4_12

5. Feature Hashing for Large Scale Multitask Learning / [K. Weinberger, A. Dasgupta, J. Langford et al.] // 26th Annual International Conference on Machine Learning (ICML '09) Montreal, June 2009 : proceedings. – New York : ACM, 2009. – P. 1113–1120. DOI: 10.1145/1553374.1553516

6. Wolfson H. J. Geometric Hashing: An Overview / H. J. Wolfson, I. Rigoutsos // IEEE Computational Science and Engineering. – 1997. – Vol. 4. – № 4. – P. 10–21.

7. Indyk P. Approximate nearest neighbors: towards removing the curse of dimensionality / P. Indyk; R. Motwani // The 30th annual ACM symposium on Theory of computing (STOC'98), Dallas, 23-26 of May 1998 : proceedings. – 1998. — P. 604-613. DOI:10.1145/276698.276876

8. Fast supervised discrete hashing / [J. Gui, T. Liu, Z. Sun et al.] // IEEE Transactions on Pattern Analysis and Machine Intelligence. – 2017. – Vol. 40. – № 2. – P 490–496. DOI: 10.1109/TPAMI.2017.2678475

9. Zhao K. Locality Preserving Hashing / K. Zhao, H. Lu, J. Mei // Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14), Québec, 27–31 of July 2014 : proceedings. – Palo Alto: AAAI Press, 2014. – P. 2874–2880.

10. Tsai Y.-H. Locality preserving hashing / Y.-H. Tsai,  M.-H. Yang // 2014 IEEE International Conference on Image Processing (ICIP), Paris, 27–30 of October 2014: proceedings. – Los Alamitos: IEEE, 2014. – P. 2988–2992. DOI: 10.1109/ICIP.2014.7025604. 

11. Субботин С. А. Оценка информативности и отбор экземпляров на основе хэширования / С. А. Субботин // Radio Electronics, Computer Science, Control. – 2020. – № 3. – P. 129–137. DOI: 10.15588/1607-3274-2020-3-12 

12. Blumenson L. E. A Derivation of n-Dimensional Spherical Coordinates / L. E. Blumenson // The American Mathematical Monthly. – 1960. – Vol. 67, № 1. – P. 63–66. DOI:10.2307/2308932. JSTOR 2308932.

13. Subbotin S. A. Methods and characteristics of localitypreserving transformations in the problems of computational intelligence / S. A. Subbotin // Radio Electronics, Computer Science, Control. – 2014. – № 1. – P. 120–128. DOI: 10.15588/1607-3274-2014-1-17 

14. Fisher Iris dataset [Electronic resource]. – Access mode: https://archive.ics.uci.edu/ml/datasets/Iris

15. Arrhythmia dataset [Electronic resource]. – Access mode: http://archive.ics.uci.edu/ml/datasets/Arrhythmia

16. Acute inflammations data set [Electronic resource]. – Access mode: https://archive.ics.uci.edu/ml/datasets/Acute+ Inflammations







Copyright (c) 2020 S. A. Subbotin

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
National University "Zaporizhzhia Polytechnic", 
Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.