ARCHITECTURE AND TRAINING ALGORITHM FOR NEURAL NETWORK TO RECOGNIZE VOICE SIGNALS
DOI: https://doi.org/10.15588/1607-3274-2020-3-9

Keywords: voice interface, audio signal, signal amplitude, spectrogram, neural network, training set, standard deviation.

Abstract
Context. Typically, users interact with mobile devices by touch. However, there are many situations in which such interaction is awkward or impossible. For example, some diseases of the musculoskeletal system impair motor control, making it difficult to use a device efficiently. In such cases, the search for alternative modes of person-device interaction becomes relevant, and voice interface development is among the most promising directions.
Objective. The goal of the study is to develop a neural network architecture and internal components for voice-controlled systems. The resulting interface must be adapted for processing and recognizing Ukrainian speech.
Method. To make the data received via a microphone suitable for processing, the audio signal is analyzed by its sound wave shape and spectrogram. A neural network then classifies sounds using the generated audio signal together with information about its transcription. The network structure is fully adapted to the peculiarities of Ukrainian phonetics: it accounts for the nature of the sound wave generated when a sound is pronounced, as well as for the number of sounds in Ukrainian phonetics.
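As an illustration of the spectrogram step described above, a minimal sketch of a short-time magnitude spectrogram is given below. This is not the authors' implementation; the frame length (256 samples), hop size (128 samples), and Hann window are assumed values chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier magnitude spectrogram of a 1-D signal."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # One-sided FFT magnitude per frame: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# Example: 1 s of a 440 Hz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=0).argmax()  # strongest frequency bin
peak_hz = peak_bin * fs / 256          # bin index -> Hz
```

For a pure tone, the strongest bin of the averaged spectrum lands near the tone's frequency, which is the kind of frequency-domain feature a classifier can learn from.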
Results. Experiments were carried out to select the optimal neural network architecture and training sample size. The root-mean-square deviation of the network error was used as the main criterion for assessing its effectiveness. A comparative analysis of the proposed neural network against speech recognition tools already on the market showed an improvement in relative recognition measures of 9.26%.
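The root-mean-square deviation criterion mentioned above can be sketched as follows; the validation targets and network outputs here are illustrative values, not data from the study.

```python
import numpy as np

def rmse(predicted, target):
    """Root-mean-square deviation between network outputs and targets."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.sqrt(np.mean((predicted - target) ** 2))

# Example: compare two candidate networks on the same validation targets
targets = np.array([1.0, 0.0, 1.0, 1.0])
net_a   = np.array([0.9, 0.1, 0.8, 0.7])
net_b   = np.array([0.6, 0.4, 0.5, 0.9])
```

The network whose outputs yield the smaller RMSE on held-out data would be preferred under this criterion.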
Conclusions. The results obtained in this research can be used to implement a full-featured voice interface. Although the work focuses on recognizing Ukrainian speech, the proposed ideas can also be applied when developing transcription services for other languages.
License
Copyright (c) 2020 V. S. Molchanova, D. S. Mironenko
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with a Creative Commons CC BY-SA license.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.