ARCHITECTURE AND TRAINING ALGORITHM FOR NEURAL NETWORK TO RECOGNIZE VOICE SIGNALS
DOI: https://doi.org/10.15588/1607-3274-2020-3-9

Keywords: voice interface, audio signal, signal amplitude, spectrogram, neural network, training set, standard deviation.

Abstract
Context. Typically, users interact with mobile devices by touch. However, there are many situations in which such interaction is awkward or impossible. For example, some diseases of the musculoskeletal system impair motor control, making it difficult to use a device efficiently. In such cases, the search for alternative modes of person-device interaction becomes relevant, and voice interface development is among the most promising directions.
Objective. The goal of the study is to develop a neural network architecture and internal components for voice-controlled systems. The resulting interface must be adapted for processing and recognizing Ukrainian speech.
Method. To make the data received via a microphone suitable for processing, the audio signal is analyzed by its sound wave shape and spectrogram. A neural network then classifies sounds using the generated audio signal together with information about its transcription. The network structure is fully adapted to the peculiarities of Ukrainian phonetics: it accounts for the nature of the sound wave generated when a sound is pronounced, as well as for the number of sounds in Ukrainian phonetics.
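As an illustration of the spectrogram step described above, a minimal sketch of a short-time magnitude spectrogram is given below. This is not the authors' implementation; the frame length (256 samples), hop size (128 samples), and Hann window are assumed values chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier magnitude spectrogram of a 1-D signal."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # One-sided FFT magnitude per frame: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# Example: 1 s of a 440 Hz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=0).argmax()  # strongest frequency bin
peak_hz = peak_bin * fs / 256          # bin index -> Hz
```

For a pure tone, the strongest bin of the averaged spectrum lands near the tone's frequency, which is the kind of frequency-domain feature a classifier can learn from.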
Results. Experiments were carried out to select the optimal neural network architecture and training sample size. The root-mean-square deviation of the network error was used as the main criterion for assessing its effectiveness. A comparative analysis of the proposed neural network against speech recognition tools already on the market showed an improvement in relative recognition measures of 9.26%.
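The root-mean-square deviation criterion mentioned above can be sketched as follows; the validation targets and network outputs here are illustrative values, not data from the study.

```python
import numpy as np

def rmse(predicted, target):
    """Root-mean-square deviation between network outputs and targets."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.sqrt(np.mean((predicted - target) ** 2))

# Example: compare two candidate networks on the same validation targets
targets = np.array([1.0, 0.0, 1.0, 1.0])
net_a   = np.array([0.9, 0.1, 0.8, 0.7])
net_b   = np.array([0.6, 0.4, 0.5, 0.9])
```

The network whose outputs yield the smaller RMSE on held-out data would be preferred under this criterion.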
Conclusions. The results obtained in this research can be used to implement a full-featured voice interface. Although the work focuses on recognizing Ukrainian speech, the proposed ideas can also be applied when developing transcription services for other languages.
License
Copyright (c) 2020 V. S. Molchanova, D. S. Mironenko
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with a Creative Commons CC BY-SA license.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.