ESTIMATION OF FORMANT INFORMATION USING AUTOCORRELATION FUNCTION OF VOICE SIGNAL
DOI: https://doi.org/10.15588/1607-3274-2024-3-12

Keywords: autocorrelation function, authentication, voice signal, speech recognition, formant information, spectrum width

Abstract
Context. The paper addresses the current scientific problem of extracting biometric characteristics of a user of a voice authentication system, characteristics that can significantly increase the system's reliability. Formant information is estimated from the voice signal; it forms part of the user template in a voice authentication system and is widely used in speech-signal processing in other applications, including in the presence of interfering noise components. The work is distinguished by its investigation of a polyharmonic signal.
Objective. The purpose of the work is to develop procedures for generating formant information by calculating the autocorrelation function of the analyzed fragment of the voice signal and then spectrally analyzing the result.
Method. Procedures for generating formant information during digital processing of a voice signal are proposed. First, the autocorrelation function of the analyzed fragment of the voice signal is calculated. From this estimate, the amplitude-frequency spectrum is computed, and the formant information is extracted from it, for example by threshold processing. When the signal-to-noise ratio of the analyzed fragment is low, it is advisable to calculate the autocorrelation function iteratively: each iteration increases the signal-to-noise ratio and thus the efficiency of formant extraction. However, each subsequent iteration also increases the required computational resources, since the amount of processed data doubles at every iteration.
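The pipeline described above (autocorrelation, then amplitude spectrum, then threshold extraction of formant candidates) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the relative-threshold value, and the synthetic two-component test signal are all assumptions made for the example.

```python
import numpy as np

def formants_via_acf(x, fs, iterations=1, rel_threshold=0.3):
    """Estimate candidate formant frequencies from a signal fragment.

    Computes the autocorrelation of x (optionally iterating the
    autocorrelation to suppress noise, as the paper suggests for low
    SNR), then thresholds the magnitude spectrum of the result and
    picks local maxima as formant candidates.
    """
    r = np.asarray(x, dtype=float)
    for _ in range(iterations):
        n = len(r)
        # FFT-based autocorrelation; zero-padding to 2n avoids circular
        # wrap-around (this doubling is the growth in data the paper notes).
        spec = np.fft.rfft(r, 2 * n)
        r = np.fft.irfft(spec * np.conj(spec))[:n]
        r /= np.max(np.abs(r))  # renormalize after each iteration
    mag = np.abs(np.fft.rfft(r))
    freqs = np.fft.rfftfreq(len(r), d=1.0 / fs)
    # Local maxima above a relative threshold become formant candidates.
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]
             and mag[i] > rel_threshold * np.max(mag)]
    return freqs[peaks]

# Synthetic fragment: two spectral components at 700 Hz and 1200 Hz
# (stand-ins for formants) plus additive white noise.
fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 700 * t) + 0.8 * np.sin(2 * np.pi * 1200 * t)
x += 0.5 * np.random.default_rng(0).normal(size=t.size)
print(formants_via_acf(x, fs, iterations=2))
```

Because the spectrum of the autocorrelation is the squared magnitude spectrum of the signal, each iteration sharpens strong spectral peaks relative to the noise floor, which is why iterating helps at low signal-to-noise ratios.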
Results. The developed procedures were investigated on both model and experimental voice signals; the model signals had a low signal-to-noise ratio. The proposed procedures determine the spectral width of the extracted formant frequencies more precisely and significantly increase the number of extracted formants, including at low signal-to-noise ratios.
Conclusions. The model experiments confirmed the performance and reliability of the proposed procedures for extracting formant information from both model and experimental voice signals. The results allow recommending their practical use for voice authentication, speaker differentiation, speech and gender recognition, intelligence, counterintelligence, forensics and forensic examination, and medicine (diseases of the vocal tract and hearing). Prospects for further research include procedures for evaluating formant information based on phase data of the processed voice signal.
Copyright (c) 2024 M. S. Pastushenko, M. O. Pastushenko, T. A. Faizulaiev
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.