ESTIMATION OF FORMANT INFORMATION USING AUTOCORRELATION FUNCTION OF VOICE SIGNAL

Authors

  • M. S. Pastushenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • M. A. Pastushenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
  • T. A. Faizulaiev, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2024-3-12

Keywords:

autocorrelation function, authentication, voice signal, speech recognition, formant information, spectrum width

Abstract

Context. The paper addresses the current scientific problem of extracting biometric characteristics of a user of a voice authentication system, which can significantly increase the system's reliability. Formant information is estimated from the voice signal; it forms part of the user template in a voice authentication system and is widely used in speech-signal processing in other applications, including in the presence of interfering noise components. The work is distinguished by the investigation of a polyharmonic signal.

Objective. The purpose of the work is to develop procedures for generating formant information based on calculating the autocorrelation function of the analyzed fragment of the voice signal and its subsequent spectral analysis.

Method. Procedures for generating formant information during digital processing of the voice signal are proposed. First, the autocorrelation function of the analyzed fragment of the voice signal is calculated. From the estimated autocorrelation function, the amplitude-frequency spectrum is computed, and the formant information is extracted from it, for example, by threshold processing. When the signal-to-noise ratio of the analyzed fragment is low, it is advisable to calculate the autocorrelation function iteratively: this increases the signal-to-noise ratio and the efficiency of formant extraction. However, each subsequent iteration of the autocorrelation function calculation requires additional computational resources, because the amount of processed data doubles at each iteration.
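As a minimal sketch of this pipeline (not the authors' implementation), the Python fragment below computes the autocorrelation function of a frame, takes the amplitude spectrum of the ACF, and selects formant candidates by threshold processing; an optional iterative ACF pass illustrates the doubling of the data volume. The frame length, sampling rate, 0.5 threshold and NumPy-based realisation are assumptions made for illustration only.

```python
# A minimal sketch, assuming NumPy and a single mono frame of a voice signal;
# it illustrates the described pipeline and is not the authors' code.
import numpy as np

def autocorrelation(frame):
    """Full autocorrelation of a frame: 2*len(frame) - 1 lags (data doubles)."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")
    return acf / np.max(np.abs(acf))            # normalise to the zero lag

def formant_candidates(frame, fs, n_iter=1, thr=0.5):
    """Frequencies (Hz) of local spectral maxima above thr * spectrum maximum."""
    x = frame
    for _ in range(n_iter):                     # iterative ACF raises the SNR,
        x = autocorrelation(x)                  # at the cost of twice the data
    spectrum = np.abs(np.fft.rfft(x))           # amplitude spectrum of the ACF
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > thr * spectrum.max()
             and spectrum[i] >= spectrum[i - 1]
             and spectrum[i] >= spectrum[i + 1]]
    return freqs[peaks]

# Usage on a synthetic noisy frame with two spectral components (700 and 1200 Hz):
fs = 8000
t = np.arange(0, 0.03, 1.0 / fs)
frame = np.sin(2 * np.pi * 700 * t) + 0.8 * np.sin(2 * np.pi * 1200 * t)
frame += 0.5 * np.random.randn(len(t))
print(formant_candidates(frame, fs, n_iter=1))
```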

Results. The developed procedures for generating formant information were investigated on both model and experimental voice signals. The model signals had a low signal-to-noise ratio. The proposed procedures make it possible to determine more precisely the width of the spectrum of the extracted formant frequencies and to significantly increase the number of extracted formants, including at low signal-to-noise ratios.
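A hedged illustration of such a model experiment is sketched below: a polyharmonic frame is mixed with white Gaussian noise at a prescribed low signal-to-noise ratio, and the width of one peak of the ACF amplitude spectrum is estimated at half of its maximum. The SNR value, the component frequencies and the half-maximum width criterion are assumptions for illustration and are not figures taken from the paper.

```python
# A minimal sketch of a low-SNR model experiment; all numeric values are
# assumptions chosen for illustration, not results reported in the paper.
import numpy as np

def add_noise(signal, snr_db):
    """Add white Gaussian noise so the frame has the requested SNR in dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + np.sqrt(p_noise) * np.random.randn(len(signal))

def peak_width_hz(spectrum, freqs, peak_idx, level=0.5):
    """Width (Hz) of a peak at `level` of its maximum (0.5 ~ -3 dB in power)."""
    threshold = level * spectrum[peak_idx]
    lo = peak_idx
    while lo > 0 and spectrum[lo] > threshold:
        lo -= 1
    hi = peak_idx
    while hi < len(spectrum) - 1 and spectrum[hi] > threshold:
        hi += 1
    return freqs[hi] - freqs[lo]

fs = 8000
t = np.arange(0, 0.03, 1.0 / fs)
clean = np.sin(2 * np.pi * 700 * t) + 0.8 * np.sin(2 * np.pi * 1200 * t)
noisy = add_noise(clean, snr_db=0)              # "low SNR" model frame
centered = noisy - noisy.mean()
acf = np.correlate(centered, centered, mode="full")
spectrum = np.abs(np.fft.rfft(acf))             # amplitude spectrum of the ACF
freqs = np.fft.rfftfreq(len(acf), d=1.0 / fs)
peak = int(np.argmax(spectrum[1:]) + 1)         # strongest non-DC component
print(round(peak_width_hz(spectrum, freqs, peak), 1), "Hz")
```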

Conclusions. The conducted model experiments confirmed the performance and reliability of the proposed procedures for extracting formant information when processing both model and experimental voice signals. The results of the research allow recommending their practical use for solving problems of voice authentication, speaker differentiation, speech and gender recognition, intelligence, counterintelligence, forensics and forensic examination, and medicine (diseases of the speech tract and hearing). Prospects for further research may include the creation of procedures for evaluating formant information based on phase data of the processed voice signal.

Author Biographies

M. S. Pastushenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

PhD, Professor, Professor of the V. V. Popovskyy Department of Infocommunication Engineering

M. A. Pastushenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine

Student of the Department of Mathematical Methods for System Analysis

T. A. Faizulaiev, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine

Student of the Department of Theoretical Mechanics, Mechanical Engineering and Robotic Systems

Published

2024-11-03

How to Cite

Pastushenko, M. S., Pastushenko, M. A., & Faizulaiev, T. A. (2024). ESTIMATION OF FORMANT INFORMATION USING AUTOCORRELATION FUNCTION OF VOICE SIGNAL. Radio Electronics, Computer Science, Control, (3), 144. https://doi.org/10.15588/1607-3274-2024-3-12

Issue

No. 3 (2024)

Section

Progressive information technologies