THE AUTOMATIC SPEAKER RECOGNITION SYSTEM OF CRITICAL USE CLASSIFIER OPTIMIZATION
DOI:
https://doi.org/10.15588/1607-3274-2018-2-4
Keywords:
automated speaker recognition system of critical use, signal processing, neural network, feature analysis
Abstract
Context. The adaptation of a convolutional neural network classifier for use in an automatic speaker recognition system of critical use (ASRSCU) is considered. The object of research is the individual features of the human speech process.
Objective. To develop means for extracting individual features from the speaker's speech signal, increasing their informativeness through factor analysis, representing them visually for use by the convolutional neural network classifier, and optimizing the classifier's architecture for the needs of the ASRSCU.
Method. Measures are proposed to optimize the speaker recognition procedure of the ASRSCU: the optimal representation of informative features and the method of increasing their informativeness are theoretically justified, as are the network topology and the measures for increasing the efficiency of the speaker recognition process. In particular, the use of power-normalized cepstral coefficients (PNCC) is justified for describing phonograms recorded in noisy environments. We propose the following: Gabor filters to represent the information analyzed by the convolutional neural network; an optimal method of factor analysis (sparse principal component analysis) to reduce the length of the feature vector while preserving its informativeness; and an improved topology of the convolutional neural network in which the Gabor filters are integrated into the convolution layer, so that their parameters are optimized during network training, and in which the fully connected part is a deep neural network with a bottleneck layer whose weights after training are used as inputs for the GMM/HMM control classifier.
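To make the Gabor filter representation mentioned in the Method more concrete, the following is a minimal sketch of a fixed (non-trainable) Gabor filter bank, written with the Python standard library only. It is not the authors' implementation: the function names `gabor_kernel` and `gabor_bank`, the kernel size, and all parameter values are assumptions chosen for illustration, and the learning of filter parameters inside the convolution layer described above is not shown.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter sampled on a size x size grid.

    All parameter values are illustrative assumptions, not taken from the paper.
    size       -- odd kernel width/height in samples
    wavelength -- period of the cosine carrier in samples
    theta      -- orientation of the carrier in radians
    sigma      -- standard deviation of the Gaussian envelope
    gamma      -- spatial aspect ratio of the envelope
    psi        -- phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre sample")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by a cosine carrier.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelengths, thetas, sigma):
    """One kernel per (wavelength, orientation) pair."""
    return [gabor_kernel(size, w, t, sigma) for w in wavelengths for t in thetas]


# Hypothetical bank: 2 wavelengths x 3 orientations = 6 kernels of 9 x 9 samples.
bank = gabor_bank(
    size=9,
    wavelengths=[4.0, 8.0],
    thetas=[0.0, math.pi / 4, math.pi / 2],
    sigma=3.0,
)
```

In a trainable variant, the wavelength, orientation, and envelope width of each kernel would be treated as parameters updated during network training rather than fixed in advance.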
Results. Methods for representing and optimizing the speaker's individual features and for presenting them visually have been developed, and the topology of a convolutional neural network that performs speaker recognition on their basis has been improved.
Conclusions. The theoretical results obtained have been confirmed empirically. In particular, the improved convolutional neural network proved more robust to noisy input phonograms than an ordinary convolutional neural network and a deep neural network. When the SNR increases to 10 dB, the GMM/HMM classifier is more efficient than the neural network, which can be explained by the efficiency of the UBM models used, but it is much more resource-intensive. In addition, the parameters of the Gabor filter bank frames that provide the most variable individual features from the speech signal for speaker recognition were determined empirically.
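As an illustration of how noisy input phonograms at a given SNR can be prepared for robustness experiments of this kind, the sketch below scales a noise signal so that the mixture has a target SNR in decibels. This is a generic procedure, not the authors' test protocol: the helper names (`power`, `snr_db`, `mix_at_snr`), the synthetic sine "speech" signal, and the Gaussian noise are assumptions made for the example.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so that signal + noise has the requested SNR.

    Returns the noisy mixture and the scaled noise.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # Solve 10*log10(P_s / (scale^2 * P_n)) = target_snr_db for scale.
    scale = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled_noise)]
    return noisy, scaled_noise


# Hypothetical data: a 220 Hz tone standing in for speech, plus Gaussian noise,
# one second at an assumed 16 kHz sampling rate.
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, scaled_noise = mix_at_snr(signal, noise, target_snr_db=10.0)
```

Repeating the mixing step over a range of target SNR values gives a set of test phonograms on which different classifiers can be compared under identical noise conditions.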
License
Copyright (c) 2018 O. V. Bisikalo, T. V. Grischuk, V. V. Kovtun
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.