DOI: https://doi.org/10.15588/1607-3274-2018-4-7

ANALYSIS OF THE AUTOMATED SPEAKER RECOGNITION SYSTEM OF CRITICAL USE OPERATION RESULTS

O. V. Bisikalo, V. V. Kovtun, M. S. Yukhimchuk, I. F. Voytyuk

Abstract


Context. The article generalizes statistical learning theory to evaluate the long-term operation results of an automated speaker recognition system of critical use (ASRSCU), taking into account the features of the system’s operating object and the structural specifics of this class of recognition systems.
Objective. The goal of the presented work is to develop a comprehensive set of methods for stabilizing the ASRSCU’s quality parameters during its long-term operation.
Method. The article formulates a set of methods for estimating the operational risks of the ASRSCU during its long-term operation. In particular, the dependence of the risk of incorrect speaker recognition on the dimension of the feature space is described. Based on the formulated measure of informativity, a set of methods is obtained for analyzing the training sample to identify examples that lead to increased risk. The influence of the drift of speech signal parameters on the quality indicators of the ASRSCU is described analytically. The duration of ASRSCU operation during which re-training its classifier is impractical is estimated. Recommendations for choosing an optimal ASRSCU classifier are formulated from the standpoint of minimizing its complexity, taking into account the risks of the ASRSCU’s long-term operation and the possibility of re-training.
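The abstract does not reproduce the article's formulas. As a minimal sketch of how the risk of misrecognition can depend on the feature space dimension, the code below assumes the classical Vapnik-Chervonenkis bound from statistical learning theory and a linear classifier, whose VC dimension is d + 1 for d features. The function names, the linear-classifier assumption, and the confidence parameter `eta` are illustrative and are not taken from the article.

```python
import math


def vc_confidence_term(h: int, n: int, eta: float = 0.05) -> float:
    """Classical VC confidence term: sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n).

    h   -- VC dimension of the classifier family
    n   -- number of training examples
    eta -- the bound holds with probability at least 1 - eta
    """
    if not 0 < h <= n:
        raise ValueError("expected 0 < h <= n")
    if not 0.0 < eta < 1.0:
        raise ValueError("expected 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def risk_upper_bound(empirical_risk: float, feature_dim: int,
                     n_samples: int, eta: float = 0.05) -> float:
    """Upper bound on the expected risk: empirical risk plus the VC term.

    Assumption: a linear classifier, so the VC dimension is feature_dim + 1.
    """
    h = feature_dim + 1
    return empirical_risk + vc_confidence_term(h, n_samples, eta)
```

Under these assumptions, the bound grows with the number of features for a fixed training sample and tightens as the sample grows, which is the qualitative trade-off behind preferring a classifier of minimal complexity.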
Results. The theoretical results presented in the article are verified against DET-curve data that summarize long-term experiments with the ASRSCU. The feature space in these experiments was configured from features based on power-normalized cepstral coefficients and features based on the theory of spectro-temporal receptive fields. Within the created theoretical framework, the influence of the feature space configuration and of the type and complexity of the classifier on the stability of the ASRSCU’s quality parameters during its long-term operation has been estimated.
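For readers unfamiliar with DET curves, the following sketch shows how their points are commonly computed from verification scores: for each threshold, the false-alarm rate over impostor trials and the miss rate over target trials, with both axes then warped by the normal-deviate (probit) transform. The score lists, function names, and the acceptance rule (accept when score >= threshold) are assumptions for illustration. They do not describe the article's experimental setup.

```python
from statistics import NormalDist


def det_points(target_scores, impostor_scores):
    """Return (threshold, false_alarm_rate, miss_rate) for every candidate threshold.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, false_alarm, miss))
    return points


def probit(p: float, eps: float = 1e-6) -> float:
    """Normal-deviate transform used for DET axes; clipped to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


def equal_error_rate(points) -> float:
    """Approximate EER: the point where false-alarm and miss rates are closest."""
    _, false_alarm, miss = min(points, key=lambda pt: abs(pt[1] - pt[2]))
    return (false_alarm + miss) / 2.0
```

Comparing such curves for score sets collected at different stages of operation is one way to observe whether the quality parameters remain stable over time.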
Conclusions. For the first time, the problem of minimizing the average risk from the empirical operation results of an ASRSCU has been analyzed theoretically. Unlike existing approaches, the analysis takes into account non-stationary input data with drift of individual speech signal features, as well as the characteristic parameters of the recognition system’s classifier, which made it possible to estimate the risk’s confidence interval for the conditions of re-training sessions.


Keywords


automated speaker recognition system of critical use; experiment planning theory; factor analysis; statistical learning theory.


GOST Style Citations


1. Ковтун В. В. Оцінювання надійності автоматизованих систем
розпізнавання мовців критичного застосування / М. М. Биков,
В. В. Ковтун // Вісник Вінницького політехнічного інституту. –
2017. – № 2. – С. 70–76.
2. Speaker verification over the telephone [Electronic resource]. –
Access mode:
https://pdfs.semanticscholar.org/cad0/bfdec3f4fb1198f63c959580d7
217d541a0f.pdf
3. Introduction to Statistical Learning Theory [Electronic resource]. –
Access mode:
http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/pd
fs/pdf2819.pdf
4. Learning deep architectures for AI [Electronic resource]. – Access
mode: https://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
5. Scaling learning algorithms towards AI [Electronic resource]. –
Access mode: http://yann.lecun.com/exdb/publis/pdf/bengio-lecun-
07.pdf
6. Learning a similarity metric discriminatively, with application to
face verification [Electronic resource]. – Access mode:
http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf
7. Jang G. Learning statistically efficient feature for speaker
recognition / G. Jang, T. Lee, Y. Oh // IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP),
7–11 May 2001: proceedings. – Salt Lake City, UT, USA: IEEE,
2002. – P. 4117–4120. DOI: 10.1109/ICASSP.2001.940861.
8. Unsupervised feature learning for audio classification using
convolutional deep belief networks [Electronic resource]. – Access
mode: http://www.robotics.stanford.edu/~ang/papers/nips09-
AudioConvolutionalDBN.pdf
9. Learning methods for generic object recognition with invariance
to pose and lighting [Electronic resource]. – Access mode:
http://yann.lecun.com/exdb/publis/pdf/lecun-04.pdf
10. Learning a nonlinear embedding by preserving class
neighbourhood structure [Electronic resource]. – Access mode:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.8635
&rep=rep1&type=pdf
11. A tutorial on Principal Components Analysis [Electronic
resource]. – Access mode:
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_com
ponents.pdf
12. Gustafson J. L. Development of parallel methods for a 1024-
processor hypercube / J. L. Gustafson, G. R. Montry, R. E.
Benner // SIAM Journal on Scientific and Statistical Computing.
– 1988. – Vol. 9, № 4. – P. 609–638.
13. Bell J. An Investigation of Alternative Cache Organizations / J.
Bell, D. Casasent, C. G. Bell // IEEE Transactions on Computers. –
1974. – Vol. C-23, № 4. – P. 346–351.
14. Sergienko I. V. Topical directions of informatics. In memory
of V. M. Glushkov / I. V. Sergienko. – New York, Heidelberg,
Dordrecht, London : Springer, 2014. – 286 p.
15. Sampling – 50 years after Shannon [Electronic resource]. – Access
mode: http://bigwww.epfl.ch/publications/unser0001.pdf
16. Mak M. W. A study of voice activity detection techniques for NIST
speaker recognition evaluations / M. W. Mak, H. B. Yu //
Computer, Speech and Language. – 2014. – Vol. 28, № 1. – P. 295–
313. DOI: 10.1016/j.csl.2013.07.003.
17. Front-end factor analysis for speaker verification [Electronic
resource]. – Access mode: http://habla.dc.uba.ar/gravano/ith-
2014/presentaciones/Dehak_et_al_2010.pdf
18. Power-normalized cepstral coefficients (PNCC) for robust
speech recognition [Electronic resource]. – Access mode:
http://www.cs.cmu.edu/~robust/Papers/OnlinePNCC_V25.pdf
19. Speech Processing with a Cortical Representation of Audio
[Electronic resource]. – Access mode:
https://pdfs.semanticscholar.org/f1d8/f93cdb64390b3a65f930cee43
46c30bd86e4.pdf
20. Using spectro-temporal features to improve AFE feature extraction
for automatic speech recognition [Electronic resource]. – Access
mode:
https://pdfs.semanticscholar.org/c7c5/04087f2107f0ea9a3cedeeaf5c
c0c48c0c92.pdf
21. Ковтун В. В. Дослідження ефективності ознак розпізнавання
мовців при використанні згортальних нейромереж / М. М.
Биков, В. В. Ковтун // Оптико-електронні інформаційно-
енергетичні технології. – 2016. – № 2 (32). – С. 22–28.







Copyright (c) 2019 O. V. Bisikalo, V. V. Kovtun, M. S. Yukhimchuk, I. F. Voytyuk

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
National University "Zaporizhzhia Polytechnic", 
Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.