ANALYSIS OF THE AUTOMATED SPEAKER RECOGNITION SYSTEM OF CRITICAL USE OPERATION RESULTS

Authors

  • O. V. Bisikalo, Vinnytsia National Technical University, Vinnytsia, Ukraine
  • V. V. Kovtun, Vinnytsia National Technical University, Vinnytsia, Ukraine
  • M. S. Yukhimchuk, Vinnytsia National Technical University, Vinnytsia, Ukraine
  • I. F. Voytyuk, Ternopil National Economic University, Ternopil, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2018-4-7

Keywords:

automated speaker recognition system of critical use, experiment planning theory, factor analysis, statistical learning theory.

Abstract

Context. The article generalizes statistical learning theory to evaluate the long-term operation results of the automated speaker recognition system of critical use (ASRSCU), taking into account the characteristics of the object the system operates on and the structural specifics of this class of recognition systems.
Objective. The goal of the presented work is to develop a comprehensive set of methods for stabilizing the ASRSCU's quality parameters during its long-term operation.
Method. The article formulates a set of methods for estimating the operational risks of the ASRSCU during its long-term operation. In particular, the dependence of the risk of incorrect speaker recognition on the dimension of the feature space is described. Based on the formulated measure of informativity, a set of methods is obtained for analyzing the training sample to identify examples that lead to increased risk. The influence of speech signal parameter drift on the quality indicators of the ASRSCU is described analytically. The duration of ASRSCU operation during which re-training its classifier is impractical is estimated. Recommendations for choosing an optimal ASRSCU classifier are formulated from the standpoint of minimizing its complexity, taking into account the risks of the ASRSCU's long-term operation and the possibility of re-training.
Results. The theoretical results presented in the article are verified against experimental DET-curve data, which summarize long-term experiments with the ASRSCU. In these experiments, the feature space configuration included features based on power normalized cepstral coefficients and features based on the theory of spectral-temporal receptive fields. Within the framework of the created theoretical concept, the influence of the feature space configuration and of the type and complexity of the classifier on the stability of the ASRSCU's quality parameters during its long-term operation has been estimated.
Conclusions. For the first time, the problem of minimizing the average risk from the empirical operation results of an ASRSCU has been analyzed theoretically. Unlike existing approaches, the analysis takes into account non-stationary input data with drift of individual speech signal features, as well as the characteristic parameters of the recognition system's classifier, which made it possible to estimate the confidence interval of the risk when setting the conditions for re-training sessions.
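
The DET curves mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (det_points, probit, equal_error_rate) and the score values are hypothetical, and the code only shows the generic way a DET curve is obtained from verification scores, by sweeping a decision threshold and plotting the miss rate against the false alarm rate on a normal-deviate scale.

```python
from statistics import NormalDist


def det_points(target_scores, impostor_scores):
    """Return (threshold, false_alarm_rate, miss_rate) for each candidate threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, false_alarm, miss))
    return points


def probit(p, eps=1e-6):
    """Map an error rate to the normal-deviate scale used on DET axes."""
    # Clip to avoid infinite values at exactly 0 or 1.
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


def equal_error_rate(points):
    """Approximate the EER as the point where the two error rates are closest."""
    _, false_alarm, miss = min(points, key=lambda pt: abs(pt[1] - pt[2]))
    return (false_alarm + miss) / 2


# Hypothetical scores, for illustration only (not data from the article).
target = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

pts = det_points(target, impostor)
eer = equal_error_rate(pts)
det_axes = [(probit(fa), probit(miss)) for _, fa, miss in pts]
```

Computing such curves on data collected at different moments of long-term operation is one way to observe how feature drift shifts the error rates, which is the kind of comparison the abstract describes.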

How to Cite

Bisikalo, O. V., Kovtun, V. V., Yukhimchuk, M. S., & Voytyuk, I. F. (2019). ANALYSIS OF THE AUTOMATED SPEAKER RECOGNITION SYSTEM OF CRITICAL USE OPERATION RESULTS. Radio Electronics, Computer Science, Control, (4). https://doi.org/10.15588/1607-3274-2018-4-7

Issue

Section

Neuroinformatics and intelligent systems