NEUROINFORMATICS AND INTELLIGENT SYSTEMS

Context. The article addresses the problem of forming a training data set for an automatic human emotion recognition system based on a multidimensional extended neo-fuzzy neuron. The choice of the feature vector's dimension and composition, and their influence on the system's learning rate, are considered. The object of research is the method of multidimensional data clustering. The subject of research is the systematization of geometric features of two-dimensional images. Objective. The main goal of the work is to develop an approach to describing a person's facial expression using a fixed set of geometric features that can be obtained by processing video sequence frames. Method. To study the facial expression recognition system, it is proposed to form a feature vector consisting of the coordinates of characteristic points. The selected points relate to the location and shape of the eyelids, eyebrows, eye pupils, lip contours, nose wings, and nasolabial folds. Such points can easily be found during automatic image processing using known contour detectors. The possibility of describing a human facial expression not by the coordinates of characteristic points but by the distances between them was also investigated. From these distances a different feature vector was created, whose properties were compared with those of the coordinate vector. Results. The developed recognition system based on a multidimensional extended neo-fuzzy neuron has been implemented in software and investigated for solving the problem of facial expression classification. Feature vectors differing in composition and dimension are compared. A feature vector structure was chosen that provides a high system learning rate and does not require additional structural elements. Conclusions.
The experimental study fully confirms the effectiveness of the developed approach to human facial expression recognition using a multidimensional extended neo-fuzzy neuron.


INTRODUCTION
The human face is a very flexible communicative system with many inputs and many outputs. Several signal groups are used to transfer information in this system [1, 2]: 1) static signals are relatively permanent traits, such as bone structure, soft tissues and general proportions; these signals are usually used to identify a person; 2) slow facial signals are changes in a person's appearance that occur gradually over time, such as developing permanent wrinkles and changing skin texture; these signals can reduce the distinctness of facial features and hinder the recognition of fast facial signals; 3) artificial signals are exogenous facial features, such as eyeglasses and cosmetics; they can hide facial features or, conversely, accentuate them; 4) fast facial signals are temporary changes in neuromuscular activity that can lead to visually detectable changes in facial appearance. These fast signals are the basis of facial expressions.
Given the significant role of the face in social life, it is not surprising that the potential benefits of automating facial signal analysis, in particular of fast facial signals, are diverse and numerous. This especially concerns human-computer interfaces. Automatic analysis of fast facial signals is used in various vision subsystems, including gaze tracking, lip reading, bimodal speech processing, visual synthesis of morphemes, and the formation of commands based on facial expressions. Application areas include [3]: control of the movement of one or more vehicles, aircraft and car control; control of high-risk facilities; monitoring elderly patients in hospitals; video conferences, internet lectures and distance learning.
In numerous automatic facial expression recognition systems, two approaches to encoding information are used primarily: detecting changes in facial features and detecting facial muscle actions. These approaches are based on the work of psychologists and physiologists. To use them in automatic analysis, it is necessary to find, for each psychophysiological feature, corresponding changes in the image characteristics (static in a photo or dynamic in a video sequence). Such characteristics include contour, structural and textural image descriptions: for example, contour corners, contour extreme points, color data of individual regions, texture properties, etc.
A human-computer interface involves interaction in real time. This means that the number of characteristics calculated from the image of a person's face during the dialogue should not be too large. On the other hand, for precise identification of a person's state, the set of characteristics cannot be small, since the number of possible emotional states is large. Therefore, the problem of forming a data set for automatic recognition of a person's expression in an image remains unresolved.
The object of study is the method of multidimensional data clustering.
The subject of study is the systematization of geometric features of two-dimensional images.
The purpose of the work is to increase the accuracy of automatic real-time recognition of a person's facial expression when the training data set is small.
1 PROBLEM STATEMENT

Automatic facial expression recognition reduces to solving a data clustering problem. If the data comes from video, it obviously can contain information about static, slow and exogenous signals that do not change from frame to frame. However, the most important are the fast signals, which need to be registered, processed and recognized in each frame. The dynamics of the changing fast signals also indicate the psycho-emotional state. Therefore, it is necessary to register and process information about the formation rate of fast signals in real time. Such strict constraints on the speed of the automatic facial expression recognition method lead to the need for a compact but informative feature vector that ensures both information preservation and a high clustering rate.
The main goal of this work is to choose a way of detecting facial features for online human facial expression recognition using a system based on multidimensional neo-fuzzy neurons.

REVIEW OF THE LITERATURE
The task of automatic facial expression recognition is complex and multistage. It includes image preprocessing and face area search. After the face area is detected, the emotion can be recognized according to a set of characteristic facial features. Depending on the chosen approach, the next stage of solving the problem is the calculation of the characteristic vector. If recognition is performed in real time, using a 3D model may be inefficient due to high time costs. For online applications, a feature vector built from the 2D image (obtained by frame-based analysis) is justified.
Several descriptor types are used for recognizing faces and their expressions. The most common are Local Binary Patterns, feature selection using Gabor Facial Features, and active and adaptive models of shape and appearance (Active Shape Model, Active Appearance Model) [4]. Examples of these descriptors are shown in Fig. 1. In addition, curvilinear contours can be represented as splines, as proposed, for example, in [5]. Local binary patterns should rather be attributed to textural features, since they represent generalized histograms of fixed segments of the face image. Such a description is less informative for emotion recognition.
The results of Gabor filtering are certainly interesting for facial expression recognition. The features detected in an image are usually either geometric shape features of the facial components (eyes, lips, wings of the nose, etc.), or the locations of singular points (corners of the eyes, mouth, etc.), or appearance elements representing the facial skin texture, including wrinkles, bulges and furrows. Typical examples of geometric characteristics are described in [6], where a 19-point grid is used; in [7], where a shape model defined by 58 facial landmarks is used; and in [8]. Examples of a hybrid description combining geometric features and an appearance model are the methods in [9], where the shapes of the eyes, eyebrows and mouth together with wrinkles (crow's feet and nasolabial furrows) were used, and in [10], where 26 facial points around the eyes, eyebrows and mouth were used together with the same descriptors as in [9]. Appearance-based methods include the use of Gabor wavelets in [11, 12] and the application of holistic, monochrome, spatial and temporal patterns of facial coefficients [13]. The adaptive model is more useful in this sense, since it includes not only nodal points but also contours (eyelids, eyebrows, lips, chin). Such features undoubtedly convey exhaustive information about a person's emotional state, even when it is weakly expressed. Nowadays the Facial Action Coding System (FACS) is widely used for labeling facial actions [1, 2]. FACS links changes in the face to the muscle actions that produce them. It defines 44 different action units (AUs), which are considered to be the minimum visually perceptible facial movements. Facial expression descriptors are most often used for the six basic emotions (fear, sadness, happiness, anger, disgust, surprise) suggested by Ekman and the theorists of discrete emotions, who assume that these emotions are universally displayed and recognized from facial expressions.
FACS also provides rules for recognizing the temporal segments of AUs (start, culmination, completion) in frontal-view video. However, the main disadvantage of this system for automatic video recognition is that the movements of the facial muscles in 2D frame images cannot always be detected by filters, detectors, or other algorithms that numerically evaluate image properties. Researchers have noted that methods based on geometric features often outperform those based, for example, on wavelets or Gabor eigenvectors [11].
A separate problem is determining the dimension of such a model. It is directly related to the clustering method used for recognition. For example, the model in Fig. 1c contains 83 points; taking into account the number of their coordinates, the dimension of the feature vector can be N = 166 (in the 2D case) or N = 249 (in the 3D case). If a neural network or neuro-fuzzy system is selected for clustering, the number of adjustable parameters can reach 2N.
For some types of neural networks and neuro-fuzzy systems, the number of configurable parameters can be reduced to N×M, where M is the dimension of the output vector (determined by the number of recognizable emotions and their combinations). Nevertheless, for real-time processing even this dimension of the input feature vector is unacceptable.

Figure 1 – Examples of face descriptors: b – using the Gabor filter; c – using the adaptive model of appearance

MATERIALS AND METHODS
In this paper, a system for automatic real-time recognition of human facial expressions, based on the multidimensional extended neo-fuzzy neuron [14], is considered.
This system implements a fuzzy Takagi-Sugeno inference of arbitrary order (p − 1) in the form of equation (1). Its architecture is shown in Fig. 2. The input layer of the system consists of extended neo-fuzzy neurons, the intermediate layer consists of elements performing a nonlinear transform, and the output layer combines the output values into the resulting vector.
In the input layer, the extended neo-fuzzy neurons convert the input signals; this requires the learning vectors to be set in the range [0, 1]. As a basis for the feature vector, it is proposed to use a set of 35 characteristic points. Their location scheme is shown in Fig. 3.
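To make the transformation concrete, below is a minimal sketch of one nonlinear synapse of an extended neo-fuzzy neuron, assuming triangular membership functions evenly spaced on [0, 1] and second-order polynomial consequents (three terms each, matching the setup used later in the experiments). The weights here are placeholders; in the real system of [14] they are tuned during learning, and the exact formulas should be taken from that reference.

```python
def triangular_mf(x, centers):
    """Membership degrees for evenly spaced triangular functions on [0, 1].

    Adjacent functions overlap so that the degrees sum to 1 inside the range
    (a partition of unity), which is the usual neo-fuzzy construction.
    """
    mus = []
    for i, c in enumerate(centers):
        left = centers[i - 1] if i > 0 else c
        right = centers[i + 1] if i < len(centers) - 1 else c
        if x == c:
            mu = 1.0
        elif left < x < c:
            mu = (x - left) / (c - left)
        elif c < x < right:
            mu = (right - x) / (right - c)
        else:
            mu = 0.0
        mus.append(mu)
    return mus

def extended_synapse(x, weights, centers):
    """f(x) = sum_i mu_i(x) * (w_i0 + w_i1 * x + w_i2 * x^2)."""
    mus = triangular_mf(x, centers)
    return sum(mu * (w[0] + w[1] * x + w[2] * x * x)
               for mu, w in zip(mus, weights))

h = 5                                    # number of membership functions
centers = [i / (h - 1) for i in range(h)]
weights = [(0.1, 0.2, 0.3)] * h          # placeholder polynomial weights
y = extended_synapse(0.4, weights, centers)
```

Because the membership degrees form a partition of unity, only the two functions adjacent to the input fire, which is what makes neo-fuzzy learning fast: each sample updates only a handful of weights.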
Taking into account their coordinates X1, Y1, X2, Y2, ..., X35, Y35, the dimension of the feature vector is N = 70 (for recognition from 2D images). These points can be localized in the face area using contour detectors, for example the SURF [15] or Shi-Tomasi [16] detectors.
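Since the learning vectors must lie in [0, 1], the detected coordinates have to be rescaled before they reach the neo-fuzzy layer. The article does not fix the normalization scheme, so the sketch below makes one plausible assumption: min-max scaling of the x and y coordinates separately (effectively by the landmark bounding box), with dummy landmark values.

```python
def normalize(values):
    """Linearly rescale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0   # guard against a degenerate (constant) axis
    return [(v - lo) / span for v in values]

xs = [120.0, 180.0, 150.0]   # dummy x-coordinates of three landmarks
ys = [90.0, 60.0, 75.0]      # dummy y-coordinates

# Interleave back into the X1, Y1, X2, Y2, ... layout used in the text.
feature_vector = []
for x, y in zip(normalize(xs), normalize(ys)):
    feature_vector.extend([x, y])

print(feature_vector)  # [0.0, 1.0, 1.0, 0.0, 0.5, 0.5]
```

Per-axis scaling also removes the absolute position and size of the face in the frame, which is desirable when frames come from an uncontrolled video stream.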
The placement of such points can indicate the basic facial actions of the FACS system in the facial dynamics. The action unit codes and the numbers of the corresponding characteristic points are presented in Table 1 for the left side of the face. Other basic emotions are indicated using the proposed set of points in a similar way.
Adaptive shape and appearance models use not only the positions of individual points, but also the location of particular lines, such as nasolabial folds and facial wrinkles. In addition, the relative positions of the eyes, the corners of the mouth, and the wings of the nose also indicate a change in a person's state. These indicators can be calculated from the coordinate values of the proposed 35 characteristic points.
Absolute and relative distances from the corners of the lips to the outer corners of the eyes (Fig. 4) are calculated in accordance with formula (6), the Euclidean distance between two characteristic points:

D = sqrt((Xi − Xj)^2 + (Yi − Yj)^2).   (6)
The other indicators are calculated similarly: the distance from the lips corners to the outer eyes corners (Fig. 4); the distance from the lips corners to the eyebrows outer corners (Fig. 4); the distance from the lips corners to the eyebrows inner corners and their ratio (Fig. 4); the mutual position of the eyebrows and eyes (Fig. 5); the eyebrows position and curvature (Fig. 5); the mutual position of the eyes, wings of the nose and cheeks (Fig. 5); the relative mouth opening and width (Fig. 6); the eye inclination (Fig. 6).

Figure 4 – Scheme for calculating the absolute and relative distances from the lips corners to the outer eyes corners, the distance from the lips corners to the eyebrows outer corners, and the distance from the lips corners to the eyebrows inner corners and their ratio
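A short sketch of how such distance indicators D and their dimensionless ratios K can be computed from landmark coordinates. The concrete point pairs behind D1, ..., D24 and K1, ..., K14 follow Figs. 4-6 of the article; the landmarks and the pair chosen below are illustrative only.

```python
import math

def dist(p, q):
    """Euclidean distance (formula (6)) between two landmarks (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Dummy landmarks: lip corner, outer eye corner, outer eyebrow corner.
lip_corner = (3.0, 0.0)
eye_outer = (0.0, 4.0)
brow_outer = (0.0, 6.0)

d_lip_eye = dist(lip_corner, eye_outer)    # absolute distance, here 5.0
d_lip_brow = dist(lip_corner, brow_outer)
k_ratio = d_lip_eye / d_lip_brow           # relative (scale-invariant) indicator
```

Ratios of distances are attractive because they do not change when the face is scaled in the frame, whereas the absolute distances D must first be normalized.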

EXPERIMENTS
For a comparative evaluation of characteristic point vectors differing in composition and dimension, photographs from two open image databases designed for facial expression recognition tasks were used: the Psychological Image Collection at Stirling [17] and, in part, the Extended Cohn-Kanade database [18]. Another important question was the verification of the recognition system based on the neo-fuzzy neuron when the training sample is small (up to 1000 samples). Photographs were selected from the image databases in which the degree of expression of the person's emotional state differs, from barely visible to strongly noticeable.
The total number of photos in the training sample was 344; their distribution by basic emotions is shown in Table 2.

Table 2 – Sizes of the photo training sets for single emotions

Emotion        Anger  Disgust  Fear  Happiness  Sorrow  Surprise  Neutral
Data set size  49     66       35    45         19      50        80

For these photographs, the coordinates of the proposed 35 points X1, Y1, X2, Y2, ..., X35, Y35 were found, and then the indicators D1, ..., D24 and K1, ..., K14 were calculated. To test the possibility of reducing the feature vector dimension, two additional data sets were formed. The first set includes the coordinates of four points from the original vector: points 13, 14, 18, 23 (Fig. 7a). The second set contains the coordinates of 22 points (Fig. 7b). The resulting feature vectors were used to train the automatic facial expression recognition system based on the multidimensional extended neo-fuzzy neuron (Fig. 2). The number of terms in the power polynomial of this neuron's nonlinear synapse is set to three; thus, the system implements second-order Takagi-Sugeno fuzzy inference. The number of membership functions and learning epochs was varied. The recognition system learning results are given in Tables 3–6.
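The reduced feature vectors are obtained simply by selecting subsets of the 35 points and flattening their coordinates. A small sketch, with dummy landmark values (point numbering as in the article's Fig. 3/7):

```python
def select_points(landmarks, point_ids):
    """Build a flat feature vector [x_i, y_i, ...] from chosen point numbers (1-based)."""
    vec = []
    for pid in point_ids:
        x, y = landmarks[pid - 1]
        vec.extend([x, y])
    return vec

# Dummy landmarks: 35 (x, y) pairs standing in for detected points.
landmarks = [(float(i), float(i) + 0.5) for i in range(35)]

full_vec = select_points(landmarks, range(1, 36))       # N = 70
small_vec = select_points(landmarks, [13, 14, 18, 23])  # N = 8 (four-point set)

print(len(full_vec), len(small_vec))  # 70 8
```

The 22-point set is built the same way with its own list of point numbers, giving N = 44.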

RESULTS
As a result of the experiment, it was found that reducing the feature vector dimension can be compensated by increasing the number of membership functions in the structure of the multidimensional extended neo-fuzzy neuron. Characteristics in the form of distances between feature points reduce the system learning rate; this problem can also be solved by increasing the number of membership functions.

DISCUSSION
The experiment shows that, for an emotion recognition system based on a multidimensional extended neo-fuzzy neuron, a feature vector that includes only the coordinates of four characteristic points (positions of the eyes, nose and mouth) is not sufficient. The recognition accuracy cannot be increased by increasing the training duration or the number of nonlinear synapses. Obviously, it would be easier to find four characteristic features in a video sequence using automatic detectors, but these data are not enough for the clustering system.
On the other hand, using the distances between facial features to describe facial expression dynamics leads to the need to increase the duration of system training. Since these distances are parameters calculated from the coordinates (that is, in fact, indirect data), their application for automatic analysis of facial expressions is less preferable.
As a result of the experiment, the high efficiency of the vector consisting of the coordinates of 35 characteristic points was established for automatic recognition of facial expressions using a system based on a multidimensional extended neo-fuzzy neuron. High clustering accuracy is already achieved after 2000 learning epochs with 11 nonlinear synapses in the neuron structure. Reducing the feature vector dimension (to 22 points) leads to an increase in training duration, which can be compensated by increasing the number of synapses to 21. Note that the number of training samples does not exceed 1000. This is quite important for many practical problems where training data of tens and hundreds of thousands of samples are inaccessible.

CONCLUSIONS
The problem of automatic recognition of basic human emotions in real time is considered. For this task, a multidimensional extended neo-fuzzy neuron is used.
The scientific novelty of the work lies in the fact that the feature vector dimensionality is determined for the proposed computational architecture. The vector can include both the coordinates of individual characteristic points and the distances between them. These data correlate well with the well-known Facial Action Coding System. The system learning rate can be increased by increasing the number of the neo-fuzzy neuron's membership functions. Detection of characteristic points in the automatic recognition system can be realized in real time using standard detectors. Despite the small volume of the learning sample set, the system provides high recognition accuracy. This factor is especially important in practical applications where it is not always possible to obtain thousands or tens of thousands of training samples.
The practical significance of the obtained results is that a feature vector size has been found that provides high accuracy of automatic recognition of basic emotions from 2D images. The experimental results allow using this vector to study the dynamics of facial expressions from video.