AN INTELLIGENT MODEL BASED ON DEEP TRANSFER LEARNING FOR DETECTING ANOMALIES IN CYBER-PHYSICAL SYSTEMS

Context. The problem of detecting anomalies in signals of cyber-physical systems based on spectrogram and scalogram images is considered. The object of the research is complex industrial equipment with heterogeneous sensor systems of different nature. Objective. The goal of the work is to develop a method for signal anomaly detection based on transfer learning with the extreme gradient boosting algorithm. Method. An approach based on transfer learning and the extreme gradient boosting algorithm is proposed for detecting anomalies in acoustic signals of cyber-physical systems. Little research has been done in this area; therefore, various pre-trained deep neural model architectures are studied to improve anomaly detection. Transfer learning uses weights from a deep neural model pre-trained on a large dataset and can be applied to a small dataset to provide convergence without overfitting. The classic approach to this problem usually involves signal processing techniques that extract valuable information from sensor data. This paper performs the anomaly detection task using a deep learning architecture that works with acoustic signals preprocessed to produce a spectrogram and a scalogram. The SPOCU activation function is considered to improve the accuracy of the proposed approach. The extreme gradient boosting algorithm is used because it has high performance and requires few computational resources during the training phase. This algorithm can significantly improve the detection of anomalies in industrial equipment signals. Results. The developed approach is implemented in software and evaluated on the anomaly detection task for acoustic signals of cyber-physical systems using the MIMII dataset.


ABBREVIATIONS
XGBoost is extreme gradient boosting; AUC is the area under the receiver operating characteristic curve; CPS is a cyber-physical system; SPOCU is a scaled polynomial constant unit; PCA is principal component analysis; MIMII is malfunctioning industrial machine investigation and inspection; LOF is a local outlier factor; GMM is a Gaussian mixture model; OC-SVM is a one-class support vector machine; STFT is a short-time Fourier transform; CNN is a convolutional neural network.

NOMENCLATURE
$\gamma$ is the minimum loss reduction needed for splitting; $\lambda$ is a regularization term; $X$ is a time-frequency signal representation; $F$ is the number of frequency bins; $T$ is the time dimension; $x_i$ is a signal block; $l$ is the length of the feature vector; $\sigma$ is a window function; […] is the Fourier transform; $\upsilon$ is the loss value of the XGBoost algorithm; $G(x)$ is an activation function; $\tilde{y}_i$ is an objective optimization function; $K$ is the number of decision trees.

INTRODUCTION
An abnormal state of a cyber-physical system (CPS) can be caused by faulty components, temporary failures, misconfiguration, cyberattacks, or a combination of these [1,2]. An adversary may intervene in a CPS to manipulate the readings of sensors or actuators, leading to abnormal system operation.
Anomaly detection in an industrial scenario is essential because undetected failures can lead to critical damage. Early detection of anomalies can improve the reliability of fault-prone industrial equipment and reduce operating and maintenance costs.
The development of Industry 4.0 has led to new technologies for efficient and reliable monitoring of such systems. Thus, modern CPSs include devices that form a multi-sensor configuration. These systems simplify the data collection process, resulting in the availability of large datasets. Consequently, there has been an increase in the development of data mining methods for detecting anomalies [3].
The classical approach to such problems usually involves signal processing techniques that extract useful information from sensor data.
The object of study is complex industrial equipment with heterogeneous sensor systems of various nature. The classical approach requires preliminary data processing to extract the most informative features [4], which is usually a very time-consuming task that requires expert knowledge.
The subject of study is methods for detecting anomalies in industrial equipment signals based on transfer learning. Spectrogram and scalogram images of the signal are considered for a more accurate classification of equipment failures. The SPOCU (scaled polynomial constant unit) activation function [5] is considered to improve the accuracy of the proposed approach. The XGBoost algorithm is applied because it has high performance and requires few computational resources at the training stage.
The purpose of the work is to use transfer learning in combination with the XGBoost algorithm to improve the accuracy of detecting an abnormal state from acoustic signals of CPS.

PROBLEM STATEMENT
Suppose we are given an acoustic signal that has a time-frequency representation $X \in \mathbb{R}^{F \times T}$, where $T$ is the time dimension and $F$ is the number of frequency bins. For a given signal dataset, it is necessary to find a function $\mathcal{F}: X \to \mathbb{R}$ such that $\mathcal{F}(X)$ is higher for abnormal samples than for recordings of normal operation. The acoustic signal is split into fragments $x_i$ using a sliding window. An $l$-dimensional feature vector is then extracted from each $x_i$ by a feature extractor; a pre-trained deep neural network serves as the feature extractor. Finally, an anomaly detection algorithm $\mathcal{F}$ is trained on the features of all fragments in the dataset.
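The problem statement above can be illustrated with a minimal sketch in plain Python. The helper names (`sliding_fragments`, `recording_score`, `toy_extract`, `toy_score`) are hypothetical, the toy extractor and scorer merely stand in for the pre-trained network and the trained detector, and averaging the fragment scores into one recording-level score is an assumption, since the text does not state how fragment scores are aggregated.

```python
from statistics import mean
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # F rows (frequency bins) x T columns (time frames)


def sliding_fragments(X: Matrix, width: int, hop: int) -> List[Matrix]:
    """Cut X (F x T) into fragments x_i of shape F x width, moving hop frames at a time."""
    if width <= 0 or hop <= 0:
        raise ValueError("width and hop must be positive")
    n_frames = len(X[0])
    return [
        [row[start:start + width] for row in X]
        for start in range(0, n_frames - width + 1, hop)
    ]


def recording_score(
    X: Matrix,
    width: int,
    hop: int,
    extract: Callable[[Matrix], Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> float:
    """Score every fragment and average the scores (averaging is an assumption)."""
    fragments = sliding_fragments(X, width, hop)
    if not fragments:
        raise ValueError("signal is shorter than one window")
    return mean(score(extract(x)) for x in fragments)


# Hypothetical stand-ins for the pre-trained network and the trained detector.
def toy_extract(x: Matrix) -> List[float]:
    return [sum(row) / len(row) for row in x]  # one feature per frequency bin


def toy_score(features: Sequence[float]) -> float:
    return sum(features)


X = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
     [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
fragments = sliding_fragments(X, width=4, hop=2)
score_value = recording_score(X, 4, 2, toy_extract, toy_score)
```

A higher `score_value` would indicate a more anomalous recording; in the proposed approach the extractor is a pre-trained deep network and the scorer is learned from the dataset.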

REVIEW OF THE LITERATURE
The detection of anomalies in industrial equipment is becoming an important area of research. The difficulty here is to obtain information from several sensors that differ in their specific acoustic properties [6]. Researchers propose new methods and extend existing algorithms for detecting industrial equipment faults [6-13].
Morita et al. [7] proposed principal component analysis (PCA) with local outlier factor (LOF) and Gaussian mixture model (GMM) to detect abnormal sounds in the presence of limited computing resources and a small dataset.
Paper [8] described an approach that combines pretrained OpenL3 embeddings with the reconstruction error of an interpolation autoencoder using GMM as the final predictor. The parameters were set individually for each machine using the results from the development set.
Michau and Fink [9] developed an architecture for learning a meaningful and sparse representation of high-frequency signals. They combined wavelet theory and deep learning for classification and anomaly detection tasks.
The application of autoencoder deep learning architectures for unsupervised acoustic anomaly detection based on dense and convolutional neural networks (CNN) was considered in [10]. The energy features of the mel-spectrogram were extracted from the raw sounds. Several preliminary experiments were conducted to tune the autoencoder hyperparameters.
Tiwari et al. [11] proposed an ensemble of two systems capable of recording anomalous system behavior. In the first system, an outlier detection method based on the nearest neighbor search was proposed. In the second system, i-vectors and GMM are applied for anomaly detection, and the negative log-likelihood is used as the anomaly score.
OutlierNets, a family of very compact deep convolutional autoencoder architectures adapted for real-time acoustic anomaly detection, were proposed in [12]. They have extremely low complexity and match or exceed large convolutional autoencoder architectures in terms of AUC (area under the receiver operating characteristic curve) while exhibiting microsecond-scale latency on embedded hardware.
The efficiency of acoustic anomaly detection based on image transfer learning was studied [13]. The authors considered various deep neural models. Results showed that features extracted with ResNet18 and ResNet34 with GMM and OC-SVM (one-class support vector machine) achieved the best average AUC. It confirmed that the image-based features with transfer learning models might achieve competitive results in acoustic anomaly detection.
The following conclusions can be drawn from the analysis of the current state of research on detecting anomalies in acoustic signals from industrial facilities:
1) Little work has focused on transfer learning for feature extraction and failure detection in industrial machines.
2) The full functionality of deep neural networks is not taken into account.
All this confirms the relevance of this study. This paper proposes a new method for the automatic detection of acoustic signal anomalies based on transfer learning. The signal spectral information is considered as input data for the proposed model. The addition of the XGBoost algorithm improves the accuracy of CPS fault detection. Experiments on the real MIMII (malfunctioning industrial machine investigation and inspection) dataset have shown the effectiveness of the proposed approach, which can help experts diagnose equipment malfunctions.

MATERIALS AND METHODS
The paper proposes an approach to detecting machine signal anomalies using transfer learning. Transfer learning uses weights from a deep neural model, pre-trained on a large dataset. It can be applied to a small dataset, providing convergence without overfitting.
The proposed approach to detecting machine signal anomalies from images using transfer learning consists of the following steps: pre-processing, feature extraction using a deep neural network, feature fusion, and classification based on the XGBoost algorithm (Fig. 1).
The considered signals are first divided into frames of 128 samples with an overlap of 64 samples. A scalogram based on the wavelet transform and a spectrogram based on the short-time Fourier transform (STFT) are extracted from the signals. STFT splits the signal into several overlapping blocks and multiplies each block by the Hanning window function $\sigma$. The Morlet wavelet, in which the parameter $\beta$ controls the shape of the mother wavelet, is used in the wavelet transform to obtain more informative images [14]. The scalogram and spectrogram images are acquired in parallel. The resulting RGB images of size 128×128×3 are then fed to a deep neural network.
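A minimal, dependency-free sketch of the spectrogram step described above: frames of 128 samples with 64 samples of overlap, each multiplied by a Hann window and passed through a discrete Fourier transform. The function names are hypothetical, and the direct DFT is used only for clarity (a library FFT would normally be used). The `morlet` helper shows one common real-valued Morlet form with a shape parameter β; this form is an assumption and may differ from the one in [14].

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def stft_magnitude(
    signal: Sequence[float], frame_len: int = 128, overlap: int = 64
) -> List[List[float]]:
    """Return |STFT| as a list of frames, each with frame_len // 2 + 1 frequency bins."""
    hop = frame_len - overlap
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Windowed block: signal samples multiplied by the window function.
        block = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the block, keeping only the non-negative frequencies.
        spectrum = [
            abs(sum(block[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def morlet(t: float, beta: float) -> float:
    """Assumed Morlet form: exp(-beta^2 * t^2 / 2) * cos(pi * t)."""
    return math.exp(-(beta ** 2) * (t ** 2) / 2.0) * math.cos(math.pi * t)


# Toy input: a sinusoid that falls exactly on DFT bin 8 of a 128-sample frame.
tone = [math.sin(2.0 * math.pi * 8 * n / 128) for n in range(256)]
spec = stft_magnitude(tone)
peak_bins = [frame.index(max(frame)) for frame in spec]
```

Each inner list of `spec` is one column of the spectrogram; converting the magnitudes to a color image of the required size is a separate step not shown here.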
Since deep learning models are trained on large datasets of various images, they can be applied to anomaly detection in signals from industrial facilities. Each of the model layers is responsible for different image features.
The MobileNet model is a small network that uses depthwise separable convolutions and improves recognition performance [16]. The InceptionV3 network includes parallel convolutional layers whose outputs are then combined to produce a result [18,19]. The Xception network is a linear stack of convolutional layers with residual connections [15,20]. The summation operation speeds up the transition from one model layer to another [21]. The DenseNet model collects information from all levels of the network and passes it to subsequent levels, which helps when there is not enough data for training [17].
The output of the fully connected layer of each pre-trained model is used as a feature vector. The feature vectors are combined to capture information about various characteristics and reduce recognition errors. Thus, the total size of the feature vector is 1×1024.
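The hidden layers of the proposed model use the SPOCU activation function [5]. As a point of reference, the sketch below implements SPOCU in plain Python in the form in which it is usually quoted from [5]: $s(x) = \alpha h(x/\gamma + \beta) - \alpha h(\beta)$, with $h(x) = r(\min(x, c))$ for $x \ge 0$, $h(x) = 0$ for $x < 0$, and $r(x) = x^3(x^5 - 2x^4 + 2)$. This form should be checked against [5] before use. The scale parameter is named `scale` here to avoid a clash with the XGBoost parameter $\gamma$, and the default parameter values are arbitrary illustrative choices, not the settings of this work.

```python
def _r(x: float) -> float:
    """Polynomial part of SPOCU: r(x) = x^3 * (x^5 - 2x^4 + 2)."""
    return x ** 3 * (x ** 5 - 2.0 * x ** 4 + 2.0)


def _h(x: float, c: float) -> float:
    """Zero for negative inputs, r(x) on [0, c), constant r(c) from c onward."""
    if x < 0.0:
        return 0.0
    return _r(min(x, c))


def spocu(
    x: float,
    alpha: float = 1.0,   # output scale (illustrative value)
    beta: float = 0.5,    # input shift, expected in (0, 1) (illustrative value)
    scale: float = 1.0,   # input scale, the "gamma" of SPOCU (illustrative value)
    c: float = 1.0,       # saturation point, expected >= 1 (illustrative value)
) -> float:
    """Assumed SPOCU form: alpha * h(x / scale + beta) - alpha * h(beta)."""
    return alpha * _h(x / scale + beta, c) - alpha * _h(beta, c)


values = [spocu(v) for v in (-5.0, 0.0, 0.25, 10.0, 100.0)]
```

With these settings the function passes through zero at the origin, is bounded below for negative inputs, and saturates at a constant once the shifted input reaches `c`.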
In this work, SPOCU [5] is used in the proposed model as the activation function $G(x)$ in the hidden layers to improve the accuracy of anomaly detection in image-based signals.
The resulting feature vector is then fed to the XGBoost classifier, which was proposed by Chen et al. in 2016 [22]. XGBoost is an ensemble of regression trees that also supports the classification task. The basis of the algorithm is the optimization of the value of the objective function. In this case, the objective optimization function is defined as follows:

$$\tilde{y}_i = \sum_{k=1}^{K} \varphi_k(x_i), \quad \varphi_k \in \Phi,$$

where $K$ is the number of decision trees, $\varphi_k$ is an independent tree with leaf scores, and $\Phi$ is the space of regression trees. The loss function is given by the following equation:

$$L = \sum_{i} \upsilon(\tilde{y}_i, y_i) + \sum_{k=1}^{K} \Omega(\varphi_k),$$

where $\upsilon$ is the loss value of the XGBoost algorithm, $\tilde{y}_i$ is the predicted output, $y_i$ is the true label, and $\Omega$ is a regularization term that prevents overfitting (7):

$$\Omega(\varphi) = \gamma K + \frac{1}{2} \lambda \lVert w \rVert^2, \qquad (7)$$

where $K$ here denotes the number of leaf nodes, $w$ is the score on each leaf, and $\gamma$ and $\lambda$ are constants that control the degree of regularization. Thus, for the tree $\varphi_t$ added at boosting iteration $t$, we get the following:

$$L^{(t)} \approx \sum_{i} \left[ g_i \varphi_t(x_i) + \frac{1}{2} h_i \varphi_t^2(x_i) \right] + \Omega(\varphi_t),$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss function, respectively.
For the XGBoost method, the learning rate is 0.001, the number of trees to fit is 100, the maximum tree depth is 6, $\gamma = 0$, and $\lambda = 1$.
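To make the role of $g_i$, $h_i$, $\gamma$, and $\lambda$ concrete, the sketch below computes the closed-form leaf score and split gain that follow from minimizing the second-order objective, as given by Chen et al. [22]. It is a plain-Python illustration, not the implementation used in this work. The logistic loss is an assumption, since the text does not specify $\upsilon$, and the function names are hypothetical. The dictionary maps the reported settings onto the keyword names of the scikit-learn-style `xgboost.XGBClassifier` wrapper.

```python
import math
from typing import List, Sequence, Tuple

# Reported settings, keyed by xgboost.XGBClassifier keyword names.
XGB_PARAMS = {
    "learning_rate": 0.001,
    "n_estimators": 100,   # number of trees to fit
    "max_depth": 6,
    "gamma": 0.0,          # minimum loss reduction needed for a split
    "reg_lambda": 1.0,     # L2 regularization on leaf scores
}


def logistic_grad_hess(
    y_true: Sequence[int], raw_pred: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """First (g_i) and second (h_i) derivatives of the logistic loss (assumed loss)."""
    g, h = [], []
    for y, z in zip(y_true, raw_pred):
        p = 1.0 / (1.0 + math.exp(-z))
        g.append(p - y)
        h.append(p * (1.0 - p))
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Optimal leaf score w* = -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam: float, gamma: float) -> float:
    """Loss reduction obtained by splitting one leaf into a left and a right leaf."""
    def term(G: float, H: float) -> float:
        return G * G / (H + lam)

    GL, HL, GR, HR = sum(g_left), sum(h_left), sum(g_right), sum(h_right)
    return 0.5 * (term(GL, HL) + term(GR, HR) - term(GL + GR, HL + HR)) - gamma


# Toy example: four samples, initial raw prediction 0 (probability 0.5).
g, h = logistic_grad_hess([1, 1, 0, 0], [0.0, 0.0, 0.0, 0.0])
lam, gamma = XGB_PARAMS["reg_lambda"], XGB_PARAMS["gamma"]
w_left = leaf_weight(g[:2], h[:2], lam)
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam, gamma)
```

A split is kept only when its gain is positive, so a larger $\gamma$ prunes more aggressively, while a larger $\lambda$ shrinks the leaf scores toward zero.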

EXPERIMENTS
This section provides the experimental dataset description, the evaluation metrics, and the experimental results to evaluate the proposed approach based on transfer learning.
The dataset of CPS under normal and abnormal operating conditions is considered to evaluate the proposed approach [23]. The audio dataset was collected using a circular microphone array consisting of eight separate microphones as 16-bit audio signals with a sampling rate of 16 kHz [23]. It contains eight separate channels for each segment. The MIMII dataset contains the sound of four different machine types: valves, pumps, fans, and sliders. For each type of machine, different real anomalous scenarios were considered: pollution, leakage, rotating imbalance, rail damage, etc. MIMII also contains data for four machine IDs (00, 02, 04, and 06). Different signal-to-noise ratio levels (6 dB, 0 dB, and −6 dB) were considered in the dataset. It consists of 26,092 "normal" sound segments and 6,065 abnormal sound segments.
The "normal" and abnormal signatures for all machine types in the time domain of the MIMII dataset are shown in Fig. 2 (a)-(d) and Fig. 2 (e)-(h), respectively.
The STFT spectrograms and the wavelet-based scalograms for fans, pumps, sliders, and valves are shown in Figs. 3 and 4, respectively.
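Performance evaluation of the proposed model is based on the following metrics: precision, recall, and F-measure.
Precision is the proportion of objects classified as positive that are truly positive:

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$

Recall is the proportion of truly positive objects that are classified as positive:

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively. The F-measure combines the recall and precision metrics:

$$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

All considered metrics are widely used performance indicators in machine learning [24].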
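The three metrics can be computed directly from the true and predicted labels. The sketch below is a plain-Python illustration; the function name is hypothetical, and treating label 1 as the positive (abnormal) class and returning 0 for an undefined ratio are assumptions.

```python
from typing import Sequence, Tuple


def precision_recall_f(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Undefined ratios (empty denominators) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2.0 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f_measure


# Toy labels: 1 = abnormal, 0 = normal.
precision, recall, f_measure = precision_recall_f([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In this toy example there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.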

RESULTS
In this paper, the experiments are conducted in Python 2.7.13 using various libraries, including TensorFlow, Librosa, and Keras. A machine with an Intel Xeon(R) X5670 CPU @ 2.93 GHz × 24 and 24 GB of RAM was used.
This study analyses and compares various deep learning models (Xception, Inception, DenseNet, and MobileNet). They are trained on the MIMII dataset and applied to feature extraction from the spectrogram and scalogram. Recall, precision, and F-measure were considered as evaluation metrics. Hyperparameter optimization was performed using cross-validation. The combination of parameters was chosen based on the lowest training loss and the highest accuracy. The best performance of the model was observed with a batch size of 32. The learning rate was also evaluated at different values, and the accuracy of the model decreased as the learning rate increased. It is important to note that all four considered models achieved high accuracy when the learning rate was decreased to 0.001.
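The cross-validated selection of hyperparameters described above can be sketched as follows. This is a plain-Python illustration with hypothetical function names. The folds are contiguous and unshuffled for simplicity, which is an assumption, and the `toy_evaluate` function is a made-up stand-in for training and validating a model.

```python
from statistics import mean
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_mean(
    evaluate: Callable[[List[int], List[int]], float], n_samples: int, k: int
) -> float:
    """Average a validation score over the k folds."""
    return mean(evaluate(train, test) for train, test in k_fold_indices(n_samples, k))


# Hypothetical scorer: pretends that a smaller learning rate validates better.
def toy_evaluate(learning_rate: float) -> Callable[[List[int], List[int]], float]:
    return lambda train, test: 1.0 - learning_rate


candidates = [0.1, 0.01, 0.001]
cv_scores = {lr: cross_val_mean(toy_evaluate(lr), n_samples=10, k=5) for lr in candidates}
best_lr = max(cv_scores, key=cv_scores.get)
folds = list(k_fold_indices(10, 3))
```

In a real experiment, the scorer would fit the model on the training indices and return a validation metric on the held-out fold.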
The results of the experiments are shown in Table 1. Comparison of various deep neural models showed that DenseNet+XGBoost outperformed the other considered models in detecting anomalies from machine signals according to the F-measure metric. The Inception+XGBoost, Xception+XGBoost, and DenseNet models performed well for the slider machine according to the recall and precision metrics. Even though all models showed the best results in terms of recall for the valve-type machine, they were inferior to DenseNet+XGBoost according to precision and F-measure.

DISCUSSION
The proposed model achieved a significant improvement in anomaly detection from machine sensor data, reaching an AUC of 95.45%, compared to the previously proposed models [7,8,10,13]. DenseNet+XGBoost improved by about 8% over the PCA model [7], which was applied to the log spectrogram of the audio signal in combination with LOF and GMM on the MIMII dataset. Grollmisch et al. [8] proposed a method combining OpenL3 embeddings and an interpolation autoencoder (IAEO3_opt) for anomaly detection in acoustic signals. Compared to the IAEO3_opt model, the DenseNet+XGBoost model improved by approximately 7% [8]. DenseNet+XGBoost gave comparable results (AUC 95.5%) on the reviewed MIMII dataset. Coelho et al. [10] used CNN and dense networks in combination with an autoencoder for the task of unsupervised acoustic anomaly detection, where the results averaged 72.0%, 73.1%, 91.8%, and 78.9% for the fan, pump, slider, and valve, respectively. The accuracy of the proposed method is 93.1% for the fan, 97.3% for the pump, 98.4% for the slider, and 93.0% for the valve, which is significantly higher than the above results. The results on the MIMII dataset showed that the DenseNet+XGBoost model outperformed the other approaches. The 10-fold cross-validation results demonstrated the reliability and robustness of the proposed model.
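Since the comparison above is stated in terms of AUC, a short plain-Python sketch of the metric may be useful. It uses the rank interpretation of AUC: the probability that a randomly chosen abnormal sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The function name and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores_normal: Sequence[float], scores_abnormal: Sequence[float]) -> float:
    """AUC as the fraction of (abnormal, normal) pairs that are ranked correctly."""
    if not scores_normal or not scores_abnormal:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0   # abnormal sample correctly scored higher
            elif a == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_abnormal) * len(scores_normal))


# Toy anomaly scores (higher means more anomalous).
auc_perfect = roc_auc([0.1, 0.2], [0.8, 0.9])
auc_partial = roc_auc([0.1, 0.4, 0.35], [0.8, 0.3])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; library implementations use sorting instead.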

CONCLUSIONS
The urgent problem of detecting anomalies from acoustic signals of industrial equipment is addressed.
The scientific novelty of the obtained results is that a transfer learning approach with the XGBoost classifier is proposed. Little research has been done in this area; therefore, various pre-trained deep neural model architectures were studied for anomaly detection. The spectrogram and scalogram of the acoustic signal were considered as input data for the proposed architecture. The SPOCU activation function [5] was used to improve the accuracy of the proposed approach.
The practical significance of the obtained results is that experiments on the real MIMII dataset showed the effectiveness of the proposed approach, which can help experts in diagnosing equipment malfunctions. Comparison with other known methods shows the superiority of DenseNet+XGBoost in terms of anomaly detection accuracy.