AN IMPROVED ENSEMBLE APPROACH FOR DOS ATTACKS DETECTION

Context. The paper addresses the task of detecting DoS attacks in large arrays of network traffic data with an ensemble of classifiers, in order to withstand attacks on the network. Objective. The objective of this paper is to build an ensemble of classifiers that surpasses single classifiers in terms of accuracy. Method. To achieve this goal, an algorithm is proposed that returns, for each point, a vector of classification scores indicating the probability of belonging to each class. The peculiarity of the proposed approach is that for each point of the dataset, the predicted class label corresponds to the maximum value among all scores obtained by the classification methods for that point. Decision trees, the k-nearest neighbors algorithm, support vector machines with various kernel functions, and naïve Bayes are considered as classifiers. The proposed approach is compared with single classifiers using the following metrics: accuracy, precision, recall, and F-measure. Results. The experiments were performed in R 3.4.1 on the NSL-KDD dataset of network attacks, which was divided into three classes (DoS, normal network behavior, and other types of attack). Conclusions. The experiments have confirmed the efficiency of the proposed approach. The most accurate result was shown by an ensemble of five classifiers. Attack detection techniques based on an ensemble of classifiers avoid the problems inherent in most approaches, since they are capable of detecting both known and new attacks with high accuracy. It can be concluded that the proposed approach for network attack detection is of practical significance. To further study attack detection in network traffic, future studies will be performed on real Big Data sets.

$P(H)$ – a priori probability of each class without information on the variable $x$; $P(H|x)$ – a posteriori probability of the variable $x$ over the possible classes; $P(x|H)$ – conditional probability of $x$ given the hypothesis $H$ (likelihood).

The research objective is to construct an ensemble of classifiers that surpasses single classifiers in terms of accuracy in order to detect DoS attacks.

PROBLEM STATEMENT
To solve the task of detecting DoS attacks in network traffic, this paper proposes an ensemble of classifiers that surpasses single classifiers in terms of accuracy.
Let us introduce the following notation (Table 1): $x_i \in R^n$ $(i = \overline{1, n})$ is a point from the dataset, where $n$ is the total number of points in the input dataset. For each data point, it is necessary to obtain a vector of ensemble scores based on the single classifiers in order to improve classification accuracy.

REVIEW OF THE LITERATURE
A number of studies and review articles have been devoted to intrusion detection technology [9,10] and to data mining for specific applications [11]. Since Denning introduced the principles of intrusion detection in 1987, a large number of reactive protection systems have been developed [12][13][14].
Intrusion detection methods can be divided into three categories: single, hybrid, and ensemble classifiers [15]. Support vector machines (SVMs) and artificial neural networks (ANNs) are the most popular approaches among single classifiers. Combining several classifiers in order to significantly increase classification efficiency yields what is known as an ensemble of classifiers [16]. Majority voting, bagging, and boosting are some common strategies for combining classifiers [17]. Although the disadvantages of the component classifiers are known to accumulate in an ensemble, ensembles work very effectively in various combinations. Consequently, researchers are increasingly interested in applying ensembles of classifiers.
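A minimal sketch of the majority (plurality) voting strategy mentioned above, in which each base classifier contributes one predicted label for a point. The function name and the labels in the example are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one predicted label per base classifier by plurality voting.

    predictions: list of labels for a single point, one from each classifier.
    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first insertion.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical labels from three base classifiers for one network connection
print(majority_vote(["dos", "normal", "dos"]))  # dos
```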
Important cybersecurity problems that call for mathematical and statistical solutions were outlined in [18]. A method to improve detection accuracy with an ensemble of two-layer SVMs based on rotation forest was presented in [19]. The experiments were conducted on the KDD CUP 1999 dataset. The output of the ensemble was determined by majority voting, and the second layer focused on two classes, "normal" and "attack". In [20], classification accuracy was improved by using an ensemble approach to combine the opinions of multiple experts into one. The ensemble construction method uses weights generated by particle swarm optimization (PSO) to build an ensemble of classifiers with better accuracy for intrusion detection. This work was based on binary classification methods, which distinguish between two states. In case of conflict between binary classifiers, the final decision was reached by comparing their accuracies.
In [21], a multistep framework based on machine learning techniques was introduced to create an efficient classifier, together with a novel fuzzy weighting method for ensemble classifiers. The fuzzy weighted combiner assigns weights to the classifiers according to their cost and performance.
An architecture of an intelligent false alarm filter that employs voted ensemble selection to maintain the accuracy of false alarm reduction was proposed in [22]. The experiment was conducted using the SVM, decision tree, and k-nearest neighbor (KNN) machine learning algorithms, and the proposed method was validated on a real dataset.
The paper [23] aims to identify the multiclass SVM models best suited to the intrusion detection task. A new approach (WOAR-SVM) was developed, based on a set of optimal or near-optimal weight coefficients that define the relationship between the decision rules of the binary SVM classifiers.
A generic architecture of an automated DDoS attack detection and response system for a collaborative environment, based on machine learning, was proposed in [24]. The main objective of that paper was to minimize the cost of intrusion detection classification errors. The proposed classification algorithm, RBPBoost, combines the outputs of an ensemble of classifiers with the Neyman–Pearson cost minimization strategy to reach the final classification decision.

MATERIALS AND METHODS
Data classification methods can provide more information for intrusion detection. Theoretically, classification algorithms can achieve high performance, i.e. they can minimize the number of false alarms and maximize detection accuracy. One of the most attractive features of these algorithms is their ability to distinguish normal from abnormal behavior [8]. In the context of intrusion detection, a classification algorithm is typically a mapping that adapts to previously unseen network anomalies [25].
Formally, each data instance is a feature vector $X$ measured at time $t$ and denoted as $X(t)$. A classification algorithm aims to learn a function that maps each sample to its state. To do so, it uses a set of data instances collected within the network, known as the training dataset. Some algorithms learn the mapping function from labeled training sets, in which each sample is marked as one of the states; these are called supervised learning algorithms. The aim of using these algorithms is to achieve high classification accuracy.
The most popular supervised machine learning methods include SVM, decision trees, Bayesian networks, the KNN algorithm, etc. SVM separates the classes with a hyperplane. The data points closest to the hyperplane are called support vectors, and the distance between the parallel hyperplanes passing through the support vectors of the two classes is known as the margin. Training a linear SVM can be formulated as a quadratic optimization problem. SVM can accurately find linear, nonlinear, and complex classification boundaries, even with a small number of training samples.
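For reference, the quadratic optimization problem of a linear SVM in its standard hard-margin form is shown below. The class labels $y_i \in \{-1, +1\}$, the weight vector $w$ and the bias $b$ are standard notation assumed here, not taken from the paper:

$$\min_{w,\,b}\ \frac{1}{2}\,\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = \overline{1, n}.$$

The support vectors are the points for which the constraint holds with equality, and the resulting margin equals $2/\|w\|$. The soft-margin variant, which tolerates overlapping classes, adds slack variables $\xi_i \ge 0$ to the constraints and a penalty $C \sum_i \xi_i$ to the objective.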
SVMs are applied to various types of data by switching the kernel function. The most commonly used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
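A minimal sketch of the four kernel functions for two feature vectors u and v. The parameter names gamma, coef0 and degree follow the libsvm convention; their default values here are arbitrary illustrations, not the settings used in the experiments.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    # (gamma * <u, v> + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    # exp(-gamma * ||u - v||^2): equals 1 for identical vectors, decays with distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    # tanh(gamma * <u, v> + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z), polynomial_kernel(x, z))  # 2.0 27.0
```

Only the kernel changes between the SVM variants; the optimization over the training points stays the same, which is why switching kernels is a cheap way to try different decision boundaries.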
However, choosing the kernel function and tuning its parameters remain a matter of trial and error. SVM is fast, but its training time increases about four times when the sample size is doubled, i.e. it grows roughly quadratically with the number of samples. Unfortunately, SVM is inherently a binary classifier. To solve multi-class classification problems, several binary SVMs can be combined, either by separating each class from all the others (one-vs-rest) or by separating each pair of classes (one-vs-one).
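A minimal sketch of the class-pair (one-vs-one) combination, assuming each pairwise binary SVM is available as a decision function that is positive for the first class of its pair. The toy decision functions in the usage example are hypothetical stand-ins for trained SVMs. For K classes this scheme needs K(K-1)/2 binary classifiers (3 for the classes DoS, normal and other), whereas the one-vs-rest scheme needs K.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Predict a label for x by one-vs-one voting.

    pairwise maps each class pair (a, b), in the order produced by
    itertools.combinations(classes, 2), to a binary decision function that
    returns a positive value for class a and a non-positive value for class b.
    Ties between vote counts go to the class that received a vote first.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 1-D decision functions standing in for trained binary SVMs
pairwise = {
    ("dos", "normal"): lambda x: x[0] - 1.0,
    ("dos", "other"): lambda x: x[0] - 2.0,
    ("normal", "other"): lambda x: 0.5 - x[0],
}
print(one_vs_one_predict([3.0], ["dos", "normal", "other"], pairwise))  # dos
```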
A decision tree (DT) is a tree-structured model whose leaves represent classes or decisions and whose branches represent conjunctions of features that lead to those classifications.
An input vector is classified by traversing the tree from the root node down to a leaf. Each tree node evaluates an inequality that depends on only one of the input variables, and each leaf is assigned to a particular class. Linear DTs are similar to binary DTs, except that the inequality computed at each node has an arbitrary linear form, which may depend on several variables. A DT relies on "if-then" rules and does not require any parameters or metrics. This simple and interpretable structure allows decision trees to handle different types of attributes. DTs can also manage missing values and noisy data. However, unlike other machine learning techniques, they cannot guarantee optimal accuracy. Although decision trees are easy to learn and implement, they are not often used for intrusion detection, because finding the smallest decision tree is NP-hard.
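A minimal sketch of the root-to-leaf traversal described above for a binary DT, in which every internal node tests one input variable against a threshold. The feature indices, thresholds and class names in the example tree are invented for illustration, not learned from NSL-KDD.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str  # class assigned to every input that reaches this leaf


@dataclass
class Node:
    feature: int      # index of the single input variable tested at this node
    threshold: float  # go left if x[feature] <= threshold, otherwise right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Sequence[float]) -> str:
    """Follow one inequality per node from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical tree over two numeric features
tree = Node(0, 100.0, Leaf("normal"), Node(1, 0.5, Leaf("other"), Leaf("dos")))
print(classify(tree, [250.0, 0.9]))  # dos
```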
The Bayesian network classifier is based on Bayes' rule, which, for a class hypothesis $H$ and data $x$, gives

$$P(H|x) = \frac{P(x|H)\,P(H)}{P(x)},$$

where $P(H)$ represents the a priori probability of each class without information on the variable $x$, $P(H|x)$ is the a posteriori probability of the variable $x$ over the possible classes, and $P(x|H)$ is the conditional probability of $x$ given $H$ (the likelihood). In a Bayesian network, nodes represent random variables, and arcs represent probabilistic relationships between variables together with their conditional probabilities. Each node computes posterior probabilities given the evidence observed at the selected nodes. Naïve Bayes (NB) is a simple Bayesian network model that assumes all variables are independent. Classification by NB requires finding the maximum likelihood hypothesis, which defines the class label for the test data $x$.
The NB classifier selects the maximum a posteriori (MAP) hypothesis for the data as follows:

$$H_{MAP} = \arg\max_{H} P(H|x) = \arg\max_{H} P(x|H)\,P(H),$$

where $x$ is the observed data; under the naïve independence assumption, $P(x|H) = \prod_{j} P(x_j|H)$ over the attributes $x_j$ of $x$. Naïve Bayes is effective for inference tasks. However, it relies on the strong assumption that the variables are independent.
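A minimal sketch of NB with the MAP rule for categorical attributes, assuming class priors and per-attribute likelihoods are estimated by counting with Laplace (add-alpha) smoothing. The smoothing choice, the function names and the toy records (protocol and flag values resembling NSL-KDD attributes) are assumptions for illustration. Log-probabilities are summed instead of multiplying probabilities to avoid numerical underflow, and $P(x)$ is dropped because it is the same for every hypothesis $H$.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y, alpha=1.0):
    """Estimate P(H) and P(x_j | H) from categorical data with add-alpha smoothing."""
    priors = Counter(y)
    n = len(y)
    counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    values = defaultdict(set)      # attribute index -> values seen in training
    for xi, h in zip(X, y):
        for j, v in enumerate(xi):
            counts[(h, j)][v] += 1
            values[j].add(v)

    def log_posterior(x, h):
        # log P(H) + sum_j log P(x_j | H); the constant log P(x) is omitted
        lp = math.log(priors[h] / n)
        for j, v in enumerate(x):
            num = counts[(h, j)][v] + alpha
            den = priors[h] + alpha * len(values[j])
            lp += math.log(num / den)
        return lp

    return priors, log_posterior


def predict_nb(model, x):
    """Return the MAP class: the hypothesis with the largest (log) posterior."""
    priors, log_posterior = model
    return max(priors, key=lambda h: log_posterior(x, h))


# Hypothetical toy records: [protocol, connection flag]
X = [["tcp", "S0"], ["tcp", "S0"], ["udp", "SF"], ["tcp", "SF"]]
y = ["dos", "dos", "normal", "normal"]
model = train_nb(X, y)
print(predict_nb(model, ["tcp", "S0"]))  # dos
```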

The number of nearest neighbors k and the distance measure are key components of the KNN algorithm. The number k should be selected by cross-validation. Increasing k reduces the effect of noise in the data on classification, but it can also blur the difference between the classes. In practice, k is chosen to be less than the square root of the total number of training samples.
In multiclass classification, the KNN method measures the distance from a data sample to each training sample [26]. The k smallest distances are selected, and the most common class among these k nearest neighbors is taken as the output class label.
KNN has no parameters to fit during training. It is easy to implement, but it requires a lot of memory and computation time, because all training samples are stored and compared with each new sample.
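A minimal sketch of multiclass KNN as described above, using Euclidean distance (an assumption; the paper does not name its distance measure). Besides the label, it returns the share of the k nearest neighbors that belong to each class; using that share as a per-class score vector is our assumption, not the authors' stated choice. For the NSL-KDD training set of 125973 samples, the square-root heuristic would keep k below about 355.

```python
import math
from collections import Counter


def knn_scores(train_X, train_y, x, k, classes):
    """Fraction of the k nearest training samples (Euclidean) in each class."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return [votes[c] / k for c in classes]


def knn_predict(train_X, train_y, x, k, classes):
    """Most common class among the k nearest neighbors."""
    scores = knn_scores(train_X, train_y, x, k, classes)
    return classes[scores.index(max(scores))]


# Hypothetical 2-D training points
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
train_y = ["normal", "normal", "dos", "dos", "dos"]
classes = ["dos", "normal", "other"]
print(knn_scores(train_X, train_y, [5.5, 5.5], 3, classes))  # [1.0, 0.0, 0.0]
```

The sort over all training samples for every query is exactly the memory and time cost noted in the text.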
The proposed algorithm returns, for each data point, a vector of classification scores that indicate the probability of belonging to each class.
The peculiarity of the proposed approach is that, for each point of the dataset, the predicted class label corresponds to the maximum value among all scores obtained by the classification methods for that point.
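A minimal sketch of the max-score decision rule described above, assuming every base classifier returns its scores in the same class order and on a comparable scale (for example, class-membership probabilities). The function names and toy scores are hypothetical; this is one reading of the rule, not the authors' R implementation. Taking the element-wise maximum over the classifiers yields an ensemble score vector for the point, and its largest entry is the overall maximum score.

```python
from typing import List, Sequence


def ensemble_scores(score_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over the base classifiers' score vectors."""
    return [max(column) for column in zip(*score_vectors)]


def ensemble_predict(score_vectors: Sequence[Sequence[float]], classes: List[str]) -> str:
    """Label of the single largest score among all classifiers for this point."""
    scores = ensemble_scores(score_vectors)
    return classes[scores.index(max(scores))]


classes = ["dos", "normal", "other"]
# Hypothetical score vectors for one point from three base classifiers
vecs = [
    [0.6, 0.3, 0.1],  # e.g. DT
    [0.2, 0.8, 0.0],  # e.g. KNN
    [0.5, 0.1, 0.4],  # e.g. NB
]
print(ensemble_scores(vecs), ensemble_predict(vecs, classes))  # [0.6, 0.8, 0.4] normal
```

In this example a plurality vote over the three predicted labels would return "dos" (two of three classifiers rank it first), whereas the max-score rule returns "normal" because 0.8 is the single largest score. The two rules differ when one classifier is much more confident than the others.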
The algorithm of the proposed approach for network attacks detection based on an ensemble of classifiers is presented below:

EXPERIMENTS
The experiments used the NSL-KDD dataset of network attacks [27], which was built on the basis of the KDD-99 database, created on the initiative of the US Defense Advanced Research Projects Agency (DARPA) [28]. This dataset of connections was collected for intrusion detection research and covers a wide range of intrusions simulated in an environment that mimics a US Air Force network.
Statistical analysis showed that the original KDD-99 data contain important issues that strongly affect the performance of the evaluated systems and lead to a very poor evaluation of anomaly detection approaches. The NSL-KDD database considered here has the following advantages:
1. There are no redundant records in the training set, so the classifier is not biased towards more frequent records.
2. There are no duplicate records in the test set. The test set also contains some attacks that are not present in the training set.
3. The number of records selected from each difficulty-level group is inversely proportional to the percentage of such records in the original KDD dataset.
The training set contains 21 of the 37 different attack types present in the test set. In addition, the numbers of records in the NSL-KDD training set (125973 samples) and test set (22544 samples) are reasonable. This makes it possible to run experiments on the complete set without randomly selecting a small portion of the data. Consequently, the evaluation results of different research projects are consistent and comparable.
The main goals of network intrusion detection include recognizing rare attack types, increasing the accuracy of suspicious activity detection, and increasing the efficiency of real-time intrusion detection models. Each record has 41 attributes describing various features of the connection.
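A minimal sketch of how NSL-KDD record labels could be collapsed into the three classes used in this study (DoS, normal, other). The set of DoS attack names is an assumption based on the commonly used KDD-99 attack categories plus DoS types that appear only in the NSL-KDD test set; the paper does not list its mapping, and the name DOS_ATTACKS is hypothetical.

```python
# Assumed DoS attack names: KDD-99 DoS category plus test-only NSL-KDD DoS types.
# The paper does not publish its mapping; adjust this set to match it.
DOS_ATTACKS = {
    "back", "land", "neptune", "pod", "smurf", "teardrop",
    "apache2", "mailbomb", "processtable", "udpstorm",
}


def to_three_classes(label: str) -> str:
    """Map a raw attack label to 'dos', 'normal' or 'other'."""
    # Normalize case and drop a trailing period (raw KDD-99 labels end with '.')
    label = label.strip().lower().rstrip(".")
    if label == "normal":
        return "normal"
    if label in DOS_ATTACKS:
        return "dos"
    return "other"


print([to_three_classes(s) for s in ["neptune", "normal", "portsweep"]])
```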
To evaluate the performance of the classifiers, the following metrics are used: Accuracy, Recall, Precision, and F-measure. For any classification algorithm, four outcomes are possible, which helps to explain the difference between these metrics: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) results.
Classification accuracy is defined as the proportion of correct results achieved by the classifier:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Precision shows how many of the objects identified by the classifier as positive are really positive:

$$\text{Precision} = \frac{TP}{TP + FP}.$$

Recall shows which part of the positive objects was selected by the classifier:

$$\text{Recall} = \frac{TP}{TP + FN}.$$

F-measure is a metric that combines recall and precision:

$$\text{F-measure} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}.$$

Error rate (or misclassification error) measures the ratio of incorrectly classified samples to the total number of classified samples:

$$\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN}.$$
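A minimal sketch of these formulas in the three-class setting, assuming each class is evaluated one-vs-rest (that class is positive, the other two are negative). The paper does not state how TP, FP, TN and FN were counted per class, and the function name is hypothetical. Returning 0.0 when a denominator is zero is also a convention chosen here.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, F-measure and error rate for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * recall * precision / (recall + precision)
                 if recall + precision else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error_rate": (fp + fn) / total,  # equals 1 - accuracy
    }


# Hypothetical true and predicted labels for five connections
y_true = ["dos", "dos", "normal", "other", "dos"]
y_pred = ["dos", "normal", "normal", "dos", "dos"]
print(one_vs_rest_metrics(y_true, y_pred, "dos"))
```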

RESULTS
The experiments were conducted on a 64-bit Windows® 10 platform with a 2.5 GHz Core i7 processor and 8 …; Accuracy, Recall, Precision, and F-measure were used as evaluation metrics. The classification results are shown in Tables 2-5. Based on these tables, each method was ranked (ranks shown in square brackets) for each metric and each class; the ranks are summarized in Tables 6-8.
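A minimal sketch of assigning ranks for one metric and one class, assuming rank 1 is the best value and tied methods share a rank (dense ranking). The paper does not describe its tie handling, and the method scores in the example are made up, not taken from Tables 2-5.

```python
def rank_methods(values, higher_is_better=True):
    """Map each method to its rank for one metric; equal values share a rank."""
    ordered = sorted(set(values.values()), reverse=higher_is_better)
    return {method: ordered.index(v) + 1 for method, v in values.items()}


# Hypothetical accuracies for one class
print(rank_methods({"KNN": 0.90, "NB": 0.70, "DT": 0.90}))
```

For a metric where lower is better, such as the error rate, pass higher_is_better=False.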
It can be concluded from Table 2 that the highest accuracy of DoS attack detection (92.33%) was achieved by the ensemble of five classifiers (DT+KNN+SVM(Polynom)+NB+SVM(Linear)), which exceeded the result of the single classifier (KNN) by 4.12%.
Tables 3 and 4 compare the recall and precision values, respectively, for DoS attack detection. … [29] according to the three metrics Accuracy, Precision, and F-measure; according to Recall, the ensemble DT+KNN+SVM(Polynom)+NB demonstrates the best result. According to Table 6, the NB method showed the worst result on three of the four metrics, and the DT method had the worst rank on one metric (Recall). According to Table 7, the ensemble DT+KNN+SVM(Polynom)+NB showed the best results for the Normal class on three metrics, while the NB method showed the worst results on Accuracy, Recall, and F-measure.
3) The proposed approach with different combinations of classifiers is superior to single classifiers.
For a clearer comparison, Fig. 1 shows the error rate computed for each classifier: red indicates the classification error rates for DoS attacks, green the error rates for other types of attack, and blue the error rates for the "normal" state. Single classifiers (DT, NB, SVM, and KNN) were compared with various ensembles of classifiers built using the proposed approach.

CONCLUSIONS
At present, the processing and analysis of Big Data are important for ensuring information security. Intrusion detection is one of the serious problems in the field of network security. In this study, an ensemble of classifiers was successfully applied to resist attacks on the network. The ensemble improves recognition accuracy by combining various single classifiers. The ensemble consisted of combinations of DT, SVM with various kernel functions, NB, and KNN algorithms.

ПРОГРЕСИВНІ ІНФОРМАЦІЙНІ ТЕХНОЛОГІЇ
In general, the considered classification methods showed high accuracy of DoS attacks detection. The most accurate result was shown by an ensemble of five classifiers (DT+KNN+SVM(Polynom)+NB+SVM(Linear)). It can be concluded that the proposed approach for network attacks detection is of practical significance.