NETWORK TRAFFIC ANOMALIES DETECTION BASED ON INFORMATIVE FEATURES
Keywords: Network attacks, feature informativeness, random forest, Firefly algorithm, NSL-KDD.
Context. The pressing problem of evaluating feature informativeness for large volumes of data is addressed. The object of the study is network traffic.
Objective. To analyze the informativeness of data features for detecting network traffic anomalies in order to reduce the feature space.
Method. An approach to evaluating feature informativeness for large volumes of data is proposed to increase the accuracy of anomaly detection in network traffic. It also substantially increases the computation speed of the classification algorithms. The characteristics of the random forest and Firefly algorithms are considered, and a feature selection algorithm based on the integration of these two algorithms is proposed. Features are sorted in descending order of importance, and the least informative ones are discarded. Decision trees, naive Bayes, a Bayesian classifier, additive logistic regression, and the k-nearest neighbors method are considered as classifiers. The quality of the classification results is estimated using six evaluation metrics: true positive rate, false positive rate, precision, recall, F-measure, and AUC.
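The ranking step and the evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and does not reproduce the random forest or Firefly stages: the importance scores, the cutoff `keep`, the labels, and the function names are all hypothetical and chosen only for illustration. The feature names are taken from NSL-KDD, and AUC is computed here through its rank interpretation (the fraction of attack/normal pairs in which the attack record receives the higher score).

```python
from typing import Dict, List, Sequence


def select_features(importance: Dict[str, float], keep: int) -> List[str]:
    """Sort features by descending importance and keep the top `keep` of them."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:keep]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute TPR (equal to recall), FPR, precision, and F-measure.

    Labels are assumed to be 1 for an attack and 0 for normal traffic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "tpr": recall,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (attack, normal) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical importance scores for five NSL-KDD features (values are made up).
importance = {
    "src_bytes": 0.31,
    "duration": 0.02,
    "service": 0.22,
    "flag": 0.17,
    "land": 0.001,
}
selected = select_features(importance, keep=3)
print("selected features:", selected)

# Hypothetical ground truth, hard predictions, and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
print("metrics:", binary_metrics(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

In this sketch the cutoff is a fixed count of retained features; a threshold on the importance value would work equally well, and the abstract does not state which rule the authors use.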
Results. The experiments were performed in the MATLAB environment (2016a) on the NSL-KDD data set using the proposed algorithm. The best classification results for the selected features were obtained with the k-nearest neighbors method.
Conclusions. The experiments confirm the efficiency of the proposed approach and allow it to be recommended for practical use in evaluating feature informativeness in order to reduce the feature space and increase the computation speed of the classification algorithms. To further study the effectiveness of anomaly detection in network traffic, a real data set will be used.
Copyright (c) 2017 Y. N. Imamverdiyev, L. V. Sukhostat
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.