NETWORK TRAFFIC ANOMALIES DETECTION BASED ON INFORMATIVE FEATURES

Y. N. Imamverdiyev, L. V. Sukhostat

Abstract


Context. The urgent task of evaluating the feature informativeness of a large amount of data has been solved. The object of the study is network traffic.

Objective. To analyze data informativeness for network traffic anomaly detection in order to reduce the feature space.

Method. An approach for evaluating the feature informativeness of a large amount of data is proposed to increase the accuracy of anomaly detection in network traffic; it also substantially increases the computation speed of the classification algorithms. The characteristics of the random forest and Firefly algorithms are considered, and a feature selection algorithm based on the integration of these two algorithms is proposed. Features are sorted in descending order of importance, and the least informative ones are discarded. Decision trees, naive Bayes, a Bayesian classifier, additive logistic regression and the k-nearest neighbors method are considered as classifiers. The quality of the classification results is estimated using six evaluation metrics: true positive rate, false positive rate, precision, recall, F-measure and AUC.
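The paper's implementation is in Matlab; as a minimal illustration of the ranking-and-selection step described above, the following Python/scikit-learn sketch ranks features by random-forest importance, discards the least informative ones and trains a k-nearest neighbors classifier on the reduced set. The number of retained features (n_keep), the tree count and the use of scikit-learn are assumptions made for illustration; the Firefly-based part of the authors' algorithm is not shown.

    # Illustrative sketch only (not the authors' Matlab code).
    # X_train, X_test are numpy feature matrices; y_train holds class labels.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def rank_features(X_train, y_train, n_trees=100, random_state=0):
        """Return feature indices sorted in descending order of random-forest importance."""
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=random_state)
        rf.fit(X_train, y_train)
        return np.argsort(rf.feature_importances_)[::-1]

    def select_and_classify(X_train, y_train, X_test, n_keep=20, k=5):
        """Keep the n_keep most informative features and classify with k-NN."""
        order = rank_features(X_train, y_train)
        keep = order[:n_keep]          # the least informative features are discarded
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train[:, keep], y_train)
        return knn.predict(X_test[:, keep]), keep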

Results. The experiments have been performed in the Matlab (2016a) environment on the NSL-KDD data set using the proposed algorithm. The best classification results for the selected features have been obtained with the k-nearest neighbors method.
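As a hedged illustration of the six reported metrics, the sketch below computes them for a binary normal(0)/attack(1) labeling; the function name and the score input required for AUC are assumptions, and the paper's own evaluation was performed in Matlab.

    # Illustrative sketch of the six evaluation metrics (binary labels 0/1 assumed).
    from sklearn.metrics import (confusion_matrix, precision_score,
                                 recall_score, f1_score, roc_auc_score)

    def evaluation_metrics(y_true, y_pred, y_score):
        """y_score is a probability or decision score for the attack class (needed for AUC)."""
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return {
            "TPR": tp / (tp + fn),     # true positive rate (equals recall)
            "FPR": fp / (fp + tn),     # false positive rate
            "Precision": precision_score(y_true, y_pred),
            "Recall": recall_score(y_true, y_pred),
            "F-measure": f1_score(y_true, y_pred),
            "AUC": roc_auc_score(y_true, y_score),
        }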

Conclusions. The conducted experiments have confirmed the effectiveness of the proposed approach and allow recommending it for practical use in feature informativeness evaluation, in order to reduce the feature space and increase the computation speed of classification algorithms. In future work, a real data set will be used to further study the effectiveness of anomaly detection in network traffic.

Keywords


Network attacks; feature informativeness; random forest; Firefly algorithm; NSL-KDD.

References


Dua S., Du X. Data mining and machine learning in cybersecurity. Boca Raton, FL, CRC Press, 2011, 256 p. DOI: 10.1201/b10867

Saxe J. Why security data science matters and how it's different: pitfalls and promises of data science based breach detection and threat intelligence [Electronic resource], 2015, Access mode: https://www.blackhat.com/us-15/speakers/Joshua-Saxe.html

Gates C., Taylor C. Challenging the anomaly detection paradigm: a provocative discussion, Proceedings of the Workshop on New Security Paradigms, 2007, pp. 21–29. DOI: 10.1145/505202.505211

Molina L. C., Belanche L., Nebot A. Feature selection algorithms: a survey and experimental evaluation, Proceedings of IEEE International Conference on Data Mining, 2002, pp. 306–313. DOI: 10.1109/ICDM.2002.1183917

Yang X.-S. Firefly algorithms for multimodal optimization, Stochastic Algorithms: Foundations and Applications, 2009, Vol. 5792, pp. 169–178. DOI: 10.1007/978-3-642-04944-6_14

Breiman L. Random forests, Machine Learning, 2001, Vol. 45, No. 1, pp. 5–32. DOI: 10.1023/A:1010933404324

Random forests – Classification manual [Electronic resource], 2017, Access mode: http://www.math.usu.edu/adele/Forests/

Strobl C., Zeileis A. Danger: High power! – exploring the statistical properties of a test for random forest variable importance, Proceedings in Computational Statistics, 2008, pp. 59–66.

Xue B., Zhang M., Browne W. N. Particle swarm optimization for feature selection in classification: Novel initialization and updating mechanisms, Applied Soft Computing, 2014, Vol. 18, pp. 261–276. DOI: 10.1109/TSMCB.2012.2227469

Feng D., Chen F., Xu W. Supervised feature subset selection with ordinal optimization, Knowledge-Based Systems, 2014, Vol. 56, pp. 123–140. DOI: 10.1016/j.knosys.2013.11.004

Bouaguel W., Mufti G. B., Limam M. A fusion approach based on wrapper and filter feature selection methods using majority vote and feature weighting, Proceedings of the International Conference on Computer Applications Technology, 2013, pp. 1–6. DOI: 10.1109/ICCAT.2013.6522003

Wang G., Ma J., Yang S. An improved boosting based on feature selection for corporate bankruptcy prediction, Expert Systems with Applications, 2014, Vol. 41, No. 5, pp. 2353–2361. DOI: 10.1016/j.eswa.2013.09.033

Srinivasa K. G. Application of Genetic Algorithms for Detecting Anomaly in Network Intrusion Detection Systems, Advances in Computer Science and Information Technology. Networks and Communications, 2012, Vol. 84, pp. 582–591. DOI: 10.1007/978-3-642-27299-8_61

Yu K. M., Wu M. F., Wong W. T. Protocol-based classification for intrusion detection, Applied Computer and Applied Computational Science, 2008, Vol. 3, No. 3, pp. 135–141.

Akbar S., Nageswara R. K., Chandulal J. A. Intrusion detection system methodologies based on data analysis, International Journal of Computer Applications, 2010, Vol. 5, No. 2, pp. 10–20. DOI: 10.5120/892-1266

Sethuramalingam S., Naganathan E. R. Hybrid feature selection for network intrusion detection, International Journal of Computer Science and Engineering, 2011, Vol. 3, No. 5, pp. 1773–1780. DOI: 10.4225/75/57a84d4fbefbb

Banati H., Bajaj M. Fire Fly based feature selection approach, IJCSI International Journal of Computer Science Issues, 2011, Vol. 8, No. 4, pp. 473–480.

Hothorn T., Hornik K., Zeileis A. Unbiased recursive partitioning: a conditional inference framework, Journal of Computational and Graphical Statistics, 2006, Vol. 15, No. 3, pp. 651–674. DOI: 10.1198/106186006X133933

Breiman L. Stacked Regressions, Machine Learning, 1996, Vol. 24, pp. 49–64. DOI: 10.1007/BF00117832

Strobl C., Boulesteix A.-L., Kneib T., Augustin T., Zeileis A. Conditional variable importance for random forests, BMC Bioinformatics, 2008, Vol. 9, No. 1, Article No. 307. DOI: 10.1186/1471-2105-9-307

Siroky D. Navigating Random Forests and related advances in algorithmic modeling, Statistics Surveys, 2009, Vol. 3, pp. 147–163. DOI: 10.1214/07-SS033

Archer K. J., Kimes R. V. Empirical characterization of random forest variable importance measures, Computational Statistics & Data Analysis, 2008, No. 4, pp. 2249–2260. DOI: 10.1016/j.csda.2007.08.015

Strobl C., Boulesteix A.-L., Zeileis A., Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution, BMC Bioinformatics, 2007, Vol. 8, No. 1, Article No. 25. DOI: 10.1186/1471-2105-8-25

Liaw A., Wiener M. Classification and Regression by randomForest, R News, 2002, Vol. 2, No. 3, pp. 18–22.

Aggarwal P., Sharma S. K. Analysis of KDD dataset attributes-class wise for intrusion detection, Procedia Computer Science, 2015, Vol. 57, pp. 842–851. DOI: 10.1016/j.procs.2015.07.490

McHugh J. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, ACM Transactions on Information and System Security, 2000, Vol. 3, No. 4, pp. 262–294. DOI: 10.1145/382912.382923

Tavallaee M., Bagheri E., Lu W., Ghorbani A. A detailed analysis of the KDD CUP 99 Data Set, Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications, 2009, pp. 53–58. DOI: 10.1109/CISDA.2009.5356528

NSL-KDD data set for network-based intrusion detection systems [Electronic resource], 2017, Access mode: http://nsl.cs.unb.ca/NSL-KDD/

Davis J. J., Clark A. J. Data preprocessing for anomaly based network intrusion detection: A review, Computers & Security, 2011, Vol. 30, No. 6–7, pp. 353–375. DOI: 10.1016/j.cose.2011.05.008

Holz T. Security measurements and metrics for networks, Dependability Metrics, 2008, pp. 157–165. DOI: 10.1007/978-3-540-68947-8_13


DOI: https://doi.org/10.15588/1607-3274-2017-3-13



Copyright (c) 2017 Y. N. Imamverdiyev, L. V. Sukhostat

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
