Robust anomaly detection methods for contamination network data

Authors

  • Nguyen Manh Tuan Cyberspace Operation Command, Hanoi, Vietnam
  • Nguyen Hai Hao Cyberspace Operation Command, Hanoi, Vietnam
  • Dang Le Dinh Trang Faculty of Information Technology, Le Quy Don Technical University
  • Nguyen Van Tuan Faculty of Information Technology, Le Quy Don Technical University
  • Cao Van Loi (Corresponding Author) Faculty of Information Technology, Le Quy Don Technical University

DOI:

https://doi.org/10.54939/1859-1043.j.mst.79.2022.41-51

Keywords:

Anomaly detection; Latent representation; One-class classification; Contamination.

Abstract

Recently, latent representation models such as the Shrink Autoencoder (SAE) have been shown to provide robust feature representations for one-class learning-based network anomaly detection. These studies typically construct and evaluate their models on benchmark network datasets that have been processed in laboratory environments to make them completely clean. In real-world scenarios, however, there is no guarantee that the data collected for constructing latent representation models is purely normal. This work therefore investigates the characteristics of the latent representation of SAE when it learns from normal data under several contamination scenarios. The aim is to determine whether the latent feature space of SAE is robust to contamination and which contamination scenarios it handles best. We design a set of experiments using normal data contaminated with different anomaly types and different proportions of anomalies. Other latent representation methods, such as Denoising Autoencoder (DAE) and Principal Component Analysis (PCA), are also used for comparison with SAE. The experimental results on four CTU13 scenarios show that the latent representation of SAE often outperforms the others and is less sensitive to contamination.
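The contamination setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the helper name `contaminate`, the choice to keep the training set size fixed, the toy Gaussian data and the example rates are all assumptions made for illustration. The labels are kept only for bookkeeping; a one-class learner would be trained on the features alone.

```python
import random

def contaminate(normal, anomalies, rate, seed=0):
    """Hypothetical helper: build a training set of len(normal) rows in which
    about `rate` of the rows are drawn from `anomalies` (label 1) and the
    rest from `normal` (label 0). Assumes the set size is held fixed."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    n_total = len(normal)
    n_anom = round(rate * n_total)
    if n_anom > len(anomalies):
        raise ValueError("not enough anomalies for the requested rate")
    # Sample without replacement from each pool, then shuffle the mix.
    rows = [(x, 0) for x in rng.sample(normal, n_total - n_anom)]
    rows += [(x, 1) for x in rng.sample(anomalies, n_anom)]
    rng.shuffle(rows)
    return rows

# Toy two-dimensional data standing in for network flow features (assumption).
gen = random.Random(42)
normal = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(200)]
anomalies = [[gen.gauss(5.0, 1.0), gen.gauss(5.0, 1.0)] for _ in range(50)]

# Sweep over illustrative contamination proportions.
for rate in (0.0, 0.01, 0.05, 0.10):
    train = contaminate(normal, anomalies, rate)
    n_anom = sum(label for _, label in train)
    print(f"rate={rate:.2f} size={len(train)} anomalies={n_anom}")
```

Each contaminated set would then be used to fit a latent representation model (SAE, DAE or PCA) before evaluating a one-class classifier on clean test data; repeating the sweep per anomaly type gives the second axis of the experiment.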

Published

19-05-2022

How to Cite

Nguyễn, T., Nguyen Hai Hao, Dang Le Dinh Trang, Nguyen Van Tuan, and Cao Van Loi. “Robust Anomaly Detection Methods for Contamination Network Data”. Journal of Military Science and Technology, no. 79, May 2022, pp. 41-51, doi:10.54939/1859-1043.j.mst.79.2022.41-51.

Section

Research Articles