Robust anomaly detection methods for contamination network data
DOI: https://doi.org/10.54939/1859-1043.j.mst.79.2022.41-51
Keywords: Anomaly detection; Latent representation; One-class classification; Contamination.
Abstract
Recently, latent representation models, such as the Shrink Autoencoder (SAE), have been shown to provide robust feature representations for one-class learning-based network anomaly detection. These studies typically construct and evaluate such models on benchmark network datasets that have been processed in laboratory environments to be completely clean. In real-world scenarios, however, we cannot guarantee that the data collected for constructing latent representation models is 100% pure normal. Therefore, this work investigates the characteristics of the latent representation of SAE when it learns from normal data under several contamination scenarios. The aim is to find out whether the latent feature space of SAE is robust to contamination, and which contamination scenarios it handles best. We design a set of experiments in which normal data is contaminated with different anomaly types and different proportions of anomalies. Other latent representation methods, such as the Denoising Autoencoder (DAE) and Principal Component Analysis (PCA), are also compared against SAE. The experimental results on four CTU13 scenarios show that the latent representation of SAE often outperforms the others and is less sensitive to contamination.
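To make the setting concrete, below is a minimal sketch (not the authors' implementation) of a Shrink Autoencoder trained on possibly contaminated "normal" traffic: the loss adds an L2 shrink penalty on the latent vectors to the usual reconstruction error, so that normal samples are pulled towards the origin of the latent space, and the distance from the origin can then serve as an anomaly score. The layer sizes, the lambda_shrink weight and the training loop are illustrative assumptions, written here with PyTorch.

# Minimal Shrink Autoencoder (SAE) sketch; hyper-parameters are illustrative.
import torch
import torch.nn as nn

class ShrinkAE(nn.Module):
    def __init__(self, n_features, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.Tanh(),
            nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.Tanh(),
            nn.Linear(32, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def train_sae(model, x_train, epochs=100, lambda_shrink=10.0, lr=1e-3):
    # x_train: tensor of (possibly contaminated) normal traffic features
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        z, x_hat = model(x_train)
        # reconstruction term + shrink penalty on the latent representation
        loss = mse(x_hat, x_train) + lambda_shrink * z.pow(2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    return model

def anomaly_score(model, x):
    # distance from the origin in latent space: larger means more anomalous
    with torch.no_grad():
        z, _ = model(x)
    return z.norm(dim=1)

In the contamination experiments described above, x_train would correspond to normal flows from a CTU13 scenario mixed with a chosen proportion of botnet flows; the latent vectors produced by the encoder could equally be passed to other one-class classifiers instead of the simple origin-distance score used in this sketch.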