Evaluating the effectiveness of Discriminator network in GAN architecture for phishing URL classification

Authors

  • Pham Thi Thanh Thuy, Faculty of Information Security, Academy of People Security
  • Ta Viet Cuong (Corresponding Author), HMI lab, VNU University of Engineering and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.86.2023.110-119

Keywords:

Phishing URL detection; GAN; Discriminator-based classification.

Abstract

Phishing attacks that use illegitimate URLs are among the most common security challenges for individuals and companies trying to protect their information resources. User passwords, credit card information, and other sensitive data can be stolen when a user clicks on a malicious URL. Recently, machine-learning-based approaches have been widely applied to detect phishing URLs. Classifiers such as SVM, Random Forest, and LSTM are trained on standard datasets to predict whether a URL sample is malicious or benign. Some recent studies use GANs to enrich the malicious URL samples used to train deep-learning classifiers. In this work, we explore training a standard GAN architecture, which consists of two adversarial networks: a Discriminator and a Generator. The URL samples produced by the Generator are evaluated by the Discriminator, which feeds its assessment back to the Generator. This helps the Generator produce URL samples that are increasingly similar to real ones. In the process, the Discriminator network also learns the characteristics of malicious and clean URL patterns. To evaluate the effectiveness of this learning, experiments are conducted on completely new test datasets that are separate from the training datasets. The experimental results are promising: the classification accuracy for both malicious and benign URLs is about 97%.
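The evaluation step described in the abstract, scoring held-out URLs with the trained Discriminator and reporting accuracy separately for malicious and benign samples, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function name `per_class_accuracy`, the 0.5 decision threshold, the label convention (1 = benign, 0 = malicious), and the toy scores, which are placeholders rather than experimental results.

```python
from typing import Dict, Sequence


def per_class_accuracy(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Dict[int, float]:
    """Return accuracy per class for thresholded discriminator scores.

    Assumed convention: label 1 = benign, label 0 = malicious, and a score
    at or above the threshold is read as a "benign" prediction.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for score, label in zip(scores, labels):
        if label not in total:
            raise ValueError("labels must be 0 (malicious) or 1 (benign)")
        predicted = 1 if score >= threshold else 0
        total[label] += 1
        if predicted == label:
            correct[label] += 1

    # Skip classes with no samples to avoid dividing by zero.
    return {c: correct[c] / total[c] for c in total if total[c] > 0}


# Placeholder scores for six hypothetical held-out URLs (not real results).
toy_scores = [0.91, 0.12, 0.67, 0.40, 0.08, 0.75]
toy_labels = [1, 0, 1, 1, 0, 0]
accuracy = per_class_accuracy(toy_scores, toy_labels)
```

Reporting the two classes separately, rather than a single pooled figure, keeps a large benign test set from hiding poor detection of malicious URLs.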

Published

28-04-2023

How to Cite

[1]
P. T. T. T. Pham and T. V. C. Ta, “Evaluating the effectiveness of Discriminator network in GAN architecture for phishing URL classification”, JMST, vol. 86, no. 86, pp. 110–119, Apr. 2023.

Section

Research Articles