FGSM Attack on CNN-based image classifiers: Vulnerability analysis and an effective defense strategy

Authors

  • Doan Huong Giang Faculty of Control and Automation, Electric Power University
  • Pham Thi Thanh Thuy (Corresponding Author) Faculty of Cyber Security and High-Tech Crime Prevention and Combat, People's Security Academy

DOI:

https://doi.org/10.54939/1859-1043.j.mst.104.2025.155-163

Keywords:

Convolutional Neural Network (CNN); Adversarial attack; Defense; Adversarial training; Regularization technique.

Abstract

Convolutional Neural Networks (CNNs) have demonstrated significant advantages and have, therefore, been widely applied across various domains. However, adversarial attacks have exposed critical vulnerabilities in these models, posing threats to the security and reliability of deep learning systems. Although numerous studies have investigated adversarial attacks on deep learning models, the specific impact of such attacks on CNN-based image classifiers remains an open issue, especially considering that many widely used CNN models form the foundation of essential real-world applications. This study analyzes the vulnerabilities of CNN image classifiers under the Fast Gradient Sign Method (FGSM) adversarial attack and proposes an effective defense strategy named WR_FGSM. Experimental results on standard benchmark datasets show that several CNN models suffer significantly from FGSM attacks. The adversarial images generated by this attack not only deceive CNN-based image classifiers but also appear visually indistinguishable from the original images to the human eye. Our proposed WR_FGSM defense incorporates adversarial training—one of the most effective existing defense strategies—along with a regularization technique during the training process. This approach effectively safeguards CNN models against FGSM attacks while maintaining a balance between adversarial robustness and the generalization capability of the models.
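The exact WR_FGSM formulation is not reproduced on this page, so the sketch below only illustrates the general idea stated in the abstract: FGSM perturbation of inputs, combined with adversarial training and a regularization term during training. The small CNN, the epsilon value, the clean/adversarial loss weighting alpha, and the choice of an L2 weight penalty are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (PyTorch): FGSM attack and adversarial training with an
# L2 regularization term. Hyperparameters (eps, alpha, l2_lambda) and the
# model are assumptions for demonstration, not the paper's WR_FGSM settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Minimal CNN classifier used only to demonstrate the procedure."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))


def fgsm_attack(model, x, y, eps=0.1):
    """Generate FGSM adversarial examples: x_adv = x + eps * sign(grad_x L)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def adversarial_training_step(model, optimizer, x, y,
                              eps=0.1, alpha=0.5, l2_lambda=1e-4):
    """One step mixing clean and FGSM losses plus an L2 weight penalty.
    alpha and l2_lambda are hypothetical values chosen for illustration."""
    model.train()
    x_adv = fgsm_attack(model, x, y, eps)
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x_adv), y)
    reg = sum(p.pow(2).sum() for p in model.parameters())
    loss = alpha * clean_loss + (1 - alpha) * adv_loss + l2_lambda * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Random tensors stand in for a benchmark dataset such as MNIST (1x28x28, 10 classes).
    model = SmallCNN()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.rand(8, 1, 28, 28)
    y = torch.randint(0, 10, (8,))
    print("step loss:", adversarial_training_step(model, optimizer, x, y))
```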


Published

25-06-2025

How to Cite

[1]
H.-G. Doan and T. T. T. Pham, “FGSM Attack on CNN-based image classifiers: Vulnerability analysis and an effective defense strategy”, JMST, vol. 104, no. 104, pp. 155–163, Jun. 2025.

Issue

Section

Information technology & Applied mathematics