GanTextKnockoff: stealing text sentiment analysis model functionality using synthetic data

Authors

  • Pham Xuan Cong Institute of Information Technology, Academy of Military Science and Technology
  • Hoang Trung Nguyen Le Quy Don Technical University
  • Tran Cao Truong Le Quy Don Technical University
  • Do Viet Binh Institute of Information Technology, Academy of Military Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.CSCE8.2024.76-86

Keywords:

Text sentiment; Black-box model stealing; Text generation; GAN.

Abstract

Today, black-box machine learning models are increasingly subject to extraction attacks that aim to retrieve their internal information. Such attacks typically proceed by submitting input data, observing the corresponding outputs, and training a new model that functions equivalently to the original. The queries are usually drawn from public data repositories or from synthetic data produced by generative models. Most extraction attacks that rely on synthetic data have targeted computer vision models, and little research has addressed model extraction in natural language processing. In this paper, we propose a method that uses synthetic textual data to construct a new model with high accuracy and high functional similarity to an original black-box sentiment analysis model.
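The query-and-train loop described above can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions: the rule-based `victim_api` stands in for the real black-box sentiment API, the random word sampler stands in for the GAN text generator, and a bag-of-words perceptron stands in for the neural knockoff model; none of these names or choices come from the paper itself.

```python
# Sketch of black-box model extraction for sentiment analysis.
# The attacker sees only the victim's labels, never its internals.
import random

def victim_api(text):
    """Stand-in black box: returns 1 (positive) or 0 (negative)."""
    pos = sum(w in text.split() for w in ("good", "great", "love"))
    neg = sum(w in text.split() for w in ("bad", "awful", "hate"))
    return 1 if pos >= neg else 0

# Synthetic query set (the paper uses GAN-generated text instead).
VOCAB = ["good", "great", "love", "bad", "awful", "hate", "movie", "plot", "the"]
random.seed(0)
queries = [" ".join(random.choices(VOCAB, k=5)) for _ in range(500)]

# Step 1: label the synthetic data by querying the black box.
dataset = [(q, victim_api(q)) for q in queries]

# Step 2: train the knockoff (here a bag-of-words perceptron) on those labels.
weights = {w: 0.0 for w in VOCAB}
bias = 0.0
for _ in range(20):  # epochs
    for text, label in dataset:
        score = bias + sum(weights.get(w, 0.0) for w in text.split())
        pred = 1 if score >= 0 else 0
        if pred != label:  # standard perceptron update on mistakes
            delta = label - pred
            bias += delta
            for w in text.split():
                if w in weights:
                    weights[w] += delta

def knockoff(text):
    score = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if score >= 0 else 0

# Step 3: agreement with the victim measures functional similarity.
agreement = sum(knockoff(q) == victim_api(q) for q in queries) / len(queries)
```

On this linearly separable toy task the knockoff reaches near-perfect agreement with the victim; the quality of the synthetic queries is what determines how well such agreement transfers to real inputs, which is the problem the paper's GAN-based generator addresses.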

References

[1]. Barbalau A., Cosma A., Ionescu R. T. et al., "Black-Box Ripper: Copying black-box models using generative evolutionary algorithms," NeurIPS, (2020).

[2]. Che T., Li Y., Zhang R. X. et al., "Maximum-Likelihood Augmented Discrete Generative Adversarial Networks," (2017).

[3]. Dai C. W., Lv M. X., Li K. et al., "MeaeQ: Mount Model Extraction Attacks with Efficient Queries," in EMNLP, (2023). DOI: https://doi.org/10.18653/v1/2023.emnlp-main.781

[4]. Goodfellow I. J., Pouget-Abadie J., Mirza M. et al., "Generative Adversarial Nets," presented at the NIPS, (2014).

[5]. Guo J. X., Lu S., Cai H. et al., "Long Text Generation via Adversarial Training with Leaked Information," in AAAI 2018, vol. 32, (2018). DOI: https://doi.org/10.1609/aaai.v32i1.11957

[6]. Hussein Sherif, "Twitter sentiments dataset, version 1", (2024).

[7]. Krishna K., Tomar G. S., Parikh A. P. et al., "Thieves on Sesame Street! Model extraction of BERT-based APIs," (2019).

[8]. Khaled K., Nicolescu G., and Magalhaes F. G. de, "Careful What You Wish For: on the Extraction of Adversarially Trained Models," in IEEE PST, (2022). DOI: https://doi.org/10.1109/PST55820.2022.9851981

[9]. Kingma D. P. and Welling M., "Auto-Encoding Variational Bayes," (2013).

[10]. Li J., Tang T., Zhao W. X. et al., "Pretrained Language Models for Text Generation: A Survey," IJCAI, (2021). DOI: https://doi.org/10.24963/ijcai.2021/612

[11]. Lu S., Yu L., Feng S. et al., "CoT: Cooperative Training for Generative Modeling of Discrete Data," presented at the ICML, (2019).

[12]. Ma X., Shen Y., Fang G. et al., "Adversarial Self-Supervised Data-Free Distillation for Text Classification," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pp. 6182–6192, (2020). DOI: https://doi.org/10.18653/v1/2020.emnlp-main.499

[13]. Maas A. L., Daly R. E., Pham P. T. et al., "Learning Word Vectors for Sentiment Analysis," pp. 142–150, (2011).

[14]. Meng Y., Huang J., Zhang Y. et al., "Generating Training Data with Language Models: Towards Zero-Shot Language Understanding," Advances in Neural Information Processing Systems, vol. 35, pp. 462-477, (2022).

[15]. Nie W., Narodytska N., and Patel A., "RelGAN: Relational Generative Adversarial Networks for Text Generation," presented at the ICLR, (2019).

[16]. Orekondy T., Schiele B., and Fritz M., "Knockoff Nets: Stealing Functionality of Black-Box Models," in IEEE/CVF, pp. 4954-4963, (2019). DOI: https://doi.org/10.1109/CVPR.2019.00509

[17]. Pham X. C., Hoang T. N., Tran C. T. et al., "TextKnockoff: Knockoff nets for stealing functionality of text sentiment models," Journal of Science and Technique, Section on ICT, vol. 13, no. 01, (2024). DOI: https://doi.org/10.56651/lqdtu.jst.v13.n01.821.ict

[18]. Pham X. C., Hoang T. N., Tran C. T. et al., "Adaptive Sampling Technique for Building Knockoff Text Sentiment Models (accepted)," in The 18th IEEE-RIVF International Conference on Computing and Communication Technologies, Da Nang, Vietnam, (2024).

[19]. Plaat Aske, “Deep Reinforcement Learning”. Springer Nature, (2023). DOI: https://doi.org/10.1007/978-981-19-0638-1

[20]. Socher R., Perelygin A., Wu J. Y. et al., "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank," (2013). DOI: https://doi.org/10.18653/v1/D13-1170

[21]. Rashid A., Lioutas V., Ghaddar A. et al., "Towards Zero-Shot Knowledge Distillation for Natural Language Processing," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 6551–6561, (2021). DOI: https://doi.org/10.18653/v1/2021.emnlp-main.526

[22]. Rosa G. H. de and Papa J. P., "A survey on text generation using generative adversarial networks," Pattern Recognition 119: 108098, (2022). DOI: https://doi.org/10.1016/j.patcog.2021.108098

[23]. Kariyappa S., Prakash A., and Qureshi M. K., "MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation," (2022). DOI: https://doi.org/10.1109/CVPR46437.2021.01360

[24]. Sutton R. S. and Barto A. G., “Reinforcement Learning: An Introduction”. London, England: The MIT Press, (2015).

[25]. Truong J. B., Maini P., Walls R. J. et al., "Data-Free Model Extraction," IEEE/CVF, (2021). DOI: https://doi.org/10.1109/CVPR46437.2021.00474

[26]. Sanh V., Debut L., Chaumond J. et al., "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter," NeurIPS, (2019).

[27]. Wang W., Yin B., Yao T. et al., "Delving into Data: Effectively Substitute Training for Black-box Attack," CVPR, (2021). DOI: https://doi.org/10.1109/CVPR46437.2021.00473

[28]. Wu W., Zhang J., Wei V. J. et al., "Practical and Efficient Model Extraction of Sentiment Analysis APIs," presented at the ICSE 45, (2023).

[29]. Wang K. and Wan X., "SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks," presented at the IJCAI-18, (2018). DOI: https://doi.org/10.24963/ijcai.2018/618

[30]. He X., Lyu L., Xu Q. et al., "Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!," (2021).

[31]. Xu J., Ren X., Lin J. et al., "DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text," EMNLP, (2018).

[32]. Liu Y., Ott M., Goyal N. et al., "RoBERTa: A Robustly Optimized BERT Pretraining Approach," arXiv:1907.11692v1, (2019).

[33]. Yang P., Wu Q., and Zhang X., "Efficient Model Extraction by Data Set Stealing, Balancing, and Filtering," IEEE IoT Journal, vol. 10, no. 24, (2023). DOI: https://doi.org/10.1109/JIOT.2023.3304345

[34]. Ye Z., Luo W., Naseem M. L. et al., "C2FMI: Coarse-to-Fine Black-Box Model Inversion Attack," IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 3, (2024). DOI: https://doi.org/10.1109/TDSC.2023.3285071

[35]. Yu L., Zhang W., Wang J. et al., "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient," AAAI, (2017). DOI: https://doi.org/10.1609/aaai.v31i1.10804

[36]. Liu Z., Wang J., and Liang Z., "CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation," AAAI, (2020). DOI: https://doi.org/10.1609/aaai.v34i05.6361

[37]. Yang Z., Dai Z., Yang Y. et al., "XLNet: Generalized Autoregressive Pretraining for Language Understanding," arXiv:1906.08237v2, (2019).

[38]. Zhang J., Li B., Xu J. et al., "Towards Efficient Data Free Black-box Adversarial Attack," in IEEE/CVF, pp. 15115-15125, (2022). DOI: https://doi.org/10.1109/CVPR52688.2022.01469

[39]. Zhou M., Wu J., Liu Y. et al., "DaST: Data-free Substitute Training for Adversarial Attacks," in IEEE/CVF, pp. 234-243, (2020). DOI: https://doi.org/10.1109/CVPR42600.2020.00031

Published

2024-12-30

How to Cite

[1]
C. Pham, T.-N. Hoang, C.-T. Tran, and V.-B. Do, “GanTextKnockoff: stealing text sentiment analysis model functionality using synthetic data”, JMST’s CSCE, no. CSCE8, pp. 76–86, Dec. 2024.

Section

Articles