Intent classification for voice-based military information search on digital maps using integrated BiGRU-CNN network and speech recognition technology

Authors

  • Dang Duc Thinh, Academy of Military Science and Technology
  • Nguyen Duc Vuong, Academy of Military Science and Technology
  • Luong Dinh Ha, Academy of Military Science and Technology
  • Nguyen Cong Thanh, Military Medical Department
  • Nguyen Chi Thanh, Academy of Military Science and Technology
  • Phung Nhu Hai, Academy of Military Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.CSCE8.2024.87-97

Keywords:

Intent classification; ASR; BiGRU-CNN network; Feature extraction; Digital maps.

Abstract

Searching for information is one of the most important functions of software that supports the drafting of operational documents on digital maps. To improve usability and meet the demands of modern military operations, this search function should be automated through voice commands. A universal voice search tool that handles several types of information must first classify the user's search intent. This paper proposes a search intent classification process that combines an integrated BiGRU-CNN network with automatic speech recognition (ASR) technology. Speech is first transcribed to text with the Whisper model, and the BiGRU-CNN network, which leverages the strengths of both BiGRU and CNN models, then classifies the transcribed text more effectively. The paper compares the proposed method with approaches that pair standalone machine learning models with feature extraction methods such as TF-IDF, N-grams, and SVD. Although the ASR model used in this research still has limitations, experimental results show that search intent classification accuracy reaches up to 98.4%. This is higher than the accuracy of the compared methods based on simpler machine learning models, demonstrating the effectiveness of the proposed method.
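The abstract does not give the layer order, layer sizes, or how the BiGRU and CNN parts are combined. As a minimal sketch only, assuming a sequential design (BiGRU outputs fed to a 1D convolution with max-over-time pooling, then a softmax over intents) and random, untrained weights, a forward pass could look like the NumPy code below. The class names (`GRU`, `BiGRUCNN`), all dimensions, the intent labels, and the example transcripts are hypothetical stand-ins, not the paper's configuration; a parallel, dual-channel arrangement is another common way to integrate the two models.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class GRU:
    """One-direction GRU unrolled over a sequence (update gate z, reset gate r)."""

    def __init__(self, d_in, d_h, rng):
        s = 1.0 / np.sqrt(d_h)
        self.W = rng.uniform(-s, s, (3, d_in, d_h))  # input weights: z, r, candidate
        self.U = rng.uniform(-s, s, (3, d_h, d_h))   # recurrent weights: z, r, candidate
        self.b = np.zeros((3, d_h))
        self.d_h = d_h

    def __call__(self, xs):  # xs: (T, d_in) -> (T, d_h)
        h = np.zeros(self.d_h)
        states = []
        for x in xs:
            z = sigmoid(x @ self.W[0] + h @ self.U[0] + self.b[0])
            r = sigmoid(x @ self.W[1] + h @ self.U[1] + self.b[1])
            cand = np.tanh(x @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = (1.0 - z) * h + z * cand  # interpolate old state and candidate
            states.append(h)
        return np.stack(states)


class BiGRUCNN:
    """Hypothetical layout: embedding -> BiGRU -> Conv1D + ReLU -> max pooling -> softmax."""

    def __init__(self, vocab_size, n_classes, d_emb=32, d_h=16,
                 n_filters=24, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(0.0, 0.1, (vocab_size, d_emb))
        self.fwd = GRU(d_emb, d_h, rng)
        self.bwd = GRU(d_emb, d_h, rng)
        self.conv_w = rng.normal(0.0, 0.1, (n_filters, kernel, 2 * d_h))
        self.conv_b = np.zeros(n_filters)
        self.out_w = rng.normal(0.0, 0.1, (n_filters, n_classes))
        self.out_b = np.zeros(n_classes)
        self.kernel = kernel

    def forward(self, token_ids):
        x = self.emb[token_ids]                                 # (T, d_emb)
        h = np.concatenate([self.fwd(x),                        # left-to-right pass
                            self.bwd(x[::-1])[::-1]], axis=1)   # right-to-left, re-aligned
        if len(h) < self.kernel:                                # zero-pad very short commands
            h = np.vstack([h, np.zeros((self.kernel - len(h), h.shape[1]))])
        # Valid 1D convolution over time: each window of `kernel` BiGRU states -> n_filters values.
        feats = np.stack([
            np.tensordot(self.conv_w, h[t:t + self.kernel], axes=([1, 2], [0, 1]))
            for t in range(len(h) - self.kernel + 1)
        ]) + self.conv_b                                        # (T', n_filters)
        pooled = np.maximum(feats, 0.0).max(axis=0)             # ReLU, then max over time
        return softmax(pooled @ self.out_w + self.out_b)        # class probabilities


# Usage with hypothetical labels and transcripts (stand-ins for Whisper output).
INTENTS = ["symbol", "location", "unit"]
transcripts = ["find tank symbol", "show hill 203", "locate artillery battalion"]
vocab = {w: i for i, w in enumerate(sorted({w for t in transcripts for w in t.split()}))}
model = BiGRUCNN(vocab_size=len(vocab), n_classes=len(INTENTS))
probs = model.forward([vocab[w] for w in "find tank symbol".split()])
print(INTENTS[int(probs.argmax())], probs.round(3))  # untrained, so the label is arbitrary
```

In this sketch the BiGRU gives each token context from both directions, and the convolution then picks up local, n-gram-like patterns in those contextual states; max pooling makes the classifier's input size independent of command length. The weights here are random, so the predicted label carries no meaning until the model is trained on labeled transcripts.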

Published

2024-12-30

How to Cite

[1]
D. T. Dang, Nguyen Duc Vuong, Luong Dinh Ha, Nguyen Cong Thanh, Nguyen Chi Thanh, and N. H. Phung, “Intent classification for voice-based military information search on digital maps using integrated BiGRU-CNN network and speech recognition technology”, JMST’s CSCE, no. CSCE8, pp. 87–97, Dec. 2024.

Issue

No. CSCE8

Section

Articles