A voice search engine for military symbols to enhance the drafting of operational plan documents on digital map
DOI: https://doi.org/10.54939/1859-1043.j.mst.87.2023.40-49
Keywords: Voice search; Feature extraction; Cosine similarity; Military symbols; Digital map.
Abstract
Searching for the information needed to draft operational plan documents on a digital map is still done manually and should be automated to improve efficiency. Speech recognition and natural language processing technologies, commonly used in chatbots, virtual assistants, voice commands, and voice search, are promising tools for this problem. This paper proposes a framework for deploying a voice search engine that uses Whisper, a deep learning-based automatic speech recognition model, to transcribe the spoken query, and combines TF-IDF, N-gram, and Truncated SVD as feature extraction approaches to match the transcription against the ground-truth text entries in a dictionary of military symbols using cosine similarity. Despite the small size of the custom dataset, the experiments show promising results, with an accuracy of 82.00%. This result surpasses that of several traditional statistical methods and classification models.
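The text-matching stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes character-level n-grams (the abstract does not say whether the n-grams are over characters or words), uses a small hypothetical English dictionary in place of the real symbol dictionary, and omits both the Whisper transcription step and the Truncated SVD reduction so that it runs on the Python standard library alone. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical dictionary entries; the real military symbol dictionary is not shown here.
SYMBOLS = [
    "infantry battalion",
    "tank company",
    "artillery battery",
    "field hospital",
    "command post",
]

def char_ngrams(text, n_min=2, n_max=3):
    """Return character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def build_idf(docs):
    """Smoothed inverse document frequency for each n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; n-grams unseen in the dictionary are dropped."""
    counts = Counter(g for g in char_ngrams(text) if g in idf)
    return {g: c * idf[g] for g, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, entries):
    """Return (score, entry) for the dictionary entry most similar to the query."""
    idf = build_idf(entries)
    q = tfidf_vector(query, idf)
    return max((cosine(q, tfidf_vector(e, idf)), e) for e in entries)
```

Calling `search("tank compny", SYMBOLS)` with a slightly misrecognized transcription still selects the entry "tank company", which is the kind of tolerance to recognition errors that n-gram features are meant to provide. In a fuller pipeline the query string would come from the speech recognizer, and the TF-IDF vectors would be projected to a lower dimension before the similarity is computed.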