Vietnamese speech command recognition on microcontroller for UGV control
DOI: https://doi.org/10.54939/1859-1043.j.mst.CSCE9.2025.142-150
Keywords: Edge AI; Arduino Nano 33 BLE; ARM Cortex-M; speech command recognition; UGV.
Abstract
Running deep learning on edge computing devices is a feasible approach that meets computational-efficiency and latency requirements while also offering advantages in security, bandwidth efficiency, and scalability. This paper presents the deployment and execution of a Vietnamese short-command speech recognition task on an ARM Cortex-M microcontroller, targeting the control of autonomous unmanned ground vehicles (UGVs). Experimental results show that an int8-quantized CNN model for short-command recognition achieves 94.7% accuracy on ARM Cortex-M4 hardware, with an execution time of only 15 ms. These results demonstrate the feasibility of real-time Vietnamese speech command recognition for UGV control. The findings also open promising directions for deploying deep learning models on ultra-resource-constrained edge devices (Edge AI) in practical real-world applications.
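The abstract reports an int8-quantized CNN but does not state its quantization scheme or parameters. As an illustration only, the Python sketch below shows the per-tensor affine int8 arithmetic commonly used in TensorFlow Lite-style deployments on Cortex-M: real values are mapped to 8-bit integers with a scale and zero point, a layer's dot product is accumulated in a wide integer, and the result is rescaled to the output tensor's scale. The helper names `quantize`, `dequantize` and `int8_dot`, and all scales and zero points, are hypothetical and are not taken from the paper.

```python
# Minimal sketch of per-tensor affine int8 quantization (TFLite-style).
# All parameters below are hypothetical, not the paper's actual values.

INT8_MIN, INT8_MAX = -128, 127


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8: q = round(x / scale) + zero_point, saturated."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value: x ~= scale * (q - zero_point)."""
    return scale * (q - zero_point)


def int8_dot(xq, wq, x_scale, x_zp, w_scale, out_scale, out_zp) -> int:
    """Integer-only dot product as inside an int8 dense/conv layer.

    Products are summed in a wide accumulator (int32 on an MCU). Weights are
    assumed symmetric (zero point 0). The accumulator is rescaled by
    M = x_scale * w_scale / out_scale; optimized Cortex-M kernels typically
    apply M as a fixed-point multiplier plus shift, a float is used here
    only for clarity.
    """
    acc = sum((x - x_zp) * w for x, w in zip(xq, wq))
    m = x_scale * w_scale / out_scale
    q = round(m * acc) + out_zp
    return max(INT8_MIN, min(INT8_MAX, q))


# Hypothetical quantization parameters for input, weights and output.
X_SCALE, X_ZP = 0.05, -10
W_SCALE = 0.01
OUT_SCALE, OUT_ZP = 0.02, 5

x = [0.5, -0.25, 1.0, 0.0]
w = [0.3, 0.2, 0.1, 0.4]

xq = [quantize(v, X_SCALE, X_ZP) for v in x]  # [0, -15, 10, -10]
wq = [quantize(v, W_SCALE, 0) for v in w]     # [30, 20, 10, 40]
yq = int8_dot(xq, wq, X_SCALE, X_ZP, W_SCALE, OUT_SCALE, OUT_ZP)

float_ref = sum(a * b for a, b in zip(x, w))
print(f"float: {float_ref:.3f}  int8: {yq} -> {dequantize(yq, OUT_SCALE, OUT_ZP):.3f}")
```

Keeping the accumulation entirely in integers is what lets such a model run on a Cortex-M4 without relying on floating-point throughput; only the final rescale touches the quantization parameters.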