Research on solutions to accelerate computation for CNN executed on resource-limited SoC-FPGA
DOI: https://doi.org/10.54939/1859-1043.j.mst.CSCE9.2025.42-50
Keywords: SoC-FPGA; CNN; Classification.
Abstract
This paper presents a method for designing and implementing an image recognition model on a System on Chip (SoC) platform integrated with a Field Programmable Gate Array (FPGA). With the growing demand for low-latency inference and high energy efficiency in edge computing, customized hardware acceleration solutions have become essential. The results of executing the embedded model on the FPGA fabric of the SoC are compared with a PyTorch implementation of the same model running on the same resource-limited SoC-FPGA platform. The comparison shows that the proposed architecture provides a high-performance, low-latency solution and demonstrates the feasibility of using Vitis High-Level Synthesis (HLS) to quickly generate specialized IP cores for edge computing applications. These results provide a solid foundation for building more complex network models for larger problems.
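To make the HLS-based flow concrete, the following is a minimal sketch of a convolution-layer kernel as it could be written for Vitis HLS and synthesized into an IP core with AXI interfaces. The 3x3 kernel size, the 28x28 input size, the ap_fixed<16,6> quantization, the port names, and the pragma choices are illustrative assumptions for this sketch, not the design described in the paper.

// conv_ip.cpp -- minimal Vitis HLS sketch of a 3x3 convolution IP core.
// All sizes, types, and port names below are illustrative assumptions.
#include <ap_fixed.h>

typedef ap_fixed<16, 6> data_t;   // assumed 16-bit fixed-point quantization

#define IN_H  28
#define IN_W  28
#define K     3
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

// Top-level function: synthesized into an IP core with AXI master data
// ports and an AXI4-Lite control interface.
void conv3x3(const data_t in[IN_H][IN_W],
             const data_t weights[K][K],
             data_t out[OUT_H][OUT_W]) {
#pragma HLS INTERFACE m_axi     port=in      offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi     port=weights offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=out     offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=return

    // Copy the small kernel into registers so all taps are available in parallel.
    data_t w[K][K];
#pragma HLS ARRAY_PARTITION variable=w complete dim=0
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            w[i][j] = weights[i][j];

    // Sliding-window convolution. PIPELINE requests one output pixel per
    // clock cycle; the achieved initiation interval depends on how the
    // input buffer is partitioned and on external memory bandwidth.
    for (int r = 0; r < OUT_H; r++) {
        for (int c = 0; c < OUT_W; c++) {
#pragma HLS PIPELINE II=1
            data_t acc = 0;
            for (int i = 0; i < K; i++)
#pragma HLS UNROLL
                for (int j = 0; j < K; j++)
#pragma HLS UNROLL
                    acc += in[r + i][c + j] * w[i][j];
            out[r][c] = acc;
        }
    }
}

In a complete design, the exported IP would be connected to the processing system of the SoC over AXI interconnect, and the processor would configure the layer and trigger inference through the AXI4-Lite control registers.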