Hardware-efficient matrix multiplication core optimization for edge AI on FPGA

Phan Hong Minh; Nguyen Tien Viet; Do Doanh Dien

doi:10.54939/1859-1043.j.mst.IITE.2025.123-130

Authors

Phan Hong Minh (Corresponding author), Institute of Information Technology and Electronics, Academy of Military Science and Technology
Nguyen Tien Viet, Department of Information Technology, Military Region 2
Do Doanh Dien, Institute of Information Technology and Electronics, Academy of Military Science and Technology


Keywords:

IP core; Matrix multiplication; FPGA-CNN; Vivado-Vitis.

Abstract

This paper presents a method for optimizing a matrix-multiplication IP core on FPGA by transforming the convolution operation into a matrix multiplication. The proposed design combines parallel computation with concurrent data loading within the same processing cycle, which reduces both memory requirements and computational latency. In addition, casting the output data from 64-bit down to 32-bit shrinks the result buffer and thereby saves significant hardware resources. Simulation results in ModelSim and Vivado-Vitis show that the design achieves higher computational performance and better resource efficiency than conventional implementations, while keeping computation time stable. The work targets CNN inference accelerator chips on FPGA for edge AI applications, where limited hardware resources and power are key constraints.
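The two datapath ideas named in the abstract, lowering convolution to matrix multiplication and narrowing a wide accumulator to 32 bits, can be illustrated with a small software reference model of the kind often used as a golden model when verifying an FPGA core. This is a hypothetical sketch, not the authors' design: it assumes the common im2col lowering, a square kernel, stride 1, no padding, and saturation as the 64-to-32-bit narrowing policy (the abstract only says the output is cast; a plain C cast would keep the low 32 bits instead). All function names are illustrative.

```python
# Hypothetical software reference model, not the authors' RTL/HLS design.
# Lowers a 2-D convolution (square kernel, stride 1, no padding, CNN-style
# cross-correlation) to a matrix-vector product via im2col, accumulates in a
# wide integer, then narrows each result to signed 32 bits.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def im2col(image, k):
    """Unroll every k x k window of a 2-D image into one row of a patch matrix."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def matvec_wide(patches, kernel_flat):
    """Dot each patch row with the flattened kernel.

    Python ints are unbounded, so they stand in for the 64-bit accumulator.
    """
    return [sum(p * q for p, q in zip(row, kernel_flat)) for row in patches]


def narrow_to_int32(x):
    """Saturate a wide accumulator value into the signed 32-bit range.

    Assumption: saturation. A plain C cast would keep only the low 32 bits.
    """
    return max(INT32_MIN, min(INT32_MAX, x))


def conv2d_as_matmul(image, kernel):
    k = len(kernel)  # assumes a square k x k kernel
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    patches = im2col(image, k)                        # (out_h*out_w) x (k*k)
    kernel_flat = [v for row in kernel for v in row]  # length k*k
    acc = matvec_wide(patches, kernel_flat)
    narrowed = [narrow_to_int32(v) for v in acc]
    # Fold the flat result back into an out_h x out_w feature map.
    return [narrowed[r * out_w:(r + 1) * out_w] for r in range(out_h)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_as_matmul(image, kernel))  # [[6, 8], [12, 14]]
```

Lowering turns each sliding window into a row of one dense matrix, so a single matrix-multiplication core can serve every convolution layer. The cost is duplicated input data, which is why overlapping data loading with computation matters in hardware. Narrowing the accumulator halves the width of every entry in the result buffer.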
