Hand action recognition in rehabilitation exercise method using R(2+1)D deep learning network and interactive object information

Authors

  • Nguyen Sinh Huy (Corresponding Author) Military Information Technology Institute, Academy of Military Science and Technology
  • Le Thi Thu Hong Military Information Technology Institute, Academy of Military Science and Technology
  • Nguyen Hoang Bach Military Information Technology Institute, Academy of Military Science and Technology
  • Nguyen Chi Thanh Military Information Technology Institute, Academy of Military Science and Technology
  • Doan Quang Tu Military Information Technology Institute, Academy of Military Science and Technology
  • Truong Van Minh School of Electrical and Electronics Engineering, Hanoi University of Science and Technology
  • Vu Hai School of Electrical and Electronics Engineering, Hanoi University of Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.CSCE6.2022.77-91

Keywords:

Hand action recognition; Rehabilitation exercises; Object detection and tracking; R(2+1)D

Abstract

Hand action recognition in rehabilitation exercises aims to automatically recognize which exercises a patient has performed. This is an important step in an AI system that assists doctors in handling, monitoring, and assessing a patient's rehabilitation. The intended system recognizes hand actions automatically from videos captured by a camera worn by the patient. In this paper, we propose a model for recognizing the patient's hand actions in rehabilitation exercises. The model combines the output of R(2+1)D, a deep learning network that recognizes actions in RGB video, with the output of an algorithm that detects the main interactive object in each exercise. The proposed model is implemented, trained, and tested on a dataset of rehabilitation exercises collected from patients' wearable cameras. The experimental results show that exercise recognition accuracy is practicable, averaging 88.43% on test data that is independent of the training data. The proposed method outperforms a single R(2+1)D network and reduces the rate of confusion between exercises with similar hand gestures. These results indicate that combining interactive object information with action recognition improves accuracy significantly.
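The abstract does not state how the action network's output and the object information are combined. As a rough illustration only, the following Python sketch shows one possible late-fusion rule, in which each exercise's action probability is reweighted according to whether its associated object matches the detected one. The exercise names, object labels, probability values, and the weighting rule itself are all hypothetical assumptions and are not taken from the paper.

```python
# Illustrative late fusion only. The exercise names, object labels, and the
# weighting rule are hypothetical assumptions, not the paper's method.

def fuse_scores(action_probs, exercise_object, detected_object, object_conf):
    """Reweight per-exercise action probabilities using the detected object.

    action_probs: dict mapping exercise name -> probability from the action network
    exercise_object: dict mapping exercise name -> object that exercise uses
    detected_object: label of the main interactive object found in the clip
    object_conf: detector confidence in [0, 1]
    """
    fused = {}
    for exercise, prob in action_probs.items():
        # Exercises whose object matches the detection are weighted up;
        # all other exercises are weighted down.
        if exercise_object[exercise] == detected_object:
            weight = object_conf
        else:
            weight = 1.0 - object_conf
        fused[exercise] = prob * weight

    total = sum(fused.values())
    if total == 0.0:
        # Nothing survived the reweighting; fall back to the action scores.
        return dict(action_probs)
    # Renormalize so the fused scores sum to 1.
    return {exercise: score / total for exercise, score in fused.items()}


# Hypothetical scores for two exercises with similar hand gestures.
action_probs = {"ball_squeeze": 0.40, "bottle_lift": 0.45, "block_stack": 0.15}
exercise_object = {
    "ball_squeeze": "ball",
    "bottle_lift": "bottle",
    "block_stack": "block",
}

fused = fuse_scores(action_probs, exercise_object, "ball", 0.9)
predicted = max(fused, key=fused.get)
print(predicted)
```

In this made-up example the action scores alone slightly favor `bottle_lift`, while a confident detection of a ball shifts the fused decision to `ball_squeeze`. This is the kind of confusion between similar gestures that object information is meant to reduce.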

Published

30-12-2022

How to Cite

Nguyen Sinh Huy, Le Thi Thu Hong, Nguyen Hoang Bach, Nguyen Chi Thanh, Doan Quang Tu, Truong Van Minh, and Vu Hai. “Hand Action Recognition in Rehabilitation Exercise Method Using R(2+1)D Deep Learning Network and Interactive Object Information”. Journal of Military Science and Technology, no. CSCE6, Dec. 2022, pp. 77-91, doi:10.54939/1859-1043.j.mst.CSCE6.2022.77-91.

Section

Research Articles
