Enhancing the accuracy of rainfall area classification in central Vietnam using machine learning methods
DOI: https://doi.org/10.54939/1859-1043.j.mst.107.2025.105-113
Keywords: Classification rainfall; Machine learning; LightGBM; Random forest; Himawari-8; ERA-5.
Abstract
This study applies three machine learning techniques, Light Gradient Boosting Machine (LGBM), XGBoost (XGB), and Random Forest (RF), to multi-source data in order to enhance rainfall classification accuracy over Central Vietnam. The data comprise Himawari-8 satellite observations, ground-based rain gauge measurements, and auxiliary data such as ERA-5 reanalysis and the ASTER Digital Elevation Model (DEM). Existing rainfall products for the region, including IMERG Final Run, IMERG Early, GSMaP_MVK_Gauge, PERSIANN_CCS, and FY-4A, serve as references for evaluating the proposed classification approach. The results indicate that all proposed rainfall classification products perform well. Among them, the LGBM-based product achieved the highest performance across key evaluation metrics, including Probability of Detection (POD), Critical Success Index (CSI), Equitable Threat Score (ETS), and Heidke Skill Score (HSS). Compared with GSMaP_MVK_Gauge, the best-performing reference product examined, the LGBM product improves these metrics by 38.89%, 20.0%, 16.67%, and 13.04%, respectively. These findings highlight the potential of machine learning models, particularly LGBM, to enhance the classification performance of meteorological models that rely on small but complex and high-dimensional datasets.
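For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how they are conventionally computed from a rain / no-rain contingency table. It uses the standard textbook definitions; the abstract does not give the authors' exact formulation, so this may differ from it. The function name `contingency_scores`, the binary label encoding (1 = rain, 0 = no rain), and the sample data are all assumptions made for illustration.

```python
import math


def contingency_scores(observed, predicted):
    """Standard rain / no-rain skill scores from paired binary labels.

    Hypothetical helper for illustration: 1 (or True) means rain,
    0 (or False) means no rain.
    """
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    correct_negatives = sum(1 for o, p in pairs if not o and not p)
    n = len(pairs)

    def ratio(num, den):
        # Return NaN instead of raising when a score is undefined.
        return num / den if den else math.nan

    # Hits expected by chance, used by the Equitable Threat Score.
    hits_random = ratio((hits + false_alarms) * (hits + misses), n)

    return {
        "POD": ratio(hits, hits + misses),
        "CSI": ratio(hits, hits + false_alarms + misses),
        "ETS": ratio(hits - hits_random,
                     hits + false_alarms + misses - hits_random),
        "HSS": ratio(
            2 * (hits * correct_negatives - false_alarms * misses),
            (hits + misses) * (misses + correct_negatives)
            + (hits + false_alarms) * (false_alarms + correct_negatives),
        ),
    }


# Made-up sample: 50 hits, 10 misses, 20 false alarms, 920 correct negatives.
observed = [1] * 60 + [0] * 940
predicted = [1] * 50 + [0] * 10 + [1] * 20 + [0] * 920
scores = contingency_scores(observed, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All four scores equal 1 for a perfect classification. ETS and HSS additionally discount hits that would be expected by chance, which matters when rain events are rare relative to no-rain cases.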
