Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model
A virtual ground-based PM2.5 observation network, built from high-density surface meteorological observations with the Extreme Gradient Boosting (XGBoost) model, shows great potential for reconstructing historical PM2.5 data. Surface visibility is the most important variable in the XGBoost model.
Abstract
With increasing public concern about air pollution in China, there is demand for long-term, continuous PM<sub>2.5</sub> datasets. However, it was not until the end of 2012 that China established a national PM<sub>2.5</sub> observation network. Before that, satellite-retrieved aerosol optical depth (AOD) was frequently used as a primary predictor to estimate surface PM<sub>2.5</sub>. Nevertheless, satellite-retrieved AOD often has incomplete daily coverage because of its sampling frequency and interference from clouds, which greatly reduces the representativeness of AOD-based PM<sub>2.5</sub> estimates. Here, we constructed a virtual ground-based PM<sub>2.5</sub> observation network at 1180 meteorological sites across China using the Extreme Gradient Boosting (XGBoost) model with high-density meteorological observations as the major predictors. Cross-validation showed that the XGBoost model is robust and accurate in estimating daily (monthly) PM<sub>2.5</sub> across China in 2018, with R<sup>2</sup>, root-mean-square error (RMSE) and mean absolute error (MAE) values of 0.79 (0.92), 15.75 μg/m<sup>3</sup> (6.75 μg/m<sup>3</sup>) and 9.89 μg/m<sup>3</sup> (4.53 μg/m<sup>3</sup>), respectively. Surface visibility was the dominant variable in the XGBoost model, accounting for 39.3% of the overall relative importance. We then used meteorological and PM<sub>2.5</sub> data from 2017 to assess the predictive capability of the model. The results showed that the XGBoost model can accurately hindcast historical PM<sub>2.5</sub> at monthly (R<sup>2</sup> = 0.80, RMSE = 14.75 μg/m<sup>3</sup>), seasonal (R<sup>2</sup> = 0.86, RMSE = 12.28 μg/m<sup>3</sup>), and annual (R<sup>2</sup> = 0.81, RMSE = 10.10 μg/m<sup>3</sup>) mean levels.
In general, the newly constructed virtual PM<sub>2.5</sub> observation network, based on high-density surface meteorological observations and the Extreme Gradient Boosting model, shows great potential for reconstructing historical PM<sub>2.5</sub> at ~1000 meteorological sites across China. It will help fill gaps in AOD-based PM<sub>2.5</sub> data and will benefit other environmental studies, including epidemiology.
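As an illustration of how the accuracy figures quoted above could be computed, the following minimal Python sketch evaluates R<sup>2</sup>, root-mean-square error and mean absolute error for paired observed and estimated PM<sub>2.5</sub> values, and averages daily values to monthly means before a monthly evaluation. It is not the authors' code. The function names `evaluate` and `monthly_means` and the `(site, month, value)` record layout are hypothetical, and R<sup>2</sup> is assumed here to be the coefficient of determination, 1 - SS_res/SS_tot; the study may instead use the squared correlation coefficient.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def evaluate(observed, estimated):
    """Return (r2, rmse, mae) for paired observed and estimated values.

    Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    errors = [e - o for o, e in zip(observed, estimated)]
    rmse = sqrt(mean([err ** 2 for err in errors]))
    mae = mean([abs(err) for err in errors])
    obs_mean = mean(observed)
    ss_res = sum(err ** 2 for err in errors)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae


def monthly_means(records):
    """Average daily values per (site, month).

    `records` is an iterable of (site, month, value) tuples; this layout is
    a hypothetical choice made for the example.
    """
    groups = defaultdict(list)
    for site, month, value in records:
        groups[(site, month)].append(value)
    return {key: mean(values) for key, values in groups.items()}
```

To evaluate at the monthly level, `monthly_means` would be applied to both the observed and the estimated daily records, and the values for matching `(site, month)` keys then passed to `evaluate`. Averaging before scoring smooths day-to-day errors, which is one reason monthly scores are typically better than daily ones.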