Results and discussion Physics authors titles recent submissions

10 A 10-fold cross-validation CV technique Rodriguez et al., 2010 was used to test the potential of model overfitting and predictive power. Previous studies usually used sample-based CV Li et al., 2017b; Ma et al., 2014 or site-based CV Lee et al., 2011; Xie et al., 2015. For sample-based CV, all samples in the model dataset are randomly and equally divided into ten subsets. One subset is used as testing samples and the rest subsets are used to fit the model for each round of validation. For site-based CV, one monitoring station is used for validation and the remaining stations for model fitting in each round. In this study, we chose sample-based 10-fold CV, which is more widely used Ma et al., 2016. We adopted the statistical indicators of the coefficient of determination R 2 , the root-mean-square error RMSE, 3 g m  , the mean prediction error MPE, 3 g m  , and the relative prediction error RPE, defined as RMSE divided by the mean ground-level PM 2.5 to evaluate the model performance.

4. Results and discussion

4.1. Descriptive statistics Fig. 3 shows the histograms and descriptive statistics of the variables in sample dataset. The PM 2.5 concentrations range from 5 3 g m  to 256.96 3 g m  , with an average of 49.43 3 g m  . The B1, B3, and B7 reflectance are mostly distributed in 0~0.2, with mean values of 0.09, 0.13, and 0.08, respectively. We can find that the satellite reflectance, temperature, relative humidity, surface pressure, height of planetary boundary layer, and NDVI have similar patterns with PM 2.5 concentrations, while the satellite observation angles and wind speed have different distributions. 11 Figure 3 . Histograms and descriptive statistics of Ref-PM modeling variables in the sample dataset. 4.2. Comparison between Ref-PM and AOD-PM modeling Table 1. Performance comparison between Ref-PM and AOD-PM modeling Modeling Model fitting Cross-validation R 2 RMSE MPE RPE R 2 RMSE MPE RPE Ref-PM N=4181 MLR 0.34 22.26 16.37 44.99 0.34 22.19 16.37 44.83 BPNN 0.72 14.55 10.73 29.21 0.59 17.51 12.72 35.39 GRNN 0.74 14.57 10.23 29.51 0.61 17.29 12.29 34.94 DBN 0.77 12.71 9.44 25.68 0.64 16.75 12.11 33.84 AOD-PM N=1658 MLR 0.26 24.12 17.33 47.64 0.27 23.97 17.27 47.31 BPNN 0.48 20.01 14.43 39.68 0.40 21.60 15.26 42.86 GRNN 0.56 19.40 13.34 38.45 0.42 21.43 14.95 42.54 DBN 0.53 19.26 13.47 38.20 0.46 20.57 14.60 40.83 Table 1 shows the performance comparison between Ref-PM and AOD-PM modeling. The sample size of AOD-PM modeling dataset N=1658 is much smaller than Ref-PM modeling dataset N=4181. The main reason is that the reflectance data has a finer solution and larger 12 coverage than AOD product. Comparing to AOD-PM modeling, the Ref-PM modeling reports a competitive performance. For MLR model, the cross-validation R 2 and RMSE are 0.34, 22.19 3 g m  for Ref-PM modeling, and 0.27, 23.97 3 g m  for AOD-PM modeling. For learning-based models, the differences between Ref-PM modeling and AOD-PM modeling is larger, for instance, the cross-validation R 2 of DBN model is 0.64 and 0.46, respectively, with a variation of 0.18. These results demonstrate that the Ref-PM modeling obtains a better performance than AOD-PM modeling, especially based on learning-based models BPNN, GRNN, and DBN. However, it should be noted that there is a scale variation between Ref-PM modeling dataset 1 km and AOD-PM modeling dataset 3 km. We resampled the Ref-PM modeling dataset to 3 km, and reconstruct Ref-PM models. The cross-validation R 2 and RMSE of 3 km DBN in Ref-PM modeling is 0.56 and 18.16 3 g m  . With the same resolution, the 3-km Ref-PM modeling still has some advantages in estimating ground-level PM 2.5 . Furthermore, the reflectance data and AOD product have different spatial coverage, we evaluate Ref-PM and AOD-PM modeling with their shared data records N=935, the R 2 and RMSE of the former are 0.64 and 14.12 3 g m  , respectively; 0.48 and 17.64 3 g m  for the latter. All these results show that Ref-PM modeling reports a higher accuracy of PM 2.5 estimation than AOD-PM modeling, which can be probably attributed to the avoidance of intermediate error. For both Ref-PM modeling and AOD-PM modeling, DBN model performs the best, followed by GRNN and BPNN models, and MLR gives the poorest performance. The conventional MLR statistical model cannot well represent the nonlinear process of Ref-PM and AOD-PM modeling. Neural networks BPNN, GRNN have a better capacity to address 13 this issue. With a more complicated structure, deep learning DBN show its superior in describe the relationship between PM 2.5 and predictors. Allowing for the superiority of the DBN model, we will adopt the DBN model for further analysis of Ref-PM modeling. 4.3. Performance of geo-intelligent Ref-PM modeling Figure 4. Results of Geoi-DBN a model fitting and b cross validation. According to our previous study Li et al., 2017a, incorporating geographical correlation can greatly improve the accuracy of PM 2.5 estimation. Fig. 4 shows the scatter plots of Geoi-DBN model. The R 2 and RMSE of model fitting are 0.88 and 9.66 3 g m  , respectively. For cross validation, the Geoi-DBN model achieves a satisfactory result, with R 2 and RMSE value of 0.87 and 9.89 3 g m  , respectively. Our results show this study has a great advantage than previous regional studies of PM 2.5 estimation in Beijing R 2 =0.81, RMSE=18.89 3 g m  Xie et al., 2015 and Yangtze River Delta Region R 2 =0.67, RMSE=15.82 3 g m  Ma et al., 2016. On the other hand, the cross-validation slope for the Geoi-DBN model is 0.87, with an intercept of 6.48 3 g m  . This means that the proposed approach tends to estimate when the ground PM 2.5 concentrations are greater than 50 3 g m  . These results indicate some evidence for estimation bias of our method Li et al., 2017a. The possible reason for this could be that we used point-based monitoring data and a spatially 14 averaged modeling framework. The sampling distribution of monitors in a grid may not give a great estimation of the spatially averaged concentration for that grid. Furthermore, the seasonal and spatial performance of the Geoi-DBN model was evaluated. Seasonally, the R 2 values for spring March, April, and May, summer June, July, and August, autumn September, October, and November and winter January, February, and December are 0.85, 0.78, 0.77, and 0.86, respectively. Spatially, the R 2 and RMSE values between the observed and estimated PM 2.5 were calculated, and are shown in Fig. 4. Overall, the Geoi-DBN model has achieved a satisfactory performance, with 81.63 of grids reporting a high R 2 of greater than 0.8. The higher R 2 and lower RMSE values are found in central Wuhan, however lower R 2 and higher RMSE values are reported in Huanggang, Xiaogan, and Xianning. The possible reason for this could be the uneven distribution of monitoring stations. Figure 5. Spatial distribution of a R 2 and b RMSE between observed PM 2.5 and estimated PM 2.5 . 4.4. Mapping of PM 2.5 distribution On the basis of Geoi-DBN model for Ref-PM modeling, the daily PM 2.5 concentration is estimated. Fig. 6 presents daily estimates on some certain days. We selected one day in each season, and these days have as many as valid estimates. As shown in this figure, the derived PM 2.5 have similar spatial distributions with ground station measurements. The PM 2.5 on 15 January 15, 2016 in winter has the highest level, while the lowest values are found on July 30, 2016 in summer. Figure 6 . Daily estimates of PM 2.5 on some certain days. The white regions indicate missing data. To further evaluate our results at daily scale, we calculated RPE between derived PM 2.5 and observed PM 2.5 on each day, which is shown in Fig. 7. Due to the missing data, only 125 days have valid estimates of PM 2.5 covering more than 2 stations. Over 87 of the days have a RPE value of less than 30. These results demonstrate that the proposed Ref-PM modeling using geo-intelligent deep learning is effective for estimating daily PM 2.5 concentrations. 16 Figure 7. Distribution of daily RPE between observed PM 2.5 and derived PM 2.5 . Figure 8 . Annual mean distribution of PM 2.5 concentrations across WMA in 2016, a derived PM 2.5 , b ground-observed PM 2.5 . It should be noted that we calculated annual mean PM 2.5 with valid days greater than 270 from station measurements. Thus, some stations, which are in service from the later year, are not seen in this figure. Furthermore, the annual mean distribution of PM 2.5 in WMA is mapped. Fig. 8 presents the satellite-derived and ground-observed PM 2.5 concentrations. Overall, the satellite-derived PM 2.5 has a similar spatial pattern with ground measurements. Spatially, higher levels of PM 2.5 concentrations are found in Wuhan, Ezhou, and Huangshi cities, while lower values are reported in the other cities. The PM 2.5 pollution presents a pattern of “higher center, lower vicinity”, which is consistent to the urbanization level of these cities. Wuhan, a comprehensively big city, has a PM 2.5 pollution characters of compound pollution including 17 coal-smoke air pollution and exhaust gas of vehicle. While the coal-smoke air pollution are found in Ezhou, Huangshi cities, which are rapidly developing industrial cities. Moreover, agricultural cities and eco-tourism cities including Xianning and Qianjiang have a lower level of PM 2.5 pollution. 4.5. Discussion The satellite reflectance and observation angles are adopted to establish the Ref-PM modeling. The observation angles are used as auxiliary variables to retrieve AOD from satellite TOA reflectance Kaufman et al., 1997a. We would like to see whether or not they can be eliminated from the input variables. Hence, we further evaluate the influence of observation angles on Ref-PM modeling. The results are presented in Table 2. When the observation angles are not incorporated in the DBN model Non-angles Ref-PM model, the cross-validation R 2 and RMSE are 0.48 and 19.94 3 g m  . However, the model performance is greatly improved from 0.46 to 0.64 for R 2 by considering the observation angles. These results indicate that the satellite observation geometry is important for the Ref-PM modeling. The radiometric calibration was conducted to obtain satellite TOA reflectance from digital number DN signals. It is a linear transformation from DN signals to TOA reflectance, and should have no impact on deep learning. This means that estimating PM 2.5 from DN signals should obtain the same results with Ref-PM modeling. However the calibration coefficients vary by day, the relationship between TOA reflectance and DN signals is not linear during the whole study period. Hence, we constructed the model between PM 2.5 and DN DN-PM modeling for comparison. As shown in Table 2, the DN-PM modeling reports a similar performance with Ref-PM modeling, indicating that the radiometric calibration can probably 18 be omitted for estimating PM 2.5 . However, the objective of this study is to estimate ground PM 2.5 by avoiding AOD retrieval, which is conducted using TOA reflectance. The Ref-PM and AOD-PM modeling should be fully compared. Hence the satellite reflectance is adopted in this paper. Table 2. Performance of various modeling strategies Models Cross-validation R 2 RMSE MPE RPE Non-angles Ref-PM 0.48 19.94 14.60 40.29 Full Ref-PM 0.64 16.75 12.11 33.84 Full DN-PM 0.63 16.80 12.26 33.67 There are still some space to improve this study. Firstly, the retrieval of AOD often meets a challenge, which is the bright surface B70.15 Li et al., 2005. How the bright surface affects Ref-PM modeling should been further discussed in future studies. Secondly, the satellite observations used i.e., B1, B3, B7 reflectance, and angles are inspired by the inspired by the dark target algorithm, the selection of satellite parameters still has room for researching. Whether or not incorporating more channels such as B2, B4, and B5 reflectance into the Ref-PM modeling has a positive effect?

5. Conclusions