A comparison of statistical and machine-learning approaches for spatiotemporal modeling of nitrogen dioxide across Switzerland
Journal
Atmos Pollut Res
Volume
13
Number
12
Pages / Article-Number
101611
Abstract
Land use regression modeling has commonly been used to model ambient air pollutant concentrations in environmental epidemiological studies. Recently, other statistical and machine-learning methods have also been applied to model air pollution, but their relative strengths and limitations have not been extensively investigated. In this study, we developed and compared land-use statistical and machine-learning models at annual, monthly and daily scales for estimating ground-level NO2 concentrations across Switzerland at a high spatial resolution of 100 × 100 m. Our study showed that the best model type varies with context, particularly with temporal resolution and training data size. Linear-regression-based models were useful in predicting the long-term (annual, monthly) spatial distribution of NO2 and outperformed machine-learning models. However, linear-regression-based models were limited in representing short-term temporal variation, even when predictor variables with temporal variability were provided. Machine-learning models showed high capability in predicting short-term temporal variation and outperformed linear-regression-based models for modeling NO2 variation at high temporal resolution (daily). However, the best-performing models, XGBoost and LightGBM, consistently overfit the training data and may produce erratic patterns in the model-estimated concentration surfaces. Therefore, the choice of a suitable model type should be based on the temporal and spatial scale of the study, and validation is required whichever approach is used.