A comparison of statistical and machine-learning approaches for spatiotemporal modeling of nitrogen dioxide across Switzerland

Research Database / FORSCHUNGSDATENBANK

Publication 4655959 | Verified

Journal Article (original research article in a scientific journal)

ID	4655959
Author(s)	Liu, T. L.; Flückiger, B.; de Hoogh, K.
Author(s) at UniBasel	Liu, Tze-Li; Flückiger, Benjamin; de Hoogh, Kees
Year	2022
Title	A comparison of statistical and machine-learning approaches for spatiotemporal modeling of nitrogen dioxide across Switzerland
Journal	Atmos Pollut Res
Volume	13
Number	12
Pages / Article-Number	101611
Abstract	Land use regression modeling has commonly been used to model ambient air pollutant concentrations in environmental epidemiological studies. Recently, other statistical and machine-learning methods have also been applied to model air pollution, but their relative strengths and limitations have not been extensively investigated. In this study, we developed and compared land-use statistical and machine-learning models at annual, monthly and daily scales for estimating ground-level NO2 concentrations across Switzerland at a high spatial resolution (100 × 100 m). Our study showed that the best model type varies with context, particularly with temporal resolution and training data size. Linear-regression-based models were useful in predicting the long-term (annual, monthly) spatial distribution of NO2 and outperformed machine-learning models. However, linear-regression-based models were limited in representing short-term temporal variation, even when predictor variables with temporal variability were provided. Machine-learning models showed high capability in predicting short-term temporal variation and outperformed linear-regression-based models for modeling NO2 variation at high temporal resolution (daily). However, the best-performing models, XGBoost and LightGBM, consistently overfit the training data and may produce erratic patterns in the model-estimated concentration surfaces. Therefore, the temporal and spatial scale of the study is an important factor in choosing a suitable model type, and validation is required whatever approach is used.
ISSN/ISBN	1309-1042
edoc-URL	https://edoc.unibas.ch/91609/
Full Text on edoc	Available
Digital Object Identifier DOI	10.1016/j.apr.2022.101611

20/04/2024