Talk

Downscaling Regional CAMS Reanalysis to the Urban Scale with XGBoost and Gaussian Processes: A Transferable Machine Learning Framework

In session Artificial Intelligence / Machine Learning methods in Earth System Sciences, Sept. 3, 2025, 13:30 – 15:15
Exact timing: 14:45 – 15:00
Room info: Lecture Hall

Ramacher, Martin Otto Paul¹; Keil, P.¹
  1. Helmholtz-Zentrum Hereon

Urban-scale air quality data are crucial for exposure assessment and decision-making in cities. However, high-resolution Eulerian Chemistry Transport Models (CTMs) with street-scale resolution (100 m × 100 m), while process-based and scenario-capable, are computationally expensive and require city-specific emission inventories, meteorological fields and boundary concentrations. In contrast, machine learning (ML) offers a scalable and efficient way to enhance the spatial resolution of existing regional-scale reanalysis datasets (1 km to 10 km grid resolution).

We present a reproducible ML framework that downscales hourly NO₂ data from the CAMS Europe ensemble (~10 km resolution) to 100 m × 100 m resolution, using 11 years of data (2013–2023) for Hamburg. The framework integrates satellite-based and modelled inputs (CAMS, ERA5-Land), spatial predictors (CORINE, GHSL, OSM), and time indicators. Two ML approaches are employed: XGBoost for robust prediction and interpretability (via SHAP values), and Gaussian Processes for quantifying spatial and temporal uncertainty.
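
As a rough illustration of how a predictor table for this kind of downscaling could be assembled, the sketch below maps a fine-grid location to the coarse cell that contains it and combines the coarse background value with static spatial predictors and time indicators. It is not the authors' implementation: the function names, the projected-metre coordinate convention, the feature names (`road_density`, `built_up_frac`) and all toy values are assumptions made for illustration.

```python
from datetime import datetime

COARSE_M = 10_000  # assumed coarse (CAMS-like) cell size in metres, ~10 km


def coarse_index(x_m, y_m):
    """Index of the coarse cell containing a point given in projected metres."""
    return (int(x_m // COARSE_M), int(y_m // COARSE_M))


def time_indicators(ts):
    """Simple calendar features: hour, weekday (Monday = 0), month, weekend flag."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


def build_row(x_m, y_m, ts, coarse_no2, static):
    """Merge coarse background, static predictors and time indicators into one feature dict."""
    row = {"cams_no2": coarse_no2[(coarse_index(x_m, y_m), ts)]}
    row.update(static[(x_m, y_m)])
    row.update(time_indicators(ts))
    return row


# Toy inputs (hypothetical values, not real CAMS or land-use data).
ts = datetime(2023, 1, 2, 8)  # a Monday morning
coarse_no2 = {((0, 0), ts): 21.5}
static = {(250, 9950): {"road_density": 0.4, "built_up_frac": 0.7}}

row = build_row(250, 9950, ts, coarse_no2, static)
```

Rows of this shape, one per fine cell and hour, would then be passed to a regressor such as XGBoost or a Gaussian Process; the model-fitting step itself is omitted here.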

The downscaling is evaluated through random, time-based and leave-site-out validation approaches. Results demonstrate good reproduction of observed spatial and temporal NO₂ patterns, including traffic peaks and diurnal/seasonal trends. The trained models generate over 160 million hourly predictions for Hamburg with associated uncertainty fields. Although developed for Hamburg, the framework has been successfully tested in multiple European cities without reconfiguration, highlighting its portability and robustness. All code and workflows are openly available, contributing to reproducibility and reusability in urban air quality research.
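
The three validation strategies named above can be sketched as index splits over observation records. This is a minimal sketch under stated assumptions, not the authors' code: the record layout (`site`, `year`), the 20 % test fraction, the cutoff year and the site labels are all hypothetical.

```python
import random


def random_split(records, test_frac=0.2, seed=0):
    """Random hold-out: shuffle record indices and reserve a fraction for testing."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def time_split(records, first_test_year):
    """Time-based hold-out: train on earlier years, test on later ones."""
    train = [i for i, r in enumerate(records) if r["year"] < first_test_year]
    test = [i for i, r in enumerate(records) if r["year"] >= first_test_year]
    return train, test


def leave_site_out_splits(records):
    """Leave-site-out: each fold withholds every record of one monitoring site."""
    for site in sorted({r["site"] for r in records}):
        train = [i for i, r in enumerate(records) if r["site"] != site]
        test = [i for i, r in enumerate(records) if r["site"] == site]
        yield site, train, test


# Toy records: three hypothetical sites, one record per site and year.
records = [{"site": s, "year": y} for s in ("A", "B", "C") for y in range(2013, 2024)]

rand_train, rand_test = random_split(records)
time_train, time_test = time_split(records, first_test_year=2022)
site_folds = list(leave_site_out_splits(records))
```

The random split measures interpolation skill, the time-based split tests generalisation to unseen periods, and the leave-site-out folds test generalisation to locations without a monitor, which is the setting closest to predicting on a full city grid.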

This work bridges Earth system science, infrastructure (via open datasets and standards), and data methods (XGBoost, GPs, SHAP), exemplifying how transparent, transferable ML can support urban digital twins and data-driven environmental policy.