Scalable and robust Gaussian processes for reanalysis of urban air temperature with crowdsourced meteorological data

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Zachary Calhoun, Michael Bergin, David Carlson

Abstract

Crowdsourced air temperature data from networks like Weather Underground offer dense spatial coverage and are increasingly used to study the canopy urban heat island (CUHI) effect. However, these observations are noisy: siting conditions, environmental interference, and sensor failures introduce spatially and temporally varying bias. This complicates interpolation, limiting our ability to estimate neighborhood-level air temperature. While interpolation techniques such as kriging account for uncertainty, they do so under the assumption of homoscedasticity. Moreover, they struggle to scale beyond a few thousand observations, which limits their utility on crowdsourced data. To overcome these limitations, we develop a sparse variational Gaussian process model that accounts for heteroscedasticity, allowing us to efficiently produce interpolated air temperature fields with calibrated uncertainty estimates. To test our approach, we apply our model to six years of hourly data across Durham County, North Carolina. This area includes a medium-sized city, so we expect our model to generalize to similarly sized regions with sufficient sensor coverage. Compared to ERA5-Land, our model improves estimates of canopy temperature (MAE=0.57°C versus ERA5-Land MAE=3.20°C on held-out locations) and enables high-resolution analysis of CUHI patterns over space and time. We illustrate this by visualizing (1) how CUHI patterns vary with synoptic conditions, (2) differential impacts on heating and cooling demand, and (3) annual hours exceeding 35°C by neighborhood. Our method provides a scalable and statistically rigorous framework for transforming crowdsourced climate data into a gridded reanalysis product. Using these data, we can better quantify urban heat exposure and its impact on health and energy.
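
To illustrate why heteroscedasticity matters for interpolation, the following is a minimal sketch, not the authors' model. It uses an exact Gaussian process (not the sparse variational approximation described above), a squared-exponential kernel, and per-observation noise variances that are assumed known rather than learned. The station positions, readings, noise values, and function names are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, noise_var, x_test,
                 lengthscale=1.0, variance=1.0):
    """Exact GP posterior mean and variance of the latent function.

    noise_var holds one noise variance per observation, so unreliable
    sensors can be given a larger value (heteroscedastic noise).
    """
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + np.diag(noise_var)
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y using the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical toy data: three stations along a line; the middle
# station reports an implausible temperature anomaly.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 5.0, 0.0])
x_star = np.array([1.0])

# Homoscedastic: every station gets the same noise variance.
mean_homo, var_homo = gp_posterior(x, y, np.full(3, 0.1), x_star)

# Heteroscedastic: the middle station is assumed to be far noisier.
mean_hetero, var_hetero = gp_posterior(x, y, np.array([0.1, 10.0, 0.1]), x_star)
```

Under equal noise, the estimate at the middle location follows the outlying reading closely. When that station is assigned a larger noise variance, the estimate is pulled toward its neighbors and the posterior variance at that location grows, which is the qualitative behavior a heteroscedastic model is meant to capture.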

DOI

https://doi.org/10.31223/X5B45S

Subjects

Applied Statistics, Climate, Engineering, Environmental Engineering, Meteorology, Oceanography and Atmospheric Sciences and Meteorology, Statistical Models

Keywords

spatiotemporal modeling, Urban Heat Island, citizen-science, Gaussian Process, reanalysis

Dates

Published: 2025-10-22 21:31

Last Updated: 2025-10-23 17:29

License

CC BY Attribution 4.0 International

Additional Metadata

Data Availability (Reason not available):
Upon request