Performance of Machine Learning for Ozone Modeling in Southern California during the COVID-19 Shutdown

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1039/D3EA00159H. This is version 2 of this Preprint.

Authors

Khanh Do, Arash Kashfi Yeganeh, Ziqi Gao, Cesunica Ivey 

Abstract

We combine machine learning (ML) and geospatial interpolation to create two-dimensional, high-resolution ozone concentration fields over the South Coast Air Basin (SoCAB) for the entire year of 2020. Three spatial interpolation methods were employed: bicubic, inverse distance weighting (IDW), and ordinary kriging. The predicted ozone concentration fields were constructed from ML predictions at 15 building sites, and random forest regression was employed to test the predictability of 2020 data based on input data from past years. Spatially interpolated ozone concentrations were evaluated at twelve sites that were independent of the spatial interpolations to find the most suitable method for the SoCAB. Ordinary kriging had the best overall performance for 2020: concentrations were overestimated at the Anaheim, Compton, LA North Main Street, LAX, Rubidoux, and San Gabriel sites and underestimated at the Banning, Glendora, Lake Elsinore, and Mira Loma sites. Model performance improved from west to east, with better predictions for inland sites. The model is best at interpolating ozone concentrations inside the sampling region (bounded by the building sites), with R^2 ranging from 0.56 to 0.85 for those sites, whereas prediction deficiencies occurred at the periphery of the sampling region, with the lowest R^2 of 0.39 at Winchester. All interpolation methods performed poorly at Crestline, underestimating ozone concentrations there during summer (by up to 19 ppb). The poor performance at Crestline indicates that the distribution of air pollution levels at this site is independent of that at all other sites. Therefore, historical data from coastal and inland sites should not be used to predict ozone in Crestline with data-driven spatial interpolation approaches. The study demonstrates the utility of ML and geospatial techniques for evaluating air pollution levels during anomalous periods.
Neither ML nor the Community Multiscale Air Quality (CMAQ) model fully captures the irregularities caused by emission reductions during the COVID-19 lockdown period (March to May) in the SoCAB. The ML results indicate that no air quality pattern similar to that of the COVID-19 lockdown occurred in the past. Including 2020 data in the ML model training improves the model's performance and its ability to predict future abnormalities in air quality.
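
For readers unfamiliar with the interpolation step named in the abstract, the following is a minimal sketch of inverse distance weighting (IDW) and of an R^2 score computed at held-out sites. It is not the authors' implementation: the function names, the power parameter of 2, the planar coordinates, and all numeric values are hypothetical and chosen only for illustration, and R^2 is assumed here to mean the coefficient of determination.

```python
import math


def idw_interpolate(sites, values, target, power=2.0):
    """Estimate a value at `target` as a distance-weighted mean of site values.

    sites  : list of (x, y) coordinates of the building sites (hypothetical units)
    values : ozone value at each building site
    target : (x, y) coordinate where an estimate is wanted
    power  : distance exponent; 2.0 is a common default, assumed here
    """
    weighted = []
    for (x, y), value in zip(sites, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a building site: return its value.
            return value
        weighted.append((1.0 / distance ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic, made-up numbers for illustration only (not data from the study).
building_sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ozone_ppb = [30.0, 40.0, 50.0, 60.0]

# Estimate at an interior point, equidistant from all four building sites.
interior_estimate = idw_interpolate(building_sites, ozone_ppb, (5.0, 5.0))
```

Because IDW estimates are weighted means of the building-site values, they cannot exceed the highest building-site value or fall below the lowest. That property is consistent with, though not a demonstration of, the abstract's observation that interpolation works best inside the region bounded by the building sites.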

DOI

https://doi.org/10.31223/X5SX17

Subjects

Civil and Environmental Engineering

Keywords

Ozone, machine learning, COVID-19, modeling, Southern California

Dates

Published: 2023-11-28 11:34

Last Updated: 2024-04-01 17:08

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
Authors declare no conflicts of interest