Automated machine learning to evaluate the information content of tropospheric trace gas columns for fine particle estimates over India: a modeling testbed

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1029/2022MS003099. This is version 1 of this Preprint.

Authors

Zhonghua Zheng, Arlene M. Fiore, Daniel M. Westervelt, George P. Milly, Jeff Goldsmith, Alexandra Karambelas, Gabriele Curci, Cynthia A. Randles, Antonio R. Paiva, Chi Wang, Qingyun Wu, Sagnik Dey

Abstract

India is largely devoid of high-quality and reliable on-the-ground measurements of fine particulate matter (PM2.5). Ground-level PM2.5 concentrations are estimated from publicly available satellite Aerosol Optical Depth (AOD) products combined with other information. Prior research has largely overlooked the possibility of gaining additional accuracy and insights into the sources of PM using satellite retrievals of tropospheric trace gas columns. We first evaluate the information content of tropospheric trace gas columns for PM2.5 estimates over India within a modeling testbed using an Automated Machine Learning (AutoML) approach, which selects from a menu of different machine learning tools based on the dataset. We then quantify the relative information content of tropospheric trace gas columns, AOD, meteorological fields, and emissions for estimating PM2.5 over four Indian sub-regions on daily and monthly time scales. Our findings suggest that, regardless of the specific model assumptions, incorporating modeled trace gas columns improves PM2.5 estimates. We use the ranking scores produced by the AutoML algorithm and Spearman's rank correlation to infer the relative dominance of primary versus secondary sources of PM2.5 as a first step towards estimating particle composition. Our comparison of AutoML-derived models to selected baseline machine learning models demonstrates that AutoML performs at least as well as selecting a model and tuning its hyperparameters prior to training. The idealized pseudo-observations used in this work lay the groundwork for applying satellite retrievals of tropospheric trace gases to estimate fine particle concentrations in India and serve to illustrate the promise of AutoML applications in atmospheric and environmental research.
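To illustrate the Spearman's rank correlation step mentioned in the abstract, the following is a minimal sketch and not the authors' code (their scripts are in the repositories listed under Data Availability). It ranks a few candidate predictors by the strength of their monotonic association with PM2.5. All values and the names `pm25`, `predictors`, `aod`, `no2_column`, and `wind_speed` are hypothetical, chosen only for illustration. The sketch covers only the rank-correlation part, not the AutoML ranking scores.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not from the study): six daily samples.
pm25 = [35.0, 80.0, 120.0, 60.0, 150.0, 45.0]
predictors = {
    "aod": [0.3, 0.6, 0.9, 0.5, 1.1, 0.4],
    "no2_column": [2.0, 4.5, 3.0, 5.0, 6.0, 2.5],
    "wind_speed": [5.0, 3.0, 2.0, 4.0, 1.0, 4.5],
}

# Order predictors by the absolute value of rho, strongest association first.
ranking = sorted(
    ((name, spearman(vals, pm25)) for name, vals in predictors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, rho in ranking:
    print(f"{name}: rho = {rho:+.3f}")
```

Because Spearman's rho depends only on ranks, it captures monotonic but nonlinear relationships, which is one reason it can complement model-derived importance scores when comparing predictors.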

DOI

https://doi.org/10.31223/X51D0V

Subjects

Artificial Intelligence and Robotics, Atmospheric Sciences, Civil and Environmental Engineering, Computer Sciences, Earth Sciences, Environmental Engineering, Environmental Sciences, Oceanography and Atmospheric Sciences and Meteorology

Keywords

air quality, PM2.5, data science, machine learning, Automated Machine Learning, AutoML

Dates

Published: 2022-03-21 00:10

Last Updated: 2022-03-21 07:10

License

CC BY Attribution 4.0 International

Additional Metadata

Data Availability:
Scripts and data to reproduce the results and figures are preserved at https://doi.org/10.5281/zenodo.6363824 or https://github.com/zzheng93/code_DSI_India_AutoML. The raw data from GEOS-Chem simulations used for Automated Machine Learning and analysis in this study are available at https://doi.org/10.7916/nwx1-jt94.