PhyX - Predicting Phytoplankton Community Composition from Satellite Ocean Color

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Susanne Elizabeth Craig, Erdem M. Karaköylü 

Abstract

The way in which phytoplankton communities are structured - often referred to as phytoplankton community composition (PCC) - exerts fundamental control on ocean biogeochemical cycling, climate regulation, and marine ecosystem dynamics. Accurate quantification of phytoplankton groups from satellite ocean color data remains challenging due to spectral similarities among phytoplankton types and the limitations of existing empirical and semi-analytical models. In this study, we used an extreme gradient boosting (XGBoost) tree-based regression model to retrieve multiple PCCs and total chlorophyll-a concentrations from simulated hyperspectral remote sensing top-of-atmosphere (TOA) ocean color data as well as some ancillary data. The intent is to mimic what could be gathered from the NASA Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission and auxiliary data sources to characterize the environment. In its final form, the model, validated on an out-of-sample set, demonstrated strong predictive performance across most functional groups, with R² values exceeding 0.95. Dinoflagellate retrievals showed lower accuracy (R² = 0.53). Further analysis revealed that temperature was a key predictor alongside hyperspectral TOA radiance, suggesting that integrating external temperature data could enhance future retrieval models. Furthermore, although the model used only 10% of the available hyperspectral bands, feature importance analysis showed that specific spectral regions contributed disproportionately to model predictions. These findings highlight the potential of machine learning for phytoplankton classification and inform future algorithm development for hyperspectral ocean color missions.
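
The R² values quoted in the abstract are coefficients of determination computed separately for each retrieved quantity on held-out data. As a minimal sketch of that metric only, and not the authors' code, the snippet below scores two hypothetical targets with the standard definition R² = 1 - SS_res / SS_tot. The target names (`group_a`, `group_b`) and all numbers are made up for illustration and do not come from the preprint.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left after applying the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def r2_per_target(truth, predictions):
    """Score each target (e.g. one phytoplankton group) separately."""
    return {name: r2_score(truth[name], predictions[name]) for name in truth}


# Hypothetical held-out values for two illustrative targets (assumed data).
truth = {
    "group_a": [1.0, 2.0, 3.0, 4.0],
    "group_b": [1.0, 2.0, 3.0, 4.0],
}
predictions = {
    "group_a": [1.0, 2.0, 3.0, 4.0],  # reproduces the observations exactly
    "group_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the observed mean
}

scores = r2_per_target(truth, predictions)
for name, value in scores.items():
    print(f"{name}: R2 = {value:.2f}")
```

Scoring each target on its own is what lets one group (here the mean-only `group_b`) stand out as poorly retrieved while the others score well, which is the pattern the abstract reports for dinoflagellates.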

DOI

https://doi.org/10.31223/X5QQ9K

Subjects

Marine Biology

Keywords

phytoplankton, Regression, XGBoost, SHAP, Explainable AI

Dates

Published: 2025-08-02 22:01

Last Updated: 2025-08-21 20:12

License

No Creative Commons license

Additional Metadata

Data Availability:
Data and code are available.