Phytoplankton Community Composition Retrieval from Space

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Susanne Elizabeth Craig, Erdem M. Karaköylü

Abstract

Phytoplankton community composition groups (PCCs), also referred to as functional
groups, play a key role in ocean biogeochemical cycling, climate regulation, and
marine ecosystem dynamics. Accurate quantification of these groups from satellite
ocean color data remains challenging due to spectral similarities among
phytoplankton types and the limitations of existing empirical and semi-analytical
models. In this study, we used an extreme gradient boosting (XGBoost) tree-based
regression model to retrieve multiple PCCs and total chlorophyll-a concentrations
from simulated hyperspectral top-of-atmosphere (TOA) ocean color remote sensing
data together with ancillary data. The intent is to mimic what could be gathered
from the NASA Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission and
auxiliary data sources to characterize the environment. In its final form, the
model, validated on an out-of-sample set, demonstrated strong predictive
performance across most functional groups, with R² values exceeding 0.95.
Dinoflagellate retrievals showed lower accuracy (R² = 0.53). Further analysis
revealed that temperature was a key predictor alongside hyperspectral TOA
radiance, suggesting that integrating external temperature data could enhance
future retrieval models. Furthermore, although the model used only 10% of the
available hyperspectral bands, feature importance analysis showed that specific
spectral regions contributed disproportionately to model predictions. These
findings highlight the potential of machine learning for phytoplankton
classification and inform future algorithm development for hyperspectral ocean
color missions.
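
The abstract reports out-of-sample skill as one R² value per retrieved group. As a minimal sketch of how such per-group scores could be computed, assuming the usual definition R² = 1 - SS_res / SS_tot, the snippet below scores held-out predictions group by group. The function names, the group labels, and all numbers are hypothetical illustrations, not the authors' code or data.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed values are constant")
    return 1.0 - ss_res / ss_tot


def r_squared_by_group(observed_by_group, predicted_by_group):
    """Score each retrieved group separately on the held-out samples."""
    return {
        group: r_squared(observed_by_group[group], predicted_by_group[group])
        for group in observed_by_group
    }


# Made-up held-out values for illustration only (not data from the study).
observed = {
    "diatoms": [0.10, 0.40, 0.80, 1.20],
    "dinoflagellates": [0.05, 0.10, 0.20, 0.15],
}
predicted = {
    "diatoms": [0.12, 0.38, 0.79, 1.22],
    "dinoflagellates": [0.10, 0.08, 0.12, 0.16],
}

scores = r_squared_by_group(observed, predicted)
for group, score in scores.items():
    print(f"{group}: R^2 = {score:.2f}")
```

Scoring each group separately, rather than pooling all targets into one number, is what lets a weaker retrieval such as the dinoflagellate case stand out from the others.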

DOI

https://doi.org/10.31223/X5QQ9K

Subjects

Marine Biology

Keywords

Phytoplankton, Regression, XGBoost, SHAP, Explainable AI

Dates

Published: 2025-08-02 17:01

Last Updated: 2025-08-02 17:01

License

No Creative Commons license

Additional Metadata

Data Availability (Reason not available):
Data and code are available.