Evaluating Sampling Bias and Model Uncertainty in Species Distribution Models of Marine Plankton Using Virtual Ecosystem Data

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Zhibo Shao, B. B. Cael, Thelma Panaïotis, Anna Hickman, Stephanie Dutkiewicz, Ben Ward

Abstract

Understanding the biodiversity and biogeography of plankton in the ocean is essential for predicting their responses to environmental change and for informing ocean conservation and management strategies. Species distribution models (SDMs) are a pivotal tool for this purpose. This study used output from a global marine ecosystem model as a testbed to assess the reliability of several SDMs: Generalized Linear Models (GLM), Generalized Additive Models (GAM), Random Forests (RF), Boosted Regression Trees (BRT) and Artificial Neural Networks (ANN). We generated virtual datasets that replicate the sampling patterns of three real datasets: a compiled dataset of global scope, the Tara Oceans dataset, and the Atlantic Meridional Transect (AMT) project. Our findings indicate that the tree-based algorithms, RF and BRT, achieve better predictive accuracy and stability than GLM, GAM and ANN, especially when trained on more spatially resolved datasets. We highlight the strong influence of sampling bias on model performance: models trained on the more comprehensive global dataset outperform those trained on the latitudinally biased Tara data and the longitudinally biased AMT data. Furthermore, we demonstrate that broad spatial coverage is a more critical determinant of predictive skill than sample size alone, because simply increasing sampling density within a biased region is insufficient to overcome poor spatial representation. Overall, this research underscores the need to consider sampling strategies and model selection carefully in plankton species distribution modelling.
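The virtual-ecosystem logic summarised above (sample a known "true" distribution with different designs, fit an SDM, then score it against the truth everywhere) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the one-dimensional latitude grid, the temperature profile, the Gaussian thermal niche, the noise level and the helper names (`temperature`, `true_suitability`, `sample`, `knn_predict`, `rmse_over_domain`) are all hypothetical, and a simple k-nearest-neighbour regressor stands in for the SDM algorithms compared in the study.

```python
import math
import random

# Hypothetical toy "virtual ocean": one grid cell per degree of latitude.
LATS = list(range(-80, 81))

def temperature(lat):
    """Hypothetical sea-surface temperature (deg C), cooling towards the poles."""
    return 30.0 * math.cos(math.radians(lat)) - 2.0

def true_suitability(temp, optimum=15.0, width=5.0):
    """Hypothetical Gaussian thermal niche: the 'truth' an SDM should recover."""
    return math.exp(-((temp - optimum) ** 2) / (2.0 * width ** 2))

def sample(lats, n, rng, noise=0.05):
    """Draw n noisy observations at latitudes picked uniformly from `lats`."""
    obs = []
    for _ in range(n):
        lat = rng.choice(lats)
        t = temperature(lat)
        obs.append((t, true_suitability(t) + rng.gauss(0.0, noise)))
    return obs

def knn_predict(train, temp, k=5):
    """Stand-in SDM: average the k training points closest in temperature."""
    nearest = sorted(train, key=lambda obs: abs(obs[0] - temp))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse_over_domain(train):
    """Score the fitted model against the known truth in every grid cell."""
    errors = [
        (knn_predict(train, temperature(lat)) - true_suitability(temperature(lat))) ** 2
        for lat in LATS
    ]
    return math.sqrt(sum(errors) / len(errors))

rng = random.Random(42)
tropical = [lat for lat in LATS if abs(lat) <= 20]  # hypothetical biased design

scores = {
    "broad, n=40": rmse_over_domain(sample(LATS, 40, rng)),
    "biased, n=40": rmse_over_domain(sample(tropical, 40, rng)),
    "biased, n=160": rmse_over_domain(sample(tropical, 160, rng)),
}
for design, score in scores.items():
    print(f"{design:>14}: RMSE = {score:.3f}")
```

In this toy setup the tropics-only design never observes the cooler waters where the hypothetical niche peaks, so its error stays high even with four times as many samples, which mirrors the abstract's point that spatial coverage matters more than sample size alone.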

DOI

https://doi.org/10.31223/X5HV0P

Subjects

Life Sciences

Keywords

Species distribution model, ecosystem model, model evaluation

Dates

Published: 2026-04-29 03:05

Last Updated: 2026-04-29 03:05

License

CC BY Attribution 4.0 International

Additional Metadata

Data Availability:
The physical model used in the Darwin simulation is the MIT General Circulation Model (MITgcm), accessible at http://mitgcm.org. The generic ecosystem code is available at https://gitlab.com/jahn/gud, and detailed equations and documentation can be found at https://darwin3.readthedocs.io/en/latest/phys_pkgs/darwin.html. The Darwin model data can be downloaded from https://doi.org/10.7910/DVN/RPL6PT and https://doi.org/10.7910/DVN/LQH9PX. The SDM scripts are available on GitHub at https://github.com/ZhiboShao/uncertainty-and-predicitability-of-plankton-SDM-. The data are available on Zenodo at https://doi.org/10.5281/zenodo.14219377.
