A Novel Heuristic Method for Detecting Overfit in Unsupervised Classification of Climate Model Data

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available at: https://doi.org/10.1017/eds.2023.40. This is version 5 of this Preprint.

Authors

Emma Joan Douglas Boland, Erin Atkinson, Dan Jones

Abstract

Unsupervised classification is becoming an increasingly common method for objectively identifying coherent structures within both observed and modelled climate data. However, in most applications of this method, the user must choose in advance the number of classes into which the data are to be sorted. Typically, a combination of statistical methods and expertise is used to choose the appropriate number of classes for a given study; however, it may not be possible to identify a single 'optimal' number of classes. In this work, we present a heuristic method, the Ensemble Difference Criterion, for unambiguously determining the maximum number of classes supported by model data ensembles. The method requires the class definitions to be robust across simulated ensembles of the system of interest. For demonstration, we apply it to the clustering of Southern Ocean potential temperatures in a CMIP6 climate model and show that the data support between four and seven classes of a Gaussian Mixture Model.
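The abstract does not specify how the Ensemble Difference Criterion is computed. As a rough illustration of the general idea only (checking whether the classes found with k components are reproduced across ensemble members), the Python sketch below fits a Gaussian Mixture Model to each of two members and compares their class means. This is not the authors' criterion: the helper names `class_mean_difference` and `max_robust_classes`, the matched class-mean distance used as the difference metric, the threshold `tol`, and the synthetic two-dimensional data are all hypothetical assumptions chosen for illustration. It uses scikit-learn's `GaussianMixture` and SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.mixture import GaussianMixture


def class_mean_difference(member_a, member_b, k, seed=0):
    """Hypothetical robustness proxy: fit a k-class GMM to each ensemble
    member and return the largest distance between matched class means."""
    means_a = GaussianMixture(n_components=k, random_state=seed).fit(member_a).means_
    means_b = GaussianMixture(n_components=k, random_state=seed).fit(member_b).means_
    # Class labels are arbitrary, so pair the classes of the two members
    # by minimising the total distance between their means.
    cost = cdist(means_a, means_b)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].max())


def max_robust_classes(member_a, member_b, k_values, tol):
    """Return the per-k differences and the largest k whose difference is
    below the (hypothetical) tolerance tol, or None if no k qualifies."""
    diffs = {k: class_mean_difference(member_a, member_b, k) for k in k_values}
    robust = [k for k, d in diffs.items() if d < tol]
    return diffs, (max(robust) if robust else None)


# Illustrative synthetic data only: two "ensemble members" drawn from the
# same three-class distribution in two dimensions.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])


def synthetic_member():
    # 300 points scattered around each centre with standard deviation 0.5.
    return np.vstack([c + 0.5 * rng.standard_normal((300, 2)) for c in centres])


member_a, member_b = synthetic_member(), synthetic_member()
diffs, best = max_robust_classes(member_a, member_b, k_values=[1, 2, 3, 4], tol=0.5)
print(diffs, best)
```

Matching classes before comparing them matters because a GMM's component labels are assigned arbitrarily, so the same physical class may carry a different index in each member. In real climate data the features would be, for example, temperature profiles rather than two-dimensional points, and the choice of difference metric and threshold would need justification of its own.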

DOI

https://doi.org/10.31223/X5B66V

Subjects

Analysis, Oceanography and Atmospheric Sciences and Meteorology, Physical Sciences and Mathematics

Keywords

climate modelling, unsupervised classification, methods, ocean data

Dates

Published: 2023-03-10 08:04

Last Updated: 2023-10-25 07:25

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None