Investigating the application of LLMs to invertebrate palaeontology through the development of automated taxonomy assistants for brachiopod identification

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Alessandro Carniti, Michael Henry Stephenson, Jiaxi Yang, Shuzhong Shen, Junxuan Fan, Jieping Ye

Abstract

Taxonomic identification is a central practice in palaeontology, underpinning biostratigraphic correlations, palaeobiogeographic reconstructions, and analyses of macroevolutionary patterns. Despite its importance, taxonomy depends on a limited number of specialists and on the synthesis of extensive descriptive literature that is often difficult to access. Recent developments in artificial intelligence provide potential tools to support taxonomic work and improve accessibility and efficiency in fossil identification. Most automated approaches have so far relied on deep learning models trained on photographic datasets of fossil specimens. While effective for some microfossil groups, these systems face substantial limitations when applied to macrofossils, which are often incompletely preserved, morphologically complex, and poorly suited to standardized imaging workflows.
Because palaeontological taxonomy is fundamentally text-based, relying on diagnoses, descriptions, and comparative remarks published in the literature, Large Language Models (LLMs) offer an alternative framework for automated assistance. Here we explore the application of LLM-augmented taxonomy systems (LATS) to invertebrate fossil identification through the development of a prototype system for brachiopods. The system is trained on genus-level diagnoses extracted from the Treatise on Invertebrate Paleontology, Part H: Brachiopoda (Revised), one of the most comprehensive and authoritative compilations of fossil invertebrate taxonomy. Strategies were implemented to address the brevity of diagnoses, including integration with descriptions of higher-rank taxa and an adjustable retrieval knowledge base.
Preliminary testing indicates that the system reliably provides plausible candidate matches and handles complex morphological terminology effectively. LATS thus represent a promising approach for developing automated assistants in macrofossil taxonomy, with potential future integration of expanded textual databases and image-based analyses.
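
The two strategies named in the abstract (lengthening brief genus diagnoses with higher-rank descriptions, and an adjustable retrieval step) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the taxon names and diagnosis strings are invented placeholders rather than Treatise text, the functions `build_document` and `retrieve` are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retrieval and LLM components the actual system uses.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder records for illustration only; NOT Treatise text.
HIGHER_TAXA = {
    "FamilyX": "shell biconvex, hinge line straight, costate",
    "FamilyY": "shell concavo-convex, spines on ventral valve",
}
GENERA = [
    {"genus": "GenusA", "family": "FamilyX",
     "diagnosis": "small, fine costae, deep sulcus"},
    {"genus": "GenusB", "family": "FamilyX",
     "diagnosis": "large, coarse costae, no sulcus"},
    {"genus": "GenusC", "family": "FamilyY",
     "diagnosis": "spines in rows, trail long"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_document(record):
    """Append the higher-rank description to a brief genus diagnosis."""
    return record["diagnosis"] + " " + HIGHER_TAXA[record["family"]]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, top_k=2):
    """Return the top_k genus names ranked by similarity to the query.

    top_k is the adjustable part: a larger value passes more candidate
    diagnoses on to a downstream step (assumed here to be an LLM prompt).
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(build_document(r)))), r["genus"])
        for r in GENERA
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [genus for _, genus in scored[:top_k]]


if __name__ == "__main__":
    description = "biconvex shell with fine costae and deep sulcus"
    print(retrieve(description, top_k=2))
```

In this toy setup the family-level terms ("biconvex", "shell") contribute to the match only because they were merged into each genus document, which is the point of integrating higher-rank descriptions; in a real system the retrieved diagnoses would presumably be supplied to the LLM as context rather than returned directly.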

DOI

https://doi.org/10.31223/X5D48J

Subjects

Earth Sciences, Physical Sciences and Mathematics

Keywords

taxonomy, palaeontology, LLM, Artificial intelligence, brachiopods, biodiversity

Dates

Published: 2026-03-19 15:16

Last Updated: 2026-03-19 15:16

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data Availability:
Data used for this research are included in the paper or freely accessible at https://geogpt-sg.zero2x.org/