This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Investigating the application of LLMs to invertebrate palaeontology through the development of automated taxonomy assistants for brachiopod identification
Abstract
Taxonomic identification is a central practice in palaeontology, underpinning biostratigraphic correlations, palaeobiogeographic reconstructions, and analyses of macroevolutionary patterns. Despite its importance, taxonomy depends on a limited number of specialists and on the synthesis of extensive descriptive literature that is often difficult to access. Recent developments in artificial intelligence provide potential tools to support taxonomic work and improve accessibility and efficiency in fossil identification. Most automated approaches have so far relied on deep learning models trained on photographic datasets of fossil specimens. While effective for some microfossil groups, these systems face substantial limitations when applied to macrofossils, which are often incompletely preserved, morphologically complex, and poorly suited to standardized imaging workflows.
Because palaeontological taxonomy is fundamentally text-based, relying on diagnoses, descriptions, and comparative remarks published in the literature, Large Language Models (LLMs) offer an alternative framework for automated assistance. Here we explore the application of LLM-augmented taxonomy systems (LATS) to invertebrate fossil identification through the development of a prototype system for brachiopods. The system is trained on genus-level diagnoses extracted from the Treatise on Invertebrate Paleontology, Part H: Brachiopoda (Revised), one of the most comprehensive and authoritative compilations of fossil invertebrate taxonomy. Strategies were implemented to address the brevity of diagnoses, including integration with descriptions of higher-rank taxa and an adjustable retrieval knowledge base.
Preliminary testing indicates that the system reliably provides plausible candidate matches and handles complex morphological terminology effectively. LATS thus represent a promising approach for developing automated assistants in macrofossil taxonomy, with potential future integration of expanded textual databases and image-based analyses.
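The two strategies named in the abstract (lengthening brief genus diagnoses with higher-rank descriptions, and an adjustable retrieval knowledge base) can be illustrated with a minimal sketch. This is not the authors' implementation. The entries and family texts are invented placeholders rather than Treatise content, the names `ENTRIES`, `FAMILY_TEXT`, `augmented_text`, and `retrieve` are hypothetical, and a simple token-count cosine similarity stands in for whatever retrieval method the prototype actually uses. In a full system, the retrieved candidates would presumably be passed to an LLM as context.

```python
import math
import re
from collections import Counter

# Invented placeholder entries for illustration only; NOT taken from the Treatise.
ENTRIES = [
    {"genus": "Genus_A", "family": "Family_X",
     "diagnosis": "small biconvex shell with fine costae and short hinge line"},
    {"genus": "Genus_B", "family": "Family_X",
     "diagnosis": "large biconvex shell with coarse plications and deep sulcus"},
    {"genus": "Genus_C", "family": "Family_Y",
     "diagnosis": "concavoconvex shell with spines along the hinge line"},
]

# Hypothetical higher-rank (family-level) descriptions, also invented.
FAMILY_TEXT = {
    "Family_X": "impunctate shell with crura and pointed beak",
    "Family_Y": "pseudopunctate shell with wide hinge line and no pedicle opening",
}


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def augmented_text(entry):
    """Lengthen a brief genus diagnosis with its family-level description."""
    return entry["diagnosis"] + " " + FAMILY_TEXT[entry["family"]]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, top_k=2, families=None):
    """Rank genera against a specimen description.

    `families` optionally restricts the knowledge base to a subset of
    higher-rank taxa; `top_k` sets how many candidates are returned.
    """
    pool = [e for e in ENTRIES if families is None or e["family"] in families]
    q = Counter(tokens(query))
    scored = [(cosine(q, Counter(tokens(augmented_text(e)))), e["genus"])
              for e in pool]
    # Highest score first; ties broken alphabetically by genus name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


if __name__ == "__main__":
    description = "concavoconvex shell with spines and wide hinge line"
    for score, genus in retrieve(description, top_k=3):
        print(f"{genus}: {score:.3f}")
```

Passing `families={"Family_X"}` narrows the candidate pool before scoring, which is one plausible reading of an "adjustable" knowledge base: a user who already knows the higher-rank placement can exclude unrelated taxa.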
DOI
https://doi.org/10.31223/X5D48J
Subjects
Earth Sciences, Physical Sciences and Mathematics
Keywords
taxonomy, palaeontology, LLM, Artificial intelligence, brachiopods, biodiversity
Dates
Published: 2026-03-19 15:16
Last Updated: 2026-03-19 15:16
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
None
Data Availability:
Data used for this research are included in the paper or freely accessible at https://geogpt-sg.zero2x.org/