From linguistic evaluation to mechanistic verification: testing LLM-generated farm recommendations

João Serra; Franca Giannini-Kurina; David Kraus; Meshach Ojo Aderele; Michal Antala; Sheng Wang; Jaber Rahimi; Claire Treat; Sevval Frank; Mary Grace Barbacias; Tom Cripps; Morten Graversgaard; Klaus Butterbach-Bahl; Diego Abalos; Vasilis Michailidis

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Abstract

Large language models (LLMs) are increasingly used to generate farm-management advice, but the biophysical consequences of that advice remain largely unverified. We introduce a process-based verification framework that couples management portfolios generated by ChatGPT and Claude with the process-based model LandscapeDNDC across 11 contrasting agroecosystems. The LLMs proposed agronomically plausible interventions, favouring changes in fertiliser timing, splitting and rates, and their portfolios generally preserved crop yields. However, agreement between the LLMs' expected outcomes and the simulated responses was markedly weaker for environmental targets such as nitrogen losses and soil organic carbon. The LLMs predicted the direction of agri-environmental change more reliably than its magnitude: directional agreement averaged 86%, whereas only 49% of simulated responses fell within the expected ranges. Portfolios addressing several targets simultaneously rarely performed consistently across sites. Our results show that process-based models can screen AI-generated farm recommendations for environmental burden shifting before they are used in practice.
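The two agreement scores in the abstract (directional agreement and the share of responses within expected ranges), along with a simple burden-shifting screen, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the preprint. Each LLM expectation for an indicator is assumed to be stored as a direction (+1, -1 or 0) plus an expected range of relative change in percent. Simulated responses are assumed to be relative changes against a baseline model run. A ±1% band is assumed to count as "no change". The `Expectation` record, the `DESIRABLE` mapping, the indicator names and all numbers are hypothetical illustrations, not the study's protocol or results.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """Hypothetical record of what an LLM expects one indicator to do at one site."""
    indicator: str  # illustrative names such as "yield", "N2O", "SOC"
    direction: int  # +1 = increase, -1 = decrease, 0 = no change
    low: float      # lower bound of the expected relative change (%)
    high: float     # upper bound of the expected relative change (%)


def sign(change: float, tol: float = 1.0) -> int:
    """Classify a simulated relative change (%) as +1, -1 or 0 using a +/- tol band."""
    if change > tol:
        return 1
    if change < -tol:
        return -1
    return 0


def _check(expected, simulated):
    """Require exactly one simulated change per expectation."""
    if not expected or len(expected) != len(simulated):
        raise ValueError("need one simulated change per expectation")


def directional_agreement(expected, simulated, tol=1.0):
    """Share of indicators whose simulated direction matches the expected direction."""
    _check(expected, simulated)
    hits = sum(e.direction == sign(s, tol) for e, s in zip(expected, simulated))
    return hits / len(expected)


def range_agreement(expected, simulated):
    """Share of simulated changes that fall inside the expected [low, high] range."""
    _check(expected, simulated)
    hits = sum(e.low <= s <= e.high for e, s in zip(expected, simulated))
    return hits / len(expected)


# Hypothetical desirable direction per indicator: +1 if an increase is good.
DESIRABLE = {"yield": 1, "SOC": 1, "N2O": -1, "NO3_leaching": -1}


def burden_shifts(changes, tol=1.0):
    """Indicators that moved in their undesirable direction beyond the tol band."""
    return [k for k, v in changes.items() if sign(v, tol) == -DESIRABLE[k]]


# Illustrative numbers only (not results from the study).
expected = [
    Expectation("yield", 0, -5.0, 5.0),
    Expectation("N2O", -1, -30.0, -10.0),
    Expectation("SOC", 1, 2.0, 10.0),
]
simulated = [0.5, -4.0, 0.5]

print(f"directional agreement: {directional_agreement(expected, simulated):.2f}")  # 0.67
print(f"range agreement:       {range_agreement(expected, simulated):.2f}")        # 0.33
print(burden_shifts({"yield": 0.5, "N2O": -4.0, "SOC": 0.5, "NO3_leaching": 3.0}))  # ['NO3_leaching']
```

In this toy case the N2O decrease has the right sign but is smaller than expected. It therefore counts toward directional agreement but not range agreement, which mirrors the direction-versus-magnitude pattern the abstract describes. The ±1% tolerance is an arbitrary choice here. Changing it reclassifies small simulated changes as "no change" or not, so the directional score can shift as a result.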

DOI

https://doi.org/10.31223/X5F78W

Subjects

Agricultural Science, Agriculture, Agronomy and Crop Sciences, Life Sciences, Biochemistry, Biophysics, and Structural Biology, Biogeochemistry, Environmental Indicators and Impact Assessment, Research Methods in Life Sciences, Soil Science, Sustainability

Keywords

LLM, PBM, LandscapeDNDC, ChatGPT, Claude, Foundational model, Verification layer, Farm recommendation, Agri-environment, Yields

Dates

Published: 2026-07-03 10:31

Last Updated: 2026-07-03 17:49

Older Versions

Version 1 - 2026-07-03

License

CC BY Attribution 4.0 International

Additional Metadata

Data Availability:
The analysis scripts, processed model outputs and prompting templates will be made available in a public repository upon release of the full manuscript. The present preprint reports a proof-of-concept analysis based on existing LandscapeDNDC demonstration setups.