{"pk":49641,"title":"INTUIT: Investigating intuitive reasoning in humans and language models","subtitle":null,"abstract":"We introduce the INtuitive Theory Use and Inference Test (INTUIT), a cognitive test battery targeting common-sense physical and social reasoning. INTUIT adapts classic story-based question-and-answer methods for AI evaluation using VIGNET --- a novel tool that addresses some limitations of existing test batteries through procedurally generated vignettes. We evaluated INTUIT on three GPT models (GPT-4o, GPT-4o-mini, GPT-4.1-mini), one reasoning model (o3-mini), and a human sample (N = 147). Humans generally outperformed models, especially on object function and agent intention inference types. These results highlight INTUIT's sensitivity to intuitive reasoning capabilities and VIGNET's broader application for the evaluation of cognitive capabilities in humans and AI.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[{"word":"Artificial Intelligence; Computer Science; Reasoning; Theory of Mind"}],"section":"Papers with Poster Presentation","is_remote":true,"remote_url":"https://escholarship.org/uc/item/33z8g5dn","frozenauthors":[{"first_name":"Jonathan","middle_name":"","last_name":"Prunty","name_suffix":"","institution":"University of Cambridge","department":""},{"first_name":"Aoife","middle_name":"","last_name":"O'Flynn","name_suffix":"","institution":"University of Cambridge","department":""},{"first_name":"Patrick","middle_name":"","last_name":"Quinn","name_suffix":"","institution":"University of Cambridge","department":""},{"first_name":"Lucy","middle_name":"G","last_name":"Cheke","name_suffix":"","institution":"University of Cambridge","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"2025-01-01T12:00:00-06:00","render_galley":null,"galleys":[{"label":"Final corrected","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/49641/galley/47710/download/"}]}