{"pk":49779,"title":"Why Multimodal Models Struggle with Spatial Reasoning: Insights from Human Cognition","subtitle":null,"abstract":"Multimodal models excel in tasks requiring semantic integration of language and vision but struggle with spatial cognition. Using a visual perspective-taking task inspired by cognitive science, we find these models fail when the image and reference view differ, reflecting spatial cognition comparable to a two-year-old child. To explore these disparities further, we analyze internal representations using a human action fMRI dataset and voxelwise encoding models, revealing key differences between AI and human spatial encoding. This work provides new benchmarks and insights into bridging artificial and biological cognition.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[{"word":"Artificial Intelligence; Cognitive Neuroscience; Spatial cognition; fMRI; Knowledge representation"}],"section":"Papers with Poster Presentation","is_remote":true,"remote_url":"https://escholarship.org/uc/item/9px073fx","frozenauthors":[{"first_name":"Bridget","middle_name":"","last_name":"Leonard","name_suffix":"","institution":"University of Washington","department":""},{"first_name":"Kristin","middle_name":"","last_name":"Woodard","name_suffix":"","institution":"University of Washington","department":""},{"first_name":"Scott","middle_name":"","last_name":"Murray","name_suffix":"","institution":"University of Washington","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"2025-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/49779/galley/37741/download/"}]}