{"pk":49613,"title":"Understanding is Seeing: Metaphorical and Visual Reasoning in Multimodal Large Language Models","subtitle":null,"abstract":"Drawing on Conceptual Metaphor Theory and Structure-Mapping Theory, this paper introduces two exploratory studies in the field of metaphorical and visual reasoning using vision models and multimodal large language models. (i) The Multimodal Chain-of-Thought Prompting for Metaphor Generation task aimed to generate metaphorical linguistic expressions from non-metaphorical images by using the multimodal LLaVA 1.5 model and a two-step multimodal chain-of-thought prompting approach. The results showed the model's ability to generate metaphorical expressions, as 92% of them were classified as metaphors by human evaluators. Additionally, the evaluation revealed interesting patterns in terms of metaphoricity, familiarity and appeal scores across the generated metaphors. (ii) The Metaphorical Visual Analogy (MeVA) task consisted of solving visual analogies of the form \"source_domain : target_domain :: source_element : ?\" by choosing the correct target element among three difficult distractors, varying in semantic domains and roles. The results showed that all six models and humans performed above chance level, with only GPT-4o and ConvNeXt scoring higher than humans. Moreover, the error analysis showed that, in solving the analogies, the most frequent error was the selection of distractor 1. 
These works showed encouraging results for future research in the field of metaphorical and visual reasoning, contributing to the broader question of whether AI models serve as empirical tests of existing cognitive theories.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[{"word":"Artificial Intelligence; Analogy; Creativity; Language understanding; Natural Language Processing"}],"section":"Papers with Poster Presentation","is_remote":true,"remote_url":"https://escholarship.org/uc/item/1zd9598p","frozenauthors":[{"first_name":"Sofia","middle_name":"","last_name":"Lugli","name_suffix":"","institution":"University of Trento","department":""},{"first_name":"Carlo","middle_name":"","last_name":"Strapparava","name_suffix":"","institution":"FBK-Irst","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"2025-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/49613/galley/37575/download/"}]}