{"pk":27900,"title":"Automatic Estimation of Lexical Concreteness in 77 Languages","subtitle":null,"abstract":"We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at ρ = .75 in the vocabulary at large, increasing to ρ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[{"word":"word2vec"},{"word":"Concreteness; multilingual; skipgram; norms"}],"section":"Publication-based-Talks","is_remote":true,"remote_url":"https://escholarship.org/uc/item/7dz7k3k1","frozenauthors":[{"first_name":"Bill","middle_name":"","last_name":"Thompson","name_suffix":"","institution":"Max Planck Institute for Psycholinguistics","department":""},{"first_name":"Gary","middle_name":"","last_name":"Lupyan","name_suffix":"","institution":"Max Planck Institute for Psycholinguistics","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"2018-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/27900/galley/17538/download/"}]}