{"pk":25756,"title":"Multiple Language Gender Identification for Blog Posts","subtitle":null,"abstract":"In data-driven gender identification, it has so far been largely assumed that the same types of (mostly content-oriented) data features can be used to differentiate between male and female authors. In most cases, this distinction is made in a monolingual scenario. In this work, we discuss a set of features that distinguish between genders in six different datasets of blog data in English, Spanish, French, German, Italian and Catalan, with accuracies ranging from 77% to 88%. Using a reduced set of language-independent structural features in a multilingual scenario, we first identify the gender and then the gender and language of the author, achieving accuracies higher than 74%.","language":"eng","license":{"name":"","short_name":"","text":null,"url":""},"keywords":[{"word":"Natural Language Processing; Text Categorization; Author Profiling; Gender Identification"}],"section":"Papers","is_remote":true,"remote_url":"https://escholarship.org/uc/item/66d4z61k","frozenauthors":[{"first_name":"Juan","middle_name":"","last_name":"Soler-Company","name_suffix":"","institution":"Pompeu Fabra University","department":""},{"first_name":"Leo","middle_name":"","last_name":"Wanner","name_suffix":"","institution":"Pompeu Fabra University","department":""}],"date_submitted":null,"date_accepted":null,"date_published":"2015-01-01T18:00:00Z","render_galley":null,"galleys":[{"label":"PDF","type":"pdf","path":"https://journalpub.escholarship.org/cognitivesciencesociety/article/25756/galley/15380/download/"}]}