{"pk":34987,"title":"Segmenting and POS tagging Classical Tibetan using a memory-based tagger","subtitle":null,"abstract":"This paper presents a new approach to two challenging NLP tasks in Classical Tibetan: word segmentation and Part-of-Speech (POS) tagging. We demonstrate how both these problems can be approached in the same way, by generating a memory-based tagger that assigns 1) segmentation tags and 2) POS tags to a test corpus consisting of unsegmented lines of Tibetan characters. We propose a three-stage workflow and evaluate the results of both the segmenting and the POS tagging tasks. We argue that the Memory-Based Tagger (MBT) and the proposed workflow not only provide an adequate solution to these NLP challenges but are also highly efficient tools for building larger annotated corpora of Tibetan.","language":"en","license":null,"keywords":[{"word":"Tibetan, word segmentation, Memory-Based Tagging"}],"section":"Articles","is_remote":true,"remote_url":"https://escholarship.org/uc/item/8b83z79n","frozenauthors":[{"first_name":"Marieke","middle_name":"","last_name":"Meelen","name_suffix":"","institution":"University of Cambridge","department":"None"},{"first_name":"Nathan","middle_name":"","last_name":"Hill","name_suffix":"","institution":"School of Oriental and African Studies","department":"None"}],"date_submitted":"2017-04-12T22:39:44Z","date_accepted":"2017-04-12T22:39:44Z","date_published":"2017-12-31T08:00:00Z","render_galley":null,"galleys":[{"label":"","type":"pdf","path":"https://journalpub.escholarship.org/himalayanlinguistics/article/34987/galley/26090/download/"}]}