Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian
نویسندگان
چکیده
We present experiments with part-ofspeech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.
منابع مشابه
Weakly Supervised Part-of-Speech Tagging for Morphologically-Rich, Resource-Scarce Languages
This paper examines unsupervised approaches to part-of-speech (POS) tagging for morphologically-rich, resource-scarce languages, with an emphasis on Goldwater and Griffiths’s (2007) fully-Bayesian approach originally developed for English POS tagging. We argue that existing unsupervised POS taggers unrealistically assume as input a perfect POS lexicon, and consequently, we propose a weakly supe...
متن کاملVerbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language
Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information if harnessed for the task of shallow parsing can lead to dramatic improvements in accuracy for a morphologically rich languageMarathi1. The crux of the approach is to use a powerful morphological analyzer backed by a high coverage lexicon to generate rich features for a C...
متن کاملFactors Affecting Part-of-Speech Tagging for Tagalog
This paper investigates factors contributing to the performance of the POS Tagger for Tagalog language. Tagalog, a morphologically rich language, exhibits complex morphological structure, makes use of morphological information in determining parts of speech of the word, aspect and voice. As word feature information plays important role in efficient tagging, tag set definition capturing word inf...
متن کاملFeature-Rich Part-Of-Speech Tagging Using Deep Syntactic and Semantic Analysis
This paper describes the implementation, improvement and evaluation of the machine translation (MT) system proposed by Jackov (2014) when used as a feature-rich part-ofspeech (POS) tagger for Bulgarian. The system does not rely on POS tagging for morphological disambiguation. Instead, all ambiguities are considered in parsing hypotheses that are scored and the best one is used for tagging. The ...
متن کاملJoint Ensemble Model for POS Tagging and Dependency Parsing
In this paper we present several approaches towards constructing joint ensemble models for morphosyntactic tagging and dependency parsing for a morphologically rich language – Bulgarian. In our experiments we use state-of-the-art taggers and dependency parsers to obtain an extended version of the treebank for Bulgarian, BulTreeBank, which, in addition to the standard CoNLL fields, contains pred...
متن کامل