Unsupervised Part of Speech Tagging Without a Lexicon
Abstract
Unsupervised dependency parsing frequently assumes that input sentences have already been labeled with POS tags. Likewise, most unsupervised POS taggers (including those proposed by [1] and [2]) either assign numeric labels to words without providing a mapping to POS tags, or rely on language-specific lexical information such as lists of the possible tags that some or all of the words can take. However, linguists have devoted decades of research toward identifying features of word order in various languages and toward understanding principles that influence the structure of natural languages in general [3] [4].
Similar resources
A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. We first induce a translation lexicon from monolingual corpora, based on cognate detection followed by cross-lingual contextual similarity. Second, POS informatio...
Weakly Supervised Part-of-Speech Tagging for Morphologically-Rich, Resource-Scarce Languages
This paper examines unsupervised approaches to part-of-speech (POS) tagging for morphologically-rich, resource-scarce languages, with an emphasis on Goldwater and Griffiths’s (2007) fully-Bayesian approach originally developed for English POS tagging. We argue that existing unsupervised POS taggers unrealistically assume as input a perfect POS lexicon, and consequently, we propose a weakly supe...
Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering
An unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods is described. Unlike in current state-of-the-art approaches, the kind and number of different tags is generated by the method itself. We compute and merge two partitionings of word graphs: one based on context similarity of high frequency words, another on log-likelihood statistics for words of lower fre...
Part-of-Speech Tagging in Context
We present a new HMM tagger that exploits context on both sides of a word to be tagged, and evaluate it in both the unsupervised and supervised case. Along the way, we present the first comprehensive comparison of unsupervised methods for part-of-speech tagging, noting that published results to date have not been comparable across corpora or lexicons. Observing that the quality of the lexicon g...
Unsupervised Learning of Word-Category Guessing Rules
Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words. Three complementary sets of word-guessing rules are induced from the lexicon and a raw corpus: prefix morphological rules, suffix morphological rules and ending-gu...