A Classification-based Algorithm for Consistency Check of Part-of-Speech Tagging for Chinese Corpora
Authors
Abstract
Ensuring consistent Part-of-Speech (POS) tagging plays an important role in constructing high-quality Chinese corpora. After analyzing the POS tagging of multi-category words in large-scale corpora, we propose a novel method for checking the consistency of POS tagging. Our method builds a vector model of the context of multi-category words and uses the k-NN algorithm to classify context vectors constructed from POS tagging sequences and judge their consistency. The experimental results indicate that the proposed method is feasible and effective.
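The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature encoding, the distance measure, or the value of k, so the one-hot encoding, the window of two tags on each side, cosine similarity, k = 3, the toy tagset, and all function names below are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy tagset and window size; the paper's actual choices are not given here.
TAGSET = ["n", "v", "a", "d", "u", "p"]
WINDOW = 2


def context_vector(tags, index, window=WINDOW, tagset=TAGSET):
    """One-hot encode the POS tags around position `index` (the target word itself is skipped)."""
    vec = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        tag = tags[pos] if 0 <= pos < len(tags) else None  # None pads sentence edges
        vec.extend(1.0 if tag == t else 0.0 for t in tagset)
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(query, training, k=3):
    """Return the majority tag among the k training vectors most similar to `query`."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def is_consistent(tags, index, training, k=3):
    """An occurrence is judged consistent if k-NN predicts the tag it was annotated with."""
    return knn_predict(context_vector(tags, index), training, k) == tags[index]


# Toy training data: (tag sequence, index of the multi-category word in that sequence).
examples = [
    (["d", "d", "v", "n", "n"], 2),
    (["n", "d", "v", "u", "n"], 2),
    (["p", "d", "v", "n", "u"], 2),
    (["v", "a", "n", "v", "d"], 2),
    (["p", "a", "n", "v", "n"], 2),
    (["v", "u", "n", "v", "u"], 2),
]
training = [(context_vector(tags, i), tags[i]) for tags, i in examples]

# Same context, two different annotations of the target word.
print(is_consistent(["n", "d", "v", "n", "u"], 2, training))
print(is_consistent(["n", "d", "n", "n", "u"], 2, training))
```

In this toy data the first occurrence agrees with its nearest neighbours and the second does not, so only the second would be flagged for manual review. A real system would need a much larger set of trusted annotations per multi-category word.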
Related articles
A Study on Consistency Checking Method of Part-Of-Speech Tagging for Chinese Corpora
Ensuring consistency of Part-Of-Speech (POS) tagging plays an important role in the construction of high-quality Chinese corpora. After having analyzed the POS tagging of multi-category words in large-scale corpora, we propose a novel classification-based consistency checking method of POS tagging in this paper. Our method builds a vector model of the context of multi-category words along with ...
A Two-Stage Approach to Chinese Part-of-Speech Tagging
This paper describes a Chinese part-of-speech tagging system based on the maximum entropy model. It presents a novel two-stage approach that uses the part-of-speech tags of the words on both sides of the current word in Chinese part-of-speech tagging. The system is evaluated on four corpora at the Fourth SIGHAN Bakeoff in the closed track of the Chinese part-of-speech tagging task.
Two-level Word Class Categorization Model in Analytic Languages and Its Implications for POS Tagging in Modern Chinese Corpora
The study of word classes has a history of over 4000 years, and the word class problem in over 1000 analytic languages like Modern Chinese can be seen as the Goldbach Conjecture in linguistics. This paper first outlines the existing problems in the POS tagging of Modern Chinese corpora with a case study of 自信. Then it introduces the Two-level Word Class Categorization Model in analytic language...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
HMM-Based Part-of-Speech Tagging for Chinese Corpora
Chinese part-of-speech tagging is more difficult than its English counterpart because it needs to be solved together with the problem of word identification. In this paper, we present our work on Chinese part-of-speech tagging based on a first-order, fully-connected hidden Markov model. Part of the 1991 United Daily corpus of approximately 10 million Chinese characters is used for training and te...