Shallow Parsing By Weighted Probabilistic Sum
Authors
Abstract
In this paper, we define the chunking problem as the classification of words and present a weighted probabilistic model for text chunking. The proposed model exploits context features around the focus word and, to alleviate the sparse data problem, integrates general features with specific features. In the training stage, we measure the information gain ratio of each feature, select the useful features, and assign higher weights to more informative features according to their information gain ratio. At application time, we classify words into chunk labels while checking the consistency of the beginning and the end of each chunk. The experimental results show that the model combining general and specific features alleviates the sparse data problem. In addition, the weighted probabilistic model based on information gain ratio outperforms the non-weighted model.
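The abstract does not give the scoring formula, so the following is only a minimal sketch of one plausible reading: each feature is weighted by its information gain ratio, and a label's score is the weighted sum of the per-feature conditional probabilities P(label | feature value). The feature names `word` (specific) and `pos` (general), the toy data, and the function names are all hypothetical, and the begin/end consistency check is omitted.

```python
import math
from collections import Counter, defaultdict

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of a feature divided by its split information."""
    total = len(labels)
    by_value = defaultdict(list)
    for v, y in zip(values, labels):
        by_value[v].append(y)
    conditional = sum(len(ys) / total * entropy(ys) for ys in by_value.values())
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info

def train(rows, labels):
    """Return per-feature weights and per-feature label counts."""
    names = sorted(rows[0])
    weights = {n: gain_ratio([r[n] for r in rows], labels) for n in names}
    counts = {n: defaultdict(Counter) for n in names}
    for row, y in zip(rows, labels):
        for n in names:
            counts[n][row[n]][y] += 1
    return weights, counts

def classify(row, weights, counts, label_set):
    """Pick the label with the highest weighted sum of P(label | feature)."""
    scores = {}
    for y in label_set:
        score = 0.0
        for n, w in weights.items():
            seen = counts[n].get(row[n])
            if seen:  # unseen feature values contribute nothing
                score += w * seen[y] / sum(seen.values())
        scores[y] = score
    return max(sorted(scores), key=scores.get)

# Toy training data (assumed, not from the paper).
rows = [
    {"word": "the", "pos": "DT"}, {"word": "dog", "pos": "NN"},
    {"word": "barks", "pos": "VBZ"}, {"word": "a", "pos": "DT"},
    {"word": "cat", "pos": "NN"}, {"word": "sleeps", "pos": "VBZ"},
]
labels = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "B-VP"]

weights, counts = train(rows, labels)
prediction = classify({"word": "bird", "pos": "NN"}, weights, counts, set(labels))
```

In this toy setting the unseen word "bird" contributes nothing, so the decision falls back on the general `pos` feature, which is the kind of sparse-data relief the abstract attributes to combining general and specific features.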
Similar resources
A Language-Independent Shallow-Parser Compiler
We present a rule-based shallow-parser compiler, which allows to generate a robust shallow-parser for any language, even in the absence of training data, by resorting to a very limited number of rules which aim at identifying constituent boundaries. We contrast our approach to other approaches used for shallow-parsing (i.e. finite-state and probabilistic methods). We present an evaluation of o...
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Confidence Estimation in Structured Prediction
Structured classification tasks such as sequence labeling and dependency parsing have seen much interest from the Natural Language Processing and machine learning communities. Several online learning algorithms were adapted for structured tasks, such as Perceptron, Passive-Aggressive and the recently introduced Confidence-Weighted learning. These online algorithms are easy to implement, fast t...
The Effect of Rhythm on Structural Disambiguation in Chinese
The length of a constituent (number of syllables in a word or number of words in a phrase), or rhythm, plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm in constructions in Chinese from the statistical data acquired from a shallow tree bank. Based on our survey, we then used the rhythm feature in a practical shallow parsing task by using rhy...
Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers
This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, whi...