A MEMs-based Labeling Approach to Punctuation Correction in Chinese Opinionated Text
Abstract
This paper presents a maximum entropy model (MEM) based approach to punctuation prediction and correction for Chinese opinionated texts. The study has three parts. First, we survey punctuation errors in Chinese opinionated texts using a corpus of online product reviews. Second, we propose a maximum entropy sequence labeling approach to Chinese punctuation prediction. Finally, we detect and correct punctuation errors by comparing the automatically predicted punctuation marks with the corresponding original marks in the opinionated texts. Our experimental results show that the system is effective for most punctuation errors in Chinese opinionated texts.
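The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the predictor has already produced one label per word (the punctuation mark that follows the word, or a hypothetical `O` label for "no mark"), and the names `NO_PUNCT`, `Correction`, `detect_errors`, and `apply_predictions` are invented for illustration. The maximum entropy model itself is out of scope here; the predicted labels are simply given as input.

```python
from typing import List, NamedTuple

NO_PUNCT = "O"  # hypothetical label: no punctuation mark after this word


class Correction(NamedTuple):
    position: int   # index of the word that the mark follows
    word: str
    original: str   # label found in the original text
    predicted: str  # label proposed by the predictor
    kind: str       # "insert", "delete", or "replace"


def detect_errors(words: List[str], original: List[str],
                  predicted: List[str]) -> List[Correction]:
    """Flag every position where predicted and original labels disagree."""
    if not (len(words) == len(original) == len(predicted)):
        raise ValueError("words and both label sequences must have equal length")
    corrections = []
    for i, (word, orig, pred) in enumerate(zip(words, original, predicted)):
        if orig == pred:
            continue
        if orig == NO_PUNCT:
            kind = "insert"    # a mark is missing in the original
        elif pred == NO_PUNCT:
            kind = "delete"    # the original has a redundant mark
        else:
            kind = "replace"   # the original uses a different mark
        corrections.append(Correction(i, word, orig, pred, kind))
    return corrections


def apply_predictions(words: List[str], predicted: List[str]) -> str:
    """Rebuild the text using the predicted labels as the corrected punctuation."""
    return "".join(w + ("" if p == NO_PUNCT else p)
                   for w, p in zip(words, predicted))


# Toy example with made-up labels (not data from the paper).
words = ["手机", "很", "好", "但是", "电池", "不行"]
original = ["O", "O", "。", "O", "O", "O"]
predicted = ["O", "O", ",", "O", "O", "。"]
for c in detect_errors(words, original, predicted):
    print(c.position, c.word, c.kind, repr(c.original), "->", repr(c.predicted))
print(apply_predictions(words, predicted))
```

In this sketch every disagreement is treated as an error in the original text; a practical system would presumably weigh the predictor's confidence before overriding the writer's punctuation.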
Related resources
A CRF Sequence Labeling Approach to Chinese Punctuation Prediction
This paper presents a conditional random fields based labeling approach to Chinese punctuation prediction. To this end, we first reformulate Chinese punctuation prediction as a multiple-pass labeling task on a sequence of words, and then explore various features from three linguistic levels, namely words, phrases, and functional chunks, for punctuation prediction under the framework of conditional...
Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria
We present a new approach to aligning sentences in bilingual parallel corpora based on punctuation, especially for English and Chinese. Although the length-based approach produces high accuracy rates of sentence alignment for clean parallel corpora written in two Western languages, such as French-English or German-English, it does not work as well for parallel corpora that are noisy or written ...
NEUOM: Identifying Opinionated Sentences in Chinese and English Text
Zhang, Ke Wang, Muhua Zhu, Tong Xiao, Jingbo Zhu. Natural Language Processing Lab, Northeastern University. {zhangcl, xiaotong, zhujingbo}@mail.neu.edu.cn
This paper introduces our NEUOM system, which participates in the opinionated sentence detection task, one of evaluation task...
Pause and Stop Labeling for Chinese Sentence Boundary Detection
The fuzziness of Chinese sentence boundaries makes discourse analysis more challenging. Moreover, many articles posted on the Internet lack punctuation marks altogether. In this paper, we collect documents written by masters as a reference corpus and propose a model to label the punctuation marks for the given text. Conditional random field (CRF) models trained with the corpus determine the corr...
Kinds of Features for Chinese Opinionated Information Retrieval
This paper presents the results of experiments in which we tested different kinds of features for retrieval of Chinese opinionated texts. We assume that the task of retrieval of opinionated texts (OIR) can be regarded as a subtask of general IR, but with some distinct features. The experiments showed that the best results were obtained from the combination of character-based processing, diction...