A MEMs-based Labeling Approach to Punctuation Correction in Chinese Opinionated Text

Authors

  • Yanqing Zhao
  • Guohong Fu
Abstract

This paper presents a maximum entropy model (MEM) based approach to punctuation prediction and correction for Chinese opinionated texts. The study has three parts. First, we survey punctuation errors in Chinese opinionated texts using a corpus of online product reviews. Second, we propose a maximum entropy sequence labeling approach to Chinese punctuation prediction. Finally, we detect and correct punctuation errors by comparing the automatically predicted punctuation marks with the corresponding original marks in the opinionated texts. Our experimental results show that the system is effective for most punctuation errors in Chinese opinionated texts.
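The labeling and comparison steps described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each word is labeled with the punctuation mark that follows it (or `O` for none) and that a trained maximum entropy labeler has already produced one predicted label per word. The abstract does not state the label set or the features, so the names `PUNCT`, `to_labeled_sequence`, and `find_corrections` are illustrative only, and the predicted labels are written by hand in place of a real model.

```python
# Hypothetical label scheme: each word carries the mark that follows it, or "O".
PUNCT = {"，", "。", "！", "？", "；", "、"}
NONE = "O"


def to_labeled_sequence(tokens):
    """Turn a segmented token list into (word, label) pairs.

    A punctuation token becomes the label of the preceding word.
    If several marks follow one word, the last one wins.
    Marks that appear before any word are dropped.
    """
    pairs = []
    for tok in tokens:
        if tok in PUNCT:
            if pairs:
                word, _ = pairs[-1]
                pairs[-1] = (word, tok)
        else:
            pairs.append((tok, NONE))
    return pairs


def find_corrections(original_pairs, predicted_labels):
    """List the positions where the predicted label differs from the original."""
    corrections = []
    for i, ((word, orig), pred) in enumerate(zip(original_pairs, predicted_labels)):
        if orig != pred:
            corrections.append((i, word, orig, pred))
    return corrections


# A review that ends with a comma where a full stop would be expected.
tokens = ["手机", "很", "好", "，", "屏幕", "清晰", "，"]
original = to_labeled_sequence(tokens)

# Stand-in for the output of a trained labeler (assumed, not computed here).
predicted = ["O", "O", "，", "O", "。"]

for index, word, orig, pred in find_corrections(original, predicted):
    print(f"after word {index} ({word}): original {orig!r}, suggested {pred!r}")
```

Each mismatch is reported as a candidate error together with the suggested replacement. A real system would also have to decide when to trust the model over the writer, which this sketch does not attempt.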


Similar resources

A CRF Sequence Labeling Approach to Chinese Punctuation Prediction

This paper presents a conditional random fields based labeling approach to Chinese punctuation prediction. To this end, we first reformulate Chinese punctuation prediction as a multiple-pass labeling task on a sequence of words, and then explore various features from three linguistic levels, namely words, phrases, and functional chunks, for punctuation prediction under the framework of conditional...


Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria

We present a new approach to aligning sentences in bilingual parallel corpora based on punctuation, especially for English and Chinese. Although the length-based approach produces high accuracy rates of sentence alignment for clean parallel corpora written in two Western languages, such as French-English or German-English, it does not work as well for parallel corpora that are noisy or written ...


NEUOM: Identifying Opinionated Sentences in Chinese and English Text

Zhang, Ke Wang, Muhua Zhu, Tong Xiao, Jingbo Zhu (Natural Language Processing Lab, Northeastern University). This paper introduces our NEUOM system which participates in the opinionated sentence detection task, one of evaluation task...


Pause and Stop Labeling for Chinese Sentence Boundary Detection

The fuzziness of Chinese sentence boundaries makes discourse analysis more challenging. Moreover, many articles posted on the Internet lack punctuation marks altogether. In this paper, we collect documents written by masters as a reference corpus and propose a model to label the punctuation marks for the given text. Conditional random field (CRF) models trained with the corpus determine the corr...


Kinds of Features for Chinese Opinionated Information Retrieval

This paper presents the results of experiments in which we tested different kinds of features for retrieval of Chinese opinionated texts. We assume that the task of retrieval of opinionated texts (OIR) can be regarded as a subtask of general IR, but with some distinct features. The experiments showed that the best results were obtained from the combination of character-based processing, diction...




Publication date: 2013