Two-Phase LMR-RC Tagging for Chinese Word Segmentation

Authors

  • Tak Pang Lau
  • Irwin King
Abstract

In this paper we present a Two-Phase LMR-RC Tagging scheme for Chinese word segmentation. In the Regular Tagging phase, Chinese sentences are processed similarly to the original LMR Tagging. The tagged sentences are then passed to the Correctional Tagging phase, in which they are re-tagged using extra information from the first-round tagging results. Two training methods, Separated Mode and Integrated Mode, are proposed to construct the models. Experimental results show that our scheme performs best in terms of accuracy in Integrated Mode, whereas Separated Mode is more suitable under limited computational resources.
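The tagging view of segmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not list the tag set or the features, so everything below is an assumption for illustration: a four-tag set (`L`, `M`, `R` for the left, middle, and right characters of a word, `S` for a single-character word), and a hypothetical `correctional_features` helper showing how a second pass might see the first-round tags of neighbouring characters. It is not the authors' implementation, and no trained model is included.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one position tag per character.

    Assumed tag set: L (left), M (middle), R (right), S (single-character word).
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["L"] + ["M"] * (len(word) - 2) + ["R"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their tags; a word ends at R or S."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("R", "S"):
            words.append(current)
            current = ""
    if current:  # keep a trailing unfinished word instead of dropping it
        words.append(current)
    return words


def correctional_features(
    chars: str, first_round_tags: List[str], i: int, window: int = 1
) -> Dict[str, str]:
    """Hypothetical second-phase features for character i.

    Adds the first-round tags in a small window around position i,
    which is the kind of extra information a correctional pass could use.
    """
    feats = {"char": chars[i]}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"tag[{off}]"] = first_round_tags[j] if 0 <= j < len(chars) else "<PAD>"
    return feats
```

A round trip such as `tags_to_words("中文分词", words_to_tags(["中文", "分词"]))` recovers the original segmentation, which is why segmentation can be reduced to per-character tagging in the first place.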

Similar resources

Chinese Word Segmentation as LMR Tagging

In this paper we present Chinese word segmentation algorithms based on the so-called LMR tagging. Our LMR taggers are implemented with the Maximum Entropy Markov Model and we then use Transformation-Based Learning to combine the results of the two LMR taggers that scan the input in opposite directions. Our system achieves F-scores of and on the Academia Sinica corpus and the Hong Kong City Unive...

Using Part-of-Speech Reranking to Improve Chinese Word Segmentation

Chinese word segmentation and Part-of-Speech (POS) tagging have commonly been considered two separate tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter and a tagger model separately based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approx...

Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese

Word segmentation is usually considered an essential step for many Chinese and Japanese Natural Language Processing tasks, such as name tagging. This paper presents several new observations and analyses on the impact of word segmentation on name tagging: (1) Due to the limitation of current state-of-the-art Chinese word segmentation performance, a character-based name tagger can outperform its...

Effective Subsequence-based Tagging for Chinese Word Segmentation

Hai Zhao, Chunyu Kit (Department of Chinese, Translation and Linguistics, City University of Hong Kong, 83 Tat Avenue, Kowloon, Hong Kong SAR, China). Abstract: The research of automatic Chinese word segmentation has been advancing rapidly in recent years, especially since the First International Chinese Word Segmentation Bakeo...

Combining Character-Based and Subsequence-Based Tagging for Chinese Word Segmentation

Chinese word segmentation is the initial step for Chinese information processing. The performance of Chinese word segmentation has been greatly improved by character-based approaches in recent years. This approach treats Chinese word segmentation as a character-word-position-tagging problem. With the help of powerful sequence tagging models, the character-based method quickly rose as a mainstream tec...

Publication date: 2005