Key Problems in Conversion from Simplified to Traditional Chinese Characters

Authors

  • Xiaodong Shi
  • Yidong Chen
  • Xiuping Huang
Abstract

In this paper we tackle the problem of character conversion from simplified Chinese to traditional Chinese. Of the simplified characters that need conversion, about 9.5% have more than 2 counterparts in the traditional script. We improve upon the log-linear approach first used in (Chen et al., 2011) by utilizing more data sets and better translation models. We also show that automatic classification and noise reduction of the corpus can achieve better performance. As evidence of the validity of our approach, our system ranked No. 1 in a recent evaluation of simplified-to-traditional character conversion systems organized by the Chinese Information Processing Society of China.
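
As an illustration only, the following Python sketch shows how a log-linear model might choose among several traditional counterparts of one simplified character. It is not the authors' system: the mapping table, probabilities, and feature weights are made-up toy values, and the names `CANDIDATES`, `BIGRAM_LOGP`, `PRIOR_LOGP`, `WEIGHTS`, and `convert` are hypothetical.

```python
import math

# Toy one-to-many table (assumption): simplified char -> traditional candidates.
CANDIDATES = {
    "发": ["發", "髮"],  # ambiguous: "to emit/develop" vs. "hair"
    "头": ["頭"],
}

# Toy feature 1 (assumption): bigram log-probabilities over traditional text.
BIGRAM_LOGP = {
    ("頭", "髮"): math.log(0.9),
    ("頭", "發"): math.log(0.1),
    ("發", "展"): math.log(0.9),
    ("髮", "展"): math.log(0.1),
}
DEFAULT_LOGP = math.log(0.01)  # back-off score for unseen bigrams

# Toy feature 2 (assumption): how often each candidate is the right mapping.
PRIOR_LOGP = {
    ("发", "發"): math.log(0.7),
    ("发", "髮"): math.log(0.3),
}

# Log-linear feature weights (assumption; a real system would tune these).
WEIGHTS = {"lm": 1.0, "prior": 0.5}


def convert(text):
    """Return the highest-scoring traditional string via Viterbi search."""
    # beams maps the last emitted character to (score, output so far).
    beams = {"": (0.0, "")}
    for ch in text:
        options = CANDIDATES.get(ch, [ch])  # unmapped characters pass through
        new_beams = {}
        for cand in options:
            prior = PRIOR_LOGP.get((ch, cand), 0.0)
            best = None
            for prev, (score, out) in beams.items():
                lm = BIGRAM_LOGP.get((prev, cand), DEFAULT_LOGP)
                # Log-linear score: weighted sum of feature values.
                total = score + WEIGHTS["lm"] * lm + WEIGHTS["prior"] * prior
                if best is None or total > best[0]:
                    best = (total, out + cand)
            new_beams[cand] = best
        beams = new_beams
    return max(beams.values())[1]


print(convert("头发"))  # context favours the "hair" reading
print(convert("发展"))  # context favours the "develop" reading
```

In this sketch the same simplified character 发 is converted differently depending on its neighbour, which is the kind of one-to-many ambiguity the abstract refers to.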

Similar resources

基於對照表以及語言模型之簡繁字體轉換 (Chinese Characters Conversion System based on Lookup Table and Language Model) [In Chinese]

The character sets used in China and Taiwan are both Chinese, but they are divided into simplified and traditional Chinese characters. A large amount of information is exchanged between China and Taiwan through books and the Internet. To provide readers with a convenient reading environment, character conversion between simplified and traditional Chinese is necessary. The conversion between simp...

The perception of simplified and traditional Chinese characters in the eye of simplified and traditional Chinese readers

Expertise in Chinese character recognition is marked by analytic/reduced holistic processing (Hsiao & Cottrell, 2009), which depends mainly on readers’ writing rather than reading experience (Tso, Au, & Hsiao, 2011). Here we examined whether simplified and traditional Chinese readers process characters differently in terms of holistic processing. When processing characters that are distinctive ...

Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese

Chinese characters are used in both Japanese and Chinese, where they are called Kanji and Hanzi respectively. Because Chinese characters carry significant semantic information, a mapping table between Kanji and Hanzi can be very useful for many Japanese-Chinese bilingual applications, such as machine translation and cross-lingual information retrieval. Because Kanji characters originated from ancient ...

An Investigation into Traditional Chinese Medicine Hospitals in China: Development Trend and Medical Service Innovation

Background: This paper aims to investigate the development trend of traditional Chinese medicine (TCM) hospitals in China and explore their medical service innovations, with special reference to the changing co-existence with western medicine (WM) at TCM hospitals. Methods: Quantitative data at macro level was collected from official databases of China Health Statistical Yearbook and Extracts o...

Chinese Characters Conversion System based on Lookup Table and Language Model

The character sets used in China and Taiwan are both Chinese, but they are divided into simplified and traditional Chinese characters. There are large amount of 朝陽科技大學資訊工程系, Department of Computer Science and Information Engineering, Chaoyang University of Technology. E-mail: {s9827608, shwu, s9927605}@cyut.edu.tw. The author for correspondence is Shih-Hung Wu. 資訊工業策進會, Institute for Information...

Publication date: 2013