A Chinese POS Decision Method Using Korean Translation Information

Authors

  • Son-Il Kwak
  • O.-Chol Kown
  • Chang-Sin Kim
  • Yong-Il Pak
  • Gum-Chol Son
  • Chol-Jun Hwang
  • Hyon-Chol Kim
  • Hyok-Chol Sin
  • Gyong-Il Hyon
  • Sok-Min Han
Abstract

In this paper we propose a method that imitates a translation expert by using Korean translation information, and we analyse its performance. Korean lends itself to POS tagging better than Chinese, so we can exploit this property in Chinese POS tagging.

Keywords: machine translation, part-of-speech tagging, corpus

Introduction

Previous part-of-speech (POS) tagging methods for Chinese fall broadly into two classes. One uses POS tagging rules extracted by Chinese-language experts; the other uses a statistical model, which usually requires a huge Chinese corpus. With the first method, rules are very difficult to extract, and they cannot cover the variety of real language examples. It was therefore used mainly in the early years of POS tagging and has since been used in combination with other methods. Corpus-based statistical methods have many advantages, but they need a large POS-tagged corpus. To address these problems, some researchers are studying methods that use raw text data extracted from the internet [2, 5, 6].

In this paper we propose a POS decision method that imitates a Chinese translator whose native language is Korean, together with a method for extracting POS decision rules from a Chinese-Korean bilingual corpus.

1. Availability of Korean translation information in Chinese POS decision

A machine translation system is a typical expert system, so the closer its translation method is to that of a translation expert, the higher its accuracy. A translation expert whose mother tongue is Korean generates Korean words from the Chinese words while translating Chinese, verifies the translation, and finally completes it. For example, consider the following sentence.

精密的观察是科学研究的基础 (세밀한 관찰은 과학연구의 기초이다.)

When translating this sentence, the expert generates 《과학연구》 from the Chinese words ‘科学研究’ and knows from experience that this rendering is right and that 《과학은 연구한다》 is not.
The classical method segments the sentence as 精密/的/观察/是/科学/研究/的/基础 and decides the POS of the multi-POS word ‘研究’ using statistical information. This method depends heavily on the size of the corpus and the domains it covers. A Korean POS tagging system can help with this problem. From the POS-tagged words of 《세밀한 관찰은 과학연구의 기초이다.》 we can recognize the noun compound 《과학연구》, which tells us that the POS of the Chinese word ‘研究’ is noun. In conclusion, by using a large Korean POS-tagged corpus we can improve the performance of a Chinese POS tagging system.

2. Acquisition method of POS decision rule from Chinese-Korean bilingual corpus

The method described above applies to noun-noun combinations, but it cannot be applied to verb decisions and similar cases. Generally, rules such as the following are used for the POS decision of Chinese multi-POS words.

setpos(0,v)

Acquiring such rules generally takes a great deal of effort, and it is difficult to determine the confidence of a rule. A Chinese-Korean bilingual corpus can be used to extract statistical rules. Consider the acquisition method through an example.

Chinese sentence: 我的朋友学习中国语
Translation: 나의 동무는 중국어를 배운다

For this example, we segment the Chinese sentence into words and POS-tag the Korean sentence.

我/的/朋友/学习/中国语
나/N 의/T 동무/N 는/T 중국어/N 를/T 배우다/V ᄂ다/T

In this sentence, the morphological analysis of 《배운다》 is 《배우다/동사 + ᄂ다/맺음토》 (배우다/verb + ᄂ다/final ending), and the verb translation of 学习 is 《배우다》, so 学习 is used as a verb.

The Chinese sentence Cs and the Korean sentence Ks can be represented as follows.
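
As a rough illustration of the 学习 example (not the authors' formal representation of Cs and Ks), the decision step might be sketched in Python as below. The toy dictionary `BILINGUAL_DICT`, the tag mapping `KOREAN_TO_CHINESE_POS`, and all function names are hypothetical and introduced only for this sketch; the noun entry 학습 is an assumed dictionary entry, not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary: Chinese word -> {POS: set of Korean lemmas}.
# These entries are illustrative assumptions, not data from the paper.
BILINGUAL_DICT = {
    "学习": {"v": {"배우다"}, "n": {"학습"}},
}

# Assumed mapping from the Korean tags used in the example (N, V)
# to Chinese POS labels; T (particles and endings) is ignored.
KOREAN_TO_CHINESE_POS = {"N": "n", "V": "v"}


def parse_tagged(text):
    """Split 'lemma/TAG lemma/TAG ...' into (lemma, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def decide_pos(chinese_word, korean_morphemes):
    """Return the POS whose dictionary translation occurs in the Korean
    sentence with a matching tag, or None if nothing matches."""
    candidates = BILINGUAL_DICT.get(chinese_word, {})
    for lemma, tag in korean_morphemes:
        pos = KOREAN_TO_CHINESE_POS.get(tag)
        if pos is not None and lemma in candidates.get(pos, set()):
            return pos
    return None


def collect_pos_counts(sentence_pairs):
    """Count POS decisions per Chinese word over a bilingual corpus."""
    counts = defaultdict(Counter)
    for chinese_segmented, korean_tagged in sentence_pairs:
        morphemes = parse_tagged(korean_tagged)
        for word in chinese_segmented.split("/"):
            pos = decide_pos(word, morphemes)
            if pos is not None:
                counts[word][pos] += 1
    return counts


pairs = [
    ("我/的/朋友/学习/中国语",
     "나/N 의/T 동무/N 는/T 중국어/N 를/T 배우다/V ᄂ다/T"),
]
counts = collect_pos_counts(pairs)
print(dict(counts["学习"]))
```

Counts accumulated this way over many sentence pairs could, under these assumptions, serve as a simple basis for estimating how reliable a candidate rule is.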


Similar resources

Experiments with POS-based restructuring and alignment-based reordering for statistical machine translation

This paper presents the methods which are based on the part-of-speech (POS) and auto alignment information to improve the quality of machine translation result and the word alignment. We utilize different types of POS tag to restructure source sentences and use an alignment-based reordering method to improve the alignment. After applying the reordering method, we use two phrase tables in the de...


Learning Patterns from the Web to Translate Named Entities for Cross Language Information Retrieval

Named entity (NE) translation plays an important role in many applications. In this paper, we focus on translating NEs from Korean to Chinese to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integr...


Korean-Chinese Person Name Translation for Cross Language Information Retrieval

Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of name entity in Korean-Chinese cross language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one...


NTCIR-5 Chinese, English, Korean Cross Language Retrieval Experiments using PIRCS

In NTCIR-5 our focus is to see if web-assisted query expansion is useful, and to test an English-Korean bilingual dictionary. We participated in Chinese, Japanese, Korean and English monolingual retrieval using also web expansion for Chinese and English. We also performed Chinese-English, English-Chinese, English-Korean bilingual, and Chinese-Korean pivot bilingual CLIR. The query translation ap...


Korean-to-Chinese Word Translation using Chinese Character Knowledge

In this paper, we present a way of translating Korean words to Chinese using Chinese character information. A mapping table of Korean and Chinese characters is constructed and used to obtain possible combinations as translation candidates. The candidates are ranked by the combination score which accounts for the possibility of the character combination and context similarity score, which indica...



Journal title:
  • CoRR

Volume: abs/1511.02435

Publication date: 2015