Bilingual Unknown Word Alignment Tool for English-Thai

Authors

  • Nithiwat Kampanya
  • Asanee Kawtrakul
  • Mukda Suktarachan
Abstract

This paper presents a bilingual English-Thai unknown word alignment tool that uses techniques based on the global and local characteristics of each word in parallel texts. The distribution and location of words in the texts are analyzed to generate candidate Thai unknown words for each English unknown word. The overall accuracy of the unknown word alignment is 90.32% on 6,000 bilingual English-Thai corpora. With an average of 4.5 candidate Thai unknown words per English unknown word, the tool can greatly reduce the time that linguists would otherwise spend doing the same work manually.
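
The abstract does not say which scoring functions the tool uses, so the following is only an illustrative sketch of how distribution (global) and location (local) evidence might be combined to rank candidates. It assumes segment-aligned, pre-tokenized parallel text, uses the Dice coefficient as the global score and relative-position similarity as the local score, and the function name `rank_candidates` and its parameters are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict


def rank_candidates(pairs, unknown_en, top_k=5, position_weight=0.5):
    """Rank Thai tokens as candidate alignments for English unknown words.

    pairs: list of (english_tokens, thai_tokens), one per aligned segment.
    unknown_en: set of English words to find candidates for.
    Scoring (an assumption, not the paper's method): a weighted mix of the
    Dice coefficient over segments and the average relative-position match.
    """
    en_freq, th_freq = Counter(), Counter()
    cooc = defaultdict(Counter)                        # co-occurrence counts
    pos_sim = defaultdict(lambda: defaultdict(float))  # summed position match

    for en_tokens, th_tokens in pairs:
        # Relative position (0..1) of the first occurrence of each token.
        en_pos, th_pos = {}, {}
        for i, w in enumerate(en_tokens):
            en_pos.setdefault(w, i / max(len(en_tokens) - 1, 1))
        for j, t in enumerate(th_tokens):
            th_pos.setdefault(t, j / max(len(th_tokens) - 1, 1))

        en_freq.update(en_pos.keys())
        th_freq.update(th_pos.keys())
        for w in en_pos:
            if w not in unknown_en:
                continue
            for t in th_pos:
                cooc[w][t] += 1
                pos_sim[w][t] += 1.0 - abs(en_pos[w] - th_pos[t])

    ranked = {}
    for w in unknown_en:
        scored = []
        for t, c in cooc[w].items():
            dice = 2.0 * c / (en_freq[w] + th_freq[t])  # global evidence
            local = pos_sim[w][t] / c                   # local evidence
            score = (1 - position_weight) * dice + position_weight * local
            scored.append((score, t))
        scored.sort(key=lambda s: (-s[0], s[1]))        # best score first
        ranked[w] = [t for _, t in scored[:top_k]]
    return ranked


# Toy data for illustration only; not drawn from the paper's corpus.
pairs = [
    (["cat", "eat", "fish"], ["แมว", "กิน", "ปลา"]),
    (["dog", "eat", "fish"], ["หมา", "กิน", "ปลา"]),
    (["cat", "sleep"], ["แมว", "นอน"]),
]
candidates = rank_candidates(pairs, {"cat", "dog"})
```

Returning a short ranked list rather than a single best match mirrors the abstract's framing, in which a linguist picks the right alignment from a handful of candidates instead of searching the whole text.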

Similar resources

A Framework of 2-step Bilingual Alignment for SMT: in Case Study of Thai-English Translation

This paper presents a framework for a new word alignment process that can be used in SMT development. The method was designed to combine the quality of using a dictionary as prior knowledge with the ability of co-occurrence to fill in unknown words. The alignment method is split into two separate steps: first, a dictionary-based step to guarantee accurate word alignment, and second, co-occu...

An Integrated Tool for Translation-Memory Maintenance

This paper presents an integrated tool to construct and maintain translation memory for memory-based machine translation. The tool aims to automate the construction and validation of translation memory at both the word and phrase levels from English-Thai parallel texts. To align English-Thai words and phrases, the crucial problems that must be resolved include multiple-word-expression boundary am...

Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information

We present a novel segmentation approach for Phrase-Based Statistical Machine Translation (PB-SMT) for languages where word boundaries are not obviously marked, using both monolingual and bilingual information, and demonstrate that (1) an unsegmented corpus can provide nearly identical results compared to a manually segmented corpus in a PB-SMT task when a good heuristic character clustering...

Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...

Constraints on Tone Sensitivity in Novel Word Learning by Monolingual and Bilingual Infants: Tone Properties Are More Influential than Tone Familiarity

This study compared tone sensitivity in monolingual and bilingual infants in a novel word learning task. Tone language learning infants (Experiment 1, Mandarin monolingual; Experiment 2, Mandarin-English bilingual) were tested with Mandarin (native) or Thai (non-native) lexical tone pairs which contrasted static vs. dynamic (high vs. rising) tones or dynamic vs. dynamic (rising vs. falling) ton...

Publication date: 2002