Transliteration Using a Network of Phoneme Chunks

Authors

  • In-Ho Kang
  • Gil-Chang Kim

Abstract

In this paper, we present methods for transliteration and back-transliteration. In Korean technical documents and web documents, many English and Japanese words are transliterated into Korean. These transliterated words are usually technical terms and proper nouns, so they are hard to find in a dictionary, and an automatic transliteration system is therefore needed. Previous transliteration models restrict the information considered for each letter to two or three letters. However, most transliteration phenomena cannot be explained by a single standard rule, especially in Korean: factors such as the origin of a word and the profession of its users shape each transliteration. Restricting the information length may lose the information that discriminates between these transliteration rules. We therefore propose methods that find similar words having the longest overlap with an input word. To find similar words without losing rule-specific information, we use phoneme chunks that have no length limit. An input word is transliterated by merging phoneme chunks. With the proposed method, we obtained 86% character accuracy and 53% word accuracy in an English-to-Korean transliteration test.
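The idea of covering an input word with the longest chunks seen in similar words can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `LEXICON`, its romanized placeholder targets (not real Korean output), and the helper names `build_chunks` and `transliterate` are hypothetical, and the greedy left-to-right cover is a simplification rather than the chunk network and merging procedure of the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One aligned training word: (source letter, target unit) pairs.
# Target units are made-up romanized placeholders (assumption).
Aligned = List[Tuple[str, str]]

LEXICON: List[Aligned] = [
    [("s", "s"), ("c", "k"), ("a", "ae"), ("n", "n")],                       # "scan"
    [("b", "b"), ("a", "ae"), ("n", "n"), ("n", ""), ("e", "eo"), ("r", "")],  # "banner"
]


def build_chunks(lexicon: List[Aligned]) -> Dict[str, Counter]:
    """Collect every contiguous run of aligned pairs, with no length limit."""
    chunks: Dict[str, Counter] = defaultdict(Counter)
    for entry in lexicon:
        for i in range(len(entry)):
            for j in range(i + 1, len(entry) + 1):
                src = "".join(s for s, _ in entry[i:j])
                tgt = "".join(t for _, t in entry[i:j])
                chunks[src][tgt] += 1
    return chunks


def transliterate(word: str, chunks: Dict[str, Counter]) -> List[Tuple[str, str]]:
    """Greedily cover `word` from left to right with the longest known chunk."""
    result: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(word):
        for end in range(len(word), pos, -1):  # try the longest piece first
            piece = word[pos:end]
            if piece in chunks:
                target = chunks[piece].most_common(1)[0][0]
                result.append((piece, target))
                pos = end
                break
        else:
            # No chunk starts with this letter: mark it as unknown.
            result.append((word[pos], "?"))
            pos += 1
    return result


if __name__ == "__main__":
    table = build_chunks(LEXICON)
    pieces = transliterate("scanner", table)
    print(pieces)                             # [('scan', 'skaen'), ('ner', 'eo')]
    print("".join(t for _, t in pieces))      # skaeneo
```

In this toy run, "scanner" is covered by the four-letter chunk from "scan" and the three-letter chunk from "banner". A model limited to two or three letters of information could not reuse the longer chunk as a single unit.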

Similar resources

English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks

In this paper we present a method of English-to-Korean (E-K) transliteration and back-transliteration. In Korean technical documents, many English words are transliterated into Korean words in various forms and in diverse ways. As English words and their Korean transliterations are usually technical terms and proper nouns, it is hard to find a transliteration and its variations in a dictionary. Therefore ...

Phoneme-based Statistical Transliteration of Foreign Names for OOV Problem

Given a source-language term, machine transliteration automatically generates its phonetic equivalents in a target language. It is useful in many cross-language applications. Recently, there has been increasing interest in automatic transliteration, especially between languages with significant distinctions in their phonetic representations, e.g. English and Chinese. Despite many cross-language...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers because of its applications in various fields of speech processing. Recent research shows that using a deep neural network (DNN) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...

Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

Machine transliteration is an automatic method for generating characters or words in one alphabetical system from the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...

Learning Phoneme Mappings for Transliteration without Parallel Data

We present a method for performing machine transliteration without any parallel resources. We frame the transliteration task as a decipherment problem and show that it is possible to learn cross-language phoneme mapping tables using only monolingual resources. We compare various methods and evaluate their accuracies on a standard name transliteration task.

Journal:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 18

Publication date: 2005