Effective Transliteration
Authors
Abstract
Translating texts written in different languages is required in many domains, such as machine translation and cross-lingual information retrieval. The words of a source-language text can be translated efficiently into a target language using a bilingual dictionary, in which every source word has a target-language counterpart. In practice, however, texts often contain out-of-vocabulary (OOV) words that do not appear in the bilingual dictionary. This is particularly common for proper nouns such as company, person, place, and product names. The goal of this research is to explore how to handle such words for Persian and related languages such as Urdu. The problem of OOV words has not previously been studied for Persian. We will investigate novel techniques that exploit special features of this language for transliteration, character alignment, and OOV dictionary construction.
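The abstract describes word-by-word translation through a bilingual dictionary and notes that OOV words, mostly proper nouns, fall through it. A minimal sketch of where that happens follows, assuming simple word-level lookup in a Python dict. The function `translate_words`, the toy English-to-Persian dictionary, and the placeholder transliterator are all hypothetical. The placeholder only marks where a real transliteration model, such as the techniques this research proposes, would plug in.

```python
from typing import Callable, Dict, List, Tuple


def translate_words(
    words: List[str],
    bilingual_dict: Dict[str, str],
    transliterate: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Translate word by word, routing OOV words to a transliteration fallback.

    Returns the output tokens and the list of OOV words encountered.
    """
    output: List[str] = []
    oov: List[str] = []
    for word in words:
        if word in bilingual_dict:
            # In-vocabulary: use the dictionary counterpart directly.
            output.append(bilingual_dict[word])
        else:
            # Out-of-vocabulary (e.g. a company name): no counterpart exists,
            # so hand the word to the transliteration component instead.
            oov.append(word)
            output.append(transliterate(word))
    return output, oov


# Hypothetical toy dictionary (English -> Persian) and placeholder transliterator.
toy_dict = {"company": "شرکت", "office": "دفتر"}


def placeholder_transliterate(word: str) -> str:
    # Stand-in only: a real system would generate target-script characters.
    return f"<translit:{word}>"


translated, oov_words = translate_words(
    ["samsung", "company", "office"], toy_dict, placeholder_transliterate
)
print(translated)  # ['<translit:samsung>', 'شرکت', 'دفتر']
print(oov_words)   # ['samsung']
```

The OOV list collected here is also the natural input for building an OOV dictionary. Each word flagged once can be transliterated, validated, and added so that later lookups succeed.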
Similar resources
Comparative Evaluation of Spanish Segmentation Strategies for Spanish-Chinese Transliteration
This work presents a comparative evaluation of three different Spanish segmentation strategies for Spanish-Chinese transliteration. The transliteration task is implemented by means of Statistical Machine Translation, using Chinese characters and Spanish sub-word segments as the textual units to be translated. Three different Spanish segmentation strategies are evaluated: character-based, syl...
An ensemble of transliteration models for information retrieval
Transliteration is used to phonetically translate proper names and technical terms, especially from languages written in the Roman alphabet into languages with non-Roman scripts, such as from English into Korean, Japanese, and Chinese. Because transliterations are usually representative index terms for documents, proper handling of the transliterations is important for an effective information retrieval system....
A Comparison of Different Machine Transliteration Models
Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models – grapheme-based translit...
A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora
Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, there has been little literature regarding a framework that can accommodate several models at the same time. Moreover, little attention has been paid to validating acquired transliteration pairs using up-to-date corpora, such as web documents. ...
Joint Generation of Transliterations from Multiple Representations
Machine transliteration is often referred to as phonetic translation. We show that transliterations incorporate information from both spelling and pronunciation, and propose an effective model for joint transliteration generation from both representations. We further generalize this model to include transliterations from other languages, and enhance it with reranking and lexicon features. We de...
Reranking with Multiple Features for Better Transliteration
Effective transliteration of proper names via grapheme conversion needs to find transliteration patterns in training data, and then generate optimized candidates for testing samples accordingly. However, top-1 accuracy suffers whenever the correct candidate is generated but not ranked first. To tackle this issue, we propose to rerank the output candidates for a better order usi...
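The reranking idea above rests on a simple point: a correct candidate that is generated but not ranked first still counts as a top-1 error. The sketch below illustrates this with a generic linear reranker, where each candidate is scored by a weighted sum of features. It is not the cited paper's method, whose features are cut off in the summary. The feature names `generator` and `lm`, their values, the weights, and the helpers `rerank` and `top1_accuracy` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: feature name -> feature value for one candidate.
Features = Dict[str, float]


def rerank(candidates: Dict[str, Features], weights: Dict[str, float]) -> List[str]:
    """Order candidate strings by a weighted sum of their features, highest first."""

    def score(features: Features) -> float:
        # Features without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    return sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)


def top1_accuracy(ranked_lists: Sequence[List[str]], references: Sequence[str]) -> float:
    """Fraction of test items whose first-ranked candidate equals the reference."""
    if not references:
        return 0.0
    hits = sum(
        1 for ranked, ref in zip(ranked_lists, references) if ranked and ranked[0] == ref
    )
    return hits / len(references)


# Hypothetical candidates for one source name whose correct transliteration is "Smith".
candidates = {
    "Smit": {"generator": 0.6, "lm": 0.1},
    "Smith": {"generator": 0.5, "lm": 0.9},
    "Esmit": {"generator": 0.4, "lm": 0.05},
}

baseline = rerank(candidates, {"generator": 1.0})             # generator score only
reranked = rerank(candidates, {"generator": 1.0, "lm": 1.0})  # add a second feature
print(baseline, top1_accuracy([baseline], ["Smith"]))  # ['Smit', 'Smith', 'Esmit'] 0.0
print(reranked, top1_accuracy([reranked], ["Smith"]))  # ['Smith', 'Smit', 'Esmit'] 1.0
```

The correct form is among the candidates in both orders. Only the extra feature moves it to the top, which is the gap reranking targets.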