Parallel Text Processing: Alignment of Indonesian to Javanese Language
Abstract
Parallel text alignment is proposed as a way of aligning words in Bahasa Indonesia to words in Javanese. Because a one-to-one word translator cannot translate the pragmatic aspects of Javanese, the parallel text alignment model described here uses phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Although phrase pair combination outperforms the previous algorithm, it is still inefficient: recording all possible combinations consumes more database space and is time-consuming. The original algorithm is therefore modified by applying the edit distance coefficient to improve data-storage efficiency. As a result, data-storage consumption is reduced by 90%, and the learning period is reduced as well (42 s).
Keywords: parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.
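The abstract does not say how the edit distance coefficient is computed or applied, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the coefficient is a normalized Levenshtein similarity (1 minus distance divided by the longer length) and that it is used to discard unlikely phrase pairs before they are stored. The function names, the threshold, and the example sentence pair are all hypothetical.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """ASSUMED coefficient: 1 - distance / longer length, always in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def phrases(tokens):
    """All contiguous phrases of a token list: n * (n + 1) / 2 of them."""
    n = len(tokens)
    return [" ".join(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def candidate_pairs(src_tokens, tgt_tokens, threshold=0.0):
    """Pair every source phrase with every target phrase, keeping only the
    pairs whose assumed coefficient reaches the (hypothetical) threshold."""
    return [(s, t)
            for s, t in product(phrases(src_tokens), phrases(tgt_tokens))
            if similarity(s, t) >= threshold]


# Illustrative sentence pair (Indonesian, Javanese), not taken from the paper.
indonesian = ["saya", "makan", "nasi"]
javanese = ["aku", "mangan", "sega"]

all_pairs = candidate_pairs(indonesian, javanese)         # nothing filtered
kept_pairs = candidate_pairs(indonesian, javanese, 0.5)   # filtered subset
```

With no threshold, a three-word sentence pair already yields 36 phrase pairs, which illustrates why storing every combination is costly. Raising the threshold keeps only pairs that are close in spelling, which is one plausible way such a coefficient could shrink the stored table for two closely related languages.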
Similar resources
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
This paper describes the creation process of an Indonesian-English parallel corpus (IDENTIC). The corpus contains 45,000 sentences collected from different sources in different genres. Several manual text preprocessing tasks, such as alignment and spelling correction, are applied to the corpus to assure its quality. We also apply language specific text processing such as tokenization on both si...
The Reflection of the Javanese Cultural Concepts in the Politeness of Javanese
Every language may have some entities which may not be owned by another language. The uniqueness of a language is strongly influenced by the culture of its native speakers. Therefore, languages vary cross-culturally. I strongly believe that the way the Javanese people (one of the Indonesian ethnic groups) express politeness is also influenced by the Javanese culture. This article tries to exami...
Producing a Cross-Language Dictionary using Statistical Machine Translation: A First Experiment with English and Indonesian
Well-developed Statistical Machine Translation techniques now exist for carrying out word alignment in parallel corpora. A by-product of training data for this task is a set of translation probabilities for the correspondences between target and source tokens. In the literature, these techniques have relied on the use of bitexts of significant size; however, for many languages no such corpora e...
Handling Indonesian Clitics: A Dataset Comparison for an Indonesian-English Statistical Machine Translation System
In this paper, we study the effect of incorporating morphological information on an Indonesian (id) to English (en) Statistical Machine Translation (SMT) system as part of a preprocessing module. The linguistic phenomenon that is being addressed here is Indonesian cliticized words. The approach is to transform the text by separating the correct clitics from a cliticized word to simplify the wor...
The C677T mutation in the methylenetetrahydrofolate reductase gene among the Indonesian Javanese population.
The presence of the C677T mutation in the methylenetetrahydrofolate reductase (MTHFR) gene has been regarded as a genetic risk factor for coronary artery diseases and neural tube defects. Although the prevalence of this mutation has been reported from various ethnic populations, few data concerning Indonesian populations are available. We have investigated the frequency of the mutation in 68 In...