Hybrid Algorithm for Word-Level Alignment of Parallel Texts
Abstract
Given a text in two languages, the word alignment task consists of identifying, in the two versions of the text, the specific word occurrences that are mutual translations. Most existing text alignment systems follow either a linguistic or a statistical approach. We argue that both approaches are insufficient when used separately, and propose a flexible algorithm that combines statistical and linguistic techniques.
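The abstract does not spell out the algorithm, so the following is only a minimal sketch of what combining the two kinds of evidence could look like, not the authors' method. It assumes a Dice co-occurrence score as the statistical signal and a small bilingual lexicon as the linguistic signal; the names `dice_scores`, `align`, `lexicon`, and `lexicon_bonus`, as well as the toy English-German corpus, are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Statistical evidence: Dice coefficient of each co-occurring word pair."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        # Count each word pair once per aligned sentence pair.
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align(src, tgt, scores, lexicon, lexicon_bonus=1.0):
    """Greedy one-to-one alignment over combined scores (illustrative only)."""
    candidates = []
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            score = scores.get((s, t), 0.0)
            # Linguistic evidence: hypothetical bilingual lexicon lookup.
            if t in lexicon.get(s, ()):
                score += lexicon_bonus
            if score > 0:
                candidates.append((score, i, j))
    # Highest score first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_src, used_tgt, links = set(), set(), []
    for _, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


# Toy data, invented for this example.
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "house"], ["ein", "haus"]),
]
lexicon = {"book": {"buch"}}

scores = dice_scores(corpus)
links = align(["the", "book"], ["das", "buch"], scores, lexicon)
print(links)  # pairs of (source index, target index)
```

In this sketch the lexicon can link a word pair that the corpus statistics have never seen, while the statistics cover words missing from the lexicon, which is the kind of complementarity the abstract argues for.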
Similar resources
Building Bilingual Corpus based on Hybrid Approach for Myanmar-English Machine Translation
Word alignment in bilingual corpora has been an active research topic in the Machine Translation research groups. In this paper, we describe an alignment system that aligns English-Myanmar texts at word level in parallel sentences. Essential for building parallel corpora is the alignment of translated segments with source segments. Since word alignment research on Myanmar and English languages ...
A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora
In this paper we describe an alignment system that aligns English-Hindi texts at the sentence and word level in parallel corpora. We describe a simple sentence length approach to sentence alignment and a hybrid, multi-feature approach to perform word alignment. We use regression techniques in order to learn parameters which characterise the relationship between the lengths of two sentences in p...
Aligning Words in English-Hindi Parallel Corpora
In this paper, we describe a word alignment algorithm for English-Hindi parallel data. The system was developed to participate in the shared task on word alignment for languages with scarce resources at the ACL 2005 workshop, on “Building and using parallel texts: data driven machine translation and beyond”. Our word alignment algorithm is based on a hybrid method which performs local word grou...
Identifying Word Translations in Non-Parallel Texts
Common algorithms for sentence and word-alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word cooccurrences in texts of differe...
A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
We present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words, and by combining some simple assumptions about the translation process to capture alignments for low-frequency words. Like most other alignment algorithms, it uses cooccurrence statistics as a basis, but differs in the assumptions it makes about the translation...