Extracting Invertible Translations from Pre-aligned Texts

Author

  • Michael Carl
Abstract

This paper presents an approach to extract invertible translations from pre-aligned bilingual texts. The extracted set of invertible translations is unambiguous because each string occurs only once on either language side. Two variants of the algorithm are presented, using different knowledge resources. The knowledge-rich variant of the algorithm makes use of a bilingual lexicon in addition to a morphological analyser and a shallow syntax formalism, which are similarly used in the knowledge-poor algorithm. It is shown that the knowledge-rich method yields better results than the knowledge-poor method.
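The invertibility property stated in the abstract (each string occurs only once on either language side) can be illustrated with a minimal sketch. This is not the paper's extraction algorithm, which relies on a lexicon, a morphological analyser and a shallow syntax formalism; it only shows the one-to-one condition applied as a filter. The function names `invertible_subset` and `is_invertible` and the sample English-German pairs are hypothetical, chosen for illustration.

```python
from collections import Counter


def invertible_subset(pairs):
    """Keep only the pairs whose source string and target string are each
    unique across the candidate set (illustrative filter, an assumption,
    not the method described in the paper)."""
    unique_pairs = set(pairs)  # identical pairs count once
    src_counts = Counter(src for src, _ in unique_pairs)
    tgt_counts = Counter(tgt for _, tgt in unique_pairs)
    return sorted(
        (src, tgt)
        for src, tgt in unique_pairs
        if src_counts[src] == 1 and tgt_counts[tgt] == 1
    )


def is_invertible(pairs):
    """True if no source string and no target string occurs more than once,
    so the set can be used for lookup in both translation directions."""
    sources = [src for src, _ in pairs]
    targets = [tgt for _, tgt in pairs]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


# Hypothetical candidate pairs, invented for this example.
candidates = [
    ("house", "Haus"),
    ("home", "Haus"),   # "Haus" now has two sources
    ("dog", "Hund"),
    ("bank", "Bank"),
    ("bank", "Ufer"),   # "bank" now has two targets
    ("dog", "Hund"),    # exact duplicate of an earlier pair
]

kept = invertible_subset(candidates)
```

Dropping every ambiguous pair is the simplest way to enforce the condition; a real system would more plausibly use additional knowledge to decide which of the competing pairs to keep.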


Similar resources

Inducing probabilistic invertible translation grammars from aligned texts

This paper presents an algorithm for extracting invertible probabilistic translation grammars from bilingual aligned and linguistically bracketed text. The invertibility condition requires all translation ambiguities to be resolved in the final translation grammar. The paper examines the complexity of inducing translation grammars and proposes a number of heuristics to reduce the theoretical...


ParaConc: Concordance Software for Multilingual Parallel Corpora

Parallel concordance software provides a general purpose tool that permits a wide range of investigations of translated texts, from the analysis of bilingual terminology and phraseology to the study of alternative translations of a single text. This paper outlines the main features of a Windows concordancer, ParaConc, focussing on alignment of parallel (translated) texts, general search procedu...


An Approach to Acquire Word Translations from Non-parallel Texts

Few approaches to extract word translations from non-parallel texts have been proposed so far. Researchers have not been encouraged to work on this topic because extracting information from non-parallel corpora is a difficult task producing poor results. Whereas for parallel texts, word translation extraction can reach about 99%, the accuracy for non-parallel texts has been around 72% up to now...


Translation as Annotation

In this paper we illustrate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived...


Mining Parenthetical Translations for Polish-English Lexica

Documents written in languages other than English sometimes include parenthetical English translations, usually for technical and scientific terminology. Techniques had been developed for extracting such translations (as well as transliterations) from large Chinese text corpora. This paper presents methods for mining parenthetical translation in Polish texts. The main difference between translati...



Publication date: 2010