Machine Translation between Language Stages: Extracting Historical Grammar from a Parallel Diachronic Corpus of Polish

نویسنده

  • Amir Zeldes
چکیده

This paper explores methods for the extrapolation of correspondences in a small parallel diachronic corpus taken from the Modern and Middle Polish Bible, in an attempt to answer the question “can historical grammar and lexica be derived directly from a corpus?” The problem of extracting this data is approached from a machine translation point of view: by envisioning texts from different periods as language models for their respective language stages, and historical grammar as a translation model mapping one language stage onto another. This notion is explored using automatic extraction of morphological, lexical and syntactic correspondences.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model

In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted which may be composed of various incorrect rules. The larger rule table incurs more disk and memory resources, and sometimes results in lower translation quality. To resolve the problems, we...

متن کامل

An Alignment Based Technique for Text Translation between Traditional Chinese and Simplified Chinese

Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mapping from the parallel corpus. During building our system and data collection, we observe that there are three types of translation approaches can be used....

متن کامل

Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2014

This research explores effects of various training settings between Polish and English Statistical Machine Translation systems for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2014 evaluation campaign were used as the basis for training of language models, and for development, tuning and testing of the translation system as well as Wikipedia based comparable ...

متن کامل

Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2013

This research explores the effects of various training settings from Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2013 evaluation campaign were used as the basis for training of language models, and for development, tuning and testing of the translation system. The BLEU, NIST, METEOR and TER metrics...

متن کامل

Polish to English Statistical Machine Translation

This research explores the effects of various training settings on a Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED, Europarl, and OPUS parallel text corpora were used as the basis for training of language models, for development, tuning and testing of the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007