Machine Translation between Language Stages: Extracting Historical Grammar from a Parallel Diachronic Corpus of Polish
Abstract
This paper explores methods for the extrapolation of correspondences in a small parallel diachronic corpus taken from the Modern and Middle Polish Bible, in an attempt to answer the question “can historical grammar and lexica be derived directly from a corpus?” The problem of extracting this data is approached from a machine translation point of view: by envisioning texts from different periods as language models for their respective language stages, and historical grammar as a translation model mapping one language stage onto another. This notion is explored using automatic extraction of morphological, lexical and syntactic correspondences.
Similar resources
Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model
In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted which may be composed of various incorrect rules. The larger rule table incurs more disk and memory resources, and sometimes results in lower translation quality. To resolve the problems, we...
An Alignment Based Technique for Text Translation between Traditional Chinese and Simplified Chinese
Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mappings from the parallel corpus. While building our system and collecting data, we observed that three types of translation approaches can be used....
Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2014
This research explores effects of various training settings between Polish and English Statistical Machine Translation systems for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2014 evaluation campaign were used as the basis for training of language models, and for development, tuning and testing of the translation system as well as Wikipedia based comparable ...
Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2013
This research explores the effects of various training settings from Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2013 evaluation campaign were used as the basis for training of language models, and for development, tuning and testing of the translation system. The BLEU, NIST, METEOR and TER metrics...
Polish to English Statistical Machine Translation
This research explores the effects of various training settings on a Polish to English Statistical Machine Translation system for spoken language. Various elements of the TED, Europarl, and OPUS parallel text corpora were used as the basis for training of language models, for development, tuning and testing of the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate ...