Verbal Valency in the MT Between Related Languages
Abstract
The paper analyzes differences in verbal valency frames between two related Slavic languages, Czech and Russian, and their role in a machine translation system. Such valency differences are a frequent source of translation errors. The results show that the number of substantially different valency frames is relatively low, and that a bilingual valency dictionary containing only the differing frames can be used in an MT system to achieve high precision in translating verbal valency.
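The idea of a dictionary that lists only the differing frames can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `DIFFERING_FRAMES` and `transfer_frame`, the slot labels, and the single entry (Czech "děkovat" with a dative addressee versus Russian "благодарить" with an accusative one) are not taken from the paper, and the paper's actual data format and MT pipeline are not described in this abstract.

```python
from typing import Dict, Tuple

# Hypothetical frame representation: argument slot -> surface case.
Frame = Dict[str, str]

# Hypothetical exceptions dictionary. Only verb pairs whose frames differ
# are listed; each entry gives the target-side case for the slots that change.
DIFFERING_FRAMES: Dict[Tuple[str, str], Frame] = {
    # Illustrative entry, not from the paper:
    # Czech "děkovat" + dative vs. Russian "благодарить" + accusative.
    ("děkovat", "благодарить"): {"addressee": "acc"},
}


def transfer_frame(cs_verb: str, ru_verb: str, cs_frame: Frame) -> Frame:
    """Map a Czech frame to a Russian one.

    Verb pairs absent from the dictionary are assumed to share the same
    frame, so their cases are copied unchanged.
    """
    overrides = DIFFERING_FRAMES.get((cs_verb, ru_verb), {})
    return {slot: overrides.get(slot, case) for slot, case in cs_frame.items()}


# A listed pair has its differing slot rewritten.
thank = transfer_frame("děkovat", "благодарить",
                       {"actor": "nom", "addressee": "dat"})

# An unlisted pair falls back to the identical frame.
read = transfer_frame("číst", "читать",
                      {"actor": "nom", "patient": "acc"})
```

Because unlisted pairs default to an identical frame, the dictionary stays small when, as the abstract reports, substantially different frames are relatively rare.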
Similar resources
Incorporation of a Valency Lexicon into a TectoMT Pipeline
In this paper, we focus on the incorporation of a valency lexicon into the TectoMT system for the Czech-Russian language pair. We demonstrate valency errors in MT output and describe how the introduction of a lexicon influenced the translation results. Though there was no impact on the BLEU score, manual inspection of concrete cases showed some improvement.
Verb Argument Pairing in Czech-English Parallel Treebank
We describe CzEngVallex, a bilingual Czech-English valency lexicon which aligns verbal valency frames and their arguments. It is based on a parallel Czech-English corpus, the Prague Czech-English Dependency Treebank, where for each occurrence of a verb a reference to the underlying Czech and English valency lexicons is explicitly recorded. The CzEngVallex lexicon pairs the entries (verb senses) of ...
CzEngVallex: Mapping Valency between Languages
This report presents a guideline for building a resource connected with the project of interlinking Czech and English verbal translational equivalents, based on a parallel, richly annotated dependency treebank that also contains valency and semantic roles, namely the parallel Prague Czech-English Dependency Treebank. One of the main aims of this project is to create a high-quality and relatively la...
CzEngVallex: a Bilingual Czech-English Valency Lexicon
This paper introduces a new bilingual Czech-English verbal valency lexicon (called CzEngVallex) representing a relatively large empirical database. It includes 20,835 aligned valency frame pairs (i.e., verb senses which are translations of each other) and their aligned arguments. This new lexicon uses data from the Prague Czech-English Dependency Treebank and also takes advantage of the existin...
The need for MT-oriented versions of Case and Valency in MT
This paper looks at the use of the linguistic models of Case and Valency in machine translation systems. It is argued that neither of these models was originally developed with this use in mind, and both must be adapted somewhat to meet this purpose. In particular, the traditional Valency distinction of complements and adjuncts leads to conflicts when valency frames in different languages are c...