Encoding a parallel corpus: The TRIS corpus experience

نویسندگان

چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

متن کامل

Encoding a Parallel Corpus for Automatic Terminology Extraction

We present a status report about an ongoing research project in the field of (semi-)automatic terminology acquisition at the European Academy Bolzano. The main focus will be on encoding a text corpus, which serves as a basis for applying term extraction programq.

متن کامل

The SAWA Corpus: A Parallel Corpus English - Swahili

Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the sour...

متن کامل

TweetMT: A Parallel Microblog Corpus

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of m...

متن کامل

A Parallel Corpus of Translationese

We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research of translationese and its applications to (human and machine) tr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Bergen Language and Linguistics Studies

سال: 2013

ISSN: 1892-2449

DOI: 10.15845/bells.v3i1.362