Persian parallel corpus

A Parallel Corpus of Translationese

2016

Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn

We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) tr...

Lithuanian-Latvian-Lithuanian Parallel Corpus

2012

Andrius Utka, Kristine Levane-Petrova, Agne Bielinskiene, Jolanta Kovalevskaite, Erika Rimkute, Daira Vevere

The goal of the paper is to present different problems related to building a parallel corpus for two small languages, Latvian and Lithuanian. The Lithuanian-Latvian-Lithuanian Parallel Corpus (LILA) will contain 8 million running words, will be bidirectional, and will be aligned at the sentence level. The problems include identifying, acquiring, preparing, and aligning parallel texts.
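The LILA abstract states only that the corpus will be aligned at the sentence level; it does not name the aligner. As a hedged illustration of what such alignment involves (not the LILA team's procedure), here is a minimal sketch of length-based alignment in the style of Gale and Church (1993): each sentence is reduced to its character length, and dynamic programming picks the cheapest sequence of 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 "beads". The function names and the constants MEAN_RATIO and VARIANCE are illustrative assumptions.

```python
import math

# Prior probability of each alignment "bead" (source count, target count).
# Values as reported by Gale & Church (1993); treat them as tunable defaults.
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # assumed: expected target characters per source character
VARIANCE = 6.8    # assumed: variance of that ratio per character


def bead_cost(src_len, tgt_len, bead):
    """Negative log-probability of one bead, given its total character lengths."""
    # Using the mean of both lengths in the denominator (as some implementations
    # do) keeps 0-1 beads well defined.
    mean = (src_len + tgt_len / MEAN_RATIO) / 2
    delta = 0.0 if mean == 0 else (src_len * MEAN_RATIO - tgt_len) / math.sqrt(mean * VARIANCE)
    # Two-tailed probability of a deviation at least this large under N(0, 1):
    # 2 * (1 - Phi(|delta|)) == erfc(|delta| / sqrt(2)).
    tail = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(BEAD_PRIORS[bead]) - math.log(tail)


def align_sentences(src_lens, tgt_lens):
    """Return a minimum-cost list of (source indices, target indices) beads."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of a cell is finalized first.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEAD_PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj)
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end of both texts to the start.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


print(align_sentences([20, 30, 25], [22, 31, 24]))
# → [([0], [0]), ([1], [1]), ([2], [2])]
```

In real use the lengths come from sentence-split source and target texts, and MEAN_RATIO should be estimated from data for the language pair rather than assumed.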

Multilingwis – Explore Your Parallel Corpus

2017

Johannes Graën, Dominique Sandoz, Martin Volk

We present Multilingwis2, a web-based search engine for exploring word-aligned parallel and multiparallel corpora. Our application extends the search facilities of Clematide et al. (2016) and is designed to be easily deployable on any parallel corpus annotated with universal part-of-speech tags, lemmas, and word alignments. In addition to corpus exploration, it has proven useful for the assessm...
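Multilingwis2 is described as searching corpora annotated with lemmas and word alignments. The following minimal sketch, which is not Multilingwis2's implementation, shows the core lookup such a tool builds on: for a source lemma, collect the target lemmas aligned to each occurrence and count them. The corpus layout (lemma lists plus a set of (source position, target position) links), the function name aligned_translations and the toy English–German data are hypothetical.

```python
from collections import Counter


def aligned_translations(corpus, source_lemma):
    """Count the target-side lemma strings aligned to each occurrence of source_lemma.

    corpus: iterable of (source_lemmas, target_lemmas, links), where links is a
    set of (source_position, target_position) word-alignment pairs.
    """
    counts = Counter()
    for src_lemmas, tgt_lemmas, links in corpus:
        for i, lemma in enumerate(src_lemmas):
            if lemma != source_lemma:
                continue
            # Gather every target position linked to source position i, in sentence
            # order, so that one-to-many alignments read as a multiword translation.
            positions = sorted(t for s, t in links if s == i)
            if positions:
                counts[" ".join(tgt_lemmas[t] for t in positions)] += 1
    return counts


# Hypothetical toy corpus: English lemmas, German lemmas, alignment links.
corpus = [
    (["the", "house", "be", "old"], ["das", "Haus", "sein", "alt"],
     {(0, 0), (1, 1), (2, 2), (3, 3)}),
    (["a", "big", "house"], ["ein", "groß", "Haus"], {(0, 0), (1, 1), (2, 2)}),
    (["the", "house", "burn"], ["das", "Gebäude", "brennen"], {(0, 0), (1, 1), (2, 2)}),
]
print(aligned_translations(corpus, "house"))
# → Counter({'Haus': 2, 'Gebäude': 1})
```

Filtering on part-of-speech tags, which the abstract also mentions, would be one more condition on each source token.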

TweetMT: A Parallel Microblog Corpus

2016

Iñaki San Vicente, Iñaki Alegria, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martínez Garcia, Antonio Toral, Arkaitz Zubiaga, Nora Aranberri

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of m...

Parallel Corpus Development at NVTC

2010

Jocelyn Phillips, Carol Van Ess-Dykema, Timothy Allison, Laurie Gerber

In this paper, we describe the methods used to develop an exchangeable translation memory bank of sentence-aligned Mandarin Chinese–English sentences. This work is part of a larger effort, initiated by the National Virtual Translation Center (NVTC), to foster collaboration and sharing of translation memory banks across the Intelligence Community and the Department of Defense. In this paper, w...

Bulgarian X-language Parallel Corpus

2012

Svetla Koeva, Ivelina Stoyanova, Rositsa Dekova, Borislav Rizov, Angel Genov

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...

ASPEC: Asian Scientific Paper Excerpt Corpus

2016

Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara

In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large parallel corpus in the scientific-paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific pape...

Chinese-English Parallel Corpus Construction and its Application

2004

Baobao Chang

Chinese-English parallel corpora are key resources for Chinese-English cross-language information processing, Chinese-English bilingual lexicography, and Chinese-English language research and teaching. However, no large-scale Chinese-English corpus is available yet, given the difficulty and the intensive labour involved. In this paper, our work towards building a large-scale Chinese-Engli...

Translation of Cultural Elements from Persian into French: Analyzing the Translation of Cultural Elements in “Guest of Mom” by Houshang Moradi Kermani

Journal: Language and Translation Studies (مطالعات زبان و ترجمه)

مرضیه اطهاری نیک عزم، مینا بلوکات

In this article, we analyze the translation of cultural elements from Persian into French, drawing on a corpus that directly addresses Iranian culture: “Guest of Mom” by Houshang Moradi Kermani, translated by Maribel Bahia. We chose this corpus because it contains many instances of humanitarian love and gentleness as reflected in the portrayal of Iranian char...

A Probabilistic Translation Method for Dictionary-based Cross-lingual Information Retrieval in Agglutinative Languages

Journal: CoRR, 2014

Javid Dadashkarimi, Azadeh Shakery, Heshaam Faili

Translation ambiguity, out-of-vocabulary words, and missing translations in bilingual dictionaries make dictionary-based cross-language information retrieval (CLIR) a challenging task. Moreover, in agglutinative languages that lack reliable stemmers, the absence of various lexical forms from bilingual dictionaries degrades CLIR performance. This paper aims to introduce a probabilistic tr...
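The abstract is cut off before it describes the method, so what follows is not Dadashkarimi et al.'s technique. It is a minimal sketch of the baseline setting it addresses: dictionary-based query translation in which each translation of an ambiguous term receives a probability (here uniformly, an assumption) and out-of-vocabulary terms pass through untranslated. The names translate_query and score, the toy Persian–English dictionary, and the expected-term-frequency scoring are all hypothetical, and no stemming of agglutinative forms is attempted.

```python
from collections import Counter


def translate_query(query_terms, bilingual_dict):
    """Turn source-language query terms into distributions over target terms.

    Assumption: without corpus statistics, every dictionary translation of a term
    is equally likely; out-of-vocabulary terms (e.g. names) pass through unchanged.
    """
    translated = {}
    for term in query_terms:
        # dict.fromkeys removes duplicate candidates while keeping their order.
        candidates = list(dict.fromkeys(bilingual_dict.get(term, [])))
        if candidates:
            p = 1.0 / len(candidates)
            translated[term] = {c: p for c in candidates}
        else:
            translated[term] = {term: 1.0}
    return translated


def score(doc_tokens, translated_query):
    """Expected number of matching terms: sum over s, t of p(t | s) * tf(t, doc)."""
    tf = Counter(doc_tokens)
    return sum(p * tf[t] for dist in translated_query.values() for t, p in dist.items())


# Hypothetical dictionary: "شیر" is ambiguous (lion / milk / faucet).
bilingual_dict = {"شیر": ["lion", "milk", "faucet"], "کتاب": ["book"]}
query = translate_query(["شیر", "کتاب", "Tehran"], bilingual_dict)
print(score(["milk", "book", "book", "Tehran"], query))
```

Because "شیر" has three equally weighted translations, a document mentioning "milk" earns only one third of a match for it; a better probabilistic model would shift that mass toward the translation that fits the rest of the query.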