Persian parallel corpus

MIZAN: A Large Persian-English Parallel Corpus

Journal: CoRR 2018

Omid Kashefi

Machine translation, one of the most essential tasks in natural language processing, is now highly dependent on multilingual parallel corpora. In this paper, we introduce the largest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and expe...

Strategies Used in the Translation of Interlingual Subtitling

Journal: Journal of English Studies 2011

Farid Ghaemi, Janin Benyamin

This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency. Unlike in many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...

PEN: Parallel English-Persian News Corpus

2011

Mohammad Amin Farajian

Parallel corpora are necessary resources in many multilingual natural language processing applications, including machine translation and cross-lingual information retrieval. Manual preparation of a large-scale parallel corpus is a very time-consuming and costly procedure. In this paper, the work towards building a sentence-level aligned English-Persian corpus in a semi-automated manner is p...

Comparing k-means clusters on parallel Persian-English corpus

Journal: Journal of Artificial Intelligence and Data Mining 2015

A. Khazaei, M. Ghasemzadeh

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. The question now arises: do those results extend to other languages? Since the goal of document clustering is grouping of docum...
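As a rough illustration of what comparing clusters of aligned texts can involve: when each Persian document is aligned with an English one, the two clusterings label the same items, so their agreement can be scored pairwise. The sketch below uses the Rand index, which is an assumption made here for illustration; the truncated abstract does not say which measure the authors used, and the function name `rand_index` and the sample labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items to compare")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for four aligned document pairs.
persian_clusters = [0, 0, 1, 1]
english_clusters = [1, 1, 0, 0]  # same grouping, different label names
print(rand_index(persian_clusters, english_clusters))
```

Because the score depends only on which items are grouped together, it is unaffected by how each k-means run happens to number its clusters.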

Comparing Translated and Original Texts: Testing the Simplification Hypothesis in the Translation of Comparable Technical Texts

Thesis: Ministry of Science, Research and Technology - Sheikhbahaee University - Faculty of Foreign Languages 1391 (Solar Hijri)

نرجس طبیبی, محمدرضا طالبی نژاد

Simplification, as a universal feature of translation, means that translated texts tend to use simpler language than original texts in the same language. It can be critically investigated through three common measures: type/token ratio, lexical density, and mean sentence length. Although steps have been taken to test this hypothesis in various text types in different linguistic communities, in ...
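The three measures named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's procedure: the regex tokenizer, the punctuation-based sentence splitter, and the tiny `FUNCTION_WORDS` list are all hypothetical stand-ins (lexical density is normally computed from part-of-speech tags).

```python
import re

# Hypothetical, deliberately small function-word list (an assumption);
# a real study would identify content words with a POS tagger.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or",
    "is", "are", "was", "were", "it", "that", "this",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive split on ., ! and ?; empty fragments are dropped."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def simplification_measures(text):
    tokens = tokenize(text)
    sentences = split_sentences(text)
    if not tokens or not sentences:
        raise ValueError("text must contain at least one word")
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        # distinct word forms divided by total words
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # share of tokens that are content (non-function) words
        "lexical_density": len(content_words) / len(tokens),
        # average number of word tokens per sentence
        "mean_sentence_length": len(tokens) / len(sentences),
    }


print(simplification_measures("The cat sat on the mat. The dog barked."))
```

Under the simplification hypothesis, translated texts would be expected to score lower on all three measures than comparable original texts. Type/token ratio is sensitive to text length, so comparisons are usually made on samples of equal size.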

Feasibility of Automatically Bootstrapping a Persian WordNet

2010

Chris Irwin Davis, Dan I. Moldovan

In this paper we describe a proof-of-concept for the bootstrapping of a Persian WordNet. This effort was motivated by previous work done at Stanford University on bootstrapping an Arabic WordNet using a parallel corpus and an English WordNet. The principle of that work is based on the premise that paradigmatic relations are by nature deeply semantic, and as such, are likely to remain intact bet...

A Probabilistic Approach to Persian Ezafe Recognition

2014

Habibollah Asghari, Jalal Maleki, Heshaam Faili

In this paper, we investigate the problem of Ezafe recognition in the Persian language. Ezafe is an unstressed vowel that is usually not written, but is intelligently recognized and pronounced by humans. The Ezafe marker can appear in noun phrases, adjective phrases and some prepositional phrases, linking the head and its modifiers. Ezafe recognition in Persian is indeed a homograph disambiguation probl...

Extracting an English-Persian Parallel Corpus from Comparable Corpora

Journal: CoRR 2017

Akbar Karimi, Ebrahim Ansari, Bahram Sadeghi Bigham

Parallel data are an important part of a reliable Statistical Machine Translation (SMT) system. The more of these data are available, the better the quality of the SMT system. However, for some language pairs such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from English and Persian document aligned...
