Persian parallel corpus

SentiPers: A Sentiment Analysis Corpus for Persian

Journal: CoRR 2018

Pedram Hosseini Ali Ahmadian Ramaki Hassan Maleki Mansoureh Anvari Seyed Abolghasem Mirroshandel

Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics, and information retrieval. Interest in SA has grown steadily in both academia and industry in recent years. Moreover, there is an increasing need for appropriate resources and datasets, particularly for low-resource languages such as Persian. These datasets pla...


Automatic Term Extraction for Cross-Language Information Retrieval Using a Bilingual Parallel Corpus

2008

Tayebeh Mosavi Miangah

Information retrieval is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval is a kind of information retrieval in which the language of the query differs from that of the searched documents. This paper tries to construct a bilingual lexicon from an English...


Improving Persian-English Statistical Machine Translation: Experiments in Domain Adaptation

2011

Mahsa Mohaghegh Abdolhossein Sarrafzadeh

This paper documents recent work carried out for PeEn-SMT, our Statistical Machine Translation system for translation between the English-Persian language pair. We give details of our previous SMT system, and present our current development of significantly larger corpora. We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment, corpus cont...


Happiness Conceptual Metaphors in Persian: A Cognitive Corpus-Driven Approach

Journal: زبان شناسی و گویش های خراسان (Linguistics and Dialects of Khorasan)

Mahdis Zoorvarz Azita Afrashi Seyed Mostafa Assi

This study investigates the conceptual metaphors of happiness in a representative corpus of modern Persian. Using the Persian Linguistic Database, we sampled a corpus of contemporary written texts to represent modern colloquial Persian, then extracted the relevant conceptual metaphors of happiness. The sample corpus contains 14 texts written by contemporary Iranian writers. Analy...


Creation of comparable corpora for English-Urdu, Arabic, Persian

2016

Murad Abouammoh Kashif Shah Ahmet Aker

Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, for under-resourced languages or some specific domains, parallel corpora are not readily available. This leads to under-performing machine translation systems in those sparse-data settings. To overcome the low availability of parallel resources, the machine translation community has rec...


Unsupervised Discovery of Persian Morphemes

2006

Mohsen Arabsorkhi Mehrnoush Shamsfard

This paper reports current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of the Persian language through automatic analysis of corpora. We used a Minimum Description Length (MDL) based algorithm with some improvements and applied it to a Persian corpus. Our improvements include enhancing the cost function usin...


A’laam Corpus: A Standard Corpus of Named Entity for Persian Language

Journal: Signal and Data Processing 2017


Corpus Hippocraticum 'on the sacred disease'.

Journal: Bulletin of the Indian Institute of History of Medicine 1998

S K Majumdar

The Hippocratic Corpus was attributed to all branches of healing, including internal medicine, surgery, and obstetrics. The Hippocratic collection of treatises (or corpus) was mostly written between 430 and 330 B.C., and some are later works. Some 600 years after Hippocrates, the Corpus was further systematized by Galen and later still by the Persian Islamic physician Avicenna and others. The Co...


A Morphological Lexicon for the Persian Language

2010

Benoît Sagot Géraldine Walther

We introduce PerLex, a large-coverage and freely available morphological lexicon for the Persian language. We describe the main features of Persian morphology and the way we have represented it within the Alexina formalism, on which PerLex is based. We focus on the methodology we used for constructing lexical entries from various sources, as well as the problems related to typographic norm...


The First Corpus-Based Persian Academic Word List: Development and Pedagogical Implications

Journal: پژوهشنامه آموزش زبان فارسی به غیر فارسی زبانان (Journal of Teaching Persian to Speakers of Other Languages)

Reza Rezvani (Assistant Professor of English Language Teaching, Yasouj University); Abbas Gholtash (Assistant Professor of Educational Sciences, Marvdasht Branch, Islamic Azad University, Marvdasht); Gerannaz Zamani (PhD student in English Language Teaching, Razi University)

This article presents the first Persian Academic Word List (PAWL), which comprises the most frequently used academic vocabulary in Persian academic texts. The PAWL was compiled from a corpus of 927,008 running words from academic resources. Two principles, range and frequency of word families, guided the selection and arrangement of the word list. The corpus included seven books and one hundre...
