Search results for: persian parallel corpus
Number of results: 300662
Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics and information retrieval. Interest in SA has grown steadily in both academia and industry in recent years. Moreover, there is an increasing need for generating appropriate resources and datasets, in particular for low-resource languages including Persian. These datasets pla...
Automatic Term Extraction for Cross-Language Information Retrieval Using a Bilingual Parallel Corpus
Information retrieval is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval refers to a kind of information retrieval in which the language of the query differs from that of the searched documents. This paper tries to construct a bilingual lexicon from an English...
This paper documents recent work carried out for PeEn-SMT, our Statistical Machine Translation system for translation between the English-Persian language pair. We give details of our previous SMT system, and present our current development of significantly larger corpora. We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment, corpus cont...
This study investigates the conceptual metaphors of happiness in a representative corpus of modern Persian. Making use of the Persian Linguistic Database, we sampled a corpus of contemporary written texts to represent modern colloquial Persian; then we tried to extract the relevant conceptual metaphors of happiness. The sample corpus contains 14 texts written by contemporary Iranian writers. Analy...
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, in the case of under-resourced languages or some specific domains, parallel corpora are not readily available. This leads to under-performing machine translation systems in those sparse data settings. To overcome the low availability of parallel resources the machine translation community has rec...
This paper reports the current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of the Persian language through automatic analysis of corpora. We utilized a Minimum Description Length (MDL) based algorithm with some improvements and applied it to a Persian corpus. Our improvements include enhancing the cost function usin...
The Hippocratic Corpus was attributed to all branches of healing including internal medicine, surgery, and obstetrics. The Hippocratic collection of treatises (or corpus) was mostly written between 430 and 330 B.C., and some are later works. Some 600 years after Hippocrates, the Corpus was further systematized by Galen and later still by the Persian Islamic physician Avicenna and others. The Co...
We introduce PerLex, a large-coverage and freely-available morphological lexicon for the Persian language. We describe the main features of the Persian morphology, and the way we have represented it within the Alexina formalism, on which PerLex is based. We focus on the methodology we used for constructing lexical entries from various sources, as well as the problems related to typographic norm...
This article generates the first Persian Academic Word List (PAWL), which comprises the most frequently used academic vocabulary in Persian academic texts. The PAWL was compiled from a corpus of 927,008 running words from academic resources. Two principles, range and frequency of word families, guided the selection and arrangement of the word list. The corpus included seven books and one hundre...