Search results for: persian parallel corpus

Number of results: 300662

Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. As interest in this field of research grows, the need for appropriate data sources has become apparent. Most research on developing question answering corpora has so far been done in English, but in other languages, such as Persian, the lack of these co...

Journal: CoRR 2014
Maryam Mahmoodi, Mohammad Mahmoodi Varnamkhasti

Currently there are many plagiarism detection approaches, but few of them have been implemented and adapted for the Persian language. In this paper, we describe our work on designing and implementing a plagiarism detection system based on preprocessing and NLP techniques, and we present the results of testing it on a corpus. Keywords: External Plagiarism, Plagiarism, Copy detection, natural ...

Journal: TinyToCS 2015
Shervin Malmasi

Although widely studied in recent years, Language Identification (LID) systems for determining the language of input texts often fail to discriminate between similar languages such as Croatian and Serbian or Malay and Indonesian. This has drawn attention to the task of discriminating between similar languages, varieties and dialects, including a recent shared task [3]. Persian (also known as Farsi) and Dari (...

2012
Yadollah Yaghoobzadeh, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Mahbaneh Eshaghzadeh Torbati

Recognizing TimeML events and identifying their attributes are important tasks in natural language processing (NLP). Several NLP applications, such as question answering, information retrieval, summarization, and temporal information extraction, need some knowledge about the events in the input documents. Existing methods developed for this task are restricted to a limited number of languages, an...

2007
Nick Pendar, Serge Sharoff

This paper reports on the compilation of a large Persian Web corpus and the cyclic supervised development of a lexicon and lemmatizer. We discuss the strategies adopted in compiling the corpus as well as some of the challenges in processing and tokenizing it. We also present the word patterns developed for the lemmatizer and the algorithms designed for the supervised lexical acquisition.

2016
Morteza Rezaei Sharifabadi, Seyed Ahmad Eftekhari

In this paper we introduce Mahak Samim, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded. This corpus, which can be used for evaluating plagiarism detection systems, consists of more than five thousand artificial plagiarism cases of various lengths and with diverse degrees of obfuscation. The development process and the features of the co...

2009
Amir Hossein Jadidinejad, Fariborz Mahmoudi, Jon Dehdari

Persian is a challenging language in the field of NLP. Right-to-left orthography, complex morphology, complicated grammatical rules, and different forms of letters make it an interesting language for NLP research. In this paper we measure the effectiveness of a simple and efficient stemming algorithm, Perstem, on Persian information retrieval. Our experiments on the Hamshahri corpus at CLEF 2009 ...

2017
Niloofar Ranjbar, Fatemeh Mashhadirajab, Mehrnoush Shamsfard, Rayeheh Hosseini pour, Aryan Vahid pour

In this paper, we describe our proposed method for measuring the semantic similarity of a given pair of words in the SemEval-2017 monolingual semantic word similarity task. We use a combination of knowledge-based and corpus-based techniques: FarsNet, the Persian WordNet, along with deep learning techniques to extract the similarity of words. We evaluated our proposed approach on Persian (Farsi) tes...

Journal: CoRR 2017
Ebrahim Ansari, Mohammad Hadi Sadreddini, Lucio Grandinetti, Mehdi Sheikhalishahi

Ebrahim Ansari et al. 2017. Extracting a bilingual Persian-Italian lexicon from comparable corpora using different types of seed dictionaries. In "Applications of Comparable Corpora" (edited book), Berlin Linguistic Press (ed.). Bilingual dictionaries are very important in various fields of natural language processing. In recent years, research on extracting new bilingual lex...

Journal: Research in Computing Science 2014
Mohammad Iman Jamnejad, Ali Heidarzadegan, Mohsen Meshki

A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. This paper proposes an innovative approach to improving classification performance on Persian texts by using a very large thesaurus. The paper proposes a flexible method...
