Search results for: arabic text classification

Number of results: 727070

2007
Dominique Estival Tanja Gaustad Son Bao Pham Will Radford Ben Hutchinson

This paper reports on the application of the Text Attribution Tool (TAT) to profiling the authors of Arabic emails. The TAT system has been developed for the purpose of language-independent author profiling and has now been trained on two email corpora, English and Arabic. We describe the overall TAT system and the Machine Learning experiments resulting in classifiers for the different author t...

2015
Wajdi Zaghouani Nizar Habash Houda Bouamor Alla Rozovskaya Behrang Mohit Abeer Heider Kemal Oflazer

We present our correction annotation guidelines to create a manually corrected non-native (L2) Arabic corpus. We develop our approach by extending an L1 large-scale Arabic corpus and its manual corrections to include manually corrected non-native Arabic learner essays. Our overarching goal is to use the annotated corpus to develop components for automatic detection and correction of language er...

2009
Dina A. Said Nayer M. Wanas Nevin M. Darwish Nadia H. Hegazy

Text preprocessing is an essential stage in text categorization (TC) in particular and text mining in general. Morphological tools can be used in text preprocessing to reduce multiple forms of a word to one form. There has been a debate among researchers about the benefits of using morphological tools in TC. Studies on the English language showed that performing stemming during the preproc...

2012
Fouad Slimane Slim Kanoun Jean Hennebert Rolf Ingold Adel M. Alimi

This chapter presents a new benchmarking strategy for Arabic screen-based word recognition. Firstly, we report on the creation of the new APTI (Arabic Printed Text Image) database. This database is a large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style word recognition systems in Arabic. Such systems take as input a text image and compute as output a character stri...

2012
Alex Zhicharevich Nachum Dershowitz

Language classification is a preliminary step for most natural-language related processes. The significant quantity of multilingual documents poses a problem for traditional language-classification schemes and requires segmentation of the document to monolingual sections. This phenomenon is characteristic of classical and medieval Jewish literature, which frequently mixes Hebrew, Aramaic, Judeo...

Journal: Int. Arab J. Inf. Technol. 2014
Baraa T. Sharef Nazlia Omar Zeyad T. Sharef

Compared to other languages, research on automated Arabic Text Categorization (TC) remains limited, owing to the complex and rich nature of the Arabic language. Most of this research uses supervised Machine Learning (ML) approaches such as Naïve Bayes (NB), K-Nearest Neighbour (KNN), Support Vector Machine and Decision Tree. Most of these techniqu...

Journal: JCS 2014
Nidal Yousef Aymen M. Abu-Errub Ashraf Odeh Hayel Khafajeh

The Arabic language is distinguished by its morphological richness, which forces those working in the field of Arabic language processing (i.e., information retrieval, document classification, text summarization) to deal with many words that seem to be different but in reality come from an identical root word. One of the methods to overcome this problem is to return the words to their roots. Thi...

2004
Mona Diab Kadri Hacioglu Daniel Jurafsky

To date, there are no fully automated systems addressing the community’s need for fundamental language processing tools for Arabic text. In this paper, we present a Support Vector Machine (SVM) based approach to automatically tokenize (segmenting off clitics), part-of-speech (POS) tag and annotate base phrases (BPs) in Arabic text. We adapt highly accurate tools that have been developed for Engl...

2005
Waleed Al-Sanie Ameur Touir Hassan Mathkour

Text summarization based on rhetorical structure theory has shown extremely interesting results. The process of extracting the text summary from the result of the rhetorical parser is not a singleton. Different rhetorical structure trees are generated from one text. Unfortunately, the result of the generated summary is not equivalent for those trees, and the correctness of the result is affected...

2014
Kareem Darwish

Arabizi is Arabic text that is written using Latin characters. Arabizi is used to present both Modern Standard Arabic (MSA) and Arabic dialects. It is commonly used in informal settings such as social networking sites and is often mixed with English. In this paper we address the problems of identifying Arabizi in text and converting it to Arabic characters. We used word and sequence-level ...
