Search results for: arabic text classification
Number of results: 727070
The first step in any NLP pipeline is to split the text into individual tokens. The most obvious and straightforward approach is to use words as tokens. However, given a large corpus, representing all words is not efficient in terms of vocabulary size. In the literature, many tokenization algorithms have emerged to tackle this problem by creating subwords, which in turn limits the vocabulary size of the corpus. Most techniques are language-agnostic, i.e.,...
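As a rough illustration of how subword units can be learned, the sketch below uses byte-pair encoding (BPE), one well-known subword method. The snippet above does not say which algorithms the paper covers, so BPE is an assumption here, and the function name `learn_bpe` and the three toy words are hypothetical.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Toy corpus (hypothetical): three words sharing the definite article "ال".
toy_words = ["الكتاب", "القلم", "البيت"]
first_merges = learn_bpe(toy_words, num_merges=1)
```

On this toy input the only pair that occurs more than once is the article's two letters, so it is the first merge learned. A real tokenizer would also need a step that applies the learned merges to new text.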
Arabic script is the third most widely used writing system after Latin and Chinese, but research in Arabic Optical Character Recognition (OCR) is still nascent in comparison to Latin script. Arabic script is inherently cursive in nature, therefore techniques developed for other scripts are generally inappropriate for Arabic. In this paper we present recent progress in the field of Handwritten A...
Arabic text categorization is considered one of the difficult problems in classification using machine learning algorithms. Achieving high accuracy in Arabic text categorization depends on the preprocessing techniques used to prepare the data set. Thus, in this paper, an investigation of the impact of the preprocessing methods concerning the performance of three machine learning algorithms, namely...
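The snippet is cut off before it names the preprocessing methods studied, so the following is only a sketch of steps that are commonly applied to Arabic text: removing diacritics and tatweel, and unifying alef variants. The function name `normalize_arabic` and the choice of steps are assumptions, not the paper's pipeline.

```python
import re

# Assumed character sets: tashkeel marks U+064B..U+0652 plus tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, and hamza below.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")

def normalize_arabic(text):
    """Strip diacritics and tatweel, unify alef forms, collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    return " ".join(text.split())

# A vocalized word reduced to its bare letters.
example = normalize_arabic("\u0623\u064E\u062D\u0652\u0645\u064E\u062F")
```

Whether each step helps or hurts accuracy depends on the data set and classifier, which is the kind of question the abstract says the paper investigates.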
In this paper we conduct a comparative study between two stemming algorithms, the Khoja stemmer and our new stemmer, for Arabic text classification (categorization), using chi-square statistics for feature selection and focusing on the decision tree classifier. The evaluation used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...
Vast volumes of digital video data are now generated in our daily life. One of the most challenging problems is classifying and retrieving the desired information from huge collections of digital video. Consequently, closed caption text has been utilized as an alternative to enhance video retrieval and classification. Some systems are designed based on English closed caption howeve...
This paper proposes an efficient, chi-square-based feature selection method for Arabic text classification. In data mining, feature selection is a preprocessing step that can improve classification performance. Although a few works have studied the effect of feature selection methods on Arabic text classification, only a limited number of methods were compared. Furthermore, different datasets were u...
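For readers unfamiliar with chi-square feature selection, here is a minimal sketch of the standard term-versus-class chi-square statistic computed from document frequencies. It is the textbook formulation, not the specific method this paper proposes; the function name `chi_square_scores` and the toy documents are hypothetical.

```python
from collections import Counter

def chi_square_scores(docs, labels, target):
    """Score each term by its chi-square statistic against one target class."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency: count each term once per doc
            (df_pos if y == target else df_neg)[term] += 1
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target-class docs containing the term
        b = df_neg[term]   # other docs containing the term
        c = n_pos - a      # target-class docs without the term
        d = n_neg - b      # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

# Toy tokenized documents (hypothetical), two per class.
toy_docs = [["goal", "match", "team"], ["match", "player"],
            ["market", "bank"], ["bank", "price", "team"]]
toy_labels = ["sport", "sport", "business", "business"]
scores = chi_square_scores(toy_docs, toy_labels, target="sport")
top_terms = sorted(scores, key=scores.get, reverse=True)
```

Terms that appear in only one class score highest, while a term spread evenly across classes (here `team`) scores zero. Feature selection then keeps the top-scoring terms per class.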
A massive amount of documents are being posted online every minute. The task of document classification requires extensive background work on the content of documents, where keyword-based matching alone may not be sufficient. Much research has been carried out in several languages that has revealed significant results. However, Arabic documents still pose a great challenge due to the nature of ...
In this paper, the authors present a latent topic model to index and represent Arabic text documents in a way that reflects more semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires some special treatment. The authors describe their approach for analyzing and preprocessing Arabic text, then describe the stemming process. Finally...
Corresponding Author: Suhad A. Yousif, Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, Lebanon. Abstract: Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information (written in the Arabic language) on the web. In most text classification processes, featu...