Search results for: text documents classification

Number of results: 694633  

2014
Said Bahassine Mohamed Kissi Abdellah Madani

In this paper we conduct a comparative study of two stemming algorithms, the Khoja stemmer and our new stemmer, for Arabic text classification (categorization), using chi-square statistics for feature selection and focusing on the decision tree classifier. The evaluation used a corpus of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...

2017
Graham McDonald Craig MacDonald Iadh Ounis

Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. Howev...

2014
Sadiq Sani

The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...

Journal: Intell. Data Anal. 2010
Dong Li Anne Laurent Pascal Poncelet Mathieu Roche

Sentiment classification in text documents is an active data mining research topic in opinion retrieval and analysis. Different from previous studies concentrating on the development of effective classifiers, in this paper, we focus on the extraction and validation of unexpected sentences issued in sentiment classification. In this paper, we propose a general framework for determining unexpecte...

2010
Horacio Saggion Adam Funk

We describe a set of tools, resources, and experiments for opinion classification in business-related data sources in two languages. In particular, we concentrate on SentiWordNet text interpretation to produce word-, sentence-, and text-based sentiment features for opinion classification. We achieve good results in experiments using supervised machine learning over syntactic and sentiment-based fea...

2018
Jiangning Chen John Dever Rundong Du

We define a new centroid estimator for text classification based on the KL divergence of the classes. The score favors documents that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on several standard data sets indicate that the new method outperforms the traditional Naive Bayes classifier, especially ...

1999
Andrew McCallum Kamal Nigam

When applying text classification to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents an alternative approach to text classification that requires no labeled documents; instead, it uses a small set of keywords per class, a class hierarchy, and a large quantity of easily obtained unlabeled documents. The...

2016
Pradnya Kumbhar Manisha Mali

The rapid growth of the World Wide Web has led to explosive growth of information. As most information is stored in the form of text, text mining has gained paramount importance. With the high availability of information from diverse sources, automatic categorization of documents has become a vital method for managing and organizing vast amounts of information and for knowledge discovery. T...

2008
SALAMA BROOK

This paper presents a novel holistic technique for classifying and retrieving Arabic handwritten text documents. The retrieval of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several features are extracted from these connected parts and then co...

Journal: CoRR 2010
Yasir Safeer Atika Mustafa Anis Noor Ali

With the advancement of technology and reduced storage costs, individuals and organizations increasingly use electronic media to store textual information and documents. It is time-consuming for readers to retrieve relevant information from an unstructured document collection. It is easier and less time-consuming to find documents in a large collection when the collection is o...
