Search results for: text documents classification
Number of results: 694633
In this paper, we conduct a comparative study of two stemming algorithms for Arabic text classification (categorization): the Khoja stemmer and our new stemmer. We use chi-square statistics for feature selection and focus on the decision tree classifier. The evaluation used a corpus of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...
Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. Howev...
The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...
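The bag-of-words limitation described in this abstract can be illustrated with a minimal sketch. The tokenizer, the helper names `bow_vector` and `cosine`, and the example sentences are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def bow_vector(text):
    """Build a bag-of-words term-frequency vector from whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences: the first two are close in meaning
# but share no terms; the first and third share most of their terms.
doc_a = bow_vector("the physician treated the patient")
doc_b = bow_vector("a doctor cured an invalid")
doc_c = bow_vector("the physician treated the child")

sim_related_meaning = cosine(doc_a, doc_b)  # no shared terms
sim_shared_terms = cosine(doc_a, doc_c)     # many shared terms
```

Because each term is treated as an independent dimension, the two sentences with related meaning but different wording receive a similarity of zero, while the lexically overlapping pair scores high.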
Sentiment classification in text documents is an active data mining research topic in opinion retrieval and analysis. Different from previous studies concentrating on the development of effective classifiers, in this paper, we focus on the extraction and validation of unexpected sentences issued in sentiment classification. In this paper, we propose a general framework for determining unexpecte...
We describe a set of tools, resources, and experiments for opinion classification in business-related data sources in two languages. In particular, we concentrate on SentiWordNet text interpretation to produce word, sentence, and text-based sentiment features for opinion classification. We achieve good results in experiments using supervised learning machine over syntactic and sentiment-based fea...
Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
We define a new centroid estimator for text classification based on the KL divergence of the classes. The score favors documents that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on several standard data sets indicate that the new method outperforms the traditional Naive Bayes classifier, especially ...
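Since the abstract is truncated, the estimator itself is not recoverable from this listing. As a rough illustration of the underlying quantity only, the sketch below computes the symmetric KL divergence between smoothed multinomial term distributions and assigns a document to the nearest class distribution. The smoothing constant `alpha`, the function names, and the toy data are all assumptions and are not the paper's method.

```python
import math
from collections import Counter


def smoothed_distribution(tokens, vocabulary, alpha=1.0):
    """Multinomial term distribution with additive smoothing (alpha is assumed)."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum((p - q) * log(p / q))."""
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def classify(doc_tokens, class_tokens, vocabulary):
    """Return the class whose distribution is closest in symmetric KL."""
    doc_dist = smoothed_distribution(doc_tokens, vocabulary)
    return min(
        class_tokens,
        key=lambda c: symmetric_kl(
            doc_dist, smoothed_distribution(class_tokens[c], vocabulary)
        ),
    )


# Hypothetical toy training data, one pooled token list per class.
class_tokens = {
    "sport": "match goal team goal".split(),
    "business": "market stock profit market".split(),
}
vocabulary = sorted({t for toks in class_tokens.values() for t in toks})

predicted = classify("team goal match".split(), class_tokens, vocabulary)
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined; without it, any term missing from one distribution would make the divergence infinite.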
When applying text classification to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents an alternative approach to text classification that requires no labeled documents; instead, it uses a small set of keywords per class, a class hierarchy and a large quantity of easily obtained unlabeled documents. The...
The rapid growth of the World Wide Web has led to explosive growth of information. As most information is stored in the form of texts, text mining has gained paramount importance. With the high availability of information from diverse sources, the task of automatic categorization of documents has become a vital method for managing and organizing vast amounts of information and for knowledge discovery. T...
This paper presents a novel holistic technique for classifying and retrieving Arabic handwritten text documents. The retrieval of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several features are extracted from these connected parts and then co...
With the advancement of technology and reduced storage costs, individuals and organizations increasingly use electronic media for storing textual information and documents. It is time consuming for readers to retrieve relevant information from an unstructured document collection. It is easier and less time consuming to find documents from a large collection when the collection is o...