Search results for: text classification rocchio

Number of results: 641860

2012
Shereen Albitar Sébastien Fournier Bernard Espinasse

This paper concerns supervised classification of text. Rocchio, the method we choose for its efficiency and extensibility, is tested on three reference corpora "20NewsGroups", "OHSUMED" and "Reuters", using several similarity measures. Analyzing statistical results, many limitations are identified and discussed. In order to overcome these limitations, this paper presents two main solutions: fir...

2012
Shereen Albitar Sébastien Fournier Bernard Espinasse

Aiming at more efficient search on the Internet, it seems adequate to deploy classification techniques using semantic resources in order to restrict this search to the user's domain of interest. In this work, we try to assess the impact of integrating semantic knowledge on text classification. This integration can be realized in different ways. The one we choose in this paper is text conceptual...

Journal: :Expert Syst. Appl. 2009
Duoqian Miao Qiguo Duan Hongyun Zhang Na Jiao

Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available on-line. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method without premodelling, kNN has a high cost to classify new documents whe...

2006
Xuehai Zhang

Basic theory about text categorization and information retrieval is presented and several important algorithms for text classification are described in detail, such as the Rocchio Algorithm, TFIDF classifiers and Naïve Bayes Algorithm, etc. An implementation based on the Rocchio Algorithm is also discussed and evaluated. It shows that this method is reasonably efficient given fairly small training d...

2003
Alessandro Moschitti

The current trend in operational text categorization is the design of fast classification tools. Several studies on improving the accuracy of fast but less accurate classifiers have recently been carried out. In particular, enhanced versions of the Rocchio text classifier, characterized by high performance, have been proposed. However, even in these extended formulations the problem of tuning its pa...

2002
Roberto Basili Alessandro Moschitti Maria Teresa Pazienza

Recently, an original extension of the well-known Rocchio model (i.e. the Generalized Rocchio Formula) as a feature weighting method for text classification has been presented. The assessment of such a model requires a statistically motivated parameter estimation method and wider empirical evidence. In this paper, three different corpora have been adopted in two languages. Results suggest t...

Journal: :I. J. Comput. Appl. 2009
Tarek F. Gharib Mena B. Habib Zaki T. Fayed

Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper we applied the Support Vector Machines (SVM) model in classifying Arabic text documents. The results compared with the other traditional classifiers Baye...

2008
Andreas Heß Philipp Dopichaj Christian Maaß

We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we app...

1997
Thorsten Joachims

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also sug...

2007
Mark van Uden

Given a large number of documents, it is hard to find the documents that you need. These days most, if not all, of these documents are available electronically. Information Retrieval (IR) systems help in finding the documents that satisfy the user’s information need. There are many techniques that are used by these IR systems. One of these techniques is learning classification. This technique uses...
