Search results for: text classification rocchio

Number of results: 641860

1996
David D. Lewis Robert E. Schapire James P. Callan Ron Papka

Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is p...
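As a rough illustration of the two update rules named in this abstract, the sketch below shows one online step of each in their standard textbook form for squared loss. The function names, the learning rate `eta`, and the toy vectors are assumptions made for illustration; they are not taken from the paper.

```python
import math


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def widrow_hoff_step(w, x, y, eta):
    """One Widrow-Hoff (LMS) step: w <- w - 2*eta*(w.x - y)*x."""
    err = dot(w, x) - y
    return [wi - 2.0 * eta * err * xi for wi, xi in zip(w, x)]


def eg_step(w, x, y, eta):
    """One exponentiated-gradient step.

    Each weight is scaled by exp(-2*eta*(w.x - y)*x_i), then the vector
    is renormalized so the weights stay positive and sum to 1.
    """
    err = dot(w, x) - y
    unnorm = [wi * math.exp(-2.0 * eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy example (hypothetical values): one document vector with label 1.0.
wh = widrow_hoff_step([0.0, 0.0], [1.0, 0.0], 1.0, eta=0.25)
eg = eg_step([0.5, 0.5], [1.0, 0.0], 1.0, eta=0.5)
```

Widrow-Hoff moves the weights additively, while EG rescales them multiplicatively, so EG as sketched here needs a positive starting vector such as the uniform one used above.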

2001
Ali Hadjarian Jerzy W. Bala Peter W. Pachowicz

This paper introduces a multistrategy learning approach to the categorization of text documents. The approach benefits from two existing, and in our view complementary, sets of categorization techniques: those based on Rocchio’s algorithm and those belonging to the rule learning class of machine learning algorithms. Visualization is used for the presentation of the output of learning.

Journal: Int. J. Computational Intelligence Systems 2008
Shibin Zhou Kan Li Yushu Liu

In the text literature, many topic models were proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM or Latent Dirichlet Allocation Category Language Model for text categorization and estimate parameters of models by variational inference. As a variant of Latent Dirichlet Allocation Model, LDACL...

2002
Kazuaki Kishida

Pseudo relevance feedback is empirically known as a useful method for enhancing retrieval performance. For example, we can apply the Rocchio method, which is a well-known relevance feedback method, to the results of an initial search by assuming that the top-ranked documents are relevant a priori. In this paper, for searching the NTCIR-3 patent test collection through pseudo feedback, we try to emplo...
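The idea described in this abstract, treating the top-ranked documents as relevant and feeding them to Rocchio, can be sketched as follows. This is a minimal illustration and not the author's system: the function name, the sparse-dict vectors, and the values of `k`, `alpha` and `beta` are all assumptions, and the negative (non-relevant) term of the full Rocchio formula is left out because pseudo feedback supplies no non-relevant judgments.

```python
def rocchio_pseudo_feedback(query, ranked_docs, k=2, alpha=1.0, beta=0.75):
    """Expand a query with the centroid of the top-k ranked documents.

    query and each document are sparse term->weight dicts.
    new_query = alpha * query + beta * mean(top-k document vectors)
    """
    top = ranked_docs[:k]
    new_query = {term: alpha * weight for term, weight in query.items()}
    if not top:
        return new_query
    for doc in top:
        for term, weight in doc.items():
            new_query[term] = new_query.get(term, 0.0) + beta * weight / len(top)
    return new_query


# Hypothetical initial query and an already-ranked result list.
query = {"patent": 1.0}
ranked = [
    {"patent": 1.0, "claim": 2.0},
    {"claim": 2.0},
    {"noise": 5.0},  # ranked below k, so it is ignored
]
expanded = rocchio_pseudo_feedback(query, ranked, k=2, alpha=1.0, beta=0.5)
```

In this toy run, the term "claim" enters the expanded query because it occurs in both assumed-relevant documents, while "noise" is excluded because its document falls outside the top `k`.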

Journal: CoRR 1997
Manuel de Buenaga Rodríguez José María Gómez Hidalgo Belén Díaz-Agudo

Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our a...

2001
Roberto Basili Alessandro Moschitti

Methods for incorporating linguistic content into text retrieval are receiving growing attention [16],[14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinc...

2003
Kazuaki Kishida

Pseudo relevance feedback is empirically known as a useful method for enhancing retrieval performance. For example, we can apply the Rocchio method, which is a well-known relevance feedback method, to the results of an initial search by assuming that the top-ranked documents are relevant. In this paper, for searching the NTCIR-3 patent test collection through pseudo feedback, we employ two releva...

Journal: JDIM 2009
Dell Zhang Wee Sun Lee

We identify and explore an Information Retrieval paradigm called Query-By-Multiple-Examples (QBME) where the information need is described not by a set of terms but by a set of documents. Intuitive ideas for QBME include using the centroid of these documents or the well-known Rocchio algorithm to construct the query vector. We consider this problem from the perspective of text classification, a...

2003
Byeong Man Kim Qing Li Jong-Wan Kim

In this work, we propose a new method for extracting user preferences from a few documents that might interest users. To this end, we first extract candidate terms and choose a number of terms called initial representative keywords (IRKs) from them through fuzzy inference. Then, by expanding IRKs and reweighting them using term co-occurrence similarity, the final representative keywords are ex...

2007
Kang Hyuk Lee Judy Kay Byeong Ho Kang

Two important research areas in statistical approaches for automated text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization systems. After researching common techniques in both areas, we describe a lazy linear classifier known as the keyword a...
