YFilter at TREC-9
نویسندگان
چکیده
We built a filtering system YFILTER this year, which we used for experiments on profile updating and thresholds setting. Our focus is using incremental Rocchio for introducing new query terms and term weighting. Although 1, 0.5, 0.25 is a widely used Rocchio ratio for query expansion based on relevance feedback, we found that the optimal setting for information filtering is corpus and profile dependent. In addition to a new Rocchio ratio, we tested a modified idf measure for term weighting (ydf) that is biased towards words with middle range term frequency.
منابع مشابه
The Bias Problem and Language Models in Adaptive Filtering
We used the YFILTER filtering system for experiments on updating profiles and setting thresholds. We developed a new method of using language models for updating profiles that is more focused on picking informative/discriminative words for query. The new method was compared with the well-known Rocchio algorithm. Dissemination thresholds were set based on maximum likelihood estimation that model...
متن کاملHigh-Performance XML Filtering: An Overview of YFilter
We have developed YFilter, an XML filtering system that provides fast, on-the-fly matching of XMLencoded data to large numbers of query specifications containing constraints on both structure and content. YFilter encodes path expressions using a novel NFA-based approach that enables highly-efficient, shared processing for large numbers of XPath expressions. In this paper, we provide a brief tec...
متن کاملYFilter: Efficient and Scalable Filtering of XML Documents
Soon, much of the data exchanged over the Internet will be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called YFilter, which filters streaming XML documents according to XQuery or XPath queries that involve both path expressions and predicates. Unlike previous work, YFilter uses a novel NFA-based execution model. In this demon...
متن کاملThe Thisl SDR System at TREC-9
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a realtime version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. ...
متن کامل