Recent Experiments with INQUERY

Authors

  • James Allan
  • Lisa Ballesteros
  • James P. Callan
  • W. Bruce Croft
  • Zhihong Lu
Abstract

Past TREC experiments by the University of Massachusetts have focused primarily on ad-hoc query creation. Substantial effort was directed towards automatically translating TREC topics into queries, using a set of simple heuristics and query expansion. Less emphasis was placed on the routing task, although results were generally good. The Spanish experiments in TREC-3 concentrated on simple indexing, sophisticated stemming, and simple methods of creating queries.

The TREC-4 experiments were a departure from the past. The ad-hoc experiments involved "fine tuning" existing approaches, and modifications to the INQUERY term weighting algorithm. However, much of the research focus in TREC-4 was on the routing, Spanish, and collection merging experiments. These tracks more closely match our broader research interests in document routing, document filtering, distributed IR, and multilingual retrieval.

The University of Massachusetts' experiments were conducted with version 3.0 of the INQUERY information retrieval system. INQUERY is based on the Bayesian inference network retrieval model. It is described elsewhere [7, 5, 12, 11], so this paper focuses on relevant differences to the previously published algorithms.

For the ad-hoc retrieval experiments, the major change to the system was a new estimation technique for term weighting. We also continued to refine our analysis techniques for the TREC topics, our use of passage retrieval, and query expansion using InFinder (formerly called PhraseFinder).

1.1 Query Processing

TREC topics 201-250 differ from earlier TREC topics in that the fields were removed. This change makes the TREC topics even more dissimilar from user queries in an online system than in the past. The TREC topics observe the niceties of grammar, punctuation and, especially, polite periphrasis. In an online system, users typically discard grammar, punctuation and any non-functional verbiage in an effort to get the information they want.

The removal of the field created a set of topics that resembled essay questions. Much of our TREC processing this year focussed on creating queries that more closely resemble "real" online queries, by stripping off the polite circumlocution and its accompanying grammar. As a result, in addition to the standard "stop-phrase" program distributed with INQUERY, which removes the occasional polite circumlocution, we resurrected an old program for removing additional verbiage that is likely to be content-free, especially in questions. For example, in topic 201 ...
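
The stop-phrase idea described above can be illustrated with a minimal Python sketch. This is not the program distributed with INQUERY: the phrase list, the function name `strip_stop_phrases`, and the sample topic are all hypothetical, chosen only to show how polite circumlocution and punctuation might be stripped from an essay-style topic to leave something closer to an online query.

```python
import re

# Hypothetical stop phrases for illustration only; this is NOT the
# list shipped with INQUERY's stop-phrase program.
STOP_PHRASES = [
    "identify documents that discuss",
    "a relevant document will",
    "what is known about",
    "please",
]


def strip_stop_phrases(topic, phrases=STOP_PHRASES):
    """Remove content-free phrases and punctuation from a topic string."""
    text = topic.lower()
    # Remove longer phrases first so a short phrase cannot break up a
    # longer one that contains it.
    for phrase in sorted(phrases, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    # Drop punctuation (hyphens are kept), then collapse whitespace.
    text = re.sub(r"[^\w\s-]", " ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    # Hypothetical topic text, not an actual TREC topic.
    print(strip_stop_phrases("Please identify documents that discuss solar power."))
```

A real system would likely pair such a list with the grammar-aware cleanup the authors mention; the sketch covers only literal phrase matching.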

Related resources

TREC and Tipster Experiments with Inquery

INQUERY is a probabilistic information retrieval system based upon a Bayesian inference network model. This paper describes recent improvements to the system as a result of participation in the TIPSTER project and the TREC-2 conference. Improvements include transforming forms-based specifications of information needs into complex structured queries, automatic query expansion, automatic recognitio...

Chinese Information Extraction and Retrieval

1. what was learned from porting the INQUERY information retrieval engine and the INFINDER term finder to Chinese, 2. experiments at the University of Massachusetts evaluating INQUERY performance on Chinese newswire (Xinhua), 3. what was learned from porting selected components of PLUM to Chinese, 4. experiments evaluating the POST part of speech tagger and named entity recognition on Chinese, 5....

Indri at TREC 2004: Terabyte Track

This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We des...

INQUERY and TREC-7

...participated in only four of the tracks that were part of the TREC-7 workshop. We worked on ad-hoc retrieval, filtering, VLC, and the SDR track. This report covers the work done on each track successively. We start with a discussion of IR tools that were broadly applied in our work. Although UMass used a wide range of tools, from Unix shell scripts, to PC spreadsheets, three major tools were applied ac...

The INQUERY Retrieval System

As larger and more heterogeneous text databases become available, information retrieval research will depend on the development of powerful, efficient and flexible retrieval engines. In this paper, we describe a retrieval system (INQUERY) that is based on a probabilistic retrieval model and provides support for sophisticated indexing and complex query formulation. INQUERY has been used successfu...

Publication date: 1995