Experiments in Terabyte Searching, Genomic Retrieval and Novelty Detection for TREC 2004
Abstract
In TREC 2004, Dublin City University took part in three tracks: Terabyte (in collaboration with University College Dublin), Genomics, and Novelty. In this paper we discuss each track separately and present separate conclusions for each. In addition, we give a general description of a text retrieval engine that we developed over the past year to support our experiments in large-scale, distributed information retrieval; this engine underlies all of the track experiments described in this document.
Similar resources
Amberfish at the TREC 2004 Terabyte Track
The TREC 2004 Terabyte Track evaluated information retrieval in large-scale text collections, using a set of 25 million documents (426 GB). This paper gives an overview of our experiences with this collection and describes Amberfish, the text retrieval software used for the experiments.
Finding New News: Novelty Detection in Broadcast News
The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher’s experience by presenting “documents” in order of how much extra information they add to what is already known instead of how similar they are to a user’s query. In this paper we present a novelty detection system evaluated on the AQUAINT text collection as part of our TR...
Experiments in Novelty Detection at Columbia University
This paper describes the method we used for the Novelty Track of the 2002 Text Retrieval Conference (TREC). We tried to adapt tools we are developing for a task closely related to the novelty part of this track. The system we are building will scan a stream of documents and present to the user only the new information it finds. For the “relevance” part of the TREC, we decided to test the a...
Improved Feature Selection and Redundance Computing - THUIR at TREC 2004 Novelty Track
This is the third year that the Tsinghua University Information Retrieval Group (THUIR) has participated in the Novelty track of TREC. Our research on this year’s novelty track mainly focused on four aspects: (1) text feature selection and reduction; (2) improved sentence classification for finding relevant information; (3) efficient sentence redundancy computing; (4) effective result filtering. All experime...
Experiments in TREC 2004 Novelty Track at CAS-ICT
The main task in the Novelty Track is, given a TREC topic, to retrieve the relevant sentences from a document set and remove duplicates among them. The track first ran in TREC 2002 and was refined into four tasks in TREC 2003. In this year's Novelty track, irrelevant documents are given in addition to the 25 relevant ones. In other words, a given document is either relevant or irrelevant to the topic. Th...
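Several of the abstracts above describe the same novelty step: once relevant sentences are found, keep only those that add information beyond what has already been shown. A common generic baseline, sketched below, compares each sentence to the sentences already kept and drops it if its maximum cosine similarity reaches a threshold. This is not the method of any paper listed here. The bag-of-words tokenization, the helper names (`term_vector`, `cosine`, `novel_sentences`) and the threshold of 0.7 are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_vector(sentence: str) -> Counter:
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, threshold=0.7):
    """Keep each sentence (in order) only if it is not too similar to any sentence kept so far.

    `threshold` is a hypothetical tuning parameter, not a value taken from any TREC system.
    """
    kept_vectors = []
    novel = []
    for sentence in sentences:
        vec = term_vector(sentence)
        # Highest similarity to anything already presented to the user (0.0 for the first sentence).
        max_sim = max((cosine(vec, prev) for prev in kept_vectors), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
            kept_vectors.append(vec)
    return novel


if __name__ == "__main__":
    relevant = [
        "The probe landed on Mars on Tuesday.",
        "On Tuesday the probe landed on Mars.",
        "Engineers expect the first images within a week.",
    ]
    # The second sentence has the same term counts as the first, so it is dropped as redundant.
    print(novel_sentences(relevant))
    # → ['The probe landed on Mars on Tuesday.', 'Engineers expect the first images within a week.']
```

Comparing only against previously kept sentences keeps the sketch simple; a variant could compare against every sentence seen so far, and real systems typically replace raw term frequencies with weighted or reduced features.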