Exploiting information extraction annotations for document retrieval in distillation tasks
Abstract
Information distillation aims to extract relevant pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. In this paper, we present our approach for using information extraction annotations to augment document retrieval for distillation. We take advantage of the fact that some of the distillation queries can be associated with annotation elements introduced for the NIST Automatic Content Extraction (ACE) task. We experimentally show that using the ACE events to constrain the document set returned by an information retrieval engine significantly improves the precision at various recall rates for two different query templates.
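The constraint described in the abstract can be illustrated with a minimal sketch. It assumes each retrieved document already carries the set of ACE event types annotated in it, and that a query template has been mapped to one or more event types. The `Document` class, the function names, the template-to-event mapping, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    score: float  # retrieval score assigned by the IR engine
    # ACE event types annotated in the document (assumed to be precomputed)
    event_types: set = field(default_factory=set)


def constrain_by_events(ranked, required_events):
    """Keep documents annotated with at least one required event type.

    The original rank order from the IR engine is preserved.
    """
    return [d for d in ranked if d.event_types & required_events]


def precision(returned, relevant_ids):
    """Fraction of returned documents that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for d in returned if d.doc_id in relevant_ids)
    return hits / len(returned)


# Toy ranked list, as if returned by an IR engine (illustrative data only).
ranked = [
    Document("d1", 0.9, {"Conflict.Attack"}),
    Document("d2", 0.8),
    Document("d3", 0.7, {"Life.Die"}),
    Document("d4", 0.6, {"Conflict.Attack"}),
]
relevant = {"d1", "d4"}

# Hypothetical mapping: a query template about attacks maps to Conflict.Attack.
constrained = constrain_by_events(ranked, {"Conflict.Attack"})

baseline_precision = precision(ranked, relevant)
constrained_precision = precision(constrained, relevant)
```

On this toy data the filter drops the two documents without a matching event annotation, so precision rises while both relevant documents are kept. Real annotations are noisy, so a hard filter like this one can also remove relevant documents and lower recall.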
Similar resources
Report on the TREC 2003 Experiment: Genomic and Web Searches
This year we took part in the genomic information retrieval and information extraction tasks, as well as the named page and topic distillation searches. In carrying out the last two tasks, we made use of link anchor information and document content in order to construct Web page representatives. This type of document representation uses multi-vectors to highlight the importance of both hyperlin...
Predicting Extraction Performance using Context Language Models
Exploiting lexical and semantic relationships in text can dramatically improve information retrieval accuracy. Most notably, named entities and relations between entities are crucial for effective question answering and other information retrieval tasks. Unfortunately, the success in extracting these relationships can vary for different domains and document collections. Predicting extraction pe...
Report on the TREC 11 Experiment: Arabic, Named Page and Topic Distillation Searches
This year we took part in the Arabic cross-language information retrieval track (for us limited to monolingual Arabic retrieval) and also in both named page and topic distillation searches. In the last two tasks, we made use of link anchor information and document content in order to construct Web page representatives. This document representation uses multi-vectors in order to highlight the im...
IXIR: A statistical information distillation system
The task of information distillation is to extract snippets from massive multilingual audio and textual document sources that are relevant for a given templated query. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences with respect to their relevance to a query via statistical classification with support vector machine...
Evaluation of Document Citations in Phase 2 Gale Distillation
The focus of information retrieval evaluations, such as NIST’s TREC evaluations (e.g. Voorhees 2003), is on evaluation of the information content of system responses. On the other hand, retrieval tasks usually involve two different dimensions: reporting relevant information and providing sources of information, including corroborating evidence and alternative documents. Under the DARPA Global A...