Decomposing Text Processing for Retrieval: Cheshire tries GRID@CLEF

Author

  • Ray R. Larson
Abstract

This short paper describes work in progress on our participation in the GRID@CLEF task. The task is intended to capture, in XML form, the intermediate results of the text processing phases of the indexing process used by IR systems. Our approach was to create a new, instrumented version of the indexing program used with the Cheshire II system. Thanks to an extension by the organizers, we were able to submit runs derived from our system. The system used for this task is a modified version of the Cheshire II IR system that writes an output file for each of the intermediate streams. The additions, like the original system, were written in C. Developing this system required creating parallel modules for several elements of the Cheshire II indexing programs. The current version handles only the simplest processing cases and ignores the many specialized indexing modes in the system (such as geographic name extraction and georeferencing).
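
The abstract does not give the XML format or the layout of the instrumented modules, so the following is only a minimal sketch in C of the general idea: run a simple indexing pipeline stage by stage and serialize each intermediate token stream as XML. Every name here (`TokenStream`, `tokenize`, `lowercase`, `remove_stopwords`, `emit_stage`, the `<stage>` and `<token>` elements, and the stopword list) is a hypothetical illustration, not the Cheshire II code or the GRID@CLEF schema.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_TOKEN_LEN 32

/* Hypothetical stopword list, for illustration only. */
static const char *STOPWORDS[] = { "the", "of", "a", "and", "in" };

/* One intermediate stream: the tokens as they look after a given stage. */
typedef struct {
    char text[MAX_TOKENS][MAX_TOKEN_LEN];
    int count;
} TokenStream;

/* Stage 1: split the input on non-alphanumeric characters.
 * Overlong tokens are truncated to MAX_TOKEN_LEN - 1 characters. */
void tokenize(const char *input, TokenStream *out)
{
    assert(input != NULL && out != NULL);
    out->count = 0;
    int len = 0;
    for (const char *p = input; ; p++) {
        if (*p != '\0' && isalnum((unsigned char)*p)) {
            if (len < MAX_TOKEN_LEN - 1)
                out->text[out->count][len++] = *p;
        } else {
            if (len > 0) {                 /* close the current token */
                out->text[out->count][len] = '\0';
                out->count++;
                len = 0;
            }
            if (*p == '\0' || out->count == MAX_TOKENS)
                break;
        }
    }
}

/* Stage 2: case normalization. */
void lowercase(const TokenStream *in, TokenStream *out)
{
    assert(in != NULL && out != NULL);
    out->count = in->count;
    for (int i = 0; i < in->count; i++) {
        size_t j = 0;
        for (; in->text[i][j] != '\0'; j++)
            out->text[i][j] = (char)tolower((unsigned char)in->text[i][j]);
        out->text[i][j] = '\0';
    }
}

/* Stage 3: drop tokens that appear in the stopword list. */
void remove_stopwords(const TokenStream *in, TokenStream *out)
{
    assert(in != NULL && out != NULL);
    size_t nstop = sizeof STOPWORDS / sizeof STOPWORDS[0];
    out->count = 0;
    for (int i = 0; i < in->count; i++) {
        int stop = 0;
        for (size_t s = 0; s < nstop; s++)
            if (strcmp(in->text[i], STOPWORDS[s]) == 0)
                stop = 1;
        if (!stop)
            strcpy(out->text[out->count++], in->text[i]);
    }
}

/* Serialize one stream into buf as XML. Tokens are alphanumeric only,
 * so no XML escaping is needed in this sketch.
 * Returns the number of characters written, or -1 if buf is too small. */
int emit_stage(char *buf, size_t size, const char *stage, const TokenStream *ts)
{
    assert(buf != NULL && stage != NULL && ts != NULL);
    int n = snprintf(buf, size, "<stage name=\"%s\">", stage);
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < ts->count; i++) {
        n = snprintf(buf + used, size - used, "<token>%s</token>", ts->text[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "</stage>");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

A caller would run the stages in order and call `emit_stage` after each one, writing the resulting buffer to a separate output file per stream. Keeping each stage as a function from one `TokenStream` to another is one way to make the intermediate results observable without changing what is finally indexed.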

Similar resources

Back to Basics - Again - for Domain Specific Retrieval

In this paper we will describe Berkeley’s approach to the Domain Specific (DS) track for CLEF 2008. Last year we used Entry Vocabulary Indexes and Thesaurus expansion approaches for DS, but found in later testing that some simple text retrieval approaches had better results than these more complex query expansion approaches. This year we decided to revisit our basic text retrieval approaches an...

Logistic Regression for Metadata: Cheshire takes on AdHoc-TEL

In this paper we will briefly describe the approaches taken by the Berkeley Cheshire Group for the Adhoc-TEL 2008 tasks (Mono and Bilingual retrieval). Since the Adhoc-TEL task is new for this year, we took the approach of using methods that have performed fairly well in other tasks. In particular, the approach this year used probabilistic text retrieval based on logistic regression and incorpor...

Text Retrieval and Blind Feedback for the ImageCLEF Photo Task

In this paper we will describe Berkeley’s approach to the ImageCLEF “Photo” task for CLEF 2006. This year is the first time that we have participated in ImageCLEF, and we chose primarily to establish a baseline for the Cheshire II system for this task. Although we had originally hoped to use GeoCLEF methods, in the end time constraints led us to restrict our submissions to the basic ...

Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR

In this paper we will briefly describe the approaches taken by Berkeley for the main GeoCLEF 2008 tasks (Mono and Bilingual retrieval). The approach this year used probabilistic text retrieval based on logistic regression, incorporating blind relevance feedback, for all of the runs. In addition, we ran a number of tests combining this type of search with Okapi BM25 searches using a fusion a...

Cheshire II at GeoCLEF: Fusion and Query Expansion for GIR

In this paper I will describe the Berkeley (group 1) approach to the GeoCLEF task for CLEF 2005. The main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. Since this is t...

Publication date: 2009