Phrase Identification in Cross-Language Information Retrieval

Authors

  • Mirna Adriani
  • C. J. van Rijsbergen
Abstract

Term-sense ambiguity and the difficulty of translating phrases are the main sources of problems in dictionary-based cross-language information retrieval (CLIR) approaches. We propose a term-similarity-based translation phrase identification technique to enhance the retrieval effectiveness of a dictionary-based query translation method. The technique identifies noun phrases in the target language based on the degree of association between every pair of terms from two sets of translation terms. We demonstrate the effectiveness of the technique through a series of experiments using queries in two source languages, Spanish and Indonesian, to retrieve documents in English from the standard TREC (Text Retrieval Conference) collection. Combining this technique with our earlier term-similarity-based sense disambiguation technique yields further improvements in retrieval performance.
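
The pairwise association step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which association measure or threshold is used, so the Dice coefficient over document co-occurrence, the `threshold` value, the function names, and the toy `POSTINGS` index below are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def dice(term_a, term_b, postings):
    """Dice coefficient over the sets of documents containing each term.

    Assumption: this stands in for the paper's "degree of association".
    """
    docs_a = postings.get(term_a, set())
    docs_b = postings.get(term_b, set())
    if not docs_a or not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def best_translation_pair(translations_a, translations_b, postings, threshold=0.5):
    """Score every pair of terms drawn from two translation sets.

    Returns ((term_a, term_b), score) for the most strongly associated
    pair, or None if no pair reaches the (hypothetical) threshold.
    """
    scored = [
        ((a, b), dice(a, b, postings))
        for a, b in product(translations_a, translations_b)
    ]
    if not scored:
        return None
    pair, score = max(scored, key=lambda item: item[1])
    return (pair, score) if score >= threshold else None


# Hypothetical target-language index: term -> ids of documents containing it.
POSTINGS = {
    "white": {1, 2, 3},
    "house": {1, 2, 4},
    "home": {5, 6},
    "blank": {7},
}

# Hypothetical dictionary translations of two adjacent source-language query terms.
result = best_translation_pair(["house", "home"], ["white", "blank"], POSTINGS)
print(result)
```

With this toy index, the pair ("house", "white") is selected because the two terms co-occur in two documents, while every other candidate pair never co-occurs. A real system would compute the statistics from the target collection and would need a rule for word order within the identified phrase.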

Similar resources

Semantic annotation for concept-based cross-language medical information retrieval

We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech ...

A survey on phrase structure learning methods for text classification

Text classification is a task of automatic classification of text into one of the predefined categories. The problem of text classification has been widely studied in different communities like natural language processing, data mining and information retrieval. Text classification is an important constituent in many information management tasks like topic identification, spam filtering, email r...

Improving Query Translation for Cross-Language Information Retrieval using a Web-based Approach

With the increasing popularity of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing approaches for improving query translation, such as noun phrase (NP) identification, translation, and word translation selection, require special corpus resources. However, those natural language resources are not readily available. In this paper, we propos...

Word Formation Approach to Noun Phrase Analysis for Thai

Noun phrase analysis is one of the most important components in Natural Language Processing (NLP) applications, such as information retrieval, extraction and categorization. For Thai, noun phrase analysis has unique problems, i.e., noun phrase boundary identification, noun phrase decomposition and its relation extraction, and core noun detection. Statistical and rule based Word formation is, th...

Cross-Lingual Medical Information Retrieval through Semantic Annotation

We present a framework for concept-based, cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data, whereby documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes ...

Publication date: 2000