Experiments in cross-language medical information retrieval using a mixing translation module

Authors

  • Tuan Due Tran
  • Nicolas Garcelon
  • Anita Burgun-Parenthoine
  • Pierre Le Beux
Abstract

Given the ever-increasing scale and diversity of medical literature published in English on the Internet, improving the performance of cross-language information retrieval is an urgent research objective. Cross-language medical information retrieval (CLMIR) consists of providing a query in one language and searching medical document collections in one or more different languages. The intended users of CLMIR can read biomedical texts in English but have difficulty formulating English queries. This paper proposes a French/English CLMIR system, built as a mixing model, to support the retrieval of English medical documents. The method falls into the category of query translation approaches: we use a hybrid machine translation that combines a pattern-based module with a rule-based translator and comprises three steps, from pre- to post-translation. In parallel with this hybrid machine translation, we use the multilingual UMLS Metathesaurus as a complementary translator. The results show that the mixing translation module outperforms the machine translation-based method and the thesaurus-based method used separately.
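The idea of running a machine translator and a thesaurus lookup in parallel and then mixing their outputs can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy lexicon, the toy thesaurus entries, and the function names are hypothetical stand-ins, not the paper's pattern-based or rule-based modules and not real UMLS data. The merge rule (thesaurus terms first, then machine translation terms, deduplicated) is likewise only one plausible way to combine the two sources.

```python
from typing import Dict, List

# Hypothetical word-level lexicon standing in for the machine translation output.
TOY_MT_LEXICON: Dict[str, str] = {
    "douleur": "pain",
    "thoracique": "chest",
    "aiguë": "acute",
}

# Hypothetical multi-word term table standing in for a multilingual thesaurus.
TOY_THESAURUS: Dict[str, List[str]] = {
    "douleur thoracique": ["chest pain", "thoracic pain"],
    "infarctus du myocarde": ["myocardial infarction"],
}


def mt_translate(query: str) -> List[str]:
    """Word-by-word lookup; unknown words are dropped."""
    words = query.lower().split()
    return [TOY_MT_LEXICON[w] for w in words if w in TOY_MT_LEXICON]


def thesaurus_translate(query: str) -> List[str]:
    """Return English terms for every French thesaurus entry found in the query."""
    lowered = query.lower()
    terms: List[str] = []
    for french_term, english_terms in TOY_THESAURUS.items():
        if french_term in lowered:
            terms.extend(english_terms)
    return terms


def mixed_translate(query: str) -> List[str]:
    """Merge both translators' outputs, keeping first occurrences only."""
    merged: List[str] = []
    seen = set()
    for term in thesaurus_translate(query) + mt_translate(query):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


if __name__ == "__main__":
    print(mixed_translate("douleur thoracique aiguë"))
```

In this toy setup the thesaurus contributes the multi-word term "chest pain", which a word-by-word translation misses, while the word-level lookup contributes "acute", which the thesaurus table lacks. That complementarity is the intuition behind using the two sources together.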

Related resources

Ontologies in Cross-Language Information Retrieval

We present an approach to using ontologies as interlingua in cross-language information retrieval in the medical domain. Our approach is based on using the Unified Medical Language System (UMLS) as the primary ontology. Documents and queries are annotated with multiple layers of linguistic information (part-of-speech tags, lemmas, phrase chunks). Based on this we identify medical terms and sema...

University of Hagen at CLEF 2005: Towards a Better Baseline for NLP Methods in Domain-Specific Information Retrieval

The third participation of the University of Hagen at the German Indexing and Retrieval Test (GIRT) task of the Cross Language Evaluation Campaign (CLEF 2005) aims at providing a better baseline for experiments with natural language processing (NLP) methods in domain-specific information retrieval (IR). Our monolingual experiments with the German document collection are based on a setup combinin...

KECIR Question Answering System at NTCIR7 CCLQA

At the NTCIR-7 CCLQA (Complex Cross-Language Question Answering) task, we participated in the Chinese-Chinese (C-C) and English-Chinese (E-C) QA (Question Answering) subtasks. In this paper, we describe our QA system, which includes modules for question analysis, document retrieval, information extraction, and answer generation. In addition, we used an online MT (Machine Translation) system to deal ...

Resolving Translation Ambiguity using Monolingual Corpora. A Report on Clairvoyance CLEF-2002 Experiments

Choosing the correct target words is a difficult problem for machine translation. In cross-language information retrieval, this problem of choice is mitigated since more than one translation alternative can be retained in the target query. Between choosing just one word as a translation and keeping all the possible translations for each source word, one can apply a range of filtering techniques...

Journal title:
  • Studies in health technology and informatics

Volume 107, Pt 2

Publication date: 2004