Automatic Term Extraction for Cross-Language Information Retrieval Using a Bilingual Parallel Corpus

نویسنده

  • Tayebeh Mosavi Miangah
چکیده

Information retrieval is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to the query need of a user. Cross-language information retrieval refers to a kind of information retriev/al in which the language of the query and that of searched document are different. This paper tries to construct a bilingual lexicon from an English-Persian parallel corpus which can be directly applied to cross-language information retrieval tasks. For this purpose we used a statistical measure to compute the association value between every content word in the corpus and each of the content words in the corresponding sentence. The highest association value will belong to the most probable translation of the given word. The five highest scores belong to the five possible translations of the given word. The method was tested against a set of English words and the average accuracy of the method was obtained 87.33% for this experiment.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...

متن کامل

Automatic extraction of bilingual word pairs using inductive chain learning in various languages

In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important. However, previous works based on statistical methods are insufficien...

متن کامل

Extraction of Cross Language Term Correspondences

This paper describes a method for extracting translations of terms across languages, using parallel corpora. The extracted term correspondences are such that they are useful when performing query expansion for cross language information retrieval, or for bilingual lexicon extraction. The method makes use of the mutual information measure and allows for mapping between single wordto multi-word t...

متن کامل

Bilingual Terminology Extraction Using Multi-level Termhood

Purpose: Terminology is the set of technical words or expressions used in specific contexts, which denotes the core concept in a formal discipline and is usually applied in the fields of machine translation, information retrieval, information extraction and text categorization, etc. Bilingual terminology extraction plays an important role in the application of bilingual dictionary compilation, ...

متن کامل

Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction

Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008