Corpus Based Analysis for Multilingual Terminology Entry Compounding

نویسندگان

  • Andrejs Vasiljevs
  • Kaspars Balodis
چکیده

This paper proposes statistical analysis methods for improvement of terminology entry compounding. Terminology entry compounding is a mechanism that identifies matching entries across multiple multilingual terminology collections. Bilingual or trilingual term entries are unified in compounded multilingual entry. We suggest that corpus analysis can improve entry compounding results by analysing contextual terms of given term in the corpus data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus

We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three dif...

متن کامل

Computer-aided analysis of multilingual patent documentation

This paper deals with the processing of a multilingual corpus of technical texts. The aim is to extract special purpose terminology. A semi-automatic tool is developed to help professional translators and terminologists not only to identify technical terms but also to detect possible translation equivalences and typical contexts of terms. Definitions of terminology reported in the literature ar...

متن کامل

Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction

Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...

متن کامل

LoLo: A System Based On Terminology For Multilingual Extraction

An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be ‘language-independent’ in that its use leads to sets of regular-expressions that can be used to extract the information in typologically distinct languages like English and Arabic. The method uses the i...

متن کامل

Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora

This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, a special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010