Corpus Based Analysis for Multilingual Terminology Entry Compounding
Authors
Abstract
This paper proposes statistical analysis methods for improving terminology entry compounding. Terminology entry compounding is a mechanism that identifies matching entries across multiple multilingual terminology collections and unifies bilingual or trilingual term entries into a single compounded multilingual entry. We suggest that corpus analysis can improve entry compounding results by analysing the contextual terms of a given term in the corpus data.
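The abstract does not state which statistical measure is used, so the following is only a minimal Python sketch of the general idea under explicit assumptions. Entries are hypothetical dicts keyed by language code. Two entries become a candidate pair when they share a term in a pivot language, and the pair is compounded only if the term's co-occurrence contexts in each collection's corpus are similar by cosine similarity. The per-collection corpora, the window size, the stopword list and the 0.2 threshold are all illustrative choices, not the authors' method.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list so that function words do not dominate contexts.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "from", "to"}


def context_vector(term, sentences, window=3):
    """Count content words within `window` tokens of each occurrence of `term`.

    Assumes single-token terms and whitespace tokenization (no punctuation handling).
    """
    term = term.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            for j in range(lo, min(len(tokens), i + window + 1)):
                if j != i and tokens[j] not in STOPWORDS:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def compound(entries_a, corpus_a, entries_b, corpus_b, pivot="en", threshold=0.2):
    """Merge entries from two collections that share the pivot-language term
    and whose term occurs in similar contexts in each collection's corpus."""
    merged = []
    for ea in entries_a:
        for eb in entries_b:
            term = ea.get(pivot)
            if term is None or term != eb.get(pivot):
                continue  # no shared pivot term: not a candidate pair
            sim = cosine(context_vector(term, corpus_a), context_vector(term, corpus_b))
            if sim >= threshold:
                merged.append({**ea, **eb})  # union of the language fields
    return merged


# Hypothetical example: two finance collections and one geography collection.
fin_lv = [{"en": "bank", "lv": "banka"}]
corpus_fin = ["the bank approved the loan", "deposit money in the bank account"]
fin_ru = [{"en": "bank", "ru": "банк"}]
corpus_fin2 = ["the bank charged a loan fee", "transfer money from the bank"]
geo_ru = [{"en": "bank", "ru": "берег"}]
corpus_geo = ["the river bank flooded", "grass on the bank of the stream"]

print(compound(fin_lv, corpus_fin, fin_ru, corpus_fin2))
# → [{'en': 'bank', 'lv': 'banka', 'ru': 'банк'}]
print(compound(fin_lv, corpus_fin, geo_ru, corpus_geo))
# → []
```

In this toy data the finance contexts share "loan" and "money", so the English-Latvian and English-Russian entries for "bank" are compounded, while the river-bank entry has no overlapping context words and stays separate even though the pivot term matches.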
Similar resources
Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus
We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction modules for three dif...
Computer-aided analysis of multilingual patent documentation
This paper deals with the processing of a multilingual corpus of technical texts. The aim is to extract special purpose terminology. A semi-automatic tool is developed to help professional translators and terminologists not only to identify technical terms but also to detect possible translation equivalences and typical contexts of terms. Definitions of terminology reported in the literature ar...
Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction
Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...
LoLo: A System Based On Terminology For Multilingual Extraction
An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be ‘language-independent’ in that its use leads to sets of regular-expressions that can be used to extract the information in typologically distinct languages like English and Arabic. The method uses the i...
Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora
This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...