Designing a Tool for Exploiting Bilingual Comparable Corpora

نویسندگان

  • Peter Bennison
  • Lynne Bowker
چکیده

Translators have a real need for a tool that will allow them to exploit information contained in bilingual comparable corpora. ExTrECC is designed to be a semi-automatic tool that processes bilingual comparable corpora and presents a translator with a list of potential equivalents (in context) of the search term. The task of identifying translation equivalents in a non-aligned, non-translated corpus is a difficult one, and ExTrECC makes use of a number of techniques, some of which are simple and others more sophisticated. The basic design of ExTrECC (graphical user interface, architecture, algorithms) is outlined in this paper.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

استخراج پیکره‌ موازی از اسناد قابل‌مقایسه برای بهبود کیفیت ترجمه در سیستم‌های ترجمه ماشینی

Data used for training statistical machine translation method are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation but due to lack of these kinds of texts, non-parallel and comparable corpora are used either for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...

متن کامل

Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora

This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, a special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...

متن کامل

Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled documents in a source language (e.g. Italian). In this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resour...

متن کامل

Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons

We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We shall exploit the large scale and the structure of Wikipedia articles to extract two resources that will be very useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions and we extract bilingual lexicons from inte...

متن کامل

Named Entities Translation Based On Comparable Corpora

In this paper we present a system for translating named entities from Basque to Spanish based on comparable corpora. For that purpose we have tried two approaches: one based on Basque linguistic features, and a language-independent tool. For both tools we have used BasqueSpanish comparable corpora, a bilingual dictionary and the web as resources.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000