Word-pair extraction for lexicography

Authors

  • Chris Brew
  • David McKelvie
Abstract

We describe an application of sentence alignment techniques and approximate string matching to the problem of extracting lexicographically interesting word-word pairs from multilingual corpora. Since our interest is in support systems for lexicographers rather than in fully automatic construction of lexicons, we would like to provide access to parameters allowing a tunable trade-off between precision and recall. We evaluate two techniques for doing this. Since sentence alignment tends to associate semantically similar words, while approximate string matching draws attention to orthographic similarities, they can be used to serve different lexicographic purposes, as can the combination of the two techniques, which amounts, inter alia, to a tool for uncovering faux amis. We conclude by sketching a simple and flexible means for allowing lexicographers to provide information which has the potential to improve system performance.
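As a rough illustration of the tunable trade-off mentioned in the abstract, the following minimal sketch proposes orthographically similar word pairs and filters them with a threshold. It is not the authors' method: the similarity measure (Python's `difflib.SequenceMatcher` ratio), the function names `similarity` and `candidate_pairs`, the threshold values, and the example word lists are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1] (difflib ratio, a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(source_words, target_words, threshold=0.8):
    """Return (source, target, score) triples whose score meets the threshold.

    Raising the threshold favours precision; lowering it favours recall.
    """
    pairs = []
    for s in source_words:
        for t in target_words:
            score = similarity(s, t)
            if score >= threshold:
                pairs.append((s, t, score))
    # Best-scoring candidates first, so they appear at the top of the list.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical word lists, imagined as coming from one aligned sentence pair.
english = ["the", "library", "is", "important"]
french = ["la", "librairie", "est", "importante"]

strict = candidate_pairs(english, french, threshold=0.9)
loose = candidate_pairs(english, french, threshold=0.6)
```

With the strict threshold only important/importante survives; the looser threshold also admits library/librairie. String similarity alone cannot say whether such a pair is a true cognate or a faux ami, which is why the abstract points to combining it with alignment evidence.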


Similar resources

Computational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora

We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...


The Application of Fuzzy Logic to Collocation Extraction

Collocations are important for many tasks of Natural language processing such as information retrieval, machine translation, computational lexicography etc. So far many statistical methods have been used for collocation extraction. Almost all the methods form a classical crisp set of collocation. We propose a fuzzy logic approach of collocation extraction to form a fuzzy set of collocations in ...


Statistical Alignment Models for Translational Equivalence

The ever-increasing amount of parallel data opens a rich resource to multilingual natural language processing, enabling models to work on various translational aspects like detailed human annotations, syntax and semantics. With efficient statistical models, many cross-language applications have seen significant progresses in recent years, such as statistical machine translation, speech-to-speec...


Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

The paper deals with corpus-based methods applied to the particular tasks of lexical database construction. Different techniques of the corpus analysis are discussed and their applicability for the tasks is assessed. Corpus management system Manatee + Bonito developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that enables to perform all di...


ELexBI, A BASIC TOOL FOR BILINGUAL TERM EXTRACTION FROM SPANISH-BASQUE PARALLEL CORPORA

We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a previous monolingual extraction of term candidates in each language, the...




Publication date: 1996