Toward a Bilingual Lexical Database on Connectives: Exploiting a German/Italian Parallel Corpus

Authors

  • Peter Bourgonje
  • Yulia Grishina
  • Manfred Stede

Abstract

We report on experiments to validate and extend two language-specific connective databases (German and Italian) using a word-aligned parallel corpus. This is a first step toward constructing a bilingual lexicon in which connectives are linked via their discourse senses.
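
The general idea of using word alignments to check one connective database against another can be illustrated with a minimal sketch. It assumes single-word connectives, a toy word-aligned German/Italian corpus given as (index, index) pairs, and two tiny lexicons; all data and function names here (`CORPUS`, `project_connectives`, `split_candidates`) are hypothetical and do not reproduce the authors' pipeline. German connectives are projected onto their aligned Italian tokens. An aligned Italian word already in the Italian database supports the pair (validation). An unlisted word becomes a candidate for extending the database.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (German tokens, Italian tokens, word alignment),
# where the alignment is a list of (de_index, it_index) pairs from some word aligner.
CORPUS = [
    (["ich", "blieb", "weil", "es", "regnete"],
     ["sono", "rimasto", "perché", "pioveva"],
     [(0, 0), (1, 1), (2, 2), (4, 3)]),
    (["aber", "er", "kam", "nicht"],
     ["ma", "non", "è", "venuto"],
     [(0, 0), (2, 3), (3, 1)]),
    (["weil", "er", "müde", "war"],
     ["poiché", "era", "stanco"],
     [(0, 0), (2, 2), (3, 1)]),
]

# Hypothetical single-word connective lexicons (real databases are larger
# and also contain multi-word connectives).
DE_CONNECTIVES = {"weil", "aber", "obwohl"}
IT_CONNECTIVES = {"perché", "ma"}


def project_connectives(corpus, de_lexicon):
    """Count the Italian tokens aligned to each German lexicon entry."""
    table = defaultdict(Counter)
    for de_toks, it_toks, alignment in corpus:
        for de_i, it_i in alignment:
            word = de_toks[de_i].lower()
            if word in de_lexicon:
                table[word][it_toks[it_i].lower()] += 1
    return table


def split_candidates(table, it_lexicon):
    """Split aligned pairs into ones confirmed by the Italian lexicon and new candidates."""
    validated, new = set(), set()
    for de_word, counts in table.items():
        for it_word in counts:
            target = validated if it_word in it_lexicon else new
            target.add((de_word, it_word))
    return validated, new


if __name__ == "__main__":
    table = project_connectives(CORPUS, DE_CONNECTIVES)
    validated, new = split_candidates(table, IT_CONNECTIVES)
    print("validated:", sorted(validated))
    print("new candidates:", sorted(new))
```

On this toy data, `("weil", "perché")` and `("aber", "ma")` are validated, and `("weil", "poiché")` surfaces as a new Italian candidate. Because alignments are noisy and many connective forms are ambiguous (Italian "perché" is also the interrogative "why"), such candidates would still need manual or sense-based filtering.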

Publication date: 2017