Toward a Bilingual Lexical Database on Connectives: Exploiting a German/Italian Parallel Corpus
Abstract
English. We report on experiments to validate and extend two language-specific connective databases (German and Italian) using a word-aligned corpus. This is a first step toward constructing a bilingual lexicon on connectives that are connected via their discourse senses.
Italiano. Presentiamo una serie di esperimenti per validare ed estendere due database dei connettivi, che sono specifici per la lingua italiana e per quella tedesca. Abbiamo utilizzato un corpus parallelo allineato a livello della parola. Si tratta di un primo passo verso la costruzione di un lessico bilingue dei connettivi che sono collegati attraverso i loro sensi del discorso.
Similar resources
Implementing a Bilingual Lexical Database System
The current state of progress of a research project for the design and development of a bilingual, Italian-English/English-Italian, lexical database system is presented. The aim is to create an integrated system in which a number of monolingual electronic dictionaries and/or lexical databases can be linked through the medium of a bilingual database. In addition, procedures are being implemented...
Using Parallel Corpora to enrich Multilingual Lexical Resources
This paper describes the use of a bilingual vector model for the automatic discovery of German translations of English terms. The model is built by analysing co-occurrence patterns in a parallel corpus of English and German medical abstracts, a method also used for Cross-Lingual Information Retrieval. The model generates candidate German translations of English words using the cosine similarity m...
The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction
In this paper, we present a trilingual parallel corpus for German, Italian and Romansh, a Swiss minority language spoken in the canton of Grisons. The corpus called ALLEGRA contains press releases automatically gathered from the website of the cantonal administration of Grisons. Texts have been preprocessed and aligned with a current state-of-the-art sentence aligner. The corpus is one of the f...
Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relations between terms given by UMLS, a method which achieves 74% prec...
ItalWordNet: extending and exploiting an existing resource for computational tasks
In this paper we discuss how the ItalWordNet semantic database, being built by extending the Italian wordnet developed within the EuroWordNet project, is being exploited for the lexical semantic annotation of a corpus of Italian.