Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment

Authors

  • Natalie Vargas
  • Carlos Ramisch
  • Helena de Medeiros Caseli

Abstract

We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
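The abstract describes a three-step pipeline but gives neither the exact feature definitions nor the combination weights. The block below is a minimal sketch of that pipeline under stated assumptions. It is not the authors' implementation, and several choices in it are illustrative:

- The input is a hypothetical sentence-aligned corpus in which each sentence pair already carries the MWE candidates found by monolingual extraction on each side.
- Translation probability is estimated as P(t | s) from sentence-level co-occurrence counts.
- The Dice coefficient stands in for the association score.
- Distributional similarity is approximated as the cosine between target-side context vectors. A source MWE's context is taken to be the target sentences it is aligned with.
- The name `rank_translations`, the equal weights and the toy Portuguese-English data are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical input format: one tuple per sentence pair holding
# (source MWE candidates, target MWE candidates, target-side tokens).
SentencePair = tuple[set[str], set[str], list[str]]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_translations(corpus: list[SentencePair], src_mwe: str,
                      weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
                      ) -> list[tuple[str, float]]:
    """Rank target MWEs co-occurring with src_mwe by a weighted feature sum."""
    c_src: Counter = Counter()   # sentence pairs containing each source MWE
    c_tgt: Counter = Counter()   # sentence pairs containing each target MWE
    c_pair: Counter = Counter()  # sentence pairs containing both (s, t)
    src_ctx = defaultdict(Counter)  # target tokens of pairs containing a source MWE
    tgt_ctx = defaultdict(Counter)  # target tokens of pairs containing a target MWE
    for src_mwes, tgt_mwes, tgt_tokens in corpus:
        for s in src_mwes:
            c_src[s] += 1
            src_ctx[s].update(tgt_tokens)
        for t in tgt_mwes:
            c_tgt[t] += 1
            tgt_ctx[t].update(tgt_tokens)
            for s in src_mwes:
                c_pair[s, t] += 1

    w_prob, w_assoc, w_sim = weights  # hypothetical weights, not from the paper
    scored = []
    for t in c_tgt:
        joint = c_pair[src_mwe, t]
        if joint == 0:
            continue  # only co-occurring pairs are translation candidates
        prob = joint / c_src[src_mwe]                    # P(t | s)
        dice = 2 * joint / (c_src[src_mwe] + c_tgt[t])   # association score
        sim = cosine(src_ctx[src_mwe], tgt_ctx[t])       # distributional sim.
        scored.append((t, w_prob * prob + w_assoc * dice + w_sim * sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy Portuguese-English data (invented for illustration, lemmatized MWEs).
toy_corpus: list[SentencePair] = [
    ({"tomar_banho"}, {"take_shower"}, ["i", "take", "a", "shower"]),
    ({"tomar_banho"}, {"take_shower", "make_decision"},
     ["we", "take", "a", "shower", "and", "make", "a", "decision"]),
    ({"tomar_decisão"}, {"make_decision"}, ["they", "make", "a", "decision"]),
]

for candidate, score in rank_translations(toy_corpus, "tomar_banho"):
    print(f"{candidate}\t{score:.3f}")
```

On the toy data, `take_shower` is ranked above `make_decision` for `tomar_banho`. `make_decision` still appears as a candidate only because it co-occurs with `tomar_banho` in one sentence pair. Because no word alignment is used, every co-occurring target MWE is a candidate. The linear combination therefore has to separate true translations from incidental co-occurrences, so in practice the weights would presumably need tuning rather than being fixed at 1.0.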

Similar resources

Cross-Lingual Variation of Light Verb Constructions: Using Parallel Corpora and Automatic Alignment for Linguistic Research

Cross-lingual parallelism and small-scale language variation have recently become the subject of research in both computational and theoretical linguistics. In this article, we use a parallel corpus and an automatic aligner to study English light verb constructions and their German translations. We show that parallel corpus data can provide new empirical evidence for better understanding the proper...

Breaking Bad: Parallel Subtitles Corpora and the Extraction of Verb-Particle Constructions

The automatic extraction of verb-particle constructions (VPCs) is of particular interest to the NLP community. Previous studies have shown that word alignment methods can be used with parallel corpora to successfully extract a range of multi-word expressions (MWEs). In this paper the method is applied to a new type of corpus, made up of a collection of subtitles of films and television series. ...

Identifying Phrasemes via Interlingual Association Measures - A Data-driven Approach on Dependency-parsed and Word-aligned Parallel Corpora

It has been understood for a long time that the semantic content of a combination of two or more words often cannot be derived from the semantics of the single words, but that the use of one particular word imposes restrictions upon others (Firth 1957; Evert 2004, 15–17). The semantics is then either determined by the ruling word, e.g., in the case of light verb constructions (attention entails...

Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT

We improve our recently proposed technique for integrating Arabic verb-subject constructions in SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (or main clause) and non-matrix Arabic verb-subject constructions. In gold translations, most matrix VS (main clause verb-subject) constructions are translated in inverted SV order, while non-matrix (subordinate clause) VS con...

Breaking Bad: Extraction of Verb-Particle Constructions from a Parallel Subtitles Corpus

The automatic extraction of verb-particle constructions (VPCs) is of particular interest to the NLP community. Previous studies have shown that word alignment methods can be used with parallel corpora to successfully extract a range of multi-word expressions (MWEs). In this paper the technique is applied to a new type of corpus, made up of a collection of subtitles of movies and television seri...

Publication date: 2017