Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Abstract
We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
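The final ranking step can be illustrated with a minimal sketch. It assumes the three feature scores have already been computed for each candidate and scaled to comparable ranges. The `Candidate` class, the `rank_candidates` function, the equal default weights, and all numeric values are hypothetical toy choices for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate target-language translation of one source MWE (hypothetical structure)."""
    target_mwe: str
    translation_prob: float  # assumed precomputed translation probability
    association: float       # assumed precomputed association score
    similarity: float        # assumed precomputed distributional similarity


def rank_candidates(candidates, weights=(1.0, 1.0, 1.0)):
    """Sort candidates by a weighted sum of the three features, best first.

    The weights are illustrative; the abstract does not say how they are set.
    """
    w_tp, w_as, w_ds = weights

    def score(c):
        return (w_tp * c.translation_prob
                + w_as * c.association
                + w_ds * c.similarity)

    return sorted(candidates, key=score, reverse=True)


# Toy candidates for an English light verb construction such as "make a decision".
# All scores are made up for the example.
candidates = [
    Candidate("eine Entscheidung treffen", 0.60, 0.70, 0.80),
    Candidate("treffen", 0.30, 0.20, 0.10),
    Candidate("eine Wahl treffen", 0.10, 0.40, 0.50),
]

ranked = rank_candidates(candidates)
for c in ranked:
    print(c.target_mwe)
```

Changing the weights changes the ranking: with `weights=(1.0, 0.0, 0.0)` only the translation probability counts, so the bare verb "treffen" moves ahead of "eine Wahl treffen".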
Similar resources
Cross-Lingual Variation of Light Verb Constructions: Using Parallel Corpora and Automatic Alignment for Linguistic Research
Cross-lingual parallelism and small-scale language variation have recently become a subject of research in both computational and theoretical linguistics. In this article, we use a parallel corpus and an automatic aligner to study English light verb constructions and their German translations. We show that parallel corpus data can provide new empirical evidence for better understanding the proper...
Breaking Bad: Parallel Subtitles Corpora and the Extraction of Verb-Particle Constructions
The automatic extraction of verb-particle constructions (VPCs) is of particular interest to the NLP community. Previous studies have shown that word alignment methods can be used with parallel corpora to successfully extract a range of multi-word expressions (MWEs). In this paper the method is applied to a new type of corpus, made up of a collection of subtitles of films and television series. ...
Identifying Phrasemes via Interlingual Association Measures - A Data-driven Approach on Dependency-parsed and Word-aligned Parallel Corpora
It has been understood for a long time that the semantic content of a combination of two or more words often cannot be derived from the semantics of the single words, but that the use of one particular word imposes restrictions upon others (Firth 1957; Evert 2004, 15–17). The semantics is then either determined by the ruling word, e.g., in the case of light verb constructions (attention entails...
Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT
We improve our recently proposed technique for integrating Arabic verb-subject constructions in SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (or main clause) and non-matrix Arabic verb-subject constructions. In gold translations, most matrix VS (main clause verb-subject) constructions are translated in inverted SV order, while non-matrix (subordinate clause) VS con...
Breaking Bad: Extraction of Verb-Particle Constructions from a Parallel Subtitles Corpus
The automatic extraction of verb-particle constructions (VPCs) is of particular interest to the NLP community. Previous studies have shown that word alignment methods can be used with parallel corpora to successfully extract a range of multi-word expressions (MWEs). In this paper the technique is applied to a new type of corpus, made up of a collection of subtitles of movies and television seri...