Semantics-Driven Recognition of Collocations Using Word Embeddings

Authors

  • Sara Rodríguez-Fernández
  • Luis Espinosa Anke
  • Roberto Carlini
  • Leo Wanner

Abstract

L2 learners often produce "ungrammatical" word combinations such as *give a suggestion or *make a walk. This is due to the "collocationality" of one of their items (the base), which restricts the collocates that can express a specific meaning ('perform' in the examples above). We propose an algorithm that, given a base and the intended meaning of a collocate, returns the actual collocate lexeme(s) (make / take above). The algorithm learns a linear mapping between bases and collocates from examples, yielding a collocation transformation matrix that is then applied to novel, unseen cases. The evaluation suggests that this is a promising line of research in collocation discovery.
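The abstract does not say how the transformation matrix is fitted or how the collocate is read off. As a rough sketch only, the Python below assumes that one matrix is learned per collocate meaning (e.g. 'perform') by ordinary least squares over example base/collocate embedding pairs, and that the collocate is retrieved as the vocabulary word whose embedding is most cosine-similar to the projected base. The helper names `fit_collocation_matrix` and `predict_collocate` are hypothetical, and all vectors are synthetic random data, not real embeddings or results from the paper.

```python
import numpy as np


def fit_collocation_matrix(base_vecs, colloc_vecs):
    """Hypothetical helper: least-squares fit of W such that base_vecs @ W ≈ colloc_vecs.

    Row i of each (n_pairs, dim) array holds the embeddings of one example
    base/collocate pair sharing the same collocate meaning (an assumption).
    """
    W, *_ = np.linalg.lstsq(base_vecs, colloc_vecs, rcond=None)
    return W


def predict_collocate(base_vec, W, vocab):
    """Hypothetical helper: project a base through W, return the closest word by cosine."""
    projected = base_vec @ W
    best_word, best_sim = None, -np.inf
    for word, vec in vocab.items():
        sim = projected @ vec / (np.linalg.norm(projected) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word


# Synthetic illustration: collocate vectors come from a hidden linear map plus small noise.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 20
W_true = rng.normal(size=(dim, dim))
train_bases = rng.normal(size=(n_pairs, dim))
train_collocs = train_bases @ W_true + 0.01 * rng.normal(size=(n_pairs, dim))

W = fit_collocation_matrix(train_bases, train_collocs)

# An unseen base; the candidate vocabulary holds its true collocate plus random distractors.
new_base = rng.normal(size=dim)
vocab = {
    "target": new_base @ W_true,
    "distractor_a": rng.normal(size=dim),
    "distractor_b": rng.normal(size=dim),
}
print(predict_collocate(new_base, W, vocab))
```

With real embeddings the base/collocate relation is far noisier than in this toy setup, and the candidate vocabulary might plausibly be restricted to words of the expected category (e.g. verbs for 'perform'); the example only illustrates the mechanics of learning a linear base-to-collocate map and applying it to an unseen base.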

Similar resources

Conditional Random Fields for Spanish Named Entity Recognition Using Unsupervised Features

Unsupervised features based on word representations such as word embeddings and word collocations have been shown to significantly improve supervised NER for English. In this work we investigate whether such unsupervised features can also boost supervised NER in Spanish. To do so, we use word representations and collocations as additional features in a linear chain Conditional Random Field (CRF) cla...

Modeling the Non-Substitutability of Multiword Expressions with Distributional Semantics and a Log-Linear Model

Non-substitutability is a property of Multiword Expressions (MWEs) that often causes lexical rigidity and is relevant for most types of MWEs. Efficient identification of this property can result in the efficient identification of MWEs. In this work we propose using distributional semantics, in the form of word embeddings, to identify candidate substitutions for a candidate MWE and model its sub...

Using bilingual word-embeddings for multilingual collocation extraction

This paper presents a new strategy for multilingual collocation extraction which takes advantage of parallel corpora to learn bilingual word-embeddings. Monolingual collocation candidates are retrieved using Universal Dependencies, while the distributional models are then applied to search for equivalents of the elements of each collocation in the target languages. The proposed method extracts ...

Can we determine the semantics of collocations without using semantics?

The extraction of collocations from corpora has been actively worked on since the late eighties. However, so far, an important task of collocation processing, namely the semantic interpretation of the collocate, has not received much attention, although the semantics of a given word when used as a collocate very often differs from the semantics of this word when used in a free co-occurrence. In thi...

Discriminative Ability of WordNet Senses on the Task of Detecting Lexical Functions in Spanish Verb Noun Collocations

Collocations, or restricted lexical co-occurrence, are a difficult issue in natural language processing because their semantics cannot be derived from the semantics of their constituents. Therefore, such verb-noun combinations as “take a break,” “catch a bus,” “have lunch” can be interpreted incorrectly by automatic semantic analysis. Since collocations are combinations frequently used in texts...

Publication date: 2016