Modeling the Non-Substitutability of Multiword Expressions with Distributional Semantics and a Log-Linear Model
Authors
Abstract
Non-substitutability is a property of Multiword Expressions (MWEs) that often causes lexical rigidity and is relevant for most types of MWEs. Identifying this property efficiently can therefore make MWE identification more efficient. In this work we propose using distributional semantics, in the form of word embeddings, to identify candidate substitutions for a candidate MWE and to model its substitutability. We use our models to rank MWEs by their lexical rigidity and compare their performance with association measures. We also study the interaction between our models and association measures. We show that one of our models can significantly improve over the association measure baselines in identifying collocations.
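The substitution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' log-linear model: it is a simple count-ratio score written under stated assumptions. The toy embeddings, the bigram counts, and the helper names `cosine`, `nearest_neighbours` and `rigidity` are all hypothetical and chosen only for illustration. Each component word of a candidate bigram is swapped for its nearest embedding neighbour, and the candidate is treated as more rigid when the substituted variants are rare relative to the original.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbours(word, embeddings, k):
    """Return the k words whose vectors are most similar to `word`."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

def rigidity(bigram, embeddings, counts, k=1):
    """Share of corpus counts held by the original bigram among the
    original plus its substituted variants (add-one smoothed).
    Higher values mean the bigram resists substitution."""
    w1, w2 = bigram
    original = counts.get(bigram, 0)
    variants = [(s, w2) for s in nearest_neighbours(w1, embeddings, k)]
    variants += [(w1, s) for s in nearest_neighbours(w2, embeddings, k)]
    variant_total = sum(counts.get(v, 0) for v in variants)
    return (original + 1) / (original + variant_total + 2)

# Hypothetical toy data, invented for this example only.
embeddings = {
    "strong":   [1.0, 0.1, 0.0],
    "powerful": [0.9, 0.2, 0.0],
    "hot":      [0.0, 0.1, 1.0],
    "warm":     [0.1, 0.2, 0.9],
    "tea":      [0.0, 1.0, 0.1],
    "coffee":   [0.1, 1.0, 0.2],
}
counts = {
    ("strong", "tea"): 40, ("strong", "coffee"): 35,
    ("hot", "tea"): 40, ("warm", "tea"): 25, ("hot", "coffee"): 45,
}

candidates = [("hot", "tea"), ("strong", "tea")]
ranked = sorted(candidates,
                key=lambda b: rigidity(b, embeddings, counts),
                reverse=True)
print(ranked)
```

In this toy setting "strong tea" ranks above "hot tea" because "powerful tea" never occurs, whereas "warm tea" and "hot coffee" are both frequent. A real system would replace the toy dictionaries with trained embeddings and corpus counts, and could combine such a score with association measures as the abstract describes.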
Similar resources
Modeling Semantic Compositionality of Croatian Multiword Expressions
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate model...
Determining the Semantic Compositionality of Croatian Multiword Expressions
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Being able to automatically determine the semantic (non-)compositionality of MWEs is important for many natural language processing tasks. We address the task of determining the semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics...
Multilingual Wordnet sense Ranking using nearest context
In this paper, we combine methods to estimate sense rankings from raw text with recent work on word embeddings to provide sense ranking estimates for the entries in the Open Multilingual Wordnet (OMW). The existing Word2Vec Polyglot2 pre-trained models are only built for single word entries; we therefore re-train them with multiword expressions from the wordnets, so that multiword expressions ...
Log-linear models and latent semantic indexing applied to mwe identification
A short introduction characterizes the task of identification of multiword expressions and their idiosyncratic properties. Then, this document gives a detailed description of log-linear models and latent semantic analysis. The description enumerates components of the models, estimation techniques for the model parameters and addresses the interpretation of the models and their evaluation. We als...
Semantics-based Multiword Expression Extraction
This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributio...