Algebraic Compositional Models for Semantic Similarity in Ranking and Clustering
Authors
Abstract
Although distributional models of word meaning have been widely used in Information Retrieval, providing an effective representation and generalization scheme for words in isolation, composing words into phrases or sentences is still a challenging task. Different methods have been proposed that exploit syntactic structures to combine words through algebraic operators (e.g., the tensor product) applied to the vectors that represent the lexical constituents. In this paper, a novel approach to semantic composition is proposed, based on space projection techniques over the basic geometric lexical representations. In the geometric perspective pursued here, syntactic bi-grams are projected into the so-called Support Subspace, which aims to emphasize the semantic features shared by the compound words and to better capture phrase-specific aspects of the involved lexical meanings. State-of-the-art results are achieved on a well-known benchmark for the phrase similarity task, and the generalization capability of the proposed operators is investigated in a cross-linguistic scenario, i.e., for English and Italian.
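The abstract does not spell out the projection operator, so the following is only a minimal sketch of the general idea, under explicit assumptions: the support subspace of a word pair is taken to be the `k` dimensions with the largest component-wise product of the two word vectors (i.e., the features both words share most strongly), two bi-grams are compared in the union of their supports, and the head and modifier similarities are combined by product or by sum. The function names `support_subspace`, `project`, `phrase_similarity`, the parameters `k` and `combine`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def support_subspace(u, v, k):
    # Assumption: the "support" of a word pair is the k dimensions where both
    # vectors carry the most shared weight, ranked by the product u_i * v_i.
    scores = [a * b for a, b in zip(u, v)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def project(x, dims):
    # Keep only the selected dimensions of x (a projection onto the subspace).
    return [x[i] for i in dims]


def cosine(a, b):
    # Standard cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def phrase_similarity(pair1, pair2, k, combine="product"):
    # Each pair is (head_vector, modifier_vector) for a syntactic bi-gram.
    (h1, m1), (h2, m2) = pair1, pair2
    # Hypothetical choice: compare both bi-grams in the union of their supports.
    dims = sorted(set(support_subspace(h1, m1, k)) | set(support_subspace(h2, m2, k)))
    sim_heads = cosine(project(h1, dims), project(h2, dims))
    sim_mods = cosine(project(m1, dims), project(m2, dims))
    if combine == "product":
        return sim_heads * sim_mods
    if combine == "sum":
        return sim_heads + sim_mods
    raise ValueError(f"unknown combine mode: {combine!r}")


if __name__ == "__main__":
    # Toy 4-dimensional vectors, invented purely for illustration.
    head_1, mod_1 = [1.0, 2.0, 0.0, 3.0], [0.5, 1.0, 4.0, 2.0]
    head_2, mod_2 = [0.9, 2.2, 0.1, 2.5], [0.4, 1.1, 0.2, 2.1]
    print(support_subspace(head_1, mod_1, k=2))  # → [1, 3]
    print(phrase_similarity((head_1, mod_1), (head_2, mod_2), k=2))
```

Restricting the comparison to a small shared subspace is what lets the measure focus on the senses the two words activate together, rather than on every dimension of each word vector. Whether the product or the sum of the head and modifier similarities works better is an empirical question that this sketch does not settle.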
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most popular methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi-compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising re...
Towards Compositional Tree Kernels
Distributional Compositional Semantics (DCS) methods combine lexical vectors according to algebraic operators or functions to model the meaning of complex linguistic phrases. On the other hand, several textual inference tasks rely on supervised kernel-based learning, and Tree Kernels (TK) have been shown to be suitable for modeling syntactic and semantic similarity between linguistic instan...
BUAP: Evaluating Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
The results obtained by the BUAP team at Task 1 of SemEval 2014 are presented in this paper. The submitted run is a supervised version based on two classification models: 1) we used logistic regression to determine the semantic relatedness between a pair of sentences, and 2) we employed support vector machines to identify the degree of textual entailment between the two sentences. The behaviour...