Measuring Semantic Similarity in Short Texts through Greedy Pairing and Word Semantics
Authors
Abstract
In this paper we propose a greedy method for measuring semantic similarity between short texts. Our method is based on the principle of compositionality, which states that the overall meaning of a sentence can be captured by summing up the meanings of its parts, i.e. the meanings of words in our case. Based on this principle, we extend word-to-word semantic similarity metrics to quantify semantic similarity at the sentence level. We report results using several word-to-word semantic similarity metrics, based on word knowledge or vectorial representations of meaning. Our approach performs better than similar approaches on paraphrase identification and recognizing textual entailment, two illustrative semantic similarity tasks. We also report the effect of word weighting and of function words on the performance of the proposed method.
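The greedy pairing idea described in the abstract might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring formula, so the uniform word weights, the normalization by text length, the symmetric averaging of the two directions, and the names `TOY_SIM`, `word_sim`, `directed_score`, and `greedy_similarity` are all assumptions introduced here. The toy lexicon stands in for a real word-to-word metric.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy lexicon of word-to-word similarities (illustrative values only);
# a real system would plug in a knowledge-based or vector-based metric instead.
TOY_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"buy", "purchase"}): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a lexicon value if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def directed_score(
    src: List[str], tgt: List[str], sim: Callable[[str, str], float]
) -> float:
    """Greedily pair each source word with its most similar target word.

    Assumption: every word has weight 1, so the score is the mean of the
    best-match similarities over the source words.
    """
    if not src or not tgt:
        return 0.0
    return sum(max(sim(w, t) for t in tgt) for w in src) / len(src)


def greedy_similarity(
    a: List[str], b: List[str], sim: Callable[[str, str], float] = word_sim
) -> float:
    """Average the two directed scores to obtain a symmetric similarity."""
    return 0.5 * (directed_score(a, b, sim) + directed_score(b, a, sim))


if __name__ == "__main__":
    s1 = ["i", "buy", "car"]
    s2 = ["i", "purchase", "automobile"]
    print(greedy_similarity(s1, s2))
```

For a paraphrase or entailment decision, such a score would typically be compared against a threshold tuned on training data; word weighting (for example, down-weighting function words) could replace the uniform weights assumed above.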
Similar resources
Conceptual changes of Mihrab, emphasizing on third and fourth century AH sources
Mihrab existed before Islam. This word became one of the main components of Islamic mosques after the Islamic conquest. Changes in the structure and function of Mihrab during these two periods can be studied by various methods. Examining conceptual changes is one method of historical study. This article aims to investigate a part of conceptual changes that reflects Mihrab’s structural an...
The SIMILAR Corpus: A Resource To Foster The Qualitative Understanding of Semantic Similarity of Texts
We describe in this paper the SIMILAR corpus, which was developed to foster a deeper and qualitative understanding of word-to-word semantic similarity metrics and their role in the more general problem of text-to-text semantic similarity. The SIMILAR corpus fills a gap in existing resources that are meant to support the development of text-to-text similarity methods based on word-level similarit...
The Semantics of the Word Istikbar (Arrogance) in the Holy Quran based on Syntagmatic Relations (A Case Study of Semantic Proximity and Semantic Contrast)
The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...
Semantic Similarity of Short Texts
This paper presents a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. In this paper, we focus on computing the sim...
A Method for Measuring Sentence Similarity and Its Application to Conversational Agents
This paper presents a novel algorithm for computing similarity between very short texts of sentence length. It will introduce a method that takes account of not only semantic information but also word order information implied in the sentences. Firstly, semantic similarity between two sentences is derived from information from a structured lexical database and from corpus statistics. Secondly, ...