Benchmarking short text semantic similarity
نویسندگان
چکیده
Short Text Semantic Similarity measurement is a new and rapidly growing field of research. “Short texts” are typically sentence length but are not required to be grammatically correct. There is great potential for applying these measures in fields such as Information Retrieval, Dialogue Management and Question Answering. A dataset of 65 sentence pairs, with similarity ratings, produced in 2006 has become adopted as a de facto Gold Standard benchmark. This paper discusses the adoption of the 2006 dataset, lays down a number of criteria that can be used to determine whether a dataset should be awarded a “Gold Standard” accolade and illustrates its use as a benchmark. Procedures for the generation of further Gold Standard datasets in this field are recommended.
منابع مشابه
Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space
In order to improve the accuracy of short text similarity calculation, this paper presents the idea that use the history of short text messages to construct semantic feature space, then use the vector in semantic feature space to represent short text and do semantic extension, and finally calculate the short text similarity of corresponding vector in the semantic feature space. This method can ...
متن کاملPresentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کاملPresentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کاملCorpus-based and Knowledge-based Measures of Text Semantic Similarity
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere,...
متن کاملShort Text Similarity Measure Based on Double Vector Space Model
Short text similarity measure is the basis of classification and duplicate checking of the short texts. Allowing for the insufficient consideration of the sentence semantic and structure information in similarity calculation between two short texts, we propose a novel method of short text similarity calculation based on double vector space model on the basis of traditional vector space model. C...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJIIDS
دوره 4 شماره
صفحات -
تاریخ انتشار 2010