Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures
Abstract
Estimating the semantic relatedness between biomedical concepts is useful for many informatics applications. Automated methods fall into two broad categories: methods based on distributional statistics drawn from text corpora, and methods based on the structure of existing knowledge resources. The former disregard taxonomic structure; the latter do not consider semantically relevant empirical information. In this paper, we present a method that retrofits the context vector representation of MeSH terms using additional linkage information from the UMLS/MeSH hierarchy, so that linked concepts have similar vector representations. We evaluated the method against previously published physicians' and coders' ratings on sets of MeSH terms. Our experimental results demonstrate that the retrofitted word vector measures obtain a higher correlation with physician judgments. The results also demonstrate a clear improvement in correlation with experts' ratings for the retrofitted vector representation compared with the vector representation without retrofitting.
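The abstract does not spell out the update rule, so the following is only a minimal sketch of what retrofitting toward linked concepts can look like. It assumes the widely used iterative scheme in which each vector is repeatedly replaced by a weighted average of its original value and its neighbours' current values, with neighbour weights of 1/degree. The function names `retrofit` and `cosine`, the parameter `alpha`, the two-dimensional vectors, and the single link between the two terms are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrofit(vectors, edges, iterations=10, alpha=1.0):
    """Pull linked terms together while keeping each near its original vector.

    Assumed update (not confirmed by the abstract): each neighbour gets
    weight 1/degree, so the neighbour weights sum to 1 and the new vector
    is (neighbour average + alpha * original) / (1 + alpha).
    """
    original = {term: list(vec) for term, vec in vectors.items()}
    current = {term: list(vec) for term, vec in vectors.items()}

    # Symmetric neighbour sets, restricted to terms that have a vector.
    neighbours = {term: set() for term in vectors}
    for a, b in edges:
        if a in vectors and b in vectors and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    for _ in range(iterations):
        for term, nbrs in neighbours.items():
            if not nbrs:
                continue  # unlinked terms keep their original vector
            beta = 1.0 / len(nbrs)
            updated = []
            for k in range(len(original[term])):
                nbr_sum = sum(beta * current[n][k] for n in nbrs)
                updated.append((nbr_sum + alpha * original[term][k]) / (1.0 + alpha))
            current[term] = updated
    return current


# Toy data: hypothetical 2-d context vectors and one illustrative link.
toy_vectors = {
    "Heart Diseases": [1.0, 0.0],
    "Myocardial Infarction": [0.0, 1.0],
    "Asthma": [-1.0, 0.0],
}
toy_edges = [("Heart Diseases", "Myocardial Infarction")]

fitted = retrofit(toy_vectors, toy_edges)
before = cosine(toy_vectors["Heart Diseases"], toy_vectors["Myocardial Infarction"])
after = cosine(fitted["Heart Diseases"], fitted["Myocardial Infarction"])
print(f"cosine before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the two linked terms start out orthogonal and end up with a positive cosine similarity, while the unlinked term is left unchanged. A real evaluation would compare such similarity scores with the human ratings, for example through a rank correlation.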
Similar resources
Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity
MOTIVATION: Clustering MEDLINE documents is usually conducted by the vector space model, which computes the content similarity between two documents by basically using the inner-product of their word vectors. Recently, the semantic information of MeSH (Medical Subject Headings) thesaurus is being applied to clustering MEDLINE documents by mapping documents into MeSH concept vectors to be cluster...
Multilevel Measures of Document Similarity
Many applications such as document summarization, passage retrieval and question answering require a detailed analysis of semantic relations between terms within and across documents and sentences. Often one has a number of sentences or paragraphs and has to choose the candidate with the highest level of relevance for the topic or question. An additional requirement may be that the information ...
Two Similarity Metrics for Medical Subject Headings (MeSH):
In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same indi...
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
Exploring the Validity of Corpus-derived Measures of Semantic Similarity
Lexical co-occurrence counts from large corpora have been used to construct high-dimensional vector-space models of language. In this type of model, words are represented as vectors (or points) in a hyperspace, and distances between word vectors are generally considered to reflect semantic similarity. Two issues must be addressed if a vector-space model is to be used as a 'semantic' measuring dev...