Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis
Abstract
Improving the search and browsing experience in PubMed® is key to helping users find information of interest. This matters most when users explore an unfamiliar field and need a comprehensive view of a specific subject. One way to provide this panoramic picture is to identify sub-topics within a set of documents. We propose a method that finds such sub-topics, which we call themes, and computes a representative title for each theme from the documents it contains. The method combines a thematic clustering algorithm with the Pool Adjacent Violators algorithm to induce significant themes. For each theme, a title is then computed from PubMed document titles and theme-dependent term scores. We tested our system on five disease sets from OMIM® and evaluated the results using normalized pointwise mutual information and MeSH® terms. On both measures, the proposed approach outperformed Latent Dirichlet Allocation (LDA). The quality of the theme titles was also evaluated by comparing them with manually created titles.
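The abstract names two standard building blocks without spelling them out: the Pool Adjacent Violators algorithm (PAVA), which computes a least-squares non-decreasing (isotonic) fit to a sequence, and normalized pointwise mutual information (NPMI), a common topic-coherence measure. The following is a minimal sketch of both in plain Python, for illustration only. How the authors combine PAVA with their thematic clustering to select significant themes, and which NPMI variant (co-occurrence window, reference corpus, smoothing) they use, is not described here. The document-level co-occurrence counts and the helper names `pool_adjacent_violators`, `npmi`, and `topic_coherence` are assumptions, not the paper's implementation.

```python
from math import log
from typing import List, Optional, Sequence, Set


def pool_adjacent_violators(
    y: Sequence[float], w: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted least-squares non-decreasing (isotonic) fit of y using PAVA."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of pooled points].
    blocks: List[list] = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the newest one violates the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each block's mean back to one value per input point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def npmi(docs: Sequence[Set[str]], a: str, b: str) -> float:
    """NPMI of words a and b, using document-level co-occurrence (an assumption)."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_ab == 0.0:
        return -1.0  # never co-occur: lower bound by convention
    if p_ab == 1.0:
        return 1.0  # co-occur in every document: upper bound
    return log(p_ab / (p_a * p_b)) / -log(p_ab)


def topic_coherence(docs: Sequence[Set[str]], top_words: Sequence[str]) -> float:
    """Average NPMI over all pairs of a theme's top words (hypothetical helper)."""
    pairs = [
        (top_words[i], top_words[j])
        for i in range(len(top_words))
        for j in range(i + 1, len(top_words))
    ]
    return sum(npmi(docs, a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(pool_adjacent_violators([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
    docs = [{"gene", "mutation"}, {"gene", "mutation"}, {"protein"}, {"gene"}]
    print(topic_coherence(docs, ["gene", "mutation", "protein"]))
```

For example, `pool_adjacent_violators([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean and returns `[1.0, 2.5, 2.5, 4.0]`. NPMI rescales PMI by `-log p(a, b)` so that scores fall in [-1, 1], which makes coherence values comparable across themes.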
Similar resources
News2Images: Automatically Summarizing News Articles into Image-Based Contents via Deep Learning
Compact representation is a key issue for effective information delivery to users in mobile content-providing services. The issue is especially acute when text documents such as news articles are delivered through mobile services. Here we propose a method for generating compact image-based contents from news documents (News2Image). The proposed method consists of three modules for summarizing news...
Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity
A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is sub...
Thematic Analysis of Addiction-Related Articles in the MEDLINE Database Using Hierarchical Clustering: 1991-2014
Introduction: Addiction, which has recently attracted the attention of researchers, is a serious problem worldwide. The growth of relevant literature contributes to a better understanding of this problem and improves the interaction between executive organizations and academic institutions. It is important to identify the active subject areas within this field and to explore the topics which ar...
Text Mining Methods for Mapping Opinions from Georeferenced Documents
With the growing availability of large volumes of textual information on the Web, text mining techniques have attracted growing interest. One increasingly relevant text mining problem is the detection of textual expressions that refer to opinions on certain topics and services. A second text mining problem, which has also attracted growing interest, is the ide...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian science e-books using LDA topic modeling, evaluating the extracted keywords' similarity to a golden standard and users' views of the model keywords. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...