Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis

Authors

  • Sun Kim
  • Lana Yeganova
  • W. John Wilbur
Abstract

Improving the search and browsing experience in PubMed® is a key component in helping users detect information of interest. In particular, when exploring a novel field, it is important to provide a comprehensive view of a specific subject. One solution for providing this panoramic picture is to find sub-topics from a set of documents. We propose a method that finds sub-topics, which we refer to as themes, and computes a representative title for each theme based on the set of documents in it. The method combines a thematic clustering algorithm and the Pool Adjacent Violators algorithm to induce significant themes. Then, for each theme, a title is computed using PubMed document titles and theme-dependent term scores. We tested our system on five disease sets from OMIM® and evaluated the results based on normalized point-wise mutual information and MeSH® terms. For both performance measures, the proposed approach outperformed LDA. The quality of theme titles was also evaluated by comparing them with manually created titles.
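The abstract names normalized point-wise mutual information (NPMI) as one evaluation measure but does not say how it was computed. The following is a minimal sketch of the standard NPMI definition, estimated from document-level co-occurrence and averaged over the term pairs of one theme. The function names (`npmi`, `theme_coherence`), the representation of documents as sets of terms, the handling of edge cases, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def npmi(term_a, term_b, documents):
    """Standard NPMI of two terms, estimated from document co-occurrence.

    `documents` is a list of sets of terms (an assumed representation).
    The score lies in [-1, 1]: -1 for terms that never co-occur, 0 for
    independent terms, 1 for terms that always occur together.
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if count_ab == 0:
        return -1.0  # assumed convention for terms that never co-occur
    p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n
    if p_ab == 1.0:
        return 1.0  # assumed convention: avoids dividing by -log(1) = 0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def theme_coherence(terms, documents):
    """Average NPMI over all unordered pairs of a theme's terms."""
    pairs = list(combinations(terms, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


# Toy data, invented purely for illustration.
toy_documents = [
    {"insulin", "glucose"},
    {"insulin", "glucose"},
    {"tumor"},
    {"tumor"},
]
score = theme_coherence(["insulin", "glucose"], toy_documents)
```

A higher average suggests that the top terms of a theme tend to appear in the same documents, which is the usual reading of NPMI as a coherence score.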


Similar resources

News2Images: Automatically Summarizing News Articles into Image-Based Contents via Deep Learning

Compact representation is a key issue for effective information delivery to users in mobile content-providing services. In particular, it is more severe when providing text documents such as news articles on the mobile service. Here we propose a method for generating compact image-based contents from news documents (News2Image). The proposed method consists of three modules for summarizing news...


Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is sub...


Thematic Analysis of Addiction-Related Articles in the MEDLINE Database Using Hierarchical Clustering: 1991-2014

Introduction: Addiction, which has recently attracted the attention of researchers, is a serious problem worldwide. The growth of relevant literature contributes to a better understanding of this problem and improves the interaction between executive organizations and academic institutions. It is important to identify the active subject areas within this field and to explore the topics which ar...


Text Mining Methods for Mapping Opinions from Georeferenced Documents

With the growing availability of large volumes of textual information on the Web, text mining techniques have been gaining a growing interest. One specific text mining problem that is increasingly relevant relates to the detection of textual expressions that refer to opinions on certain topics and services. A second text mining problem, which has also been gaining a growing interest, is the ide...


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...




Publication date: 2015