Blended metrics for novel sentence mining

Authors

  • Wenyin Tang
  • Flora S. Tsai
  • Lihui Chen
Abstract

With the abundance of raw text documents available on the internet, many articles contain redundant information. Novel sentence mining can discover novel, yet relevant, sentences given a specific topic defined by a user. In real-time novelty mining, an important issue is how to select a suitable novelty metric that quantitatively measures the novelty of a particular sentence. To utilize the merits of different metrics, a blended metric is proposed that combines the cosine similarity and new word count metrics. The blended metric has been tested on TREC 2003 and TREC 2004 Novelty Track data. The experimental results show that the blended metric generally performs better on topics with different ratios of novelty, which is useful for real-time novelty mining in topics with varying degrees of novelty. © 2009 Elsevier Ltd. All rights reserved.
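
The abstract names the two ingredients of the blended metric (cosine similarity and new word count) but not the exact formula. The following Python sketch is one plausible reading, not the authors' method. It assumes that cosine novelty is one minus the maximum cosine similarity to any earlier sentence, that the new word count is normalized by the number of distinct words in the sentence, and that the two scores are mixed linearly. The names `tokenize`, `cosine_novelty`, `new_word_ratio`, `blended_novelty`, and the weight `alpha` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def cosine_novelty(sentence, history):
    """Assumed form: 1 minus the highest similarity to any earlier sentence."""
    if not history:
        return 1.0
    return 1.0 - max(cosine_similarity(sentence, h) for h in history)


def new_word_ratio(sentence, history):
    """Assumed form: share of distinct words not seen in earlier sentences."""
    words = set(sentence)
    if not words:
        return 0.0
    seen = set().union(*history)
    return len(words - seen) / len(words)


def blended_novelty(sentence, history, alpha=0.5):
    """Hypothetical linear blend of the two novelty scores (alpha in [0, 1])."""
    return alpha * cosine_novelty(sentence, history) + (1 - alpha) * new_word_ratio(
        sentence, history
    )


if __name__ == "__main__":
    history = [tokenize("The cat sat on the mat.")]
    for text in ["The cat sat on the mat.", "The cat chased a mouse."]:
        score = blended_novelty(tokenize(text), history)
        print(f"{score:.3f}  {text}")
```

In a real-time setting, a sentence would be reported as novel when its score exceeds a chosen threshold and would then be appended to `history`. Both the threshold and `alpha` would need tuning on data such as the TREC Novelty Track collections mentioned in the abstract.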


Similar resources

From bursty patterns to bursty facts: The effectiveness of temporal text mining for news

Many document collections are by nature dynamic, evolving as the topics or events they describe change. The goal of temporal text mining is to discover bursty patterns and to identify and highlight these changes to better enable readers to track stories. Here, we focus on the news domain, where the changes revolve around novel, previously unpublished, “facts” that have an effect on the story de...


Contextual Abstraction Based Clustering Technique for Effective Text Document Mining

Document clustering is considered to be the essential process in grouping the unsupervised documents for effectual applications in text mining and information retrieval. Recently, many research works have been developed for text document clustering. However, the performance of text document clustering is not effective. In order to overcome such limitation, a novel Contextual Abstraction based Do...


Quantifying the informativeness for biomedical literature summarization: An itemset mining method

OBJECTIVE: Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our ...


Mining the Correlation between Human and Automatic Evaluation at Sentence Level

Automatic evaluation metrics are fast and cost-effective measurements of the quality of a Machine Translation (MT) system. However, as humans are the end-user of MT output, human judgement is the benchmark to assess the usefulness of automatic evaluation metrics. While most studies report the correlation between human evaluation and automatic evaluation at corpus level, our study examines their...


Characteristics of Pro-c Analogies and Blends between Research Publications

Dr Inventor is a tool that aims to enhance the professional (Pro-c) creativity of researchers by suggesting novel hypotheses, arising from analogies between publications. Dr Inventor processes original research documents using a combination of lexical analysis and cognitive computation to identify novel comparisons that suggest new research hypotheses, with the objective of supporting a novel r...



Journal title:
  • Expert Syst. Appl.

Volume 37

Publication date: 2010