Online Sentence Novelty Scoring for Topical Document Streams
Abstract
The enormous amount of information on the Internet has raised the challenge of highlighting new information in the context of already viewed content. This type of intelligent interface can save users time and prevent frustration. Our goal is to scale up novelty detection to large web properties like Google News and Yahoo News. We present a set of lightweight features for online novelty scoring and fast nonlinear feature transformation methods. Our experimental results on the TREC 2004 shared task datasets show that the proposed method is not only efficient but also very powerful, significantly surpassing the best system at TREC 2004.
Related resources
Efficient Online Novelty Detection in News Streams
Novelty detection in text streams is a challenging task that emerges in quite a few different scenarios, ranging from email threads to RSS news feeds on a cell phone. An efficient novelty detection algorithm can save the user a great deal of time when accessing interesting information. Most of the recent research on the detection of novel documents in text streams uses either geometric distance...
Document-to-Sentence Level Technique for Novelty Detection
Novelty identification is accustomed to distinguishing novel data from an approaching stream of documents. In this study, we proposed a novel methodology for document-level novelty identification by utilizing document-to-sentence-level strategy. This work first splits a document into sentences, decides the novelty of every sentence, then registers the record-level novelty score in view of an al...
An Improved System for Sentence-level Novelty Detection in Textual Streams
Novelty detection in news events has long been a difficult problem. A number of models performed well on specific data streams but certain issues are far from being solved, particularly in large data streams from the WWW where unpredictability of new terms requires adaptation in the vector space model. We present a novel event detection system based on the Incremental Term Frequency-Inverse Doc...
Effectiveness of Automated Chinese Sentence Scoring with Latent Semantic Analysis
Automated scoring by means of Latent Semantic Analysis (LSA) has been introduced lately to improve the traditional human scoring system. The purposes of the present study were to develop an LSA-based assessment system to evaluate children’s Chinese sentence construction skills and to examine the effectiveness of the LSA-based automated scoring function by comparing it with traditional human scoring....
Novelty Detection via Answer Updating
The detection of new and novel information in a document stream is an important component of potential applications. This paper describes an answer updating approach to novelty detection at the sentence level. Specifically, we explore the use of question-answering techniques for novelty detection. New information is defined as new/previously unseen answers to questions representing a user’s info...