ELIMINATING REDUNDANT AND LESS-INFORMATIVE RSS NEWS ARTICLES BASED ON WORD SIMILARITY AND A FUZZY EQUIVALENCE RELATION
Authors
Ian Garcia, Department of Computer Science, Master of Science
Abstract
The Internet has marked this era as the information age. The amount of information, especially online news, that Internet users can access today is unprecedented. As a result, the problem in seeking information from online news articles is not scarcity but overload. This poses major challenges for processing online news feeds: how to determine which news articles are important, how to assess the quality of each article, and how to filter out irrelevant and redundant information. In this thesis, we propose a method for filtering redundant and less-informative RSS news articles that addresses the excessive number of news feeds observed in RSS news aggregators. Our filtering approach measures similarity among RSS news entries by using the Fuzzy-Set Information Retrieval model and a fuzzy equivalence relation for computing word/sentence similarity, in order to detect redundant and less-informative news articles.
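The abstract does not spell out how the fuzzy equivalence relation is computed, so the following is only a minimal sketch of the general technique, not the thesis's implementation. It assumes a symmetric, reflexive matrix of pairwise article similarities is already available, takes its max-min transitive closure (which makes it a fuzzy equivalence relation), and then groups articles with an alpha-cut. The function names, the `SIMILARITY` values, and the thresholds are all hypothetical and chosen for illustration.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations given as nested lists."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation.

    Repeats R <- max(R, R o R) until nothing changes. The result is
    reflexive, symmetric, and max-min transitive, i.e. a fuzzy
    equivalence relation.
    """
    current = [row[:] for row in r]
    while True:
        composed = max_min_compose(current, current)
        merged = [
            [max(a, b) for a, b in zip(row_cur, row_comp)]
            for row_cur, row_comp in zip(current, composed)
        ]
        if merged == current:
            return current
        current = merged


def alpha_cut_classes(r, alpha):
    """Partition item indices into classes whose closure degree is >= alpha.

    Assumes r is already a fuzzy equivalence relation, so each alpha-cut
    is a crisp equivalence relation and the classes do not overlap.
    """
    classes = []
    assigned = set()
    for i in range(len(r)):
        if i in assigned:
            continue
        group = [j for j in range(len(r)) if r[i][j] >= alpha]
        assigned.update(group)
        classes.append(group)
    return classes


# Made-up pairwise similarities among four hypothetical news articles.
SIMILARITY = [
    [1.0, 0.8, 0.2, 0.0],
    [0.8, 1.0, 0.7, 0.0],
    [0.2, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]

CLOSURE = transitive_closure(SIMILARITY)
REDUNDANT_GROUPS = alpha_cut_classes(CLOSURE, 0.7)
```

In this sketch, articles that land in the same class at the chosen threshold would be candidates for redundancy, and a separate rule (not shown, and not described in the abstract) would decide which article in each class to keep.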
Similar resources
Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles
As the number of RSS news feeds continues to increase over the Internet, it becomes necessary to minimize the workload of the user, who is otherwise required to scan through a huge number of news articles to find related articles of interest, which is a tedious and often impossible task. In order to solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-b...
On-line elimination of local redundancies in evolving fuzzy systems
In this paper, we examine approaches for reducing the complexity of evolving fuzzy systems (EFS) by eliminating local redundancies during training, evolving the models on on-line data streams. Thus, the complexity reduction steps should support fast incremental single-pass processing steps. In evolving fuzzy systems, such reduction steps are important due to several reasons: 1.) originally dist...
A comparison of different machine learning methods for extractive Persian speech-to-speech summarization without using transcripts
In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract, and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
Against Underspecification in Speech Errors
This paper argues against the use of phonological underspecification in feature matrices on the basis of speech error data. Stemberger 1991 argues that phonological underspecification influences the similarity of phonemes. He claims underspecified features do not count toward similarity, based on an analysis of phoneme confusions in a naturally occurring speech error corpus. Using the same corp...
Collative Semantics
This paper introduces Collative Semantics (CS), a new domain-independent semantics for natural language processing (NLP) which addresses the problems of lexical ambiguity, metonymy, various semantic relations (conventional relations, redundant relations, contradictory relations, metaphorical relations and severely anomalous relations) and the introduction of new information. We explain the tw...