Noise reduction through summarization for Web-page classification
نویسندگان
چکیده
منابع مشابه
Noise reduction through summarization for Web-page classification
Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the perfor...
متن کاملTemporal Web Page Summarization
In the recent years the Web has become an important medium for communication and information storage. As this trend is predicted to continue, it is necessary to provide efficient solutions for retrieving and processing information found in WWW. In this paper we present a new method for temporal web page summarization based on trend and variance analysis. In the temporal summarization web docume...
متن کاملApproach for Dimensionality Reduction in Web Page Classification
Dimensionality refers to number of terms in a web page. While classifying web pages high dimensionality of web pages causes problem. The main objective of reducing dimensionality of web pages is improving the performance of classifier. Processing time and accuracy are two parameters which influence the performance of a classifier. To reduce the processing time, less informative and redundant te...
متن کاملAutomatic Web Page Classification
Aim of this paper is to describe a method of automatic web page classification to semantic domains and its evaluation. The classification method exploits machine learning algorithms and several morphological as well as semantical text processing tools. In contrast to general text document classification, in the web document classification there are often problems with short web pages. In this p...
متن کاملAutomatic Web Page Classification
To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory Project (http://dmoz.org) manually maintain a hierarchical structure. While manual classification of web pages provides high accuracy, it is very expensive. To automatically include new emerging pages into these hierarchies, web page classification becomes a hot research topic in web infor...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Information Processing & Management
سال: 2007
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2007.01.013