Similar resources
Web page classification using spatial information
Extracting and processing information from web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a “bag of words” and then to perform additional processing on such a flat representation. In this paper we propose a new, hierarchical representation that...
A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed that uses the PageRank algorithm to select the optimal subset of features and to compute their weights for web page classification. To evaluate the proposed approach, multiple experiments are performed on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups, using accuracy score as the main criterion. By analy...
Automatic Web Page Classification
The aim of this paper is to describe a method for automatically classifying web pages into semantic domains, together with its evaluation. The classification method exploits machine learning algorithms and several morphological as well as semantic text processing tools. In contrast to general text document classification, web document classification often has problems with short web pages. In this p...
Automatic Web Page Classification
To facilitate users' browsing of the Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory Project (http://dmoz.org) manually maintain a hierarchical structure. While manual classification of web pages provides high accuracy, it is very expensive. To automatically include newly emerging pages in these hierarchies, web page classification has become a hot research topic in web infor...
Web Page Downloading and Classification
This paper describes the processes of downloading and classifying Web-based articles in online medical journals as a preliminary step to extracting bibliographic data to populate MEDLINE, the widely used database of the National Library of Medicine (NLM). The processes are combined to develop an automated system named “Web Page Downloading and Classification”. The system downloads the Web page...
Journal
Journal title: Procedia Computer Science
Year: 2019
ISSN: 1877-0509
DOI: 10.1016/j.procs.2019.06.011