منابع مشابه
Towards a Keyword-Focused Web Crawler
This paper concerns predicting the content of textual web documents based on features extracted from web pages that link to them. It may be applied in an intelligent, keyword-focused web crawler. The experiments made on publicly available real data obtained from Open Directory Project with the use of several classification models are promising and indicate potential usefulness of the studied ap...
متن کاملA Clickstream-based Focused Trend Parallel Web Crawler
The immense growing dimension of the World Wide Web induces many obstacles for all-purpose single-process crawlers including the presence of some incorrect answers among search results and the scaling drawbacks. As a result, more enhanced heuristics are needed to provide more accurate search outcomes in an appropriate timely manner. Regarding the fact that employing link dependent Web page impo...
متن کاملRanking Hyperlinks Approach for Focused Web Crawler
The World Wide Web is growing rapidly and many search engines do not cover all the visible pages. Therefore, a more effective crawling method is required to collect more accurate data. In this paper, we introduce an effective focused web crawler containing smart methods. In text analysis, similarity measurement applies to different parts of the Web pages including title, body, anchor text and U...
متن کاملA focused crawler for Dark Web forums
The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling sy...
متن کاملAn Ontology-Based Focused Crawler
In this paper we present a novel approach for building a focused crawler. The goal of our crawler is to effectively identify web pages that relate to a set of predefined topics and download them regardless of their web topology or connectivity with other popular pages on the web. The main challenges that we address in our study concern the following. First we need to be able to effectively iden...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Procedia Computer Science
سال: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2017.12.075