A Novel Framework for Context Based Distributed Focused Crawler (CBDFC)
نویسندگان
چکیده
Focused crawling aims to search only the relevant subset of the WWW for a specific topic of user interest; leading to the necessity to decide about the relevancy of a document to the topic of interest; especially when the user is not perfect in specifying the exact context of the topic. This paper provides a novel framework of a context based distributed focused crawler that maintains an index of web documents pertaining to the context of keywords resulting in storage of more related documents.
منابع مشابه
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...
متن کاملHidden Web Indexing Using HDDI Framework
There are various methods of indexing the hidden web database like novel indexing, distributed indexing or indexing using map reduce framework. Our goal is to find an optimized indexing technique keeping in mind the various factors like searching, distribute database, updating of web, etc. Here, we propose an optimized method for indexing the hidden web database. This research uses Hierarchical...
متن کاملOntology Driven Focused Crawling of Web Documents
In recent year dynamism of the World Wide Web , the issue of discovering relevant web pages has become an important challenge. Focused crawler aims at selectively seeking out pages that are relevant to a pre-defined set of topics. Most of the current approaches perform syntactic matching, that is, they retrieve documents that contain particular keywords from the user’s query. This often leads t...
متن کاملFocused Crawling
Focused crawling is an efficient mechanism for discovering resources of interest on the web. Link structure is an important property of the web that defines its content. In this thesis, FOCUS a novel focused crawler is described, which primarily uses the link structure of the web in its crawling strategy. It uses currently available search engine APIs, provided by Google, to construct a layered...
متن کاملDesign of an Agent Based Context Driven Focused Crawler
A focused crawler downloads web pages that are relevant to a user specified topic. Most of the existing focused crawlers are keyword driven and do not take into account the context associated with the keywords. This leads to retrieval of a large number of web pages irrespective of the fact whether they are logically related. Thus, the keyword based strategy alone is not sufficient for the desig...
متن کامل