Search results for: crawler
Number of results: 1856
A focused crawler is a web crawler that attempts to download only web pages that are relevant to a pre-defined topic or set of topics. Focused crawling also assumes that some labeled examples of relevant and non-relevant pages are available. The topic can be represented by a set of keywords (we call them seed keywords) or example URLs. The key to designing an efficient focused crawler is how to ju...
A focused crawler is a Web crawler that aims to search and retrieve Web pages from the World Wide Web that are related to a domain-specific topic. Rather than downloading all accessible Web pages, a focused crawler analyzes the frontier of the crawled region to visit only the portion of the Web that contains relevant Web pages and, at the same time, tries to skip irrelevant regions. In this paper,...
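The frontier-driven behaviour described in the two abstracts above can be illustrated with a minimal sketch. It is not the method of either paper: the in-memory `pages` dictionary, the keyword-overlap `relevance` score, and the `threshold` and `limit` parameters are all assumptions made for illustration, and no network access is performed.

```python
import heapq

def relevance(text, seed_keywords):
    """Toy score (an assumption): fraction of seed keywords found in the text."""
    if not seed_keywords:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for k in seed_keywords if k.lower() in words)
    return hits / len(seed_keywords)

def focused_crawl(pages, seed_urls, seed_keywords, threshold=0.5, limit=10):
    """Crawl a hypothetical in-memory web: pages maps url -> (text, outlinks)."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    relevant = []
    while frontier and len(relevant) < limit:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue
        text, outlinks = pages[url]
        score = relevance(text, seed_keywords)
        if score < threshold:
            continue  # irrelevant page: its outlinks are not expanded
        relevant.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant
```

Giving each link the score of its parent page is only one possible frontier policy; a classifier trained on the labeled examples mentioned above could replace `relevance` without changing the loop.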
The massive adoption of social media has provided new ways for individuals to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information about a variety of topics. It is a huge think tank that creates an enormous and ever-changing archive of open source intelligence. Mining and modeling this vast pool of data to extract, exploit and desc...
Traditional search engines deal with the Surface Web, the set of Web pages directly accessible through hyperlinks, and ignore a large part of the Web called the hidden Web: a great amount of valuable information in online databases that is “hidden” behind query forms. To access this information, the crawler has to fill in the forms with valid data; for this reason we propose a ...
This paper advocates the use of ontology-supported website models to provide a semantic-level solution for a search agent so that it can provide fast, precise, and stable search results. Based on this technique, we have developed a focused crawler, which can benefit both user requests and domain semantics. Equipped with this technique, our focused crawler manifests the following interesting feat...
Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed crawlers is not fully utilized. The optimal benefits of this approach are usually limited to the sites hosting the crawler. In this work we propose IPMicra, a distributed location-aware web crawler that utilizes an IP ad...
The Web has been rapidly “deepened” by massive databases online: Recent surveys show that while the surface Web has linked billions of static HTML pages, a far more significant amount of information is “hidden” in the deep Web, behind the query forms of searchable databases. With its myriad databases and hidden content, this deep Web is an important frontier for information search. In this pape...
A practical distributed web crawler architecture is designed. The distributed cooperative grasping algorithm is put forward to solve the problem of distributed Web Crawler grasping. Log structure and Hash structure are combined and a large-scale web store structure is devised, which can meet not only the need of a large amount of random accesses, but also the need of newly added pages. Experime...
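As a rough illustration of combining a log structure with a hash structure for page storage, the sketch below appends page bodies to an append-only log and keeps a hash index from URL to the position of the latest copy. The `PageStore` class, its method names, and the in-memory `BytesIO` log are assumptions for illustration and are not taken from the paper.

```python
import io

class PageStore:
    """Hypothetical store: append-only log plus a hash index into it."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for an on-disk log file
        self.index = {}          # url -> (offset, length) of the latest copy

    def put(self, url, content):
        """Append a page body to the log and point the index at it."""
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(content)
        self.index[url] = (offset, len(content))

    def get(self, url):
        """Random access: one hash lookup, then one read from the log."""
        entry = self.index.get(url)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)
        return self.log.read(length)
```

Newly added pages cost a single sequential append, while reads use the index for direct access; older copies of a re-crawled page remain in the log until some compaction step, which this sketch omits.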
With the fast-paced growth of the World Wide Web and its dynamic nature, coupled with the presence of a large volume of content, web crawlers have become an indispensable part of search engines. The growing use of search engines and the dependence on them in everyday life necessitate that correct and relevant information is presented to users in response to their search queries. The web crawler plays an i...
The Internet has billions of web pages, and these web pages are linked to each other using URLs (Uniform Resource Locators). The web crawler is a main module of a search engine that gathers these documents from the WWW. Most of the web pages on the Internet are active and change periodically. Thus, the crawler is required to revisit these web pages to update the search engine's database. In this paper, pri...