Challenges in URL Switching for Implementing Globally Distributed Web Sites
Authors
Abstract
URL, or layer-5, switches can be used to implement locally and globally distributed web sites. URL switches must be able to exploit knowledge of server load and content (e.g., of reverse caches). Implementing globally distributed web sites presents difficulties not found in local server clusters, owing to bandwidth and delay constraints in the Internet. With delayed load information, server selection methods that always choose the least-loaded server result in oscillations in network and server load. In this paper, methods that make effective use of delayed load information are described and evaluated. The new Pick-KX method is developed and shown to outperform existing methods. Load information is adjusted with probabilistic content information derived from Bloom filter summaries of site content. A combined load and content metric is proposed for selecting the best server in a globally distributed site.
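The abstract names the ingredients (stale load reports, Bloom filter content summaries, a combined load/content metric, and the Pick-KX selector) without defining them. The sketch below is only an illustration of how they could fit together: the `content_bonus` discount and the reading of Pick-KX as "choose uniformly at random among the K best-ranked servers" are assumptions, not the paper's definitions.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter summarizing the URLs a server (or its
    reverse cache) currently holds."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes independent bit positions from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def pick_server(servers, url, k=2, content_bonus=0.5):
    """Rank servers by a combined load/content metric, then choose
    uniformly at random among the k best (the assumed Pick-KX reading)."""

    def metric(server):
        # Lower is better. The reported load is stale; discount it when
        # the Bloom summary suggests the server already holds the URL.
        load = server["reported_load"]
        if server["summary"].may_contain(url):
            load *= 1.0 - content_bonus
        return load

    candidates = sorted(servers, key=metric)[:k]
    return random.choice(candidates)
```

Randomizing over several good candidates, rather than always taking the single least-loaded server, is what keeps every switch from herding onto the same server between load updates, which is the oscillation the abstract describes.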
Similar resources
Efficient Summarization of URLs using CRC32 for Implementing URL Switching
We investigate methods of using CRC32 for compressing Web URL strings and sharing URL lists between servers, caches, and URL switches. Using trace-based evaluation, we compare our new CRC32 digesting method against existing Bloom filter and incremental CRC19 methods. Our CRC32 method requires fewer CPU resources, generates digests of equal or smaller size, achieves equal collision rates, and sim...
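The core idea is simple enough to show with the standard library: replace each URL string with a fixed 32-bit checksum so shared URL lists cost 4 bytes per entry. This is a minimal sketch using Python's `zlib.crc32`; the URLs are placeholders and URL normalization is omitted, so it is not the paper's exact pipeline.

```python
import zlib


def url_digest(url: str) -> int:
    """32-bit CRC digest of a URL string (normalization omitted here)."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF


# A shared URL list becomes a set of fixed-size digests, so exchanging it
# between servers, caches, and switches costs 4 bytes per URL.
shared = {url_digest(u) for u in ("http://example.com/a", "http://example.com/b")}

print(url_digest("http://example.com/a") in shared)    # True
print(url_digest("http://example.com/zzz") in shared)  # False (barring a collision)
```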
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download only domain-specific web pages, and an unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
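A prioritized URL queue of the kind this abstract describes is typically a max-ordered frontier keyed by a topical relevance score. The sketch below assumes the scores come from some external classifier (not shown) and uses placeholder URLs; it illustrates the ordering idea, not this paper's specific scheme.

```python
import heapq


class FocusedFrontier:
    """URL queue for a focused crawler, ordered by estimated topical
    relevance so the most promising pages are fetched first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url: str, relevance: float) -> None:
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so negate the score for highest-first.
            heapq.heappush(self._heap, (-relevance, url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]


frontier = FocusedFrontier()
frontier.push("http://example.com/on-topic", 0.9)
frontier.push("http://example.com/off-topic", 0.1)
print(frontier.pop())  # http://example.com/on-topic
```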
Report on the TREC-10 Experiment: Distributed Collections and Entrypage Searching
For our participation in TREC-10, we focus on searching distributed collections and on designing and implementing a new search strategy to find homepages. The first part of this paper presents a new merging strategy based on retrieved list lengths; the second part develops our approach to creating retrieval models able to combine both Web page and URL address...
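One plausible reading of "merging based on retrieved list lengths" is to weight each collection's scores by how large its result list is relative to the others. The excerpt does not give the actual formula, so the linear weighting below is purely an assumption used to make the idea concrete.

```python
def merge_by_list_length(result_lists):
    """Merge per-collection result lists into one ranking, scaling each
    document's score by the relative length of the list it came from
    (a longer retrieved list is read as evidence that the collection
    holds more relevant material)."""
    total = sum(len(lst) for lst in result_lists) or 1
    merged = []
    for lst in result_lists:
        weight = len(lst) / total
        merged.extend((score * weight, doc_id) for doc_id, score in lst)
    merged.sort(reverse=True)
    return [doc_id for _, doc_id in merged]


# Collection A returned more documents, so its scores carry more weight.
ranked = merge_by_list_length([
    [("a1", 0.9), ("a2", 0.8), ("a3", 0.7)],  # collection A
    [("b1", 0.95)],                           # collection B
])
print(ranked)
```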
WebParF: A Web partitioning framework for Parallel Crawlers
With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring content are of increasing importance. How can we efficiently retrieve information from it through crawling? In this "era of tera" and multi-core processors, multi-threaded processes are a natural serving solution. Better still, how can we improve crawling performance by using parallel cr...
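Any parallel crawler needs a partitioning function that splits the URL space across workers. WebParF's actual scheme is not described in this excerpt; hashing on the hostname, as sketched below, is a common baseline that shows why partitioning matters (politeness and deduplication stay local to one worker).

```python
import zlib
from urllib.parse import urlparse


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Map a URL to one of num_crawlers workers by hashing its hostname,
    so every page of a given host lands on the same crawler (keeping
    politeness limits and duplicate detection local to one worker)."""
    host = urlparse(url).netloc.lower()
    return zlib.crc32(host.encode("utf-8")) % num_crawlers


# All example.com URLs go to the same worker.
print(assign_crawler("http://example.com/page1", 4))
print(assign_crawler("http://example.com/page2", 4))
```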
Where Are They Now? A Case Study of Health-related Web Site Attrition
BACKGROUND When considering health-related Web sites, issues of quality generally focus on Web content. Little concern has been given to attrition of Web sites or the "fleeting" nature of health information on the World Wide Web. Since Web sites may be available for an uncertain period of time, a Web page may not be a sound reference. OBJECTIVE To address the issue of attrition, a defined set...