Structural Characterization of Popular Web Documents
Abstract
Characterization of Web documents is essential for studying performance issues such as minimizing demands on back-end servers and reducing communication overhead. Characterization of the Web is also important for devising synthetic workload generators used to investigate effective resource management algorithms. Most characterizations of Web documents are based on individual Web files and do not consider their inherent structure. To display a complete Web page, a collection of files must be transferred, including the files that correspond to the objects embedded in the page. A Web object is defined to be this collection, i.e., a Web page and its related embedded files. Our goal in this study is to collect data on the structure and size of Web objects that are particularly useful for improving Web server performance through techniques such as clustering of files, parallel I/O, and data caching on client sites. We report the results of an empirical study conducted on several popular Web sites (among the top 100 sites). We chose popular Web sites for this investigation because they are more likely to be efficiently designed. In addition, popular Web servers account for a significant portion of network traffic. We also study the trace of a busy proxy access log to characterize Web objects in regular Web environments.
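The definition of a Web object above (a page plus its embedded files) can be illustrated with a minimal sketch. This is not the authors' measurement tool: the set of tags treated as embedded references, the helper names `embedded_refs` and `web_object_size`, and the sample file names and sizes are all assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Tags whose attribute is treated as an embedded-file reference.
# This choice is an assumption of the sketch, not the paper's definition.
EMBED_ATTRS = {"img": "src", "script": "src", "embed": "src", "iframe": "src"}


class EmbeddedRefParser(HTMLParser):
    """Collects the URLs of files embedded in one HTML page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        wanted = EMBED_ATTRS.get(tag)
        if wanted:
            if attr_map.get(wanted):
                self.refs.append(attr_map[wanted])
        elif tag == "link":
            # Stylesheets arrive via <link rel="stylesheet" href="...">.
            rel = (attr_map.get("rel") or "").lower()
            if "stylesheet" in rel and attr_map.get("href"):
                self.refs.append(attr_map["href"])


def embedded_refs(html):
    """Return the distinct embedded-file references, in document order."""
    parser = EmbeddedRefParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.refs))


def web_object_size(page_size, file_sizes, refs):
    """Total bytes of a Web object: the page plus each embedded file.

    Files missing from file_sizes are counted as 0 bytes.
    """
    return page_size + sum(file_sizes.get(ref, 0) for ref in refs)


# Hypothetical page and file sizes, for illustration only.
SAMPLE_HTML = (
    '<html><head><link rel="stylesheet" href="site.css"></head>'
    '<body><img src="logo.gif"><img src="logo.gif">'
    '<script src="menu.js"></script>'
    '<a href="next.html">next</a></body></html>'
)
SAMPLE_SIZES = {"site.css": 2000, "logo.gif": 5000, "menu.js": 1000}

refs = embedded_refs(SAMPLE_HTML)
total = web_object_size(3000, SAMPLE_SIZES, refs)
```

In this sketch the hyperlink to `next.html` is not counted, since following it loads a different Web object rather than a file needed to display the current page, and the repeated `logo.gif` is counted once on the assumption that a client transfers it only once per page.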
Similar resources
A Characterization of Compound Documents on the Web
Recent developments in office productivity suites make it easier for users to publish rich compound documents on the Web. Compound documents appear as a single unit of information but may contain data generated by different applications, such as text, images, and spreadsheets. Given the popularity enjoyed by these office suites and the pervasiveness of the Web as a publication medium, we expect...
Structural Similarity Evaluation Between XML Documents and DTDs
The automatic processing and management of XML-based data are ever more popular research issues due to the increasing abundant use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received strong attention. Among these is the process of matching XML documents and XML grammars, useful in various applications such as documents classifi...
Lifetime Behavior and its Impact on Web Caching
The exponential growth of the World Wide Web has made it the most popular information dissemination tool in the world. The growth necessitates caching, prefetching, and replication schemes on the web to alleviate the web server load, conserve the network band-width, and reduce the retrieval latency. At the same time, cache consistency should be maintained to avoid returning stale pages to users...
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
QoS-based Web Service Recommendation using Popular-dependent Collaborative Filtering
Since, most of the organizations present their services electronically, the number of functionally-equivalent web services is increasing as well as the number of users that employ those web services. Consequently, plenty of information is generated by the users and the web services that lead to the users be in trouble in finding their appropriate web services. Therefore, it is required to provi...
Journal title:
- I. J. Comput. Appl.
Volume 9 (issue not listed)
Pages: not listed
Publication date: 2002