Performance Modeling of a Distributed Web Crawler using Stochastic Activity Networks

Authors

  • Mitra Nasri
  • Saeed Shariati
  • Mohammad Abdollahi Azgomi
Abstract

One of the basic requirements of Web mining is a crawler system, which collects information from the Web. To predict the performance, dependability, and other operational measures of a system, a formal model of the system must be constructed and evaluated. We have constructed a formal model of a distributed crawler based on UbiCrawler, using stochastic activity networks (SANs). The SAN model is used to evaluate several performance measures of the crawler. The throughput results are the same as the published statistics of UbiCrawler. In addition, we have been able to evaluate two other measures: communication overhead and coverage. In this paper, we first discuss the architecture of the distributed crawler. We then present a SAN model of the crawler and the results of its evaluation.

Similar resources

Modeling and Performance Evaluation of Energy Consumption in S-MAC Protocol Using Generalized Stochastic Petri Nets

One of the features of wireless sensor networks is that the nodes in the network have limited power sources. Therefore, assessing energy consumption in these networks is very important. Common practice has been to use traditional simulators to evaluate the energy consumption of the nodes in these networks. Simulators often have problems such as fluctuating output values i...

Distributed High-Performance Web Crawler Based on Peer-to-Peer Network

Distributing the crawling activity among multiple machines spreads the processing load of web page analysis. This paper presents the design of a distributed web crawler based on a peer-to-peer (P2P) network. The distributed crawler harnesses the excess bandwidth and computing resources of the nodes in the system to crawl the web. Each crawler is deployed on a computing node of the P2P network to analyze web pa...

Web Workload Generation According to the UniLoG Approach

Generating synthetic loads which are sufficiently close to reality represents an important and challenging task in performance and quality-of-service (QoS) evaluations of computer networks and distributed systems. Here, the load to be generated represents sequences of requests at a well-defined service interface within a network node. The paper presents a tool (UniLoG.HTTP) which can be used in...

Semantic Web in the Automotive Industry: a case study

This paper describes an ongoing case study on the management of engineering data to evaluate the use of Semantic Web technologies in the automotive industry. The case study specifically explores data modeling, navigation, and retrieval using the Semantic Web, data presentation in HTML+SVG, and data analysis using neural networks. It implements a web crawler to automatically collect data of inte...

Loklak - A Distributed Crawler and Data Harvester for Overcoming Rate Limits

Modern social networks have become sources for vast quantities of data. Having access to such big data can be very useful for various researchers and data scientists. In this paper we describe Loklak, an open source distributed peer to peer crawler and scraper for supporting such research on platforms like Twitter, Weibo and other social networks. Social networks such as Twitter and Weibo pose ...

Publication date: 2006