Dissemination of Dynamic Data on the Internet
Abstract
Dynamic data is data that varies rapidly and unpredictably. This kind of data is generally used in on-line decision making and hence needs to be delivered to its users conforming to certain time- or value-based application-specific requirements. The main issue in the dissemination of dynamic web data such as stock prices, sports scores or weather data is the maintenance of temporal coherency within user-specified bounds. Since most web servers adhere to the HTTP protocol, clients need to pull the data frequently, at a rate that depends on the changes in the data and the user's coherency requirements. In contrast, servers that possess push capability maintain state information pertaining to users' requirements and push only those changes that are of interest to a user. These two canonical techniques have complementary properties. In the pure pull approach, the level of temporal coherency maintained is low, while in the pure push approach it is very high, but at the cost of a large state space at the server, which results in a less resilient and less scalable system. Communication overheads in pull-based schemes are high compared to push-based schemes, since the number of messages exchanged in the pull approach is higher than in the push-based approach. Based on these observations, this paper explores different ways of combining the two approaches so as to harness the benefits of both.
1 Dynamic Data Dissemination
Dynamic data can be defined by the way the data changes. First, it changes rapidly, possibly as often as once every few seconds; it also changes unpredictably, making it very hard to use simple prediction techniques or time-series analysis. A few examples of dynamic data are stock quotes, sports scores and traffic or weather data. This kind of data is generally used in decision making (for example, stock trading or weather forecasting) and hence the timeliness of its delivery to users becomes very important.
Recent studies have shown that an increasing fraction of the data on the World Wide Web is dynamic. Web proxy caches that are deployed to improve user response times must track such dynamic data so as to provide users with temporally coherent information. The coherency requirements on a dynamic data item depend on the nature of the item and on user tolerances. To illustrate, a user may be willing to receive sports and news information that is out-of-sync by a few minutes with respect to the server, but may desire stronger coherency for data items such as stock prices. A proxy can exploit user-specified coherency requirements by fetching and disseminating only those changes that are of interest and ignoring intermediate changes. For instance, a user who is interested in changes of more than a dollar for a particular stock price need not be notified of smaller intermediate changes. This can be termed the problem of maintaining the desired temporal coherency between the source and the user, with the proxy substantially improving the access time, overheads and coherency. We study mechanisms to obtain timely updates from web sources, based on the dynamics of the data and the users' need for temporal accuracy, by judiciously combining push and pull technologies and by using proxies to disseminate data within acceptable tolerance. Specifically, the proxies (maintained by client organizations) ensure the temporal coherence of data, within the tolerance specified, by tracking the amount of change in the web sources. Based on the changes observed and the tolerance specified by the different clients interested in the data, the proxy determines when to pull from the server next, and pushes newly acquired data to the clients according to their temporal coherency requirements.
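The filtering of intermediate changes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tolerance is value-based, the function name `filter_updates` and the sample prices are hypothetical, and the paper's own mechanisms may differ.

```python
from typing import Iterable, List


def filter_updates(values: Iterable[float], tolerance: float) -> List[float]:
    """Return the subset of source values a proxy would forward to a user.

    Hypothetical helper: a value is forwarded only when it differs from the
    last forwarded value by more than `tolerance`; smaller intermediate
    changes are dropped.
    """
    forwarded: List[float] = []
    for value in values:
        # The first value is always forwarded so the user has a baseline.
        if not forwarded or abs(value - forwarded[-1]) > tolerance:
            forwarded.append(value)
    return forwarded


# Hypothetical stock prices observed at the source, in dollars.
prices = [100.0, 100.4, 100.9, 101.2, 100.7, 99.9, 100.1]
print(filter_updates(prices, tolerance=1.0))
```

With a one-dollar tolerance, only the first price and the two prices that move more than a dollar away from the last forwarded value reach the user; the other four updates are suppressed.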
Of course, if the web sources themselves were aware of the clients' temporal coherency requirements and were endowed with push capability, then we could avoid the need for mechanisms such as the ones proposed here. Unfortunately, this can lead to scalability problems and may also require changes to existing web servers (which do not have push capabilities) or to the HTTP protocol.
2 Maintaining Temporal Coherency
Consider a proxy that caches several time-varying data items. To maintain coherency of the cached data, each cached item must be periodically refreshed with the copy at the server. For highly dynamic data it may not be feasible to maintain strong cache consistency: any attempt to do so will result in either heavy network load or heavy server load. To reduce network utilization as well as server load, we can exploit the fact that the user may not be interested in every change happening at the source. We assume that a user specifies a temporal coherency requirement (tcr) for each cached item of interest. The value of the tcr denotes the maximum permissible deviation of the cached value from the value at the server and thus constitutes the user-specified tolerance. Observe that a tcr can be specified in units of time (e.g., the item should never be out-of-sync by more than 5 minutes) or value (e.g., the stock price should never be out-of-sync by more than a dollar). As shown in Figure 1, the proxy sits between the user and the server, and handles all communication with the server based on the user constraint. Given the value of the tcr, the proxy can use push- or pull-based techniques to ensure that the temporal coherency requirement is satisfied.
Fig. 1. Proxy-based Model (nodes: SERVER, PROXY, USER; links labeled Push and Push/Pull)
The fidelity of the data seen by users depends on the degree to which their coherency needs are met.
We define the fidelity f observed by a user to be the total length of time for which the user's temporal coherency requirement holds (normalized by the total length of the observations). In addition to specifying the coherency requirement (tcr), users can also specify their fidelity requirement f for each data item, so that an algorithm capable of handling users' fidelity requirements (as well as tcrs) can adapt to users' fidelity needs.
Traditionally, the problem of maintaining cache consistency has been addressed by either server- or client-driven approaches. In the client-driven approach, the cache manager contacts the source periodically to check the validity of the cached data. We call this period the Time-To-Refresh, or TTR. Choosing very small TTR values helps keep the cache consistent, although at the cost of bandwidth. On the other hand, very large TTR values may reduce network utilization, but only at the cost of reduced fidelity. Polling-each-time and Adaptive TTR are examples of client-driven techniques. Clearly, these techniques are based on the assumption that an optimum TTR value can be predicted using some statistical information. This may not be true for highly dynamic data, which changes unpredictably and independently. The other class of algorithms is server-driven, wherein the server takes responsibility for either invalidating or updating the proxy cache. Sending invalidation messages and pushing recent changes are examples of such techniques. Because of the dynamics of the data, none of the above techniques can deliver high fidelity with optimum resource utilization. In the following sections we explain how one can use user-specified constraints to offer high fidelity with efficient use of available resources.
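The trade-off between TTR, fidelity and message count can be made concrete with a small simulation. The Python sketch below rests on simplifying assumptions that are not from the paper: time is discrete, the server holds one value per time step, the temporal coherency requirement (`tcr` in the code) is value-based, and polling uses a fixed TTR. The function name `simulate_periodic_pull` and the trace are hypothetical.

```python
from typing import List, Tuple


def simulate_periodic_pull(source: List[float], ttr: int, tcr: float) -> Tuple[float, int]:
    """Poll `source` every `ttr` time steps and measure the resulting fidelity.

    `source[t]` is the server value at time step t. Returns (fidelity, polls),
    where fidelity is the fraction of time steps at which the cached value is
    within `tcr` of the server value.
    """
    if ttr < 1:
        raise ValueError("ttr must be at least 1 time step")
    if not source:
        raise ValueError("source trace must be non-empty")
    cached = source[0]
    polls = 0
    coherent_steps = 0
    for t, server_value in enumerate(source):
        if t % ttr == 0:  # time to refresh: pull the current server value
            cached = server_value
            polls += 1
        if abs(cached - server_value) <= tcr:
            coherent_steps += 1
    return coherent_steps / len(source), polls


# Hypothetical trace: one server value per time step.
trace = [10.0, 10.2, 11.5, 11.6, 13.0, 12.9, 12.8, 14.5]
for ttr in (1, 2, 4):
    fidelity, polls = simulate_periodic_pull(trace, ttr, tcr=1.0)
    print(f"TTR={ttr}: fidelity={fidelity:.3f}, polls={polls}")
```

On this trace, larger TTR values cut the number of polls but lower the fidelity, which is the tension that a fixed polling period cannot resolve for unpredictable data.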