A new decentralized periodic replication strategy for dynamic data grids
Abstract
Data grids provide scalable infrastructure for managing storage resources and data files, supporting data-intensive applications that need access to huge amounts of data stored at distributed locations around the world. The size of these data can reach terabytes or even petabytes in many applications. These applications must meet several main goals, namely efficiently accessing, storing, transferring and analyzing large amounts of data across geographically distributed locations. Replication is a general and simple technique used in data grids to achieve these goals. Its main purposes are improving data access efficiency, providing high availability, decreasing bandwidth consumption, improving fault tolerance and enhancing scalability. In this paper, we propose a new classification of replication strategies based on two complementary criteria, together with a survey of the resulting categories of strategies. In addition, we introduce a new decentralized periodic replication strategy for dynamic data grids with limited storage for replicas, called DPRSKP (Decentralized Periodic Replication Strategy based on Knapsack Problem). This strategy takes the changing availability of sites into account. DPRSKP relies on two polynomial-time algorithms: the first selects the best candidate files for replication, and the second places them in the best locations. The replication problem in DPRSKP is formulated as a Knapsack problem. In addition, DPRSKP extends the well-known LRU and LFU strategies. The simulation experiments were carried out using OptorSim with a dynamic period rather than a static one. The results show that DPRSKP improves response time, bandwidth consumption, and the numbers of remote and local file accesses compared with other replication strategies.
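The abstract does not give the details of the knapsack formulation, so the following is only a minimal illustrative sketch of the general idea: choosing which files to replicate at a site with limited storage, treated as a textbook 0/1 knapsack. The function name `select_replicas`, the use of file size as weight, and the use of a per-file access count as value are all assumptions made for illustration. They are not taken from DPRSKP. The dynamic program below is also the standard pseudo-polynomial one, not the polynomial-time algorithms the paper refers to.

```python
from typing import List, Tuple


def select_replicas(files: List[Tuple[str, int, int]], capacity: int) -> List[str]:
    """Pick files to replicate so that total value is maximal within capacity.

    files: (name, size, value) triples. Here `value` is a hypothetical
           benefit score such as a recent access count (an assumption).
    capacity: free replica storage, in the same unit as `size`.
    """
    n = len(files)
    # best[i][c] = best total value using the first i files with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, value) in enumerate(files, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip file i
            if size <= c:  # option 2: replicate file i if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - size] + value)

    # Walk the table backwards to recover which files were chosen.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = files[i - 1]
            chosen.append(name)
            c -= size
    chosen.reverse()
    return chosen


# Hypothetical example: four files, 10 units of free replica storage.
candidates = [("a", 4, 40), ("b", 3, 50), ("c", 2, 100), ("d", 5, 95)]
print(select_replicas(candidates, 10))
```

With these made-up numbers, the files `b`, `c` and `d` exactly fill the 10 units of storage and give the highest total score. A strategy in the spirit of LFU would feed access frequencies in as the values, while one in the spirit of LRU would instead score files by how recently they were accessed.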
Similar resources
Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources, enabling users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of files and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...
An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
Data grid technology, which uses the scale of the Internet to overcome storage limitations for huge amounts of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening the distance of file transmission and achieving files from ...
CFS: a new dynamic replication strategy for data grids
Data grids are currently proposed solutions to large scale data management problems including efficient file transfer and replication. Large amounts of data and the world-wide distribution of data stores contribute to the complexity of the data management challenge. Recent architecture proposals and prototypes deal with dynamic replication strategies for a high-performance data grid. This paper...
E2DR: Energy Efficient Data Replication in Data Grid
Abstract: Data grids are an important branch of grid computing that provides mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client applications in scientific and business domai...
Improving Job Scheduling Performance with Dynamic Replication Strategy in Data Grids
Dealing with large amounts of data in Data Grids makes efficient data access more critical. In this paper, we propose a new approach to the replication problem that organizes the data into the data categories it belongs to. This organization helps improve the placement strategy for data replication. We studied our approach in combination with the scheduling issue and evalua...
Journal title:
- Scalable Computing: Practice and Experience
Volume 15, Issue
Pages -
Publication date: 2014