Benchmarking eventually consistent distributed storage systems
Abstract
Cloud storage services and NoSQL systems, which have recently found widespread adoption, typically offer only "Eventual Consistency", a rather weak guarantee covering a broad range of potential data consistency behavior. The actual degree of (in)consistency as a service quality, however, is always unknown. To avoid opportunity costs or actual costs, the resulting data inconsistencies have to be resolved within the application layer. Without detailed knowledge of consistency behavior, though, inconsistency handling is inefficient and, for some kinds of inconsistency, outright impossible. Furthermore, because consistency behavior impacts applications, consistency as a system quality should also be considered during the selection and deployment optimization of cloud storage offerings and NoSQL systems. This, as well as studying the impact of system design decisions on consistency behavior, requires the means to analyze the consistency behavior of eventually consistent storage systems. In this work, we present four main contributions to address the problems outlined above: First, we develop novel consistency metrics which describe consistency behavior for all kinds of consistency, in a precise way, without needless aggregation, and in a way that is meaningful to application or storage system developers as well as systems researchers. Second, we identify key influence factors on consistency behavior and combine them into a model of a storage system. We then present two distinct approaches which predict consistency behavior based on simulations on top of this model. Third, we present a set of system benchmarking approaches to accurately determine the consistency behavior of eventually consistent distributed storage systems via experiments with actually deployed systems. Results of both simulation and system benchmarking are expressed using our novel set of consistency metrics.
Fourth, building on 15 extensive experiments with actual systems and a multitude of simulation runs, we demonstrate how inconsistencies can be handled more efficiently by leveraging these results. For this purpose, we describe, based on a use case, how inconsistencies can be resolved in application engineering. We also develop a new middleware-based approach which adds additional consistency guarantees externally to the eventually consistent storage system, thus reducing complexity for application developers.
Similar resources
Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services
The CAP theorem and the PACELC model have described the existence of direct trade-offs between consistency and availability as well as consistency and latency in distributed systems. Cloud storage services and NoSQL systems, both optimized for the web with high availability and low latency requirements, hence, typically opt to relax consistency guarantees. In particular, these systems usually o...
Consistency in Distributed Storage Systems - An Overview of Models, Metrics and Measurement Approaches
Due to the advent of eventually consistent storage systems, consistency has become a focus of research. Still, a clear overview of consistency in distributed systems is missing. In this work, we define and describe consistency, show how different consistency models and perspectives are related and briefly discuss how concrete consistency guarantees of a distributed storage system can be measured.
Toward a Principled Framework for Benchmarking Consistency
Large-scale key-value storage systems sacrifice consistency in the interest of dependability (i.e., partition tolerance and availability), as well as performance (i.e., latency). Such systems provide eventual consistency, which, to this point, has been difficult to quantify in real systems. Given the many implementations and deployments of eventually-consistent systems (e.g., NoSQL systems), attem...
Finding Consistency in an Inconsistent World: Towards Deep Semantic Understanding of Scale-out Distributed Databases
We present a new problem in data storage: how to build efficient backup and restore tools for increasingly popular Next-generation Eventually Consistent STorage systems (NECST). We show that the lack of a concise, consistent, logical view of data at a point-in-time is the key underlying problem; we suggest a deep semantic understanding of the data stored within the system of interest as a solut...
Hybrid Regenerating Codes for Distributed Storage Systems
Distributed storage systems are mainly justified due to their ability to store data reliably over some unreliable nodes such that the system can have long term durability. Recently, regenerating codes are proposed to make a balance between the repair bandwidth and the storage capacity per node. This is achieved through using the notion of network coding approach. In this paper, a new variation ...