Failure detector abstractions for MapReduce-based systems

Authors

Bunjamin Memishi

María S. Pérez-Hernández

Gabriel Antoniu

Abstract

Omission failures are an important source of problems in data-intensive computing systems. In these frameworks, omission failures are caused by slow tasks, known as stragglers, which can seriously degrade workload performance. For MapReduce-based systems, many state-of-the-art approaches have preferred to explore and extend speculative execution mechanisms. Other alternatives have based their contributions on doubling the computing resources for their tasks. Nevertheless, none of these approaches has addressed a fundamental aspect of detecting and then resolving omission failures, namely the adjustment of the timeout service. In this paper, we study omission failures in MapReduce systems and formalize their failure detector abstraction by means of three different algorithms for defining the timeout. The first abstraction, called High Relax Failure Detector (HR-FD), acts as a static alternative to the default timeout that is able to estimate the completion time of the user workload. The second abstraction, called Medium Relax Failure Detector (MR-FD), dynamically modifies the timeout according to the progress score of each workload. Finally, taking into account that some user requests are strictly deadline-bounded, we introduce the third abstraction, called Low Relax Failure Detector (LR-FD), which merges the MapReduce dynamic timeout with an external monitoring system in order to enforce more accurate failure detection. Whereas HR-FD shows performance improvements for most user requests (in particular, small workloads), MR-FD and LR-FD significantly improve on the current timeout selection in every scenario, regardless of the workload type and failure injection time.
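The difference between a static and a progress-driven timeout can be illustrated with a short Python sketch. This is not the paper's HR-FD or MR-FD algorithm, which the abstract does not specify; the function names, the `slack` factor, the `floor_s` lower bound, and the linear-progress assumption are all hypothetical choices made for illustration. The 600-second fallback is assumed here to mirror a typical MapReduce default task timeout.

```python
from dataclasses import dataclass

# Assumed fallback, chosen to resemble a typical default task timeout.
DEFAULT_TIMEOUT_S = 600.0


@dataclass
class ProgressSample:
    """One observation of a running workload (hypothetical structure)."""
    elapsed_s: float   # seconds since the workload started
    progress: float    # progress score in [0.0, 1.0]


def static_timeout(estimated_completion_s: float, floor_s: float = 10.0) -> float:
    """Static variant: a fixed timeout derived once from an estimated
    completion time, never smaller than floor_s."""
    return max(floor_s, estimated_completion_s)


def dynamic_timeout(sample: ProgressSample, slack: float = 1.5,
                    floor_s: float = 10.0) -> float:
    """Dynamic variant: recompute the timeout from the latest progress score.

    Assumes progress is roughly linear in time, so the remaining time is
    (1 - progress) / rate. Falls back to the default when no progress has
    been observed yet.
    """
    if sample.progress <= 0.0 or sample.elapsed_s <= 0.0:
        return DEFAULT_TIMEOUT_S
    rate = sample.progress / sample.elapsed_s        # progress per second
    remaining_s = (1.0 - sample.progress) / rate     # estimated time left
    return max(floor_s, slack * remaining_s)


def is_suspected(silence_s: float, timeout_s: float) -> bool:
    """A task is suspected once it has been silent longer than the timeout."""
    return silence_s > timeout_s


if __name__ == "__main__":
    halfway = ProgressSample(elapsed_s=100.0, progress=0.5)
    timeout = dynamic_timeout(halfway)
    print(f"dynamic timeout: {timeout:.1f} s")
    print("suspected after 200 s of silence:", is_suspected(200.0, timeout))
```

Under these assumptions the dynamic timeout shrinks as the workload approaches completion, so a task that stalls late in a job is suspected sooner than it would be under a fixed default.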

Similar resources

Critical Path based Performance Models for Distributed Queries

Programming models such as MapReduce and DryadLINQ provide programmers with declarative abstractions (such as SQL like query languages) for writing data intensive computations. The models also provide runtime systems that can execute these queries on a large cluster of machines, while dealing with the vagaries of distribution such as messaging, failures and synchronization. However, this level ...

Abstractions for Devising Byzantine-Resilient State Machine Replication

State machine replication is a common approach for making a distributed service highly available and resilient to failures, by replicating it on different processes. It is well-known, however, that the difficulty of ensuring the safety and liveness of a replicated service increases significantly when no synchrony assumptions are made, and when processes can exhibit Byzantine behaviors. The cont...

Cogset: a high performance MapReduce engine

MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mechanism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store temporary copies of intermediate data, but requires a tighter coupling between the components for ...

The Gap in Circumventing the Consensus Impossibility

The seminal impossibility of reaching consensus in an asynchronous and crash prone system was established for a weak variant of the problem, usually called weak consensus, where a set of processes need to decide on a common value out of two possible values 0 or 1. On the other hand, abstractions that were shown to be, in some precise sense, minimal to circumvent the impossibility were determine...

Programming Abstractions for Clouds

Clouds seem like ’Grids Done Right’, including scalability, transparency, and ease of management. Virtual Machines are the dominant application environments for compute Clouds, however, that does not make application programming any less relevant than “non-virtualized” environments. The limited set of successful Cloud applications show that distributed programming patterns of the type of MapRed...

Journal:

Inf. Sci.

Volume 379

Publication date: 2017
