Highly Fault-Tolerant Parallel Computation (extended abstract)

Author

  • Daniel A. Spielman
Abstract

We reintroduce the coded model of fault-tolerant computation, in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time $t$ on $w$ processors can be performed reliably on a faulty machine in the coded model using $w \log^{O(1)} w$ processors and time $t \log^{O(1)} w$. The failure probability of the computation will be at most $t \exp(-w^{1/4})$. The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in $O(n \log^{O(1)} n)$ sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.
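
The definition of correct computation in the coded model can be illustrated with a small sketch. It is not the paper's construction: it assumes a plain (not generalized) Reed-Solomon code over a small prime field, a brute-force decoder instead of the fast decoding the abstract mentions, and a single linear function (vector addition). All names and parameters (`P`, `K`, `N`, `rs_encode`, `rs_decode`, `coded_add`) are hypothetical choices made for the example.

```python
from itertools import combinations

P = 257  # small prime modulus, chosen only for illustration
K = 3    # message length (message polynomial has degree < K)
N = 7    # codeword length; minimum distance N - K + 1 = 5, so 2 errors are correctable

def rs_encode(message):
    """Evaluate the message polynomial at the points 0..N-1 over GF(P)."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(message)) % P
            for x in range(N)]

def poly_mul(a, b):
    """Multiply two coefficient lists modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def interpolate(points):
    """Lagrange interpolation: coefficients of the unique polynomial of
    degree < len(points) passing through the given (x, y) pairs."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P, 1])
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs

def rs_decode(word):
    """Brute-force decoder (an assumption of this sketch): try every K-subset
    of positions and keep the message whose codeword agrees with `word` most."""
    best, best_agree = None, -1
    for subset in combinations(range(N), K):
        cand = interpolate([(x, word[x]) for x in subset])
        agree = sum(c == w for c, w in zip(rs_encode(cand), word))
        if agree > best_agree:
            best, best_agree = cand, agree
    return best

def coded_add(cw_a, cw_b):
    """The 'device': adds two codewords symbol by symbol, never decoding."""
    return [(a + b) % P for a, b in zip(cw_a, cw_b)]

a, b = [3, 1, 4], [2, 7, 1]
out = coded_add(rs_encode(a), rs_encode(b))
out[1] = (out[1] + 100) % P  # simulate two faulty output symbols
out[5] = (out[5] + 7) % P
decoded = rs_decode(out)
expected = [(x + y) % P for x, y in zip(a, b)]
print(decoded == expected)  # True
```

Because the code is linear, the symbol-wise sum of two codewords is the codeword of the sum, so the device's output decodes to the right answer even with two corrupted symbols. Handling arbitrary (non-linear) computations with faulty processors is what the paper's result addresses; this sketch does not attempt that.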

Similar resources

Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach

One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Many researchers are using conventional techniques such as RPC, DSM, replication, causal communications and other techniques to provide parallel computing facilities on workstation ne...

Algorithm-based fault-tolerant programming in scientific computation on multiprocessors

Efficient parallel algorithms proposed to solve many fundamental problems in scientific computation are sensitive to processor failures. Because of its low costs, algorithm-based fault tolerance is an interesting concept for introducing fault tolerance into existing multi-processors. To facilitate fault-tolerant programming in scientific computation, we have modified and developed further an ...

Quantum Error Correction and Fault Tolerant Quantum Computing

efficient fault-tolerant quantum computing arxiv fault-tolerant quantum computing crcnetbase an introduction to quantum error correction and fault quantum error correction and fault tolerant quantum computing fault tolerance in quantum computation eceu fault-tolerant quantum computation world scientific fault-tolerant quantum computation versus realistic noise quantum error correction and fault-...

Fault-tolerant Computation in the Full Information Model (Extended Abstract)

We initiate an investigation of general fault-tolerant distributed computation in the full-information model. In the full information model no restrictions are made on the computational power of the faulty parties or the information available to them. (Namely, the faulty players may be infinitely powerful and there are no private channels connecting pairs of honest players). Previous work in th...

Fault Tolerant Hierarchical Interconnection Network for Parallel Computers (FTH)

In this paper we introduce a new interconnection network, the Fault Tolerant Hierarchical Interconnection network for parallel computers, denoted by FTH(k, 1). This network has a fault-tolerant hierarchical structure which overcomes the fault tolerant properties of the Extended Hypercube (EH). This network has low diameter, constant degree connectivity and low message traffic density in comparison with other...

Publication date: 1996