System Resilience at Extreme Scale White Paper

Author

  • Tarek El-Ghazawi
Abstract

  • Professor Ricardo Bianchini, Rutgers University, Piscataway
  • Professor Tarek El-Ghazawi, George Washington University, Washington, D.C.
  • Professor Armando Fox, University of California, Berkeley
  • Forest Godfrey, Cray, Minneapolis
  • Dr. Adolfy Hoisie, Los Alamos National Laboratory, Los Alamos
  • Professor Kathryn McKinley, University of Texas, Austin
  • Professor Rami Melhem, University of Pittsburgh, Pittsburgh
  • Professor James Plank, University of Tennessee, Knoxville
  • Dr. Partha Ranganathan, HP Labs, Palo Alto
  • Josh Simons, Sun Microsystems, Cambridge

Similar resources

Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale

Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution sp...

Inter-Agency Workshop on HPC Resilience at Extreme Scale

The following report summarizes the proceedings of a three-and-a-half day inter-agency workshop focused on the technical challenges of HPC resilience in the 2020 Exascale timeframe. The resilience problem is not specific to any particular program or agency; coordinated resilience solutions will be challenging because of the need for a truly integrated approach. The interagency workshop therefor...

Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.0)

Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to ca...

Energy profile of rollback-recovery strategies in high performance computing

Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that will solve some of the hardest problems in science and engineering. However, resilience and energy concerns loom as two of the major challenges for machines at that scale. The number of components that will be assembled in the supercomputers plays a fundamental role in these challenges. First, a...

On the Definition of Cyber-Physical Resilience in Power Systems

Modern society relies heavily upon complex and widespread electric grids. In recent years, advanced sensors, intelligent automation, communication networks, and information technologies (IT) have been integrated into the electric grid to enhance its performance and efficiency. Integrating these new technologies has resulted in more interconnections and interdependencies between the physical and...

Publication date: 2009