Resilience and fault tolerance are challenging tasks in the field of high performance computing (HPC) extreme scale systems. Components fail more often such systems, results application abort. Adopting fault–tolerance techniques can be consistently detect failures continue application’s execution even if exist. A prominent parallel programming specification, message passing interface (MPI), as ...