Increasing MTTF requires increasing reliability. In previous lectures we saw a variety of ways to do so: testing, static analysis, formal methods, etc. All else being equal, a more reliable system is a more available system. Reducing MTTR means shortening the (mean) time it takes for the system to come back up once a failure has occurred. Once a failure occurs, we must detect it, diagnose it, and then recover th...
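
To make the two levers concrete, here is a minimal sketch that assumes the standard steady-state definition of availability, MTTF / (MTTF + MTTR). That formula is not quoted from the text above. The function name `availability` and the hour figures are hypothetical, chosen only for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, assuming the standard
    steady-state formula MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical numbers: a failure every 1000 hours, 10 hours to come back up.
baseline = availability(1000.0, 10.0)

# Lever 1: double MTTF (a more reliable system).
longer_mttf = availability(2000.0, 10.0)

# Lever 2: halve MTTR (faster detection, diagnosis, and recovery).
shorter_mttr = availability(1000.0, 5.0)

print(f"baseline:     {baseline:.4%}")
print(f"longer MTTF:  {longer_mttf:.4%}")
print(f"shorter MTTR: {shorter_mttr:.4%}")
```

Under this formula, doubling MTTF and halving MTTR give the same availability, because only the ratio of the two quantities matters. Which lever is cheaper to pull depends on the system.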