Resiliency in Distributed Workflows
Authors
Laurentiu Trifan
Abstract
In this report we present a thorough study of resiliency in distributed workflow systems. We focus particularly on applying this concept in fields such as numerical optimization, where any software or logical error could mean restarting the entire experiment. A theoretical study is presented along with a set of software tools that suggest directions for implementation. Finally, a resilient algorithm schema is proposed for later refinement and implementation.
Keywords: Workflow, Resiliency, Distributed Systems, Fault Prediction
Laurentiu Trifan. INRIA Research Report no. 7435, October 2010, 42 pages. Theme 1: Computation and Simulation Models (Modèles de Calcul et Simulation), Projet OPALE. inria-00528840, version 2, 2 Nov 2010.
Centre de recherche INRIA Grenoble – Rhône-Alpes, 655, avenue de l'Europe, 38334 Montbonnot Saint Ismier. Telephone: +33 4 76 61 52 00. Fax: +33 4 76 61 52 52.
Similar resources
COMPUTING SCIENCE Resiliency Variance in Workflows with Choice
Computing a user-task assignment for a workflow coming with probabilistic user availability provides a measure of completion rate or resiliency. To a workflow designer this indicates a risk of failure, especially useful for workflows which cannot be changed due to rigid security constraints. Furthermore, resiliency can help outline a mitigation strategy which states actions that can be performe...
Resiliency Variance in Workflows with Choice
Computing a user-task assignment for a workflow coming with probabilistic user availability provides a measure of completion rate or resiliency. To a workflow designer this indicates a risk of failure, especially useful for workflows which cannot be changed due to rigid security constraints. Furthermore, resiliency can help outline a mitigation strategy which states actions that can be performe...
Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...
Sieve: Actionable Insights from Monitored Metrics in Microservices
Major cloud computing operators provide powerful monitoring tools to understand the current (and prior) state of the distributed systems deployed in their infrastructure. While such tools provide a detailed monitoring mechanism at scale, they also pose a significant challenge for the application developers/operators to transform the huge space of monitored metrics into useful insights. These in...
Impact of Policy Design on Workflow Resiliency Computation Time
Workflows are complex operational processes that include security constraints restricting which users can perform which tasks. An improper user-task assignment may prevent the completion of the workflow, and deciding such an assignment at runtime is known to be complex, especially when considering user unavailability (known as the resiliency problem). Therefore, design tools are required that a...