Principled workflow-centric tracing of distributed systems
Abstract
Workflow-centric tracing captures the workflow of causally related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure’s utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
Similar resources
Diagnosing performance changes in distributed systems by comparing request flows
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the root cause could be contained in any one of the system’s numerous components or, worse, could be a result of interactions among them. As distributed systems continue to increase in complexity, diagnosis tasks will only become more challenging. There is a need for a new class of diagnosis technique...
So, you want to trace your distributed system? Key design insights from years of practical experience
End-to-end tracing captures the workflow of causally-related activity (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for management tasks like diagnosis and resource accounting. Drawing upon our experiences building and using end-to-end tracing infrastruc...
Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems (Jonathan Mace, Ryan Roelke, Rodrigo Fonseca)
Monitoring and troubleshooting distributed systems is notoriously difficult; potential problems are complex, varied, and unpredictable. The monitoring and diagnosis tools commonly used today – logs, counters, and metrics – have two important limitations: what gets recorded is defined a priori, and the information is recorded in a component- or machine-centric way, making it extremely hard to correlate e...
Access control in ultra-large-scale systems using a data-centric middleware
The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows, and the demand for interoperability between sub-systems increases, achieving a more scalable and dynamic access control system becomes an im...
A Virtual Filesystem Framework to Support Embedded Software Development
We present an approach to simplify the software development process for embedded systems by supporting key development tasks such as debugging, tracing and configuration. The approach is based on the use of distributed filesystem abstractions; principal building blocks within an embedded system in the form of “systems on ch...
Publication date: 2016