Data Mining System Execution Traces to Validate Distributed System Quality-of-Service Properties
ABSTRACT
System Execution Modeling (SEM) tools enable distributed system testers to validate Quality-of-Service (QoS) properties, such as end-to-end response time, throughput, and scalability, during early phases of the software lifecycle. The analysis of QoS properties, however, is traditionally bounded by a SEM tool's capabilities. This chapter discusses how to mine system execution traces, which are a collection of log messages describing events and states of a distributed system throughout its execution lifetime, so that the validation of QoS properties is not dependent on a SEM tool's capabilities. The author uses a real-life case study to illustrate how data mining system execution traces can assist in discovering potential performance bottlenecks.

INTRODUCTION

Challenges of enterprise distributed system development

Enterprise distributed systems, such as mission avionics systems, traffic management systems, and shipboard computing environments, are transitioning to next-generation middleware, such as service-oriented middleware (Pezzini & Natis, 2007) and component-based software engineering (Heineman & Councill, 2001). Although next-generation middleware is improving enterprise distributed system functional properties (i.e., its operational scenarios), Quality-of-Service (QoS) properties (e.g., end-to-end response time, throughput, and scalability) are not validated until late in the software lifecycle, i.e., at system integration time. This is due in part to the serialized-phasing development problem (Rittel & Webber, 1973).

As illustrated in Figure 1, in serialized-phasing development, the infrastructure- and application-level system entities, such as components that encapsulate common services, are developed during different phases of the software lifecycle. Software design decisions that affect QoS properties, however, are typically not discovered until the final stages of development, e.g., at system integration time, which is too late in the software lifecycle to resolve performance bottlenecks in an efficient and cost-effective manner (Mann, 1996; Snow & Keil, 2001; Woodside, Franks, & Petriu, 2007).

Figure 1. Overview of serialized-phasing development in distributed systems

System Execution Modeling (SEM) tools (Smith & Williams, 2001), which are a form of model-driven engineering (Schmidt, 2006), assist distributed system developers in overcoming the serialized-phasing development problem shown in Figure 1. SEM tools use domain-specific modeling languages (Ledeczi, Maroti, Karsai, & Nordstrom, 1999) to capture both platform-independent attributes (such as structural and behavioral concerns of the system) and platform-specific attributes (such as the target architecture of the system) as high-level models. Model interpreters then transform the constructed models into source code for the target architecture. This enables distributed system testers to validate QoS properties continuously throughout the software lifecycle while the "real" system is still under development. Likewise, as development of the real system completes, distributed system testers can incrementally replace faux portions of the system with their real counterparts to produce more realistic QoS validation results.
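Before turning to the limitations of SEM tools, it is worth fixing an image of the system execution traces introduced in the abstract. A trace is simply an ordered collection of log messages emitted as the system runs; a fragment might look like the following, where the component names, timestamps, and message text are purely illustrative rather than taken from the case study:

    EventProducer: activating periodic event publication
    EventProducer: sent event 42 at 1100 ms
    EventConsumer: received event 42 at 1135 ms
    EventConsumer: processed event 42 in 12 ms
    EventProducer: sent event 43 at 1225 ms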
Although SEM tools enable distributed system developers and testers to validate distributed system QoS properties during early phases of the software lifecycle, QoS validation capabilities are typically bounded by a SEM tool's analytical capabilities. In order to validate QoS properties unknown to a SEM tool, distributed system testers have the following options:

• Use handcrafted solutions. This option typically occurs outside of the SEM tool. Moreover, this option is traditionally not applicable across different application domains because it is an ad hoc solution, e.g., handcrafting a solution to validate event response time based on priority in a proprietary system;

• Leverage model transformations to convert models to a different SEM tool. This option implies the source and target SEM tools support the same modeling features and semantics. If the target SEM tool has different modeling features and semantics, then distributed system testers face discrepancies in QoS validation results (Denton, Jones, Srinivasan, Owens, & Buskens, 2008); or

• Wait for updates to the SEM tool. This is the best option for distributed system testers because it ensures consistency of QoS validation results when compared to the previous two options. In many cases, however, such updates may not arrive in time for distributed system testers to leverage them in their QoS validation exercises. Distributed system testers therefore have to revert to either of the first two options until such updates are available, which can result in the problems previously discussed.

Consequently, relying solely on the built-in validation capabilities of SEM tools can hinder distributed system testers from thoroughly validating enterprise distributed system QoS properties continuously throughout the software lifecycle. Distributed system testers therefore need improved techniques that enhance QoS analytical capabilities irrespective of a SEM tool's existing capabilities.

Solution approach → QoS validation using system execution traces

To address the problems associated with the limited analytical capabilities of a SEM tool when validating QoS properties, there is a need for methodologies that extend conventional SEM tool methodologies and simplify the following exercises, as illustrated in Figure 2:

1. Capturing QoS property metrics without the SEM tool having a priori knowledge of what metrics (or data) are required to analyze different QoS properties. This step can be accomplished using system execution traces (Chang & Ren, 2007), which are a collection of log messages generated during the execution lifetime of a distributed system in its target environment. The log messages in the system execution trace are lightweight and flexible enough to adapt to the many different QoS metrics that arise throughout the software lifecycle and across different application domains;

2. Identifying QoS property metrics without requiring a priori knowledge of what data (or metrics) is being collected (i.e., the ability to learn at run-time). This step can be accomplished using log formats, which are expressions that identify the static and variable portions of log messages of interest within the system execution traces generated in Step 1. The log formats are then used to mine system execution traces and extract metrics of interest for QoS validation (see the sketch after this list); and

3. Evaluating QoS properties without a priori knowledge of how to analyze the extracted QoS metrics. This step can be accomplished using dataflow models (Downs, Clare, & Coe, 1988) that enable distributed system testers to auto-reconstruct end-to-end system execution traces for QoS validation. Distributed system testers then specify a domain-specific (i.e., user-defined) equation for validating QoS properties using the metrics data mined in Step 2.
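To make Steps 1 and 2 concrete, the short Python sketch below shows one possible way to turn a log format into a regular expression and mine matching log messages from a trace. This is a minimal illustration under assumed conventions: the {name:type} placeholder syntax, the example log messages, and the component names are hypothetical, not the chapter's actual notation or implementation.

    import re

    # A hypothetical log format: static text plus named variable placeholders.
    # The {name:type} placeholder syntax is assumed for illustration only.
    LOG_FORMAT = "{component:str}: sent event {event_id:int} at {time:int} ms"

    # Map placeholder types to regular-expression fragments with named groups.
    TYPE_PATTERNS = {"int": r"(?P<%s>\d+)", "str": r"(?P<%s>\w+)"}

    def format_to_regex(log_format):
        """Convert a log format into a compiled regex that mines matching messages."""
        # Splitting on placeholders (two capture groups) yields:
        # [static, name, type, static, name, type, ..., static]
        parts = re.split(r"\{(\w+):(\w+)\}", log_format)
        pattern = re.escape(parts[0])
        for i in range(1, len(parts), 3):
            name, kind, static = parts[i], parts[i + 1], parts[i + 2]
            pattern += TYPE_PATTERNS[kind] % name + re.escape(static)
        return re.compile(pattern)

    # A small system execution trace (hypothetical messages).
    trace = [
        "EventProducer: sent event 42 at 1100 ms",
        "unrelated log message",
        "EventProducer: sent event 43 at 1225 ms",
    ]

    regex = format_to_regex(LOG_FORMAT)
    for message in trace:
        match = regex.match(message)
        if match:
            # e.g., {'component': 'EventProducer', 'event_id': '42', 'time': '1100'}
            print(match.groupdict())

Only messages matching the log format contribute metrics; everything else in the trace is ignored, which is what allows the metrics of interest to be identified without a priori knowledge of the rest of the trace.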
Figure 2. Overview of using dataflow models to mine system execution traces and validate QoS properties

Using dataflow models to mine system execution traces enables distributed system testers to validate QoS properties independent of the SEM tool of choice. Likewise, as enterprise distributed systems continue increasing in size (e.g., number of lines of source code and number of hardware/software resources) and complexity (e.g., envisioned operational scenarios), dataflow models can adapt without modification. This is because dataflow models operate at a higher level of abstraction than system composition (i.e., how components communicate with each other) and system complexity (i.e., the operational nature of the system in its target environment). Likewise, the domain-specific analytics associated with dataflow models need not change.

This chapter illustrates the following concepts for using dataflow models to mine system execution traces and validate enterprise distributed system QoS properties:

• How to use high-level constructs to specify QoS metrics that are to be extracted from system execution traces;

• How to represent high-level constructs as dataflow models to ensure correct auto-reconstruction of end-to-end system execution traces; and

• How to use dataflow models to mine system execution traces and validate enterprise distributed system QoS properties using domain-specific analytical equations.

Distributed system testers can therefore focus more on using SEM tools to discover QoS bottlenecks specific to their application domain, and are ensured they can perform such activities irrespective of a SEM tool's analytical capabilities.
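As a rough sketch of the third step, the Python fragment below treats two log formats as nodes in a dataflow model: a shared variable (here, a hypothetical event_id) chains their extracted metrics into end-to-end traces, after which a user-defined equation computes end-to-end response time. The message formats, variable names, and equation are illustrative assumptions rather than the chapter's actual implementation.

    import re

    # Two hypothetical log formats, already converted to regexes as in the
    # earlier sketch; event_id is the shared variable that links them.
    SENT = re.compile(r"(?P<component>\w+): sent event (?P<event_id>\d+) at (?P<sent_ms>\d+) ms")
    RECV = re.compile(r"(?P<component>\w+): received event (?P<event_id>\d+) at (?P<recv_ms>\d+) ms")

    trace = [
        "Producer: sent event 42 at 1100 ms",
        "Consumer: received event 42 at 1135 ms",
        "Producer: sent event 43 at 1225 ms",
        "Consumer: received event 43 at 1287 ms",
    ]

    # Auto-reconstruct end-to-end traces by joining log messages on the
    # shared event_id variable, as a dataflow model chains log formats.
    events = {}
    for message in trace:
        for regex in (SENT, RECV):
            match = regex.match(message)
            if match:
                events.setdefault(match.group("event_id"), {}).update(match.groupdict())

    # User-defined (domain-specific) QoS equation: response time = recv - sent.
    for event_id, data in sorted(events.items()):
        latency = int(data["recv_ms"]) - int(data["sent_ms"])
        print("event %s: end-to-end response time = %d ms" % (event_id, latency))

Because the correlation key and the analytical equation are both supplied by the tester, neither the join logic nor the latency formula needs to be known to the SEM tool in advance, which is the essence of the approach described above.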