InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
Author
Abstract
In provenance research, the large scale of data sets often complicates analysis and understanding. One solution is visualization, a tool that enables scientists and analysts to understand large data sets. The standard format for visualizing provenance is the node-link diagram. However, while effective for understanding local activity, the node-link diagram fails to offer a high-level summary of the relationships within a provenance graph. By focusing on the node-link diagram as the de facto method for graph visualization, the community has failed to explore alternative means that may be more effective. Instead, visualizing filesystem provenance with a radial-based tree layout, sorted by the time at which system activity occurred, provides increased accuracy for identifying the high-level activity of a data set. The first contribution of this research is a task taxonomy for filesystem provenance data analysis based on my interviews with domain experts. The second contribution, developed in order to make InProv more effective by identifying the most important nodes and processes in a system, is a new time-based hierarchical grouping method for provenance data. The third contribution is the design of InProv, a radial layout visualization tool for browsing filesystem provenance data over time. Finally, I contribute statistically significant evidence that the radial-based layout for filesystem provenance data is more accurate and easier to use than traditional node-link diagrams. A copy of InProv will be made available as open source. In the meantime, please email [email protected] to request a copy.
Related resources
Visualizing Large Hierarchically Clustered Graphs with a Landscape Metaphor
Large graphs appear in many application domains. Their analysis can be done automatically by machines, for which the graph size is less of a problem, or, especially for exploration tasks, visually by humans. The graph drawing literature contains many efficient methods for visualizing large graphs, see e.g. [4, Chapter 12], but for large graphs it is often useful to first compute a sequence of c...
A User Study of Techniques for Visualizing Structure and Connectivity in Hierarchical Datasets
Many tree layouts have been created for presenting hierarchical data. However, layouts optimized for some tasks are not adequate for others. In this paper, we focus on identifying tree structures and cross-links generated by hierarchical edge bundling. Our key contribution is the introduction of descriptive features that can be used to characterize trees in terms of their structural and connect...
COAST: A Convex Optimization Approach to Stress-Based Embedding
Visualizing graphs using virtual physical models is probably the most heavily used technique for drawing graphs in practice. There are many algorithms that are efficient and produce high-quality layouts. If one requires that the layout also respect a given set of non-uniform edge lengths, however, force-based approaches become problematic while energy-based layouts become intractable. In this p...
Approaches for Exploring and Querying Scientific Workflow Provenance Graphs
While many scientific workflow systems track and record data provenance, few tools have been developed that provide convenient and effective ways to access and explore this information. Two important ways for provenance information to be accessed and explored are through browsing (i.e., visualizing and navigating data and process dependencies) and querying (e.g., to select certain portions of pr...
Visualizing gene interaction graphs with local multidimensional scaling
Several bioinformatics data sets are naturally represented as graphs, for instance gene regulation, metabolic pathways, and protein-protein interactions. The graphs are often large and complex, and their straightforward visualizations are incomprehensible. We have recently developed a new method called local multidimensional scaling for visualizing high-dimensional data sets. In this paper we ad...