Visualizing and Understanding Code Duplication in Large Software Systems
Abstract
Code duplication, or code cloning, is a common phenomenon in the development of large software systems. Developers have a love-hate relationship with cloning. On the one hand, cloning speeds up development. On the other hand, managing clones becomes challenging as software evolves. Cloning has commonly been considered undesirable for software maintenance, and several research efforts have been devoted to detecting clones automatically and eliminating them aggressively. However, little empirical work has analyzed the consequences of cloning for software quality. Recent studies show that cloning is not necessarily undesirable: it can be used to minimize risk, and in some cases it serves as a design technique. This thesis proposes three visualization techniques that help researchers analyze cloning in large software systems. All three abstract and display cloning information at the subsystem level, but with different emphases. At the subsystem level, clones can be classified as internal or external. Internal clones are code duplicates that reside in the same subsystem, whereas external clones are spread across different subsystems. Architectural quality attributes such as cohesion and coupling are brought in to support the study of cloning at the architecture level. The Clone Cohesion and Coupling (CCC) Graph and the Clone System Hierarchy (CSH) Graph display cloning information for a single release. In particular, the CCC Graph highlights the amount of internal and external cloning in each subsystem, whereas the CSH Graph focuses on the details of how cloning is spread. Finally, the Clone System Evolution (CSE) Graph shows how cloning evolves over time.
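The internal/external distinction can be made concrete with a small sketch. The Python below is illustrative only and is not the thesis's implementation. It assumes that clone pairs are given as pairs of file paths, for instance from a clone detector's output, and that a file's subsystem is its top-level directory. The names `subsystem_of`, `classify`, and `cloning_per_subsystem` are hypothetical. The sketch tallies, per subsystem, roughly the internal and external cloning amounts that the CCC Graph is described as highlighting. The counting rule is also an assumption, because the abstract does not define the exact metric.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical input format: a clone pair is two file paths reported as duplicates.
ClonePair = Tuple[str, str]


def subsystem_of(path: str) -> str:
    """Assumption: a file's subsystem is its top-level directory."""
    return path.split("/", 1)[0]


def classify(pair: ClonePair) -> str:
    """Label a pair 'internal' if both fragments share a subsystem, else 'external'."""
    a, b = pair
    return "internal" if subsystem_of(a) == subsystem_of(b) else "external"


def cloning_per_subsystem(pairs: Iterable[ClonePair]) -> Dict[str, Dict[str, int]]:
    """Count internal and external clone pairs touching each subsystem.

    Assumed counting rule: an internal pair counts once for its subsystem;
    an external pair counts once for each of the two subsystems it connects.
    """
    internal: Counter = Counter()
    external: Counter = Counter()
    for a, b in pairs:
        sa, sb = subsystem_of(a), subsystem_of(b)
        if sa == sb:
            internal[sa] += 1
        else:
            external[sa] += 1
            external[sb] += 1
    subsystems = set(internal) | set(external)
    return {
        s: {"internal": internal[s], "external": external[s]}
        for s in sorted(subsystems)
    }
```

For example, given the hypothetical pairs `("net/tcp.c", "net/udp.c")`, `("net/tcp.c", "fs/ext2.c")`, and `("fs/ext2.c", "fs/ext3.c")`, each of `net` and `fs` receives one internal and one external clone pair. Counting external pairs on both sides keeps each subsystem's coupling through cloning visible from its own perspective. A per-release version of the same tally could feed an evolution view such as the CSE Graph.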