CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code

Authors

  • Zhenmin Li
  • Shan Lu
  • Suvda Myagmar
  • Yuanyuan Zhou
Abstract

Copy-pasted code is very common in large software because programmers prefer reusing code via copy-paste to reduce programming effort. Recent studies show that copy-paste is prone to introducing bugs and that a significant portion of operating system bugs are concentrated in copy-pasted code. Unfortunately, it is challenging to efficiently identify copy-pasted code in large software. Existing copy-paste detection tools either do not scale to large software or cannot handle small modifications in copy-pasted code. Furthermore, few tools are available to detect copy-paste related bugs. In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software, including operating systems, and detects copy-paste related bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected 28 copy-paste related bugs in the latest version of Linux and 23 in FreeBSD. In addition, we analyze some interesting characteristics of copy-paste in Linux and FreeBSD, including the distribution of copy-pasted code across different lengths, granularities, modules, degrees of modification, and various software versions.

Similar Resources

Abstraction and Variation

"Master, a friend told me today that I should never use the editor's copy-paste functions when programming," said the young apprentice. "I thought the whole point of programming tools was to make our lives easier," he continued. The Master stroked his long grey beard and pressed the busy button on his phone. This was going to be one of those long, important discussions. "Why do you think c...

Copy and paste redeemed

Software development relies critically on code reuse, which software engineers typically realise through handwritten abstractions, such as functions, methods, or classes. However, such abstractions can be challenging to develop and maintain. One alternative form of re-use is copy-paste-modify, a methodology in which developers explicitly duplicate source code to adapt the duplicate for a new pu...

Finding Bugs in Open Source Kernels using Parfait

Parfait is a static bug checking tool for C/C++ source code, which is designed to be both scalable and precise. Requirements for this tool were derived from interaction with the Solaris operating system team, where it was required to check millions of lines of code in a time-efficient manner, with minimal noise and a low cost of integration into the build process. This paper gives an overview o...

Perimeter-Crossing Buses: a New Attack Surface for Embedded Systems

Any channel crossing the perimeter of a system provides an attack surface to the adversary. Standard network interfaces, such as TCP/IP stacks, constitute one such channel, and security researchers and exploit developers have invested much effort into exploring the attack surfaces and defenses there. However, channels such as USB have been overlooked, even though such code is at least as comple...

A Novel Metrics Based Technique for Code Clone Detection

Nowadays, software development is a tricky and time-consuming task. To make development easier, developers reuse existing modules with or without slight changes. Modules reused in this way are called code clones. A clone can be used in several places, within the same software or across different software. Without care, copy-and-paste code can lead to i...

Publication date: 2004