Scalable Source Code Similarity Detection in Large Code Repositories
Similar resources
Efficient plagiarism detection for large code repositories
Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections...
Analysis of Source Code Repositories
Source code repositories are designed to store a huge amount of source code. They also indirectly collect information useful for analyzing the development process. Usually, this latter set of data is not used at all due to the lack of specialized tools to collect and analyze such data. This paper presents the early stages of a tool designed to perform acquisition and analysis of data stored in source...
Source Code Repositories and Agile Methods
Source repositories are a promising database of information about software projects. This paper proposes a tool to extract and summarize information from CVS logs in order to identify whether there are differences in the development approach of Agile and non-Agile teams. The tool aims to improve empirical investigation of the Agile Methods (AMs) without affecting the way developers write code. ...
Efficient and Effective Plagiarism Detection for Large Code Repositories
ABSTRACT: The copying of programming assignments is a widespread problem in academic institutions. Manual plagiarism detection is time-consuming, and current popular plagiarism detection systems are not scalable to large code repositories. While there are text-based plagiarism detection systems capable of handling millions of student papers, comparable systems for code-based plagiarism detection...
Model-Based Mining of Source Code Repositories
The Mining Software Repositories (MSR) field analyzes the rich data available in source code repositories (SCR) to uncover interesting and actionable information about software system evolution. Major obstacles in MSR are the heterogeneity of software projects and the amount of data that is processed. Model-driven software engineering (MDSE) can deal with heterogeneity by abstraction as its cor...
Journal
Journal title: ICST Transactions on Scalable Information Systems
Year: 2019
ISSN: 2032-9407
DOI: 10.4108/eai.13-7-2018.159353