LLBMC: A Bounded Model Checker for LLVM's Intermediate Representation - (Competition Contribution)
Abstract
We present LLBMC, a bounded model checker for C programs. LLBMC uses the LLVM compiler framework to translate C programs into LLVM’s intermediate representation (IR). The resulting code is then converted into a logical representation and simplified using rewrite rules. The simplified formula is finally passed to an SMT solver. In contrast to many other tools, LLBMC uses a flat, bit-precise memory model. It can thus precisely model, e.g., memory-based re-interpret casts.

1 Verification Approach

Bounded model checking (BMC) has proven to be a very successful technique in hardware verification. More recently, it has also been applied to verifying software written in C [1, 4]. Applying BMC to C programs, however, comes with many obstacles. One of the most important differences is that the syntax and semantics of a programming language like C are much more complicated than those of a hardware description. One has to deal, e.g., with memory allocation and de-allocation, (function) pointers, complex data structures, and function calls.

Instead of exploring the source code directly, LLBMC makes use of existing compiler technology and performs the analysis on a compiler intermediate representation. Such an intermediate representation offers a much simpler syntax and semantics than a programming language like C, and thus considerably eases a logical encoding of the verification problem.

We have chosen the LLVM [5] compiler infrastructure and its assembler-like intermediate representation as the starting point for our approach, but the idea can also be applied to other low-level languages. LLVM is both a (GCC-compatible) C/C++/Objective-C compiler and a library of compiler technologies, providing, e.g., source- and target-independent optimizations. Our primary goal is to detect memory errors in C code [7, 2, 6].
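To make the "memory-based re-interpret casts" from the abstract concrete, the following minimal C sketch shows the kind of code that requires a bit-precise view of memory. It is not taken from LLBMC or its benchmarks: the function names `float_bits` and `roundtrip_through_bytes` are hypothetical, and the sketch assumes a 32-bit `float` in IEEE 754 single-precision format.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on the target, as on common platforms. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret the bytes of a float as a 32-bit unsigned integer.
   The value travels through memory, so an analysis has to track
   the content of every byte to know the result. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Write a 32-bit value to memory as raw bytes, then read the same
   bytes back as a 32-bit value. The byte order in memory depends on
   the target's endianness, but the round trip preserves the value. */
uint32_t roundtrip_through_bytes(uint32_t value) {
    unsigned char bytes[sizeof value];
    uint32_t result;
    memcpy(bytes, &value, sizeof value);
    memcpy(&result, bytes, sizeof result);
    return result;
}
```

A memory model that only tracks typed objects cannot say what `float_bits` returns, whereas a flat model that records each byte can determine the exact bit pattern.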
Memory errors include invalid memory accesses, heap and stack buffer overflows, and invalid frees (e.g., double frees).

* This work was supported in part by the “Concept for the Future” of Karlsruhe Institute of Technology within the framework of the German Excellence Initiative.

2 Software Architecture

While LLBMC is designed for C programs, its input format is LLVM-IR, the intermediate representation of the LLVM compiler framework. LLVM-IR is an abstract assembler language that is programming-language-independent. This makes it easier to extend LLBMC to other languages supported by LLVM (like C++ or Objective-C). Furthermore, the challenges of parsing complex high-level language syntax, such as that of C++, are eliminated. Instead, only a limited instruction set needs to be supported. LLVM-IR is architecture-dependent in the sense that the compiler frontend selects, e.g., the bit width of pointers and integer data types.

After reading in the LLVM-IR code, LLBMC applies a number of transformations to it. In particular, loops are unrolled, functions are inlined, and the control flow graph is simplified. The transformed code is then converted to ILR, which is a representation of a program in the logic of bit-vectors and arrays plus some extensions related to memory allocation. ILR provides an explicit state object for the memory content as well as for the state of the memory allocation system. These state objects encode the dependencies between memory-accessing instructions in the ILR formula. Dependencies between instructions in LLVM, which were given only implicitly by the ordering of the read and write operations, are thereby made explicit in the ILR formula. This change makes the expressions in an ILR formula ordering-independent. The ILR formula is then simplified using rewrite rules, and memory access correctness expressions are reduced to bit-vector formulas (see [2, 7] for details).
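The idea of an explicit memory state can be illustrated with a small C sketch. This is not ILR syntax, which is not shown here. The type `mem_state`, the functions `mem_empty`, `mem_store`, `mem_load`, and `load_between_two_stores`, and the 16-byte address space are all hypothetical names and values chosen for illustration. Each store returns a new memory state instead of modifying the old one, so every load names the state it depends on.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16 /* hypothetical, tiny address space for illustration */

/* A memory state is a value: an array mapping addresses to bytes. */
typedef struct {
    uint8_t bytes[MEM_SIZE];
} mem_state;

/* The initial state, with every byte set to zero. */
mem_state mem_empty(void) {
    mem_state m = {{0}};
    return m;
}

/* Return a NEW state that equals m except at addr. The argument is
   passed by value, so the caller's state is left unchanged.
   Addresses wrap around MEM_SIZE only to keep the sketch in bounds. */
mem_state mem_store(mem_state m, size_t addr, uint8_t value) {
    m.bytes[addr % MEM_SIZE] = value;
    return m;
}

/* Read one byte from the given state. */
uint8_t mem_load(mem_state m, size_t addr) {
    return m.bytes[addr % MEM_SIZE];
}

/* The load refers to state m1 explicitly, so its result does not
   depend on where it appears relative to the second store. */
uint8_t load_between_two_stores(void) {
    mem_state m0 = mem_empty();
    mem_state m1 = mem_store(m0, 3, 42);
    mem_state m2 = mem_store(m1, 3, 7);
    (void)m2; /* m2 exists, but the load below reads from m1 */
    return mem_load(m1, 3);
}
```

Because the load is written against `m1`, moving it before or after the definition of `m2` does not change its value. This mirrors how naming the memory state in each expression removes the dependence on instruction order.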
Once no more rewrite rules can be applied, the formula is passed to the SMT solver STP [3].

3 Strengths and Weaknesses of the Approach

LLBMC is tailored towards finding bugs in C programs, especially memory-related ones (not so much towards proving their absence). Detectable errors include:

– arithmetic overflow and underflow, including shift overflow,
– invalid memory access operations,
– invalid memory allocation, including invalid frees, and
– overlapping memory regions in memcpy.

Furthermore, LLBMC supports checking of user assertions and reachability of labels named “ERROR” in the C code. It can also detect whether the loop unrolling and function inlining bounds were sufficient or have to be increased in order to achieve full coverage.

In the competition, LLBMC was used with a fixed unwinding bound of 7 and an automatically determined function inlining bound. It was not checked whether the unwinding bound was sufficient, but only whether the “ERROR” label was reachable within these bounds (as other comparable tools have chosen similar settings). If no error was found, the instance was considered safe. LLBMC was able to successfully handle 146 out of 269 benchmark instances (not participating in category “Concurrency”, as this is not supported by LLBMC), resulting in a
Similar resources
LLBMC: Improved Bounded Model Checking of C Programs Using LLVM - (Competition Contribution)
LLBMC is a tool for detecting bugs and runtime errors in C and C++ programs. It is based on bounded model checking using an SMT solver and thus achieves bit-accurate precision. A distinguishing feature of LLBMC in contrast to other bounded model checking tools for C programs is that it operates on a compiler intermediate representation and not directly on the source code. 1 Verification Approac...
LLBMC: Bounded Model Checking of C and C++ Programs Using a Compiler IR
Bounded model checking (BMC) of C and C++ programs is challenging due to the complex and intricate syntax and semantics of these programming languages. The BMC tool LLBMC presented in this paper thus uses the LLVM compiler framework in order to translate C and C++ programs into LLVM’s intermediate representation. The resulting code is then converted into a logical representation and simplified ...
Context-Bounded Model Checking with ESBMC 1.17 - (Competition Contribution)
ESBMC is a context-bounded symbolic model checker for single- and multi-threaded ANSI-C code. It converts the verification conditions using different background theories and passes them directly to an SMT solver.
Memory Management Test-Case Generation of C Programs Using Bounded Model Checking
We describe a novel method to automatically generate and verify memory management test cases for unit tests, which are based on assertions extracted from safety properties typically generated by bounded model checking (BMC) tools. In particular, the proposed method checks for properties related to pointer safety, memory leaks, and invalid deallocation. To investigate our method’s effectiveness,...
Compiler-Assisted Software Model Checking and Monitoring
Dissertation by Xiaowan Huang, Doctor of Philosophy in Computer Science, Stony Brook University, 2010. In this dissertation we present a compiler-assisted execution-based software model checking method targeting all languages that are acceptable by the compiler. We treat the intermediate representation of the program under compilation ...