A Decade of Software Model Checking with SLAM

Thomas Ball, Microsoft Research, Redmond, USA
Vladimir Levin, Microsoft, Redmond, USA
Sriram K. Rajamani, Microsoft Research, Bangalore, India
ABSTRACT

Started ten years ago, the SLAM project produced three major breakthroughs in the automated analysis of software. First, it introduced a new technique called counterexample-guided abstraction refinement (CEGAR), and showed how to build a model checker based on CEGAR for programs written in expressive programming languages such as C. Second, it introduced Boolean programs as a model for abstracting and model checking programs. Third, it identified system software, such as device drivers and their API usage rules, as a “killer app” for the CEGAR technique. Over the past ten years, SLAM scaled, evolved and matured as the core engine inside Microsoft’s Static Driver Verifier (SDV) tool. Several versions of SDV have shipped. The latest version, which ships with Windows 7, includes a new version of the SLAM engine and support for many device driver APIs. In this paper, we summarize the core techniques introduced by SLAM, describe the challenges and experiences of building and deploying SDV, and survey the progress made by the research community in CEGAR techniques and, more broadly, in automated defect detection and verification.
1. INTRODUCTION
Large-scale software development is a very hard problem. Software is built in layers, and APIs are exposed by each layer to its clients. APIs come with usage rules, and clients need to satisfy these rules when using the APIs. Violation of API rules can cause runtime errors. Thus, it is useful to consider whether such API rules can be formally documented, so that programs that use the APIs can be checked at compile time for compliance with the rules. Some API rules, such as the number of parameters and the data type of each parameter, can be checked by compilers using standard type systems. However, certain rules involve hidden state. For instance, consider the rule that an acquire and a release of a spin lock need to be done in strict alternation, or the rule that a file can be read only after it is opened. Such rules cannot be stated and enforced by a compiler using known techniques.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Draft for submission to CACM December 2009. Copyright 2010 ACM 978-1-60558-001-2/09/08 ...$10.00.

We built the SLAM engine (referred to as SLAM from now on) to allow programmers to specify stateful usage rules and to statically check whether clients follow them [3, 4, 6]. We wanted SLAM to be scalable and, at the same time, to have a very low false-error rate. We introduced a technique called CEGAR (counterexample-guided abstraction refinement) to meet these dual objectives. We found a “killer app” for SLAM: checking whether Windows device drivers satisfy API usage rules. We wrapped SLAM with a set of rules specific to the Windows driver API, and a tool chain to enable push-button validation of Windows drivers, called Static Driver Verifier (SDV). Tools such as SDV are strategically important for the Windows device ecosystem, which encourages and relies on hardware vendors making various devices and writing Windows device drivers, while also requiring these vendors to provide evidence that the devices and drivers perform acceptably. Because many thousands of drivers use the same Windows driver API, the cost of manually specifying the API rules and writing them down is amortized over the value obtained by checking the same rules on many device drivers. This paper is a ten-year retrospective of SLAM and SDV. We give a self-contained overview of SLAM, describe what we had to do to go from SLAM to a full-fledged SDV product, and recount our experience in building, deploying and using SDV. The paper is organized as follows. Section 2 describes the internals of SLAM: the various components that allow the abstraction of a C program to a so-called Boolean program, a model checker for Boolean programs, and a facility to incrementally add detail to Boolean programs to eliminate false errors.
Section 3 discusses what it took to go from SLAM to SDV, including the set of rules, environment models that abstract the behavior of the OS, scripts that automate applying SLAM to a device driver using the driver's existing build scripts, and a user interface that conveys SDV’s analysis results to the user. Section 4 presents empirical results from using SDV on device drivers over several releases of Windows. Section 5 surveys related work, and Section 6 concludes with thoughts about the future.
2. SLAM
We designed the SLIC language [5] (Specification Language for Interface Checking) to specify stateful API rules, and SLAM as a flexible verifier to check whether code that uses an API follows these rules. We wanted to build a verifier that would cover all possible behaviors of the program while checking the rule, as opposed to a testing tool that might check the rule only on the subset of behaviors covered by the tests. In order for the solution to scale while covering all possible behaviors, we invented the notion of “Boolean programs”. Boolean programs are like C programs in the sense that they have all the control constructs of C programs (sequencing, conditionals, loops, procedure calls), but they allow only Boolean variables (with local as well as global scope). Boolean programs made sense as an abstraction for device drivers because we found that most of the API rules that drivers must follow tend to be control dominated, and so can be checked by modeling the control flow of the program accurately while modeling only the few predicates about data that are relevant to the rule being checked. However, the predicates that need to be “pulled in” to the model depend on how the client code manages the state relevant to the rule. We introduced counterexample-guided abstraction refinement (CEGAR) to discover the relevant state automatically, so as to balance the dual objectives of scaling to large programs and reducing false errors.

(a) Simplified SLIC locking rule:

     1  state {
     2    enum {Unlocked, Locked} state; }
     3  KeInitializeSpinLock.call {
     4    state = Unlocked;
     5  }
     6
     7  KeAcquireSpinLock.call {
     8    if ( state == Locked ) {
     9      error;
    10    } else {
    11      state = Locked;
    12    }
    13  }
    14
    15  KeReleaseSpinLock.call {
    16    if ( !(state == Locked) ) {
    17      error;
    18    } else {
    19      state = Unlocked;
    20    }
    21  }

(b) Code fragment that uses spin locks:

     1  ..
     2  ..
     3  ..
     4  KeInitializeSpinLock();
     5  ..
     6  ..
     7  if (x > 0)
     8    KeAcquireSpinLock();
     9  count = count + 1;
    10  devicebuffer[count] = localbuffer[count];
    11  if (x > 0)
    12    KeReleaseSpinLock();
    13  ...
    14  ...

(c) Fragment after instrumentation:

    ..
    { state = Unlocked; KeInitializeSpinLock(); }
    ..
    ..
    if (x > 0) {
      SLIC_KeAcquireSpinLock_call();
      KeAcquireSpinLock();
    }
    count = count + 1;
    devicebuffer[count] = localbuffer[count];
    if (x > 0) {
      SLIC_KeReleaseSpinLock_call();
      KeReleaseSpinLock();
    }
    ...

Figure 1: (a) Simplified SLIC locking rule; (b) code fragment that uses spin locks; (c) fragment after instrumentation.
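The soundness requirement behind Boolean programs — every behavior of the C program must also be a behavior of its abstraction — can be illustrated with a small executable sketch. The following toy rendering (our own encoding, not SLAM's representation) looks at the two conditionals "if (x > 0)" in the fragment of Figure 1(b): concretely, their outcomes always agree, while a Boolean program with no predicate about x treats each branch as a nondeterministic choice.

```python
from itertools import product

def concrete_traces():
    """Branch-outcome pairs realizable by the C code: both conditionals
    test x > 0, so the two outcomes always agree."""
    return {(x > 0, x > 0) for x in range(-3, 4)}

def abstract_traces():
    """In a Boolean program with no predicate about x, each conditional
    becomes a nondeterministic choice `if (*)`, so any pair is possible."""
    return set(product([False, True], repeat=2))

# Overapproximation: every concrete behavior is an abstract behavior.
assert concrete_traces() <= abstract_traces()
# The abstraction also admits two spurious, uncorrelated branch pairs.
assert abstract_traces() - concrete_traces() == {(False, True), (True, False)}
```

The two spurious pairs are exactly what CEGAR later rules out by adding a predicate about x.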
2.1 SLIC specification language
SLAM was designed to check temporal safety properties of programs that use a well-defined interface (or API). Safety properties are properties whose violation is witnessed by a finite execution path. A simple example of a safety property is that a lock should be alternately acquired and released. We encode temporal safety properties in a C-like language that allows the definition of a safety automaton [39]. The automaton monitors the execution behavior of a program at the level of function calls and returns. It can read (but not modify) the state of the C program that is visible at the function call/return interface, maintain a history, and signal when a bad state occurs. A SLIC rule has three components: (1) a static set of state variables, described as a C structure; (2) a set of events and event handlers that specify state transitions on those events; (3) a set of annotations that bind the rule to various object instances in the program (not shown in this example).
For example, consider the locking rule given in Figure 1(a). Line 1 declares a C structure containing one field state, an enumeration that can be either Unlocked or Locked, to capture the state of the lock. Lines 3-5 describe an event handler for calls to KeInitializeSpinLock. Lines 7-13 describe an event handler for calls to the function KeAcquireSpinLock. The code for the handler expects the state to be Unlocked and moves it to Locked (specified in line 11). If the state is already Locked, it means that the program has called KeAcquireSpinLock twice without an intervening call to KeReleaseSpinLock, and the rule signals an error (line 9). Lines 15-21 similarly describe an event handler for calls to the function KeReleaseSpinLock.¹ Figure 1(b) shows a piece of code that uses the functions KeAcquireSpinLock and KeReleaseSpinLock. Figure 1(c) shows the same code after it has been instrumented with calls to the appropriate event handlers. We will return to this example later.
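To make the automaton reading of the rule concrete, here is a small Python transcription of Figure 1(a) — an illustrative sketch of what the rule means, not how SLIC rules are actually compiled. The rule's state variable and event handlers become a class that observes a sequence of call events and reports whether the bad state is reached.

```python
UNLOCKED, LOCKED = "Unlocked", "Locked"

class SpinLockRule:
    """Safety automaton for the simplified locking rule of Figure 1(a)."""
    def __init__(self):
        self.state = UNLOCKED

    def event(self, call):
        if call == "KeInitializeSpinLock":
            self.state = UNLOCKED
        elif call == "KeAcquireSpinLock":
            if self.state == LOCKED:
                return "error"          # double acquire
            self.state = LOCKED
        elif call == "KeReleaseSpinLock":
            if self.state != LOCKED:
                return "error"          # release without acquire
            self.state = UNLOCKED
        return "ok"

def monitor(trace):
    """Run the automaton over a finite sequence of call events."""
    rule = SpinLockRule()
    return all(rule.event(c) == "ok" for c in trace)

assert monitor(["KeInitializeSpinLock", "KeAcquireSpinLock",
                "KeReleaseSpinLock"])
assert not monitor(["KeInitializeSpinLock", "KeAcquireSpinLock",
                    "KeAcquireSpinLock"])
```

A testing tool would run such a monitor on the traces its tests happen to cover; SLAM's goal is to establish the same property for all traces.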
2.2 CEGAR via Predicate Abstraction
Figure 2 both pictorially illustrates and presents ML-style pseudo-code of the CEGAR process. The goal of SLAM is to check whether all executions of a given C program P (type cprog) satisfy a SLIC rule S (type spec). The function slam takes a C program P and a SLIC rule specification S as input, and passes the instrumented C program to the tail-recursive function cegar, along with the predicates extracted from the specification S (specifically, the guards that appear in S). The instrument function hooks up relevant events via calls to the event handlers specified in the rule S, and maps the error statements in the SLIC rule to a unique error state in P′. The instrument function guarantees that P satisfies S iff the instrumented program P′ never reaches the error state. Thus, it reduces the problem of checking whether P satisfies S to checking whether P′ can reach the error state.

¹A more detailed example of this rule would handle different instances of locks, but we present the simple version here for ease of exposition.

[Figure 2's diagram shows the CEGAR loop: instrument combines cprog P and spec S into cprog P′; abstract turns P′ plus predicates into bprog B; check either declares "P passes S" or emits a trace; symexec either validates the trace ("P fails S") or returns a proof of unsatisfiability, which refine converts into new predicates for abstract.]

    type cprog, spec, predicates, bprog, trace, proof
    type result    = Pass | Fail of trace
    type chkresult = AbstractPass | AbstractFail of trace
    type excresult = Satisfiable | Unsatisfiable of proof

    let rec cegar (P': cprog) (preds: predicates) : result =
      let B: bprog = abstract (P', preds) in
      match check B with
      | AbstractPass -> Pass
      | AbstractFail trc ->
          match symexec (P', trc) with
          | Satisfiable -> Fail trc
          | Unsatisfiable prf -> cegar P' (preds ∪ refine prf)

    let slam (P: cprog) (S: spec) : result =
      cegar (instrument (P, S)) (preds S)

Figure 2: Graphical illustration and pseudo-code of the CEGAR loop.

The first step of the cegar function is to abstract program P′ with respect to the predicate set preds to create a Boolean program abstraction B. The automated transformation of a C program into a Boolean program uses the technique of predicate abstraction [2]. The program B has exactly the same control-flow skeleton as program P′. By construction, for any set of predicates preds, every execution trace of the C program P′ is also an execution trace of B = abstract(P′, preds). That is, the execution traces of P′ are a subset of those of B. The Boolean program B models only the portions of the state of P′ that are relevant to the current SLIC rule, and uses nondeterminism to abstract away irrelevant state in P′. Once the Boolean program B has been constructed, the check function exhaustively explores the state space of B to determine whether the (unique) error state is reachable. Since the Boolean program typically models a small number of variables, this exploration is feasible. We use interprocedural dataflow analysis [38] with binary decision diagrams [8] to enable the symbolic reachability analysis to scale to hundreds of Boolean variables. If the check function returns AbstractPass, the error state is not reachable in B and therefore not reachable in P′. In this case, SLAM has proved that the C program P satisfies the specification S. However, if the check function returns AbstractFail with a witness trace trc, the error state is reachable in the Boolean program B, but not necessarily in the C program
P′. Therefore, the trace trc must be validated in the context of P′ to prove that it really is an execution trace of P′. The function symexec symbolically executes the trace trc in the context of the C program P′. Specifically, it constructs a formula φ(P′, trc) that is satisfiable if and only if there exists an input that would cause program P′ to execute trace trc. If symexec returns Satisfiable, then SLAM has proved that program P does not satisfy specification S, and it returns the counterexample trace trc. If symexec returns Unsatisfiable(prf), then it has found a proof prf that there is no input that would cause P′ to execute trace trc. The function refine takes this proof of unsatisfiability, reduces it to a smaller proof of unsatisfiability, and returns the set of constituent predicates from this smaller proof. The function refine guarantees that the trace trc is not an execution trace of the Boolean program abstract(P′, preds ∪ refine(prf)). The ability to refine the Boolean program abstraction to rule out a spurious counterexample is known as the progress property of the CEGAR process.

Despite the progress property, the CEGAR process has no guarantee of termination: it can keep refining the Boolean program forever, without discovering either a proof of correctness or a proof of error. However, since each Boolean program is guaranteed to overapproximate the behavior of the C program, stopping the CEGAR process before it terminates with a definitive result is no different from any terminating program analysis that produces false alarms. In practice, SLAM terminates with a definite result over 96% of the time on large classes of device drivers: for Windows Driver Framework (WDF) drivers, the figure is 100%; for Windows Driver Model (WDM) drivers, it is 97%.

(a) Boolean program abstraction for the locking and unlocking routines:

     1  slic_error() { assert(false); }
     2
     3  bool {state==Locked};
     4
     5  SLIC_KeAcquireSpinLock_call() {
     6    if( {state==Locked} ) slic_error();
     7    else {state==Locked} := true;
     8  }
     9
    10  SLIC_KeReleaseSpinLock_call() {
    11    if( !{state==Locked} ) slic_error();
    12    else {state==Locked} := false;
    13  }

(b) Boolean program, CEGAR iteration 1:

     1  ...
     2  ...
     3  {state==Locked} := false;
     4  KeInitializeSpinLock();
     5  ...
     6  ...
     7  if(*) {
     8    SLIC_KeAcquireSpinLock_call();
     9    KeAcquireSpinLock(); }
    10  skip;
    11  skip;
    12  if(*) {
    13    SLIC_KeReleaseSpinLock_call();
    14    KeReleaseSpinLock(); }
    15  ...

(c) Boolean program, CEGAR iteration 2:

     1  bool {x > 0};
     2  ...
     3  {state==Locked} := false;
     4  KeInitializeSpinLock();
     5  ...
     6  ...
     7  if({x>0}) {
     8    SLIC_KeAcquireSpinLock_call();
     9    KeAcquireSpinLock(); }
    10  skip;
    11  skip;
    12  if({x>0}) {
    13    SLIC_KeReleaseSpinLock_call();
    14    KeReleaseSpinLock(); }
    15  ...

Figure 3: (a) Boolean program abstraction for locking and unlocking routines; (b) Boolean program: CEGAR iteration 1; (c) Boolean program: CEGAR iteration 2.
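In SLAM, the satisfiability question that symexec poses is discharged by an automated theorem prover. The flavor of that query can be sketched with a toy feasibility test for conjunctions of integer bounds on a single variable x (an illustration we wrote for this example; real path formulas are far richer).

```python
def feasible(constraints):
    """Toy stand-in for the theorem-prover query behind symexec.
    constraints: list of (op, k) with op in {'<=', '>'}, meaning x op k
    for an integer x. Returns True iff some integer satisfies them all."""
    lo, hi = float("-inf"), float("inf")
    for op, k in constraints:
        if op == ">":
            lo = max(lo, k + 1)   # integer x > k  means  x >= k + 1
        elif op == "<=":
            hi = min(hi, k)
    return lo <= hi

# The spurious trace of Figure 3(b): skip the acquire (x <= 0) yet
# take the release (x > 0). No integer x satisfies both bounds.
assert not feasible([("<=", 0), (">", 0)])
assert feasible([(">", 0)])
```

The constituent predicate of the unsatisfiable conjunction, x > 0, is exactly what refine feeds back into the abstraction.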
2.3 Example
We illustrate the CEGAR process using the SLIC rule from Figure 1(a) and the example code fragment in Figure 1(b). In this program, a single spin lock is initialized at line 4. The spin lock is acquired at line 8 and released at line 12. However, both the call to KeAcquireSpinLock and the call to KeReleaseSpinLock are guarded by the conditional (x > 0). Thus, tracking the correlation between these conditionals is essential to prove this property. Figures 3(a) and 3(b) show the Boolean program obtained by the first application of the abstract function to the code from Figures 1(a) and 1(c), respectively. Figure 3(a) shows the Boolean program abstraction of the SLIC event handler code. Recall that the instrumentation step guarantees that there is a unique error state. The function slic_error at line 1 represents that state. That is, the function slic_error is unreachable if and only if the program satisfies the SLIC rule. There is one Boolean variable, named {state==Locked}. By convention, we name each Boolean variable with the predicate it stands for, enclosed in curly braces. In this case, the predicate comes from the guard in the SLIC rule (Figure 1(a), line 8). Lines 5-8 and lines 10-13 of Figure 3(a) show the Boolean procedures corresponding to the SLIC event handlers SLIC_KeAcquireSpinLock_call and SLIC_KeReleaseSpinLock_call from Figure 1(a). Figure 3(b) shows the Boolean program abstraction of the SLIC-instrumented C program from Figure 1(c). Note that the Boolean program has the same control flow as the C program, including procedure calls. However, the conditionals at lines 7 and 12 of the Boolean program are nondeterministic, since the Boolean program does not have any predicate that refers to the value of the variable x. Also note that the references to the variables count, devicebuffer and localbuffer are elided in lines 10 and 11 (replaced by skip statements in the Boolean program), since the Boolean program does not have any predicates that refer to these variables.
The abstraction in Figure 3(b), while a valid abstraction of the instrumented C program, is not strong enough to prove that the program conforms to the SLIC rule. In particular, the reachability analysis of the Boolean program performed by the check function will find that slic_error is reachable via the trace 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, which skips the call to SLIC_KeAcquireSpinLock_call at line 8 and performs the call to SLIC_KeReleaseSpinLock_call at line 13. Since the Boolean variable {state==Locked} is false, slic_error will be called in line 11 of Figure 3(a). SLAM feeds this error trace to the symexec function, which symbolically executes the trace over the instrumented C program in Figure 1(c) and determines that the trace is not executable, since the branches of the “if” conditionals are correlated. In particular, the trace is not executable because there is no value of the variable x for which (x > 0) is false (skipping the body of the first conditional) and (x > 0) is true (entering the body of the second conditional). That is, the formula ∃x.(x ≤ 0) ∧ (x > 0) is unsatisfiable. The refine function then adds the predicate {x>0} to the Boolean program to refine it. This results in the Boolean program abstraction in Figure 3(c), which includes the Boolean variable {x>0} in addition to {state==Locked}. Using these two Boolean variables, the abstraction in Figure 3(c) is strong enough to prove that slic_error is unreachable for all possible executions of the Boolean program, and hence SLAM proves that the Boolean program satisfies the SLIC rule. Since the Boolean program is constructed to be an overapproximation of the C program in Figure 1(c), the C program indeed satisfies the SLIC rule.
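The two iterations just described can be replayed end-to-end in a small self-contained sketch. This is a toy model of the CEGAR loop specialized to the example (the encodings and names are ours, not SLAM's internals): a trace is the pair of branch outcomes at the two conditionals; with no predicate about x, check finds the spurious release-without-acquire trace; symexec reports it infeasible; refinement adds {x>0}; and the second check proves the error unreachable.

```python
from itertools import product

def check_abstraction(preds):
    """Explicit-state search over the Boolean program induced by preds.
    Returns a failing abstract trace (b1, b2), or None if safe."""
    tracked = "x>0" in preds
    for b1, b2 in product([False, True], repeat=2):
        if tracked and b1 != b2:
            continue  # with {x>0} tracked, both conditionals read one boolean
        locked = False
        if b1:                       # acquire branch taken
            locked = True
        if b2:                       # release branch taken
            if not locked:
                return (b1, b2)      # release without acquire: slic_error
            locked = False
    return None                      # no abstract path reaches slic_error

def symexec(trace):
    """Feasibility of the trace in the concrete program: both branches
    test x > 0, so a trace is executable iff the outcomes agree."""
    b1, b2 = trace
    return b1 == b2

def cegar():
    preds = set()
    while True:
        trc = check_abstraction(preds)
        if trc is None:
            return "Pass"
        if symexec(trc):
            return ("Fail", trc)
        preds.add("x>0")             # predicate mined from the unsat proof

assert check_abstraction(set()) == (False, True)  # spurious trace, iteration 1
assert cegar() == "Pass"                          # proof after one refinement
```

The loop terminates in two iterations here, mirroring Figures 3(b) and 3(c); in general, as noted above, termination is not guaranteed.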
3. FROM SLAM TO SDV
SDV is a completely automatic tool (using SLAM) that a device driver developer can use at compile time. Requiring nothing more than the build script of the driver, the SDV tool runs fully automatically and checks a set of prepackaged API usage rules on the device driver. For every usage rule that is violated by the driver, SDV presents a possible execution trace through the driver that shows how the rule can be violated. We often see model checking referred to as a “push button” technology [13], giving the impression that the user simply
gives their system to the model checker and receives useful output about errors in the system. In practice, this is far from the truth. In addition to the state space explosion problem [13], there are several other obstacles to making a model checker a “push button” technology. First, we need to specify the properties we want to check, without which there is nothing for a model checker to do. In complex systems such as the Windows driver interface, such properties are not easy to specify, and they must be debugged themselves. Second, due to the state explosion problem, the code analyzed by the model checker is not the full system in all its gory complexity, but rather the composition of some detailed component (such as a device driver) with a so-called “environment model”: a highly abstract, human-written description of the other components of the system (in our case, kernel procedures of the Windows operating system). Third, to become a practical tool in the toolbox of a driver developer, the model checker needs to be encapsulated in a script that incorporates it into the driver development environment, feeds it the driver’s source code, and reports results to the user. Thus, for the user to have a “push button” experience, much more is needed than a good model checking engine. This section presents the components of the SDV tool besides SLAM: driver API rules, environment models, scripts, and the user interface.
We describe how these components evolved over the past eight years, starting with the formation of the SDV team in Windows in 2002, acceptance of SDV into the WDK (Windows Driver Kit) product in 2004, its first public release with the Vista WDK Beta in 2006 (SDV version 1.4), the public release of SDV version 1.5 with the Windows Vista WDK in 2007, the public release of SDV version 1.6 with the Windows Server 2008 WDK, and the public release of SDV version 2.1 (which uses SLAM-2, a new version of SLAM) with the Windows 7 WDK in 2009.
3.1 API rules
As described earlier, SLIC rules capture API-level interactions. They are not specific to a particular device driver, but to a whole class of drivers that use a common API. This offers the advantage that the manual effort spent in writing rules is amortized by checking the rules on thousands of device drivers that use the API. The SDV team has made a significant investment in writing API rules and in teaching others in the Windows organization to write them. Different classes of devices have different requirements, which leads to class-specific driver APIs. Thus, networking drivers use the NDIS API, storage drivers use the StorPort and MPIO APIs, and display drivers use the WDDM API. Until recently, the vast majority of other drivers were written using the WDM API, a very low-level API that provides basic kernel routines (for memory management, synchronization, etc.) as well as routines specific to a driver’s operation and life-cycle (I/O, plug-and-play and power controls). Now, many drivers use the WDF API, which provides higher-level abstractions of common driver actions.
3.2 Environment Models
SLAM is designed as a generic engine for checking properties of a closed C program. However, a device driver is not a closed program with a main procedure, but rather a
library with many entry points (registered with and called by the operating system). This problem is standard in both program analysis and model checking. Before applying SLAM to a driver’s code, we first “close” the driver program with a suitable environment. The environment of a device driver consists of two parts: a top layer, called the harness, which is a main procedure that calls the driver’s entry points; and a bottom layer of stubs for the Windows API functions that the device driver can call. Thus, the harness calls into the driver, and the driver calls the stubs. Many API rules (in fact, the majority of them) are local to a driver’s entry points, which means that a rule can be checked on each entry point independently. However, some complex rules deal with sequences of entry points. For rules of the first kind, the body of the harness is a non-deterministic switch in which each branch calls a single, different entry point of the driver. For more complex rules, the harness contains a sequence of such non-deterministic switches. A stub is a simplified implementation of an API function. Its purpose is to approximate the input-output relation of the API function; ideally, this should be an overapproximation. In many cases, a driver API function returns a scalar that indicates success or failure, and the API stub ends with a non-deterministic switch over the possible return values. In other cases, a driver API function allocates a memory object and returns its address, sometimes through an output pointer parameter; the harness then allocates a small set of such memory objects, and the stub picks one of them and returns its address.
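The structure of this "closing" — a harness that nondeterministically dispatches to entry points, over stubs that nondeterministically succeed or fail — can be modeled in a few lines. The sketch below is our own toy model in Python (the entry-point and API names are invented, not real WDM routines); exhaustively enumerating the choices plays the role the model checker plays over the real non-deterministic switches.

```python
import itertools

STATUS_SUCCESS, STATUS_FAILURE = 0, 1

def stub_GetResource(choice):
    """Stub approximating an API's input-output relation: it may succeed
    or fail, the choice being left to the model checker."""
    return STATUS_SUCCESS if choice else STATUS_FAILURE

def driver_read(choice):     # one (hypothetical) driver entry point
    return stub_GetResource(choice)

def driver_write(choice):    # another (hypothetical) entry point
    return stub_GetResource(choice)

def harness():
    """Enumerate every nondeterministic choice, as exploration of the
    closed program would: which entry point runs, and how its stub
    resolves."""
    outcomes = set()
    for entry, choice in itertools.product((driver_read, driver_write),
                                           (False, True)):
        outcomes.add((entry.__name__, entry(choice)))
    return outcomes

# Every entry point is exercised under both stub outcomes.
assert len(harness()) == 4
```

For rules spanning sequences of entry points, the real harness chains several such switches, which in this toy model would correspond to iterating the product over multiple dispatch rounds.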
3.3 Scaling out rules and models
In the early days, the SLAM/SDV team itself wrote the API rules, based on input from driver API experts. The driver API experts educated us on how developers could misuse the API, and we wrote rules to catch each such misuse. We tested the rules on drivers with injected bugs and then ran SDV with these rules on real Windows drivers. We discussed the bugs found with driver owners and API experts to refine the rules, and so on. A senior manager commented at the time that “It takes a Ph.D. to develop API rules”. While this was a starting point, we soon realized that this approach would not scale, since there are a large number of driver APIs. Thus, we invested significant effort in creating a discipline for writing SLIC rules and in spreading this discipline among device driver API developers and testers. As of 2007, the SDV team improved SLAM and refined the WDM and WDF API rules (and the related environment models), which enabled Microsoft to ship SDV versions 1.5 and 1.6. The team also formulated the basics of rule development and driver environment model construction. This helped us transfer rule development to two software engineers with backgrounds far removed from formal verification, enabling them to succeed and later spread rule development to others. Since 2007, driver API teams have regularly employed summer interns to develop new API rules: for the WDF, NDIS, StorPort and MPIO APIs, for an API used to write file system mini-filters (for example, anti-virus filters), and for Windows services. Remarkably, all the interns succeeded in writing API rules: their rules found true bugs in real drivers. In total, SDV now has 470 API rules. Over 200 API rules shipped with SDV 2.1 for the WDM, WDF and NDIS APIs.
No more than 60 of these 210 API rules were written by formal verification specialists. All other rules were written from scratch or re-written from earlier drafts by software engineers or interns with little to no experience in formal verification. It is worth noting that the SLIC rules for WDF were developed during the design phase of WDF, whereas the WDM rules were developed long after WDM came into existence. The formalization of the WDF rules influenced the design of WDF; if a rule could not be naturally expressed in SLIC, the WDF designers tried to refactor the API to make it easier to verify. This experience showed that verification tools such as SLAM can be forward looking design aids, in addition to being checkers for legacy APIs (such as WDM).
3.4 Abstractions to write rules and models
A driver’s interface consists of API functions invoked by the driver and driver entry points (or callbacks) invoked by the kernel. Since an API rule is written for a class of drivers and not for a particular driver, we abstract entry points using generic role names, for example, fun_ReadFromDevice, rather than the concrete names used in a particular driver (such as, say, FooReadNextIOPacket). To facilitate the detection of entry points, we introduce function types that capture the generic roles of entry points (for example, ReadFromDevice) and ask the driver writer to declare entry points using these role types. Next, we distinguish property-detection rules from verification rules. A verification rule targets defects in a driver. A property-detection rule detects whether a given property is applicable to a driver, and classifies drivers into three categories: drivers for which the property is applicable; drivers for which the property is not applicable; and drivers for which we cannot determine whether the property is applicable. The execution of each verification rule is conditioned on the driver satisfying one or more property-detection rules. This simple technique reflects the orthogonal features of drivers: drivers can be function drivers, filter drivers, bus drivers, power-policy owners, etc. It also introduces a natural modularization of complex rules, enabling their reuse and partitioning the verification of a complex rule into a series of simpler checks.
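The conditioning of verification rules on property-detection rules can be sketched as a small three-valued gate (the rule and field names below are invented for illustration; they are not actual SDV rule identifiers).

```python
def detect_is_power_policy_owner(driver):
    """Toy property-detection rule: returns True (applicable),
    False (not applicable), or None (cannot determine)."""
    return driver.get("power_policy_owner")   # absent key -> None

def should_run_power_rule(driver):
    """Gate a (hypothetical) power verification rule on the detection
    result. Design choice in this sketch: when applicability is unknown,
    run the check conservatively rather than skip it."""
    applicable = detect_is_power_policy_owner(driver)
    return applicable is not False

assert should_run_power_rule({"power_policy_owner": True})
assert not should_run_power_rule({"power_policy_owner": False})
assert should_run_power_rule({})   # unknown applicability: still check
```

Treating "unknown" conservatively is one possible policy; the point is that the three-valued detection result, not the verification rule itself, decides whether the heavier check runs at all.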
3.5 Scripts
SDV contains a set of scripts that perform various functions: combining rules and environment models; detecting the source files of a driver and its build parameters; running the SLIC compiler on the rules and the C compiler on the driver’s and environment model’s source code to generate an intermediate representation (IR); invoking SLAM on the generated IR; and reporting, in a GUI, a summary of the results and the error traces for bugs found by SLAM. The SDV team worked hard to ensure that these scripts provide a very high degree of automation for the user. The user does not need to supply anything other than the build scripts that are used to build the driver.
4. EXPERIENCE FROM SDV
The first version of SDV (1.3, not released externally) found, on average, one real bug per driver in thirty sample drivers shipped with the Driver Development Kit (DDK) for Windows Server 2003. These sample drivers already had been well tested. The importance of eliminating defects in the WDK samples is difficult to overstate, as the code of
sample drivers is often copied by third-party driver developers (so much so that certain developers in Windows could recognize defects in crashed third-party device drivers as defects that had previously been in the samples). Versions 1.4 and 1.5 of SDV were applied to Windows Vista drivers. In the sample WDM drivers shipped with the Vista WDK (WDK, the renamed DDK), SDV found, on average, approximately one real bug per two drivers. This is not surprising, as these samples were mostly modifications of the sample drivers from the Windows Server 2003 DDK, with fixes applied for the defects found by SDV 1.3. The newly found defects were due in part to a better set of SDV rules and in part to modifications in the drivers. For Windows Server 2008, SDV version 1.6 contained new rules for WDF drivers. With these new rules, SDV found one real bug per three WDF sample drivers. The low bug count is explained by two factors. First, WDF sample drivers are much simpler than WDM sample drivers. Furthermore, as explained before, the WDF rules were designed during the design of the WDF driver model itself and in parallel with the development of the WDF sample drivers, which explains the higher quality of these samples. For the Windows 7 WDK, SDV 2.0 found, on average, one new real bug per WDF sample driver, and several bugs across the WDM sample drivers, on a set of seventy drivers of which half are of WDF type and half of WDM type. This data is explained by more focused efforts to refine the WDF rules and by very few modifications to the WDM sample drivers. SDV 2.0 shipped with the Windows 7 WDK in 2009, with rules for WDM, WDF and NDIS-miniport drivers, as follows: 74/13 WDM rules, 94/9 WDF rules and 36/1 NDIS rules. Here, the first number counts verification rules and the second counts property-detection rules (used only as pre-conditions enabling other rules).
The quality characteristics of the WDM and WDF rules are as follows: on WDM drivers, 90% of reported defects are true bugs, and non-result checks (timeouts, etc.) make up 3.5% of all checks; on WDF drivers, 98% of reported defects are true bugs, and non-result checks make up 0.04%. During the development cycle of Windows 7, SDV 2.0 was applied as a quality gate to Microsoft inbox drivers and to sample drivers shipped with the WDK. SDV was applied late in the cycle, after all other tools. Even so, it found 270 real bugs in 140 WDM and WDF drivers. All bugs found by SDV in Microsoft drivers were fixed. We do not have reliable data on bugs found by SDV in third-party device drivers. To give the reader a flavor of the performance of SDV, we give some statistics from a recent run of SDV on 100 drivers and 80 SLIC rules. The largest driver in this set is about 30K lines of code, and the total size of all the drivers is 450K lines of code. The total runtime for the 8,000 runs (each driver-rule combination is a run) is about 30 hours on an 8-core machine. We kill a run if it exceeds 20 minutes, and SDV yields a useful result (either a bug report or a pass) on over 97% of the runs. Thus, we find that SDV checks drivers with acceptable performance and yields useful results on a large fraction of the runs.
5. RELATED WORK
SLAM builds on decades of research in formal methods. Model checking [13, 12, 37] has been used extensively to algorithmically check temporal logic properties of models. Early applications of model checking were in hardware [34] and in protocol design [28]. In the compiler and programming languages community, abstract interpretation [18] provides a broad and generic framework for computing fixpoints over abstract lattices. The particular abstraction used by SLAM is called predicate abstraction [25]. The predicate abstraction algorithm makes use of an automated theorem prover [19] to construct the abstraction. Our contribution was to show how to perform predicate abstraction on C programs [2]. The Bandera project explored the idea of user-guided finite-state abstractions for Java programs [17], based on predicate abstraction and manual abstraction, but without automatic refinement of abstractions.

SLAM's Boolean program model checker (Bebop) computes fixpoints on the state space of the generated Boolean program, which can include recursive procedures. Bebop uses the context-free-language reachability algorithm [38] and implements it symbolically using binary decision diagrams [8]. Boolean programs generated by SLAM have been used to study and improve the performance of several model checkers [21, 32].

The idea of using counterexamples to iteratively add more detail to the model was first proposed as "localization reduction" by Kurshan [31] in the context of hardware model checking, where initially several latches are omitted and replaced by free variables, and counterexamples guide the inclusion of previously omitted latches. This was later generalized to adding predicates based on counterexamples by Clarke et al. [14] in the context of hardware (finite-state systems), around the same time that Ball and Rajamani [3] extended CEGAR to the realm of software (infinite-state systems).

SLAM and its practical application to checking device drivers was enthusiastically received by the research community, and several related projects were started by research groups in academia and industry.
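To make the Boolean program model concrete, consider the spin-lock rule abstracted over the single predicate "the lock is held". The sketch below checks reachability of an error location by explicit-state search; the control-flow locations and transition structure are invented for illustration, and Bebop itself explores Boolean programs symbolically with BDDs and handles recursive procedures.

```python
from collections import deque

# Toy Boolean program: the predicate b abstracts "the spin lock is held".
# Acquiring with b true, or releasing with b false, violates the locking
# rule and transitions to the error location 'ERR'.
def boolean_successors(state):
    loc, b = state
    if loc == 'acquire':
        return [('ERR', b)] if b else [('work', True)]
    if loc == 'work':
        # Nondeterministic branch, as predicate abstraction typically yields:
        # the driver may release, or (buggily) try to acquire again.
        return [('release', b), ('acquire', b)]
    if loc == 'release':
        return [('ERR', b)] if not b else [('acquire', False)]
    return []

def reachable_error(initial):
    """Breadth-first reachability over (location, predicate) states."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if state[0] == 'ERR':
            return True
        for succ in boolean_successors(state):
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return False

print(reachable_error(('acquire', False)))  # True: a double acquire is reachable
```

The finite state space, here just locations crossed with one Boolean, is what makes the abstraction model-checkable even though the original C program is infinite-state.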
At Microsoft, the ESP and Vault projects started in the same group as SLAM and explored different ways of checking API usage rules [33]. The Blast project [27] at Berkeley proposed a technique called "lazy abstraction" to optimize the work of constructing and maintaining abstractions across the iterations of the CEGAR loop. McMillan proposed interpolants as a more systematic and general way to perform refinement [35]. Predicates generated from interpolants were found to have nice local properties and were used to implement local abstractions in Blast [26].

Other contemporary techniques for analyzing C code against temporal rules in the early 2000s included the metalevel compilation approach of Engler and colleagues [20] and an extension of SPIN to handle ANSI C [29]. The Cqual project uses "type qualifiers" to specify API usage rules and uses type inference to check C code against the type qualifier annotations [22].

SLAM works by computing an overapproximation of the C program, a "may analysis" [24]. The may analysis is refined using symbolic execution on traces, inspired by the PREfix tool [9], which is a "must analysis". In the past few years, must analysis using efficient symbolic execution on a subset of paths in the program has been shown to be very effective in finding bugs [23]. Recent work has explored how to combine may and must analyses in more general ways [24]. Another way to perform underapproximation, or must analysis, is to unroll loops a fixed number of times and perform "bounded model checking" [11] using satisfiability solvers. This idea has been pursued by several projects, including CBMC [15], F-Soft [30] and Saturn [1].

CEGAR has been generalized to concurrent programs, to properties of heap-manipulating programs [7], and to the problem of program termination [16]. The Magic model checker checks properties of concurrent programs where threads interact through message passing [10]. Qadeer and Wu used SLAM to encode all interleavings with two context switches in a concurrent program [36].
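The bounded model checking idea mentioned above, unroll the transition relation a fixed number of steps and search for a violating execution, can be sketched on a toy system. Tools such as CBMC encode the unrolling as a SAT formula; the sketch below enumerates paths explicitly instead, and the counter system and property are invented for illustration.

```python
# Toy transition system: a counter that nondeterministically increments
# or resets to zero at each step.
def successors(x):
    return [x + 1, 0]

def bmc(initial, bad, k):
    """Bounded model checking: search for a path of at most k steps from
    `initial` to a state satisfying `bad`; return the path, or None."""
    paths = [[initial]]
    for _ in range(k):
        next_paths = []
        for path in paths:
            if bad(path[-1]):
                return path
            for succ in successors(path[-1]):
                next_paths.append(path + [succ])
        paths = next_paths
    # Check the states reached exactly at the bound.
    return next((p for p in paths if bad(p[-1])), None)

# Property: the counter never reaches 3.  Violated within bound k = 3,
# but no counterexample exists within bound k = 2.
print(bmc(0, lambda x: x == 3, 3))  # [0, 1, 2, 3]
print(bmc(0, lambda x: x == 3, 2))  # None
```

A returned path is a genuine counterexample (must analysis); returning None proves only that no violation exists within the bound, which is why BMC underapproximates.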
6. CONCLUSIONS
The last decade has witnessed a resurgence of interest in the automated analysis of software for the dual purposes of program falsification (defect detection) and program verification, due to advances in program analysis, model checking, and automated theorem proving. A unique contribution of the SLAM project was to combine the overapproximation techniques (Boolean program abstraction) used to prove programs correct with the underapproximation techniques (symbolic execution of a program trace) used to prove programs incorrect, as well as to automatically refine the Boolean program abstraction. Windows device drivers provided the crucible in which SLAM was tested and refined, resulting in the Static Driver Verifier tool, which ships as part of the Windows Driver Kit.
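The interplay of over- and underapproximation summarized above hinges on checking whether an abstract counterexample is concretely feasible. SLAM decides feasibility of a trace's path conditions with an automated theorem prover; the sketch below, with an invented two-branch program and a small enumerated input domain standing in for the prover, shows only the shape of that check.

```python
# The abstraction tracks no predicates for the program
#     if (x > 0) x = x + 1;
#     if (x <= 0) error();
# so it reports a path that takes both branches.  Symbolic composition of
# that path over the input x yields the conditions below; if no input
# satisfies all of them, the trace is spurious and refinement is needed.
def feasible(path_conditions, domain=range(-10, 11)):
    """Does any input in `domain` satisfy every path condition?"""
    return any(all(cond(x) for cond in path_conditions) for x in domain)

spurious_path = [lambda x: x > 0,       # took the then-branch of (x > 0)
                 lambda x: x + 1 <= 0]  # then took the branch of (x <= 0)

print(feasible(spurious_path))          # False: the abstract trace is spurious
print(feasible([lambda x: x > 0]))      # True: a prefix of it is feasible
```

A spurious verdict drives the CEGAR loop: the unsatisfiable conditions indicate which predicates (here, facts about x) to add to the Boolean program abstraction.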
Acknowledgements
For their many contributions to and support of SLAM and SDV over the years, directly and indirectly, we thank Nikolaj Bjørner, Ella Bounimova, Sagar Chaki, Byron Cook, Manuvir Das, Satyaki Das, Giorgio Delzanno, Leonardo de Moura, Manuel Fähndrich, Nar Ganapathy, Jon Hagen, Rahul Kumar, Shuvendu Lahiri, Jim Larus, Rustan Leino, Xavier Leroy, Juncao Li, Jakob Lichtenberg, Rupak Majumdar, Johan Marien, Con McGarvey, Todd Millstein, Arvind Murching, Mayur Naik, Aditya Nori, Bohus Ondrusek, Adrian Oney, Edgar Pek, Andreas Podelski, Shaz Qadeer, Bob Rinne, Robby, Stefan Schwoon, Adam Shapiro, Rob Short, Fabio Somenzi, Antonios Stampoulis, Donn Terry, Abdullah Ustuner, Westley Weimer, Georg Weissenbacher, Peter Wieland, and Fei Xie.
7. REFERENCES
[1] A. Aiken, S. Bugrara, I. Dillig, T. Dillig, B. Hackett, and P. Hawkins. An overview of the Saturn project. In PASTE 07: Workshop on Program Analysis for Software Tools and Engineering, pages 43–48, 2007.
[2] T. Ball, T. D. Millstein, and S. K. Rajamani. Polymorphic predicate abstraction. ACM Trans. Program. Lang. Syst., 27(2):314–343, 2005.
[3] T. Ball and S. K. Rajamani. Boolean programs: A model and process for software analysis. Technical Report MSR-TR-2000-14, Microsoft Research, January 2000.
[4] T. Ball and S. K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN 01: SPIN Workshop, pages 103–122, 2001.
[5] T. Ball and S. K. Rajamani. SLIC: A specification language for interface checking. Technical Report MSR-TR-2001-21, Microsoft Research, 2001.
[6] T. Ball and S. K. Rajamani. The SLAM project: Debugging system software via static analysis. In POPL 02: Principles of Programming Languages, pages 1–3, January 2002.
[7] D. Beyer, T. A. Henzinger, G. Théoduloz, and D. Zufferey. Shape refinement through explicit heap analysis. In FASE 10: Fundamental Approaches to Software Engineering, pages 263–277, 2010.
[8] R. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
[9] W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming errors. Software–Practice and Experience, 30(7):775–802, June 2000.
[10] S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. In ICSE 03: International Conference on Software Engineering, pages 385–395, 2003.
[11] E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.
[12] E. M. Clarke and E. A. Emerson. Synthesis of synchronization skeletons for branching time temporal logic. In Logic of Programs, pages 52–71, 1981.
[13] E. M. Clarke, E. A. Emerson, and J. Sifakis. Model checking: algorithmic verification and debugging. Commun. ACM, 52(11):74–84, 2009.
[14] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In CAV 00: Computer-Aided Verification, pages 154–169, 2000.
[15] E. M. Clarke, D. Kroening, and F. Lerda. A tool for checking ANSI-C programs. In TACAS 04: Tools and Algorithms for the Construction and Analysis of Systems, pages 168–176, 2004.
[16] B. Cook, A. Podelski, and A. Rybalchenko. Abstraction refinement for termination. In SAS 05: Static Analysis Symposium, pages 87–101, 2005.
[17] J. Corbett, M. Dwyer, J. Hatcliff, C. Pasareanu, Robby, S. Laubach, and H. Zheng. Bandera: Extracting finite-state models from Java source code. In ICSE 00: International Conference on Software Engineering, 2000.
[18] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In POPL 77: Principles of Programming Languages, pages 238–252, 1977.
[19] L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS 08: Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340, 2008.
[20] D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In OSDI 00: Operating System Design and Implementation, pages 1–16. Usenix Association, 2000.
[21] J. Esparza and S. Schwoon. A BDD-based model checker for recursive programs. In CAV 01: Computer Aided Verification, pages 324–336, 2001.
[22] J. S. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In PLDI 02: Programming Language Design and Implementation, pages 1–12. ACM, 2002.
[23] P. Godefroid, M. Y. Levin, and D. A. Molnar. Automated whitebox fuzz testing. In NDSS 08: Network and Distributed System Security, 2008.
[24] P. Godefroid, A. V. Nori, S. K. Rajamani, and S. D. Tetali. Compositional may-must program analysis: unleashing the power of alternation. In POPL 10: Principles of Programming Languages, pages 43–56. ACM, 2010.
[25] S. Graf and H. Saïdi. Construction of abstract state graphs with PVS. In CAV 97: Computer Aided Verification, pages 72–83, 1997.
[26] T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan. Abstractions from proofs. In POPL 04: Principles of Programming Languages, pages 232–244, 2004.
[27] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In POPL 02: Principles of Programming Languages, pages 58–70, January 2002.
[28] G. Holzmann. The SPIN model checker. IEEE Transactions on Software Engineering, 23(5):279–295, May 1997.
[29] G. Holzmann. Logic verification of ANSI-C code with SPIN. In SPIN 00: SPIN Workshop, pages 131–147, 2000.
[30] F. Ivancic, Z. Yang, M. K. Ganai, A. Gupta, and P. Ashar. Efficient SAT-based bounded model checking for software verification. Theor. Comput. Sci., 404(3):256–274, 2008.
[31] R. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, 1994.
[32] S. La Torre, M. Parthasarathy, and G. Parlato. Analyzing recursive programs using a fixed-point calculus. In PLDI 09: Programming Language Design and Implementation, pages 211–222. ACM, 2009.
[33] J. R. Larus, T. Ball, M. Das, R. DeLine, M. Fähndrich, J. Pincus, S. K. Rajamani, and R. Venkatapathy. Righting software. IEEE Software, 21(3):92–100, May/June 2004.
[34] K. McMillan. Symbolic Model Checking: An Approach to the State-Explosion Problem. Kluwer Academic Publishers, 1993.
[35] K. L. McMillan. Interpolation and SAT-based model checking. In CAV 03: Computer-Aided Verification, pages 1–13, 2003.
[36] S. Qadeer and D. Wu. KISS: keep it simple and sequential. In PLDI 04: Programming Language Design and Implementation, pages 14–24, 2004.
[37] J. Queille and J. Sifakis. Specification and verification of concurrent systems in CESAR. In 5th International Symposium on Programming, pages 337–350, 1982.
[38] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In POPL 95: Principles of Programming Languages, pages 49–61, 1995.
[39] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In LICS 86: Logic in Computer Science, pages 332–344. IEEE Computer Society Press, 1986.