
Composable Thread Coloring ∗

Dean F. Sutherland

William L. Scherlis

Institute for Software Research, Carnegie Mellon [email protected] [email protected]

This paper introduces the language-independent concept of “thread usage policy.” Many multi-threaded software systems contain policies that regulate associations among threads, executable code, and potentially shared state. A system, for example, may constrain which threads are permitted to execute particular code segments, usually as a means to constrain those threads from accessing or writing particular elements of state. These policies ensure properties such as state confinement or reader/writer constraints, often without recourse to locking or transaction discipline. Our approach allows developers to concisely document their thread usage policies in a manner that enables the use of sound scalable analysis to assess consistency of policy and as-written code. This paper identifies the key semantic concepts of our thread coloring language and illustrates how to use its succinct source-level annotations to express models of thread usage policies, following established annotation conventions for Java. We have built a prototype static analysis tool, implemented as an integrated development environment plug-in (for the Eclipse IDE), that notifies developers of discrepancies between policy annotations and as-written code. Our analysis technique uses several underlying algorithms based on abstract interpretation, call-graphs, and type inference. The resulting overall analysis is both sound and composable. We have used this prototype analysis tool in case studies to model and analyze more than a million lines of code. Our validation process included field trials on a wide variety of complex large-scale production code selected by the host organizations. Our in-field experience led us to focus on potential adoptability by real-world developers. We have developed techniques that can reduce annotation density to less than one line per thousand lines of code (KLOC). 
In addition, the prototype analysis tool supports an incremental and iterative approach to modeling and analysis. This approach enabled field trial partners to directly target areas of greatest concern and to achieve useful results within a few hours.

Categories and Subject Descriptors D.3.2 [PROGRAMMING LANGUAGES]: Language Classifications - Concurrent, distributed, and parallel languages; F.3.1 [LOGICS AND MEANINGS OF PROGRAMS]: Specifying and Verifying and Reasoning about Programs - Mechanical verification

General Terms Languages, Verification

Keywords State consistency, Thread policy, Multicore, State confinement, Annotation, Java, Race conditions

∗ This material is based upon work supported by the following grants: NASA: NCC2-1298 and NNA05C530A; Lockheed Martin: RRMHS1798; ARO: DAAD190210389; IBM Eclipse: IC-5010. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors or of the U.S. government.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PPoPP’10, January 9–14, 2010, Bangalore, India. Copyright © 2010 ACM 978-1-60558-708-0/10/01. . . $10.00

1. Introduction

Many modern software systems employ shared-memory multithreading and are built using components such as libraries and frameworks. As a result, developers must manage the interactions between multiple threads as they execute within those components. To manage this complexity, developers turn to abstraction and information hiding [20]; they treat components as “black boxes” with known interfaces that should explicitly specify all of the necessary pre- and post-conditions of the design contract, while using an appropriate level of abstraction to hide unnecessary detail. However, many current interfaces lack explicit specification of thread-related preconditions.¹ In the absence of explicit specifications, developers must make assumptions about what the missing preconditions might be; these assumptions are frequently incorrect. Failure to comply with the actual thread-related preconditions can lead to errors such as state corruption and deadlock. These errors are often intermittent and difficult to diagnose.

¹ Users of thread coloring write many preconditions and few postconditions; thus, for the remainder of the paper we will simply write “preconditions.” We will clearly state necessary postconditions in Section 2.

This paper introduces the concept of “thread usage policy”: a group of often unspecified preconditions used as part of a strategy for managing access to shared state by regulating which specific threads are permitted to execute particular code segments or to access particular data fields. The concept of thread usage policy is not language specific; similar issues arise in many popular languages, including Java, C#, C++, Objective-C, Ada, and others. Currently, the preconditions contained in thread usage policies can be hard to identify, poorly thought out, unstated, poorly documented, incorrectly expressed, out of date, or simply difficult to find. This inspired us to devise means to specify these preconditions in a form that developers would find both useful and acceptable.

We have developed a simple, formal specification language for modeling thread usage policies, which we call the language of “Thread Coloring.” We have devised appropriate abstractions of the key semantic building blocks of thread usage policies (thread identity, concrete code segments, and data fields) so that developers can build a model of the thread usage policy (policy, hereafter) by expressing preconditions as simple precise annotations in program code. The form of these annotations follows conventions now well-established for languages such as Java and C#. Our choice of abstractions has enabled us to model policies that support a wide variety of strategies for managing access to shared state, including simple thread confinement of data fields; confinement of data fields to a purpose such as an “event thread;” use of client callback methods; regulating use of locking in cases where data fields are shared solely between two specific threads; regulating access to locks where certain threads are permitted to interact with specific locks, but not others; and mode-based control of access such as the single-writer/multiple-reader pattern.

We have devised a static analysis that assesses the consistency of a body of code with the expressed policy models of the components used by that code. The analysis identifies specific program points that are inconsistent with the expressed model and provides sufficient information about each inconsistency to enable the developer to fix the error, whether in the model or the code.

We developed a prototype analysis tool, implemented as an Eclipse IDE plug-in, and used it in case studies to analyze a wide variety of applications, including large-scale GUI clients and their framework APIs, a production system used for chip design that embodies simulation infrastructure [22], an astronomy application for viewing star charts, and a compute-intensive robot-motion-planning system. Developers used the analysis results to identify errors in their code; in many cases, they incorporated fixes for these errors in their shipping systems.

Validation. We have validated our approach through more than fifteen case studies, more than ten of which were performed in the field on production code selected by the host organizations. In the course of these case studies we analyzed well over one million lines of code. This validation experience demonstrates that our approach is sufficiently expressive and scalable to capture and analyze real-world thread usage policies in a wide variety of production code. Our in-field case studies provide evidence of external validity in two ways: (1) we diagnosed errors that the developers of the code being analyzed had failed to find; and (2) the code to be analyzed was selected by the host organizations rather than by our team.

Considerations for practicability. Based on our in-field case studies involving complex large-scale production code, we identified three considerations for the feasibility and practicability² of the design of both the thread coloring language and our analysis tool.

The first consideration is to devise means to greatly reduce the extent of annotation required to express complex models in large-scale production Java systems. We use several techniques, including exploitation of inheritance, policy inference within components, and a kind of ad hoc polymorphism for annotations [16]. These techniques can reduce annotation density to less than one annotation per KLOC, on average, in large-scale framework client code. We exclude annotations on the APIs of language-standard libraries from this computation; once written, those annotations apply to all programs written in that language.

The second consideration is to provide useful analysis results in very large systems development projects. We developed techniques that support composition in a manner analogous to separate compilation. These techniques have enabled us to separately analyze two 100KLOC components within a multi-million LOC commercial system and to obtain useful results for developers.

The final consideration is to develop an interaction model that rewards developers and evaluators with increments of benefit for each increment of their modeling and analysis effort. We achieve this by providing incremental results that, while provisional and contingent, can assist developers as they elaborate models and undertake analyses. This has enabled several development teams to use early tool prototypes to develop and verify models of policy for components of their larger systems. In one case study, for example, we modeled and analyzed a targeted portion of an application, focusing on a particular area of concern (in this case, verification of the thread confinement of critical data) while avoiding the need to model and analyze the bulk of the ~100KLOC application.

² Practicability is the potential to be put into practice successfully.

2. The thread coloring language

Thread coloring is a simple, formal specification language that enables developers to express models of the thread usage policies in their code, as well as the policies of the possibly-opaque components used by it.³ In this section, we present the process of using the language to identify and model policies, using a non-trivial example taken from production code to illustrate this process.

The Electric tool [22] is an open-source VLSI design tool that is nearly twenty years old. Version 8 of Electric was the first open source release of the tool after its translation into Java by Sun Microsystems. This translation was motivated in part by a desire to use multithreading to improve performance. Electric is actively maintained, and is used by a wide variety of design teams in industry. Our case study was performed on version 8.0.1, which contains roughly 140KLOC in forty-four Java packages.

³ A more-complete presentation of the language and its formal semantics appears in [24].

2.1 Identifying thread usage policies

Our first step is to identify all of the policies. Although some development teams in our case studies were unaware of the intended policies in their systems, most teams had a policy as part of their tacit “shared lore.” These policies were rarely recorded in any written form. Thus, we use information gathered from the developers to develop and model hypothetical policies. We use the analysis tool to assess the consistency of as-written code with the hypothetical model; through iteration and reverse engineering, we converge on a model of the intended policy.

We use the Electric tool in our worked example because the developers had a pre-existing written thread usage policy, which appears in slightly edited form in Figure 1. We have thus identified one of the two key policies that needed to be expressed; the other is the policy for Java’s AWT/Swing GUI framework, used by Electric’s GUI. In less than one day of effort, we expressed models of these two policies, and used the findings produced by our analysis tool to diagnose a set of “seemingly random intermittent failures” experienced by the development team and their users.

There is one persistent user-thread called the DatabaseThread, created in com.sun.electric.tool.Job. This thread acts as an in-order queue for Jobs, which are spawned in new threads. Jobs are mainly of two types: Examine and Change. These jobs interact with the Database, which are objects in the package hierarchy com.sun.electric.database. The Rules are:
1. Jobs are spawned into new Threads in order of receipt.
2. Only one Change job can run at a time.
3. Many Examine jobs can run simultaneously.
4. Change and Examine jobs cannot run at the same time.
5. Examine jobs can only look at the Database.
6. Change jobs can look at and modify the Database.
Because only one Change job can run at a time, the Change job is just run in the DatabaseThread. Examine jobs are spawned into their own threads, which terminate when the Examine job is done.
Figure 1. Electric developer’s thread usage policy

 1  /** @ColorDeclare AWT, Compute
 2   * @IncompatibleColors AWT, Compute
 3   * @MaxColorCount AWT 1 */
Figure 2. AWT/Swing annotations

Policy for the GUI. The Electric GUI uses the AWT and Swing frameworks, which share a single thread usage policy. The GUI framework implementation is not multi-threaded; rather, it executes in its own “event thread” and prescribes rules for how non-GUI threads may interact with it through the framework APIs. The salient points of the policy (after [5]) are:
1. There is at most one “AWT thread” at a time per application.
2. There may be any number of separate “Compute threads.”
3. A Compute thread is forbidden to paint or to handle AWT data structures or events. Failure to comply can lead to exceptions from within the AWT, because the AWT avoids both potential deadlock and data races by accessing its internal data structures from within a single thread, without the use of locks.
4. Extended computation on the AWT thread is forbidden; “brief” computation is acceptable. While the thread is computing, it cannot respond to events or repaint the display; this “freezes” the GUI until the computation finishes.
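The confinement idea behind rules 1 through 3 can be pictured with a small sketch. It is illustrative only and is not taken from the AWT or from the thread coloring tool: a single-thread executor stands in for the AWT thread, and the names EventThreadSketch, paint, invokeLater, and snapshot are hypothetical. Real Swing code would hand work to the event thread with SwingUtilities.invokeLater instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: state confined to one "event" thread, without locks. */
public class EventThreadSketch {
    // Stand-in for the single AWT thread (rule 1). Daemon, so it never blocks JVM exit.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-thread");
        t.setDaemon(true);
        return t;
    });
    private final Thread owner;

    // Confined state: only code running on the event thread may touch it.
    private final List<String> painted = new ArrayList<>();

    public EventThreadSketch() throws Exception {
        Callable<Thread> whoAmI = Thread::currentThread;
        owner = eventThread.submit(whoAmI).get();
    }

    /** Precondition in the spirit of rule 3: the caller must be the event thread. */
    public void paint(String item) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("paint() called from a Compute thread");
        }
        painted.add(item);
    }

    /** Safe from any thread: the work itself is handed to the event thread. */
    public void invokeLater(String item) {
        eventThread.execute(() -> paint(item));
    }

    /** Reads the confined state by running the read on the event thread. */
    public List<String> snapshot() throws Exception {
        Callable<List<String>> copy = () -> new ArrayList<>(painted);
        return eventThread.submit(copy).get();
    }

    public void shutdown() {
        eventThread.shutdown();
    }
}
```

The run-time check in paint() detects a violation only when it actually happens; the static analysis described in this paper aims to report such a call before the program runs.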

Modeling thread usage policies. The thread coloring language uses formal program annotations to specify preconditions that express relationships among (1) concrete code segments, (2) abstractions on thread identity, and (3) abstractions on data fields. The thread coloring language identifies code segments in two ways: by the choice of particular program points at which annotations are placed, and through the use of “scoped promise” annotations, described in Section 5.2, that use simple syntactic polymorphism (a.k.a. wildcards) to achieve the effect of positioning annotations at multiple program points.

The language identifies threads through an intermediate abstraction called a thread role, because we observed that developers focus on the role or purpose a thread performs (e.g. the “event thread” in a GUI, or a “writer thread” in a simulation) rather than on its identity. A thread may perform one role or many roles; its roles may change during execution, such as when a worker thread is taken from or returned to a thread pool. Similarly, a thread role may be performed by only one thread at a time, or by many threads at once.

The thread coloring language abstracts data fields to data regions in order to name shared state without total compromise of abstraction and information hiding in a component. We use the mechanism introduced by Greenhouse and Boyland [13, 14], because our system is more concerned with effects than with aliasing. Regions match our needs better than ownership types [7] or the ownership domains of [2]. We declare data regions using annotations in the code; we map fields into regions using additional annotations.

 1  /**
 2   * @ColorDeclare DBChanger, DBExaminer
 3   * @IncompatibleColors DBChanger, DBExaminer, AWT
 4   * @MaxColorCount DBChanger 1 */
 5  public abstract class Job implements ..., Runnable {
 6    /** Thread which executes all Database change Jobs. */
 7    private static class DBChangesThread extends Thread {
 8      ...
 9      public void run() { // @Grant DBChanger
10        for (;;) {
11          Job job = waitChangeJob();
12          job.run();
13        }
14      }
15    }
16    private synchronized Job waitChangeJob() {
17      for (;;) {
18        ...
19        if (job.jobType == EXAMINE) {
20          ...
21          Thread t = new Thread(job, job.jobName);
22          t.start();
23          continue;
24        }
25        // job.jobType == CHANGE || jobType == UNDO
26        if (...no-running-examiners...) {
27          return job;
28        }
29      }
30      try { wait(); } catch (InterruptedException e) {}
31    }
32  }
Figure 3. Class Job from Electric (Part 1)

2.2 Expressing models of thread usage policy at APIs

Our next step is to declare any needed thread roles. The @ColorDeclare annotation declares names for roles; these names are thread colors. Colors are opaque identifiers that have their own name-space. Colors can be fully qualified based on the location of their declaration; this is exactly analogous to a fully qualified name for a Java method. The Electric policy mentions two thread roles: one for examining the Database and one for changing it. Our declaration of these two roles is in Figure 3. We will use the DBChanger color for all change Jobs and the DBExaminer color for all examine Jobs. Similarly, in the GUI policy we see two thread roles: the “AWT thread” and “Compute threads.” We declare these roles on line 1 of Figure 2, thus capturing the roles described in rules 1 and 2 of the GUI policy. We use these colors for the AWT thread⁴ and for all other threads, respectively. Note that for libraries and other black-box code, we support “stand-off” annotations that avoid reliance on source-code access; rather, they identify program points in other ways, mainly by using scoped promise annotations. The actual annotations for the AWT/Swing API use the stand-off form; we present the in-line form for clarity.

Next, we declare global constraints on thread roles. For example, an important aspect of the GUI policy indicates that the “AWT thread” and the “Compute threads” are distinct; it is forbidden for any thread to perform both roles at once. Similarly, the Electric policy implies that the Change and Examine roles are distinct. In each case, this incompatibility property allows us to conclude that a thread that performs one of these roles cannot simultaneously perform any of the others. The tool enforces the incompatibility automatically by introducing both appropriate additional preconditions at all color constraints (see below) and appropriate postconditions after each @Grant annotation (see Section 2.3). We state these incompatibilities using the annotations on line 2 of Figure 2 and line 3 of Figure 3. Note that the annotation for Electric specifies the incompatibility for Electric’s thread roles as well as for the GUI’s AWT thread.⁵ These annotations capture the third rule of the GUI policy and the analogous, but implicit, rule for Examine and Change jobs from the Electric policy. Note that if we had followed the Electric policy literally, our annotation might have incorrectly omitted the AWT thread.⁶

Finally, both the GUI policy and the Electric policy identify thread roles that may be performed by at most one thread at a time. We document this with the annotations on line 3 of Figure 2 and line 4 of Figure 3.⁷ Because “any number” is the default, we omit @MaxColorCount annotations for the other colors.

⁴ We defer to common usage and refer to “the AWT thread” rather than the more precise formulation “the only thread performing the AWT role.”

⁵ Because all non-AWT threads are seen by the GUI as threads performing the Compute role, threads with the DBChanger or DBExaminer colors should also have the Compute color. For purposes of this example, however, it is sufficient to state that they are incompatible with the AWT color.

Preconditions for API methods. The third rule in the GUI policy⁸ has an important implication: because access to the GUI’s internal state is limited to the AWT thread, we may invoke most GUI methods only from the AWT thread. To express this constraint, we first introduce two new concepts.

As noted earlier, the roles that a thread performs may change in the course of execution. At any given time in execution, therefore, we may speak of the set of current role names as the color bindings of the thread. However, since our analysis tool performs a static analysis, we need a static approximation of the color bindings of a thread. That approximation is a color environment, the set of all color bindings for all threads that may ever execute at a particular program point over all possible program executions. Because a color environment is a set of color bindings, we can represent a color environment using a Boolean expression over colors; colors omitted from these expressions lack a required value. All possible color bindings that satisfy the expression are members of the color environment. We defer to Section 3 the explanation of how the analysis tool computes the color environment at the beginning of a method.

We use the phrase color constraint to specify the acceptable color environment for a given method or data region. We express color constraints using Boolean expressions over colors. Consistency of code and model roughly amounts to satisfaction of appropriate color constraints by the color environments found at call sites and data references.
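One way to picture these definitions is the following minimal sketch. It is an assumption-laden illustration, not the representation used by the analysis tool: a color binding is modeled as a set of color names, a color environment as an explicitly enumerated set of bindings, and a color constraint as a predicate over bindings. The names ColorConstraintSketch, color, and consistent are hypothetical.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of color bindings, environments, and constraints. */
public class ColorConstraintSketch {
    /** Atomic Boolean expression: "the binding contains this color". */
    public static Predicate<Set<String>> color(String name) {
        return binding -> binding.contains(name);
    }

    /**
     * An environment (every binding that may reach a program point) is
     * consistent with a constraint if each of those bindings satisfies it.
     */
    public static boolean consistent(Set<Set<String>> environment,
                                     Predicate<Set<String>> constraint) {
        return environment.stream().allMatch(constraint);
    }

    public static void main(String[] args) {
        // Constraint in the style of @Color("DBChanger | DBExaminer").
        Predicate<Set<String>> runConstraint = color("DBChanger").or(color("DBExaminer"));

        // Two illustrative call-site environments.
        Set<Set<String>> jobCallSite = Set.of(Set.of("DBChanger"), Set.of("DBExaminer"));
        Set<Set<String>> guiCallSite = Set.of(Set.of("AWT"));

        System.out.println(consistent(jobCallSite, runConstraint)); // prints "true"
        System.out.println(consistent(guiCallSite, runConstraint)); // prints "false"
    }
}
```

Enumerating bindings explicitly keeps the sketch short; the paper instead represents an environment symbolically as a Boolean expression, which avoids listing every binding.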
We place color constraint annotations with the definition of the method or data region being constrained, thus implicitly specifying the location of the constraint. For example, a method that requires the AWT thread (e.g., to assure state confinement to a thread role) is tagged with “@Color("AWT")”. This is a precondition for the method. We omit postconditions for colors because invoking a method never changes the color environment at the call site. By default, overriding methods inherit the color constraint of their parent; interfaces are treated analogously. Thus, GUI framework callback methods inherit their constraint from the framework API. A consequence of the @IncompatibleColors annotations in this example is that @Color("AWT") is equivalent to @Color("AWT & !Compute & !DBChanger & !DBExaminer"). Clearly, because the constrained methods will execute only on the AWT thread, their data accesses will conform to the rule. Most methods in the GUI framework APIs have similar annotations.

Many libraries and frameworks contain methods that may legitimately be invoked from any thread. A @Transparent constraint on such methods expresses this flexibility, which differs from accidental flexibility in that the constraint represents an explicit statement that the transparency of the annotated method is a design feature.

⁶ This mistake would not, however, have prevented our analysis from finding their thread usage policy error.

⁷ The current prototype omits enforcement of this annotation.

⁸ We defer discussion of expressing similar rules in the Electric policy to Section 2.3.

2.3 Expressing models of thread usage policy in client code

We perform two main steps when building policy models for client code. The first step is to assign colors to any threads that either are newly created or are re-used from thread pools. The second step is to express policy models for internal components by adding color constraints to methods and data regions within the client code.

Granting thread colors. We add and remove colors by placing @Grant and @Revoke annotations at the program points where a thread begins an activity, and so performs a role appropriate to that activity. Intuitively, these annotations assign a list of colors to and remove a list of colors from threads. From a static semantics perspective, @Grant and @Revoke adjust the current color environment of the lexical block. Thus, @Grant adds the listed colors to every color binding in the color environment; @Revoke removes its listed colors analogously. Both @Grant and @Revoke annotations change the color bindings of a thread only until the end of the lexical block in which the annotations appear; the thread’s color bindings are restored to their previous value on block exit. This restoration maintains the postcondition described above.

We permit @Grant and @Revoke annotations only at the beginning of lexical blocks. This restriction on placement of these annotations allows us to use a flow-insensitive analysis. In early case studies, we used prototype tools based on flow-sensitive analysis. It became clear that the added expressiveness confused developers, and was rarely needed even in the most complex multi-threaded code. Thus the flow-sensitive analysis failed to justify its high cost in performance. This observation allowed us to switch to a flow-insensitive analysis. In the cases where there is impact, additional annotation and/or a slight code refactoring resolves the difficulty. Our case study experience has shown, however, that the need to adjust the color environment by hand arises infrequently.

33  @Color("DBChanger | DBExaminer")
34  public void run() {
35    try {
36      if (jobType == CHANGE) { // @Revoke DBExaminer
37        // @Grant DBChanger
38        // ** Change support code ELIDED
39        doIt();
40      } else if (jobType == EXAMINE) { // @Revoke DBChanger
41        // @Grant DBExaminer
42        doIt();
43      } else if (jobType == UNDO) { // @Revoke DBExaminer
44        // @Grant DBChanger
45        changingJob = this;
46        doIt();
47      }
48    } catch (Throwable e) { ...
49    } finally {
50      if (jobType == Type.EXAMINE) { // @Revoke DBChanger
51        // @Grant DBExaminer
52        // ** clean-up from examine job ELIDED
53      } else { // @Revoke DBExaminer
54        // @Grant DBChanger
55        // ** clean-up from change job ELIDED
56      }
57    }
58  }
59  }
Figure 4. Class Job from Electric (Part 2)

In Java client code, new activities commonly begin in the run() methods of Runnables and Executors, or the call() methods of Callables; thus, these methods are common locations for @Grant and @Revoke annotations. To model Electric’s policy, we hypothesize that these locations (including the creation points of jobs) are the most likely targets for associating thread roles with threads. We will check this, and provide an example of reverse-engineering policy, by examining the source code. Figures 3 and 4 show relevant code extracted from class Job. The declaration of DBChangesThread, the one thread allowed to change the Database, is on line 7 of Figure 3. The run() method for DBChangesThread is on line 12. Normal thread usage patterns would indicate that this method is executed as the main body of its thread. Accordingly, we annotate it with a @Grant DBChanger annotation. As a consequence of this annotation, the method invocations on lines 11 and 12 are known to execute from the only thread with the DBChanger color. Method waitChangeJob, on line 16, handles Examine jobs separately from Change and Undo⁹ jobs; this raises our confidence in our choice of thread roles for this modeling effort. Two other items are of particular interest in this part of Figure 3:
• Each Examine job is started in a new thread of its own (line 21).
• On line 26 we see that Change and Undo jobs are permitted to continue only when there are no outstanding Examine jobs.

1 2 3 4 5 6 7 8 9 10 11 12 13 14

private s t a t i c c l a s s CheckSchematicHierarchically extends Job { protected CheckSchematicHierarchically ( . . . ) { s u p e r ( . . . , J o b . Type . CHANGE, . . . ) ; ... startJob () ; } @Color ( " DBChanger " ) public boolean d o I t ( ) { ... S c h e m a t i c . doCheck ( c e l l ) ; ... return true ; } ...

Figure 5. Example Change Job from Electric 8.0.1

Both observations are consistent with the policy. Figure 4 shows additional code from class Job. The run() method handles Change, Examine, and Undo jobs separately. This makes it easy for us to tell where to place @Grant annotations reflecting our knowledge of thread colors. We next consider our knowledge of this run() method’s call stack. It will be invoked either from the DBChangesThread’s run() method (see line 12 of Figure 3) or from the Java thread library, after a thread for an Examine job is started on line 21. Thus, for purposes of our model, the run() method might be called from either a color environment including the DBChanger color or one including the DBExaminer color. The @Color annotation on line 33 constrains its callers in exactly this fashion. We must now place annotations so that the analysis tool has the correct color environment for each arm of the ’if’-statement; fortunately, the code clearly states the intended thread role for each case. Ideally, we would like to place @Grant annotations in each arm of the ’if’-statement. The @Grant annotation produces an error when one of its listed colors is incompatible with some color in the current thread’s color bindings. Consider, for example, a thread performing the DBExaminer role that invokes the run() method in Figure 4. If the @Grant DBChanger on line 37 were placed before the ’if’statement on line 36 and the @Grant were permitted to succeed, the resulting color bindings would contain both DBExaminer and DBChanger, thus violating the global constraint established by the @IncompatibleColors annotation. This error would be reported by the analysis tool. However, even with the @Grant DBChanger annotation on line 37, our flow-insensitive analysis tool would report an error because it cannot know that lines 36 through 39 are executed only by Change jobs. The @Revoke DBExaminer on line 36 is needed to inform the analysis tool that threads with the DBExaminer color never execute the Change job code. 
Paired @Revoke and @Grant annotations are necessary ’by-hand’ user adjustments of the color environment to prevent the analysis tool from inaccurately reporting errors. Some flow-sensitive analysis techniques—including typestates [4] and ownership types [2, 7, 9], among others—are able to analyze such cases without this kind of user intervention.

Annotating individual Jobs. We now investigate the creation points of Jobs. Each class that defines a job contains a nested class similar to Figure 5. Note that on line 3 the developers declare that CheckSchematicHierarchically is a Change job. Examine jobs and other Change jobs are noted similarly throughout Electric. We add a color constraint to the doIt() method of each Job with a @Color annotation that requires the appropriate color for the job, as on line 7 of Figure 5. There are 155 Jobs in Electric 8.0.1, so we used 155 annotations for this purpose. Because of the stylized form of the declarations of Jobs, we inserted these annotations in fewer than fifteen minutes using the Eclipse program editor. As a result of these @Color annotations, the analysis tool uses its inference algorithm (detailed in Section 5.2) to propagate the relevant color environments through Electric’s call graph. All methods reachable from any Job will thus be known to be reached by the DBChanger color, DBExaminer color, or both, as appropriate.

9 Although Undo jobs were not mentioned in the developer’s policy, we begin by treating them as Change jobs.

Changing source code. A better approach than “by-hand” adjustment of the color environment is to refactor the code slightly so that it better matches the analysis tool’s flow-insensitive limitations. This approach both eliminates the need for paired @Revoke and @Grant annotations and clarifies the client code. Such refactoring proved to be straightforward in every case study where changes to source code were permitted. We accomplish this in Electric by introducing three new classes, each of which extends class Job: one class each for Examine, Change, and Undo jobs. Figure 6 shows the text of our new ExamineJob class. The key change is that we constrain the doIt() method of ExamineJob to be invoked only by the DBExaminer thread (as seen on line 14), and constrain the doIt() methods of ChangeJob and UndoJob to be invoked only from the DBChanger thread. We then modify each creation point for a job to use the appropriate newly written subclass of Job. Because we propagate annotations from parent classes to child classes, the individual Job creation sites no longer require annotation to indicate the desired thread usage; rather, they inherit their color constraint from their parent class. Figure 7 shows the resulting new version of Figure 5; similar changes appear at all Job creation sites.

This refactoring replaces the eleven thread coloring annotations in the run() method of class Job with six thread coloring annotations: two in each of the three new subclasses. It also removes the 155 annotations at the Job creation sites. More importantly, the run() method for each subclass can be substantially simplified. At line 14 of Figure 6 we see the ExamineJob version of this method. Note that it is declared to be a DBExaminer-colored method, and that its body has been simplified to contain only the portions of the original run() method that pertain to Examine jobs. Similar simplifications apply to Change and Undo jobs as well.

Note that the instances of ExamineJob, ChangeJob and UndoJob are not fully substitutable for all instances of their parent class. This may appear unfortunate, but it is not a newly introduced issue in the code. Prior to our introduction of these new subclasses, individual instances of class Job already suffered from this problem: some due to the difference between Examine, Change and Undo jobs, and others because they actually implement different operations. In either case, we have not introduced any new incompatibility.
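The way a Job subclass picks up its color constraint from a parent class, rather than from an annotation at each creation site, can be illustrated with a small Java sketch. Everything in it is hypothetical: the runtime-retained Color annotation, the stripped-down Job, ChangeJob, and CheckSchematicHierarchically classes, and the effectiveColor helper are stand-ins invented for illustration. The actual tool is a static analysis and does not rely on reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class InheritedColorSketch {
    /** Hypothetical stand-in for the paper's @Color annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Color { String value(); }

    /** Simplified Job: no constraint on doIt() at this level. */
    abstract static class Job {
        public abstract boolean doIt();
    }

    /** The constraint is written once, on the parent of all Change jobs. */
    abstract static class ChangeJob extends Job {
        @Override
        @Color("DBChanger")
        public abstract boolean doIt();
    }

    /** A concrete job carries no annotation of its own. */
    static class CheckSchematicHierarchically extends ChangeJob {
        @Override
        public boolean doIt() { return true; }
    }

    /**
     * Walks up the superclass chain and returns the first @Color found on a
     * no-argument method with the given name, or null if none is declared.
     * (Interfaces are ignored in this sketch.)
     */
    static String effectiveColor(Class<?> cls, String methodName) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            try {
                Color color = c.getDeclaredMethod(methodName).getAnnotation(Color.class);
                if (color != null) {
                    return color.value();
                }
            } catch (NoSuchMethodException e) {
                // Method not declared at this level; keep walking upward.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveColor(CheckSchematicHierarchically.class, "doIt"));
    }
}
```

In this sketch the concrete job resolves to the DBChanger constraint declared on ChangeJob, which mirrors why the individual creation sites no longer need their own annotations after the refactoring.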

     1  public abstract class ExamineJob extends Job {
     2      ...
     3      protected ExamineJob(...) {
     4          super(..., Job.Type.EXAMINE, ...);
     5          ...
     6          startJob();
     7      }
     8
     9      /**
    10       * @Color DBExaminer
    11       */
    12      public abstract boolean doIt();
    13      /** Guaranteed to be called only for EXAMINE jobs
    14       * @Color DBExaminer
    15       */
    16      public void run() {
    17          try {
    18              numExamine++;
    19              doIt();
    20          } catch (Throwable e) { ... }
    21          finally {
    22              databaseChangesThread.endExamine(this);
    23          }
    24      }
    25  }

Figure 6. New ExamineJob class for Electric 8.0.1

    private static class CheckSchematicHierarchically extends ChangeJob {
        protected CheckSchematicHierarchically(...) {
            super(...); // No need to pass CHANGE argument
            ...
            startJob();
        }
        // Color constraint inherited from parent
        public boolean doIt() {
            ...
            Schematic.doCheck(cell);
            ...
            return true;
        }
        ...

Figure 7. New version of Change Job from Electric 8.0.1

Annotating the Database API. The steps we have taken so far to model policy identify code that is invoked from the job kinds of interest. However, they fail to enforce an important desired property: only Change jobs may modify the Database, and only Examine or Change jobs may read the Database. We state this property by adding preconditions for the API of the Database. Figure 8 shows extracts of the code from JNetwork.java, a typical class from Electric’s Database module. We see accessor methods with names starting with “get” and “has”; we hypothesize that such methods only examine the Database. We also see methods with names starting with “add”; we hypothesize that such methods change the Database. We constrain these methods with the annotation @Color("DBChanger") for the change methods, and @Color("(DBChanger | DBExaminer)") for the examine methods. As a consequence, code that invokes addName(...) from an examine job will be found inconsistent with the expressed model, because the color environment fails to satisfy the constraint on the addName(...) method; that is, the expression DBExaminer => DBChanger evaluates to false.

    package com.sun.electric.database.network;
    public class JNetwork {
        @Color("DBChanger")
        public void addName(String nm, boolean exported) { ... }
        @Color("DBChanger | DBExaminer")
        public int getNetIndex() { ... }
        @Color("DBChanger | DBExaminer")
        public Iterator getExportedNames() { ... }
        @Color("DBChanger | DBExaminer")
        public Iterator hasPorts() { ... }

Figure 8. Extracts from a Database class (Electric v8.0.1)

At this point, we might also choose to apply a color constraint to some or all of the data members of the Database classes to ensure that they are in fact accessed only from appropriate threads. This process is known as data coloring. An important use of data coloring is to assess thread role confinement for data. Data coloring was not needed for the Electric case study because all accesses to such data members were made using accessor methods, which were themselves constrained with @Color annotations. Further details on data coloring are reported elsewhere [24].

2.4 Observations on related work.

Our expressed policy model for Electric (including both the constraints on the Database API methods and the locations at which we assign the DBExaminer and DBChanger colors to threads) acts as a proxy for managing permission to access the data contained within the Database component of Electric. Permissions-based analyses such as [8], and analyses based on both permissions and type-states such as [4], may offer a more direct approach to expressing these permissions. However, these analyses have not yet been demonstrated at scale.

Our analysis neither assesses the correctness of the locking discipline used in class Job (see Figure 3), nor provides direct assessment of the lack of data races in the system. Among the rich body of prior work on locking and data races, [13, 15] assure the absence of data races through the use of models of locking behavior; [1, 6, 7, 12] assure the absence of data races using type systems; [11, 17, 19, 21] assure the absence of data races through the use of dataflow-based and context-sensitive methods; and [23] reports on an early static lockset-based analysis to assure the absence of data races. A common aspect of these approaches is that they assume the use of locks to regulate concurrent access to state. Because state in modern GUIs is generally confined to a single thread role without use of locks, these approaches have difficulty in addressing the need to respect the GUI’s restrictions regarding thread roles. Note that the role of event thread is not about thread identity. The role may be associated with more than one thread over time (for example, when displaying a modal dialog box), even though only one such thread is active at any given instant.

3. Static consistency checking

The concise annotations of the thread coloring language express a model of thread usage policy. Our static analysis checks the consistency of the as-written code with the expressed model. Developers benefit from such an analysis because the implications of a thread usage policy are neither obvious nor local, so policy compliance is difficult to assure using testing and inspection. Our static analysis uses color environments to approximate the dynamic color bindings of threads. Because both color environments and color constraints are represented as Boolean expressions, we can check constraint satisfaction using Boolean logic. If the color environment at a call site, for example, satisfies the constraint on the called method, that call site is consistent with the expressed model. This consistency check is the core of our static analysis.

The consistency check. The color environment specified by a color constraint is an upper bound on the environment satisfying that constraint. We therefore use that environment as the initial color environment for the body of the constrained method. We check a single method for consistency through a syntax-directed traversal of the method, locating method/constructor invocations, data references, and lexical blocks containing @Grant and @Revoke annotations. At each call site or data reference, we check whether the current color environment satisfies the constraint, if any, on all possibly-invoked methods or possibly-touched data regions. At each @Grant annotation, we check whether the colors granted would be inconsistent with global constraints from @IncompatibleColors annotations. As a consequence of lexical scoping, at @Grant and @Revoke sites we save the prior color environment and use the modified environment within the scope of the lexical block; we restore the saved environment on block exit. The consistency check is flow-insensitive because color environments can change only at lexical block entries and exits, and the check performed at call sites and data references requires no context beyond the local color environment.

When the syntax-directed traversal encounters a block tagged with a @Grant or @Revoke annotation, it must modify the color environment. Recall that a color environment is a set of color bindings, each of which is a set of color names; any thread executing at this program point must have a set of colors that exactly matches one of these bindings. The effect of the @Grant or @Revoke annotation is to modify potentially all of the color binding sets to include or exclude the colors listed in the annotation, according to whether it is a @Grant or a @Revoke, respectively. We represent environments and constraints as binary decision diagrams.

We stated in Section 2.2 that methods marked as @Transparent may be invoked from any thread. In navigating the call graph (superset), the method-boundary annotations are treated as cutpoints. Therefore, any method invoked by a transparent method must also be transparent. There are known ways to work around such restrictions (e.g., “polyvariance”), but these have not yet been required.
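The check at a call site and the save/modify/restore treatment of @Grant and @Revoke blocks can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the tool's implementation: environments are modeled as explicit sets of bindings rather than binary decision diagrams, constraints are plain predicates over a binding, the class and method names (ColorEnvSketch, satisfies, enterBlock, exitBlock) are invented for the example, and the @IncompatibleColors check is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class ColorEnvSketch {
    // A color environment: a set of bindings, each binding a set of color names.
    private Set<Set<String>> current;
    // Environments saved at entry to @Grant/@Revoke blocks (lexical scoping).
    private final Deque<Set<Set<String>>> saved = new ArrayDeque<>();

    ColorEnvSketch(Set<Set<String>> initial) {
        this.current = initial;
    }

    /** The environment satisfies a constraint when every binding satisfies it. */
    boolean satisfies(Predicate<Set<String>> constraint) {
        return current.stream().allMatch(constraint);
    }

    /** Block entry: save the environment, then apply revokes followed by grants. */
    void enterBlock(Set<String> revoked, Set<String> granted) {
        saved.push(current);
        Set<Set<String>> next = new HashSet<>();
        for (Set<String> binding : current) {
            Set<String> modified = new HashSet<>(binding);
            modified.removeAll(revoked);
            modified.addAll(granted);
            next.add(modified);
        }
        current = next;
    }

    /** Block exit: restore the environment saved at block entry. */
    void exitBlock() {
        current = saved.pop();
    }

    Set<Set<String>> current() {
        return current;
    }

    public static void main(String[] args) {
        Predicate<Set<String>> changerOnly = b -> b.contains("DBChanger");
        ColorEnvSketch env = new ColorEnvSketch(Set.of(Set.of("DBExaminer")));
        System.out.println(env.satisfies(changerOnly)); // false: DBExaminer => DBChanger fails
        env.enterBlock(Set.of("DBExaminer"), Set.of("DBChanger"));
        System.out.println(env.satisfies(changerOnly)); // true inside the block
        env.exitBlock();
        System.out.println(env.satisfies(changerOnly)); // false again after block exit
    }
}
```

Because the environment changes only at block entry and exit, the sketch needs no per-statement state, which is the sense in which the check is flow-insensitive.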
A successful consistency check for a single method provides a proof that it complies with the expressed policy, contingent on the assumptions that (1) all of its callers obey its color constraint, and (2) all actually invoked methods (via object-oriented method dispatch) are compatible with the constraints on the known overridings of their parent methods. If we assume that the entire program is available and that all methods are annotated with color constraints (see footnote 10), we can check the entire program for consistency by iteratively checking the consistency of each method. If all methods are consistent, the entire program is also consistent.

The consistency check is complicated by the definitions of “possibly-touched data regions” and “possibly-invoked methods” in the presence of potential aliasing of data references and object-oriented method dispatch. The precision of our data coloring analysis depends directly on the precision and power of the underlying (upper-bound) effects analysis. Thread coloring avoids depending on the specific choice of effects analysis; any analysis that provides upper-bound effects results could be substituted.

Determining the correct set of possibly-invoked methods at each call site is complicated by method dispatch, interface implementation, and the possibility of incomplete code, whether due to separate analysis, dynamic loading of code, or the use of reflection. To address these complications, we build a call graph that includes all known overridings of (or implementations of, in the case of Java interfaces) the statically-apparent invoked method. We address unknown overridings by treating our color constraint annotations as analysis cut-points.
Table 1. Significant Case Studies

    Name                                     Size in KLOC    App Info
    GraphLayout, Clock (lab)                 0.4 x2          Standard demo Applets
    SkyViewCafé (lab)                        25              star-chart visualization
    JPL “4Thread” (field)                    50              robot-motion planning, no GUI
    JHotDraw (lab)                           60              Open-source Graphic Editor
    JPL Chill GDS (field)                    100             Space science ground data system. Soft realtime.
    Electric (lab & field)                   140             VLSI design tool
    NDA 1 (field)                            >5000           Large eCommerce server fwk.
    NDA 2 (lab)                              >3000           Large client-server app.
    NDA 3 (field)                            >1000           Large database app.
    Five add’l case studies (field, 1 NDA)   50 to >1000     Varied, all fielded applications

In the event of overriding methods whose color constraint is incompatible with the constraint on the statically visible method at a particular call site, we require that the developer assure that the methods actually invoked from that call site have a constraint that is compatible with the visible constraint. A compatible constraint is satisfied by any color environment that satisfies the known constraint on the parent. This requirement that the developer assure what the analysis tool cannot addresses the potential for unsoundness. The effort required has not been a burden in practice.
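The compatibility condition for an overriding method (every color environment that satisfies the parent's constraint must also satisfy the override's constraint) amounts to checking that the parent constraint implies the child constraint. The Java sketch below checks this by brute-force enumeration over a small, fixed universe of colors. The class name OverrideCompatSketch, the compatible method, and the predicate-based encoding of constraints are assumptions made for illustration; the tool itself represents constraints as binary decision diagrams.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class OverrideCompatSketch {
    /**
     * Returns true when every binding (subset of the given colors) that
     * satisfies the parent's constraint also satisfies the child's constraint,
     * i.e. when parent => child holds for all bindings.
     */
    static boolean compatible(List<String> colors,
                              Predicate<Set<String>> parent,
                              Predicate<Set<String>> child) {
        int n = colors.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            // Build the binding selected by this bit mask.
            Set<String> binding = new HashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    binding.add(colors.get(i));
                }
            }
            if (parent.test(binding) && !child.test(binding)) {
                return false; // A caller obeying the parent could violate the child.
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> colors = List.of("DBChanger", "DBExaminer");
        Predicate<Set<String>> changer = b -> b.contains("DBChanger");
        Predicate<Set<String>> either = b -> b.contains("DBChanger") || b.contains("DBExaminer");
        System.out.println(compatible(colors, changer, either)); // true: override is more permissive
        System.out.println(compatible(colors, either, changer)); // false: override is more restrictive
    }
}
```

Enumeration is exponential in the number of colors, so it is suitable only as an illustration of the condition, not as a scalable check.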

4. Case Studies and Validation

We performed eleven case studies in the field; all systems analyzed during these engagements were production code. The in-field studies share three important characteristics: (1) an extremely compressed time-line, with multiple systems to analyze in two to five days on-site; (2) a lack of advance knowledge of the systems to be studied (with one exception); and (3) a need to produce the first interesting result within the first half-day of each system analysis. This final characteristic is particularly important because rapid modeling and analysis progress is impossible without the presence of at least one developer of the system being analyzed. Catching and holding the interest of a busy developer requires early and frequent demonstration of value following each increment of modeling effort.

Small case studies. SkyViewCafé is interesting for two reasons. First, it contained the largest policy we have seen to date, with over thirty different thread roles. Second, it raised our awareness of the difficulty developers face in becoming aware of changes in framework policies. Developers of widely-used frameworks such as Swing make great efforts to inform their users of their thread usage policies. Unfortunately, when these policies change, this effort must contend with a vast body of books, articles, and example code that explain prior policies. SkyViewCafé was originally implemented in Java version 1.1 as an AWT applet. It was later updated to use the Swing GUI framework and to comply with the second AWT/Swing policy. It was updated yet again to comply with the third (and most recent) AWT/Swing policy. Change-logs and comments in the source show that the developer understood the policy details; nevertheless, the code failed to achieve compliance with the policy. This demonstrates that even developers who understand a new policy encounter difficulties when upgrading their system to support that policy.

The NASA Jet Propulsion Laboratory (JPL) “4thread” application is a robot-motion-planning tool that produces optimized motion-plans for spacecraft and rovers. The developer produced the complete expression of his policy model overnight, with minimal training. This was an encouraging sign that the thread coloring language may be adoptable in the real world.

The 60KLOC JHotDraw study showed us that access to developers is often a necessary ingredient for success. Because we lacked input from the developers, we had to infer their thread usage policy by inspection of the source code. Unfortunately, none of our hypothetical policies were both self-consistent and also consistent with the code. Without information regarding the actual design intent, we were unable to determine which inconsistencies between our hypothetical policy models and the as-written code were caused by deficiencies in the models and which were due to errors in the code.

10 We relax these assumptions in Section 5.2.

Medium-sized case studies. The “Chill GDS” space science ground system application at JPL was missing its soft-realtime deadlines. The team wanted to improve performance, in part, by reducing locking overhead. They had identified heavily-used data that they hypothesized might be accessed from only one thread, and wanted help confirming their thread-confinement hypothesis. We used data coloring to prove that all but one of the groups of data were confined to a single role, whose thread identity changed over time. The system design guaranteed a maximum of one thread at a time performing that role. The team removed all locking for the non-shared data. The lead developer said that she would have considered the change “too risky” without our analysis results.

We have discussed our experience with modeling thread policy for Electric in Section 2. The analysis tool discovered inconsistencies between Electric’s as-written code and the policy model we expressed. Specifically, the analysis reported thousands of inconsistencies in Electric’s GUI implementation. Figure 9 shows extracts from class Highlight, one of the problematic GUI classes. The tool discovered, by inference on the call graph, that the showHighlight method on line 4 is invoked only from the AWT thread; its color constraint is thus inferred to be @Color("AWT"). It calls polygon.getStyle() and polygon.getPoints(), both of which are part of the Database API. Figure 10 shows the relevant code.

     1  public class Highlight {
     2      private Poly polygon;
     3      // @Color("AWT") -- inferred by analysis tool
     4      public void showHighlight(...) {
     5          if (...) {
     6              // draw outline of poly
     7              ... = (polygon.getStyle() == ...);
     8              drawOutlineFromPoints(..., polygon.getPoints(),
     9                                    ...);
    10          }

Figure 9. Extracts of class Highlight from Electric 8.0.1

    public class Poly implements Shape {
        @Color("(DBChanger | DBExaminer)")
        public Poly.Type getStyle() { return ...; }
        @Color("(DBChanger | DBExaminer)")
        public Point2D[] getPoints() { return ...; }

Figure 10. Extracts of class Poly from Electric 8.0.1

Clearly, the GUI calls Database getter methods without participating in the reader/writer locking scheme implemented in class Job. This violates the Electric policy. Similar errors were common throughout the GUI code. Interestingly, the Jobs all obeyed the Electric policy. As newcomers to Java programming, Electric’s developers had forgotten that the GUI always executes from the AWT thread. This was the cause of the “seemingly random” failures in their system. The Electric development team diagnosed the problem independently by investing “weeks of time” from “multiple developers” to determine the source of their “random crashes.” Our approach took fewer than eight working hours. In-field case study experience suggests that working side-by-side with a member of the development team could have reduced this time to less than four hours.

Large case studies. Three case studies performed under nondisclosure agreement (shown in Table 1 as NDA 1, 2, and 3) involved systems in excess of one million lines of code. During the in-lab study we analyzed two subsystems, each ~100KLOC in size, drawn from a main system of more than 3000KLOC. During each of the two in-field studies, we analyzed one subsystem of ~150KLOC, drawn from main systems of roughly 5000 and nearly 2000 KLOC, respectively. In each case, the developers were concerned with analysis of certain critical subsystems within their larger systems. Accordingly, we divided the systems into modules (see Section 5.1), and analyzed only the modules of concern. We modeled interfaces to the rest of the system only as required. In all three cases, we lacked sufficient time to model the entirety of the subsystems analyzed; we had time only for the portions and properties of greatest interest.

The large case studies provided two main lessons. First, as we expected, the number of thread usage policies in a system grew with system size. However, we were surprised to find that model complexity remained essentially constant. This allowed us to concentrate on doing more modeling, rather than on inventing new modeling approaches during a case study engagement. It also suggests that relatively simple thread usage policies suffice to describe a wide variety of fielded code.

The second lesson was that an incremental approach to modeling is far more important than we had anticipated. At the time of our first large case study, our prior experience with small and medium-sized case studies had already led us to address practicability issues (see Section 5). We expected that our support for composable analysis, our light-weight annotations and our low required annotation density would enable us to build complete models of entire subsystems within the timeframe of a case study engagement. We were wrong. Our techniques for analysis of partially annotated systems (originally designed and built to support incremental adoption by busy developers, rather than to support large-scale programs) allowed us to provide useful results in spite of the time limits. In the absence of those techniques, our large-scale case studies would have ended in failure.

Five additional case studies. Many of the challenges described in this paper required us to upgrade our language, our analysis techniques, and our implementation. The five case studies on the bottom row of Table 1 provided opportunities to demonstrate that these upgrades addressed the challenges from earlier studies; these studies did not pose new challenges and so are not discussed here.

5. Practicability issues

Our experience modeling and analyzing production code led us to emphasize practicability issues in the design and implementation of our prototype analysis tool. For example, the scale of commercial systems drove us to support a compositional approach to analysis. The compressed time-lines encountered in the field drove our need for greatly reduced annotation density as well as for incremental modeling and analysis of the partially-modeled systems. In addition, compressed time-lines for addressing the stated needs of the host organization forced us to recognize the importance of focusing on a targeted portion of the system. Finally, the need to catch and hold the interest of busy developers led us to refine these approaches well beyond our initial goals. These considerations fundamentally shaped the design of both the thread coloring language and the analysis techniques employed.

5.1 Support for large systems

We use two related approaches to support analysis of large systems. The first approach is to support medium-sized systems in a single analysis run that completes in a reasonably short time. The second approach is to provide means to divide large systems into subsystems whose analysis results can be combined.

Our case study experience has shown that our analysis tool is able to analyze a medium-sized system in a single run. For example, Electric v8.0.1 contains ~140KLOC spread across forty-four Java packages. A ground-up analysis of Electric completes in 20 minutes on a 2.0GHz (x2) Power Mac G5 with 2GB memory. Our tool performs lazy incremental recomputation; typical re-analysis times for Electric ranged from thirty seconds to five minutes.

We support compositional analysis through the use of Modules, a packaging construct analogous to JSR294 Modules [18] (see footnote 11). A detailed discussion of modules is beyond the scope of this paper. For purposes of this discussion, the salient features of modules are: (1) modules are sealed; all code (including dynamically loaded code) that is part of a module is identified at analysis time and is presented for analysis; and (2) modules control static visibility through the use of explicitly identified interfaces; only those Java entities that are part of a module’s interface are visible outside that module. We call methods in such an interface API methods. Our analysis supports composition by treating the identified interface of each module as an API whose annotation is exactly analogous to the component interfaces discussed earlier. As with component APIs, the expressed model at the interface is seen by both components. This is a key enabler of compositional analysis: we can derive consistency results for code and model in one component (the Electric tool in our worked example) on the basis of assumed consistency of code in other components (the AWT/Swing library framework in the worked example). The analysis tool assists in tracking where further verification steps are needed.

11 When we began our work with modules in 2006, neither JSR294’s original superpackages scheme nor its current module approach was available. Building our own module system was the only available solution.

5.2 Reducing annotation density

The consistency checking algorithm described in Section 3 assumes the presence of a color constraint annotation for every method and constructor in a program. To assess the realism of this assumption, we now consider the number of annotations needed to model Electric’s thread usage policy and to perform the consistency check. Electric contains 9814 methods, excluding private ones (see footnote 12). If all of our annotation reduction techniques were turned off, each non-private method would require a color constraint annotation. There would also be additional overhead annotations. Complete hand-annotation of Electric would require 13110 annotations, or ~94 annotations per KLOC. This clearly fails to be practicable. We now present techniques for reducing annotation density, measured in annotations per KLOC of non-library, non-framework code. Table 2 shows the effect of applying successive techniques to Electric v8.0.1.

Table 2. Annotation reduction in Electric v8.0.1

    Reduction techniques used      Number of annotations    Annotation density in annos/KLOC
    None                           13110                    ~94
    Constraint inheritance         10310                    ~73.6
    Plus constraint inference      2950                     ~21
    Plus scoped promises           888                      ~6.3
    Plus module-scoped promises    35                       0.25

12 Private methods never require hand-annotation because color constraint inference can always compute their calling color environment.

Color constraint inheritance. Recall that an overriding method inherits its color constraint from its transitive parents by default. Constraint inheritance disproportionately reduces annotation density for code that extends frameworks whose APIs have thread coloring annotations; although Electric’s GUI module contains only one third of the methods, it receives 96% of the benefit. Constraint inheritance reduces the required number of annotations from 13110 to 10310. However, this number is still too large for practicability.

Color constraint inference. We further reduce annotation density by inferring color constraint annotations for methods that lack them. For an unconstrained method, we can usually use the union of the color environments at the call sites that invoke that method (its calling environment) as its inferred color constraint. The soundness of this inference depends on two properties: (1) all possible call sites have been considered, and (2) our computation of the calling environment is robust in the presence of strongly connected regions in the call graph, i.e., self-recursive and mutually-recursive methods. The sealed property of modules provides the necessary guarantee that the analysis has seen all code contained in the module. Thus, we have the complete set of internal call sites. We use the visibility property of modules to limit our inference of color constraints to only those unconstrained methods that cannot be invoked from outside of the module (i.e., the local methods). Thus, we exclude all API methods as well as those methods that override or implement a parent that is outside of the module (see footnote 13).

13 Such methods could be invoked via dynamic method dispatch even though they are not statically visible outside the module.

The soundness of color constraint inference depends on correct computation of the calling environment in strongly connected regions in the call graph. The computation that meets this need is an abstract interpretation iteration to an upper bound over the call graph of the program being analyzed. The lattice for the iteration is a disjunction lattice with Boolean expressions as members, false as Bottom, and a special “uncolored” value as Top. The rationale for “uncolored” as Top is that a call from a method known to be uncolored should force the calling environment to be uncolored. This differs from the constraint true; methods that are @Transparent have a value of true, and are colored. Thus, Top must be more restrictive than true. We use Top as the initial calling environment for all methods with no predecessors in the call graph.

     1  while (!worklist.isEmpty()) {
     2    mToProcess = worklist.get();
     3    currCEnv = mToProcess.getInitialEnv();
     4    foreach (place in mToProcess.getCallsitesAndBlocks()) {
     5      switch (place.kind) {
     6        callSite: foreach (m in place.potentiallyInvoked()) {
     7          newCEnv = m.callCEnv.or(m.callCEnv, currCEnv);
     8          if (newCEnv != m.callCEnv) {
     9            m.callCEnv = newCEnv; worklist.add(m);
    10          } } break;
    11        blockStartGrantRev:
    12          cEnvStack.push(currCEnv);
    13          currCEnv = processRevokes(place, currCEnv);
    14          currCEnv = processGrants(place, currCEnv);
    15          break;
    16        blockEndGrantRev: currCEnv = cEnvStack.pop(); break;
    17  }}}

Figure 11. Color constraint inference pseudo-code

The iteration is implemented using a fairly standard work-list algorithm. As seen in Figure 11, the initial color environment for each method is fetched from its explicitly-written color constraint, if present, or from its computed calling environment. While traversing a method (on lines 6 through 8), at each call site the algorithm updates the calling environment of all possibly-invoked methods to include the current color environment at the call site. When the update changes the calling environment for a method, the algorithm adds that method to the work list (lines 8 and 9). Lines 12 through 15 compute the effect of @Grant and @Revoke annotations on the current color environment on entry to their lexical block. When the algorithm finishes, it has computed an observed calling environment for all methods in the call graph; this result is complete only for local methods. We use the calling environment as the color constraint for each unconstrained local method. When the inference algorithm produces a calling environment of Top for an unconstrained local method, we say that the method is uncolored. Color constraint inference is successful at reducing the number of annotations; it infers constraints for ~94% of Electric’s unconstrained local methods. Unfortunately, nearly 3000 annotations remain to be written; 21 annotations per KLOC is still not practicable.
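The work-list iteration can be made concrete with a small runnable Java sketch. It is a simplification under stated assumptions rather than the tool's algorithm: environments are explicit sets of bindings joined by set union (standing in for the Boolean “or”), the call graph is a plain map from method name to callee names, @Grant/@Revoke blocks and the “uncolored” Top value are omitted, and the names InferenceSketch and infer are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InferenceSketch {
    /**
     * Computes a calling environment for every method in the call graph.
     *
     * @param callees  call graph: method name -> names of possibly-invoked methods
     * @param declared explicitly written constraints, used as initial environments
     */
    static Map<String, Set<Set<String>>> infer(Map<String, List<String>> callees,
                                               Map<String, Set<Set<String>>> declared) {
        Map<String, Set<Set<String>>> callEnv = new HashMap<>();
        for (String m : callees.keySet()) {
            callEnv.put(m, new HashSet<>()); // start at Bottom (no bindings)
        }
        // Seed the work list with the methods whose environment is known up front.
        Deque<String> worklist = new ArrayDeque<>(declared.keySet());
        while (!worklist.isEmpty()) {
            String m = worklist.pop();
            // Initial environment: the written constraint if present,
            // otherwise the calling environment computed so far.
            Set<Set<String>> env = declared.containsKey(m) ? declared.get(m) : callEnv.get(m);
            Set<Set<String>> snapshot = Set.copyOf(env);
            for (String callee : callees.getOrDefault(m, List.of())) {
                Set<Set<String>> target = callEnv.computeIfAbsent(callee, k -> new HashSet<>());
                // Join the caller's environment into the callee's calling environment;
                // revisit the callee only if that changed anything.
                if (target.addAll(snapshot)) {
                    worklist.add(callee);
                }
            }
        }
        return callEnv;
    }
}
```

As a usage example, with two annotated entry points (one colored DBExaminer, one colored DBChanger) that both call a helper, and a helper that is mutually recursive with a leaf method, the iteration terminates with both unannotated methods receiving the two-binding environment that corresponds to (DBChanger | DBExaminer). Termination follows because each calling environment only grows and the set of possible bindings is finite.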

[Figure 12. Coloring result states. Diagram labels: Undecided (info); Undecided (warning); Almost consistent (warning); Consistent coloring (positive); Inconsistent coloring (error); Complete Information.]

Dotted lines in Figure 12 represent retractions of model information. Note that adding color constraints cannot change a finding of consistency to one of inconsistency. We will consider the result states individually.

Analysis of a module that entirely lacks thread coloring annotations (for its own code, as well as for all of the external frameworks and APIs it interacts with) produces no error messages; rather, these results are in the “Undecided (info)” state. The analysis tool also produces messages for each API method, identifying them as good locations for additional color constraint annotations. Similarly, at any call site where both the enclosing method and all possible invoked methods have color information (the “complete information” case), the analysis tool reports either positive results or errors.

The remaining two result states involve calls between colored code and uncolored code. Strict, conservative analysis would report such calls as errors because they cannot be proven consistent; this is the view of our consistency check. Reporting such invocations as errors proved to be a poor choice; it yielded too many error messages when policy models were incomplete. This effect has been previously noted by others, including [25]. Thus, we treat these cases as warnings unless directed otherwise by the user. This approach requires that the analysis tool be able to distinguish uncolored code from colored code.

When an uncolored call site invokes a colored method, the analysis tool produces an “Undecided” warning message at the call site. These call sites would benefit from annotation that establishes the color environment at the call site. Similarly, call sites in colored code that potentially invoke an uncolored method are neither provably consistent nor provably inconsistent. The analysis tool produces a warning message at the call site and an “Almost consistent” message for the method that contains the call site. When the invoked method is an API method, the analysis tool also proposes the API method’s observed calling environment as a possible color constraint for that method.

The above discussion is limited to method call sites. The prototype analysis tool also produces results for data references, @Grant and @Revoke annotations, and color constraint inference. Each of these has its own rules for result states, messages, and suggestions.
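As an illustration only, the call-site rules described above can be summarized in a small Java sketch. The enum constant names, the classify method, and the strict flag are hypothetical and are not taken from the prototype tool. The text names only some of the five states, so mapping the two mixed cases onto distinct states is an assumption.

```java
/** Hypothetical names for the five call-site result states. */
enum ResultState {
    UNDECIDED_INFO,      // neither side is colored: informational message only
    UNDECIDED_WARNING,   // uncolored call site invokes a colored method
    ALMOST_CONSISTENT,   // colored call site may invoke an uncolored method
    CONSISTENT,          // complete information, constraint satisfied
    INCONSISTENT_ERROR   // complete information, constraint violated
}

final class CallSiteClassifier {

    /**
     * @param callerColored whether the enclosing method has color information
     * @param calleeColored whether all possibly invoked methods have color information
     * @param consistent    outcome of the consistency check (used only with complete information)
     * @param strict        if true, mixed colored/uncolored calls are reported as errors
     */
    static ResultState classify(boolean callerColored, boolean calleeColored,
                                boolean consistent, boolean strict) {
        if (callerColored && calleeColored) {
            // Complete information: the tool reports a positive result or an error.
            return consistent ? ResultState.CONSISTENT : ResultState.INCONSISTENT_ERROR;
        }
        if (!callerColored && !calleeColored) {
            // No annotations at all: never an error.
            return ResultState.UNDECIDED_INFO;
        }
        if (strict) {
            // Conservative view: calls that cannot be proven consistent are errors.
            return ResultState.INCONSISTENT_ERROR;
        }
        // Default view: mixed calls are warnings.
        return callerColored ? ResultState.ALMOST_CONSISTENT : ResultState.UNDECIDED_WARNING;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, false, false, false));
        System.out.println(classify(false, true, false, false));
        System.out.println(classify(true, false, false, false));
        System.out.println(classify(true, true, true, false));
        System.out.println(classify(true, true, false, false));
    }
}
```

The strict flag models the user directive mentioned above; with it set, the two mixed cases are reported as errors instead of warnings.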

Scoped Promises. We use scoped promises (due to Halloran [16]) to take advantage of the stylized naming schemes seen at APIs (also previously noted in [3]), as well as the small number of distinct Boolean expressions used for the color constraints at those APIs. A scoped promise is an annotation that (1) carries another annotation as its payload, (2) applies only within the scope in which it is placed, and (3) uses wild-carded matching on method signatures, field declarations, and class names to determine the specific points within its scope at which the payload annotation will be placed. The scoped promise @Promise "@Color (DBExaminer | DBChanger)" for get*(**) | has*(**) carries as its payload a typical color constraint for Electric’s Database accessor methods. The expression after the for is the location specification. The special argument wild-card “**” matches any number of arguments of any type. Location terms may be parenthesized and combined using the “&”, “|”, and “!” operators. Thus, the example matches any method in scope whose name begins with either “get” or “has”, without regard to the number or type of its arguments.

We use scoped promises primarily to reduce annotation density for API methods. The API of Electric’s Database contains 1731 methods that are referenced from other modules. By using scoped promises, we replace over 1700 color constraint annotations with six scoped promises in each of the nine packages in the Database module. Similar use of scoped promises in the remainder of Electric saves an additional 331 annotations. Thus, 888 annotations remain to be written after the use of scoped promises, for an annotation density of approximately 6.3 per KLOC. If the analysis tool supported a module-scoped location for scoped promise annotations, we could combine the annotations into a single module-scoped file containing only six annotations. In the end, we would need only 35 annotations for the entire program, an annotation density of 0.25 per KLOC.
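The following is a minimal sketch of how a location specification such as get*(**) | has*(**) might select methods, assuming that only method names matter because “**” accepts any argument list. The class name, helper names, and sample method names are hypothetical. The sketch builds the location from predicate combinators rather than parsing the annotation syntax, and it is not the tool's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ScopedPromiseSketch {

    /** Turns a name glob such as "get*" into a predicate on method names. */
    static Predicate<String> nameGlob(String glob) {
        String[] pieces = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");                  // each '*' matches any run of characters
            }
            regex.append(Pattern.quote(pieces[i]));  // literal text is matched verbatim
        }
        Pattern compiled = Pattern.compile(regex.toString());
        return name -> compiled.matcher(name).matches();
    }

    /** Returns the methods in scope that would receive the payload annotation. */
    static List<String> matching(Predicate<String> location, List<String> methodsInScope) {
        return methodsInScope.stream().filter(location).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "get*(**) | has*(**)": '|' maps to or(); '&' and '!' would map to and() and negate().
        Predicate<String> location = nameGlob("get*").or(nameGlob("has*"));
        List<String> methods = Arrays.asList("getCell", "hasExports", "renameCell", "forget");
        String payload = "@Color (DBExaminer | DBChanger)";
        for (String method : matching(location, methods)) {
            System.out.println(payload + " " + method);
        }
    }
}
```

Because the match is anchored at the start of the name, a method such as the hypothetical forget is not selected even though it contains “get”.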
Because scoped promises derive their effectiveness largely from stylized naming conventions, the degree to which they reduce annotation density varies from one system to another; these results are typical within our case study experience.

5.3 Support for incrementality

We discussed a limited form of incrementality by dividing large programs into separately analyzed modules. However, modules that are large enough to provide significant benefit for color constraint inference are too large to be modeled completely in a short period of time. Thus, we must provide useful results for modules whose policy models are incomplete. Support for modules with incomplete policy models introduces three new requirements for the analysis tool. It must (1) suppress error messages for uncolored code; (2) provide useful and actionable suggestions for additional color constraint annotations; and (3) limit claims of consistency or inconsistency to fully provable cases. The five states shown in Figure 12 represent the possible result states for an analysis finding at a call site. Solid lines represent possible changes due to insertion of color constraint annotations.

6. Limitations of our analysis

There are three limitations not yet addressed in our as-implemented analysis: Java reflection, dynamically loaded code, and the thread coloring language’s handling of the run() methods of Runnables.

Java reflection allows its users to break encapsulation; further, developers can write complex code that computes the name of a function to be invoked. Our experience with reflection in case studies is that documentation specifies which methods it may invoke. Since a general solution to reflection is elusive, we require advance agreement between the parties on the two sides of the reflective invocation. The caller specifies what thread roles may use reflection to invoke specific methods from the other component. The author of the invoked methods is responsible for ensuring that they have explicitly written color constraints that match the caller’s specification. In this special case, the analysis tool cannot check whether the as-written code and the required color constraints are consistent. However, the uses of reflection we have encountered are so rare and so stylized that hand-checking them has been easy.

The second limitation arises because the analysis tool assumes that each component loaded at runtime will interact only with components that are consistent with those it observed at analysis time. Although this property cannot be checked statically, it can be enforced by a custom class loader, as in [10]; this is future work.

The final limitation is our current approach to assessing the consistency of the color constraint on the compare() methods of generic container classes (e.g., Java Collections), the run() methods of Runnables and Executors, and the call() methods of Callables with their calling environment. The common aspects of these cases are that (1) the program points from which they may be invoked are often on the far side of the API of a separately analyzed component, and (2) the color constraints on the various instance methods are necessarily and legitimately incompatible both with any possible constraint for their parent methods and with the constraints on other instances of the same class. The thread coloring language and analysis lack a technique for assuring that the run() method passed to such an API can actually execute in the color environment of the invocation point; the language fails to provide means by which the API author can specify an appropriate color constraint. We can resolve this difficulty by adding support for parameterized color constraints to the language.
This capability would allow us to parameterize the declaration of a field, variable, or parameter (we speak only of “field” hereafter) with a color constraint, and to analyze that annotation as though it were part of the type of the field. At each assignment to the field, the analysis would check that the assigned class contains a color constraint that satisfies the constraint on the field. Thus, for example, the Runnable parameter of Swing’s invokeLater method could be specified as “any Runnable whose run() method has a color constraint that is satisfied by the calling environment AWT,” i.e., the environment of the AWT thread that will execute the method. Similarly, one could declare a Collection field that accepts only those ArrayLists whose compare() method meets the constraint on the field. A parameterization feature designed along these lines should entirely resolve the limitation in our analysis.

In the absence of a parameterized color constraint feature, our case study experience suggests that run() methods can be hand-checked for consistency at the cost of a few moments of thought about the code in question. The cost and risk of this by-hand check are acceptable for a prototype implementation.

We have two interim approaches for Collections. The first is to hand-check as for Runnables. This approach requires low effort, but poses some risk of unsoundness. The second interim approach is to use data coloring to constrain the field(s) that hold references to such Collections to be accessible only from the correct thread roles. We then use the effects system to prove that the Collections in question are accessed only through the constrained fields, thus guaranteeing that the intended constraint on the compare() method can never be violated. This approach requires significant effort, but avoids introducing potential unsoundness. The costs and risks of these interim approaches are acceptable for a prototype implementation.
Design and implementation of a parameterization feature for color constraints remains as future work.
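To make the idea of treating a color constraint “as though it were part of the type” concrete, the sketch below approximates it with ordinary Java generics used as phantom types. This is an analogy under stated assumptions, not the proposed language feature: the names ThreadColor, Awt, Compute, ColoredRunnable, and runOnAwt are hypothetical, and the encoding handles only an exact color match, not the Boolean color expressions that the annotation language supports.

```java
import java.util.ArrayList;
import java.util.List;

/** Marker types standing in for thread colors (hypothetical names). */
interface ThreadColor {}
final class Awt implements ThreadColor {}
final class Compute implements ThreadColor {}

/** A Runnable whose run() method is declared to require color C. */
interface ColoredRunnable<C extends ThreadColor> extends Runnable {}

final class ParameterizedColorSketch {
    static final List<String> log = new ArrayList<>();

    /** Stand-in for an API such as invokeLater: accepts only AWT-colored tasks. */
    static void runOnAwt(ColoredRunnable<Awt> task) {
        task.run(); // a real event queue would defer this to the AWT thread
    }

    public static void main(String[] args) {
        ColoredRunnable<Awt> repaint = () -> log.add("repaint requires Awt");
        ColoredRunnable<Compute> solve = () -> log.add("solve requires Compute");

        runOnAwt(repaint);   // accepted: the color parameter matches
        // runOnAwt(solve);  // would not compile: ColoredRunnable<Compute> is not ColoredRunnable<Awt>
        solve.run();         // run directly here only to show that the task itself is ordinary code

        System.out.println(log);
    }
}
```

In this encoding the check happens at each point where a task is passed or assigned, which mirrors the per-assignment check described above; a real feature would compare constraints for satisfaction rather than for type equality.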

7. Conclusion

We see general value in the thread coloring approach as an efficient means for static management of state consistency and other concurrency issues without recourse to locks or other runtime features. Explicit thread coloring annotations would permit developers to clearly and unambiguously express thread usage policies both to developers of client code and for their own future use. When such annotations change, client developers would be able to assess the compliance of their code with the latest policy. We have validated the thread coloring approach through more than ten field studies in which we analyzed a total of over one million lines of production code. These studies involved systems ranging from 50 KLOC to more than 5000 KLOC. In the course of these studies, we modeled thread usage policies that support a wide variety of strategies for managing access to shared state; we expressed those models using an average of roughly six annotations per KLOC. In short, we have demonstrated that our approach is sufficiently expressive and scalable to capture and analyze real-world thread usage policies in a practicable manner. Incorporation of thread coloring in IDEs could benefit software developers generally. Developers can use thread coloring annotations to build models of thread usage policies and to document the threading behavior of their code. These annotations would enable use of tools to assure consistency of as-written code with stated policies and to guide repairs when inconsistencies are identified.

Acknowledgments

Greg Hartman and Elizabeth Sutherland spent many hours reading, editing and critiquing this paper. The anonymous referees provided many useful suggestions and insightful questions and comments. These inputs vastly improved the quality of this paper. The authors greatly appreciate the value of these contributions.

References

[1] R. Agarwal and S. D. Stoller. Type inference for parameterized race-free Java. In Proc. Conference on Verification, Model Checking and Abstract Interpretation, pages 149–160, 2004.
[2] Jonathan Aldrich and Craig Chambers. Ownership Domains: Separating Aliasing Policy from Mechanism. In ECOOP, pages 1–25, 2004.
[3] AspectJ Team. The AspectJ Programming Guide, 2004. URL http://eclipse.org/aspectj/doc/released/progguide/index.html.
[4] Kevin Bierhoff and Jonathan Aldrich. Lightweight Object Specification with Typestates. In FSE, pages 217–226, September 2005.
[5] Joseph Bowbeer. The last word in Swing threads – Working with asynchronous models, May 2005. URL http://java.sun.com/products/jfc/tsc/articles/threads/threads3.html.
[6] Chandrasekhar Boyapati and Martin Rinard. A parameterized type system for race-free Java programs. In OOPSLA, pages 56–69, 2001.
[7] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: preventing data races and deadlocks. In OOPSLA, pages 211–230, 2002.
[8] J. Boyland. Checking interference with fractional permissions. In R. Cousot, editor, Static Analysis: 10th International Symposium, volume 2694 of LNCS, pages 55–72, 2003.
[9] David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible alias protection. In OOPSLA, pages 48–64, 1998.
[10] John Corwin, David F. Bacon, David Grove, and Chet Murthy. MJ: A Rational Module System for Java – and its applications. In OOPSLA, pages 241–254, 2003.
[11] Dawson Engler and Ken Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. In SOSP, pages 237–252, 2003.
[12] Cormac Flanagan and Stephen N. Freund. Type-based race detection for Java. In PLDI, 2000.
[13] Aaron Greenhouse. A Programmer-oriented Approach to Safe Concurrency. PhD thesis, Carnegie Mellon, May 2003.
[14] Aaron Greenhouse and John Boyland. An Object-Oriented effects system. In ECOOP, pages 205–229, 1999.
[15] Aaron Greenhouse, T. J. Halloran, and William L. Scherlis. Observations on the assured evolution of concurrent Java programs. Sci. Comput. Program., 58(3):384–411, 2005.
[16] Timothy J. Halloran. Towards a Scalable and Adoptable Approach to Analysis-based Verification of Mechanical Program Properties. PhD thesis, Carnegie Mellon, to appear.
[17] Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. Race checking by context inference. In PLDI, pages 1–13, 2004.
[18] JSR294 Expert Group. JSR 294: Improved modularity support in the Java programming language. URL http://jcp.org/en/jsr/detail?id=294.
[19] Mayur Naik, Alex Aiken, and John Whaley. Effective static race detection for Java. In PLDI ’06, pages 308–319, 2006.
[20] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Commun. ACM, 15(12):1053–1058, December 1972.
[21] Polyvios Pratikakis, Jeffrey S. Foster, and Michael Hicks. Locksmith: Context-sensitive correlation analysis for race detection. In PLDI ’06, pages 320–331, 2006.
[22] StaticFreeSoftware. Electric. URL http://www.staticfreesoft.com/productsFree.html.
[23] N. Sterling. Warlock: A static data race analysis tool. In USENIX Winter Technical Conference, pages 97–106, 1993.
[24] Dean F. Sutherland. The Code of Many Colors: Semi-automated Reasoning about Multi-Thread Policy for Java. PhD thesis, Carnegie Mellon University, Pittsburgh, PA 15213, May 2008.
[25] Yichen Xie and Alex Aiken. Context- and path-sensitive memory leak detection. SIGSOFT Softw. Eng. Notes, 30(5):115–125, 2005. ISSN 0163-5948.