Indus - Java Program Slicer
Venkatesh Prasad Ranganath, Kansas State University
Table of Contents
Background
    Program Slicing
    Dependences
    Java Program Slicer
Design and Architecture
    Design Rationale
    Architecture
Implementation details
    Bird's eye view
    The Details
Closing Note
Bibliography
Background

Program Slicing

Program slicing is a well-known (at least in the research arena) program analysis technique that can be used to find the program points[1] affected by a given program point and vice versa. Given a program point, the slicing algorithm identifies the program points that affect it. The program in which the slice is identified is referred to as the substrate program, the given program point is referred to as the slice criterion, and the identified program points constitute a program slice w.r.t. the given criteria[2]. A program slice constructed by identifying the program points that affect a given program point is called a backward slice. A program slice composed of the program points affected by a given program point is called a forward slice. We use the term complete slice for the union of the backward and forward slices w.r.t. the given criteria. We refer to this aspect of a slice as its type. The program points identified in this manner need not constitute an executable program. Hence, we use the term executable slice for a slice that is executable.
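The three slice types can be made concrete with a small sketch. It assumes that the "affects" relation is already available as a map from each program point to the points it directly depends on (dependences are discussed next) and represents program points as plain strings; SliceTypes and its methods are hypothetical names, not part of Indus. A backward slice is the set of points reachable from the criteria along depends-on edges, a forward slice follows the same edges in reverse, and a complete slice is their union.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of backward, forward and complete slices as
// reachability over a "depends on" relation; not the Indus slicing engine.
final class SliceTypes {
    private SliceTypes() {}

    // All points reachable from the criteria by repeatedly following edges.
    static Set<String> closure(Map<String, Set<String>> edges, Collection<String> criteria) {
        Set<String> slice = new HashSet<>(criteria);
        Deque<String> worklist = new ArrayDeque<>(criteria);
        while (!worklist.isEmpty()) {
            String point = worklist.pop();
            for (String next : edges.getOrDefault(point, Set.of())) {
                if (slice.add(next)) {
                    worklist.push(next);
                }
            }
        }
        return slice;
    }

    // dependsOn maps each dependent to its dependees.
    static Set<String> backward(Map<String, Set<String>> dependsOn, Collection<String> criteria) {
        return closure(dependsOn, criteria);
    }

    // Reverse the relation so that edges lead from a dependee to its dependents.
    static Set<String> forward(Map<String, Set<String>> dependsOn, Collection<String> criteria) {
        Map<String, Set<String>> affects = new HashMap<>();
        dependsOn.forEach((dependent, dependees) -> {
            for (String dependee : dependees) {
                affects.computeIfAbsent(dependee, k -> new HashSet<>()).add(dependent);
            }
        });
        return closure(affects, criteria);
    }

    // Union of the backward and forward slices.
    static Set<String> complete(Map<String, Set<String>> dependsOn, Collection<String> criteria) {
        Set<String> slice = backward(dependsOn, criteria);
        slice.addAll(forward(dependsOn, criteria));
        return slice;
    }
}
```

For instance, with s3 depending on s1 and s2, and s4 depending on s3, the backward slice from s3 is {s1, s2, s3} and the forward slice from s3 is {s3, s4}.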
Dependences

The concept of a program point affecting, or being affected by, another program point is captured as dependences. A dependence can be thought of as a relation between two program points x and y that indicates whether x depends on y. In a dependence relation where x depends on y, we refer to x as the dependent and y as the dependee[3]. There are many notions of dependence; data and control dependence are the most common and simplest, and they occur even in simple non-procedural sequential programs. In a simple setting, data dependence indicates whether the variable being read at a program point is influenced by another program point at which the same variable is written. Similarly, control dependence indicates whether control reaching a program point depends on another program point. Given the notion of dependences, a program slice can be thought of as the transitive closure of the slice criteria under the dependence relation. Please refer to the user guide of the StaticAnalyses module for more information about dependences.

[1] A program point may be an expression or a statement in a program.
[2] We shall use criteria and slice criteria interchangeably. Likewise, we shall use slice and program slice interchangeably.
[3] We shall often refer to the dependent program point as the dependent and the dependee program point as the dependee.
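For straight-line code, the simple notion of data dependence described above can be computed by remembering, for each variable, the last statement that wrote it. The sketch below assumes a toy statement representation (an optional variable written plus the set of variables read) invented for illustration; Stmt and StraightLineDataDeps are hypothetical names. Real analyses, such as those in the StaticAnalyses module, must additionally handle branches, loops, aliasing and procedure calls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy statement: a label, the variable written (null if none) and the variables read.
record Stmt(String label, String def, Set<String> uses) {}

final class StraightLineDataDeps {
    private StraightLineDataDeps() {}

    // Maps each statement label to the labels of the statements it is data dependent on.
    static Map<String, Set<String>> compute(List<Stmt> program) {
        Map<String, String> lastWriter = new HashMap<>(); // variable -> label of last write
        Map<String, Set<String>> dependsOn = new HashMap<>();
        for (Stmt s : program) {
            Set<String> dependees = new HashSet<>();
            for (String variable : s.uses()) {
                String writer = lastWriter.get(variable);
                if (writer != null) {
                    dependees.add(writer);
                }
            }
            dependsOn.put(s.label(), dependees);
            // Record the write after the reads, so "x = x + 1" depends on the earlier write of x.
            if (s.def() != null) {
                lastWriter.put(s.def(), s.label());
            }
        }
        return dependsOn;
    }
}
```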
Java Program Slicer

Most of the literature about program slicing concentrates on its use for program understanding. Lately, there have been efforts to apply slicing in areas such as error detection [cite Snelting/ Krinke] and model size reduction in model checking [CorbettICSE00]. The latter application has been our driving force to design and implement a program slicer for Java in Java[4]. Bandera is a toolkit that can be used to verify properties of a Java program via model checking. Given a property, various tools are used to extract a model of the program from the source and to verify whether the model has the given property, thereby verifying that the program has the property. While extracting a model, we apply program slicing to prune the parts of the program that are not necessary to verify the given property. Our first implementation of a Java program slicer was unsuccessful for two reasons: it was buggy, and it was tightly coupled with Bandera. Both reasons compelled us to design and implement a Java program slicer from scratch, hence the current product and the document you are reading! Please refer to [HatcliffSAS99] for more information about slicing for the purpose of model checking.
Design and Architecture

Design Rationale

Anyone with a mind for software engineering will realize that dependence information and program slicing should not be tightly coupled, as the latter depends on the former while the former does not depend on the latter. Hence, we have separated the slicer from the dependence analyses. The slicer module depends on "a" dependence module to provide the information it requires via a well-defined "minimalistic" interface, so the slicer can be composed with any implementation of that interface. Similarly, any application requiring dependence information can use the dependence analyses as is[5]. The net effect is that we were able to break the previous single large chunk of the slicer into two smaller reusable modules.

Our previous implementation of the slicer was monolithic, as it was specifically designed and implemented for Bandera. Hence, it was geared to generate executable backward slices. As mentioned before, this is just one type of slice that can be generated. Our experience indicates that the various types of slices and the various properties required of a slice can be combined in various ways, and not all combinations are valid. For example, it may be possible to add a minimal set of extra program points, such as return statements in procedures, to a backward slice, along with minor alterations, to make it executable such that its behavior is identical to that of the substrate program up to the program points that are the slice criteria. However, the same is not true for forward slices: the future state of the program depends on the states preceding the criteria, which requires a backward slice from the criteria and thereby makes the slice a complete slice. This is why we decoupled generating a slice of a given type from ensuring any property, such as executability, required of the slice. This is reflected "as is" in the design: one module generates the slice of a given type, while another module "massages" the generated slice so that it possesses the required property. We refer to the former module as the slicing engine; the latter is considered part of the post processing phase.

[4] In accordance with "Every good work of software starts by scratching a developer's personal itch", one of the lessons in "The Cathedral and the Bazaar".
[5] Maybe with a thin interface adaptation layer.
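The decoupling of the slicer from the dependence analyses can be pictured with a minimal sketch. DependenceInfo, MapBackedAnalysis, AnalysisAdapter and Slicer are hypothetical names, not Indus classes. The slicer sees dependences only through a one-method interface, it can combine several sources (say, data and control dependence), and an existing analysis with its own, unrelated API is attached through a thin adapter.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal interface the slicer needs from any dependence analysis.
interface DependenceInfo {
    Set<String> dependeesOf(String point);
}

// Stand-in for an existing analysis that exposes its own, unrelated API.
final class MapBackedAnalysis {
    private final Map<String, Set<String>> table;

    MapBackedAnalysis(Map<String, Set<String>> table) {
        this.table = table;
    }

    Set<String> lookup(String point) {
        return table.getOrDefault(point, Set.of());
    }
}

// Thin adapter: exposes the analysis through the slicer's interface.
final class AnalysisAdapter implements DependenceInfo {
    private final MapBackedAnalysis analysis;

    AnalysisAdapter(MapBackedAnalysis analysis) {
        this.analysis = analysis;
    }

    @Override
    public Set<String> dependeesOf(String point) {
        return analysis.lookup(point);
    }
}

// The slicer depends only on DependenceInfo, never on a concrete analysis.
final class Slicer {
    private final Collection<DependenceInfo> sources;

    Slicer(Collection<DependenceInfo> sources) {
        this.sources = sources;
    }

    // Backward slice: transitive closure over the dependees reported by all sources.
    Set<String> backwardSlice(Collection<String> criteria) {
        Set<String> slice = new HashSet<>(criteria);
        Deque<String> worklist = new ArrayDeque<>(criteria);
        while (!worklist.isEmpty()) {
            String point = worklist.pop();
            for (DependenceInfo source : sources) {
                for (String dependee : source.dependeesOf(point)) {
                    if (slice.add(dependee)) {
                        worklist.push(dependee);
                    }
                }
            }
        }
        return slice;
    }
}
```

Because DependenceInfo has a single abstract method, a test double can even be supplied as a lambda, which is one practical payoff of keeping the interface minimal.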
Figure 1. The design of the Slicer
Most literature on slicing does not distinguish between the identification of a slice and its representation, as it does not consider the end application. To those familiar with slicing, this distinction may seem too subtle and artificial, but it is not. By definition, a program slice is just the parts of the program selected by an algorithm that tracks dependences, and this process concerns only the identification of these parts and nothing more. The application that uses the slice decides on its representation. A visual program understanding tool may require the slice to be represented as tagged AST nodes. An application that validates program slicers will require the slice to be residualized as an XML document that can be compared with another XML document containing the expected slice. If slicing is used to remove unnecessary code, say logging, from a code base as a form of optimization, then the slice must be residualized into executable form, say a class file in Java. This clearly indicates that identifying the slice and representing the slice are two different activities, and we have used this distinction to further modularize our design by splitting post processing into a slice post processing phase and a residualization phase. One major ramification of this distinction is that it enables one to view program slicing as an analysis, contrary to the traditional view of it as a program transformation. This may enable other transformations, such as specialization, to be combined with program slicing. Figure 1 provides a graphical illustration of the various parts of the slicer along with their dependences, based on the control flow between them. This modularization turns the various parts of the slicer into libraries, which leads to another benefit: customization.
Given these library modules the users will be able to assemble a slicer customized to their needs without much hassle.
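The separation between identifying a slice and representing it can be illustrated with a small sketch. Identification only attaches a named tag to the nodes that belong to the slice, much as the slicer uses Soot's tagging support; representation is left to interchangeable residualizers. Node, TaggingIdentifier, Residualizer and both residualizers are hypothetical, and the XML shape below is invented for illustration, not the format Indus emits.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy AST node carrying a set of tag names (a stand-in for Soot's tags).
final class Node {
    final String text;
    final Set<String> tags = new HashSet<>();

    Node(String text) {
        this.text = text;
    }
}

// Identification: mark slice members with a named tag; the program is not changed.
final class TaggingIdentifier {
    private TaggingIdentifier() {}

    static void tag(Collection<Node> sliceMembers, String tagName) {
        for (Node n : sliceMembers) {
            n.tags.add(tagName);
        }
    }
}

// Representation: each application picks the residualizer it needs.
interface Residualizer {
    String residualize(List<Node> nodes, String tagName);
}

// Emits the tagged nodes as a tiny XML document (no escaping, for brevity).
final class XmlResidualizer implements Residualizer {
    @Override
    public String residualize(List<Node> nodes, String tagName) {
        return nodes.stream()
                .filter(n -> n.tags.contains(tagName))
                .map(n -> "<stmt>" + n.text + "</stmt>")
                .collect(Collectors.joining("", "<slice>", "</slice>"));
    }
}

// Emits only the tagged statements as source text, one per line.
final class SourceResidualizer implements Residualizer {
    @Override
    public String residualize(List<Node> nodes, String tagName) {
        return nodes.stream()
                .filter(n -> n.tags.contains(tagName))
                .map(n -> n.text)
                .collect(Collectors.joining("\n"));
    }
}
```

The same tagged program can thus be residualized for a regression test (XML) or for execution (source or class files) without touching the identification step.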
Architecture
The slicer is available as a single unit composed of many modules. Each module is assigned a particular functionality. The classes of a module may provide the functionality of the module on their own or collaboratively with other classes in the module. Each module also provides a well-defined interface if its functionality is intended to be extended by the user. Based on this design principle, the slicer comprises the following modules. Figure 2 provides a UML-style illustration of the modules and the dependences between them.

slicer
This module is responsible for the identification of the slice; hence, it contains factory classes to generate slice criteria, classes that implement the algorithm to identify the slice, and classes to collect the identified slice. In our implementation, we identify the slice by annotating the AST nodes that are part of the slice. Note that, as mentioned earlier, this is also a plausible representation technique.
slicer.processing
This module contains various forms of post processing that can be performed on the identified slice in the slice post processing phase. For example, the functionality of making a slice executable is realized as a class with a well-defined interface. The user can implement this interface to hook in another post processing strategy.
slicer.transformations
This module contains classes that transform the program based on the identified slice. Other transformations may also be driven by the identified slice even though they were not intended to be; however, the basic intention was to capture the transformations specific to slicing in this module. Hence, the user will find the classes used to residualize a slice in this module.
tools.slicer
This module contains classes that package all the relevant parts required for slicing as a "slicer" facade or tool that can be readily used by the end application. The facades adhere to Indus Tool API for the sake of consistency and compositionality. Most first time users would want to start experimenting with the tool implementation available in this module and later use these classes as examples to assemble a dedicated "slicer".
toolkits
This module contains adapter classes that adapt the facade/tool classes in the tools.slicer module for use in a toolkit, preferably via the toolkit's tool API. For example, we have adapted the facades for Bandera and plan to do the same for Eclipse.
Figure 2. UML-style dependence/relationship between various modules in the slicer
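The slicer.processing module lets users hook in their own post processing strategy by implementing an interface. The sketch below shows that shape with hypothetical names (Statement, SlicePostProcessor, ReturnIncluder, PostProcessingPhase); it is not the Indus post processor interface, whose actual signature should be taken from the Indus API documentation. The example post processor adds the return statements of every method that already contributes to the slice, one of the minimal extra program points mentioned in the design rationale for making a backward slice executable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy statement: enclosing method, text, and whether it is a return statement.
record Statement(String method, String text, boolean isReturn) {}

// Hypothetical hook: a post processor may extend or alter an identified slice in place.
interface SlicePostProcessor {
    void process(List<Statement> program, Set<Statement> slice);
}

// Adds the return statements of every method that already has a statement in the slice.
final class ReturnIncluder implements SlicePostProcessor {
    @Override
    public void process(List<Statement> program, Set<Statement> slice) {
        Set<String> methodsInSlice = new LinkedHashSet<>();
        for (Statement s : slice) {
            methodsInSlice.add(s.method());
        }
        for (Statement s : program) {
            if (s.isReturn() && methodsInSlice.contains(s.method())) {
                slice.add(s);
            }
        }
    }
}

// Runs the registered post processors, in order, on a copy of the identified slice.
final class PostProcessingPhase {
    private final List<SlicePostProcessor> processors = new ArrayList<>();

    PostProcessingPhase add(SlicePostProcessor processor) {
        processors.add(processor);
        return this;
    }

    Set<Statement> run(List<Statement> program, Set<Statement> identifiedSlice) {
        Set<Statement> slice = new LinkedHashSet<>(identifiedSlice);
        for (SlicePostProcessor p : processors) {
            p.process(program, slice);
        }
        return slice;
    }
}
```

Working on a copy keeps the slice produced by the slicing engine intact, so several post processing configurations can be tried on the same identified slice.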
Implementation details

This implementation uses Jimple from the Soot toolkit as the intermediate representation. Soot is available from the Sable group at McGill University. Hence, the system to be sliced must be represented in Jimple to use this slicer, and the reader should be comfortable with the basic concepts of Soot. Each of the modules listed in the section called “Architecture” is implemented as a Java package with the same name rooted in the package edu.ksu.cis.indus. Hence, the fully qualified Java package name of the module tools.slicer is edu.ksu.cis.indus.tools.slicer. However, for simplicity, we shall refer to the packages by their module-based names rather than by their fully qualified Java names.
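Because the naming convention is mechanical, it can be captured in a few lines. IndusPackages and its methods are hypothetical helpers for illustration only, not part of Indus.

```java
// Hypothetical helper illustrating the module-name to package-name convention.
final class IndusPackages {
    static final String ROOT = "edu.ksu.cis.indus";

    private IndusPackages() {}

    // "tools.slicer" -> "edu.ksu.cis.indus.tools.slicer"
    static String packageOf(String module) {
        return ROOT + "." + module;
    }

    // "edu.ksu.cis.indus.tools.slicer" -> "tools.slicer"
    static String moduleOf(String packageName) {
        if (!packageName.startsWith(ROOT + ".")) {
            throw new IllegalArgumentException("not an Indus package: " + packageName);
        }
        return packageName.substring(ROOT.length() + 1);
    }
}
```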
Bird's eye view

tools.slicer.SliceXMLizerCLI is a class that uses tools.slicer.SlicerTool to generate the slice; it residualizes the slice as an XML document and each class in the slice as a Jimple file and a class file. Following is a snippet of the main code in this class; we walk through its main control flow below.

    public static void main(final String[] args) {
        final SliceXMLizer _driver = .....
        _driver.initialize();
        _driver.execute();
        _driver.writeXML();
        _driver.residualize();
    }

    protected final void execute() {
        slicer.setTagName(nameOfSliceTag);
        slicer.setSystem(scene);
        slicer.setRootMethods(rootMethods);
        slicer.setCriteria(Collections.EMPTY_LIST);
        slicer.run(Phase.STARTING_PHASE, true);
    }
The XMLizer is created and initialized in the first two statements of main. This is followed by the execution of the slicer (execute), the writing of the slice and the substrate program as XML documents (writeXML), and the residualization of the slice as Jimple files and class files (residualize). These documents and class files are used as artifacts in the regression test framework used to test the slicer. The statements of execute perform the following steps.

setTagName
As mentioned earlier, we use an annotation-based approach to identify the slice. We use the inherent support in Soot for tagging AST nodes; hence, in this step we provide the name of the tag that should be used to annotate the AST nodes of the substrate program that belong to the slice.

setSystem
Soot uses a Scene as an abstraction of the system being operated on. All the classes and their components can be accessed from the Scene via well-defined interfaces. To use the slicer, the user loads the classes that form the system into a Scene and provides it to the slicer in this step.

setRootMethods
Given just the criteria, the slicer can include parts of the system that may not be relevant in a particular run. Although this information is useful in impact analysis, it is overly imprecise in most cases. Hence, the user should identify the set of methods in the system that should be considered as entry points while generating the slice. These entry point methods, or root methods (from the view of a call graph), are provided to the slicer in this step.

setCriteria
The slice criteria are set in this step. It may be surprising that the code passes an empty collection of criteria. As the slicer was designed and implemented as part of a larger model checking project, SlicerTool has logic that can be switched on to automatically generate criteria that are crucial to detect deadlocks in the system. These criteria correspond exactly to enter_monitor and exit_monitor statements. As for toggling switches, SlicerTool is based on the Indus Tool API, which has inherent support for configuration based on XML data via an SWT-based GUI. Hence, the tools.slicer package comes with a default configuration that is used if none is specified, and it controls the toggling of the various switches. This default configuration uses all possible dependences in their most precise mode to calculate an executable backward slice that preserves the deadlocking property of the system.

run
The wheels start to roll here. Although the invoked method is part of the Indus Tool API, the simplified under-the-hood view is that the tool is asked to verify that its current configuration is valid and, if so, to execute. Please refer to the documentation of Indus for details of the arguments. The slicer tool executes in 3 phases: starting/initial, dependence calculation, and slicing. If these phases seem to depart from the phases mentioned earlier, it is because the tool provides a facade. A user who just wants to customize the residualization process can extend SlicerTool to alter the post processing phase suitably and use the extended version; the classes from slicer.processing and slicer.transformations are used in the post processing phase. Users who want finer-grained customization are advised to assemble a new facade along the lines of SlicerTool, according to their needs.

We will get into the guts of the slicer in the next section.
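The protocol described above (set up the tool, then verify the configuration and run the phases) can be approximated by a small facade. This is a hypothetical sketch, not SlicerTool: MiniSlicerTool, its setters, the boolean result of run and the autoDeadlockCriteria switch only mimic the steps above, and statements are toy Jimple-like strings. When the switch is on and no criteria were given, it derives criteria from the entermonitor and exitmonitor statements, mirroring the deadlock-preserving behavior described for setCriteria.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical facade mirroring the set-up-then-run protocol of a slicer tool.
final class MiniSlicerTool {
    enum Phase { STARTING, DEPENDENCE_CALCULATION, SLICING }

    private String tagName;
    private List<String> system = List.of();      // toy "scene": Jimple-like statement texts
    private List<String> rootMethods = List.of();
    private List<String> criteria = List.of();
    private boolean autoDeadlockCriteria = true;
    private final List<Phase> executed = new ArrayList<>();

    void setTagName(String tagName) { this.tagName = tagName; }
    void setSystem(List<String> system) { this.system = List.copyOf(system); }
    void setRootMethods(Collection<String> roots) { this.rootMethods = List.copyOf(roots); }
    void setCriteria(Collection<String> criteria) { this.criteria = List.copyOf(criteria); }
    void setAutoDeadlockCriteria(boolean on) { this.autoDeadlockCriteria = on; }

    // Verify the configuration first; only a valid configuration is executed.
    boolean run() {
        if (tagName == null || system.isEmpty() || rootMethods.isEmpty()) {
            return false;
        }
        executed.clear();
        executed.add(Phase.STARTING);
        if (criteria.isEmpty() && autoDeadlockCriteria) {
            criteria = monitorCriteria(system);
        }
        executed.add(Phase.DEPENDENCE_CALCULATION); // dependence analyses would run here
        executed.add(Phase.SLICING);                // the slicing engine would run here
        return true;
    }

    // Auto-generated criteria: every monitor acquisition and release.
    static List<String> monitorCriteria(List<String> statements) {
        List<String> result = new ArrayList<>();
        for (String s : statements) {
            if (s.startsWith("entermonitor") || s.startsWith("exitmonitor")) {
                result.add(s);
            }
        }
        return result;
    }

    List<Phase> executedPhases() { return List.copyOf(executed); }
    List<String> criteria() { return criteria; }
}
```

Rejecting a run that lacks root methods reflects the point made above: without entry points, the slice would be overly imprecise.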
Figure 3. SlicerTool Configuration GUI
As the slicer adheres to the Indus Tool API, it comes with a built-in configuration GUI (as illustrated in Figure 3) that can be used from inside the application using the slicer. The configuration logic comes with serialization and deserialization support as well.
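The Indus Tool API's own configuration format is not described here. As an illustration of the serialization round trip only, the sketch below persists two slicer switches as XML using java.util.Properties, whose storeToXML and loadFromXML methods are part of the JDK. SlicerSettings and the property keys sliceType and executableSlice are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical slicer switches persisted as XML via java.util.Properties.
final class SlicerSettings {
    String sliceType = "BACKWARD";
    boolean executableSlice = true;

    // Serialize the switches to an XML properties document.
    byte[] toXml() throws IOException {
        Properties p = new Properties();
        p.setProperty("sliceType", sliceType);
        p.setProperty("executableSlice", Boolean.toString(executableSlice));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        p.storeToXML(out, "slicer configuration");
        return out.toByteArray();
    }

    // Deserialize; missing keys fall back to the defaults, like a default configuration.
    static SlicerSettings fromXml(byte[] xml) throws IOException {
        Properties p = new Properties();
        p.loadFromXML(new ByteArrayInputStream(xml));
        SlicerSettings s = new SlicerSettings();
        s.sliceType = p.getProperty("sliceType", s.sliceType);
        s.executableSlice = Boolean.parseBoolean(
                p.getProperty("executableSlice", Boolean.toString(s.executableSlice)));
        return s;
    }
}
```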
The Details

In this section we deal with the implementation of SlicerTool. In particular, we only present the details of how the slicing engine is set up and driven to identify the slice, which is later massaged via post processing. The following snippet is the only sequence of method invocations required on the slicing engine to identify a slice.

    void execute() {
        ....
        engine.setTagName(tagName);
        engine.setCgi(callGraph);
        engine.setSliceType(
            _slicerConfig.getProperty(SlicerConfiguration.SLICE_TYPE));
        engine.setInitMapper(initMapper);
        engine.setBasicBlockGraphManager(bbgMgr);
        engine.setAnalysesControllerAndDependenciesToUse(
            daController, _slicerConfig.getNamesOfDAsToUse());
        engine.setSliceCriteria(criteria);
        engine.initialize();
        engine.slice();
        postProcessSlice();
    }
setCgi
The root methods/entry point methods set on SlicerTool earlier are used to construct a call graph, which the slicing algorithm uses to deal with interprocedural control flow. Hence, the call graph, exposed via the ICallGraphInfo interface defined in the StaticAnalyses project, is provided to the engine in this step.

setSliceType
The type of the slice is set in this step. Note that this does not specify anything about any additional property required of the slice.

setInitMapper
This step is more of a residue of the fact that the instantiation of an object in Java is represented as 2 statements in Jimple. This is in accordance with the bytecode format, where the object is created by one instruction and later initialized by another. Hence, the coupling between an allocation site and the constructor invocation site needs to be made explicit, and this is provided via an implementation of the edu.ksu.cis.indus.interfaces.INewExpr2InitMapper interface in this step.

setBasicBlockGraphManager
As an optimization, rather than creating the basic block graphs every time they are required, we cache them via an edu.ksu.cis.indus.common.graph.BasicBlockGraphMgr instance. This also makes it very easy to vary the control flow graph representation used to create the basic block graphs across all analyses and transformations being used. This manager instance is provided to the slicing engine in this step.

setAnalysesControllerAndDependenciesToUse
In this step, the slicing engine is provided with the analysis controller that was used to drive the various analyses, along with the IDs of the dependence analyses that should be considered while slicing. The analysis controller serves as a reference container for the dependence analyses.

setSliceCriteria
The slicing criteria are provided to the slicing engine in this step. Users who want to create slicing criteria on their own should use slicer.SliceCriteriaFactory.

initialize
In this step we request the slicing engine to initialize itself. This step should succeed for the slicer to function, assuming the provided objects are in valid states.

slice
The slicing engine identifies the slice in this step by annotating/tagging the AST nodes that belong to the slice with a tag of the name provided to it.

postProcessSlice
The call to the postProcessSlice method in SlicerTool combines various post processing classes to massage the slice. The core of this method is given below.
    if (((Boolean) _slicerConfig.getProperty(
            SlicerConfiguration.EXECUTABLE_SLICE)).booleanValue()) {
        final ISlicePostProcessor _postProcessor = new ExecutableSlicePostProcessor();
        _postProcessor.process(_methods, bbgMgr, _collector);
    }

    if (_sliceType.equals(SlicingEngine.FORWARD_SLICE)) {
        _gotoProcessor = new ForwardSliceGotoProcessor(_collector);
    } else if (_sliceType.equals(SlicingEngine.BACKWARD_SLICE)) {
        _gotoProcessor = new BackwardSliceGotoProcessor(_collector);
    } else if (_sliceType.equals(SlicingEngine.COMPLETE_SLICE)) {
        _gotoProcessor = new CompleteSliceGotoProcessor(_collector);
    }
    _gotoProcessor.process(_methods, bbgMgr);
The generated slice is first massaged to make it executable, if required. Then, depending on the type of slice, a goto processor is picked. The purpose of this processing is to ensure that the control-flow skeleton of the slice is identical to that of the substrate program; this is needed because the slicing algorithm does not consider unconditional jumps, as they do not alter the control flow during execution. Finally, the slice is processed through the selected goto processor, possibly extending the slice. For the interested reader, _collector is an object used by the slicing engine for bookkeeping pertaining to the identification of the slice. In particular, it annotates the AST nodes that are part of the slice and maintains auxiliary information about the identified slice. Users need to be concerned with this class only if they plan to add to the post processing phase.
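To see why unconditional jumps need separate treatment, consider a slice that keeps statements in a basic block but not the goto ending that block: the residual control flow would no longer match the substrate program. The sketch below uses a deliberately simplified rule that is an assumption for illustration, not the rule implemented by the Indus goto processors: a goto is added to the slice when its basic block already contributes a statement. BlockStmt and SimpleGotoProcessor are hypothetical names, and basic blocks are modeled as plain lists.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy statement inside a basic block; isGoto marks an unconditional jump.
record BlockStmt(String label, boolean isGoto) {}

final class SimpleGotoProcessor {
    private SimpleGotoProcessor() {}

    // Returns the slice extended with every goto whose basic block already
    // contributes at least one statement (simplified, hypothetical rule).
    static Set<String> process(List<List<BlockStmt>> basicBlocks, Set<String> slice) {
        Set<String> result = new LinkedHashSet<>(slice);
        for (List<BlockStmt> block : basicBlocks) {
            boolean blockInSlice = false;
            for (BlockStmt s : block) {
                if (slice.contains(s.label())) {
                    blockInSlice = true;
                    break;
                }
            }
            if (blockInSlice) {
                for (BlockStmt s : block) {
                    if (s.isGoto()) {
                        result.add(s.label());
                    }
                }
            }
        }
        return result;
    }
}
```

A real goto processor must also reason about the slice type and the paths between blocks, which is why Indus selects a different processor for forward, backward and complete slices.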
Closing Note

The XMLizing classes used by this project and its parent and sibling projects use the XMLizing framework to drive the slicer. So, we urge you to peruse the source code of these classes before asking questions on the forum or the mailing list. We will be glad to answer any question you may have regarding usage, but it will probably be faster to model your code on an existing working piece of code when starting to use a new tool. The reader is encouraged to use the modules as is or to extend them as required. In the process, users are urged to submit reports of any bugs they uncover, with suitable information about the triggering input and configuration. The interfaces of the modules are not fixed, as the development team has not foreseen all possible applications of and tweaks to the slicer. Hence, users are encouraged to raise change requests with the development team, along with any feature requests they may have. However, please note that the development team may not be able to implement all requested features, in which case they will assist by providing any information or alterations needed to enable those features. Please refer to Indus [http://indus.projects.cis.ksu.edu] for more documentation, the distribution, the mailing list, forums, and links to other subprojects. We hope you have a pleasant experience using our product.
Bibliography

[HatcliffSAS99] John Hatcliff, James C. Corbett, Matthew B. Dwyer, Stefan Sokolowski, Hongjun Zheng. “A Formal Study of Slicing for Multi-threaded Programs with JVM Concurrency Primitives”. Proceedings of the 1999 International Symposium on Static Analysis (SAS'99). Sep 1999.

[CorbettICSE00] James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Pasareanu, Robby, Hongjun Zheng. “Bandera: Extracting Finite-state Models from Java Source Code”. Proceedings of the 22nd International Conference on Software Engineering (ICSE'00). 439-448. June 2000.