Java Bytecode Instrumentation Made Easy: The DiSL Framework for Dynamic Program Analysis

Lukáš Marek¹, Yudi Zheng², Danilo Ansaloni³, Aibek Sarimbekov³, Walter Binder³, Petr Tůma¹, and Zhengwei Qi²

¹ Charles University, Czech Republic {lukas.marek,petr.tuma}@d3s.mff.cuni.cz
² Shanghai Jiao Tong University, China {zheng.yudi,qizhwei}@sjtu.edu.cn
³ University of Lugano, Switzerland {danilo.ansaloni,aibek.sarimbekov,walter.binder}@usi.ch
Abstract. Many software development tools (e.g., profilers, debuggers, testing tools) and frameworks (e.g., aspect weavers) are based on bytecode instrumentation techniques. While there are many low-level bytecode manipulation libraries that support the development of such tools and frameworks, they typically provide only low-level abstractions and require detailed knowledge of the Java Virtual Machine. In addition, they often lack the necessary infrastructure for load-time instrumentation with complete bytecode coverage to ensure that each method with a bytecode representation gets instrumented. In this paper we give an introduction to DiSL, a domain-specific aspect language and framework for bytecode instrumentation that reconciles high expressiveness of the language, high level of abstraction, and efficiency of the generated code. We illustrate the strengths of DiSL with a concrete analysis as a case study. The DiSL framework is open-source and has been successfully used in several research projects. Keywords: Bytecode instrumentation, aspect-oriented programming, domain-specific languages, Java Virtual Machine.
1 Introduction
Java bytecode instrumentation is widely used for implementing dynamic program analysis tools and frameworks. It is supported by a variety of rather low-level bytecode manipulation libraries, such as ASM¹, BCEL², Soot [9], ShrikeBT³, or Javassist [2, 3]; the latter also provides a source-level API. Mainstream aspect-oriented programming (AOP) languages such as AspectJ [4] offer a convenient pointcut/advice model that allows expressing certain instrumentations in a concise manner; for this reason, some researchers have implemented dynamic program analysis tools with AOP, such as DJProf [7]. Unfortunately, mainstream AOP languages severely limit instrumentation flexibility, making it impossible to implement some relevant instrumentations. In addition, the high-level programming model results in some overhead, as the implementation of some language features is expensive.

DiSL [5, 10] is a new domain-specific language and framework for Java bytecode instrumentation. DiSL is inspired by AOP, but in contrast to mainstream AOP languages, it features an open join point model where any region of bytecodes can be selected as a join point (i.e., code location to be instrumented). DiSL reconciles high-level language constructs resulting in concise instrumentations, high expressiveness (i.e., any instrumentation can be expressed in DiSL), and efficiency of the inserted instrumentation code. Thanks to the pointcut/advice model adopted by DiSL, instrumentations are as compact as aspects written in AspectJ. However, in contrast to AspectJ, DiSL does not restrict the code locations that can be instrumented, and the code generated by DiSL avoids expensive operations (such as object allocations that are not visible to the programmer). Furthermore, DiSL supports instrumentations with complete bytecode coverage [6] out-of-the-box and tries to avoid structural modifications of classes that would be visible through reflection and could break the instrumented code.

In this paper we give a tutorial-style introduction to DiSL programming, helping the developers of dynamic program analysis tools to get started with DiSL. Due to the limited space, we only cover the basic features of DiSL; for a discussion of the design, we refer to [5]. Section 2 introduces the DiSL framework using an evolving program analysis example and pointing out the advantages of DiSL. Section 3 describes the instrumentation process with DiSL. Finally, Section 4 concludes.

¹ http://asm.ow2.org/
² http://commons.apache.org/bcel/
³ http://wala.sourceforge.net/wiki/index.php/Shrike_technical_overview

R. Jhala and A. Igarashi (Eds.): APLAS 2012, LNCS 7705, pp. 256–263, 2012. © Springer-Verlag Berlin Heidelberg 2012
2 DiSL by Example
A common example of a dynamic program analysis tool is a method execution time profiler, which usually instruments the method entry and exit join points and introduces storage for timestamps. We describe the main features of DiSL by gradually developing the instrumentation for such a profiler.

2.1 Method Execution Time Profiler
Each DiSL instrumentation is defined through methods declared in standard Java classes. Each method, called a snippet in DiSL terminology, is annotated so as to specify the join points where the code of the snippet shall be inlined.⁴ In the first version of our execution time profiler, we only output the entry and exit time of each executed method. As illustrated in Figure 1, the first snippet prints the entry time and the second snippet prints the exit time. The @Before annotation on the first snippet directs the framework to inline the snippet before each marked region of bytecodes (representing a join point); the use of the @After annotation places the second snippet after (both normal and abnormal) exit of each marked region. The marked bytecode regions are specified with the marker parameter of the annotation. In our example, BodyMarker marks the whole method (or constructor) body. The resulting instrumentation thus prints a timestamp upon method entry and exit.

⁴ The name of the method is not constrained and can be arbitrarily chosen by the programmer.

    public class SimpleProfiler {
        @Before(marker=BodyMarker.class)
        static void onMethodEntry() {
            System.out.println("Method entry " + System.nanoTime());
        }

        @After(marker=BodyMarker.class)
        static void onMethodExit() {
            System.out.println("Method exit " + System.nanoTime());
        }
    }

Fig. 1. Instrumenting method entry and exit

DiSL provides a library of markers (e.g., BasicBlockMarker, BytecodeMarker) for intercepting many common bytecode patterns. For special use cases, DiSL allows the programmer to define custom markers intercepting user-defined bytecode sequences.

The elapsed wall-clock time from method entry to method exit can be computed in the after snippet. To perform the computation, the timestamp of method entry has to be passed from the before snippet to the after snippet. Whereas in traditional AOP languages this would be handled using the around advice, DiSL supports data passing between snippets using synthetic local variables [1]. Synthetic local variables are static fields annotated as @SyntheticLocal. The variables have the scope of a method activation and can be accessed by all snippets that are inlined in the method; that is, they become local variables on the stack. Synthetic local variables are initialized to the default value of their declared type (e.g., 0, false, null). Figure 2 illustrates the use of a synthetic local variable named entryTime for computing the method execution time.

    public class SimpleProfiler {
        @SyntheticLocal
        static long entryTime;

        @Before(marker=BodyMarker.class)
        static void onMethodEntry() {
            entryTime = System.nanoTime();
        }

        @After(marker=BodyMarker.class)
        static void onMethodExit() {
            System.out.println("Method duration " + (System.nanoTime() - entryTime));
        }
    }

Fig. 2. Data passing with a synthetic local variable
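One way to picture the effect of Figure 2 is to write out by hand what a method might look like once both snippets have been inlined. The following is a minimal sketch in plain Java, not actual DiSL output: the class InlinedProfilerSketch, the method work, and the field lastDuration are hypothetical names introduced for illustration, and the try/finally block only approximates how an after snippet covers both normal and abnormal exit.

```java
public class InlinedProfilerSketch {
    // Hypothetical field, kept only so the measured duration can be inspected.
    static long lastDuration = -1;

    // A hypothetical base-program method, shown as it might conceptually
    // look after the snippets of Figure 2 have been inlined.
    static int work(int n) {
        long entryTime = 0L;               // synthetic local: a plain local, default-initialized
        entryTime = System.nanoTime();     // body of the @Before snippet
        try {
            int sum = 0;                   // original method body
            for (int i = 0; i < n; i++) {
                sum += i;
            }
            return sum;
        } finally {
            // body of the @After snippet: runs on normal and abnormal exit
            lastDuration = System.nanoTime() - entryTime;
            System.out.println("Method duration " + lastDuration);
        }
    }

    public static void main(String[] args) {
        System.out.println(work(1000));
    }
}
```

Because entryTime lives in the frame of each activation, recursive and concurrent invocations each see their own copy, which is the property that a static field without the @SyntheticLocal treatment would not provide.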
    @After(marker=BodyMarker.class)
    static void onMethodExit(MethodStaticContext msc) {
        System.out.println(msc.thisMethodFullName() + " duration " + (System.nanoTime() - entryTime));
    }
Fig. 3. Accessing the method name through static context
The output of the profiler should also contain the name of each profiled method. The information about the instrumented class, method, and bytecode region can be obtained through special static context interfaces. Static context interfaces that provide the desired information can be declared as arguments to the snippet, in any order. At instrumentation time, DiSL replaces calls to these interfaces with the corresponding static context information. This implementation choice improves the efficiency of the resulting tools, since static context information is computed at instrumentation time rather than at runtime. In our example, we are interested in the predefined static context MethodStaticContext, which provides the method name, signature, and modifiers (among other static data about the intercepted method and its enclosing class). Figure 3 refines the after snippet of Figure 2 to access the fully qualified name of the instrumented method.

DiSL provides a set of commonly used static context interfaces. The DiSL programmer may also define custom static context interfaces to perform additional static analysis at instrumentation time or to access information not directly provided by DiSL.
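The benefit of resolving static context at instrumentation time can be illustrated with a small two-phase sketch in plain Java. It does not use the DiSL API: the class StaticContextSketch and the method specialize are hypothetical, and the "Class.method" name format is an assumption made for the example. The point is only that the name is computed once, when the code is specialized, and is a constant whenever the resulting code runs.

```java
import java.util.function.LongFunction;

public class StaticContextSketch {
    // "Instrumentation time": the method name is computed once and captured.
    // The returned function plays the role of the inlined after snippet.
    static LongFunction<String> specialize(String className, String methodName) {
        final String fullName = className + "." + methodName; // computed once, not per call
        return durationNanos -> fullName + " duration " + durationNanos;
    }

    public static void main(String[] args) {
        LongFunction<String> snippet = specialize("Foo", "bar");
        // "Runtime": only the dynamic value (the duration) varies between calls.
        System.out.println(snippet.apply(42L));
        System.out.println(snippet.apply(7L));
    }
}
```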
2.2 Profiler with Stack Trace
Another extension to our example provides the stack trace of each profiled method. There are several ways to obtain the stack trace information in Java, such as calling the getStackTrace() method from java.lang.Thread. However, frequent calls to this method may be expensive. As an alternative approach, DiSL allows programmers to obtain stack trace information using instrumentation. The algorithm to reify the call stack is simple. Upon method entry, the method name is pushed onto a thread-local shadow call stack. Upon method exit, the method name is popped off the shadow call stack.

Figure 4 shows two additional snippets for call stack reification. Each thread maintains its shadow call stack, referenced by the thread-local variable cs.⁵ In our example, cs is initialized for each thread in the before snippet. The shadow call stack can be accessed from all snippets using cs; for example, it could be included in the profiler’s output.

⁵ DiSL offers a particularly efficient implementation of thread-local variables with the @ThreadLocal annotation.

    @ThreadLocal
    static Stack cs; // call stack

    @Before(marker=BodyMarker.class, order=1000)
    static void enterCS(MethodStaticContext msc) {
        if (cs == null) {
            cs = new Stack();
        }
        cs.push(msc.thisMethodFullName());
    }

    @After(marker=BodyMarker.class, order=1000)
    static void exitCS() {
        cs.pop();
    }

Fig. 4. Call stack reification

    @After(marker=BodyMarker.class)
    static void onMethodExit(MethodStaticContext msc, DynamicContext dc) {
        int identityHC = System.identityHashCode(dc.getThis());
        ...
    }

Fig. 5. Accessing dynamic context information

To make sure all snippets observe the shadow call stack in a consistent state, the two snippets for call stack reification have to be inserted in a correct order relative to the other snippets. DiSL allows the programmer to specify the order in which snippets matching the same join point should be inlined as a non-negative integer in the snippet annotation. The smaller this number, the closer to the join point the snippet is inlined. In our profiler, the time measurement snippets and the stack reification snippets match the same join point (i.e., method entry, resp. exit). We assign a high order value (1000) to the call stack reification snippets and keep the lower default order value (100) of the snippets for time measurement.⁶ Consequently, the callee name is pushed onto the shadow call stack before the entry time is measured, and the exit time is measured before the callee name is popped off the stack.
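The combined effect of call stack reification and snippet ordering can be mimicked in plain Java. The sketch below is not DiSL code: the class ShadowStackSketch, the method profiled, and the log list are hypothetical, and java.lang.ThreadLocal stands in for the @ThreadLocal annotation. The statements are written in the order in which the inlined snippets would execute, with the order=1000 snippets outermost and the order=100 snippets closest to the method body.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ShadowStackSketch {
    // One shadow call stack per thread.
    private static final ThreadLocal<Deque<String>> CS =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Hypothetical log of the stack contents observed at each method exit.
    static final List<String> log = new ArrayList<>();

    static void profiled(String methodName, Runnable body) {
        CS.get().push(methodName);            // before snippet, order=1000: runs first
        long entryTime = System.nanoTime();   // before snippet, order=100: closest to the body
        try {
            body.run();                       // original method body
        } finally {
            long duration = System.nanoTime() - entryTime;  // after snippet, order=100: runs first
            log.add(String.join(" <- ", CS.get()));         // the callee is still on the stack here
            System.out.println(log.get(log.size() - 1) + " duration " + duration);
            CS.get().pop();                   // after snippet, order=1000: runs last
        }
    }

    static int depth() {
        return CS.get().size();
    }

    public static void main(String[] args) {
        profiled("A.outer", () -> profiled("B.inner", () -> { }));
    }
}
```

If the push and pop were placed closer to the body than the time measurement, the time spent maintaining the shadow stack would be counted as part of the measured duration, so the larger order value also keeps that bookkeeping out of the measurement.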
2.3 Profiling Object Instances
Our next extension addresses situations where the dependency of the method execution time on the identity of the called object instance is of interest. Figure 5 refines the after snippet of Figure 2 by computing the identity hash code of the object instance on which the intercepted method has been called. The instrumentation in Figure 5 uses a dynamic context interface (i.e., DynamicContext) to get a reference to the current object instance. Similar to the static context interfaces, the dynamic context interfaces are also passed to the snippets as method arguments. Unlike the static context information, which is resolved at instrumentation time, calls to the dynamic context interface are replaced with code that obtains the required dynamic information at runtime. Besides the reference used in the example, DiSL provides access to other dynamic context information including the local variables, method arguments, and values on the operand stack.

⁶ If snippet ordering is used, it is recommended to override the value in all snippets for improved readability.
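Figure 5 leaves open what the snippet does with the identity hash code. One possibility, sketched below in plain Java under our own assumptions, is to accumulate the measured time per receiver object. The class PerInstanceTimes and its methods record and total are hypothetical and not part of DiSL; a snippet could call such a helper with dc.getThis() and the measured duration. Note that identity hash codes are not guaranteed to be unique, so distinct objects may occasionally share an entry.

```java
import java.util.HashMap;
import java.util.Map;

public class PerInstanceTimes {
    // Accumulated execution time in nanoseconds, keyed by identity hash code.
    private static final Map<Integer, Long> totalNanos = new HashMap<>();

    // Adds one measurement for the given receiver object.
    static synchronized void record(Object receiver, long durationNanos) {
        int identityHC = System.identityHashCode(receiver);
        totalNanos.merge(identityHC, durationNanos, Long::sum);
    }

    // Returns the time accumulated so far for the given receiver object.
    static synchronized long total(Object receiver) {
        return totalNanos.getOrDefault(System.identityHashCode(receiver), 0L);
    }

    public static void main(String[] args) {
        Object receiver = new Object();
        record(receiver, 10L);
        record(receiver, 5L);
        System.out.println(total(receiver));
    }
}
```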
    public class LoopGuard {
        @GuardMethod
        public static boolean methodContainsLoop() {
            ... // Loop detection based on control flow analysis
        }
    }

Fig. 6. Loop guard skeleton

    @Before(marker=BodyMarker.class, guard=LoopGuard.class)
    static void onMethodEntry() { ... }

    @After(marker=BodyMarker.class, guard=LoopGuard.class)
    static void onMethodExit(...) { ... }

Fig. 7. Time measurement snippets with loop guard
2.4 Profiling Only Methods with Loops
Often, it is useful to restrict the instrumentation to certain methods. For example, one may want to profile only the execution of methods that contain loops, as methods containing repetitions are likely to contribute more to the overall execution time. DiSL allows programmers to restrict an instrumentation using the guard construct. A guard is a user-defined condition evaluated during instrumentation. This condition determines whether a snippet matching a particular join point is inlined or not. The signature of a guard restricting the instrumentation only to methods containing loops is shown in Figure 6. The guard is a Java class where one method must carry the @GuardMethod annotation. The methodContainsLoop guard method implements the detection of a loop in a method (not shown, a loop detector based on control flow analysis is included as part of DiSL). The loop guard is associated with a snippet using the guard annotation parameter, as illustrated in Figure 7. Note that the loop guard is not used in the snippets for stack reification, since we want to maintain complete stack trace information without omitting the stack frames of methods that do not contain any loops.
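The loop detection elided in Figure 6 can be approximated by searching the control flow graph of a method for a back edge. The sketch below is a generic depth-first search in plain Java and is not the detector shipped with DiSL; the class LoopDetectionSketch and the adjacency-list encoding of the graph (block 0 is the entry, and cfg.get(i) lists the successors of basic block i) are assumptions made for illustration.

```java
import java.util.List;

public class LoopDetectionSketch {
    // Returns true if a cycle is reachable from the entry block (block 0).
    static boolean containsLoop(List<List<Integer>> cfg) {
        int[] state = new int[cfg.size()]; // 0 = unvisited, 1 = on current DFS path, 2 = finished
        return !cfg.isEmpty() && dfs(0, cfg, state);
    }

    private static boolean dfs(int block, List<List<Integer>> cfg, int[] state) {
        state[block] = 1;
        for (int succ : cfg.get(block)) {
            if (state[succ] == 1) {
                return true; // back edge to a block on the current path: a loop
            }
            if (state[succ] == 0 && dfs(succ, cfg, state)) {
                return true;
            }
        }
        state[block] = 2;
        return false;
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 2, no loop
        List<List<Integer>> straight = List.of(List.of(1), List.of(2), List.<Integer>of());
        // 0 -> 1, 1 -> 1 and 1 -> 2: block 1 branches back to itself
        List<List<Integer>> looping = List.of(List.of(1), List.of(1, 2), List.<Integer>of());
        System.out.println(containsLoop(straight));
        System.out.println(containsLoop(looping));
    }
}
```

Since a guard is evaluated during instrumentation, the cost of such an analysis is paid once per method and adds nothing to the runtime overhead of the profiled application.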
3 The Instrumentation Process with DiSL
The DiSL framework supports deployment of instrumentations, sanity checking, and debugging. To reduce perturbation of the observed application and facilitate complete bytecode coverage, the DiSL engine can run in a separate Java Virtual Machine (JVM); a native JVMTI agent within the observed application’s JVM sends all loaded classes (starting with java.lang.Object) to the DiSL engine for load-time instrumentation. Alternatively, the DiSL engine offers an offline mode for static bytecode instrumentation.

Complete bytecode coverage introduces additional issues. Before the main method of an application is called, the JVM executes bootstrap code. Executing the inserted instrumentation during bootstrap may crash the JVM. Furthermore, if the inserted instrumentation uses the Java class library that is itself instrumented, infinite recursion can occur. DiSL solves these issues with the help of polymorphic bytecode instrumentation [6], which allows resorting to unmodified code during bootstrap and during execution of the inlined snippets.

Instrumentation code inserted by DiSL is intended only for application observation; it should never alter the observed application’s control flow. DiSL has been carefully designed to prevent the insertion of any code that could change the control flow. It can automatically insert code intercepting all exceptions originating from the snippets, reporting an error if an exception is thrown (and not handled) by the inserted code regions. Once the instrumentation has reached production-level quality, this exception checking can be turned off to reduce overhead. To ease debugging, DiSL also supports dumping of the unmodified and instrumented classes.
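The recursion problem described above can be reproduced, and avoided, in a few lines of plain Java. The sketch below is a deliberately simplified stand-in for polymorphic bytecode instrumentation [6], not its actual mechanism: it uses a per-thread flag to skip the analysis code while analysis code is already running. The class BypassSketch, the methods libraryCall and onEntry, and the counter events are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BypassSketch {
    // True while the current thread is executing analysis code.
    private static final ThreadLocal<Boolean> IN_ANALYSIS =
            ThreadLocal.withInitial(() -> false);

    // Number of method entries observed by the analysis.
    static final AtomicInteger events = new AtomicInteger();

    // Stands in for an instrumented class library method.
    static String libraryCall(String s) {
        onEntry();          // inserted instrumentation
        return s.trim();    // original method body
    }

    // Stands in for an inlined snippet that itself uses the class library.
    static void onEntry() {
        if (IN_ANALYSIS.get()) {
            return;         // already inside analysis code: behave like unmodified code
        }
        IN_ANALYSIS.set(true);
        try {
            events.incrementAndGet();
            libraryCall(" used by the analysis ");  // would recurse forever without the flag
        } finally {
            IN_ANALYSIS.set(false);
        }
    }

    public static void main(String[] args) {
        System.out.println(libraryCall("  hello  "));
        System.out.println(events.get());
    }
}
```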
4 Conclusion
This paper gives an introduction to DiSL, a domain-specific aspect language and framework for bytecode instrumentation. DiSL eases the development of instrumentation-based dynamic program analysis tools without restricting the join points that can be instrumented. Instrumentation code to be inserted is specified in the form of snippets, that is, annotated Java methods. Snippets have access to comprehensive static and dynamic context information. By default, the DiSL framework instruments every method that has a bytecode representation, including methods in the Java class library and in dynamically generated code.

The design of DiSL rules out two types of errors that are often made when resorting to low-level bytecode manipulation techniques. First, as snippets are Java code that is compiled, the bytecode within each snippet is guaranteed to be valid. Second, snippets cannot tamper with local variables and stack locations used by the instrumented base program. Nonetheless, the DiSL programmer needs to be aware of the semantics of individual bytecodes in order to correctly access the desired context information. It is possible to write incorrect instrumentations that result in instrumented classes failing bytecode verification. Hence, we are developing advanced checkers for DiSL in our ongoing research.

DiSL offers a unique combination of a high-level pointcut/advice model with the flexibility and detailed control of low-level bytecode instrumentation. As such, it is a valuable tool for the programming language and software engineering communities, both in academia and in industry, that can reduce the effort needed for developing new dynamic analysis tools and instrumentation-based programming frameworks. DiSL may also serve as a framework for implementing higher-level analysis or aspect languages. DiSL is available open-source⁷ and has already been used in several projects, such as for building a toolchain for workload characterization [8].

⁷ http://disl.ow2.org
Acknowledgments. The research presented in this paper has been supported by the Swiss National Science Foundation (project CRSII2_136225), by the European Commission (Seventh Framework Programme grant 287746), by the Czech Science Foundation (projects GACR P202/10/J042 and GACR 201/09/H057), by Charles University institutional funding SVV-2012265312, by a Sino-Swiss Science and Technology Cooperation (SSSTC) Institutional Partnership (project IP04–092010), by the National Natural Science Foundation of China (project 61073151), and by the Science and Technology Commission of Shanghai Municipality (project 11530700500).
References

1. Binder, W., Ansaloni, D., Villazón, A., Moret, P.: Flexible and efficient profiling with aspect-oriented programming. Concurrency and Computation: Practice and Experience 23(15), 1749–1773 (2011)
2. Chiba, S., Nishizawa, M.: An Easy-to-Use Toolkit for Efficient Java Bytecode Translators. In: Pfenning, F., Macko, M. (eds.) GPCE 2003. LNCS, vol. 2830, pp. 364–376. Springer, Heidelberg (2003)
3. Chiba, S.: Load-Time Structural Reflection in Java. In: Bertino, E. (ed.) ECOOP 2000. LNCS, vol. 1850, pp. 313–336. Springer, Heidelberg (2000)
4. Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.M., Irwin, J.: Aspect-Oriented Programming. In: Aksit, M., Auletta, V. (eds.) ECOOP 1997. LNCS, vol. 1241, pp. 220–242. Springer, Heidelberg (1997)
5. Marek, L., Villazón, A., Zheng, Y., Ansaloni, D., Binder, W., Qi, Z.: DiSL: a domain-specific language for bytecode instrumentation. In: AOSD 2012: Proceedings of the 11th International Conference on Aspect-Oriented Software Development, pp. 239–250 (2012)
6. Moret, P., Binder, W., Tanter, É.: Polymorphic bytecode instrumentation. In: AOSD 2011: Proceedings of the 10th International Conference on Aspect-Oriented Software Development, pp. 129–140 (2011)
7. Pearce, D.J., Webster, M., Berry, R., Kelly, P.H.J.: Profiling with AspectJ. Software: Practice and Experience 37(7), 747–777 (2007)
8. Sewe, A., Mezini, M., Sarimbekov, A., Ansaloni, D., Binder, W., Ricci, N., Guyer, S.Z.: new Scala() instanceof Java: A comparison of the memory behaviour of Java and Scala programs. In: ISMM 2012: Proceedings of the International Symposium on Memory Management, pp. 97–108 (2012)
9. Vallée-Rai, R., Gagnon, E., Hendren, L., Lam, P., Pominville, P., Sundaresan, V.: Optimizing Java Bytecode Using the Soot Framework: Is It Feasible? In: Watt, D.A. (ed.) CC 2000. LNCS, vol. 1781, pp. 18–34. Springer, Heidelberg (2000)
10. Zheng, Y., Ansaloni, D., Marek, L., Sewe, A., Binder, W., Villazón, A., Tuma, P., Qi, Z., Mezini, M.: Turbo DiSL: Partial Evaluation for High-Level Bytecode Instrumentation. In: Furia, C.A., Nanz, S. (eds.) TOOLS 2012. LNCS, vol. 7304, pp. 353–368. Springer, Heidelberg (2012)