Debugging by Remote Reflection
Ton Ngo and John Barton*
IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598
Abstract. Reflection in an object-oriented system allows the structure of objects and classes to be queried at run-time, thus enabling "metaobject" programming such as program debugging. Remote reflection allows objects in one address space to reflect upon objects in a different address space. Used with a debugger, remote reflection makes available the full power of object-oriented reflection even when the object examined is within a malfunctioning or terminated system. We implemented remote reflection as an extension to an interpreter to create a very effective debugger for Jalapeño, a Java Virtual Machine written in Java.
1 Introduction
Reflection in an object-oriented language supports programs that manipulate the fields (data values) of an object using symbolic names specified at run-time. For instance, an object obj may provide a method getClass() to describe its own type, the type (class) in turn may provide a method getFields() to describe its fields (data values), and the field object may provide a get() method for accessing the corresponding data value from an object.
Reflection enjoys extensive support in several modern object-oriented languages such as Java. A standard Java object provides numerous reflective methods for querying its internal values. The package java.lang.reflect provides a complete set of utilities to manage Java objects reflectively. With the description of the object encoded in the reflection methods, a program can inspect and manage an arbitrary object without any special knowledge about the object. This meta-object programming [5, 9] is especially useful for system components or utility programs such as debuggers.
In an object-oriented system, reflective methods are encapsulated within the object; therefore, to access the internal values of an object, the reflection code must be executed in the same address space where the data resides. Although this is the desired behavior in most cases, debugging is one case where this encapsulation of code and data may present a problem. The reason is that a program being debugged generally needs to be halted, i.e. its execution frozen at an arbitrary point, so that its values and states can be inspected reliably.
* Current address (John Barton): Hewlett Packard Laboratories, MS 1U-17, 1501 Page Mill Road, Palo Alto, CA 94304
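The reflective access pattern described in the introduction (getClass(), then the fields of the class, then get() on each field) can be illustrated with the standard java.lang.reflect API. This is only a minimal in-process sketch: the Point class and the dumpFields helper are hypothetical names introduced for illustration, and getDeclaredFields() is used in place of getFields() so that private fields are visible too.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sample class; it stands in for any object a debugger might inspect.
class Point {
    private int x = 3;
    private int y = 4;
}

class ReflectionDemo {
    // Collects the instance fields of obj by name, using only reflective calls.
    static Map<String, Object> dumpFields(Object obj) {
        Map<String, Object> values = new LinkedHashMap<>();
        Class<?> type = obj.getClass();                   // the object describes its own type
        for (Field field : type.getDeclaredFields()) {    // the type describes its fields
            if (Modifier.isStatic(field.getModifiers())) {
                continue;                                 // instance data only
            }
            field.setAccessible(true);                    // permit reading private fields
            try {
                values.put(field.getName(), field.get(obj)); // the field reads the data value
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints the field names and values of a Point; field order is not guaranteed.
        System.out.println(dumpFields(new Point()));
    }
}
```

Note that dumpFields needs no compile-time knowledge of Point, which is the property that makes reflection attractive for debuggers. It also runs in the same address space as the object it inspects, which is the limitation that the rest of the paper addresses.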
In the case of debugging user applications, the debugger can still take full advantage of reflection since the user application is running on a stable system. The system can halt the application thread but continue to execute the debugger thread. Such a debugger is typically called in-process because it runs in the same process as the program being debugged.
For the case of debugging system code, reflection is not possible for several reasons. First, halting the system itself would prevent the system from responding to any reflective queries. Second, allowing the system to execute the request would unintentionally change the state of the system while it is being inspected (see footnote 1). Third, if the system crashes, the core image can be saved and inspected post-mortem, but no code can be executed. To solve these problems, the debugger must execute in a different process and control the system being developed through some debugging interface provided by the operating system. Consequently, such a debugger is called out-of-process.
This situation arose in the development of Jalapeño [1, 2], a virtual machine for Java servers under development at the IBM T. J. Watson Research Center. Jalapeño is a compile-only system: instead of being interpreted, a method is compiled and optimized directly into machine instructions. Because the entire system is written in Java, including the runtime, the compiler and the garbage collector, reflection is used extensively to integrate the various components; consequently, there is a strong motivation for the Jalapeño debugger to use the same reflection facilities to inspect the system.
In this paper, we propose remote reflection as a technique that allows a program to execute a reflection method on an object that resides in a different process. In our case, this technique allows the debugger to make reflective queries to the Jalapeño system that has been completely halted in a different process.
Remote reflection thus extends the power of reflection across different address spaces, improving the reusability of object-oriented code. Although this technique was developed for Jalapeño and Java, we believe it is applicable to other Java implementations and other object-oriented languages.
In the remainder of the paper, we discuss remote reflection within the context of Java. Section 2 describes the general programming model for using remote reflection. Section 3 describes an implementation of remote reflection for Jalapeño. Section 4 illustrates the implementation with a detailed example, and possible further developments are discussed in Section 5.
1.1 Related work
The Sun JDK debugger [8] and the more recent Java Platform Debugger Architecture [7] are also out-of-process and are based on reflection; however, there are several important differences. First, the Sun approach requires a debugging thread running internally in the virtual machine, dedicated to responding to external queries. For Jalapeño this is not possible for the reasons described earlier. Second, the reflection interface for the debugger is different from the internal reflection interface. In contrast, remote reflection requires no effort on the target system, and the same reflection interface is used internally or externally.
Footnote 1: Consider debugging thread scheduling code: dispatching the debugger thread itself would change the thread states.
2 Remote Reflection
Consider (1) a Java Virtual Machine (JVM) in a remote process that has been halted at an arbitrary point; (2) a program written based on the reflection interface of this remote JVM; and (3) another JVM in a local process executing this program. Remote reflection allows the program in the local JVM to execute a reflection method that operates directly on an object residing in the remote JVM. The key to remote reflection is a proxy object in the local JVM, called the remote object, which represents the real object in the remote JVM.
As illustrated in Figure 1, the programming model for remote reflection is simple yet effective. The user specifies that certain methods of the reflection interface will return remote objects from a different JVM. These methods in the local JVM are said to be mapped to the remote JVM since they serve as the link between the two JVMs. Once a remote object is obtained from a mapped method, all values or objects derived from it will also originate from the remote JVM. Aside from the list of mapped methods, a remote object is indistinguishable from a normal object in the local JVM from the program's perspective.
Consider the simple example in Figure 2. To compute the line number, the method Debugger.lineNumberOf() obtains a table of VM_Method objects, selects the desired element and invokes its virtual getLineNumberAt() method. This reflection method then consults the object's internal array to return a line number. Suppose that on a local JVM with remote reflection, the static method VM_Dictionary.getMethods() has been mapped to an array of VM_Method objects in the remote space. When we execute lineNumberOf(), the variable methodTable receives the initial remote object from VM_Dictionary.getMethods(). The variable candidate then gets another remote object from accessing the remote array, and finally the method getLineNumberAt() is invoked on the remote object.
The uniform treatment of local and remote objects provides the main advantage of remote reflection. Because a remote object is logically identical to a local object, a program uses the same reflection interface whether it executes in-process or out-of-process. As a result, the maintenance of both the reflection interface and the programs using it is greatly simplified. A second advantage is that no effort is required in the remote JVM, since remote reflection relies on the underlying operating system to access the JVM address space. Finally, mapping per method instead of per class allows flexibility in selecting the object to be mapped. A class may have some instances in the local process and other instances in the remote process without conflict. While a mapped method is not necessarily tied to a single object in the remote JVM, in practice it is more convenient to map access methods that return a specific object.
[Figure 1: diagram of the local JVM and the remote JVM. Labels: classA.getObj(), remote object, real object, classA.]
Fig. 1. Programming model for remote reflection: certain methods, e.g. classA.getObj(), are specially designated to return remote objects that are proxies for the real objects in the remote JVM. In this figure, the boxes in each JVM represent objects with fields. There are two real instances of classA: one in the local JVM and one in the remote JVM.
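The programming model of Section 2 (a list of mapped methods, each returning a proxy for an object in the remote JVM) might be sketched as follows. This is an illustration under stated assumptions rather than the Jalapeño implementation: the class names RemoteObject and MethodMap, the string keys, and the address 0x1000 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Proxy standing in for an object that lives in the remote JVM (illustrative only).
final class RemoteObject {
    final String typeName; // type of the real object
    final long address;    // address of the real object in the remote process

    RemoteObject(String typeName, long address) {
        this.typeName = typeName;
        this.address = address;
    }
}

// Hypothetical mapping list: "Class.method" -> the remote object that method returns.
final class MethodMap {
    private final Map<String, RemoteObject> mapped = new HashMap<>();

    void map(String className, String methodName, String returnType, long address) {
        mapped.put(className + "." + methodName, new RemoteObject(returnType, address));
    }

    // An empty result means "not mapped": the method is invoked locally as usual.
    Optional<RemoteObject> lookup(String className, String methodName) {
        return Optional.ofNullable(mapped.get(className + "." + methodName));
    }
}

final class MappingDemo {
    public static void main(String[] args) {
        MethodMap mapping = new MethodMap();
        // Assumption: the remote VM_Method[] sits at the made-up address 0x1000.
        mapping.map("VM_Dictionary", "getMethods", "VM_Method[]", 0x1000L);

        System.out.println(mapping.lookup("VM_Dictionary", "getMethods").isPresent()); // true
        System.out.println(mapping.lookup("Debugger", "lineNumberOf").isPresent());    // false
    }
}
```

Keying the map by method rather than by class mirrors the point made above: the same class can have local instances and remote instances, and only the designated access methods cross into the remote space.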
3 Implementation
In this section, we describe an implementation in the Jalapeño system. In Java, remote reflection is supported at the level of the virtual machine by either the interpreter or the runtime compiler. Our debugging environment involves three components: the Jalapeño system being debugged, the debugger, and a Java interpreter that has been extended to support remote reflection. The extension includes managing the remote object and extending the bytecodes to operate on the remote object. Remote reflection also requires operating system support for access across processes. This functionality is typically provided by the system debugging interface, which in the Jalapeño implementation is the Unix ptrace facility. Our implementation is simplified by the fact that the debugger only makes queries and does not modify the remote JVM (except by a user command); therefore, we do not have to address the issue of creating new objects in the remote space.
3.1 Remote Object
The remote object is simply a wrapper that holds sufficient information to find the real object in the remote process. For Jalapeño, this includes the type of the object and its real address. Remote objects originate from a mapped method or from another remote object. In the first case, the address is provided to the interpreter from the process of building the boot image [1]. In the latter case, the address is computed based on the field offset from the address of the remote object. For native methods, a complete implementation would involve extending the JNI implementation to handle remote objects. However, in our implementation it was sufficient to clone the remote object and remote 1-D arrays of primitives, because this satisfies the needs of the debugger.
3.2 Bytecode extensions
Since the initial remote object is obtained via a mapped method, the bytecodes invokestatic and invokevirtual, which invoke a method, are extended as follows. The target class and method are checked against the mapping list. Those to be mapped are intercepted so that the actual invocation is not made. Instead, if the return type is an object, a remote object is created containing the type and the address of the corresponding object in the remote JVM. If the return type is a primitive, the actual value is fetched from the remote JVM.
In addition, all bytecodes that operate on a reference need to be extended to handle remote objects appropriately; for Java, this includes 23 bytecodes. If the result of the bytecode is a primitive value, the interpreter computes the actual address, makes the system call to obtain the value from the remote address space, and pushes the value onto the local Java stack. If the result is an object, the interpreter computes the address of the field holding the reference, makes the system call to obtain the field value, and pushes onto the Java stack a new remote object with the appropriate type.
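The two cases described above for a reference-handling bytecode such as getfield (primitive result versus object result) can be sketched as follows. This is a simplified model, not the extended interpreter itself: FakeRemoteMemory replaces the ptrace system call with an in-memory map so the sketch runs in a single process, and the 32-bit word size, the field offsets and the names RemoteRef and GetFieldSketch are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the OS debugging interface: maps a remote address to a 32-bit word.
final class FakeRemoteMemory {
    private final Map<Long, Integer> words = new HashMap<>();

    void poke(long address, int value) {
        words.put(address, value);
    }

    int readWord(long address) {
        Integer value = words.get(address);
        if (value == null) {
            throw new IllegalArgumentException("unmapped remote address " + address);
        }
        return value;
    }
}

// Proxy for a remote object: its type and its address in the remote process.
final class RemoteRef {
    final String typeName;
    final long address;

    RemoteRef(String typeName, long address) {
        this.typeName = typeName;
        this.address = address;
    }
}

final class GetFieldSketch {
    // Primitive result: compute the field address and fetch the value itself.
    static int getIntField(FakeRemoteMemory mem, RemoteRef obj, int fieldOffset) {
        return mem.readWord(obj.address + fieldOffset);
    }

    // Object result: the word stored in the field is the address of another
    // remote object, so it is wrapped in a new proxy of the field's type.
    static RemoteRef getRefField(FakeRemoteMemory mem, RemoteRef obj,
                                 int fieldOffset, String fieldType) {
        long target = mem.readWord(obj.address + fieldOffset) & 0xFFFFFFFFL;
        return new RemoteRef(fieldType, target);
    }

    public static void main(String[] args) {
        FakeRemoteMemory mem = new FakeRemoteMemory();
        RemoteRef method = new RemoteRef("VM_Method", 0x2000L); // made-up address
        mem.poke(0x2000L + 8, 0x3000);                          // hypothetical lineTable field
        RemoteRef table = getRefField(mem, method, 8, "int[]");
        System.out.println(table.typeName + " at " + Long.toHexString(table.address));
    }
}
```

In both cases the value pushed onto the local Java stack never requires code to run in the remote process; only memory reads cross the process boundary.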
4 Example
In this section, we return to the example in Figure 2 to analyze the actions that occur when the call lineNumberOf(5,4) is executed. For reference, Figure 2 also shows the bytecodes for the methods, with dashed lines correlating the Java source lines with their bytecodes. In Figure 3, the box at the right represents the remote JVM, showing a number of objects that have been created in its space. Recall that the static method VM_Dictionary.getMethods() has been mapped to the array of VM_Method objects in the remote space. The states of the Java stack at successive points are shown in the top and bottom rows, labeled with highlighted numbers from 1 to 11. The state numbers are cross-referenced between Figure 2 and Figure 3. Also shown in Figure 3 are the remote objects (center) in the local JVM that serve as proxies for the corresponding real objects in the remote JVM. Due to limited space, we examine in detail only states 1-3 at the beginning and states 9-11 at the end; the remaining states exhibit similar behavior.
First, the interpreter recognizes VM_Dictionary.getMethods() as a mapped method and intercepts the bytecode invokestatic to create the initial remote object. The remote object contains the return type (an array of VM_Method) and the address of the real array in the remote process. In Figure 3, the Java stack in state (1) shows the local variable methodTable (labeled methods in the figure) on top of the stack, holding a reference to the newly created remote object. The following bytecodes astore_3, aload_3 and iload_1 prepare for accessing the remote array, resulting in Java stack state (2). When the interpreter executes the bytecode aaload to access an array element, it detects that the array reference is a remote object. Since it is an array of objects, the interpreter determines the element type, computes the address and pushes the new remote object onto the stack, resulting in Java stack state (3).
Java source

    class Debugger {
        public int lineNumberOf(int methodNumber, int offset) {
            VM_Method[] methodTable = VM_Dictionary.getMethods();
            VM_Method candidate = methodTable[methodNumber];
            int lineNumber = candidate.getLineNumberAt(offset);
            return lineNumber;
        }
    }

    class VM_Method {
        private int[] lineTable;
        public int getLineNumberAt(int offset) {
            if (offset > lineTable.length) return 0;
            return lineTable[offset];
        }
    }

Compiled bytecode (numbers in parentheses mark the stack states cross-referenced with Figure 3)

    Method int lineNumberOf(int, int)
      (1)   invokestatic #18
            astore_3
            aload_3
      (2)   iload_1
      (3)   aaload
            astore 4
            aload 4
            iload_2
      (4)   invokevirtual #24
            istore 5
            iload 5
            ireturn

    Method int getLineNumberAt(int)
            iload_1
      (5)   aload_0
      (6)   getfield #14
      (7)   arraylength
            if_icmple 11
            iconst_0
            ireturn
      (8)   aload_0
      (9)   getfield #14
      (10)  iload_1
      (11)  iaload
            ireturn
Fig. 2. Example: Java programs making reflective queries and the corresponding bytecodes. The dashed lines correlate the Java source lines with their bytecodes. The highlighted numbers refer to successive states of the Java stack during the program execution; they are cross-referenced with Figure 3.
The execution continues likewise with more remote objects created. To arrive at state (9), the bytecode getfield accesses the field lineTable and the interpreter creates another remote object on top of the stack. In state (10), the array index is pushed onto the stack and the interpreter executes iaload to access the array element. It detects that the array is remote and that the element type is an integer. The interpreter computes the address of the array element and makes the system call to read the value from the remote JVM space. In state (11), the value from the remote array is placed on the stack as a return value.
It is worth noting that remote objects are always temporary; they exist only on the Java stack because they contain real addresses in the remote process that are valid only until the remote JVM resumes execution.
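The final steps of the example (arraylength and iaload applied to a remote int array) amount to an address computation followed by a read from the remote address space. The sketch below shows that computation under an assumed array layout: one 32-bit length word at the array address followed by 32-bit elements. This layout, the address 0x3000 and the array contents are hypothetical and are not taken from Jalapeño; the LongToIntFunction parameter stands in for the system call that reads one word from the remote process.

```java
import java.util.function.LongToIntFunction;

final class RemoteArraySketch {
    // Assumed layout: [length word][element 0][element 1]..., 4 bytes each.
    static final int HEADER_BYTES = 4;
    static final int ELEMENT_BYTES = 4;

    // arraylength on a remote array: the result is a primitive, so read it directly.
    static int arrayLength(LongToIntFunction readWord, long arrayAddress) {
        return readWord.applyAsInt(arrayAddress);
    }

    // iaload on a remote int array: compute the element address, then read the value.
    static int iaload(LongToIntFunction readWord, long arrayAddress, int index) {
        int length = arrayLength(readWord, arrayAddress);
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        long elementAddress = arrayAddress + HEADER_BYTES + (long) index * ELEMENT_BYTES;
        return readWord.applyAsInt(elementAddress);
    }

    public static void main(String[] args) {
        // Fake remote memory: a 3-element lineTable {17, 9, 21} at address 0x3000.
        int[] image = {3, 17, 9, 21};
        long base = 0x3000L;
        LongToIntFunction readWord = address -> image[(int) ((address - base) / 4)];

        System.out.println(iaload(readWord, base, 1)); // prints 9
    }
}
```

Because the proxy stores a raw address, a result computed this way is meaningful only while the remote process stays halted, which is why remote objects are kept temporary.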
[Figure 3: diagram of the Java stack states 1-11 in the local process (top and bottom rows), the remote objects in the local JVM (center: VM_Method[] address, VM_Method address, lineTable address), and the Remote Java VM (right: methodTable, VM_Method, lineTable). Annotations: method intercepted by interpreter: generate first remote object; array access: get another remote object; invoke virtual method on a remote object; get a field of a remote object; array length: get real value; array access: get real int value for array element.]
Fig. 3. Example: the successive states of the Java stack (top and bottom rows) during the execution of the reflective queries, showing the remote objects being computed and resolved. The states are labeled with highlighted numbers to cross-reference with the bytecodes in Figure 2. The Remote Java VM box (right) shows three real objects existing in the remote space, while the boxes in the center are the remote objects serving as proxies for the real objects.
5 Status and future work
The interpreter with the remote reflection extension has been completed and, together with the Jalapeño debugger, has been indispensable in the development of the Jalapeño system. Future extensions include several possibilities.
For a production JVM system, debugging requires additional care since the system cannot be taken down and restarted for each debugging session. In this situation, remote reflection must be able to connect to the running system without any significant side effect. A system facility other than ptrace will be necessary so that the running system retains most of its own process control.
Remote reflection can also be useful in the Java Platform Debugger Architecture (JPDA) of Java 2. We would only need to base the Java Debug Interface (JDI) implementation directly on the internal reflection interface of the target JVM. This JDI implementation and a JDI-based debugger would run on another JVM that has been extended for remote reflection. This configuration would then bring to JPDA the same capabilities as in debugging Jalapeño: low-level debugging and the ability to halt the JVM to avoid perturbing its state.
For distributed Java applications running on several remote JVMs, remote reflection can provide the convenience of a shared memory programming model, allowing them to readily access remote objects. However, since the applications would not be halted, synchronization as well as other issues will need to be studied carefully.
6 Conclusions
Reflection is an important addition to object-oriented systems. In this paper, we describe remote reflection, a transparent mapping technique that preserves the benefits of reflection in situations where it is necessary to decouple the code and the data involved in reflection because they reside in different address spaces. While the concept of reflection programming has been used across processes, our technique offers several advantages not present in previous efforts. First, it is not necessary to define a different reflection interface to use across processes; the same interface is used whether the program is in-process or out-of-process. Second, no effort is required in the target process; therefore it does not need to be functional and its state is not perturbed.
We describe a simple programming model using remote reflection and an implementation in the Jalapeño system, a Java Virtual Machine developed at the IBM T. J. Watson Research Center. In this context, remote reflection is used to support an out-of-process debugger for the Jalapeño system code. Remote reflection allows the debugger to exploit the benefits of both in-process and out-of-process debugging, resulting in a very effective tool for developing an object-oriented system.
References
1. Bowen Alpern, Dick Attanasio, John J. Barton, Michael G. Burke, Perry Cheng, Jong-Deok Choi, Anthony Cocchi, Stephen Fink, David Grove, Michael Hind, Susan Flynn Hummel, Derek Lieber, Vassily Litvinov, Ton Ngo, Mark Mergen, Vivek Sarkar, Mauricio J. Serrano, Janice Shepherd, Stephen Smith, V. C. Sreedhar, Harini Srinivasan, and John Whaley. The Jalapeño Virtual Machine. IBM Systems Journal, 2000, Vol 39, No 1, pp 211-238.
2. Bowen Alpern, Dick Attanasio, John J. Barton, Anthony Cocchi, Susan Flynn Hummel, Derek Lieber, Ton Ngo, Mark Mergen, Janice Shepherd, and Stephen Smith. Implementing Jalapeño in Java. ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA), November 1999, pp 314-324.
3. James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. The Java Series. Addison-Wesley, 1996.
4. Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, and Alan Kay. Back to the Future, The Story of Squeak. ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA), October 1997, pp 318-326.
5. Gregor Kiczales, Jim des Rivieres, and Daniel G. Bobrow. The Art of the Metaobject Protocol. The MIT Press, 1992.
6. Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. The Java Series. Addison-Wesley, 1996.
7. Sun Microsystems. Java 2 SDK Standard Edition.
8. Sun Microsystems. Java Development Kit 1.1.
9. Andreas Paepcke. Object-Oriented Programming: The CLOS Perspective. MIT Press, 1993.