Byte code Transformations for Distributed Threads in Java

Eddy Truyen, Danny Weyns, Pierre Verbaeten
Distrinet, KULeuven, Celestijnenlaan 200A, 3001 Leuven
[email protected], [email protected], [email protected]
ABSTRACT
In this paper, we study the shift of thread semantics that arises when adapting a centralized Java program for execution in a distributed environment. More specifically, we focus on distributed applications that are developed by means of a distributed control flow programming model like Java RMI or OMG CORBA. The shift in thread semantics causes unexpected execution results or run-time errors if these differences are not taken into account by the programmer. We overcome this semantic gap between local and distributed programming by extending Java programming with the notion of distributed thread identity. Propagation of a globally unique, distributed thread identity provides a uniform mechanism by which all the program's constituent objects involved in a distributed control flow can uniquely refer to that distributed thread as one and the same computational entity. We have implemented distributed thread identity by means of byte code transformation of application programs.
Keywords
Threads, Distributed systems, Java
1. INTRODUCTION
With the growing need for increased scalability and 7x24 on-line systems, the mass production of cheap and powerful computing devices, and improved networking facilities, companies are evolving their intra-domain systems and applications (e.g. internal business workflow, departmental appointment systems, high performance processing systems) from monolithic centralized servers to modular, decentralized systems executing in a distributed environment. A distributed environment consists of multiple physical computing nodes that are interconnected by means of a network. Different distributed programming models exist, such as asynchronous messaging, publish/subscribe systems, and blackboard/tuple spaces (e.g. Linda, JavaSpaces). However, the
mainstream of intra-domain distributed systems is developed using an object-based control flow programming model such as Java RMI or OMG CORBA (method invocation between objects). This model is popular because it inherits some of the benefits of object-oriented programming languages such as Java and C++ and is similar to the 'good old' RPC inter-process communication style. Another advantage is that control flow programming models provide a good level of location transparency. However, writing distributed applications or adapting a local program (a program that executes completely within the boundaries of one logical address space, e.g. a Java Virtual Machine (JVM)) for execution in a distributed environment remains difficult, because of the inherent shift of paradigms and semantics [4]. In Java RMI, well-known differences with 'normal' Java programming are the separation between class and interface of a remote object (shift of paradigms), the pass-by-copy semantics of non-remote arguments to a remote method invocation (shift of semantics) and the inherently more complicated failure modes of remote method invocation (shift of semantics). The paradigm shift makes it necessary to re-engineer large parts of existing centralized programs. The shift of semantics potentially leads to unexpected execution results or run-time errors if these differences are not taken into account by the programmer. Some of these problems are well-studied, and practical solutions have been worked out that make the implementation of distribution-related aspects more transparent to the programmer [4][1].
In this paper, we study a very particular shift of semantics, namely the shift of thread semantics that arises when adapting a local Java program for execution in a distributed environment. A thread is the unit of computation: a sequential flow of control within a single address space (i.e. JVM). More specifically, we focus on distributed applications that are developed by means of an object-based control flow programming model like Java RMI or OMG CORBA. It is important to understand that the computational entities in this kind of application execute as flows of control that may cross physical node boundaries, contrary to how conventional Java threads are confined to a single address space. In the remainder of this paper we refer to such a distributed computational entity as a distributed thread of control, in short distributed thread [2]. A distributed thread is a logical sequential flow of control that may span several address spaces (i.e. Java Virtual Machines (JVMs)). As shown in Figure 1, a distributed thread τ is physically implemented as a concatenation of local (per JVM) threads [t1,...,t4] sequentially performing remote method invocations when they transit JVM boundaries.
[Figure 1. A distributed thread. The distributed thread τ spans the objects foo (//media:7000), bar (//multi:8000) and toe (//tessy:9000); it is implemented by the local threads t1-t4, concatenated by RMI calls through stubs.]

The semantics of distributed threads differs from that of local JVM threads. This shift of semantics causes unexpected execution results or run-time errors if these differences are not taken into account by the programmer. In this paper we extend Java programs with the notion of distributed thread identity in order to reduce the shift of threading semantics between local and distributed programming. Propagation of a globally unique distributed thread identity provides a uniform mechanism by which all the program's constituent objects involved in a distributed thread can uniquely refer to that distributed thread as one and the same computational entity.
We have implemented this Java extension by means of byte code transformation of application programs. As such, the mechanism of propagation can be added to existing applications and is reusable on top of different middleware platforms. This paper is structured as follows: in section 2 we discuss some well-known problems with Java threads in a distributed execution environment. In section 3 we give a definition of distributed thread and distributed thread identity, and explain why this notion provides the key mechanism to solve the problems of section 2. In section 4 we describe the implementation of distributed thread identity in Java. Section 5 illustrates the benefits of distributed thread identity with two examples. Section 6 reviews related work. Finally, we conclude in section 7.

2. UNEXPECTED DEADLOCK
By means of an example, we point out the specific shift in thread semantics that arises when adapting a local Java program for execution in a distributed environment using a distributed control flow programming model such as Java RMI or OMG CORBA. It is this uncontrolled shift in thread semantics that causes subtle problems with distributed execution.

Java supports multithreading at the language level. This support is centered on synchronization: the coordination of activities and data access among multiple threads. The Java Virtual Machine uses the monitor mechanism to support synchronization. A monitor enforces one-thread-at-a-time execution of a block of "critical" code, called the monitor region. In Java, there are two kinds of monitor regions: synchronized code sections and synchronized methods. Each monitor region in a Java program is associated with an object reference. When a thread arrives at the first instruction in a monitor region, the thread must obtain a lock on the referenced object; it is not permitted to execute the code until it obtains the lock. Once the thread has obtained the lock, it enters the monitor region. When the thread leaves the monitor region, the lock is released, such that another thread can obtain it. A single thread is permitted to lock the same object multiple times when re-entering the same monitor region recursively; as a consequence, only the owning thread is permitted to lock an object again. The JVM maintains a count of the number of times the object has been locked. Each time the thread acquires a lock on the same object the count is incremented; each time the thread releases the lock the count is decremented. When the count reaches zero, the lock is released and made available for other threads.

Figure 2 depicts a threading scenario in which a thread re-enters the same monitor region. Suppose an object foo has a synchronized code section SCS in a method f(). A thread t1 enters this SCS. Inside the SCS, a call to method b() on bar is made. In the body of b(), method f() is invoked again on foo. Because thread t1 already owns the lock on the SCS, it gets access a second time. Notice that objects foo and bar are constructed and executed within a local program P, that is, both objects execute on the same JVM.

[Figure 2. Thread t1 re-enters a synchronized code section (objects foo and bar execute on the same JVM, //media:8000).]

Suppose now that we adapt program P for execution in a distributed environment, such that the objects foo and bar execute on different JVMs. We use Java RMI, a popular distributed control flow programming model, to achieve this.

[Figure 3. A distributed deadlock (bar now executes on //multi:7000, foo on //media:8000).]

Figure 3 depicts the new threading situation. Now the execution trace is implemented by a concatenation of three local (per JVM) threads, sequentially performing remote method invocations when they transit JVMs. Again thread t1 enters the synchronized code section SCS in f() of foo. The call to b() on bar is now a remote RMI call that starts up a new thread t2 at node multi:7000. When bar calls f(), the corresponding RMI starts up another thread t3 at node media:8000. Because the identity of t3 differs from the identity of the application thread t1, t3 is not allowed to enter the SCS in f(). So t3 has to wait for t1 while t1 waits for t3; as a consequence, this execution ends in a deadlock. The problem is clear: access to the synchronized code section is regulated by comparing thread identities, and this works well for the local execution but not for the distributed execution, because in the latter case the Java RMI implementation uses another thread, t3, for re-entering the SCS. From a control flow point of view, however, the threads t1 and t3 logically belong to the same distributed thread. The JVM monitor manager is not aware of this logical connection between threads t1 and t3.
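To make the scenario concrete, the following sketch shows what a naive RMI adaptation of program P might look like. The remote interfaces and class layout are our own illustration (the paper does not list this code), and the deployment steps (stub generation, registry setup) are omitted; running these objects in two JVMs reproduces the deadlock described above.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interfaces for the objects of Figure 3.
interface FooRemote extends Remote { void f() throws RemoteException; }
interface BarRemote extends Remote { void b(FooRemote foo) throws RemoteException; }

class Foo extends UnicastRemoteObject implements FooRemote {
  private boolean firstTime = true;
  private final BarRemote bar;                  // lives on //multi:7000

  Foo(BarRemote bar) throws RemoteException { this.bar = bar; }

  public synchronized void f() throws RemoteException {   // the monitor region
    if (firstTime) {
      firstTime = false;
      // Remote call: b() calls back into f(). The call-back arrives in a
      // fresh RMI worker thread (t3) that does not own the lock held by
      // the application thread (t1) -> distributed deadlock.
      bar.b(this);   // 'this' is replaced by its stub on the wire
    } else {
      System.out.println("second time!");
    }
  }
}

class Bar extends UnicastRemoteObject implements BarRemote {
  Bar() throws RemoteException {}
  public void b(FooRemote foo) throws RemoteException {
    foo.f();   // remotely re-enters the synchronized method of Foo
  }
}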
3. DISTRIBUTED THREADS AND DISTRIBUTED THREAD IDENTITY
Existing work [2] in the domain of (real-time) distributed operating systems has already identified the notion of distributed thread as a powerful basis for solving distributed resource management problems. We borrow the definition of distributed threads from this work: "A distributed thread is the locus of control point movement among objects via operation invocation. It is a distributed computation, which transparently and reliably spans physical nodes, contrary to how conventional threads are confined to a single address space." As such, a distributed thread is a sequential flow of control that can cross physical node boundaries, whereas a conventional thread is a sequential flow of control that remains within the boundaries of its local address space. With respect to the subject of this paper, a distributed thread can be implemented by the concatenation of local (per node) threads sequentially performing remote method invocations when they transit nodes. The distributed control flow programming models Java RMI and OMG CORBA naturally support this: by remote method invocations and returns, a distributed thread can extend and retract its locus of control point movement among a program's constituent methods in object instances that may be dispersed across a multiplicity of physical computing nodes.

"A distributed thread carries attributes related to the nature, state and service requirements of the computation it represents. These attributes may be inspected by the kernel and its clients." Distributed thread attributes are thus propagated with the locus of execution point movement and can be inspected by a kernel object and the application objects executing in the environment provided by that kernel. A kernel object here refers to the whole of protocols and run-time data structures that support the execution of applications. Examples of kernel objects in the context of this paper are the JVM and middleware platforms with a distributed control flow programming model (e.g. Java RMI). Kernel objects can also be external control instances that provide non-functional services to applications such as load balancing or fault tolerance.

With respect to the subject of this paper, we claim that distributed thread identity, implemented as an attribute, provides the basis for an elegant solution to the problem of section 2. A distributed thread identity provides a uniform mechanism by which all the program's constituent objects involved in a distributed thread can uniquely refer to that distributed thread as one and the same computational entity. Propagation and inspection of distributed thread identity enables kernel objects as well as application objects to access the currently executing distributed thread. "A distributed thread's attributes may be modified and accumulated in a nested fashion as it executes operations within objects." A programming model for distributed threads sets out the rules for a valid accumulation (creation) and modification of attributes during the distributed thread's life. With respect to the subject of this paper, two simple rules should be enforced for distributed thread identities:
1. A distributed thread identity must be accumulated exactly once, namely when the distributed thread is scheduled for the first time.
2. A distributed thread identity must not be modified after its accumulation, because it must provide physically dispersed objects with a unique and immutable reference to a distributed thread.

With support for distributed thread identification, it is easy to see how the problem of section 2 can be solved: the distributed execution in Figure 3 will not run into a deadlock if access to the synchronized code section is regulated by comparing distributed thread identities.
4. DISTRIBUTED THREAD IDENTITY FOR JAVA
We implemented a generic Java extension that realizes a simple programming model for distributed threads, in particular distributed thread identity. Before extending distributed Java programming with this notion, however, the question arises how to do this best. We propose three requirements that must be fulfilled by the chosen implementation strategy:
• It must be possible to integrate the functionality transparently into existing Java applications.
• The implementation of distributed threads must be portable across different heterogeneous platforms.
• The mechanism of distributed threads should be dynamically integrable with existing middleware platforms such as Java RMI, OMG CORBA, and Microsoft DCOM.
4.1 Byte code transformation
Byte code transformation [3] is an excellent candidate that satisfies all three requirements. First, it extends Java language features in a user-transparent way; for example, a custom class loader can automatically perform the byte code transformation at load-time. Second, the functionality of distributed threads can be added without modifying the JVM, so it is portable to any system on which a standard JVM is installed. Third, dynamic integration with a distributed control flow programming mechanism is possible if done carefully; in the remainder of this section we occasionally point out in more detail how dynamic integration can be achieved. With BCEL [3], M. Dahm created an excellent API for developers to build their own custom-made byte code transformers. We used BCEL to develop a byte code transformer that extends Java programs with the notion of distributed threads, more specifically distributed thread identity. In the rest of this paper we refer to this transformer as the DTI transformer.
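As a minimal sketch of such load-time transformation (our own illustration; the paper does not show this code, and transform() is a placeholder for the real DTI rewriting), a transforming class loader could look as follows:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// A class loader that rewrites byte code at load-time. transform() is a
// placeholder for the real transformer (e.g. implemented with BCEL).
public class TransformingClassLoader extends ClassLoader {

  protected byte[] transform(byte[] classFile) {
    return classFile;   // plug in the DTI transformer here
  }

  protected Class<?> findClass(String name) throws ClassNotFoundException {
    String path = name.replace('.', '/') + ".class";
    InputStream in = getResourceAsStream(path);
    if (in == null) throw new ClassNotFoundException(name);
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
      in.close();
      byte[] bytes = transform(out.toByteArray());
      return defineClass(name, bytes, 0, bytes.length);
    } catch (IOException e) {
      throw new ClassNotFoundException(name, e);
    }
  }
}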
4.1.1 Propagation of distributed thread identity
To achieve propagation of distributed thread identities, the DTI transformer extends the signature of each method with an additional argument of class D_Thread_ID. D_Thread_ID is a serializable class that implements an immutable, globally unique identifier. For example, the signature of a method f() of a class C is rewritten as:

//original signature
int f(int i, Bar bar)
//transformed signature
int f(int i, Bar bar, D_Thread_ID id)

All the interfaces implemented by class C must of course also be instrumented in the same way. For pre-defined JDK interfaces, a new interface class is generated. For example, the java.lang.Runnable interface is transformed as follows:

//original interface
interface java.lang.Runnable {
  public void run();
}

//new interface
interface D_Runnable extends java.lang.Runnable {
  public void run(D_Thread_ID id);
}
The signature of every method invoked in the body of f() must also be extended with the same D_Thread_ID argument type. Finally, a byte code instruction is inserted before every method invoke-instruction that pushes the D_Thread_ID local variable as an additional actual parameter on the operand stack. Thus:

//original method code
f(int i, Bar bar) {
  …
  bar.b(i);
  …
}

//transformed method code
f(int i, Bar bar, D_Thread_ID id) {
  …
  bar.b(i, id);
  …
}

When f() is called, the D_Thread_ID is passed as an actual parameter to f(). Inside the body of f(), b() is invoked on bar; the body of f() in turn passes the D_Thread_ID it received as an extra argument to b(). This way the distributed thread identity is automatically propagated with the control flow from method to method. Dynamic integration with a distributed object-based middleware such as Java RMI is simply achieved if the stub classes for the different remote interfaces are generated after the DTI transformation has been performed.

4.1.2 Accumulation and modification
The Java thread programming model offers the application programmer the possibility to start up a new thread from within the run() method of a class that implements the java.lang.Runnable interface. As stated above, the DTI transformer instruments such classes so that they implement the D_Runnable interface instead. However, in order to stay compatible with the normal thread programming model, the D_Runnable interface extends java.lang.Runnable. The example class Bar illustrates this. Before the transformation this class is defined as:

class Bar implements java.lang.Runnable {
  ..
  void run() { .. }
}

After transformation the class is rewritten as:

class Bar implements D_Runnable {
  ..
  void run(D_Thread_ID id) { .. }
  void run() { .. }
}
As stated in section 3, the identity of a distributed thread must be accumulated at the moment the distributed thread is started. This behavior is encapsulated in the D_Thread class, which serves as an abstraction for creating a new distributed thread. A new distributed thread can simply be started with:

Bar b = new Bar();
D_Thread dt = new D_Thread(b);
dt.start();

The D_Thread class itself is defined as:

class D_Thread implements java.lang.Runnable {
  // body replaced by the DTI transformer (see section 4.1.3)
  public static D_Thread_ID getCurrentThread() { return null; }

  public D_Thread(D_Runnable o) { object = o; }
  public void run() { object.run(id); }
  public void start() {
    id = new D_Thread_ID();
    new Thread(this).start();
  }

  private D_Runnable object;
  private D_Thread_ID id;
}
As stated in section 3, distributed thread identities may never be modified; accordingly, D_Thread does not provide any operations for doing so. Furthermore, D_Thread_ID objects are stored either as a private instance member of class D_Thread or as a local variable on a JVM stack. Nonetheless it remains possible for a malicious party to modify distributed thread identities. For example, it is possible to corrupt D_Thread_IDs when they are sent over the physical network, or a byte code transformer could be written that inserts malicious code for modifying the value of the local variable pointing to a D_Thread_ID object. Clearly, additional security measures are necessary; this is subject to further research.
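The paper does not spell out D_Thread_ID itself. A minimal sketch of a serializable, immutable, globally unique identifier (our own construction, combining java.rmi.server.UID with the host address) could be:

import java.io.Serializable;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.rmi.server.UID;

// Immutable, globally unique identifier: a JVM-unique UID combined
// with the host address. No mutators, per rule 2 of section 3.
public final class D_Thread_ID implements Serializable {
  private final UID uid = new UID();   // unique within this JVM
  private final String host;          // disambiguates across hosts

  public D_Thread_ID() {
    String h;
    try {
      h = InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
      h = "127.0.0.1";
    }
    host = h;
  }

  public boolean equals(Object o) {
    if (!(o instanceof D_Thread_ID)) return false;
    D_Thread_ID other = (D_Thread_ID) o;
    return uid.equals(other.uid) && host.equals(other.host);
  }

  public int hashCode() { return uid.hashCode() ^ host.hashCode(); }
  public String toString() { return host + ":" + uid; }
}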
4.1.3 Inspection of distributed thread identity
Since distributed thread identities are propagated with each method invocation as an additional, last argument, it is possible to compute for every method definition which local variable on the JVM stack points to the corresponding D_Thread_ID object. Based on this information, specific byte codes can inspect the value of the local D_Thread_ID variable at any point in the method code. As stated in section 3, application objects as well as kernel objects must be able to inspect the distributed thread identity. Different approaches for implementing inspection must be followed for the two kinds of objects.

D_Thread_ID inspection by an application object. The application's constituent methods are the locus of control point movement of a distributed thread. Therefore, every application object is able to inspect the identity of the distributed thread that is currently executing one of its methods. To support this in the distributed thread programming model, the D_Thread class defines a static operation getCurrentThread(); the implementation of this method is encapsulated in the DTI transformer, which transparently replaces all invocations of this operation with a code block that returns the value of the local D_Thread_ID variable.

D_Thread_ID inspection by a kernel object can be supported by a kernel-defined (static method, transformer) pair. First, a kernel object's class implements a kernel-defined interface for accepting notification of the event that a distributed thread passes a specific marked point in the program. This event-listener-style interface defines one or more static 'bottleneck' methods [9] for passing the identity of that distributed thread to the kernel object. Second, a kernel-defined byte code transformer inserts at the marked points in the program a code block that invokes this static method with the current D_Thread_ID. As such, the kernel is notified when a distributed thread's locus of movement passes that specific point. The examples in section 5 will show that the strong combination of a kernel-defined bottleneck interface plus a conforming transformer makes it easy to replace JVM mechanisms (e.g. monitors) and augment existing middleware implementations (e.g. mobile object platforms) with new kernel-defined functionality acting upon whole distributed threads as individual computational entities.

4.2 Implementation of the DTI transformer
At the byte code level each method has an indexed list of local variables. For each non-static method the local variable with index 0 is the this-reference; thereafter follow the arguments as defined in the signature. For static methods the first argument of the argument list gets index 0. Furthermore, the list of local variables of non-abstract methods is extended with the 'classic' local variables as defined in the body of the method. Thus, extending the signature of a method with the D_Thread_ID argument adds an entry to the list of local variables of that method. This must be done in a fully consistent way, both for all references to local variables in the code of the method and for the Constant Pool references of the class file.

Rewriting the signatures of the class's methods alone is insufficient: to propagate the distributed thread identity with the control flow, we also had to extend the signatures of all methods invoked in the body of each method. Replacing the Runnable interface with the D_Runnable interface is quite easy: we only have to screen the implemented interfaces of each application class and change the corresponding entry in the Constant Pool. Note that the byte code transformer rewrites the byte code in a selective way: the current implementation does not rewrite calls on methods of the JDK classes. This choice is made for practical reasons and is subject to further research, because certain application use cases or requirements of kernel objects may make it necessary to rewrite the JDK libraries.
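As a rough sketch of this rewriting step, using today's Apache BCEL API (a descendant of the BCEL version used in the paper; the class and method choices here are our assumptions, and the matching rewrite of each call's Constant Pool method reference is omitted):

import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ALOAD;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.InstructionHandle;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.InvokeInstruction;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.ObjectType;
import org.apache.bcel.generic.Type;

public class DTISketch {
  static final ObjectType DTID = new ObjectType("D_Thread_ID");

  public static JavaClass extend(JavaClass clazz) {
    ClassGen cg = new ClassGen(clazz);
    ConstantPoolGen cp = cg.getConstantPool();
    for (Method m : cg.getMethods()) {
      if (m.isAbstract() || m.isNative()) continue;
      MethodGen mg = new MethodGen(m, cg.getClassName(), cp);

      // 1. Append D_Thread_ID to the method's argument list.
      Type[] args = mg.getArgumentTypes();
      Type[] ext = new Type[args.length + 1];
      System.arraycopy(args, 0, ext, 0, args.length);
      ext[args.length] = DTID;
      mg.setArgumentTypes(ext);

      // Local-variable index of the new argument: slot 0 is 'this'
      // for instance methods, then the original arguments.
      int idx = m.isStatic() ? 0 : 1;
      for (Type t : args) idx += t.getSize();
      // NB: the 'classic' locals that previously occupied this slot and
      // above must be renumbered consistently, as section 4.2 explains
      // (omitted in this sketch).

      // 2. Push the D_Thread_ID local variable before every invoke
      // instruction (whose own signature is extended likewise).
      InstructionList il = mg.getInstructionList();
      for (InstructionHandle ih = il.getStart(); ih != null; ih = ih.getNext()) {
        if (ih.getInstruction() instanceof InvokeInstruction) {
          il.insert(ih, new ALOAD(idx));
        }
      }
      mg.setMaxStack();
      mg.setMaxLocals();
      cg.replaceMethod(m, mg.getMethod());
    }
    return cg.getJavaClass();
  }
}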
5. DISTRIBUTED THREAD IDENTITY AT WORK
In this section we illustrate the benefits of distributed threads and distributed thread identity by showing how the problem described in section 2 and other problems can be solved.
5.1 Unexpected deadlock with monitor
First we look at how the JVM monitor mechanism works in a local execution environment. Then we look at execution in a distributed environment and see how it runs into trouble. Finally, we show how distributed thread identity offers a solution to the problem.
5.1.1 Local execution
We start from an object configuration as illustrated in Figure 2 and follow the code sample of Figure 4.

class Foo {
  ...
  void f() {
    Bar bar = new Bar();
    synchronized(this) {
      if(firstTime) {
        System.out.println("first time!");
        firstTime = false;
        bar.b(this);
      }
      else
        System.out.println("second time!");
    }
  }
}

class Bar {
  ...
  void b(Foo foo) {
    foo.f();
  }
}

Figure 4. A block of synchronized code.
Assume a thread t1 invokes f() on object foo. First it creates bar, a new Bar object. Next it meets the synchronized code section (SCS), so t1 must obtain the lock on object foo before it can enter the SCS. Because t1 calls f() for the first time, the variable firstTime is still set. So the control flow takes the if-branch, prints some text to the output, clears firstTime and invokes method b() on bar with the this reference as a parameter. In b() another call to f() is made on the same object foo. Again t1 constructs a Bar object and meets the SCS. Because thread t1 already owns the lock on the object foo, it is permitted to re-enter the SCS. This time firstTime is false, so t1 executes the else-branch and another text is printed to the standard output. Next t1 leaves the SCS twice and thus releases the lock on the object foo, making it available for other threads.

5.1.2 Distributed Execution
Suppose we adapt the above program for execution in a distributed environment such that objects foo and bar reside on different hosts, as illustrated in Figure 3. Let us first have a look at how access to the synchronized code section is implemented at the byte code level. Figure 5 shows the relevant fragments of the f() method in Foo; the code specific to the synchronized code section is the commented part of the listing. In lines 4-5 the reference to the object to lock is placed into local variable 2 (here, the 'this' reference at local variable 0 is the object to lock); this local variable is used by both the monitorenter and monitorexit instructions. In line 6 the object to lock is pushed on the operand stack as argument for the monitorenter instruction. At line 7, access to the synchronized code section is regulated by the monitorenter instruction, after which the synchronized code section follows. In line 43 the locked reference is pushed again on the operand stack for releasing the lock. The code from line 45 until 50 releases the lock in case an exception is thrown.

Method void f()
 0 invokestatic #21
 3 astore_1
 4 aload_0
 5 astore_2
 6 aload_2      //push object to lock
 7 monitorenter
 8 aload_0
 9 getfield #11
...
40 invokevirtual #13
43 aload_2      //push locked object
44 monitorexit
45 goto 51
48 aload_2      //push locked object
49 monitorexit
50 athrow
51 return
Exception table:
from  to  target  type
8     43  48      any

Figure 5. Original byte code for a synchronized code block.

The analysis in section 2 has shown that a deadlock will occur because the JVM monitor does not allow the distributed thread to re-enter the critical section: when trying to obtain the lock on object foo a second time, the distributed thread of control implicitly executes in a new JVM thread that was spawned by the remote method invocation.
To solve this problem we develop a new kernel object, D_Monitor, that regulates access to the SCS based on distributed thread identity instead of JVM thread identity. A monitor-defined byte code transformer (see Figure 6), associated with the D_Monitor, inserts at the appropriate points in the program code blocks for inspecting the D_Thread_ID of the current distributed thread (lines 5, 45, 53). Furthermore, the monitorenter and monitorexit JVM instructions are replaced by invocations of two static bottleneck methods with the same names defined on the D_Monitor kernel (lines 6, 46, 54). These methods take the original reference to the locked (or to-be-locked) object and the distributed thread identity as arguments. In this way, the D_Monitor can map each locked object onto the D_Thread_ID that currently owns the lock on that object. Whenever a thread invokes monitorenter(), D_Monitor checks the map to see if the lock is free. If so, the thread gets the lock on that object and can enter the SCS. If the lock is already taken, the D_Thread_ID of the requesting thread is compared with the D_Thread_ID owning the lock. If the distributed thread identities are not equal, the requesting thread has to wait. In the other case, when the distributed thread identities do match, the requesting thread is allowed to re-enter the synchronized code section. A counter associated with the lock is incremented each time the distributed thread re-enters the synchronized code section. Each time monitorexit() is performed the counter on the lock is decremented. If this counter reaches zero, the lock on the object is released and waiting threads are notified so that they can subsequently compete for the lock.
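The following is a minimal sketch of such a D_Monitor kernel object (our own code; the paper specifies only the behavior of the two bottleneck methods). An IdentityHashMap is used because locks belong to object identities, not to equals()-equality classes:

import java.util.IdentityHashMap;
import java.util.Map;

// Re-entrant locking keyed on distributed thread identity rather than
// on JVM thread identity.
public final class D_Monitor {
  private static final class Lock {
    D_Thread_ID owner;
    int count;
  }
  private static final Map<Object, Lock> locks = new IdentityHashMap<Object, Lock>();

  // Bottleneck method replacing the monitorenter JVM instruction.
  public static void monitorenter(Object o, D_Thread_ID id) {
    synchronized (locks) {
      Lock l = locks.get(o);
      if (l == null) { l = new Lock(); locks.put(o, l); }
      // Wait while another distributed thread owns the lock.
      while (l.owner != null && !l.owner.equals(id)) {
        try { locks.wait(); } catch (InterruptedException ignored) { }
      }
      l.owner = id;   // lock was free, or re-entered by the same distributed thread
      l.count++;
    }
  }

  // Bottleneck method replacing the monitorexit JVM instruction.
  public static void monitorexit(Object o, D_Thread_ID id) {
    synchronized (locks) {
      Lock l = locks.get(o);
      if (l == null || l.owner == null || !l.owner.equals(id)) {
        throw new IllegalMonitorStateException();
      }
      if (--l.count == 0) {
        l.owner = null;
        locks.notifyAll();   // let waiting threads compete for the lock
      }
    }
  }
}

In a full implementation the per-JVM D_Monitor instances would additionally have to coordinate for wait()/notify(), as section 5.2 discusses.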
Method void f(D_THREAD_ID)
 0 invokestatic #21
 3 astore_1
 4 aload_0      //push object to lock
 5 aload_1      //push current D_THREAD_ID
 6 invokestatic <D_Monitor.monitorenter>
 9 aload_0
10 getfield #11
...
41 invokevirtual #13
44 aload_0      //push locked object
45 aload_1      //push current D_THREAD_ID
46 invokestatic <D_Monitor.monitorexit>
49 goto 58
52 aload_0      //push locked object
53 aload_1      //push current D_THREAD_ID
54 invokestatic <D_Monitor.monitorexit>
57 athrow
58 return
Exception table:
from  to  target  type
9     44  52      any

Figure 6. Rewritten byte code for the synchronized code block of the example in Figure 5.

Once the class files of our example in Figure 4 have been passed successively through the DTI and D_Monitor-defined byte code transformers, the execution in the distributed environment no longer ends in a deadlock. Distributed thread identity obviously provides here the key mechanism that allows us to preserve the locking semantics of local Java programs.
5.2 Implementing a distributed monitor
Distributed thread identity also paves the way for implementing a kernel object that realizes a distributed monitor fully conforming to the monitor semantics of local Java programs. For example, the monitor semantics for local Java programs imposes that if a thread performs a wait() operation, all the locks owned by that thread must be released. Again, distributed thread identity provides the key mechanism that allows us to preserve these semantics for distributed threads as well. In Figure 7, for example, a distributed thread τ starts at JVM1, takes a lock there, then moves on to JVM2 and performs a wait() operation there. At that moment, the monitor semantics requires that all locks be released, thus also the one at JVM1.

[Figure 7. Implementing a distributed monitor: a distributed thread τ takes a lock at JVM1, transits via RMI to JVM2 and performs wait() there; the local D_Monitor kernel objects of the distributed application notify a DistributedCoordinator through a bottleneck interface with operations waitPerformedBy(τ) and releaseAllLocks(τ).]

In order to realize this, the different local (per JVM) D_Monitor kernel objects must synchronize with each other. In Figure 7, for example, all D_Monitor objects must be notified when the distributed thread τ performs a wait() operation somewhere. In the case of such an event, the distributed thread identity is again the key information the D_Monitor objects need in order to know which locks must be released. When the distributed thread returns to JVM1, it must first re-obtain the locks that were released when performing the wait() operation. To achieve the latter, a new bottleneck method is defined on D_Monitor. The D_Monitor-defined byte code transformer must insert byte codes (for invoking this new bottleneck method) at the necessary points in the application's program, for example on return of the remote method invocation.
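A hedged sketch of the coordination surface suggested by Figure 7 (the paper shows only the operation labels waitPerformedBy and releaseAllLocks; the RMI framing and signatures are our assumptions):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Central coordinator of Figure 7: notified when a distributed thread
// performs wait(), it tells every local D_Monitor to release the locks
// owned by that distributed thread.
interface DistributedCoordinator extends Remote {
  void waitPerformedBy(D_Thread_ID id) throws RemoteException;
}

// Callback interface of each local (per JVM) D_Monitor.
interface LocalDMonitor extends Remote {
  void releaseAllLocks(D_Thread_ID id) throws RemoteException;
}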
5.3 Distributed thread serialization
In this section we describe how distributed thread identity solved a problem that we encountered in ongoing work on a new project. We first briefly introduce the goal and main approach of this project before discussing the problem and its solution with distributed thread identities.

5.3.1 Introduction to the project
The goal of this project is the development of a byte code transformer and management architecture that supports run-time repartitioning of intra-domain, Java RMI-based distributed applications. Run-time repartitioning aims to improve the global load balance or the network communication overhead of a running application by dynamically repartitioning the application's object configuration over the available physical nodes at run-time. Existing work offers this support in the form of new middleware platforms with a dedicated execution model and object migration support that aligns well with run-time repartitioning [8][11]. A disadvantage of this approach is that existing, ordinary RMI-based applications, which have obviously not been developed with support for repartitioning in mind, must partially be rewritten to become compatible with the programming model of the new middleware platform. The approach of our project is to develop a byte code transformer that transparently adds new functionality to an existing Java application such that the application becomes automatically repartition-able by the external monitoring and management architecture. The motivation behind this approach is that programmers do not want to distort their applications to match the programming model of whatever new middleware platform. In our approach, the byte code transformation inserts wrappers that transparently integrate support for object migration (possibly implemented by such new middleware platforms).

The run-time repartitioning process consists of four successive phases. In the first phase, the management architecture allows an administrator to monitor the application's execution and to specify a request for repartitioning the application over the available physical nodes. In the second phase, the application takes a snapshot of its own global execution state, capturing the state of all threads executing in the application; after this, the execution of all application objects is temporarily suspended and the corresponding thread states are stored as serialized data in a global thread context repository. In the third phase, the management architecture carries out the initial request for repartitioning by migrating the necessary objects over the network. Existing work [4] has shown that support for object migration can be transparently added to an application by a byte code transformer. In the fourth and final phase, the execution of the application is resumed; before it can be resumed, the execution state of all threads is first reestablished from the data stored in the global thread repository. The advantage of having phases two and four is that execution and object migration are kept completely orthogonal to each other, avoiding race conditions and tricky migration problems. A typical problem solved in this way, for example, is that an object waiting on a pending reply of a (remote) method invocation can be migrated anyway.

Unfortunately, capturing execution state is not supported at all by current Java technology. However, we had already implemented a portable mechanism for serialization of local JVM threads, called Brakes [10]. This thread serialization mechanism is implemented by instrumenting the original application code at the byte code level, without modifying the Java Virtual Machine. Capturing (and reestablishing) the execution state of a distributed application developed with a distributed control flow programming model such as Java RMI, however, requires a mechanism for serializing (and deserializing) distributed threads. At first sight, reusing Brakes for this required a complete redesign of the Brakes byte code transformer. Before discussing this problem, we shortly describe the implementation of Brakes.

In Brakes, the execution state of a thread is extracted from the application code that is executed in that thread. For this, a byte code transformer inserts capture and reestablishing code blocks at specific code positions in the application code. With each thread three flags (called isSwitching, isRestoring, isRunning) are associated that represent the execution mode of that specific thread. When the isSwitching flag is on, the thread is in the process of capturing its state. Likewise, a thread is in the process of reestablishing its state when its isRestoring flag is on. When the isRunning flag is on, the thread is in normal execution. Each thread is associated with a separate Context object into which its state is switched during capturing, and from which its execution state is restored during reestablishing.

The process of capturing a thread's state (indicated by the empty-headed arrows in Figure 8) is implemented by tracking back the control flow, i.e. the sequence of nested method invocations that are on the stack of that thread. For this, the byte code transformer inserts after every method invocation instruction a code block that switches the stack frame of the current method into the context and returns control to the previous method on the stack, and so on. This code block is only executed when the isSwitching flag is set. The process of reestablishing a thread's state (indicated by the full-headed arrows in Figure 8) is similar but restores the stack frames in reverse order on the stack. For this, the byte code transformer inserts at the beginning of each method definition a code block that restores the stack frame data of the current method and subsequently creates a new stack frame for the next method that was on the stack, and so on. This code block is only executed when the isRestoring flag is set.
[Figure 8. Thread Capturing/Reestablishment in Brakes: the stack frames of f() on foo and b() on bar are switched into contexti when isSwitchingi is set (capturing), and restored from it in reverse order when isRestoringi is set (reestablishing).]

A context manager, acting as a kernel object towards the distributed application, manages both the Context objects and the flags. The inserted byte codes switch/restore the state of the current thread into/from its context via a context-manager-defined bottleneck interface. Context objects are managed on a per-thread basis by the context manager: every thread has its own Context object, exclusively used for switching the state of that thread. The right Context object is looked up by the context manager using the thread identity as hashing key.
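In source-level terms (Brakes actually works at the byte code level, and the context-manager names below are illustrative stand-ins, not the actual Brakes API), a transformed method looks roughly like this:

// Minimal stand-in for the context manager's bottleneck interface.
class ContextManager {
  static boolean isSwitching() { return false; }
  static boolean isRestoring() { return false; }
  static void curPushInt(int i) { /* switch into the current thread's context */ }
  static int popInt() { return 0; /* restore from the current thread's context */ }
}

class TransformedExample {
  void b() { /* ... */ }

  // Source-level rendering of f() after the Brakes transformation.
  void f(int i) {
    if (ContextManager.isRestoring()) {
      i = ContextManager.popInt();   // restore this frame's locals
      // ... then re-enter the next method that was on the stack ...
    }
    // ... original code of f() ...
    b();
    if (ContextManager.isSwitching()) {
      ContextManager.curPushInt(i);  // switch this frame into the context
      return;                        // unwind to the previous frame on the stack
    }
    // ... rest of the original code of f() ...
  }
}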
5.3.2 Capturing Distributed Threads
This section describes the problem we encountered when trying to reuse Brakes for capturing distributed threads without underlying support for distributed thread identity. In order to implement phases two and four of the repartitioning process robustly, distributed threads should be captured/reestablished as whole entities in one atomic step. However, we developed Brakes without having anticipated the need for capturing distributed threads: in Brakes, execution state is saved on a per local JVM thread basis. This works well for capturing local control flow but not for capturing a distributed thread of control. Figure 9 illustrates the problem for an example distributed thread τ, implemented by a concatenation of multiple local (per JVM) threads, sequentially performing remote method invocations when they transit JVMs. For example, once thread ti reaches method b() on object bar, the call p() on object poe is performed as a remote method invocation. This remote call implicitly starts a new thread tj at host multi.
[Figure 9. Context per JVM thread: ti (at //media:7000, with frames f() and b()) and tj (at //multi:8000, with frames p() and r()) each switch their own frames into separate contexts contexti and contextj, although both threads implement the same distributed thread τ.]
Physically, the threads ti and tj hold their own local subset of stack frames, but logically the total set of frames belongs to the same distributed thread. The context manager, however, is not aware of this logical connection between threads ti and tj. As a consequence, the contexts and flags of these JVM threads are managed as separate computational entities, although they should be logically connected. Without this logical connection, it becomes difficult to robustly capture and reestablish a distributed thread as a whole entity in one atomic step. For example, it becomes quasi impossible for the context manager to determine the correct sequence of contexts that must be restored to reestablish a specific distributed thread. Of course we could modify the Brakes transformer with support for capturing distributed threads of control, but this would involve huge changes in the design of the Brakes transformer.
5.3.3 Distributed thread identity to the rescue
Distributed thread identity provides an elegant solution to this problem. More specifically, if we augment the Brakes transformer with functionality for D_Thread_ID inspection by a kernel object (i.e. the context manager) via a bottleneck interface, it is possible to implement a context manager that manages contexts on a per distributed thread basis (see Figure 10). This update to the Brakes transformer entails only slight modifications.

[Figure 10. Context per Distributed Thread: the frames of threads ti (//media:7000) and tj (//multi:8000) that implement the distributed thread τ are switched into a single Contextτ, managed per D_Thread τ.]

The local implementation of the context manager must be adapted in two ways:

New static bottleneck interface for inspecting distributed thread identity. As stated in the description of Brakes, JVM thread stack frames are switched into the associated Context object via a static bottleneck interface defined by the context manager. For example, switching an integer from the stack into the context is done by inserting, at the appropriate code position in the method's byte code, an invocation of the following static method:

public static void curPushInt(int i) {
  Context c = getContext(Thread.currentThread());
  c.pushInt(i);
}

The appropriate Context object into which to switch is looked up with the current thread identity as hashing key. Such push-methods are defined for all basic types as well as for object references; complementary pop-methods restore execution state from the context. Crucial in the push and pop methods is the identification of the Context object for the actual thread. However, in order to allow the context manager to manage Context objects on a per distributed thread basis, this static bottleneck interface must be changed such that the context manager can inspect the identity of the current distributed thread:

public static void curPushInt(int i, D_Thread_ID id) {
  Context c = getContext(id);
  c.pushInt(i);
}

As such, the Brakes byte code transformer must only be modified to support "D_Thread_ID inspection by a kernel object" (see section 4.1.3).

Distributed architecture. Figure 10 makes clear that the implementation of the context manager must now become distributed. Figure 11 sketches the architecture of such a distributed implementation. Capturing and restoring code blocks still communicate with the bottleneck interface of the local context manager, but the captured execution state is now managed per distributed thread by a central distributed thread manager. The distributed thread manager manages global flags that represent the execution state of the distributed application as a whole; these global flags are kept synchronized with the flags of the local context managers. DistributedThreadManager offers a public interface to initiate the capturing and reestablishing of the distributed execution state. The capturing of the execution state of a distributed application is started by calling the operation captureState() on the distributed thread manager. This method sets the isSwitching flag on all local context managers by broadcast. As soon as a distributed thread detects that the isSwitching flag is set, the distributed thread starts switching itself into the context. Reestablishment of the distributed application is initiated by calling the operation resumeApplication() on the distributed thread manager. This method sets the isRestoring flag on each local context manager and restarts the execution of all distributed threads by calling the start() method defined on D_Thread. Each distributed thread immediately detects that the isRestoring flag is set, and thus restores itself from the context.

[Figure 11. Distributed Architecture of the context manager: applications communicate through bottleneck interfaces with per-JVM LocalContextManagers (e.g. at //media:8000 and //multi:9000), whose flags and contexts are kept synchronized by a central DistributedThreadManager (at //localhost:7000).]
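A sketch of the distributed thread manager's public surface, as implied by the text above (the RMI framing and exact signatures are our assumptions):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Public interface of the central manager implied by section 5.3.3.
interface DistributedThreadManagerIfc extends Remote {
  // Broadcasts isSwitching to all LocalContextManagers; every
  // distributed thread then switches its frames into its context.
  void captureState() throws RemoteException;

  // Broadcasts isRestoring and restarts every distributed thread
  // (via D_Thread.start()); each thread restores itself from its
  // context before resuming normal execution.
  void resumeApplication() throws RemoteException;
}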
5.3.4 Concrete Application
We developed a prototype implementation of run-time repartitioning, conforming to the four phases described in section 5.3.1. Since object migration is necessary during the third phase of the run-time repartitioning process and we had not yet implemented a byte code transformer that transparently adds object migration to an application (as in [4]), we initially used the mobile object system Voyager 2.0 [6] as the distributed programming model. We implemented a simple text translation system in Voyager. The application is transformed by the DTI transformer and subsequently by the Brakes transformer. In order to allow dynamic integration with Voyager, stubs for remote interfaces must be generated after the byte code transformations have been performed. The distributed architecture shown in Figure 11 defines an abstract framework that must be instantiated by a concrete distributed implementation. We used Voyager for this too, although Java RMI was also possible: LocalContextManager and DistributedThreadManager are thus implemented as Voyager objects. Context-manager-defined bottleneck interfaces are dynamically added to the application by the Brakes transformer.
6. RELATED WORK
The discussion of related work is organized according to the related fields that our work touches.
6.1 Distributed Threads
As stated in section 3, the notion of distributed thread was introduced in the Alpha distributed real-time OS kernel developed by E.D. Jensen at CMU [2]. This work focuses on a distributed control flow programming model for implementing distributed real-time applications, which allows the acceptable end-to-end timeliness to be specified for each distributed thread. The main goal of distributed threads in the Alpha kernel was integrated end-to-end resource management based on propagation of scheduling parameters such as priority and time constraints. In contrast, our motivation for adopting the notion of distributed threads is to reduce the shift of threading semantics between local and distributed programming, based on propagation of distributed thread identity. Recent joint work of Jensen with Sun and the OMG is applying the notion of distributed threads to distributed object systems. Within the context of a Java Specification Request, an initial specification and reference implementation enhancing the Java RMI implementation with the distributed thread abstraction has been completed [5]. The OMG has incorporated the distributed thread abstraction into the joint proposal for Dynamic Real-Time CORBA as well [7]. We, however, showed that the notion of distributed thread (identity) can also be implemented at the application level by means of a generic byte code transformation algorithm. The advantage of this is that the mechanism of distributed thread identity can be reused integrally on top of different Java-enabled middleware platforms. The examples in section 5 have also shown that the strong combination of a middleware-defined bottleneck interface and a conforming byte code transformer makes it easy to extend existing middleware platforms with the capacity of acting upon distributed threads as whole computational entities.
6.2 Byte Code Transformations for Distributed Applications
The Doorastha system [4] allows one to implement fine-grained optimizations for distributed applications merely by means of code annotations. Doorastha is not an extension or a superset of the Java language; instead, it is based on annotations to pure Java programs. By means of these annotations it is possible to select the required semantics for distributed execution. For example, Doorastha is not restricted to passing objects of a certain class always by-copy or always by-reference, but allows the transfer mode to be switched transparently. This makes it possible to develop a program in a centralized (multi-threaded) setting first and then prepare it for distributed execution by annotation. A byte code transformer then generates a distributed program whose execution conforms to the selected annotations.
6.3 Concurrent Object-Oriented Languages
The active object model [1][8] integrates object-orientation and concurrency in a homogeneous object model. This work starts from the premise that traditional synchronization mechanisms such as monitors are too low-level to control the concurrency inherent to distribution. Instead, the active object model describes an application as a set of active, autonomous objects that have control over their own synchronization. A request on an active object is not processed immediately but delayed until the active object explicitly accepts it. Guards can be defined on methods that specify in which states of the active object a method may be accepted. This paper, however, focuses on the typical problems related to object-based distributed control flow programming models, such as Java RMI and CORBA, which are nowadays the standard distribution platforms; the solutions proposed in this paper are thus only relevant in the context of such programming models. The active object model provides other mechanisms and structures that also cope well with some of the problems described in this paper (e.g. serialization of distributed execution state [11]).
7. CONCLUSION
In this paper we described several problems associated with the execution of Java programs in a distributed execution environment. We showed that a synchronized code section that executes perfectly in a local setting runs into a deadlock when the application is distributed by means of a distributed control flow programming model such as Java RMI or OMG CORBA. We also showed that a mechanism for serializing local execution state no longer works for capturing distributed execution state. Both problems are illustrations of the same general problem: the inability to identify distributed threads as whole computational entities. We integrate the notion of distributed threads into Java programs as a mechanism to reduce the shift of threading semantics between local and distributed programming. Propagation of a globally unique distributed thread identity provides a uniform mechanism by which all the program's constituent objects involved in a distributed thread can uniquely refer to that distributed thread as one and the same computational entity. We have employed byte code transformation to implement this mechanism at the application level. The strong combination of a middleware-defined bottleneck interface and a conforming byte code transformer also enables middleware platforms to act upon a distributed thread as one and the same computational entity.

8. ACKNOWLEDGEMENTS
This research was supported by a grant from the Flemish Institute for the advancement of scientific-technological research in the industry (IWT). We would like to thank Tim Coninx and Bart Vanhaute for their valuable contribution to this work and the many fruitful discussions with them. Finally, a word of appreciation goes to Tom Holvoet and Frank Piessens for their useful comments to improve this paper.

9. REFERENCES
[1] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, 1986.
[2] R. Clark, E.D. Jensen, and F.D. Reynolds. An Architectural Overview of the Alpha Real-time Distributed Kernel. In Proceedings of the USENIX Workshop on Microkernel and Other Kernel Architectures, April 1992.
[3] M. Dahm, Freie Universität Berlin. Byte Code Engineering. In Clemens Cap, editor, Proceedings JIT'99, 1999.
[4] M. Dahm, Freie Universität Berlin. The Doorastha System. Technical Report B-I-2000, 2000.
[5] E.D. Jensen. Distributed Real-Time Specification, Java Specification Request 000050. http://java.sun.com/aboutJava/communityprocess/jsr.html
[6] ObjectSpace Inc. ObjectSpace Voyager Core Technology 2.0, 1998.
[7] OMG. Joint Initial Proposal for Dynamic Real-Time CORBA. Object Management Group, October 1999.
[8] B. Robben. Language Technology and Metalevel Architectures for Distributed Objects. PhD thesis, KULeuven, 1999. ISBN 90-5682-194-6.
[9] C. Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley, 1996.
[10] E. Truyen, B. Robben, B. Vanhaute, T. Coninx, W. Joosen, and P. Verbaeten, Department of Computer Science, KULeuven, B-3000 Leuven, Belgium. Portable Support for Transparent Thread Migration in Java, 2000.
[11] E. Truyen, B. Vanhaute, B. Robben, F. Matthijs, E. Van Hoeymissen, W. Joosen, and P. Verbaeten. Supporting Object Mobility: from Thread Migration to Dynamic Load Balancing. OOPSLA'99 Demonstration, Denver, USA, November 1999.