Region-based Memory Management for Real-time Java *
Teresa Higuera, Valerie Issarny
INRIA-Rocquencourt, Domaine de Voluceau, BP 105, 78153 Le Chesnay Cédex, France
Email: [email protected]
Michel Banâtre, Gilbert Cabillic, Jean-Philippe Lesot, Frédéric Parain
INRIA-IRISA, Campus de Beaulieu, 35032 Rennes Cédex, France
Abstract
This paper addresses the issue of improving the performance of memory management for real-time Java applications, building upon the Real-Time Specification for Java (RTSJ) from the Real-Time for Java Expert Group. In a first step, a thorough analysis of the parameters influencing the performance of memory management, together with ways of improvement, is presented. The implementation of a memory management solution compliant with the RTSJ and integrating the proposed improvements is then sketched.
Keywords: Java, Real-Time, Embedded, Garbage Collection, Memory Regions, Write Barriers, Performance.
1. Introduction
The use of wireless Personal Digital Assistant (PDA) devices is foreseen to outrun that of PCs in the near future. However, for this to actually happen, there is still the need to devise adequate software and hardware platforms. The use of PDAs should be as convenient as that of PCs and in particular must not overly restrict the applications that are supported. In general, the environment must accommodate the embedded small-scale constraints associated with PDAs, and enable the execution of the applications traditionally supported on the desktop, such as soft real-time multimedia applications that are becoming increasingly popular. In the context of the activities of the Solidor group at INRIA, we are currently developing a Java-based software environment accounting for the above requirements¹. Although Java has some shortcomings regarding the target devices, these shall be solved in the near future in the light of ongoing work on extending Java to meet the requirements of embedded real-time software [9].
This paper focuses on one such extension that we are developing. It relates to making Java garbage collection real-time while accounting for relevant Java specifications², i.e., the Real-Time Specification for Java (RTSJ) from the RTJEG (The Real-Time for Java Expert Group) [4] and the K-Virtual Machine³ (KVM) targeting small-memory, limited-resource, network-connected devices. Implicit garbage collection has always been recognized as beneficial for the development of robust programs. However, it comes with overhead regarding both execution time and memory consumption, which makes (implicit) garbage collection poorly suited for small-sized embedded real-time systems. Nevertheless, there has been extensive research work on making garbage collection compliant with real-time requirements. The main results relate to garbage collection techniques that bound the latency caused by the execution of garbage collection. Proposed solutions may roughly be classified into two categories:
(1) Incremental garbage collection (e.g., [2]) enables the interleaved execution of the garbage collector with the application. Since this allows the application to execute while the Garbage Collector (GC) is running, a mechanism called a barrier is used to keep the state of the GC consistent by coordinating the execution of the GC and of the application. For instance, write barriers detect when the application updates pointers.
(2) Region-based memory allocation (e.g., [5]) enables grouping related objects within a region. Commonly, regions are used explicitly in the program code as shown in Figure 1. The proposed code shows a real-time thread, which allocates an array of 10 integers in
² http://www.rtj.org and http://www.j-consortium.org/rtjwg.html
³ http://java.sun.com/products/cldc/wp/
* This work has been partially funded by Texas Instruments.
¹ http://www.irisa.fr/solidor/work/scratchy.html
0-7695-1089-2/01 $10.00 © 2001 IEEE
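the heap, and another of 20 integers in the memory region called myRegion.

import javax.realtime.*;

class Allocator implements Runnable {
    public void run() {
        // Array of 10 integers allocated explicitly in the heap
        HeapMemory.instance().newArray(Integer.class, 10);
        // Array of 20 integers allocated in the current region (myRegion)
        int[] x = new int[20];
    }
}

class RegionUseExample {
    public static void main(String[] args) {
        // Scoped region with an initial size of 1 KByte that may grow up to 2 KBytes
        ScopedMemory myRegion = new VTMemory(1024, 2*1024);
        // Real-time thread whose allocations go to myRegion by default
        RealtimeThread task = new RealtimeThread(
            null, null, new MemoryParameters(1024, 0),
            myRegion, null, new Allocator());
        task.start();
    }
}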
Figure 1. Using memory regions in RTSJ.
Note that the two above collection strategies are complementary⁴: incremental garbage collection may be used within some regions in order to limit their size, while the use of regions allows reducing the runtime overhead due to garbage collection. Application of the above strategies has been studied in the context of Java, as highlighted in particular by the RTSJ. The RTSJ introduces memory regions and allows real-time compliant garbage collectors to run within regions, except within those associated with hard timing constraints.
The extensive research work in the area of garbage collection gives us a sound basis to tackle the issue of devising an efficient implicit memory reclamation strategy that is compliant with the RTSJ and hence combines incremental GC and memory regions. We specifically consider the following incremental GC: the non-copying collector called treadmill. The basic algorithm is as follows: an object is colored white when not reached by the GC, black when reached, and grey when it has been reached but its descendants may not have been (i.e., they are white). Grey objects make a wavefront, separating the white (unreached) from the black (reached) objects, and the application must preserve the invariant that no black object has a pointer to a white object, which is achieved using read barriers in [2]. The collection is completed when there are no more grey objects, and all the white objects can then be recycled. To coordinate the application and the GC we use write barriers instead of read barriers [15]. This decision is motivated by three points: write barriers are more efficient than read barriers, they allow cooperation with our region barrier implementation, and the resulting collector can further be easily extended to generational, distributed, and parallel collection.
In the context of RTSJ, this paper proposes such a study, focusing on minimizing the execution time overhead caused by write barriers in implicit memory reclamation. The area of minimizing memory consumption (e.g., dealing with memory fragmentation) is as crucial but is not addressed here. A thorough analysis of the parameters influencing the performance of write barriers is presented regarding the management of memory regions (Section 2). A solution improving the write barrier performance of both memory regions and the GC is then given (Section 3). Results of this analysis are further exploited to derive a memory management solution for a Java environment aimed at wireless PDAs, which we are experimenting with through adaptation of the KVM (Section 4). Finally, a summary of our contribution together with an overview of our ongoing and future research work towards offering an overall memory management solution for next-generation wireless PDAs conclude this paper (Section 5).
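The coloring scheme and the write barrier described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the collector implemented in the KVM: the names TriColorSketch, Obj, writeField, and step are hypothetical, objects are modeled as plain Java objects with a list of outgoing references, and the treadmill's doubly-linked free/allocated lists are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorSketch {
    enum Color { WHITE, GREY, BLACK }

    // Hypothetical heap object: a color plus its outgoing references.
    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> fields = new ArrayList<>();
    }

    // Wavefront of grey objects still to be scanned by the collector.
    private final Deque<Obj> greyList = new ArrayDeque<>();

    void addRoot(Obj root) { shade(root); }

    // Write barrier, run on every reference store "source.field = target":
    // a black object must never point to a white one, so the target is greyed.
    void writeField(Obj source, Obj target) {
        if (source.color == Color.BLACK && target.color == Color.WHITE) {
            shade(target);
        }
        source.fields.add(target);
    }

    private void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GREY;
            greyList.push(o);
        }
    }

    // One bounded increment of collector work; returns true when no grey
    // object is left, i.e. when the remaining white objects can be recycled.
    boolean step() {
        Obj o = greyList.poll();
        if (o == null) return true;
        for (Obj child : o.fields) shade(child);
        o.color = Color.BLACK;
        return greyList.isEmpty();
    }
}
```

In this sketch the application calls writeField for each reference store while the collector interleaves calls to step, which is the coordination the write barrier is meant to provide.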
2. Analyzing the Performance of Region Management
From a real-time perspective, the GC introduces unpredictable pauses that are not tolerated by real-time tasks. Real-time collectors eliminate this problem but introduce a high overhead. This must not lead to the unsafe primitive solution that consists in letting the application programmer deal explicitly with memory reclamation. An intermediate approach is to use memory regions, within which allocation and deallocation are customized and space locality is improved. Each memory region is then managed so as to embed objects that are related regarding associated lifetime and real-time requirements. Such a facility is supported by the RTSJ through the three following kinds of regions [4]: (i) immortal memory that contains objects whose life ends only when the JVM terminates; (ii) (nested) scoped memory that enables grouping objects having well-defined lifetimes and that may either offer temporal guarantees or not on the time taken to create objects; and (iii) the conventional heap. The immortal memory region is never subject to garbage collection and may be exploited by hard real-time tasks. Scoped memory regions may or may not be subject to internal real-time garbage collection depending on their temporal properties. A scoped region gets collected as a whole once it is no longer used. Garbage collection within the heap relies on the (real-time) GC of the JVM. The RTSJ imposes strict rules on assignments to or
⁴ We have omitted the generational garbage collection strategy, which enables minimizing the overhead caused by garbage collection. However, this may be seen as a special case of the region-based scheme although objects may move among regions. Under generational collection, objects get grouped within regions according to their age.
from regions. The JVM must detect illegal assignments and throw an exception when they occur. Since objects allocated in regions may contain references to objects in the heap, the GC must scan regions for references to objects within the heap. In general, the management of memory regions introduces overhead, which we characterize in the following subsections.
2.1. Region Management Overhead
From a real-time perspective, regions give predictable performance, since the cost of every allocation operation is easily bounded. The overall cost introduced by region management is given by the cost associated with: region allocation, object allocation, reference counter updates, and region deletion. The time cost to allocate a new region is always constant. The region implementation given in [5] presents a time overhead that is constant per instruction executed.
In RTSJ, the ScopedMemory abstract class has three subclasses: VTMemory, LTMemory, and ScopedPhysicalMemory. Since objects allocated in these memory regions are not garbage collected⁵, it is safe to associate these subclasses with a NoHeapRealtimeThread. Assuming that objects in a ScopedMemory region are not subject to garbage collection and may not be moved, the time to allocate an object is proportional to the object size, and in the worst case may include time to acquire additional memory for the region. Whereas an allocation in a VTMemory region may take variable time, the time taken in an LTMemory region is linear in the object size. The memory space for an LTMemory region must be contiguous⁶, and its size is specified upon creation, remaining fixed over its lifetime. In contrast, an instance of VTMemory is created with an initial size and may grow up to a given maximum size. The ScopedPhysicalMemory subclass supports a region with special memory attributes (e.g., ALIGNED, BYTESWAP, DMA, and SHARED).
Scoped regions can be nested. A safe region implementation requires that a region gets deleted only if there is no external reference to it. This problem has been solved by using a reference counter for each region: every scoped region is associated with a reference counter that keeps track of the use of the region by threads, and a simple reference-counting GC collects scoped memory regions when their counter reaches 0. The reference counter is increased when entering a new scope through the enter() method, upon the creation of a real-time thread with a scoped region, or upon the opening of an inner scope. It is decreased when returning from the enter() method, when the real-time thread using the scoped region exits, or when an inner scope returns from its enter() method. Note that by collecting regions, problems associated with reference counting collectors are solved: the space to store reference counters is minimal, and cyclic structures can be collected because they are within the same region. Before cleaning a region, the finalize() method of all the objects in the region must be executed, and the region cannot be reused until all the finalizers execute to completion.
2.2. Region Barrier Implementation
The life of objects allocated in scoped regions is governed by the control flow. To maintain the safety of Java and avoid dangling references, objects within a scoped region can only be referenced by objects from the same region or within an inner region, and objects within immortal regions or within the heap cannot reference an object allocated in a scoped region. This must be checked when executing instructions that store references within other objects (or arrays), which can be implemented by a region stack associated with each memory region, similarly to the contaminated collection solution [3]. We detail below how to support such a functionality:
When an object is created, it is associated with the scope of the active region.
The putfield (aputfield_quick) instruction causes the object X to reference Y, whereas the aastore (aastore_quick) instruction stores a reference Y into an array X of references. The scope of X must then be the same as, or inner to, the scope of Y. This check can be made by searching the region stack from the scope of X (the active region) down to the scope of Y (an outer region). If the scope of Y is not found in the stack (i.e., the heap, which is the outermost region and hence at the bottom of the stack, is reached), this is notified by throwing an exception.
The putstatic (aputstatic_quick) instruction causes the scope of the referenced object to be the outermost region (i.e., the bottom of the stack). Checking the stack is then not needed.
When removing a region, the top of the region stack is adjusted, and it is certain that no object depends on an older scoped region.
As an exception, the NoHeapRealtimeThread tasks must not interfere with the collector, and cannot access any pointer within the heap. This is dealt with by not including the heap scope in the region stack associated with these tasks.
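The region stack check described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the KVM implementation: the class RegionStackSketch, its Region descriptor, and the method checkStore are hypothetical names, IllegalStateException stands in for whatever exception the JVM actually throws, and non-scoped targets (heap or immortal memory) are accepted without a stack search.

```java
import java.util.ArrayList;
import java.util.List;

class RegionStackSketch {
    // Hypothetical region descriptor; 'scoped' is false for heap and immortal memory.
    static final class Region {
        final String name;
        final boolean scoped;
        Region(String name, boolean scoped) { this.name = name; this.scoped = scoped; }
    }

    // Index 0 is the bottom of the stack, i.e. the outermost region (the heap).
    private final List<Region> stack = new ArrayList<>();

    void enter(Region r) { stack.add(r); }
    void exit() { stack.remove(stack.size() - 1); }

    // Check run on a reference store (putfield/aastore): an object living in
    // xScope is about to reference an object living in yScope.
    void checkStore(Region xScope, Region yScope) {
        if (!yScope.scoped) return;               // heap/immortal targets outlive X
        int from = stack.lastIndexOf(xScope);     // -1 if X's scope is not on the stack
        for (int i = from; i >= 0; i--) {         // walk from X's scope towards the bottom
            if (stack.get(i) == yScope) return;   // Y's scope is the same or an outer one
        }
        // Stand-in for the exception thrown by the JVM on an illegal assignment.
        throw new IllegalStateException(
            "illegal assignment: " + xScope.name + " -> " + yScope.name);
    }
}
```

The search cost grows with the nesting depth between the two scopes, which is the quantity the writeBarrierRegion parameter of the overhead analysis averages over.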
⁵ This is always the case for objects within an instance of LTMemory but this is not mandatory for objects within an instance of VTMemory.
⁶ Typically, the ScopedMemory implementation is made by using malloc() and free() routines to manipulate memory.
instructions executed by a Java application is a write into the heap or another memory region. Thus we estimate the inter-region reference overhead as 0.05 * writeBarrierRegion, where the writeBarrierRegion parameter is the average number of instructions executed by the algorithm described in 2.2. In the same way, we calculate the write barrier cost introduced by the GC (i.e., 0.05 * writeBarrierCollector). In this case, however, the writeBarrierCollector parameter is the average cost to preserve the three-color invariant (e.g., test whether a white object is referenced by a black object, then grey the referenced object and link it to the greyList). Since in our system the GC coexists with memory regions, some additional overhead is introduced, because a test to check whether both objects are allocated into the heap must be added⁹.
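Written out, the two estimates above have the same shape. The numeric values in the comment are hypothetical and serve only to show the unit of the result (extra instructions per executed application instruction); they are not measurements.

```latex
\begin{align*}
\mathit{overhead}_{\mathrm{region}}    &= 0.05 \times \mathit{writeBarrierRegion} \\
\mathit{overhead}_{\mathrm{collector}} &= 0.05 \times \mathit{writeBarrierCollector}
\end{align*}
% Hypothetical illustration: if the region check costs 10 instructions on
% average (writeBarrierRegion = 10), the estimate is 0.05 * 10 = 0.5 extra
% instructions per executed instruction.
```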
2.3. Interaction with the GC
Since objects allocated within regions may contain references to objects in the heap, the GC must scan regions for references to objects within the heap. Thus, the collector must take into account the external references, adding them to its reachability graph. To facilitate this task, each object allocated outside the heap is colored black. In this way, a reference from an object allocated in a region (i.e., black) to an object in the heap that is still not reached (i.e., white) is treated as a write barrier (i.e., the white object is greyed so as to be reached by the GC). In addition, as NoHeapRealtimeThread tasks must not interfere with the collector and cannot access any object within the heap, we introduce a fourth color (e.g., red) meaning that the object cannot reference objects within the heap. A reference from a red object allocated for a critical task to another object allocated in the heap (i.e., white, black, or grey) causes a MemoryAccessError exception. The performance of our collector is thus impacted by the write barrier overhead introduced by the cooperation with memory regions. We therefore add the getWriteBarrierOverhead() method to the MemoryArea abstract class, which further serves to identify the region barrier overhead (see 2.4). Note that for write barrier-based collectors (e.g., incremental or generational collectors), this method gives the write barrier overhead caused by the GC.
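The four-color scheme can be sketched as a single store check. The code below is an illustration under stated assumptions, not the actual KVM code: RegionAwareBarrierSketch, Obj, and onReferenceStore are hypothetical names, and IllegalStateException stands in for the RTSJ MemoryAccessError so that the sketch does not depend on javax.realtime.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RegionAwareBarrierSketch {
    enum Color { WHITE, GREY, BLACK, RED }

    // Hypothetical object model: where the object lives and its current color.
    static final class Obj {
        final boolean inHeap;
        Color color;
        Obj(boolean inHeap, boolean forNoHeapTask) {
            this.inHeap = inHeap;
            // Heap objects start white; objects outside the heap are black,
            // or red when they belong to a no-heap real-time task.
            this.color = inHeap ? Color.WHITE
                                : (forNoHeapTask ? Color.RED : Color.BLACK);
        }
    }

    // Grey objects waiting to be scanned by the incremental collector.
    final Deque<Obj> greyList = new ArrayDeque<>();

    // Barrier run on every reference store "source.field = target".
    void onReferenceStore(Obj source, Obj target) {
        if (source.color == Color.RED && target.inHeap) {
            // Stand-in for MemoryAccessError: red objects may not reference the heap.
            throw new IllegalStateException("red object references the heap");
        }
        if (source.color == Color.BLACK && target.color == Color.WHITE) {
            // Covers both black heap objects and (always black) region objects.
            target.color = Color.GREY;
            greyList.push(target);
        }
    }
}
```

Because region objects are permanently black, the collector never needs a separate code path for references coming from regions: the ordinary black-to-white test already greys the heap target.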
[Table body lost in extraction. Columns: Executed Instructions; Data References; % of References into the Heap. Only the JACK row partially survives (8,553 * 10^6); the remaining values are illegible.]
Table 1. Memory reference characteristics.
The most common approach to implement write barriers is by inline code, which consists in generating the instructions executing write barrier events with every store operation. This solution requires compiler cooperation (e.g., JIT), and presents a serious drawback because it will double the application's size. Regarding systems with limited memory such as PDAs, this code expansion overhead is considered prohibitive. Alternatively, we can instrument the bytecode interpreter, avoiding space problems, but this still requires
bits) as %00000000000, and the CAR_MASK field (<20:16> bits) as %11111, we divide the memory address space in 32 regions, each one divided in 16 KByte cars (see Figure 3). If we choose a %11110 value for the CAR_MASK, then we have 16 regions, and a car size of 32 KBytes. The reference-based write barriers of picoJava-II (see Figure 4) can be used to implement incremental collectors. An incremental collector traps when a white object is written into a black object (e.g., the GC_TAG field for black objects is set to %11 and is set to %00 for white objects). In order to use this hardware mechanism, we adapt our algorithm as follows: (i) in the header object, the <31:30> bits give the color of the object and the <18:14> bits
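The arithmetic behind the two CAR_MASK settings quoted above can be checked with a short Java sketch. It rests on an assumption that is not stated in the surviving text: that the 5-bit CAR_MASK is compared against address bits <18:14>, which is what the 16 KByte figure suggests (2^14 = 16 K). The class and method names are hypothetical, and the car size computation only holds for masks whose zero bits are the low-order ones, as in the two examples.

```java
class CarMaskSketch {
    // Assumption: the 5-bit CAR_MASK selects address bits <18:14>.
    static final int CAR_SHIFT = 14;

    // Car size in bytes for a non-zero mask: each low-order zero bit in the
    // mask is ignored by the comparison and therefore doubles the car size.
    static int carSizeBytes(int carMask) {
        return 1 << (CAR_SHIFT + Integer.numberOfTrailingZeros(carMask));
    }

    // Number of distinguishable areas: one per combination of the masked bits.
    static int regionCount(int carMask) {
        return 1 << Integer.bitCount(carMask & 0x1F);
    }

    // Page-based check: do two addresses differ in any of the masked bits?
    static boolean crossesCar(int addrA, int addrB, int carMask) {
        return (((addrA ^ addrB) >>> CAR_SHIFT) & carMask) != 0;
    }
}
```

With a mask of %11111 the sketch yields 32 areas of 16 KBytes, and with %11110 it yields 16 areas of 32 KBytes, matching the figures given in the text.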
3.2 Improving Write Barrier Performance
Regarding the proposed solution, which configures picoJava-II to enable page-based and reference-based write barriers, the gc_notify trap is raised under the following conditions: (A) when a black object references a white object, (B) when a red object references a non-red object, (C) when an object allocated in a scoped region references an object allocated in another region, (D) when an object from the heap or from persistent memory references a scoped object, or (E) when an object from the heap (persistent memory) references an object from persistent memory (the heap). Two different mechanisms detect the above conditions: (i) conditions A and B are detected by reference-based write
Note that accessing the region stack is not needed when the reference is from the heap or from persistent memory.
[Figure residue: diagram lost in extraction. Surviving labels: GC_TAG, ObjectReference, StoreData, gc_notify trap.]
a. Reference-based mechanism. if (PSR.GCE = 1) then gcindex
, SIZE_H <29:19>, CAR_MASK <18:14>, SIZE_L <13:7>, TYPE <6:2>, x <1>, and H <0>. We thus take six bits of the KVM SIZE <31:8> field (i.e., the maximum size of the objects has thus been reduced from 16 MBytes to 256 KBytes). Note the small average size of Java objects in SPECjvm98 [11] applications (e.g., Jess 40 bytes, Db 31, Javac 36, MM 25, and Jack 31). We also reduce the KVM TYPE <7:2> field since 5 bits are sufficient (i.e., only 20 types are handled). These bits have been used as the GC_TAG <31:30> and the CAR_MASK <18:14> fields of picoJava-II (i.e., these fields are used to store the color of the object and the embedding memory region). The KVM MARK_BIT that is used by the collector to mark the object is no longer used because objects are marked by color. This bit is then exploited to support the x bit of picoJava-II. Finally, the KVM STATIC_BIT is now used as the H bit of picoJava-II. Since the collector does not move objects, handles are sup-
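The header layout listed above (GC_TAG <31:30>, SIZE_H <29:19>, CAR_MASK <18:14>, SIZE_L <13:7>, TYPE <6:2>, x <1>, H <0>) can be illustrated with plain bit operations. This is a sketch under assumptions: the class ObjectHeaderSketch and its methods are hypothetical, and SIZE_H and SIZE_L are taken to be the upper 11 and lower 7 bits of a single 18-bit size, which is consistent with the 256 KByte maximum but is not spelled out in the text.

```java
class ObjectHeaderSketch {
    // Packs the fields into one 32-bit header word using the layout from the text.
    static int pack(int gcTag, int size, int carMask, int type, boolean x, boolean h) {
        int sizeH = (size >>> 7) & 0x7FF;   // assumed: upper 11 bits of the 18-bit size
        int sizeL = size & 0x7F;            // assumed: lower 7 bits of the 18-bit size
        return ((gcTag & 0x3) << 30)        // GC_TAG   <31:30>, the object color
             | (sizeH << 19)                // SIZE_H   <29:19>
             | ((carMask & 0x1F) << 14)     // CAR_MASK <18:14>, the embedding region
             | (sizeL << 7)                 // SIZE_L   <13:7>
             | ((type & 0x1F) << 2)         // TYPE     <6:2>
             | (x ? 0x2 : 0)                // x bit    <1>
             | (h ? 0x1 : 0);               // H bit    <0>
    }

    static int gcTag(int header)  { return header >>> 30; }
    static int region(int header) { return (header >>> 14) & 0x1F; }
    static int type(int header)   { return (header >>> 2) & 0x1F; }

    // Reassembles the 18-bit size from its two halves.
    static int size(int header) {
        return (((header >>> 19) & 0x7FF) << 7) | ((header >>> 7) & 0x7F);
    }
}
```

Splitting the size around the CAR_MASK field is what lets the color and region bits sit at the positions the hardware barrier inspects, at the price of the reduced maximum object size noted above.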
EnableWriteBarrier:
    priv_read_psr             //The GCE bit is set
    spush 0x1000
    seti 0x0000
    ior
    priv_write_psr
    ret

DisableWriteBarrier:
    priv_read_psr             //The GCE bit is unset
    spush 0xEFFF
    seti 0xFFFF
    iand
    priv_write_psr
    ret
Figure 6. Enabling and disabling barriers.

ConfigureWriteBarrier:
    spush 0x10B0              //Reference-based
    seti 0x001F               //Page-based
    priv_write_gc_config
    goto EnableWriteBarrier   //Set the GCE bit

Figure 7. Configuring write barriers.
This implementation is efficient, but quite inflexible. We must configure the system to determine the virtual region memory map. In addition, our solution requires the size of a region to be a multiple of the car size, which may introduce internal fragmentation. Finally, for a VTMemory scoped region that can change its size up to its maximum size, the additional memory must be assigned in terms of cars. This can be impractical for classes dealing with I/O mapped memory (e.g., ScopedPhysicalMemory), which specify in their constructor not only the size of the region, but also the base address. Another problem with this solution is that it omits write barriers in native code, which may be addressed using either of the two following solutions: (i) forcing the native code to register its writes explicitly, or (ii) using virtual memory protection to detect and register changes. The latter solution needs further investigation because it is not trivial to combine real-time bounded collection with barriers supported in the MMU. In conclusion, this is an alternative implementation of memory regions that is less flexible but more efficient than the proposed software-based one. In general, the choice between the two introduced solutions will depend upon the behavior of the target application.
5. Conclusion
This paper has presented solutions for improving the performance of memory management in the RTSJ, hence addressing performance improvement of both garbage collection and region management. Our proposal builds upon existing work, since the area of memory management in general, and of garbage collection in particular, has long received a great deal of attention in the programming language and system communities. The contribution of our work comes from the adaptation and integration of relevant solutions, in the context of the RTSJ, based on the analysis of the parameters that are the most influential in memory management performance. In addition, we have discussed the implementation of the resulting memory management solution within the KVM. We are currently finalizing our implementation, and our next study will assess the performance of our memory management system. Our solutions for improving the performance of memory management partly address the use of hardware aid by exploiting existing hardware support for Java. In general, our study should be complemented with work on improving memory management performance at the hardware level, considering both hardware aid and the features of the underlying processor (e.g., impact of garbage collection upon cache management [8]).
References
[1] K. Ali. A Simple Generational Real-Time Garbage Collection Scheme. Computing Paradigms and Computational Intelligence (New Generation Computing), 16(2):201-221, December 1998.
[2] H. Baker. The Treadmill: Real-Time Garbage Collection without Motion Sickness. In Proc. of the Workshop on Garbage Collection in Object-Oriented Systems, OOPSLA'91, 1991. Also appears as SIGPLAN Notices 27(3), pages 66-70, March 1992.
[3] D. Cannarozzi, M. Plezbert, and R. Cytron. Contaminated Garbage Collection. In Proc. of the Conference on Programming Languages Design and Implementation (PLDI), volume 35, pages 264-273. ACM SIGPLAN, May 2000.
[4] G. Bollella and J. Gosling. The Real-Time Specification for Java. IEEE Computer, June 2000.
[5] D. Gay and A. Aiken. Memory Management with Explicit Regions. In Proc. of the Conference of Programming Language Design and Implementation (PLDI), pages 313-323. ACM SIGPLAN, June 1998.
[6] D. Gay and B. Steensgaard. Stack Allocating Objects in Java. Technical report, Microsoft Research, 1998. In preparation.
[7] J Consortium, Inc. Core Real-Time Extensions for the Java Platform. Technical report, NewMonics, Inc, August 1999.
[8] J. Kim and Y. Hsu. Memory System Behavior of Java Programs: Methodology and Analysis. In Proc. of the ACM Java Grande 2000 Conference, June 2000.
[9] M.T. Higuera, V. Issarny, M. Banâtre, G. Cabillic, J.P. Lesot, and F. Parain. Java Embedded Real-Time Systems: An Overview of Existing Solutions. In Proc. of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), pages 391-399. IEEE, March 2000.
[10] A. Petit-Bianco and T. Tromey. Garbage Collection for Java in Embedded Systems. In Proc. of IEEE Workshop on Programming Languages for Real-Time Industrial Applications, pages 59-67, December 1998.
[11] Standard Performance Evaluation Council. SPEC JVM98 benchmarks. Technical report, 1998. http://www.spec.org/osg/jvm98.
[12] Sun Microsystems. picoJava-II Programmer's Reference Manual. Technical report. http://www.sun.com/microelectronics/picoJava, March 1999.
[13] Sun Microsystems. KVM Technical Specification. Technical report, Java Community Process, May 2000.
[14] The Real-Time for Java Expert Group. Real-Time Specification for Java. Technical report, RTJEG, June 2000. http://www.rtj.org.
[15] P. Wilson and M. Johnstone. Real-Time Non-Copying Garbage Collection. In ACM OOPSLA Workshop on Garbage Collection and Memory Management, September 1993.
[16] B. Zorn. Barrier Methods for Garbage Collection. Technical report, Department of Computer Science, University of Colorado at Boulder, CU-CS-494-90, November 1990.