Region-based Memory Management for Real-time Java
Teresa Higuera, Valérie Issarny
INRIA-Rocquencourt, Domaine de Voluceau, BP 105, 78153 Le Chesnay Cédex, France
Michel Banâtre, Gilbert Cabillic, Jean-Philippe Lesot, Frédéric Parain
INRIA-IRISA, Campus de Beaulieu, 35032 Rennes Cédex, France
Abstract
This paper addresses the issue of improving the performance of memory management for real-time Java applications, building upon the Real-Time Specification for Java (RTSJ) from the Real-Time Java Expert Group. We first present a thorough analysis of the parameters that influence memory management performance, together with ways to improve it. We then sketch the implementation of a memory management solution that is compliant with the RTSJ and integrates the proposed improvements.

Keywords: Java, Real-Time, Embedded, Garbage Collection, Memory Regions, Write Barriers, Performance.

1. Introduction

The use of wireless Personal Digital Assistant (PDA) devices is expected to overtake that of PCs in the near future. For this to actually happen, however, adequate software and hardware platforms still need to be devised. Using a PDA should be as convenient as using a PC and, in particular, must not overly restrict the applications that are supported. In general, the environment must accommodate the small-scale embedded constraints of PDAs while enabling the execution of applications traditionally supported on the desktop, such as the increasingly popular soft real-time multimedia applications. As part of the activities of the Solidor group at INRIA, we are currently developing a Java-based software environment that accounts for these requirements¹. Although Java has some shortcomings with respect to the target devices, these should be resolved in the near future by ongoing work on extending Java to meet the requirements of embedded real-time software [9].

This paper focuses on one such extension that we are developing: making Java garbage collection real-time while accounting for the relevant Java specifications², i.e., the Real-Time Specification for Java (RTSJ) from the RTJEG (The Real-Time for Java Expert Group) [4] and the K-Virtual Machine³ (KVM), which targets small-memory, limited-resource, network-connected devices. Implicit garbage collection has long been recognized as beneficial for the development of robust programs. However, it comes with overhead in both execution time and memory consumption, which makes (implicit) garbage collection poorly suited to small embedded real-time systems. Nevertheless, there has been extensive research on making garbage collection compliant with real-time requirements. The main results are garbage collection techniques that bound the latency caused by the execution of the garbage collector. Proposed solutions may roughly be classified into two categories:
(1) Incremental garbage collection (e.g., [2]) interleaves the execution of the garbage collector with that of the application. Since the application keeps running while the Garbage Collector (GC) is active, a mechanism called a barrier is used to keep the state of the GC consistent by coordinating the execution of the GC and of the application. For instance, write barriers detect when the application updates pointers.

(2) Region-based memory allocation (e.g., [5]) groups related objects within a region. Regions are commonly used explicitly in the program code, as shown in Figure 1. The code shows a real-time thread that allocates an array of 10 integers in the heap, and another array of 20 integers in the memory region called myRegion.

This work has been partially funded by Texas Instruments.
¹ http://www.irisa.fr/solidor/work/scratchy.html
² http://www.rtj.org and http://www.j-consortium.org/rtjwg.html
³ http://java.sun.com/products/cldc/wp/

```java
import javax.realtime.*;

class Allocator implements Runnable {
    public void run() {
        try {
            // Explicitly allocate an array of 10 ints in the heap.
            HeapMemory.instance().newArray(int.class, 10);
        } catch (Exception e) {
            // Some RTSJ versions declare checked exceptions for newArray().
            throw new RuntimeException(e);
        }
        // Allocated in the thread's current memory area, i.e., myRegion.
        int[] x = new int[20];
    }
}

class RegionUseExample {
    public static void main(String[] args) {
        // Scoped region: 1 KB initial size, growing up to 2 KB.
        ScopedMemory myRegion = new VTMemory(1024, 2 * 1024);
        RealtimeThread task = new RealtimeThread(
            null,                           // scheduling parameters
            null,                           // release parameters
            new MemoryParameters(1024, 0),  // memory-area and immortal budgets
            myRegion,                       // initial memory area of the thread
            null,                           // processing group parameters
            new Allocator());               // logic to run
        task.start();
    }
}
```
Figure 1. Using memory regions in RTSJ.

Note that the two collection strategies above are complementary⁴: incremental garbage collection may be used within some regions in order to limit their size, while the use of regions reduces the runtime overhead due to garbage collection. Both strategies have been studied in the context of Java, as the RTSJ in particular highlights. The RTSJ introduces memory regions and allows real-time garbage collectors to run within regions, except within those associated with hard timing constraints. The extensive research on garbage collection gives us a sound basis for devising an efficient implicit memory reclamation strategy that complies with the RTSJ and hence combines incremental GC and memory regions. We specifically consider the non-copying incremental collector called treadmill. The basic algorithm is as follows: an object is colored white when it has not been reached by the GC, black when it has been reached, and grey when it has been reached but its descendants may not have been (i.e., they may still be white). Grey objects form a wavefront separating the white (unreached) objects from the black (reached) ones, and the application must preserve the invariant that no black object holds a pointer to a white object, which is achieved using read barriers in [2]. The collection is complete when no grey objects remain; all the white objects can then be recycled. To coordinate the application and the GC, we use write barriers instead of read barriers [15]. This choice is motivated by the fact that write barriers are more efficient than read barriers, allow cooperation with our region barrier implementation, and yield a collector that can easily be extended to generational, distributed, and parallel collection.

⁴ We have omitted the generational garbage collection strategy, which minimizes the overhead caused by garbage collection. It may, however, be seen as a special case of the region-based scheme, although objects may move among regions: under generational collection, objects are grouped within regions according to their age.

In the context of the RTSJ, this paper proposes such a study, focusing on minimizing the execution time overhead caused by write barriers in implicit memory reclamation. Minimizing memory consumption (e.g., dealing with memory fragmentation) is equally crucial but is not addressed here. We first present a thorough analysis of the parameters influencing the performance of write barriers in the management of memory regions (Section 2). We then give a solution that improves the write barrier performance of both memory regions and the GC (Section 3). The results of this analysis are further exploited to derive a memory management solution for a Java environment aimed at wireless PDAs, which we are experimenting with by adapting the KVM (Section 4). Finally, Section 5 concludes the paper with a summary of our contribution and an overview of our ongoing and future research towards an overall memory management solution for next-generation wireless PDAs.
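The tri-color marking and the write-barrier coordination described above can be sketched as a small Java model. It is hypothetical and simplified: the grey wavefront is kept in a deque rather than in the treadmill's cyclic doubly linked list, and the names Obj, IncrementalMarker, start, step, and write are illustrative only. The barrier greys a white target whenever a black object is updated, which is the incremental-update style of write barrier the text refers to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Hypothetical heap object: a color plus a list of reference fields.
class Obj {
    enum Color { WHITE, GREY, BLACK }

    Color color = Color.WHITE;
    final List<Obj> fields = new ArrayList<>();
}

// Minimal tri-color incremental marker guarded by an incremental-update write barrier.
class IncrementalMarker {
    private final Deque<Obj> grey = new ArrayDeque<>();

    // Begin a cycle: the roots form the initial grey wavefront.
    void start(Collection<Obj> roots) {
        for (Obj r : roots) shade(r);
    }

    // Turn a white object grey so that the collector scans it later.
    private void shade(Obj o) {
        if (o != null && o.color == Obj.Color.WHITE) {
            o.color = Obj.Color.GREY;
            grey.push(o);
        }
    }

    // Scan at most 'budget' grey objects; returns true once no grey object is left.
    boolean step(int budget) {
        while (budget-- > 0 && !grey.isEmpty()) {
            Obj o = grey.pop();
            for (Obj child : o.fields) shade(child);
            o.color = Obj.Color.BLACK;
        }
        return grey.isEmpty();
    }

    // Mutator store src.fields[index] = target, guarded by the write barrier.
    void write(Obj src, int index, Obj target) {
        // Preserve the invariant "no black object points to a white object".
        if (src.color == Obj.Color.BLACK) shade(target);
        src.fields.set(index, target);
    }
}
```

Without the barrier, moving c from the grey object b into the already black object a and then clearing b's field would leave c white at the end of the cycle, so it would be recycled while still reachable.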
2. Analyzing the Performance of Region Management

From a real-time perspective, the GC introduces unpredictable pauses that real-time tasks cannot tolerate. Real-time collectors eliminate this problem but introduce a high overhead. This should not, however, push us towards the unsafe, primitive solution of letting the application programmer deal explicitly with memory reclamation. An intermediate approach is to use memory regions, within which allocation and deallocation are customized and spatial locality is improved. Each memory region is then managed so as to hold objects that are related with respect to their lifetime and real-time requirements. The RTSJ supports such a facility through the following three kinds of regions [4]: (i) immortal memory, which contains objects whose life ends only when the JVM terminates; (ii) (nested) scoped memory, which groups objects having well-defined lifetimes and which may or may not offer temporal guarantees on the time taken to create objects; and (iii) the conventional heap. The immortal memory region is never subject to garbage collection and may be exploited by hard real-time tasks. Scoped memory regions may or may not be subject to internal real-time garbage collection, depending on their temporal properties. A scoped region is collected as a whole once it is no longer used. Garbage collection within the heap relies on the (real-time) GC of the JVM. The RTSJ imposes strict rules on assignments to or from regions. The JVM must detect illegal assignments and throw an exception when they occur. Since objects allocated in regions may contain references to objects in the heap, the GC must scan regions for references to objects within the heap. In general, the management of memory regions introduces overhead, which we characterize in the following subsections.
2.1. Region Management Overhead

From a real-time perspective, regions give predictable performance, since the cost of every allocation operation is easily bounded. The overall cost introduced by region management is the sum of the costs of region allocation, object allocation, reference counter updates, and region deletion. The time cost to allocate a new region is always constant. The region implementation given in [5] presents a time overhead that is constant per instruction executed. In the RTSJ, the ScopedMemory abstract class has three subclasses: VTMemory, LTMemory, and ScopedPhysicalMemory. Since objects allocated in these memory regions are not garbage collected⁵, it is safe to associate these subclasses with a NoHeapRealtimeThread. Assuming that objects in a ScopedMemory region are not subject to garbage collection and may not be moved, the time to allocate an object is proportional to the object size and, in the worst case, may include the time to acquire additional memory for the region. Whereas an allocation in a VTMemory region may take variable time, the time taken in an LTMemory region is linear in the object size. Moreover, the memory space of an LTMemory region must be contiguous⁶, and its size is specified upon creation and remains fixed over its lifetime. In contrast, an instance of VTMemory is created with an initial size and may grow up to a given maximum size. The ScopedPhysicalMemory subclass supports a region with special memory attributes (e.g., ALIGNED, BYTESWAP, DMA, and SHARED).

Scoped regions can be nested. A safe region implementation requires that a region be deleted only if there is no external reference to it. This problem has been solved by using a reference counter for each region: every scoped region is associated with a reference counter that keeps track of the use of the region by threads, and a simple reference-counting GC collects scoped memory regions when their counter reaches 0. The reference counter is incremented when entering a new scope through the enter() method, when a real-time thread is created with a scoped region, or when an inner scope is opened. It is decremented when returning from the enter() method, when the real-time thread using the scoped region exits, or when an inner scope returns from its enter() method. Note that collecting whole regions avoids the usual problems of reference-counting collectors: the space needed to store reference counters is minimal, and cyclic structures can be collected because they lie within the same region. Before a region is cleaned, the finalize() method of every object in the region must be executed, and the region cannot be reused until all the finalizers have run to completion.

⁵ This is always the case for objects within an instance of LTMemory, but it is not mandatory for objects within an instance of VTMemory.
⁶ Typically, the ScopedMemory implementation uses the malloc() and free() routines to manipulate memory.
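The region life cycle described above (counter incremented on entry, decremented on exit, finalizers run and the whole region reclaimed when the count reaches 0) can be sketched as a small simulation. CountedScope and its methods are hypothetical names that do not mirror the RTSJ ScopedMemory API, and memory is modelled as a byte counter rather than real storage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scoped region whose lifetime is governed by a reference count.
class CountedScope {
    private int refCount = 0;                          // threads / inner scopes using the region
    private final List<Runnable> finalizers = new ArrayList<>();
    private final long size;                           // fixed capacity in bytes
    private long used = 0;                             // bump-pointer offset
    private int reclaimCount = 0;                      // how many times the region was emptied

    CountedScope(long size) { this.size = size; }

    // enter(): a thread or an inner scope starts using the region.
    void enter() { refCount++; }

    // Bump-pointer allocation: the cost does not depend on how many objects exist.
    void allocate(long bytes, Runnable finalizer) {
        if (refCount == 0) throw new IllegalStateException("region not entered");
        if (used + bytes > size) throw new OutOfMemoryError("region full");
        used += bytes;
        if (finalizer != null) finalizers.add(finalizer);
    }

    // Leaving the region: the last user triggers finalization and bulk reclamation.
    void exit() {
        if (refCount == 0) throw new IllegalStateException("unbalanced exit");
        if (--refCount == 0) {
            for (Runnable f : finalizers) f.run();     // all finalizers complete before reuse
            finalizers.clear();
            used = 0;                                  // the whole region is freed at once
            reclaimCount++;
        }
    }

    int refCount() { return refCount; }
    long used() { return used; }
    int reclaimCount() { return reclaimCount; }
}
```

Because objects are never freed individually, cycles inside the region need no special treatment: they disappear together with the region.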
2.2. Region Barrier Implementation

The lifetime of objects allocated in scoped regions is governed by the control flow. To maintain the safety of Java and avoid dangling references, objects within a scoped region can only be referenced by objects from the same region or from an inner region, and objects within immortal regions or within the heap cannot reference an object allocated in a scoped region. These rules must be checked when executing instructions that store references within other objects (or arrays), which can be implemented by a region stack associated with each memory region, similarly to the contaminated collection solution [3]. We detail below how to support this functionality.

When an object is created, it is associated with the scope of the active region. The putfield (aputfield_quick) instruction causes the object X to reference Y, whereas the aastore (aastore_quick) instruction stores a reference Y into an array X of references. In both cases, the scope of X must be the same as, or inner to, the scope of Y. This check can be made by walking the region stack from the scope of X (the active region) down to the scope of Y (an outer region). If the scope of Y is not found in the stack (i.e., the bottom of the stack, which holds the heap as the outermost region, is reached), an exception is thrown. The putstatic (aputstatic_quick) instruction causes the scope of the referenced object to be the outermost region (i.e., the bottom of the stack), so no stack check is needed. When a region is removed, the top of the region stack is adjusted, which guarantees that no object depends on an older scoped region. As an exception, NoHeapRealtimeThread tasks must not interfere with the collector and cannot access any pointer within the heap. This is handled by not including the heap scope in the region stack associated with these tasks.
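A minimal sketch of the region-stack check performed on putfield/aastore, assuming one stack whose bottom entry is the heap and whose top entry is the active region. Region, RegionStack, checkStore, and IllegalReferenceException are hypothetical names; the RTSJ reports illegal assignments with its own error type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical region descriptor, compared by identity.
class Region {
    final String name;
    Region(String name) { this.name = name; }
}

// Stand-in for the exception thrown on an illegal inter-region store.
class IllegalReferenceException extends RuntimeException {
    IllegalReferenceException(String msg) { super(msg); }
}

// Region stack: index 0 is the outermost region (the heap), the last entry is the active region.
class RegionStack {
    private final List<Region> stack = new ArrayList<>();

    void push(Region r) { stack.add(r); }
    void pop() { stack.remove(stack.size() - 1); }

    // Check for "X.f = Y" (putfield) or "X[i] = Y" (aastore): walk down from X's scope
    // towards the bottom; the store is legal only if Y's scope is met on the way.
    void checkStore(Region scopeX, Region scopeY) {
        int i = stack.lastIndexOf(scopeX);
        if (i < 0) throw new IllegalReferenceException(scopeX.name + " is not on the stack");
        for (; i >= 0; i--) {
            if (stack.get(i) == scopeY) return;  // same or outer scope: legal
        }
        // Bottom reached without finding Y's scope: Y lives in an inner region.
        throw new IllegalReferenceException(scopeX.name + " -> " + scopeY.name);
    }
}
```

For a NoHeapRealtimeThread, simply never pushing the heap onto its stack makes every store of a heap reference fail the same check.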
2.3. Interaction with the GC

Since objects allocated within regions may contain references to objects in the heap, the GC must scan regions for references to objects within the heap. The collector must therefore take these external references into account by adding them to its reachability graph. To facilitate this task, every object allocated outside the heap is colored black. In this way, storing a reference from an object allocated in a region (i.e., black) to an object in the heap that has not yet been reached (i.e., white) triggers the write barrier (i.e., the white object is greyed so that the GC reaches it). In addition, since NoHeapRealtimeThread tasks must not interfere with the collector and cannot access any object within the heap, we introduce a fourth color (red) meaning that the object cannot reference objects within the heap. A reference from a red object allocated for a critical task to an object allocated in the heap (i.e., white, black, or grey) causes a MemoryAccessError exception. The performance of our collector is therefore affected by the write barrier overhead introduced by the cooperation with memory regions⁷. We thus add the getWriteBarrierOverhead() method to the MemoryArea abstract class, which also serves to identify the region barrier overhead (see Section 2.4). Note that for write barrier-based collectors (e.g., incremental or generational collectors), this method gives the write barrier overhead caused by the GC.
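The four-color scheme of this section can be sketched as a single store barrier. The sketch assumes each object records its color and whether it lives in the heap; GcColor, ColoredObj, and ColorBarrier are hypothetical, MemoryAccessError is a local stand-in for the RTSJ class of the same name, and barrierEvents only illustrates the kind of counter a getWriteBarrierOverhead() method could report.

```java
import java.util.ArrayList;
import java.util.List;

// Colors of Section 2.3: objects outside the heap are BLACK, or RED for no-heap (critical) tasks.
enum GcColor { WHITE, GREY, BLACK, RED }

// Local stand-in for the RTSJ MemoryAccessError (javax.realtime is not used here).
class MemoryAccessError extends Error {
    MemoryAccessError(String msg) { super(msg); }
}

class ColoredObj {
    GcColor color;
    final boolean inHeap;
    ColoredObj(GcColor color, boolean inHeap) { this.color = color; this.inHeap = inHeap; }
}

// Combined GC/region barrier executed for a store "src.f = target".
class ColorBarrier {
    final List<ColoredObj> greyList = new ArrayList<>();
    long barrierEvents = 0;  // hypothetical event counter used to estimate overhead

    void onStore(ColoredObj src, ColoredObj target) {
        if (target == null) return;
        barrierEvents++;
        if (src.color == GcColor.RED && target.inHeap) {
            // Objects of critical tasks may never point into the heap.
            throw new MemoryAccessError("red object referencing the heap");
        }
        if (src.color == GcColor.BLACK && target.color == GcColor.WHITE) {
            // Region objects are black, so a new external reference greys its heap target.
            target.color = GcColor.GREY;
            greyList.add(target);
        }
    }
}
```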
2.4. Write Barrier Overhead
The cost of maintaining inter-region references is considered as a fraction of the total program execution time (excluding garbage collection costs). To estimate the time overhead of different implementations, two measures are combined: (i) the number of events that occur during the execution of a program, and (ii) the measured cost of each event. The inter-region overhead is then obtained by multiplying the number of events by the cost per event and dividing the result by the application execution time. Experimental measures indicate that in Lisp programs, references to the heap (as opposed to the runtime stack) account for an average of 12% of all executed instructions [16]. In Java, however, all objects are allocated in the heap, and only values of primitive types are allocated on the runtime stack [6]. In most applications of the SPECjvm98 benchmark suite⁸, less than half of the references are to heap memory (about 45%), the other half being to either the Java or the C stack (see Table 1); about 35% of all executed instructions are memory references [8], of which typically 70% are loads and 30% are stores. Hence about 5% (0.45 × 0.35 × 0.30 ≈ 0.05) of the instructions executed by a Java application are writes into the heap or another memory region. We thus estimate the inter-region reference overhead as 0.05 × writeBarrierRegion, where the writeBarrierRegion parameter is the average number of instructions executed by the algorithm described in Section 2.2. In the same way, we estimate the write barrier cost introduced by the GC as 0.05 × writeBarrierCollector. In this case, however, the writeBarrierCollector parameter is the average cost of preserving the three-color invariant (e.g., testing whether a white object is referenced by a black object and, if so, greying the referenced object and linking it to the greyList). Since in our system the GC coexists with memory regions, some additional overhead is introduced, because a test checking whether both objects are allocated in the heap must be added⁹.

⁷ Note that this overhead would be introduced by any GC (e.g., reference-counting, mark-and-sweep, incremental, or generational).
⁸ http://www.spec.org/osg/jvm98

| Benchmark | Executed instructions | Data references | % of references into the heap |
|---|---|---|---|
| JESS | 9,168 × 10^6 | 1,798 × 10^6 | 39.40 |
| DB | 712 × 10^6 | 3,211 × 10^6 | 45.61 |
| JAVAC | 7,717 × 10^6 | 2,515 × 10^6 | 28.70 |
| MTRT | 3,917 × 10^6 | 1,129 × 10^6 | 50.97 |
| JACK | 6,553 × 10^6 | 2,014 × 10^6 | 50.74 |

Table 1. Memory reference characteristics.

The most common approach to implementing write barriers is inline code, i.e., generating the instructions that execute write barrier events with every store operation. This solution requires compiler cooperation (e.g., from a JIT) and has a serious drawback: it can double the application's size. For systems with limited memory such as PDAs, this code expansion is considered prohibitive. Alternatively, we can instrument the bytecode interpreter, which avoids the space problem but still requires a complementary solution to handle native code.
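The overhead estimate above is simple arithmetic, reproduced below as a small Java helper. BarrierOverhead is a hypothetical name, and the per-barrier cost of 4 instructions used in main is an illustrative value, not a measurement reported in this paper.

```java
// Back-of-the-envelope write barrier overhead estimate following Section 2.4.
class BarrierOverhead {
    // Fraction of executed instructions that are stores into the heap or a region.
    static double storeFraction(double heapShare, double memRefShare, double storeShare) {
        return heapShare * memRefShare * storeShare;
    }

    // Overhead relative to the program's own instruction count.
    static double overhead(double storeFraction, double instructionsPerBarrier) {
        return storeFraction * instructionsPerBarrier;
    }

    public static void main(String[] args) {
        double f = storeFraction(0.45, 0.35, 0.30);  // about 0.047, rounded to 0.05 in the text
        double cost = 4;                             // hypothetical instructions per barrier
        System.out.printf("store fraction = %.4f%n", f);
        System.out.printf("overhead = %.1f%%%n", 100 * overhead(f, cost));
    }
}
```

With these inputs, a barrier of only four instructions already adds roughly 19% to the instruction count, which is why the per-barrier cost dominates the comparison of implementations.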
3. Minimizing the Write Barrier Overhead

The detection of non-allowed references requires checking all stores, which may lead to a substantial time overhead (e.g., on the order of 20%). One way to minimize this overhead is to improve write barrier performance through hardware support, such as that of the picoJava-II microprocessor¹⁰, which performs write barrier checks in parallel with the store operation. This solution is presented hereafter.

⁹ Note that we can calculate the write barrier overhead on different architectures by measuring the average number of instructions executed per write barrier event on each architecture.
¹⁰ http://www.sun.com/microelectronics/picoJava
3.1. Using Hardware Support
Upon each instruction execution, the picoJava-II core checks for conditions that cause a trap. Regarding hardware support for garbage collection, the core checks for the occurrence of write barriers and signals them through the gc_notify trap. This trap is triggered under certain conditions when a reference field of an object is assigned a new reference (i.e., when executing the putfield, putstatic, aputfield_quick, aputstatic_quick, aastore, or aastore_quick bytecodes). The conditions under which this trap is generated are governed by the values of the PSR and GC_CONFIG registers. If the GCE bit of the PSR register is set, write barriers are enabled; to disable them, it suffices to clear this bit. The GC_CONFIG register governs two types of write barrier mechanism: page-based and reference-based. Both mechanisms can be used simultaneously, as our collector requires, and either or both can be disabled when they are not needed. The configuration of this register is summarized in Table 2.

| Bits | Field | Type | Description |
|---|---|---|---|
| 31:21 | REGION_MASK | RW | Determines whether the reference and the stored data belong to the same page. |
| 20:16 | CAR_MASK | RW | Determines whether the reference and the stored data belong to the same car. |
| 15:0 | WB_VECTOR (Write Barrier Vector) | RW | If the corresponding bit is set, the above bytecodes signal a gc_notify trap. |
Table 2. Garbage Collector Register (GC_CONFIG).

The page-based barrier mechanism of picoJava-II (see Figure 2) was designed specifically to assist generational collectors. However, we can use this mechanism to detect references across different regions. For example, if we initialize the REGION_MASK field (bits <31:21>) of the GC_CONFIG register to %00000000000 and the CAR_MASK field (bits <20:16>) to %11111, we divide the memory address space into 32 regions made of 16 KByte cars (see Figure 3). If we choose %11110 for CAR_MASK instead, we obtain 16 regions with a car size of 32 KBytes. The reference-based write barriers of picoJava-II (see Figure 4) can be used to implement incremental collectors. An incremental collector traps when a white object is written into a black object (e.g., with the GC_TAG field set to %11 for black objects and to %00 for white objects). In order to use this hardware mechanism, we adapt our algorithm as follows: (i) in the object header, bits <31:30> give the color of the object and bits <18:14> give the memory area in which the object is allocated¹¹, and (ii) an associated exception handler determines whether to execute the algorithm described in Section 2.2 to detect erroneous inter-region references, and whether to execute actions preserving the three-color invariant of the GC. The only overhead is the handling of the trap; however, this can still be very costly, given the high cost of operating system traps.

Figure 2. Page-based write barriers. (a) Page-based mechanism: bits <29:19> and <18:14> of ObjectReference and StoreData are XORed, masked with GC_CONFIG.REGION_MASK and GC_CONFIG.CAR_MASK respectively, and combined with PSR.GCE to raise the gc_notify trap. (b) Page-based pseudocode:

```
if (PSR.GCE == 1)
   AND ((ObjectReference<29:19> & GC_CONFIG.REGION_MASK) == (StoreData<29:19> & GC_CONFIG.REGION_MASK))
   AND ((ObjectReference<18:14> & GC_CONFIG.CAR_MASK) != (StoreData<18:14> & GC_CONFIG.CAR_MASK))
then gc_notify trap
```
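The page-based test of Figure 2 and the reference-based test of Figure 4 can be modelled in software, for instance to sanity-check a GC_CONFIG value before loading it. This is a hypothetical model, not picoJava-II code: it assumes the GC_TAG is read from bits 31:30 of the two words compared, as drawn in Figure 4, and that the WB_VECTOR index is formed as (ObjectReference tag << 2) | StoreData tag, which is consistent with the example vector 0001000011010000 (0x10D0) trapping black (%11) to white (%00) stores.

```java
// Hypothetical software model of picoJava-II's two write barrier checks.
class PicoJavaBarrierModel {
    // Extract bits hi..lo (inclusive) of a 32-bit word.
    static int bits(int value, int hi, int lo) {
        return (value >>> lo) & ((1 << (hi - lo + 1)) - 1);
    }

    // Page-based: same page under REGION_MASK (bits 29:19) but a different car under CAR_MASK (bits 18:14).
    static boolean pageTrap(boolean gce, int gcConfig, int objectRef, int storeData) {
        int regionMask = bits(gcConfig, 31, 21);  // 11-bit REGION_MASK field
        int carMask = bits(gcConfig, 20, 16);     // 5-bit CAR_MASK field
        boolean samePage = ((bits(objectRef, 29, 19) ^ bits(storeData, 29, 19)) & regionMask) == 0;
        boolean otherCar = ((bits(objectRef, 18, 14) ^ bits(storeData, 18, 14)) & carMask) != 0;
        return gce && samePage && otherCar;
    }

    // Reference-based: the two 2-bit GC_TAGs select one bit of the 16-bit WB_VECTOR.
    static boolean referenceTrap(boolean gce, int gcConfig, int objectRef, int storeData) {
        int gcIndex = (bits(objectRef, 31, 30) << 2) | bits(storeData, 31, 30);
        int wbVector = bits(gcConfig, 15, 0);
        return gce && ((wbVector >>> gcIndex) & 1) == 1;
    }
}
```

With REGION_MASK = 0 the page test reduces to "the two addresses lie in different 16 KByte cars", which is exactly what separates the heap car from the region cars in the memory map of Figure 3.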
3.2. Improving Write Barrier Performance
With the proposed solution, which configures picoJava-II to enable both page-based and reference-based write barriers, the gc_notify trap is raised under the following conditions: (A) when a black object references a white object, (B) when a red object references a non-red object, (C) when an object allocated in a scoped region references an object allocated in another region, (D) when an object from the heap or from persistent memory references a scoped object, or (E) when an object from the heap (persistent memory) references an object from persistent memory (the heap). Two different mechanisms detect these conditions: (i) conditions A and B are detected by reference-based write barriers, whereas (ii) conditions C, D, and E are detected by page-based write barriers. Since each condition must be treated differently, it would be valuable for the hardware to indicate whether the trap was caused by a reference-based or a page-based condition¹². In order to improve the performance of real-time tasks, the treatment of the condition dealing with memory regions (i.e., condition B) has been prioritized, as shown in Figure 5. We thus establish three priority levels, where level 3 is the highest:
1. the reference-based write barrier trap is triggered,
2. the page-based write barrier trap is triggered, and
3. both traps are triggered.

The exception with priority 3 treats condition A (i.e., the referencing object is black) and condition B (i.e., the referencing object is red), which is divided into two subconditions: (B.1) the referenced object is allocated in the heap, and (B.2) the referenced object is allocated outside the heap. Condition A is treated similarly to intergenerational pointers in a generational collector [1]: the referencing object is not in the heap, yet the referenced object must remain reachable by the GC, so the referenced object is taken as a root by inserting it into an externalReferences list. The B subconditions are treated as follows: (B.1) throws the MemoryAccessError exception, and (B.2) explores the region stack to see whether the referenced object is in an outer scoped region. The routine of priority 2 treats condition C (i.e., explores the region stack) and condition D (i.e., throws the MemoryAccessError exception); condition E is allowed. Finally, the routine of priority 1 treats condition A as a classical write barrier for incremental collectors (i.e., greying the white object). Condition A is thus handled by two routines in different ways: by the priority 1 routine when both objects are allocated in the heap, and by the priority 3 routine when the referencing object is outside the heap.

¹¹ Note that accessing the region stack is not needed when the reference is from the heap or from persistent memory.
¹² Actually, the hardware support of picoJava-II does not make this distinction: it raises gc_notify for both reference-based and page-based write barriers.

Figure 3. A memory map for memory regions: $0000 0000 : $0000 3FFF holds 16 KBytes of heap; $0000 4000 : $0007 FFFF holds 31 regions of 16 KBytes; $0008 0000 : $0008 3FFF holds 16 KBytes of heap; $0008 4000 : $000F FFFF holds 31 regions of 16 KBytes; this pattern repeats up to $3FFF FFFF, and $4000 0000 : $FFFF FFFF is unused.

Figure 4. Reference-based write barriers. (a) Reference-based mechanism: the GC_TAG bits <31:30> of ObjectReference and of StoreData (e.g., %11 and %00) select one bit of GC_CONFIG.WB_VECTOR (bits <15:0>, e.g., 0001000011010000); if that bit is set, the gc_notify trap is raised. (b) Reference-based pseudocode:

```
if (PSR.GCE == 1) then
    gc_index <- (ObjectReference<31:30> << 2) | StoreData<31:30>
    write_barrier_bit <- (GC_CONFIG.WB_VECTOR >> gc_index) & 0x00000001
    if (write_barrier_bit == 1) then gc_notify trap
```

```
priority 3:
    if A   goto actualizeExternalReferences
    if B.1 goto memoryAccessError
    goto exploreRegionStack          // B.2 condition
priority 2:
    if C   goto exploreRegionStack
    if D   goto memoryAccessError
    rte                              // E condition
priority 1:
    goto actualizeGreyObjectSet      // A condition
```

Figure 5. Treating write barrier exceptions.

The above solution improves memory management performance because it minimizes the cost of intra-region references: this cost is reduced to the write barrier cost of the GC algorithm in the heap, and to zero in the other memory regions. Note further that intra-region references are much more frequent than inter-region references.
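The priority scheme of Figure 5 can be summarized as a pure dispatch function. TrapDispatcher and TrapAction are hypothetical names; each boolean argument stands for information the trap handler would have to derive itself (for example from the header color and the car bits), since, as footnote 12 notes, the hardware does not report which barrier fired.

```java
// Hypothetical actions corresponding to the labels of Figure 5.
enum TrapAction { EXTERNAL_ROOT, MEMORY_ACCESS_ERROR, EXPLORE_REGION_STACK, RETURN, GREY_OBJECT }

class TrapDispatcher {
    // refTrap: reference-based barrier fired (conditions A, B).
    // pageTrap: page-based barrier fired (conditions C, D, E).
    static TrapAction dispatch(boolean refTrap, boolean pageTrap,
                               boolean srcRed, boolean targetInHeap,
                               boolean srcScoped, boolean targetScoped) {
        if (refTrap && pageTrap) {                                   // priority 3
            if (!srcRed) return TrapAction.EXTERNAL_ROOT;            // A: region object -> heap object
            if (targetInHeap) return TrapAction.MEMORY_ACCESS_ERROR; // B.1
            return TrapAction.EXPLORE_REGION_STACK;                  // B.2
        }
        if (pageTrap) {                                              // priority 2
            if (srcScoped) return TrapAction.EXPLORE_REGION_STACK;   // C
            if (targetScoped) return TrapAction.MEMORY_ACCESS_ERROR; // D: heap/immortal -> scoped
            return TrapAction.RETURN;                                // E: heap <-> immortal is allowed
        }
        if (refTrap) return TrapAction.GREY_OBJECT;                  // priority 1: A inside the heap
        throw new IllegalStateException("dispatch called without a trap");
    }
}
```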
4. Implementation Issues

This section discusses the implementation of a memory management strategy for a Java environment aimed at wireless PDAs. The memory management we offer builds upon the RTSJ and the KVM, and integrates the aforementioned solutions for improving the performance of write barriers in garbage collection and region management.
4.1. Integration within the KVM
We are currently implementing the proposed memory management solution within the KVM in a way that complies with the RTSJ. The RTSJ defines the GarbageCollector abstract class, which we have specialized through an IncrementalGarbageCollector subclass. We have implemented this class within the KVM by modifying some files of the interpreter to support our real-time GC (i.e., garbage.c to implement the collector algorithm and interpreter.c to implement the write barriers, as well as native.h and nativeCore.c, which support the interface for the Java native methods). The incremental collector can be introduced into the system through the dynamic class loading facilities that the Class class offers (e.g., GarbageCollector gc = (GarbageCollector) Class.forName("IncrementalGarbageCollector").newInstance();).
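Selecting the collector by class name, as in the Class.forName example above, can be sketched in plain Java. GcStrategy, IncrementalStrategy, and CollectorLoader are hypothetical stand-ins for the RTSJ GarbageCollector class and our IncrementalGarbageCollector; getDeclaredConstructor().newInstance() is used because Class.newInstance() is deprecated on current JDKs.

```java
// Hypothetical stand-in for the RTSJ GarbageCollector abstraction.
interface GcStrategy {
    String name();
}

// Hypothetical stand-in for the IncrementalGarbageCollector subclass.
class IncrementalStrategy implements GcStrategy {
    public IncrementalStrategy() {}
    public String name() { return "incremental"; }
}

class CollectorLoader {
    // Load and instantiate a collector class chosen at run time by its binary name.
    static GcStrategy load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (GcStrategy) cls.getDeclaredConstructor().newInstance();
    }
}
```

Loading by name keeps the choice of collector out of the application code, so a different strategy can be plugged in without recompiling the callers.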
As discussed in the previous section, exploiting hardware aid is a significant source of performance improvement for memory management. We are thus implementing memory management so that it can run on picoJava-II. To make our implementation compliant with this microprocessor, we have modified the object header tag of the KVM as follows: GC_TAG <31:30>, SIZE_H <29:19>, CAR_MASK <18:14>, SIZE_L <13:7>, TYPE <6:2>, X <1>, and H <0>. We thus take six bits of the KVM SIZE <31:8> field (i.e., the maximum object size is reduced from 16 MBytes to 256 KBytes). Note the small average size of Java objects in the SPECjvm98 [11] applications (e.g., Jess 40 bytes, Db 31, Javac 36, Mtrt 25, and Jack 31). We also shrink the KVM TYPE <7:2> field to 5 bits, which is sufficient since only 20 types are handled. The seven freed bits are used as the GC_TAG <31:30> and CAR_MASK <18:14> fields of picoJava-II (i.e., these fields store the color of the object and its embedding memory region). The KVM mark bit, which the collector uses to mark objects, is no longer needed because objects are marked by color; this bit is therefore used to support the X bit of picoJava-II. Finally, the KVM static bit is now used as the H bit of picoJava-II. Since the collector does not move objects, handles are suppressed. This strategy increases the performance of both the application and the collector. The H bit is thus fixed to 0, and the core of picoJava-II accesses the object starting a word after the handle. Eliminating handles also saves a word of memory per object. Given the small average size of Java objects, this saving represents about 11% of the total dynamic memory space (i.e., 4/(33+4) × 100 ≈ 10.8, where 4 and 33 are respectively the handle size and the average size of Java objects in bytes). Since objects in memory regions are never moved, this strategy is always possible for them, even if the objects in the heap are accessed through a handle.
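The modified header layout can be packed and unpacked with shifts and masks. ObjectHeader is a hypothetical helper, and it assumes that SIZE_H holds the upper 11 bits and SIZE_L the lower 7 bits of an 18-bit size field; this matches the bit counts given above but is our reading of the field names, not something the KVM or picoJava-II documentation is quoted for here.

```java
// Hypothetical encoder for the modified KVM header:
// GC_TAG<31:30> SIZE_H<29:19> CAR<18:14> SIZE_L<13:7> TYPE<6:2> X<1> H<0>
final class ObjectHeader {
    private ObjectHeader() {}

    static final int MAX_SIZE = (1 << 18) - 1;  // 11 + 7 size bits

    static int pack(int gcTag, int size, int car, int type, boolean x, boolean h) {
        if (gcTag >>> 2 != 0 || car >>> 5 != 0 || type >>> 5 != 0 || size < 0 || size > MAX_SIZE) {
            throw new IllegalArgumentException("header field out of range");
        }
        int sizeH = size >>> 7;   // upper 11 bits of the size
        int sizeL = size & 0x7F;  // lower 7 bits of the size
        return (gcTag << 30) | (sizeH << 19) | (car << 14) | (sizeL << 7)
             | (type << 2) | ((x ? 1 : 0) << 1) | (h ? 1 : 0);
    }

    static int gcTag(int header) { return header >>> 30; }
    static int car(int header)   { return (header >>> 14) & 0x1F; }
    static int type(int header)  { return (header >>> 2) & 0x1F; }
    static int size(int header)  { return (((header >>> 19) & 0x7FF) << 7) | ((header >>> 7) & 0x7F); }
}
```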
4.2. Additional Considerations
Our solution requires configuring the write barriers of picoJava-II. Recall that write barriers are enabled when the GCE bit of the PSR register is set. The instruction set of picoJava-II provides extended bytecodes for accessing the PSR and GC_CONFIG registers (i.e., priv_read_psr, priv_write_psr, priv_read_gc_config, and priv_write_gc_config). The routines given in Figure 6 enable and disable write barriers, and Figure 7 shows how reference-based and page-based write barriers can be enabled to obtain our desired configuration. In this example, we have chosen 32 regions with a car size of 16 KBytes, and we have established the following color codes: %11, %10, %01, and %00 mean black, grey, red, and white respectively. The partition of the heap into cars is made transparent to the GC by using a mask (e.g., $FFE0 FFFF).

```
EnableWriteBarrier:
    priv_read_psr
    spush 0x1000          //The GCE bit is set
    seti  0x0000
    ior
    priv_write_psr
    ret

DisableWriteBarrier:
    priv_read_psr
    spush 0xEFFF          //The GCE bit is unset
    seti  0xFFFF
    iand
    priv_write_psr
    ret
```
Figure 6. Enabling and disabling barriers.

ConfigureWriteBarrier:
    spush 0x10B0              //Reference-based
    seti 0x001F               //Page-based
    priv_write_gc_config
    goto EnableWriteBarrier   //Set the GCE bit

Figure 7. Configuring write barriers.

This implementation is efficient but quite inflexible. The system must be configured to determine the memory map of the virtual regions. In addition, our solution requires the size of a region to be a multiple of the car size, which may introduce internal fragmentation. Finally, for a VTMemory scoped region, which can grow up to its maximum size, the additional memory must be assigned in whole cars. This constraint can be impractical for classes dealing with I/O-mapped memory (e.g., ScopedPhysicalMemory), whose constructors specify not only the size of the region but also its base address. Another problem with this solution is that it does not apply write barriers to native code. This may be addressed in one of two ways: (i) forcing native code to register its writes explicitly, or (ii) using virtual memory protection to detect and register changes. The latter needs further investigation, because combining real-time bounded collection with MMU-supported barriers is not trivial. In conclusion, this is an alternative implementation of memory regions that is less flexible but more efficient than the proposed software-based one. In general, the choice between the two solutions will depend on the behavior of the target application.
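To make the bit manipulation in Figures 6 and 7 explicit, the following Java sketch models the PSR and GC_CONFIG registers as plain integers. It is not picoJava-II code. It assumes that the GCE bit is the one selected by the 0x1000 mask used in Figure 6, and that the pair "spush lo; seti hi" leaves the 32-bit value (hi << 16) | lo on the stack, which is how the constants in the figures read. The class and method names are hypothetical.

```java
/**
 * Hypothetical software model of the register updates in Figures 6 and 7.
 * Assumptions: GCE is the PSR bit selected by mask 0x1000, and the pair
 * "spush lo; seti hi" yields (hi << 16) | lo.
 */
final class WriteBarrierRegs {
    static final int GCE_MASK = 0x0000_1000;

    private int psr;      // simulated PSR register
    private int gcConfig; // simulated GC_CONFIG register

    WriteBarrierRegs(int initialPsr) {
        this.psr = initialPsr;
    }

    /** Models "spush lo; seti hi": a 32-bit constant built from two 16-bit halves. */
    static int constant(int hi, int lo) {
        return (hi << 16) | (lo & 0xFFFF);
    }

    /** EnableWriteBarrier: PSR := PSR | 0x00001000 (sets GCE, keeps the other bits). */
    void enableWriteBarrier() {
        psr |= constant(0x0000, 0x1000);
    }

    /** DisableWriteBarrier: PSR := PSR & 0xFFFFEFFF (clears GCE, keeps the other bits). */
    void disableWriteBarrier() {
        psr &= constant(0xFFFF, 0xEFFF);
    }

    /** ConfigureWriteBarrier: load GC_CONFIG, then continue as EnableWriteBarrier. */
    void configureWriteBarrier() {
        // Figure 7 annotates the low half as reference-based and the high half as page-based.
        gcConfig = constant(0x001F, 0x10B0);
        enableWriteBarrier();
    }

    boolean barriersEnabled() {
        return (psr & GCE_MASK) != 0;
    }

    int psr() { return psr; }

    int gcConfig() { return gcConfig; }

    public static void main(String[] args) {
        WriteBarrierRegs regs = new WriteBarrierRegs(0x0000_00FF);
        regs.configureWriteBarrier();
        System.out.printf("GC_CONFIG=0x%08X PSR=0x%08X enabled=%b%n",
                regs.gcConfig(), regs.psr(), regs.barriersEnabled());
        regs.disableWriteBarrier();
        System.out.printf("PSR=0x%08X enabled=%b%n", regs.psr(), regs.barriersEnabled());
    }
}
```

Because enabling uses OR and disabling uses AND with the complemented mask, only the GCE bit changes and the rest of the PSR is preserved.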
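The color codes chosen for the example configuration amount to a small lookup table. The enum below is a hypothetical illustration that decodes a 2-bit value into the four colors listed for the example (%11 black, %10 grey, %01 red, %00 white); where those bits are stored in picoJava-II is not modeled here.

```java
/**
 * Illustrative decoding of the 2-bit color codes of the example configuration.
 * The enum and its method are hypothetical, not part of the picoJava-II interface.
 */
enum RegionColor {
    WHITE(0b00), RED(0b01), GREY(0b10), BLACK(0b11);

    final int code;

    RegionColor(int code) {
        this.code = code;
    }

    /** Decodes the two low-order bits of {@code bits}; every 2-bit value maps to a color. */
    static RegionColor fromCode(int bits) {
        int c = bits & 0b11;
        for (RegionColor color : values()) {
            if (color.code == c) {
                return color;
            }
        }
        throw new IllegalStateException("unreachable: all four codes are mapped");
    }
}
```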
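The car-granularity constraint, and the internal fragmentation it causes, can be quantified with a few lines of arithmetic. The sketch below assumes the 16 KB car size of the example configuration; the method names, and the treatment of a growable VTMemory region as simply acquiring extra whole cars, are illustrative assumptions.

```java
/**
 * Sketch of car-granular region sizing, assuming 16 KB cars.
 * Method names are hypothetical.
 */
final class CarSizing {
    static final long CAR_SIZE = 16 * 1024;

    /** Cars needed to hold {@code bytes}, rounded up to a whole car. */
    static long carsFor(long bytes) {
        return (bytes + CAR_SIZE - 1) / CAR_SIZE;
    }

    /** Unused bytes in the last car of a region of {@code bytes}. */
    static long internalFragmentation(long bytes) {
        return carsFor(bytes) * CAR_SIZE - bytes;
    }

    /**
     * Extra cars a growable (VTMemory-like) region must obtain to grow from
     * {@code currentBytes} to {@code requestedBytes}, bounded by {@code maximumBytes}.
     */
    static long extraCarsToGrow(long currentBytes, long requestedBytes, long maximumBytes) {
        if (requestedBytes > maximumBytes) {
            throw new IllegalArgumentException("request exceeds the region's maximum size");
        }
        return Math.max(0, carsFor(requestedBytes) - carsFor(currentBytes));
    }

    public static void main(String[] args) {
        long size = 20_000;
        System.out.println(size + " bytes -> " + carsFor(size) + " cars, "
                + internalFragmentation(size) + " bytes wasted");
        System.out.println("growing to 40000 bytes needs "
                + extraCarsToGrow(size, 40_000, 64 * 1024) + " more car(s)");
    }
}
```

For a 20,000-byte region, for instance, two cars (32,768 bytes) are reserved and 12,768 bytes are wasted.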
5. Conclusion

This paper has presented solutions for improving the performance of memory management in the RTSJ, addressing both garbage collection and region management. Our proposal builds upon existing work, since memory management in general, and garbage collection in particular, has long received a great deal of attention in the programming language and systems communities. The contribution of our work lies in the adaptation and integration of relevant solutions in the context of the RTSJ, based on an analysis of the parameters that most influence memory management performance. In addition, we have discussed the implementation of the resulting memory management solution within the KVM. We are currently finalizing our implementation, and our next study will assess the performance of our memory management system. Our solutions for improving memory management performance partly address the use of hardware aid by exploiting existing hardware support for Java. More generally, our study should be complemented with work on improving memory management performance at the hardware level, considering both hardware aid and the features of the underlying processor (e.g., the impact of garbage collection on cache management [8]).
References
[1] K. Ali. A Simple Generational Real-Time Garbage Collection Scheme. Computing Paradigms and Computational Intelligence (New Generation Computing), 16(2):201–221, December 1998.
[2] H. Baker. The Treadmill: Real-Time Garbage Collection without Motion Sickness. In Proc. of the Workshop on Garbage Collection in Object-Oriented Systems, OOPSLA'91, 1991. Also appears as SIGPLAN Notices 27(3), pages 66–70, March 1992.
[3] D. Cannarozzi, M. Plezbert, and R. Cytron. Contaminated Garbage Collection. In Proc. of the Conference on Programming Language Design and Implementation (PLDI), volume 35, pages 264–273. ACM SIGPLAN, May 2000.
[4] G. Bollella and J. Gosling. The Real-Time Specification for Java. IEEE Computer, June 2000.
[5] D. Gay and A. Aiken. Memory Management with Explicit Regions. In Proc. of the Conference on Programming Language Design and Implementation (PLDI), pages 313–323. ACM SIGPLAN, June 1998.
[6] D. Gay and B. Steensgaard. Stack Allocating Objects in Java. Technical report, Microsoft Research, 1998. In preparation.
[7] J Consortium, Inc. Core Real-Time Extensions for the Java Platform. Technical report, NewMonics, Inc., August 1999.
[8] J. Kim and Y. Hsu. Memory System Behavior of Java Programs: Methodology and Analysis. In Proc. of the ACM Java Grande 2000 Conference, June 2000.
[9] M.T. Higuera, V. Issarny, M. Banâtre, G. Cabillic, J.P. Lesot, and F. Parain. Java Embedded Real-Time Systems: An Overview of Existing Solutions. In Proc. of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), pages 392–399. IEEE, March 2000.
[10] A. Petit-Bianco and T. Tromey. Garbage Collection for Java in Embedded Systems. In Proc. of the IEEE Workshop on Programming Languages for Real-Time Industrial Applications, pages 59–67, December 1998.
[11] Standard Performance Evaluation Corporation. SPEC JVM98 benchmarks. Technical report, 1998. http://www.spec.org/osg/jvm98.
[12] Sun Microsystems. picoJava-II Programmer's Reference Manual. Technical report, March 1999. http://www.sun.com/microelectronics/picoJava.
[13] Sun Microsystems. KVM Technical Specification. Technical report, Java Community Process, May 2000.
[14] The Real-Time for Java Expert Group. Real-Time Specification for Java. Technical report, RTJEG, June 2000. http://www.rtj.org.
[15] P. Wilson and M. Johnstone. Real-Time Non-Copying Garbage Collection. In ACM OOPSLA Workshop on Garbage Collection and Memory Management, September 1993.
[16] B. Zorn. Barrier Methods for Garbage Collection. Technical report CU-CS-494-90, Department of Computer Science, University of Colorado at Boulder, November 1990.