Adding Hardware Support to the HotSpot Virtual Machine for Domain Specific Applications

Yajun Ha (1), Radovan Hipik (2), Serge Vernalde (1), Diederik Verkest (1), Marc Engels (1), Rudy Lauwereins (1), and Hugo De Man (1)
(1) IMEC, Kapeldreef 75, Leuven 3001, Belgium, [email protected]
(2) Department of Microelectronics, FEI STU, Ilkovičova 3, 81219 Bratislava, Slovakia
Abstract. Like real general-purpose processors, Java Virtual Machines (JVMs) need hardware acceleration for computationally intensive applications. JVMs, however, require that platform independence be maintained while resorting to hardware acceleration. To this end, we invented a scheme to seamlessly add hardware support to Sun's HotSpot JVM. By means of run-time profiling, we select the most heavily used Java methods for execution in Field Programmable Gate Array (FPGA) hardware. Methods running in hardware are designed at compile-time, but their bitstreams are generated at run-time to guarantee platform independence. If no method improves performance by running in hardware, all Java methods can still run in software with trivial run-time overhead. We have implemented this hardware supported JVM. The results show that hardware acceleration for JVMs can be achieved while maintaining platform independence for domain specific applications.
1 Introduction

Java applications doing high-throughput streaming computations require hardware acceleration while maintaining the traditional Java platform independence [1]. Different approaches have been explored in previous research, but they are constrained either by the limited parallelism they can exploit or by their ignorance of platform independence issues. We invented a scheme to seamlessly add hardware support to the current HotSpot JVM while maintaining platform independence at the same time. We call this the Hard-HotSpot JVM in the remainder of this text. By doing run-time profiling, we use the previously adaptively compiled Java methods as candidates, and select the most heavily used Java methods from these candidates to run in FPGA hardware. The bitstreams for those methods are generated at run-time. As a result, we end up with a hardware supported HotSpot JVM, where large portions of the bytecodes are executed in the interpreted mode, small portions are executed in the compilation mode, and even smaller portions of heavily used bytecodes are executed in the hardware mode.
The hardware design for methods in the hardware mode is done at compile-time, but the generation of the bitstreams is delayed until run-time, which guarantees that the same design can be executed on different platforms.

This paper is organized as follows. Section 2 gives an overview of the Hard-HotSpot JVM. Section 3 explains run-time bitstream generation. Finally, Section 4 gives experimental results for an application running on the Hard-HotSpot JVM.
2 The Hard-HotSpot JVM

This section gives an overview of the Java method classification, the programming model, and the execution sequence of the Hard-HotSpot JVM.
(a) Example Java code that uses hardware support:

    method_a() {
        boolean is_to_hw;
        if (is_to_hw) {
            // interface calls to HW
        } else {
            // SW algorithm A code
        }
    }

    HW_method_a() {
        // JBits code for SW algorithm A
    }

(b) Sequence to execute Java code in the hardware supported HotSpot VM:

    is_to_hw = FALSE;
    while (program is running) {
        execute the software code;
        do run-time profiling;
        find candidate methods;
        if (HW execution will gain) {
            generate bitstream;
            configure programmable HW;
            is_to_hw = TRUE;
        }
    }

Fig. 1. (a) Example Java code that uses hardware support. (b) Sequence to execute Java code in the hardware supported HotSpot VM.
Java methods running on the Hard-HotSpot JVM are classified into normal ones and computationally intensive ones. This classification is done at compile-time with the help of profiling information. No extra work needs to be done for the normal methods, but for the computationally intensive methods, both software and hardware implementations must be provided. To keep the hardware description platform independent, the hardware implementation of computationally intensive methods is captured by using the JBits API, a Java package that can be used to generate FPGA bitstreams at run-time.

The Hard-HotSpot JVM uses a simple programming model. For normal methods, Java programming does not change in any way. For the computationally intensive methods, some programming style rules should be followed. For example, in Fig. 1(a), if method_a is one of the computationally intensive candidates identified at design-time, there should be a corresponding method HW_method_a, which contains the JBits code that describes the hardware implementation of method_a. The Hard-HotSpot JVM decides at run-time whether to execute HW_method_a or not; it will only execute it if performance can be gained by running the method in hardware. is_to_hw is a run-time flag, which is set by the Hard-HotSpot JVM at run-time to switch the control flow from software to hardware.

The Hard-HotSpot JVM executes programs with the algorithm shown in Fig. 1(b). It first runs the bytecodes in the interpretation mode, gives the run-time flag is_to_hw an initial value of FALSE, and executes the software implementation of the methods. At the same time, the Hard-HotSpot JVM does run-time profiling to identify the candidates to be dynamically compiled to native methods. Based on these native methods, the Hard-HotSpot JVM chooses the most computationally intensive native methods as the candidates to be migrated to hardware, and evaluates their SW execution time and HW execution time. If the evaluation shows that HW will gain and all the other run-time conditions are met, the bitstream is generated from the JBits code. The bitstream is then used to configure the programmable hardware. Once the configuration is done, the run-time flag is_to_hw is turned to TRUE by the virtual machine. From then on, the method is executed in hardware. Run-time profiling continues, and dynamically chooses the methods that should be executed in hardware.
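To make the programming model concrete, the sketch below fleshes out the Fig. 1(a) pattern for a hypothetical computationally intensive method. The names method_a and HW_method_a follow the figure; the dot-product body standing in for "SW algorithm A" and the helper callHardware() are illustrative assumptions, not part of the Hard-HotSpot VM, and the JBits construction code is only indicated by a comment.

    // Hypothetical, fleshed-out version of the Fig. 1(a) pattern.
    public class Example {

        // Run-time flag, set to true by the VM once the bitstream for
        // HW_method_a has been generated and the FPGA has been configured.
        static boolean is_to_hw = false;

        // Computationally intensive method: hardware dispatch plus software path.
        static long method_a(int[] x, int[] y) {
            if (is_to_hw) {
                return callHardware(x, y);      // interface calls to HW
            } else {
                long acc = 0;                   // SW algorithm A code
                for (int i = 0; i < x.length; i++) {
                    acc += (long) x[i] * y[i];
                }
                return acc;
            }
        }

        // Companion method containing the hardware description of method_a;
        // in the paper's scheme this holds JBits code that generates the
        // bitstream for algorithm A at run-time.
        static void HW_method_a() {
            // JBits code for SW algorithm A (omitted)
        }

        // Hypothetical interface to the configured FPGA.
        static long callHardware(int[] x, int[] y) {
            throw new UnsupportedOperationException("HW interface not shown");
        }
    }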
3 Run-Time Bitstream Generation
[Fig. 2 block diagram: a Logic Core Lib (Core 1 to Core 4), a Physical Core Lib, and a Control HVM that produces the FPGA bitstream, split between compile time and run time.]

Fig. 2. Run-time bitstream generation.
Run-time bitstream generation is a key step to ensure platform independence. The Hard-HotSpot JVM implements run-time bitstream generation in the framework shown in Fig. 2. In this framework, various logic cores and physical cores are implemented. The logic cores capture the functionality of cores such as multipliers and FIR filters, while the physical cores give the real, platform specific implementation of their corresponding logic cores. The logic cores give designers a high level means to capture architectures at compile-time, while the physical cores give the Hard-HotSpot JVM the executable Java code needed to generate bitstreams at run-time. Running Java code to generate bitstreams is enabled by the JBits Java API developed at Xilinx, which also provides run-time routing support.
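The sketch below illustrates one way the logic-core/physical-core split could be expressed in Java. The LogicCore and PhysicalCore interfaces, the Bitstream placeholder, and the BitstreamGenerator class are hypothetical stand-ins for illustration only; they do not reproduce the actual JBits core classes.

    // Hypothetical sketch of the logic-core / physical-core split.

    // A logic core captures the platform independent functionality of a block
    // (e.g. a multiplier or FIR filter) at compile-time.
    interface LogicCore {
        String name();
        int inputWidth();
        int outputWidth();
    }

    // A physical core knows how to implement a given logic core on one concrete
    // FPGA family and writes its configuration into the bitstream at run-time.
    interface PhysicalCore {
        boolean implementsCore(LogicCore core);
        void place(LogicCore core, int row, int col, Bitstream bits);
    }

    // Minimal bitstream abstraction used by the sketch.
    class Bitstream {
        // frames of configuration data for the target device
    }

    // Control part of the run-time generator: for every logic core in the
    // design, look up a matching physical core for the current platform and
    // let it emit its part of the configuration.
    class BitstreamGenerator {
        private final java.util.List<PhysicalCore> physicalCoreLib;

        BitstreamGenerator(java.util.List<PhysicalCore> physicalCoreLib) {
            this.physicalCoreLib = physicalCoreLib;
        }

        Bitstream generate(java.util.List<LogicCore> design) {
            Bitstream bits = new Bitstream();
            int row = 0;
            for (LogicCore core : design) {
                for (PhysicalCore phys : physicalCoreLib) {
                    if (phys.implementsCore(core)) {
                        phys.place(core, row++, 0, bits);
                        break;
                    }
                }
            }
            return bits;
        }
    }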
4 Experimental Results

We implemented the Hard-HotSpot JVM based on Sun's HotSpot 1.3.1 community source Linux version, which is a multi-threaded C++ implementation. Apart from the existing threads, we added a separate thread, HWCompilerThread, to handle all management work related to the hardware support. The Hard-HotSpot JVM uses the JBits version 2.8 package.

To check the pure overhead of the run-time evaluation on the JVM, part of the size A benchmarks of the Java Grande Forum Benchmark Suite [2] have been run on the platform. In this experiment, the hardware overhead time is intentionally set to a very high value, so that almost no method is chosen to run in hardware and the pure run-time evaluation overhead is measured. The figures are shown in Table 1, and they show that the overhead of the run-time decision on hardware methods is trivial. This result is important, since it shows that even in the worst case, all methods can still run in software with low overhead.
                  SeriesA  LUFactA  SORA    HeapSortA  CryptA  FFTA    SparseA
HotSpot (s)       60.714   4.422    23.753  5.327      14.027  43.164  25.816
Hard-HotSpot (s)  61.268   4.623    24.078  5.419      14.713  43.335  26.152

Table 1. Pure run-time evaluation overheads for the Java Grande Forum Benchmarks.
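A minimal sketch of the kind of run-time trade-off check described in Section 2 is given below. The class and field names (HwCandidate, swTimeNanos, hwTimeNanos, hwOverheadNanos) are assumptions made for illustration; they are not the actual HWCompilerThread implementation.

    // Illustrative sketch of the run-time HW/SW trade-off check.
    class HwCandidate {
        String methodName;
        long swTimeNanos;     // profiled software execution time per call
        long hwTimeNanos;     // estimated hardware execution time per call
        long expectedCalls;   // expected remaining invocations
    }

    class HwDecision {
        // One-time cost of JBits bitstream generation plus FPGA configuration.
        // Setting this very high (as in the overhead experiment of Table 1)
        // means essentially no method is moved to hardware.
        static long hwOverheadNanos = 500000000L;

        static boolean shouldMoveToHardware(HwCandidate c) {
            long swTotal = c.swTimeNanos * c.expectedCalls;
            long hwTotal = c.hwTimeNanos * c.expectedCalls + hwOverheadNanos;
            return hwTotal < swTotal;
        }
    }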
A grayscale video player applet that converts color video clips to grayscale has also been tested on the Hard-HotSpot JVM. The clock for the bitstream of the grayscale filter is simulated with BoardScope to be 98 MHz. The performance figures for the grayscale filter running on the Hard-HotSpot JVM show that a gain of about 11 seconds is obtained when the grayscale filter runs in hardware for a 100 second video clip. Bitstream generation accounts for the major part of the overhead, and it would be helpful to reduce this overhead further.
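For reference, a straightforward software version of such a color-to-grayscale conversion is sketched below. It is a generic per-pixel luminance computation with approximate ITU-R BT.601 weights, shown only to indicate the kind of computation the accelerated filter performs; it is not the actual filter or JBits core used in the experiment.

    // Generic ARGB-to-grayscale conversion, illustrative only.
    public class Grayscale {
        static void toGray(int[] argbFrame) {
            for (int i = 0; i < argbFrame.length; i++) {
                int p = argbFrame[i];
                int r = (p >> 16) & 0xFF;
                int g = (p >> 8) & 0xFF;
                int b = p & 0xFF;
                // approx. 0.299 R + 0.587 G + 0.114 B in fixed point
                int y = (r * 77 + g * 151 + b * 28) >> 8;
                argbFrame[i] = (p & 0xFF000000) | (y << 16) | (y << 8) | y;
            }
        }
    }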
5 Conclusions and Acknowledgments

We implemented a new JVM that enables hardware acceleration and platform independence for Java methods at the same time. As an important feature, the new JVM does not degrade the original JVM performance, because even in the worst case, all Java methods can still run in software. Experimental results show that the run-time evaluation overheads are trivial.

The authors would like to thank Tim Price and Xilinx Labs for providing the grayscale filter JBits core and allowing us to use it in this paper.
References

1. Y. Ha, S. Vernalde, P. Schaumont, M. Engels, R. Lauwereins, and H. De Man: Building a virtual framework for networked reconfigurable hardware and software objects. The Journal of Supercomputing 21 (2002) 131–144
2. Java Grande Forum Benchmark Suite. http://www.epcc.ed.ac.uk/javagrande