Comparison of Factors Influencing Bytecode Usage in the Java Virtual Machine. John Waldrony, Charles Daly, David Gray and Jane Horgan Computer Applications, Dublin City University, Dublin 9, Ireland. Email: y
[email protected] WWW: y http://www.compapp.dcu.ie/jwaldron
Abstract The purpose of this paper is to look at factors which influence bytecode usage in the Java Virtual Machine by Java programs, such as the number of Application Programming Interface (API) methods executed, and the compiler that was used to generate the class files. The dynamic percentages of bytecode instructions and non-native method calls in both API and program methods used by a selection of different programs were studied to compare the way different applets and applications use the bytecodes. It was found that 74:1% of bytecodes executed and 67:1% of Java methods are in the API for the programs studied. This suggests a way to improve the speed of Java programs would be to compile the API methods to native instructions and save these on disk in a standard format, cutting the time spent compiling or interpreting bytecode. Non-API program methods only used 18:1 bytecodes on average when executed and Java API methods on average 23:8 bytecodes. One interesting factor to note is that more static methods were invoked than virtual methods by the non-API bytecodes. Remarkably very similar bytecode frequencies were obtained using the different compilers because many traditional optimisations performed by compilers cannot be used when converting a Java program to bytecode.
1. Introduction Java has grown in popularity since its release in 1996 and is particularly interesting because it uses a bytecode intermediate language to represent programs, so that the same program can be run unchanged on machines with different underlying instruction sets. Dynamic frequencies for the execution of the bytecodes in the instruction set by a test suite of real programs was recently measured (Waldron 1999) to compare the way different applets and applications use the bytecodes, and it was found that very similar patterns of usage appear in all cases. For the test suite studied most of the bytecodes were used at least once during execution. However a small subset of the bytecodes was executed with very high frequency; 40% of the instructions executed either pushed local variables or constants onto the operand stack, merely telling the useful instructions which operands to use. The purpose of this paper is to extend this work by studying the factors which influence overall bytecode usage, such as the number of Application Programming Interface API methods executed, and the compiler used to generate the class files. In Section 2 we discuss the Java Virtual Machine (JVM) bytecodes. Section 3 describes the suite of programs chosen to investigate usage of the bytecodes by real programs. Section 4 gives the bytecode frequencies over the five programs and compares usage between programs and between API and non-API methods
and Section 5 looks at the dynamic bytecode usages produced using different compilers. A summary of the results is given in Section 6.
2. Bytecode Frequency The dynamic frequency of an instruction is the number of times it is executed during a program run. The static bytecode frequency, which is the number of times a bytecode appears in a class file or program has been studied in (Antonioli and Pilz 1998) where a wide difference was found between the bytecodes appearing in different class files, and each class file used on average 25 different bytecodes. Java originated in 1991 as the OAK project aiming to produce a software development environment for small distributed embedded systems. The design involves compiling to a machine independent instruction set similar to UCSD Pascal (Bowles 1978). The architecture is a stack machine with almost no addressing modes, a highly regular instruction set, and very dense instruction encoding (1.9 bytes per instruction for the programs studied in (Antonioli and Pilz 1998)). The instruction set of the Java Virtual Machine is fully documented in (Lindholm and Yellin 1996). The Java stack is divided into frames and there is one frame per method invocation which has its own local registers. The Java instruction set contains an unusual amount of type information, and there are restrictions on the use of the operand stack, so that at every code point each slot in the stack and each local variable has a type. The type can be determined statically by a bytecode verifier, which means that stack underflow and overflow, the types of all parameters to instructions and object field references, and potential illegal conversions can be checked statically. The type information in the bytecode representation should also help a just in time (JIT) compiler in translating programs into native machine code (Gosling 1995). The simplest way the Virtual Machine executes a Java program is by interpreting the bytecodes. Performance can be improved by using a JIT compiler to produce native machine code at run time. In order to achieve a performance competitive with the code produced by a native compiler however, it would be necessary to apply the same code-generation techniques, which seems too expensive for JIT compilation. Even though the Java platform dynamically links and loads classes as they are needed, most execution time is spent in a small portion of the code (Armstrong 1998). The hotspot Virtual Machine design uses dynamic compilation to only convert the most heavily used parts of a program to native machine instructions, and interprets the bulk of the code, avoiding the overhead of compiling code that is infrequently or never executed. In order to study dynamic bytecode usage it was necessary to modify the source code of a Java Virtual Machine. Kaffe (www.kaffe.org) is an independent implementation of the Java Virtual Machine which was written from scratch and is free from all third party royalties and license restrictions. It comes with its own standard class libraries, including Beans and Abstract Window Toolkit (AWT), native libraries, and a highly configurable virtual machine with a JIT compiler for enhanced performance. Kaffe is available under the Open Source Initiative and comes with complete source code, distributed under the GNU Public License.
3. Programs Measured A selection of different programs was studied to compare the way different applets and applications use the bytecodes. The atomic applet symbolises nuclear forces at work. It is a simple applet showing an animated atom with whizzing electrons. Lots of configuration options make it easy to customise. The fireworks applet displays a number of fireworks rockets and animates them according to a set of user definable attributes. Jasmin (Meyer and Downing 1997) is a Java Assembler Interface. It takes ASCII descriptions for Java classes, written in a simple assembler-like syntax using the Java Virtual Machine instruction set. It converts them into binary Java class files suitable for loading into a Java Virtual Machine implementation. Jasmin itself is written in Java. Jasmin was measured assembling Count.j one of the example programs distributed with it. Java Compiler Compiler (www.suntest.com/JavaCC) is currently the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognise matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building actions, debugging, etc. JJTree, the tree builder, has now become popular and many users have started using JJTree along with JavaCC. The bytecodes JavaCC used while running javacc SPL.jj were measured. SPL (Stupid Programming Language) is one of the example programs distributed with JavaCC. JJTree was also measured while running jjtree SPL.jjt.
4. Instruction Execution Frequencies In (Waldron 1999) the total bytecode usage for the programs above was given. In order to fully study the dynamic execution of bytecodes, those used within API methods were measured separately from those in the program’s methods. Table 1 compares the dynamic percentages of instructions and non-native method calls in the API and the non-API program classes for the programs studied. It can be seen that 74:1% of bytecodes executed and 67:1% of Java methods are in the API. As shown in (Waldron and Harrison 1999) java/lang, java/awt and java/util contain the API methods most frequently executed by the programs. Table 1 implies that the behaviour of these methods will dominate the overall behaviour of a program. One obvious suggestion to improve the speed of Java programs would be to compile the API methods to native instructions and save these on disk in a standard format, cutting the time spent interpreting, which should give a dramatic increase in performance. Table 1 shows that on average each Java method in the programs (i.e. not in the API) only used 18:1 bytecodes when executed and that Java API methods executed on average 23:8 bytecodes. Table 2 shows non-API dynamic instruction execution frequencies for the atomic, fireworks, jasmin, JavaCC, and JJtree programs. The unabridged version of the instruction frequencies given in Table 2 is available at authors web page. The last column in Table 2 shows the average of the five benchmarks 1 5 100 ik fi = 5 k=1 256
i=1 ik
XP
where ik is the number of times bytecode i is executed during the execution of program k .
fi
is
atomic fire works jasmin JJTree JavaCC average
API non-native bytecodes bytecodes method % % per method 43.5 79.8 37.7 72.8 64.9 24.1 86.4 89.7 22.4 61.6 63.9 15.5 71.3 72.4 19.1 67.1 74.1 23.8
atomic fire works jasmin JJTree JavaCC average
non-API non-native bytecodes bytecodes method % % per method 56.5 20.2 7.3 27.2 35.1 34.8 13.6 10.3 16.4 38.4 36.1 14.0 28.7 27.6 18.2 32.9 25.9 18.1
Table 1: Comparison of dynamic percentages of instructions and non-native method calls by methods in the API and in the program classes. an approximation of that bytecode’s usage for a typical program. The Java bytecodes selected for inclusion in this Table 2 were the 35 most commonly used bytecodes. One interesting factor to note is that more static methods are invoked than virtual by the non-API bytecodes, although the actual methods may be in the API. For example, inspection of the source code of the atomic applet shows there are no static methods, so the invokestatics must be calling API methods. Table 3 shows the dynamic execution frequencies within API methods, which as shown in Table 1, accounts for 74.1% of instructions executed. In contrast to the non-API bytecodes, invokevirtual is used more frequently than invokestatic within the API methods. Total (API and non-API) dynamic bytecode execution frequencies are given in Table 4. For the purposes of this study, the 202 bytecodes can be split into 22 categories as shown in Table 5. By assigning those instructions that behave similarly into groups it is possible to describe clearly what is happening. Table 2, Table 3 and Table 4 are summarised in Figure 1, Figure 2 and Figure 3 respectively where the breakdown of API, non-API and total (API + nonAPI) are given. It is clear from these charts that the execution patterns for each category are similar in all five programs. This is confirmed in the correlations given in Table 6, which gives a numeric estimation of the similarity in the usage of each instruction category by each pair of programs. Table 6 shows strong positive associations between the programs with respect to execution frequencies in corresponding categories. While all correlations are significantly greater than zero, those for the API are higher than the corresponding correlations for the nonAPI, which is to be expected as the programs are undertaking different activities.
Instruction aload 0 getfield getstatic invokestatic dup iload 1 iconst 1 iconst 2 isub putstatic invokevirtual dmul bipush ifeq ldc2w iload i2d putfield dload 3 ireturn iload 0 iadd aload 1 if icmpne iconst 0 aload iload 2 ineg goto d2i if icmplt return invokespecial ldc1 istore
atom 9.6 9.4 0.0 13.4 0.0 8.8 0.0 9.4 9.3 0.0 0.2 8.8 0.0 0.0 8.8 0.0 8.8 0.0 8.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0
fire 27.8 27.6 0.0 0.0 6.4 0.0 1.9 0.0 0.0 0.0 2.8 1.8 0.0 0.0 0.0 0.0 0.0 6.6 0.0 0.0 0.0 1.9 2.8 0.0 0.4 0.0 1.1 0.0 0.1 3.7 1.1 0.9 0.0 0.0 0.0
jas 11.0 5.8 0.2 0.1 4.9 3.3 2.5 2.9 0.9 0.0 5.4 0.0 4.4 0.0 0.0 7.4 0.0 0.6 0.0 0.7 1.2 1.3 0.3 1.3 1.6 4.9 2.3 0.3 1.1 0.0 1.2 0.3 2.4 1.3 2.0
jcc 4.4 3.8 18.3 4.7 3.0 0.6 4.2 0.1 0.6 5.6 0.9 0.0 2.9 5.6 0.5 1.3 0.0 0.9 0.0 3.6 3.2 2.2 0.9 2.4 2.0 0.0 0.8 0.0 1.7 0.0 0.7 1.0 0.3 1.0 1.0
fi 11.5 10.2 7.2 4.6 3.4 2.6 2.5 2.5 2.3 2.2 2.2 2.1 2.1 2.0 1.9 1.9 1.8 1.8 1.8 1.6 1.5 1.5 1.3 1.2 1.1 1.0 1.0 0.9 0.9 0.7 0.7 0.7 0.7 0.7 0.7
jtt 4.5 4.2 17.5 4.8 2.7 0.5 4.0 0.0 0.6 5.4 1.6 0.0 3.0 4.5 0.3 0.9 0.0 1.0 0.0 3.8 3.0 2.0 2.3 2.1 1.6 0.3 1.0 0.0 1.8 0.0 0.7 1.2 0.7 1.1 0.7
Table 2: Non-API dynamic bytecode execution frequencies. object_fields local_load
push_const others array_store array_load method_return arithmetic
conversion local_store
method_invoke
stack conditional_branch
Figure 1: Non-API dynamic bytecode execution frequencies by category.
Instruction aload 0 getfield iload 1 iload 2 aload 1 invokevirtual iload 3 iload iadd return dmul iconst 0 ireturn putfield invokestatic if icmpge goto iinc iconst 1 bipush if icmplt invokespecial caload dup i2d if icmpne areturn ldc2w aload arraylength if icmple istore 3 d2i aconst null dload
atom 7.5 3.7 2.6 4.5 1.2 4.4 2.1 7.9 2.6 1.0 10.2 0.5 3.8 0.6 4.2 2.2 1.1 0.9 0.2 1.4 0.1 0.3 0.3 0.2 4.0 0.2 0.2 4.6 0.5 0.2 0.1 0.2 3.4 0.3 3.4
fire 17.8 16.4 3.5 5.2 5.5 2.9 3.5 1.6 1.2 3.0 0.1 0.8 1.9 2.7 2.4 0.7 0.8 0.8 1.9 0.8 1.3 1.0 0.4 1.0 0.9 1.0 0.3 0.2 0.4 0.4 0.2 0.6 0.0 1.1 0.1
jas 16.2 14.1 4.6 2.8 1.9 3.2 3.0 2.3 3.9 1.5 0.0 1.8 2.1 2.5 0.4 2.6 2.6 2.1 2.3 1.7 0.9 0.9 2.4 1.5 0.0 1.6 1.4 0.0 2.0 0.9 0.5 1.6 0.0 0.8 0.0
jcc 13.9 11.9 5.2 4.3 4.2 3.4 4.3 3.1 3.1 3.1 0.0 3.3 1.1 1.7 1.0 1.7 2.2 2.0 1.5 1.3 1.8 1.5 1.6 1.2 0.0 1.6 1.3 0.0 1.3 1.4 1.4 0.7 0.0 0.4 0.0
jtt 16.0 12.3 4.2 3.3 5.9 4.1 4.7 1.9 2.6 4.0 0.0 4.0 1.1 2.3 1.5 1.0 1.5 1.2 1.2 1.1 1.9 2.1 1.1 1.4 0.0 0.4 2.0 0.0 0.9 1.4 1.9 0.7 0.0 0.8 0.0
fi 14.3 11.7 4.0 4.0 3.7 3.6 3.5 3.4 2.7 2.5 2.1 2.1 2.0 2.0 1.9 1.6 1.6 1.4 1.4 1.3 1.2 1.2 1.2 1.1 1.0 1.0 1.0 1.0 1.0 0.9 0.8 0.8 0.7 0.7 0.7
Table 3: API dynamic bytecode execution frequencies. local_load
object_fields
others array_manage misc stack unconditional_branch
conditional_branch
conversion array_load push_const local_store arithmetic
method_return method_invoke
Figure 2: API dynamic bytecode execution frequencies by category.
Instruction aload 0 getfield iload 1 invokevirtual iload 2 aload 1 iload iload 3 getstatic invokestatic iadd dmul ireturn return putfield iconst 0 iconst 1 dup goto if icmpge bipush iinc ldc2w i2d if icmpne ifeq if icmplt areturn invokespecial aload iload 0 caload isub d2i istore 3
atom 7.9 4.8 3.8 3.6 3.6 1.0 6.3 1.7 0.1 6.1 2.1 9.9 3.0 0.8 0.5 0.4 0.2 0.2 0.8 1.7 1.1 0.7 5.5 4.9 0.2 0.2 0.1 0.1 0.3 0.4 1.9 0.3 1.9 2.7 0.2
fire 21.3 20.3 2.3 2.9 3.7 4.6 1.0 2.3 0.1 1.5 1.4 0.7 1.2 2.3 4.1 0.7 1.9 2.9 0.5 0.8 0.5 0.5 0.1 0.6 0.7 0.4 1.2 0.2 0.6 0.2 0.3 0.3 0.1 1.3 1.1
jas 15.7 13.2 4.5 3.5 2.8 1.8 2.8 2.9 0.1 0.3 3.6 0.0 2.0 1.4 2.3 1.8 2.3 1.9 2.5 2.4 2.0 2.1 0.0 0.0 1.6 0.6 0.9 1.3 1.0 2.3 0.1 2.1 0.5 0.0 1.5
jcc 11.3 9.7 3.9 2.7 3.3 3.3 2.6 3.2 5.5 2.0 2.8 0.0 1.8 2.5 1.5 2.9 2.3 1.7 2.0 1.4 1.8 1.6 0.1 0.0 1.8 1.7 1.5 1.0 1.2 1.0 1.0 1.2 0.6 0.0 0.5
jtt 11.9 9.4 2.9 3.2 2.5 4.6 1.5 3.2 6.5 2.7 2.4 0.0 2.1 3.0 1.8 3.1 2.2 1.9 1.6 0.8 1.8 1.0 0.1 0.0 1.0 1.9 1.5 1.7 1.6 0.7 1.2 0.7 0.9 0.0 0.5
fi 13.6 11.5 3.5 3.2 3.2 3.1 2.8 2.7 2.5 2.5 2.5 2.1 2.0 2.0 2.0 1.8 1.8 1.7 1.5 1.4 1.4 1.2 1.2 1.1 1.1 1.0 1.0 0.9 0.9 0.9 0.9 0.9 0.8 0.8 0.8
Table 4: Total (API and non-API) dynamic bytecode execution frequencies. local_load
object_fields
others conditional_branch
array_manage misc unconditional_branch array_load stack
push_const
conversion local_store arithmetic
method_return method_invoke
Figure 3: Total (API and Non-API) dynamic bytecode execution frequencies by category.
Category misc push const local load array load local store array store stack arithmetic logical shift logical boolean conversion comparison conditional branch unconditional branch subroutine table jump method return object fields method invoke object manage array manage monitor
Number 5 20 25 8 25 8 9 24 6 6 15 5 16 2 3 2 6 4 4 3 4 2
Bytecodes nop,iinc,athrow,wide,breakpoint 1-20 21-45 46-53 54-78 79-86 87-95 96-119 120-125 126-131 133-147 148-152 153-166,198,199 goto,goto w jsr,ret,jsr w tableswitch,lookupswitch 172-177 178-181 182-185 new,checkcast,instanceof 188-190,197 monitorenter,monitorexit
Table 5: Categories of Java bytecodes.
fire works jasmin JavaCC JJTree
non-API atomic fire works jasmin JavaCC 0.590 0.763 0.681 0.454 0.798 0.533 0.480 0.818 0.557 0.996
fireworks jasmin JavaCC JJTree
API atomic fire works jasmin JavaCC 0.800 0.840 0.987 0.849 0.988 0.992 0.856 0.986 0.986 0.997
Table 6: Correlation between bytecode category execution frequency for API and non-API.
5. Bytecodes Produced by Different Compilers An interesting question is to what extent the choice of compiler influenced the bytecodes executed by the Java Programs under scrutiny. Source code was not available for JavaCC or JJTree, so it was not possible to fully investigate this. Table 7 gives non-API dynamic bytecode execution frequencies for the Jasmin assembler application compiled by Borland’s Jbuilder2, Microsoft Visual J++ version 6.0 and and Sun’s JDK 1.1.7 compilers. Bytecodes executed in API methods were excluded from Table 7 as the API libraries were not recompiled. Remarkably similar frequencies were obtained using the different compilers. Table 8 gives API dynamic bytecode execution frequencies for the Jasmin assembler application when the program files were compiled with different compilers. The API was not recompiled. The bytecodes used where very similar, implying that the compilers are calling the same API methods, as would be expected. An interesting problem was posed when trying to look at the bytecodes used when the atomic applet was compiled by different compilers since the applet is an infinite loop that runs until exited, making it impossible to reproduce executions exactly. One solution would be to stop the virtual machine after a certain number of instructions, but this would be unfair if one complier was more efficient and generated fewer bytecodes, so the source code of the applet was altered to rotate the atom a precise number of times and then exit. Table 9 shows non API dynamic bytecode execution frequencies for the atomic applet compiled with Borland’s Jbuilder2, Microsoft Visual J++ version 1.0, Sun’s JDK 1.1.7 and JDK 1.2, and Pizza. The bytecodes executed were almost identical in all cases, again implying the main difference between Java compilers may be the speed of compilation. The similarity between bytecode usage by different compilers may be because many traditional optimisations performed by compilers cannot be used when converting a Java program to bytecode. For example, there are no choices about registers to use, and it is very difficult to optimise method calls. According to (Javasoft White Paper) most method invocations in the Java programming language are virtual (potentially polymorphic), which are much less frequent in C++. The data in Table 2 show higher dynamic execution frequencies for invokestatic than for invokevirtual, questioning this assumption for the programs studied. If most method invocations are assumed to be virtual, method invocation performance is more dominant, and static compiler optimisations (especially global optimisations like inlining) are much harder to perform. Traditional optimisations are often most effective between calls, and the decreased distance between calls in the Java programming language can significantly reduce the effectiveness of such optimisations, since they have smaller sections of code to work with. Java programs can change on-the-fly due to the ability to perform dynamic loading of classes, making it far more difficult to perform many types of global optimisation, since the compiler must not only be able to detect when these optimisations become invalid due to dynamic loading, but also must be able to undo and/or redo those optimisations during program execution, even if they involve active methods on the stack, without compromising or impacting Java program execution semantics in any way. Other difficulties arise in optimisation because the Java programming language is dynamically safe (Lindholm and Yellin 1996), meaning that it is ensured that programs do not violate the language semantics, or directly access unstructured memory. This means dynamic type-tests must frequently be performed when casting, and when storing into object arrays. The Java
Instruction aload 0 iload getfield invokevirtual dup aload bipush iload 1 saload sastore iconst 2 iconst 1 invokespecial iload 2 istore iinc iload 3 iconst m1 iconst 0 arraylength imul if icmpne iadd goto istore 3 if icmpeq if icmple aload 2 aastore isub if icmplt ldc1 iload 0 idiv if icmpge Total
Jbuilder2 10.6 7.2 5.7 5.2 5.0 4.8 4.4 3.2 2.9 3.1 2.8 2.5 2.3 2.2 2.0 1.7 1.7 1.8 2.5 1.6 1.3 1.2 1.3 1.6 1.2 1.1 0.8 1.0 1.0 0.8 0.0 1.5 1.1 0.8 1.8 65608
j++ 6.0 11.0 7.4 5.8 5.4 4.9 4.9 4.4 3.3 3.0 2.9 2.9 2.5 2.4 2.3 2.0 1.8 1.8 1.8 1.6 1.7 1.3 1.3 1.3 1.1 1.2 1.1 1.1 1.0 1.0 0.9 1.2 0.7 1.2 0.8 0.4 63774
JDK117 11.0 7.4 5.8 5.4 4.9 4.9 4.4 3.3 3.0 2.9 2.9 2.5 2.4 2.3 2.0 1.8 1.8 1.8 1.6 1.7 1.3 1.3 1.3 1.1 1.2 1.1 1.1 1.0 1.0 0.9 1.2 1.3 1.2 0.8 0.4 63771
JDK117 -O 11.2 7.7 6.1 4.3 5.1 5.1 4.6 3.5 3.1 3.0 3.0 2.7 2.5 2.4 2.1 1.9 1.9 1.9 1.6 1.7 1.4 1.3 1.4 1.2 1.3 1.2 1.2 1.0 1.0 0.9 1.2 0.1 0.0 0.9 0.4 61218
fi 10.9 7.4 5.8 5.1 5.0 4.9 4.5 3.3 3.0 3.0 2.9 2.5 2.4 2.3 2.0 1.8 1.8 1.8 1.8 1.7 1.3 1.3 1.3 1.2 1.2 1.1 1.1 1.0 1.0 0.9 0.9 0.9 0.9 0.8 0.8
Table 7: Non-API dynamic bytecode execution frequencies for the Jasmin assembler application using different compilers.
Instruction aload 0 getfield iload 1 iadd invokevirtual iload 3 iload 2 if icmpge goto putfield caload iload iconst 1 iinc ireturn aload aload 1 iconst 0 bipush if icmpne istore 3 return dup areturn imul if icmplt invokespecial arraylength aconst null astore aload 2 ifeq aload 3 aaload istore 2 Total
Jbuilder2 16.1 14.0 4.6 3.9 3.3 2.9 2.8 2.7 2.8 2.5 2.4 2.3 2.3 2.1 2.1 2.0 1.9 1.8 1.7 1.5 1.6 1.5 1.5 1.4 1.0 0.8 0.8 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.6 554889
j++ 6.0 16.2 14.1 4.6 4.0 3.2 3.0 2.8 2.6 2.6 2.5 2.4 2.3 2.2 2.1 2.1 2.0 1.9 1.8 1.7 1.6 1.6 1.5 1.5 1.4 1.0 0.9 0.9 0.9 0.8 0.8 0.7 0.6 0.6 0.6 0.6 554035
JDK117 16.2 14.1 4.6 3.9 3.2 3.0 2.8 2.6 2.6 2.5 2.4 2.3 2.3 2.1 2.1 2.0 1.9 1.8 1.7 1.6 1.6 1.5 1.5 1.4 1.0 0.9 0.9 0.9 0.8 0.8 0.7 0.6 0.6 0.6 0.6 553124
JDK117 -O 16.2 14.1 4.6 3.9 3.4 3.0 2.9 2.6 2.6 2.5 2.4 2.3 2.2 2.1 2.0 2.0 1.9 1.8 1.7 1.6 1.6 1.5 1.5 1.4 1.0 0.9 0.9 0.9 0.8 0.8 0.7 0.6 0.6 0.6 0.7 554019
fi 16.2 14.1 4.6 3.9 3.3 3.0 2.8 2.6 2.6 2.5 2.4 2.3 2.2 2.1 2.1 2.0 1.9 1.8 1.7 1.6 1.6 1.5 1.5 1.4 1.0 0.9 0.9 0.9 0.8 0.8 0.7 0.6 0.6 0.6 0.6
Table 8: API dynamic bytecode execution frequencies for the jasmin assembler application when the program files were compiled with different compilers. The API was not recompiled.
Instruction invokestatic aload 0 getfield iconst 2 isub dmul i2d ldc2w iload 1 dload 3 ineg invokevirtual dstore nop lsub fsub
Jbuilder2 13.4 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
j++ 1.00 13.4 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
JDK117 13.4 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
JDK12 13.5 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
pizza 13.4 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
fi 13.4 9.6 9.4 9.4 9.3 8.8 8.8 8.8 8.8 8.8 4.4 0.2 0.1 0.0 0.0 0.0
Table 9: Non-API dynamic bytecode execution frequencies for the atomic applet for 5 different compilers. programming language allocates all objects on the heap, whereas in C++ many objects are stack allocated allowing potential for optimisation. This means that object allocation rates are much higher for the Java programming language than for C++. In addition, because the Java programming language is garbage-collected, it has very different types of memory allocation overhead including potentially scavenging and write-barrier overhead than C++. The -O directs the JDK 1.1.7 compiler to try to generate faster code by inlining static, final and private methods. This option may slow down compilation, make larger class files, and/or make it difficult to debug. This option informs the compiler that all generated class files are guaranteed to be delivered and upgraded as a unit, enabling optimisations that may otherwise break binary compatibility, so this option needs to be used with discretion. The Java Language Specification (Gosling, Joy and Steele 1996) section 13.4.21 describes situations in which it is legal to use a Java compiler to inline methods. The compiler will only optimise code for which source is available during the compilation, so the only .java files discoverable by the compiler should be for classes intended to be delivered or upgraded as a unit. Table 8 shows a decrease in use of invokevirtual and an increase in overall bytecode use for Jasmin compiled with the -O option. In the most recent Java systems (Java 1.2), the inlining abilities of javac have been vastly reduced. This was necessary because the previous inlining was too aggressive; it would sometimes produce class files that failed verification.
6. Conclusions The results obtained from studying the test suite must be analysed with caution since they may have been strongly influenced by the particular programs analysed. The paper looks at the factors which influence overall bytecode usage, such as the number of Application Programming Interface (API) methods executed, and the compiler that was used to generate the class files. Because of the object oriented nature of Java programs, there are a high number of small methods executed, and a very high percentage of a particular program’s bytecodes are executed
during method calls within the API. The dynamic percentages of instructions and non-native method calls in the API and the program classes used by a selection of different programs was studied to compare the way different applets and applications use the bytecodes and it was found that 74:1% of bytecodes executed and 67:1% of Java methods are in the API. This suggests a way to improve the speed of Java programs would be to compile the API methods to native instructions and save these on disk in a standard format, cutting the time spent interpreting programs. Non-API program methods only used 18:1 bytecodes on average when executed and Java API methods executed on average 23:8 bytecodes. One interesting factor to note is that more static methods were invoked than virtual by the non-API bytecodes. Remarkably similar frequencies were obtained using the different compilers because many traditional optimisations performed by compilers cannot be used when converting a Java program to bytecode. For the set of programs studied, most of the bytecodes are used at least once during execution. However a small subset of the bytecodes is executed with very high frequency. This group of instructions, load locals and object field accesses, seems to be used frequently by both large applications and small applets and by both API and non-API methods. The data in this paper question the stack based design for the bytecode intermediate representation of Java programs. Because a stack based architecture is not random access, there is a lot of moving data between the stack and local variables which does not do any useful work but is merely a way of telling those instructions that do the work of the program what operands to use. As can be seen from Figure 3, Loading local variables and pushing constants onto the operand stack together take 41:2% of the instructions executed and storing locals from the stack takes a further 3:7% which uses a lot of execution time when bytecodes are interpreted. However for frequently used parts of a program, the dynamic compiler will incorporate a lot of these pushes and pops into a single native instruction. JIT compiling to convert this inefficient representation to native code has been proposed as a solution but to achieve a performance competitive with the code produced by a native compiler, it would be necessary to apply the same code-generation techniques which seems too expensive for JIT compilation. As shown in (Antonioli and Pilz 1998), the bytecodes only occupy on average 12% of a class file, so an intermediate representation that used more space, but executed more efficiently might be possible. Register numbers can be extracted very quickly from instructions executed directly by hardware, but this would be more problematic for bytecodes designed to be interpreted.
7. References Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison Wesley, 1986 Antonioli, Denis and Pilz, Markus. Analysis of the Java Class File Format, Dept. of Computer Science, University of Zurich, Technical Report 98.4 Armstrong, Eric Hotspot: A New Breed of Virtual Machine, Java World, March 1998. Bowles, Kenneth UCSD Pascal, Byte 46 May 1978 170-173. Gosling James, Oak Intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Representations (IR ’95) p. 111-118. Gosling, Joy and Steele, The Java Language Specification, Addison Wesley 1996
Java Compiler Compiler, JavaSoft White Paper, The Java Hotspot Performance Engine Architecture: A White Paper About Sun’s Second Generation Performance Technology Lindholm, Tim and Yellin, Frank, The Java Virtual Machine Specification, Addison Wesley 1996 Meyer, John and Downing, Troy The Java Virtual Machine, O’Reilly 1997 Waldron John, Dynamic Bytecode Usage by Object Oriented Java Programs, Technology of Object-Oriented Languages and Systems 29th International Conference and Exhibition, Nancy, France June 1999. Waldron, J and Harrison, O Analysis of Virtual Machine Stack Frame Usage by Java Methods, Third IASTED Conference on Internet and Multimedia Systems and Applications (IMSA), Nassau, Grand Bahamas, Oct 18-21, 1999. Wilkinson, Tim, KAFFE, A Virtual Machine to run Java Code