Evaluation of a Just-in-Time Compiler Retrofitted for PHP

Michiaki Tatsubori

Akihiko Tozawa

Toyotaro Suzumura

Scott Trent

Tamiya Onodera

IBM Research – Tokyo 1623-14 Shimotsuruma, Yamato, Kanagawa, Japan {mich,atozawa,toyo,trent,tonodera}@jp.ibm.com

Abstract

Programmers who develop Web applications often use dynamic scripting languages such as Perl, PHP, Python, and Ruby. For general purpose scripting language usage, interpreter-based implementations are efficient and popular, but server-side usage for Web application development offers an opportunity to significantly enhance Web server throughput. This paper summarizes a study of the optimization of PHP script processing. We developed a PHP processor, P9, by adapting an existing production-quality just-in-time (JIT) compiler for a Java virtual machine, for which optimization technologies have been well established, especially for server-side applications. This paper describes and contrasts microbenchmarks and SPECweb2005 benchmark results for a well-tuned configuration of a traditional PHP interpreter and our JIT compiler-based implementation, P9. Experimental results with the microbenchmarks show a 2.5-9.5x advantage with P9, and the SPECweb2005 measurements show 20-30% improvements. These results show that the acceleration of dynamic scripting language processing does matter in a realistic Web application server environment. CPU usage profiling shows that our simple JIT compiler introduction reduces the PHP core runtime overhead from 45% to 13% for a SPECweb2005 scenario, implying that further improvements of dynamic compilers would provide little additional return unless other major overheads, such as heavy memory copying between the language runtime and the Web server frontend, are reduced.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – Run-time environments; Compilers; Optimization

General Terms Design, Experimentation, Languages

Keywords PHP, JIT Compiler, Scripting Languages

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. VEE’10, March 17–19, 2010, Pittsburgh, Pennsylvania, USA. Copyright © 2010 ACM 978-1-60558-910-7/10/03...$10.00

1. Introduction

Dynamic scripting languages such as Perl, PHP, Python, and Ruby have become enormously popular for quickly implementing Web applications, and are widely used to access databases and other middleware. In particular, PHP has been one of the most popular server-side programming languages [25]. A dynamic scripting language runtime is usually an interpreter-based implementation, making it easy to support scripting language-specific features such as runtime code generation and dynamic typing. However, such server-side code has high potential for drastic optimization from just-in-time (JIT) compilation.

We have developed a PHP runtime, P9, based on a production-quality JIT compiler framework [9, 20] from an existing IBM Java virtual machine. While supporting PHP almost fully, we kept the customization of the JIT compiler as small as possible so that we can leverage the existing JIT compiler components that are not specific to Java. By using a tag-word model to represent PHP values, we implemented a prototype of a high-performance PHP runtime based on JIT compiler technologies.

While researchers and practitioners are examining the performance of recent dynamic Web-content generation technologies [4, 16, 22, 24, 25], this paper complements those studies with a detailed analysis focusing on the potential acceleration of PHP. As far as we know, the use of JIT compiler techniques for server-side scripting language implementations has not been evaluated. We measured the benchmark performance of P9, other dynamic scripting languages, and Java to see the differences in pure scripting engine performance. Then we measured P9 and an existing PHP runtime with the SPECweb2005 benchmark to evaluate the end-to-end performance.

The contributions of this paper are:

• a study of the PHP language, which is the most popular Web application programming language without any formal language specification document, from a language implementer perspective,
• an evaluation of a modified Java JIT compiler retrofitted for PHP with a simple adaptation for dynamic typing,
• experimental results with SPECweb2005, an industry-standard Web server benchmark, showing that the acceleration of dynamic scripting language processing in a realistic Web application server environment does matter and that applying an existing JIT compiler simply adapted to PHP processing works effectively, and
• performance profiling which shows that the simple JIT introduction greatly reduces the PHP core runtime overhead, implying that further improvements of dynamic compilers would provide little additional return unless other dominant overheads are reduced.

The rest of this paper is organized as follows. First, we discuss the server-side use of PHP and the optimization implications in Section 2, followed by a description of our JIT compiler-based implementation of a PHP runtime in Section 3. We present experimental results with microbenchmarks in Section 4 and with SPECweb2005 in Section 5 to evaluate the effectiveness of the JIT compiler approach, and discuss the significance in Section 6. After relating our work to other work in Section 7, we conclude the paper in Section 8.

[Figure 1 listing: the scripts main.php and hello.php; resulting output: Hello, World]

Figure 1. An example “Hello World” PHP script

    Bytecode for main.php
    op                  extension/operands
    ----------------------------------------
    ECHO                '\n\n'
    INCLUDE_OR_EVAL     'hello.php', INCLUDE
    INIT_FCALL_BY_NAME  'hello'
    SEND_VAL            'World'
    DO_FCALL_BY_NAME    1
    ECHO                '\n\n'
    RETURN              1

    Bytecode for hello() in hello.php
    op                  extension/operands
    ----------------------------------------
    RECV                1
    CONCAT              ~0, 'Hello, ', !4 ($nm)
    ECHO                ~0
    RETURN              null

Figure 2. Bytecode for main.php in Figure 1

2. PHP Interpreter Acceleration

In this section, we highlight the dynamic aspects of scripting languages and the potential optimization of an interpreter-based implementation for server-side usage. We mostly use variations of the open-source PHP language implementation¹ as examples, but these discussions are mostly applicable to other popular dynamic languages such as Perl, Python, and Ruby.

2.1 The Dynamic Aspects of a Language

In this paper, we are primarily interested in two aspects of dynamic scripting languages: runtime code generation, which allows programs to easily change their own structure while running, and dynamic typing, which checks the types of values at runtime as opposed to compile-time static typing. While there is no universally accepted definition of what constitutes a dynamic scripting language, we feel that these points are fundamental to it, since popular languages such as Perl, Python, PHP, and Ruby incorporate them.

We can contrast these features with some non-scripting programming languages that are widely used for developing Web applications, namely Java and C#/.NET. The non-scripting languages also offer these dynamic features, but only in limited ways. For example, in Java, programmers can generate and load a new class at runtime using the customizable class loader mechanism, but this feature is rarely used, especially when developing Servlet or J2EE Web applications based on application containers.

Figure 1 contains sample PHP scripts, main.php and hello.php, with the output shown on the right side of the figure. PHP programs are normally embedded in an HTML description, where the actual PHP code starts after “<?php” and ends before “?>”. A PHP runtime on the server side handles the other HTML text by simply passing it along unchanged. The include operation in PHP incorporates the content of the specified file ("hello.php" in this example) and evaluates it as if the code were included in the original code. The expression hello("World") is a call to a function named hello with a string argument given by the string literal "World". The declaration starting with the keyword “function” in hello.php is a function declaration for a given name (“hello”). This function receives the variable ($nm) as a parameter. The body of the function initially performs a string concatenation of the string value ("Hello, ") and the value of the variable ($nm).

Note that the argument of the include operation can syntactically be an arbitrary string expression whose value may be computed at runtime. The exact code that will be executed may not be known before the code is executed. Also, the definition of a function such as hello() might change for each execution of the program starting from main.php. In other cases, what is actually called at the function call hello() may differ for each request.

1 http://www.php.net/

2.2 Bytecode and Code Caching

Bytecode. For improved efficiency, interpreter-based runtimes typically compile their source code into some kind of intermediate code, often called “bytecode”. This contributes to higher performance by:

• Avoiding duplicated lexical and syntactical parsing within a source script, and
• Simplifying the interpretation of operations by localizing data to reduce code fetches.

One of the drawbacks of this approach is the cost of bytecode generation, especially when the runtime executes those bytecode instructions only once. Most of the popular dynamic scripting languages such as Perl, Python, and PHP have used this approach for many years. Ruby is unique in utilizing an abstract-syntax tree interpreter, but in the latest beta version 1.9 (known as YARV), Ruby also adopts a bytecode approach. Figure 2 shows the actual bytecode used by a PHP runtime, compiled from the main.php script in the “Hello World” example of Figure 1.

Additional Interpreter Optimizations. In PHP, we can access the value of a variable through a runtime string value representing the name of the variable. This kind of powerful introspection mechanism is often available in dynamic scripting languages. An interpreter-based implementation can offer such functionality relatively easily compared to a compiler-based implementation, since the data structures can be accessed by an interpreter while they are usually not even present in compiled code. Earlier PHP engines such as Zend Engine 1.x (PHP 4.x) and Zend Engine 2.0 (PHP 5.0) compiled the access to the variable nm in the same source code for hello() into the following bytecode:

    PHPValue pseudo_hello(PHPValue name) {
        static PHPValue _hello = ..;
        static PHPValue _null = ..;
        PHPValue tmp0;
        /* The operation is selected by the runtime types of both operands. */
        tmp0 = CONCATs[_hello.type][name.type]
                   ->operation2(_hello.val, name.val);
        ECHOs[tmp0.type]->operation1(tmp0.val);
        return _null;
    }

Figure 3. Pseudo-C code representing the JIT-compiler generated code

[Figure 4 diagram. Components inside the TR JIT Compiler Framework: IL Generator, Optimizers, Code Generators. Data exchanged: Bytecode, TR-IL Trees & CFG, Instructions & Meta-data. Surrounding component: Virtual Machine / Runtimes.]

Figure 4. Architecture of the TR JIT compiler framework

    op                  extension/operands
    ----------------------------------------
    FETCH_R             $4, 'nm'
    CONCAT              ~0, 'Hello, ', $4

The $4 and ~0 references above are forms of temporary variables. The FETCH_R instruction looks up the pointer to a variable from a runtime variable table in the interpreter. The name of the variable is a constant value here, but FETCH_R accepts arbitrary string values. For example, we can use a string concatenation result which is only available at runtime.

Since table look-up is a costly operation at runtime, beginning with Zend Engine 2.1 (PHP 5.1), such lookups were compiled into bytecode as demonstrated in Figure 2. The variable !4 refers to a structure in the execution stack which stores and references “real” variables in userspace. The hash value for each variable is computed at compile time. The first time a compiled variable is used, the engine looks it up in the active symbol table and updates the compile-time table to identify its location. Subsequent uses of that compiled variable use the prefetched address. Although each individual lookup may not consume much time, there can be a substantial performance impact when a value is used repeatedly in a tight loop.

Code Caching. In the context of server-side usage, the idea of caching bytecode for other requests is simple but highly effective. The overhead of generating bytecode and caching it can be quickly recovered when it is used for subsequent requests. The official release of the commonly used PHP implementation does not include such a caching feature by default, but it is available through the so-called APC caching library extension available at the same website. It is simple to install, configure, and use. Also, the optimization of compiled variables turns out to be more effective, as compiled variables are used more frequently during recurring request handling.

The effectiveness of caching suggests a potentially effective optimization with a just-in-time compiler. With techniques similar to code caching, we can preemptively “compile” the majority of the dynamic constructs of scripting languages to reduce the compilation costs at runtime. Though it is difficult to ensure that the benefits of compilation always exceed the cost of compilation, things become easier in a server-side environment, as many similar requests will repeatedly invoke the same code.
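The caching idea can be illustrated with a small C sketch. It is not the APC or Zend implementation: the names `CacheEntry`, `cache_get`, `compile_script`, and `cache_compile_count`, the fixed-size table, and the use of a caller-supplied modification time as the freshness check are all assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 16
#define PATH_LEN    64

/* Hypothetical cache entry: one compiled script per source path. */
typedef struct {
    char path[PATH_LEN];
    long mtime;        /* modification time seen at compile time */
    int  bytecode_id;  /* stand-in for a pointer to real bytecode */
    int  used;
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];
static int compile_count = 0;   /* how often the "compiler" ran */

/* Stand-in for the expensive parse-and-compile step. */
static int compile_script(const char *path) {
    (void)path;
    return ++compile_count;
}

/* Return bytecode for `path`, compiling only on a miss or when the
   file's modification time differs from the cached one.
   Returns -1 if the path is too long or the table is full. */
int cache_get(const char *path, long mtime) {
    CacheEntry *free_slot = NULL;
    assert(path != NULL);
    if (strlen(path) >= PATH_LEN) return -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &cache[i];
        if (e->used && strcmp(e->path, path) == 0) {
            if (e->mtime != mtime) {            /* stale: recompile */
                e->bytecode_id = compile_script(path);
                e->mtime = mtime;
            }
            return e->bytecode_id;              /* hit: reuse */
        }
        if (!e->used && free_slot == NULL) free_slot = e;
    }
    if (free_slot == NULL) return -1;
    strcpy(free_slot->path, path);
    free_slot->mtime = mtime;
    free_slot->bytecode_id = compile_script(path);
    free_slot->used = 1;
    return free_slot->bytecode_id;
}

/* Number of compilations so far, exposed for inspection. */
int cache_compile_count(void) { return compile_count; }
```

Under these assumptions, repeated requests for an unchanged script pay the compilation cost once, which is the effect the server-side argument relies on; only a changed modification time triggers a second compilation.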

3. A Just-in-Time Compiler for PHP

Our approach seeks potential optimizations to accelerate the performance of a Web server with applications written in dynamic scripting languages by leveraging existing optimization efforts in statically-typed language runtimes. We experimented with a production-quality just-in-time (JIT) compiler [9], the TR JIT compiler, from an existing Java virtual machine (VM) to implement a PHP runtime. Our JIT compiler for PHP generates native code similar to the code generated by a C compiler from the pseudo-C source code in Figure 3.

Our goal is to evaluate the effectiveness of a JIT compiler for Java when used for PHP. We designed our PHP runtime with the objective of reusing as many of the existing TR JIT compiler components as possible, primarily by retaining the same intermediate language (IL). In the rest of this section, we describe the design and implementation of our PHP runtime by focusing on the adaptation of the TR JIT compiler.

3.1 The TR JIT Compiler Framework

The IBM TR JIT compiler is a high-quality, high-performance optimizing compiler, designed with a high level of configurability in mind [9, 20]. It supports multiple Java VMs and multiple class library implementations, as well as many hardware architectures, and uses a wide range of optimizations and optimization strategies. It is currently used by the IBM J9 Java VM.

The TR JIT compiler framework has three customizable components, the IL Generator, Optimizer, and Code Generator components, as shown in Figure 4. When compiling a Java method, the IL Generator walks through the method’s bytecode to generate the tree-based JIT compiler intermediate language (IL) that also encodes the control flow graph (CFG). The optimizers perform various optimizations on the IL, including language-independent/dependent and architecture-independent/dependent ones. Finally, from the TR-IL, the Code Generator generates machine-specific instructions after performing architecture-specific optimization on the instructions. The shaded boxes and ellipses in the figure indicate Java-specific components.

For our PHP runtime implementation, we provided a PHP-specific IL generator and did not use the Java-specific optimizers. The IL generator consumes PHP scripts to produce TR-IL trees and CFGs. It includes a PHP parser that generates bytecode similar to the APC bytecode shown in Figure 2. It then generates TR-IL from the APC bytecode. Though we developed PHP-specific optimizers for our runtime, a detailed explanation is beyond the scope of this paper, and we did not evaluate the efficacy of these individual optimizations in our experiments.

The generated code for a script is cached and reused when executing the script for subsequent requests. The runtime checks the metadata of the file to ensure that the script is still the same as previously compiled. If it has changed, then the runtime causes the JIT compiler to recompile the script.

3.2 Tag-Word Model and PHP Values

In dynamically typed languages, a single variable can hold both scalar and non-scalar values, such as integers and pointers to heap objects. Scalar values and non-scalar values should use the same size in their internal representations. Also, a mechanism is required to distinguish between the two during execution. When PHP programs are compiled into Java bytecode, a boxed model is needed, since that is the only type of model that can represent compound objects in the Java language. In contrast, there is more freedom in a native compiler approach.

In our P9 runtime, we used a tag-word model. Compared to the boxed model, the tag-word model has the advantage of less heap allocation, particularly for scalar values that fit in the word size. Another benefit of a tag-word model over a tag-bit model is that we can treat various types as scalar types, which cannot be identified by a 1- or 2-bit tag field. In our runtime, we use the tags PNull, PBool, PInt, and PStr1 to represent null, boolean, 32-bit integer, and 8-bit character values, respectively. Other value types include PObject and PResource. Objects in PHP are very similar to arrays, except that objects are handled by reference, not by value². Resources encapsulating data, such as file handles or DB connections, are used by the extension libraries. Note that we are using more precise tags for the values than are defined by the PHP language, in which there is no distinction between PStr1 and PStr. Figure 5 shows pseudo-C code for the PHP value structure and the type identifier enumeration for P9.

    typedef enum {
        PNull, PBool, PInt, PStr1,      /* scalar values held in the word */
        PDouble, PReference, PObject,
        PStr, PArray, PResource,
    } P9Tag;

    typedef struct {
        P9Tag type;
        union {
            long         longVal;
            char         charVal;
            P9Double*    dblPtr;
            P9Reference* refPtr;
            P9Object*    objPtr;
            P9String*    strPtr;
            P9Array*     aryPtr;
            P9Resource*  resPtr;
        } value;
    } P9Value;

Figure 5. A pseudo C structure for runtime PHP values

3.3 Helper Functions

One difficulty with the primitive language constructs in PHP is operator overloading. For example, consider the meaning of +. The behavior of + is very complex, and varies depending on the types of the values passed as arguments. Such heavy overloading prevents encoding of each primitive operator as an IL instruction, since an IL instruction should be replaced with native code during the code generation phase of our Java JIT compiler, which is a tedious, architecture-dependent task. In our runtime, we instead chose to implement the primitive operators as helper functions written in C.

3.4 Dynamic Inclusion of Scripts

PHP’s link mechanism is done with include, which dynamically loads and executes external scripts. For example, in the following function, we dynamically link the external script specified by a file name $script:

    function foo($script, $param) {
        include $script;
    }

Note that include also causes variables to escape from their local scopes. For example, in the previous example, the variable $param can be seen from the included scripts. In such cases, we explicitly pass variable tables (created as needed) for the local variables that are being passed to the included scripts.

3.5 Variables Layout

The freedom PHP offers in accessing variables helps a programmer control meta-programming tasks, such as listing all of the bound variables in a context. At the same time, this freedom is a barrier for a developer of a runtime. PHP supports several kinds of scoping for variables:

• Global variables. Variables defined at the top level of scripts are treated as global variables. Global variables can be seen from every context. In addition, global variables can also be accessed through an array $GLOBALS by using their names.
• Local variables escaped using include. The variable $param in the example in the previous section is such a variable.
• Variable variables. Local variables can also be accessed by name, using the notation $$x where $x holds the name of a variable to be accessed.
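These access paths pull in different directions: ordinary accesses want a fixed location, while `$$x` and `$GLOBALS` need lookup by name. One way to serve both is sketched here in C, as an illustration only. The names `Frame`, `frame_declare`, `frame_slot`, and `frame_slot_by_name` are hypothetical, plain `long` values stand in for PHP values, and a linear scan stands in for an associative array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOCALS 8

/* Hypothetical local-variable frame: values live in indexed slots, and a
   parallel name table is kept only for by-name (dynamic) accesses. */
typedef struct {
    const char *names[MAX_LOCALS];
    long        slots[MAX_LOCALS];   /* plain longs stand in for PHP values */
    int         count;
} Frame;

void frame_init(Frame *f) { memset(f, 0, sizeof *f); }

/* Compile-time step: reserve a slot for a named local and return its
   index, or -1 if the frame is full. */
int frame_declare(Frame *f, const char *name) {
    if (f->count >= MAX_LOCALS) return -1;
    f->names[f->count] = name;
    return f->count++;
}

/* Fast path for `$x`: the slot index is already known to the compiler. */
long *frame_slot(Frame *f, int index) {
    assert(index >= 0 && index < f->count);
    return &f->slots[index];
}

/* Slow path for `$$x`: map a runtime name to its slot; NULL if unbound. */
long *frame_slot_by_name(Frame *f, const char *name) {
    for (int i = 0; i < f->count; i++)
        if (strcmp(f->names[i], name) == 0)
            return &f->slots[i];
    return NULL;
}

/* Usage: $x = 41; $n = "x"; $$n = $$n + 1; return $x; */
long demo_variable_variable(void) {
    Frame f;
    frame_init(&f);
    int x = frame_declare(&f, "x");
    *frame_slot(&f, x) = 41;
    long *via_name = frame_slot_by_name(&f, "x");
    if (via_name != NULL) *via_name += 1;
    return *frame_slot(&f, x);
}
```

Because both paths resolve to the same slot, a write made through the name is visible to code that reads the variable by index, so the common case avoids the name lookup without breaking the dynamic one.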

To handle such variable scoping, the Zend PHP runtime allocates variables using associative arrays, which can be searched by using the names of the variables. However, this is clearly inefficient. The first optimization for scopes is to use the machine stack frame to access variables. Our Java JIT compiler provides an aggregate type, the components of whose values can be accessed by using indices. We initially allocate the local variable frame on the stack, as an aggregate value. The compiler then translates a normal local variable access, such as $x = "y", into an index-lookup operation on this aggregate structure, which in turn is translated into a pair³ of memory access instructions. In contrast, other accesses such as variable-variable accesses use the variable table, which is an associative array mapping each variable name to a location inside the aggregate value.

3.6 PHP-Specific Implementation Notes

Here are some of the other design choices for the runtime. They are secondary to the main concerns of this paper, but must be explained because they greatly affect the performance of the runtime, and hence our experimental results.

Memory Management. While asynchronous garbage collection mechanisms (frequently used in Java) are often best for good throughput, we had to use reference counting for three reasons:

• To manage the lifetimes of the PHP references and resources.
• To manage the copy-on-write operations for associative arrays [23].
• To reclaim a heap cell when the reference count becomes 0.

Our runtime fully supports dynamic inclusion (Section 3.4) by treating each top-level program, which consists of the bare statements in the script text not enclosed inside function definitions, as a function, and by treating include as a function call. This is different from Phalanger [2], which only allows static linking, i.e., include is only allowed to appear at the top level, and its argument should be a constant file name.

The first reason is due to the PHP language semantics. The second reason matters greatly for the performance of the runtime: in PHP, values are copied on each assignment to a variable, so a naive implementation that actually copies them will suffer from very poor performance. As for the third reason, we consider reference counting a reasonable alternative to asynchronous garbage collection, since its cost does not need to be offset separately: it is already incurred for the first two requirements.
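How reference counting supports both copy-on-write and immediate reclamation can be shown with a short C sketch. It is an illustration under simplifying assumptions, not the P9 implementation: `RcArray`, `rc_new`, `rc_share`, `rc_release`, `rc_set`, and `rc_live` are hypothetical names, and a flat array of `long` stands in for a PHP associative array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted array of longs. */
typedef struct {
    int    refcount;
    size_t len;
    long  *items;
} RcArray;

static int live_arrays = 0;   /* heap cells currently allocated */

RcArray *rc_new(size_t len) {
    RcArray *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->items = calloc(len ? len : 1, sizeof *a->items);
    if (a->items == NULL) { free(a); return NULL; }
    a->refcount = 1;
    a->len = len;
    live_arrays++;
    return a;
}

/* Assignment `$b = $a`: share the cell instead of copying it. */
RcArray *rc_share(RcArray *a) { a->refcount++; return a; }

/* Drop one reference; reclaim the cell when the count reaches 0. */
void rc_release(RcArray *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) {
        free(a->items);
        free(a);
        live_arrays--;
    }
}

/* Write through *ref. If the cell is shared, copy it first so that the
   other holders keep seeing the old contents (copy-on-write). */
int rc_set(RcArray **ref, size_t i, long v) {
    RcArray *a = *ref;
    if (i >= a->len) return -1;
    if (a->refcount > 1) {
        RcArray *copy = rc_new(a->len);
        if (copy == NULL) return -1;
        memcpy(copy->items, a->items, a->len * sizeof *a->items);
        rc_release(a);    /* this holder leaves the shared cell */
        *ref = copy;
        a = copy;
    }
    a->items[i] = v;
    return 0;
}

int rc_live(void) { return live_arrays; }

/* Usage: $a = three zeros; $b = $a; $b[0] = 7; returns $a[0]*10 + $b[0]. */
long demo_copy_on_write(void) {
    RcArray *a = rc_new(3);
    if (a == NULL) return -1;
    RcArray *b = rc_share(a);                 /* no copy yet */
    if (rc_set(&b, 0, 7) != 0) { rc_release(b); rc_release(a); return -1; }
    long result = a->items[0] * 10 + b->items[0];
    rc_release(a);
    rc_release(b);
    return result;
}
```

The same counter decides whether a write needs a private copy (`refcount > 1`) and when a cell can be freed (`refcount == 0`), which matches the observation that the third use comes at no additional bookkeeping cost.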

2 This is true from PHP 5. PHP 4 treats objects by value.

3 Due to accessing a tag and a value of the tag-word object model.

The language constructs for references and resources in PHP are quite unusual compared to those of other dynamic scripting languages. In PHP, we can declare destructors for resources and object values, which will be called when their values are destroyed. We need a reference counting collector to guarantee the execution of such destructors for locally declared resources and objects when control exits from their local scope. Also, while assigned-by-reference values must be handled in special ways when copied as array elements, each element needs to revert to a normal value when there is only a single reference to it. Both kinds of reference require precise counting of the references, which is very difficult with asynchronous garbage collection.

Extension Libraries. It is important to support PHP extension libraries, which are C libraries that provide important functions such as interfacing to operating systems, databases, and networks, and incorporating useful C libraries into PHP. The native interface of the current Zend runtime that communicates with the extension libraries has strong runtime dependencies. For example, the extension functions routinely access the runtime’s internal data structures through C macros, without any checking. To support the extension libraries in P9, we defined a more transparent native API, closer to JNI, and ported the basic PHP extension libraries using this API. This is essential for our runtime to have sufficient functionality to run SPECweb2005. In the porting, we used a shim layer created with C++ operator overloading⁴ so that we could reuse the source code of the PHP extension libraries.

One reason for the popularity of PHP is its support of a variety of extension libraries. PHP is sometimes called a wrapper language, a language that acts as a thin interface to talk to library components responsible for the major tasks of the programs. This led us to wonder whether the main bottleneck in typical applications is in the PHP runtime itself or in the extension components. We partially answer this question through our experiments with SPECweb2005.

Sharing Compiled Code. Finally, our runtime is a multi-VM runtime in which a single process contains multiple VMs executing simultaneously in separate threads. Each VM uses a separate thread-local heap, which stores mutable data, while the shared memory is basically read-only, i.e., essentially storing constants. The shared memory also stores the compiled code. Note that, considering the server-side usage and short lifetime of PHP scripts, it is important to share compiled code.

4. Microbenchmarks

Before discussing the end-to-end Web server benchmark results, we discuss some microbenchmark results here. The test system was an IBM IntelliStation M Pro with a 3.4GHz Xeon uniprocessor running Fedora Core 7 (kernel 2.6.23).

4.1 PHPBench

Figure 6 shows the comparison of PHP 5 and P9 for PHPBench⁵, which is a benchmark suite for PHP. It performs a large number of simple tests to benchmark various aspects of PHP interpreters. Here are some details about the tested PHP engines:

• PHP 5 (Recent implementation of PHP with Zend Engine 2.1, version 5.2.2)
• P9 (Our JIT-compiler-based PHP engine, which is based on IBM J9 VM 1.5.0 Build 2.3)

P9 generally outperforms PHP 5. The ratios vary, with some programs improved by up to 5 times while others are worse than with PHP 5, probably because of problems in our P9 implementation and overheads in the extension adapter layer.

4 For example, inside the source code of extension libraries, the macro ZEND_LONG(zv) is used for both loading and storing a long value from/to zv. We overload the = operator to distinguish the store ZEND_LONG(zv) = ... from the load ZEND_LONG(zv) in the source code of PHP extensions, and dispatch them to different API calls that we created.

5 http://www.pureftpd.org/project/phpbench

4.2 Comparison between language implementations

In this section, we report the performance of the P9 runtime using small application benchmarks. The goal of the experiments with these microbenchmarks is to understand the differences in the performance characteristics among the various PHP runtime implementations, focusing on the language-runtime level.

• PHP 4 (Old but still popular implementation of PHP, php.net, version 4.4.7)
• PHP 5 (Recent implementation of PHP, version 5.2.2)
• P8 1.1 (a Java implementation of PHP with Java bytecode generation, running on IBM J9 VM 1.5.0 Build 2.3)
• P9 (Our JIT-compiler-based PHP engine, based on IBM J9 VM 1.5.0 Build 2.3)

In addition, we compared the PHP versions with Ruby and Python versions (taking them as representatives of other dynamic scripting languages), and with Java versions for reference (taking Java as a representative of statically typed languages). We compared the following engines with our microbenchmarks.

• Python (2.6, a C implementation of Python with bytecode interpretation)
• Jython (2.2.1, a Java implementation of Python with Java bytecode generation)
• Ruby 1.8 (Matz) (1.8.6, a C implementation of Ruby with abstract syntax tree interpretation without bytecode)
• Ruby 1.9 (YARV) (1.9.0.5, a C implementation of Ruby with a bytecode interpreter engine known as YARV)
• JRuby (1.1.5, a Java implementation of Ruby with Java bytecode generation)
• Java int (IBM J9 VM 1.5.0 Build 2.3 with JIT compiler disabled)
• Java w/noopt-JIT (IBM J9 VM 1.5.0 Build 2.3 as for “Java w/JIT” below but with a base-level (no optimization) JIT compiler enabled)
• Java w/JIT (IBM J9 VM 1.5.0 Build 2.3 with the default JIT compiler enabled)

In addition to Java with the default JIT compiler, Java with a no-optimization JIT compiler was tested, since this base-level optimization compiler matches the base system of P9, which does not involve any Java or PHP-specific dynamic optimizations.

We compared the engines above using a series of microbenchmarks implementing the same logic as the ones for PHP but written in different programming languages: Python, Ruby, and Java. The PHP language framework allows developers to extend the language with library functions written in C. These functions, which are known as “extensions”, are then available to be used within PHP scripts. The PHP runtime provides a variety of extensions for string manipulation, file handling, networking, and so forth. Since our first goal is to understand the performance of the scripting engine itself, we conducted our microbenchmark experiments without using any extensions.

[Figure 6 plot: one value per PHPBench test; y-axis “Relative Elapsed Time to PHP 5”, ranging from 0 to 1.]

Figure 6. Relative elapsed time of P9 compared to PHP 5 for each test in the PHPBench benchmark suite (lower is better)

We developed three relatively small functions in PHP, Python, Ruby, and Java as microbenchmarks:

Quick Sort: A quick sort benchmark that sorts 100 integers.
Levenshtein: A Levenshtein distance calculation benchmark which measures the similarity between two strings of 56 characters.
Fibonacci: A well-known Fibonacci number calculation benchmark that calculates the 15th value in a Fibonacci series from two arbitrary starting values.

These PHP benchmarks were implemented entirely with PHP language primitives and avoided the use of any PHP extensions. The Python, Ruby, and Java versions also focused on using language primitives rather than standard classes. We compared the total run time of executing each test 10,000 times with each engine. We also executed each benchmark an additional 10,000 times as a warm-up before the measured test. This prevents the Java just-in-time compilation overhead from impacting the results in the Java tests. We ran the experiment on an Intel Pentium 4 3.40GHz CPU with 3GB of RAM, with the Linux 2.6.17 kernel.

This experiment found large performance differences between the measured scripting languages and implementations. The experimental results in Figure 7 indicate that “Java w/JIT” performs the best, followed by “Java w/noopt-JIT”. “Java int” and “P9” came next. “Ruby 1.9 (YARV)” is the fastest of the interpreter-based dynamic scripting language engines, with a small gap to the next, “Python”, and a large gap to “PHP 5”. “PHP 4” and “Ruby 1.8” were at the bottom for all of the tests. Java with JIT compilers (“Java w/JIT” and “Java w/noopt-JIT”) were nearly three orders of magnitude better due to their use of efficiently generated native code. PHP 5 also shows a two- to three-fold performance improvement over PHP 4 for the measured computations. P9 is 2.5 to 9.5 times better than PHP 5. This difference is much smaller than the difference between the noopt-JIT compiler and the interpreter for Java, where JIT compiler acceleration gives a 13- to 22-fold advantage.

This smaller gain is mainly due to the helper functions in the PHP runtimes. Since the overhead of these functions is not masked by compilation, the acceleration with a JIT compiler cannot reduce their costs. Implementations with Java bytecode generation on stock Java VMs, “P8 1.1”, “Jython”, and “JRuby”, achieved performance competitive with the best-of-breed native interpreters, “PHP 5”, “Python”, and “Ruby 1.9 (YARV)”, respectively for each language. While they can leverage the JIT compiler acceleration in the underlying Java virtual machine, their benefit seems to be limited due to possible gaps between the expressive powers of Java bytecode and the dynamic scripting languages implemented on top of Java.
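The measurement procedure described for the microbenchmarks (10,000 warm-up runs followed by 10,000 timed runs) can be sketched as follows. This is an illustrative Python sketch, not the benchmark code used in this study: the function names, the iterative Fibonacci formulation, the list-based quick sort, and the `measure` helper are all assumptions made for the example.

```python
import time


def fibonacci(a, b, n):
    """Return the n-th value (1-based) of a Fibonacci-style series starting with a, b."""
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def levenshtein(s, t):
    """Edit distance between s and t using a two-row dynamic-programming table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def quick_sort(xs):
    """Return a sorted copy of xs (simple, non-in-place quick sort)."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


def measure(fn, args, warmup=10_000, runs=10_000):
    """Run fn(*args) `warmup` times untimed, then return the total time of `runs` calls."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical inputs: 15th value of the series starting 1, 1.
    elapsed = measure(fibonacci, (1, 1, 15))
    print(f"fibonacci: {elapsed:.4f} s for 10,000 timed runs")
```

The warm-up loop matters mostly for engines with a JIT compiler, since it lets compilation finish before the timed loop starts; for a pure interpreter it should have little effect on the measured time.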

5. Experiments with SPECweb2005

We then measured real-world end-to-end server throughput with PHP 5 and P9, by using the SPECweb2005 benchmark and a Web server environment. Based on the reported experiences in tuning PHP for the comparisons with Java [24], we prepared the best PHP server configuration for the existing PHP 5 runtime, and configured P9 similarly.

5.1 Environmental Set-Up

The SPECweb2005 benchmark [19] is a benchmark for Web servers focusing on dynamic content created with JSP or PHP. In our experiments, we used the PHP scripts included in the benchmark suite. For interfacing with Web servers, we implemented the FastCGI protocol based on OpenMarket’s implementation (version 2.4.0; http://www.fastcgi.com/). Our experiment uses the Lighttpd Web server, and the measurement platform uses Linux IA32. The server machine was a single-core, single-processor (Pentium 4 3.40GHz) machine with 3 GB of memory. Figure 8 shows our test environment.

We also used the PHP 5 runtime configured to use the APC cache. The version of the PHP 5 runtime was 5.2.2. Both the PHP 5 runtime and P9 were configured to store the session data on a RAM disk. We configured the number of server PHP runtime processes to 16. Unfortunately, the current version of FCGI does not allow a single process to create multiple connections to Web servers. Therefore the multi-VM functions were not fully exploited in our experiments. However, we could still share and reuse the compiled JIT code within a single process across multiple requests.

Figure 7. Mini-application benchmark results with relative elapsed time to Java in a log scale (lower is better). [Bar chart. Vertical axis: Relative Elapsed Time to Java w/ JIT, log scale from 1 to 1000. Series: Fibonacci, Levenshtein, Quick Sort. Engines: PHP 4, PHP 5, P8 1.1, P9, Python, Jython, Ruby 1.8 (Matz), Ruby 1.9 (YARV), JRuby, Java int, Java noopt.]

Figure 8. SPECweb2005 experiment environment. [Diagram. Components: FastCGI client, Lighttpd, and multiple PHP runtime processes on the server machine (Linux), plus the Backend Simulator.]

SPECweb2005 Benchmark Methodology

The SPECweb2005 benchmark, developed by the Standard Performance Evaluation Corporation (SPEC), consists of three test scenarios based on common website uses: a banking site scenario, an e-commerce site scenario, and a support site scenario. The banking site scenario allows for typical encrypted account transactions via SSL, and 60% of the data is generated through dynamic webpages. The e-commerce shopping site allows a user to browse catalogs and purchase products using both encrypted and unencrypted data. Experimentally, about 15% of the data in the e-commerce scenario is transmitted using SSL encryption and 70% of the data transmitted is generated for dynamic webpages. Finally, the vendor support site provides downloads of large unencrypted support files such as manuals and software upgrades. Since this scenario primarily allows for accessing large and non-confidential static files, there is no encryption, and only 12% of the data transmitted is generated through dynamic webpages. Table 1 summarizes our analysis of the characteristics of the experimental data.

A typical SPECweb2005 test bed has multiple client machines controlled by a so-called Prime Client to provide a load on the System Under Test (SUT), thus simulating hundreds or thousands of users accessing the scenario websites. To reflect a modern multitier Web server architecture, SPECweb2005 uses one or more machines to serve as a Back-End SIMulator (BESIM), emulating the function of a “Back-End” database server.

Table 1. Characteristics of SPECweb2005 data transferred from a Web server to its clients for each scenario

                                            Banking   Ecommerce   Support
Percent of encrypted page accesses           100.0%       14.1%      0.0%
Percent of dynamic page accesses               8.7%        7.1%      4.7%
Percent of dynamic page data transferred      59.5%       71.6%     11.7%

We used a single SUT machine running the Web server, a BESIM server running the back-end simulation engine, a prime client machine, and three additional dedicated client machines. The computers were connected via a gigabit Ethernet network. The SUT was an IBM IntelliStation M Pro with a 3.4GHz Xeon uniprocessor running Fedora Core 7 (kernel 2.6.23) and Lighttpd 1.4.18. The standard distribution of SPECweb2005 was installed and configured as described in the SPEC documentation (footnote 7). The parameter MAX_OVERTHINK_TIME was set to 90000, though 20000 is the officially recommended value. Increasing this value allowed us to measure SPECweb2005 under an extremely high load.

Testing Methodology

To assure fair comparisons, before each individual test began, our testing tool restarted the SPECweb2005 client components, all middleware, and the Web server, and otherwise ensured that the environment on each system in this distributed environment was in a consistent and properly prepared state. An officially published SPECweb2005 benchmark score is a single value that is based on three 30-minute test runs for each of the three scenarios, and compares the performance relative to SPEC’s reference machine. This can be used to compare the relative performance of Web server platforms from different vendors. Since our goal was to analyze in detail how the use of the scripting language and the Web server affected the performance, for each test we used SPECweb2005’s internal metrics, such as the number of good/tolerable/failed requests served, as reported from the SPECweb2005 test harness. To improve the test coverage in the time available, we used 10-minute test runs rather than the official 30-minute runs, and only ran each test once rather than three times. Although our test runs are not suitable for reporting as official scores, they are quite useful for identifying the trends seen over tens of tests, and variations between identical test runs were small, as shown in Figure 9. The vmstat command was also used to monitor such performance statistics as memory usage, swapping activity, and CPU utilization [6]. In separate test runs, we used the oprofile tool to identify process, module, and function CPU utilization.

Footnote 7: http://www.spec.org

5.2 Results

Figure 9 shows the results of running our tests with the SPECweb2005 banking scenario, which includes 18 dynamic pages visited by each client based on a fixed “think time” delay and page transition probabilities. The size of the workload is specified by the number of clients. The response time for each page request is classified as GOOD (within 2 seconds), TOLERABLE (within 4 seconds), or FAIL. For example, with a workload of 1,200 clients in the Banking scenario (top two graphs), 97% of the requests are GOOD with the server configuration using P9, while this number drops to 7% with the PHP 5 runtime. Also, the maximum number of total requests processed in 2-minute test runs is 18,690 at 1,200 clients for P9, and 16,373 at 1,000 clients for the PHP 5 runtime.
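The GOOD/TOLERABLE/FAIL classification described above can be sketched as a small Python function. This is a minimal illustration, not part of the SPECweb2005 harness: the names `classify` and `summarize` are hypothetical, and treating the 2-second and 4-second limits as inclusive is an assumption.

```python
from collections import Counter

GOOD_LIMIT_S = 2.0       # "within 2 seconds" (inclusive boundary is an assumption)
TOLERABLE_LIMIT_S = 4.0  # "within 4 seconds" (inclusive boundary is an assumption)


def classify(response_time_s):
    """Map one page response time in seconds to GOOD, TOLERABLE, or FAIL."""
    if response_time_s <= GOOD_LIMIT_S:
        return "GOOD"
    if response_time_s <= TOLERABLE_LIMIT_S:
        return "TOLERABLE"
    return "FAIL"


def summarize(response_times_s):
    """Return the fraction of requests that fall into each class."""
    labels = ("GOOD", "TOLERABLE", "FAIL")
    total = len(response_times_s)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(t) for t in response_times_s)
    return {label: counts[label] / total for label in labels}


if __name__ == "__main__":
    # Hypothetical response times, in seconds.
    print(summarize([0.5, 1.9, 3.0, 5.0]))
```

A statement such as "97% of the requests are GOOD" corresponds to the GOOD fraction returned by a summary of this kind for one workload size.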

Banking Scenario

The banking scenario represents an online banking Web application, characterized as a secure Web application with SSL communications, such as checking account information, transferring money to different accounts, and so forth. The average amount of data sent to each Web client is 34.8 KB, as shown in Table 1. Figure 9 shows the performance results of the Banking scenario in SPECweb2005 with the two runtime configurations, PHP 5 and P9. As shown in the graph, PHP 5 peaks at 1,000 sessions, whereas P9 peaks at 1,200 sessions, about a 20% advantage for P9. This advantage comes from the fact that P9 exploits just-in-time compilation, whereas PHP 5 is an interpreter-based runtime. The overhead of the language runtime core must therefore be large even compared to the overhead of SSL encryption and decryption, which is known to become dominant in light Web applications in Java.

E-commerce Scenario

The E-commerce scenario represents an online shopping Web application that supports searching for certain products, displaying product details, and finally purchasing the products. SSL processing is only used at checkout time. The average amount of data in this scenario is around 144 KB, which is larger than in the other scenarios. PHP 5 peaks at 1,400 sessions, and P9 peaks at 1,800 sessions. This result demonstrates that our JIT compiler-based optimization approach outperforms the original PHP 5 runtime by about 30%.

Support Scenario

Finally, the Support scenario represents a company support website where customers can download files such as drivers and manuals. The portion of dynamic content is small, and many files are sent from the Web server without the use of any PHP runtime. The average data size is 78 KB. PHP 5 peaks at 1,000 sessions, while P9 handles 1,200 sessions, about a 20% advantage over PHP 5. Even though SSL processing is not involved in this scenario, simply sending many small and large files from the Web server to clients puts a large overhead on the Web server, rather than on the PHP runtime.

6. Discussion

Figure 10 shows a profile of the CPU usage in the server machine running the SPECweb2005 E-commerce scenarios. It breaks down the CPU usage for the PHP runtimes according to how much CPU time is spent in each shared library component. A total of 29% of the CPU time is spent in the P9 runtime, whereas 57% is spent in the PHP 5 runtime. We conclude that the compiler approach is effective in improving the end-to-end throughput of Web servers with dynamic scripting language programs. The improvements over the PHP 5 runtime come from the reduced interpretive overhead for program execution. To address the overhead in the runtime helper functions and extensions, an inlining technique used for Java Native Interface (JNI) inlining [20] might be useful. The polymorphic inline cache [11], a technique that inserts guards to increase the opportunities for inlining, could be used to inline an invoked function whose target is uncertain at compile time. The large memory copy overheads are due to the communications between the Web server and the PHP runtime. This overhead could be eliminated by specializing the PHP runtime for the Web application while using zero-copy techniques [21].
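To make the guard-based idea mentioned in the Discussion more concrete, the following is a minimal Python sketch of a polymorphic inline cache [11] at a single call site. It is an illustration under stated assumptions, not a description of how P9 works: the `InlineCache`, `Square`, and `Circle` names are hypothetical, and the sketch only caches the method lookup per receiver type, whereas a JIT compiler would emit the type guards in native code and inline the target bodies behind them.

```python
import math


class InlineCache:
    """Per-call-site cache from receiver type to the resolved method (hypothetical helper)."""

    def __init__(self, method_name, max_entries=4):
        self.method_name = method_name
        self.max_entries = max_entries  # bound on the number of cached receiver types
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        # Guard: has this call site already seen this receiver type?
        target = self.entries.get(receiver_type)
        if target is None:
            # Slow path: full method lookup, then remember the result if there is room.
            self.misses += 1
            target = getattr(receiver_type, self.method_name)
            if len(self.entries) < self.max_entries:
                self.entries[receiver_type] = target
        else:
            # Fast path: the guard succeeded, so the cached target is reused.
            self.hits += 1
        return target(receiver, *args)


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# One call site that sees two receiver types: two misses, then a hit for Square.
site = InlineCache("area")
areas = [site.call(shape) for shape in (Square(2), Circle(1), Square(3))]
```

After these three calls the cache holds one entry per receiver type, which is the situation in which a compiler could inline both `area` bodies behind type guards even though the call target is not known at compile time.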

Figure 9. SPECweb2005 benchmark results. [Stacked charts for the Banking, E-commerce, and Support scenarios, each with one panel for the PHP 5 runtime and one for the P9 runtime. Horizontal axis: Simultaneous Sessions (200 to 3000). Vertical axis: Throughput (requests/sec). Series: Good, Tolerable, Failed.]

Figure 10. CPU usage profiling for SPECweb2005 E-commerce scenarios. [Two pie charts: PHP 5 runtime (PHP 5.2.2 with APC) and P9 runtime (IBM Java SE 5 JIT compiler). Annotated categories: Disk I/O, Profiling Overhead, Connection to Backend, Web Server Core, Memory Copy, SSL, PHP Runtime, Runtime Core, Connection to Clients. Legend components: Web Server (lighttpd, libc, libcrypto, e1000, aic79xx), PHP Runtime (php-cgi or libp9rtsvc, libc, e1000), Profiling (oprofile/oprofiled).]

7. Related Work

Compilers

There are a number of ongoing efforts to compile dynamic scripting languages. A popular approach is to compile the scripts into an existing bytecode. For example, Phalanger [2] is a PHP language compiler for the .NET Framework. We tried the downloadable version of Phalanger (v2.0 Beta 2) with our microbenchmarks and found the performance to be similar to the Zend interpreter, possibly because we could not tune the configuration adequately. We were also unable to get this system to run SPECweb2005, which is not currently in the list of supported applications. Microsoft DLR is a framework to support various dynamic scripting languages for the .NET Framework, including IronPython (http://www.codeplex.com/IronPython) and IronRuby (http://ironruby.rubyforge.org/), which are for Python and Ruby. The latest versions of Phalanger are also reported to support DLR. Jython [12] (http://www.jython.org/) and JRuby (http://dist.codehaus.org/jruby/) are Python and Ruby compilers generating Java bytecode. The P8 runtime in Project Zero (http://www.projectzero.org/) is another Java-based PHP compiler. However, for a runtime using an existing Java VM, the lack of expressiveness in the bytecode is problematic. JSR-292 proposes to add the invokedynamic instruction to Java bytecode as an inexpensive mechanism to call methods for untyped dynamic objects.

The PyPy project [18] takes an implementation approach that attempts to preserve the flexibility of Python, while still allowing for efficient execution. This is achieved by limiting the use of the more dynamic features of Python to an initial, bootstrapping phase. This phase is used to construct a final, restricted Python program, RPython [1], that is actually executed. RPython is a subset of Python with static typing and no dynamic modifications of classes or method definitions. However, it can still take advantage of Python features such as mixins and first-class methods and classes. There are also proposals for native compilers of PHP on the Parrot VM (http://www.parrot.org/), but this implementation is still too immature for the kinds of experiments we report in this paper. The Psyco [17] compiler for Python regards compilation as a kind of symbolic execution or specialization of a program. The Tamarin VM for JavaScript uses a dynamic JIT compiler based on runtime traces [8], rather than using traditional control flow graphs.

Dynamic Optimization

Further acceleration of the PHP runtime would be possible based on our work, since our JIT compiler is a baseline implementation for the application of additional dynamic optimizations. Chambers [5] proposed a variety of optimization techniques for SELF, a prototype-based object-oriented language with neither static typing nor class declarations. In the context of compile-time optimization for dynamically-typed languages, tag optimizations have been extensively studied. Cartwright et al. [3] studied a soft-typing system for LISP, whose concept was to use type inference to identify the typeable fragments of a program, and then remove the runtime checks during the execution of those typed fragments. Henglein [10] extended this framework by proposing an untagging optimization. The representation analysis [7, 13, 14] targets the optimization of polymorphic functions in ML, using unboxing.

Escape analysis [6, 15] is another approach to obtain effects similar to unboxing. Escape analysis allows each boxed value to be allocated on a stack, and by combining it with a privatization technique, we may have the same effect as unboxing. A common problem with escape analysis is that it usually has very strict conditions on when we can say that an object is non-escaping.
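The effect that tag optimization and unboxing aim for can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the `Boxed` class and the two summation functions are hypothetical, and the "specialized" version is written by hand here, whereas the cited work derives such code automatically from type inference or representation analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boxed:
    """A dynamically typed value: a type tag plus a payload (hypothetical representation)."""
    tag: str
    payload: object


def generic_add(x, y):
    """Untyped add: checks both tags on every call and allocates a new box for the result."""
    if x.tag == "int" and y.tag == "int":
        return Boxed("int", x.payload + y.payload)
    raise TypeError(f"unsupported operand tags: {x.tag}, {y.tag}")


def sum_generic(values):
    """Sum boxed values with a tag check and a fresh box on every iteration."""
    total = Boxed("int", 0)
    for v in values:
        total = generic_add(total, v)
    return total


def sum_specialized(values):
    """Sum for a fragment where every element is known to be an int.

    Assumes an analysis has already proven the tags, so the loop works on raw
    payloads, skips the per-iteration tag checks, and boxes only the final result.
    """
    total = 0
    for v in values:
        total += v.payload
    return Boxed("int", total)


values = [Boxed("int", i) for i in range(5)]
```

Both functions return the same boxed result for `values`, but the specialized one avoids the intermediate boxes, which is the kind of saving that unboxing and stack allocation through escape analysis try to obtain.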

8. Concluding Remarks

In this paper, we explored the potential acceleration of the dynamic scripting language PHP in the context of server-side usage for Web applications. We modified an existing production-quality just-in-time compiler for Java to support PHP in order to evaluate the effectiveness of the compilation technologies in this context. We compared pure scripting runtime engine performance with microbenchmarks and then measured the JIT acceleration of the end-to-end PHP Web application server performance with the industry standard benchmark, SPECweb2005. The experimental results showed about 20-30% advantages with our JIT-based acceleration, showing that the acceleration of dynamic scripting language processing matters in a realistic Web application environment, and that significant overheads still remain in the language runtime as well as in the communications with the Web server front-end.

This paper found low-hanging fruit for performance improvements in server-side dynamic scripting language implementations, which are currently popular and widely used in Web application development. By applying simple yet effective implementation strategies, we can raise the bar for performance improvements in the domain of server-side dynamic scripting language implementations.

Acknowledgments

We thank the anonymous reviewers of this paper for their insightful comments, constructive criticism, and advice, which are largely reflected in the final version of this paper.

References

[1] D. Ancona, M. Ancona, A. Cuni, and N. D. Matsakis. RPython: a step towards reconciling dynamically and statically typed OO languages. In DLS ’07: Proceedings of the 2007 Symposium on Dynamic Languages, pages 53–64, New York, NY, USA, 2007. ACM.
[2] J. Benda, T. Matousek, and L. Prosek. Phalanger: Compiling and running PHP applications on the Microsoft .NET platform. In Proceedings of .NET Technologies 2006, the 4th International Conference on .NET Technologies, Plzen, Czech Republic, May 29 - June 1, 2006, pages 11–20, 2006.
[3] R. Cartwright and M. Fagan. Soft typing. In Proceedings of the SIGPLAN ’91 Conference on Programming Language Design and Implementation, pages 278–292, 1991.
[4] E. Cecchet, A. Chanda, S. Elnikety, J. Marguerite, and W. Zwaenepoel. Performance comparison of middleware architectures for generating dynamic web content. In Proceedings of Middleware 2003, ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, June 16-20, 2003, volume 2672 of Lecture Notes in Computer Science, pages 242–261. Springer, 2003.
[5] C. D. Chambers. The design and implementation of the SELF compiler, an optimizing compiler for object-oriented programming languages. PhD thesis, Stanford University, Stanford, CA, USA, 1992.
[6] J.-D. Choi, M. Gupta, M. J. Serrano, V. C. Sreedhar, and S. P. Midkiff. Escape analysis for Java. In Proceedings of OOPSLA ’99, the 1999 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, number 10 in SIGPLAN Notices vol. 34, pages 1–19, Denver, Colorado, USA, November 1999. ACM.
[7] K.-F. Faxén. Representation analysis for coercion placement. In Static Analysis, 9th International Symposium, SAS 2002, Madrid, Spain, September 17-20, 2002, Proceedings, volume 2477 of Lecture Notes in Computer Science, pages 278–293. Springer, 2002.
[8] A. Gal, C. W. Probst, and M. Franz. HotpathVM: an effective JIT compiler for resource-constrained devices. In VEE ’06: Proceedings of the 2nd International Conference on Virtual Execution Environments, pages 144–153, New York, NY, USA, 2006. ACM.
[9] N. Grcevski, A. Kielstra, K. Stoodley, M. G. Stoodley, and V. Sundaresan. Java just-in-time compiler and virtual machine improvements for server and middleware applications. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, May 6-7, 2004, San Jose, CA, USA, pages 151–162. USENIX, 2004.
[10] F. Henglein. Global tagging optimization by type inference. In Proc. 1992 ACM Conf. on LISP and Functional Programming (LFP), San Francisco, California, pages 205–215. ACM Press, 1992.
[11] U. Hölzle, C. Chambers, and D. Ungar. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. In ECOOP ’91: Proceedings of the European Conference on Object-Oriented Programming, pages 21–38, London, UK, 1991. Springer-Verlag.
[12] J. Hugunin. Python and Java - the best of both worlds. In Proceedings of the 6th International Python Conference, October 14-17, 1997, pages 31–38, 1997.
[13] X. Leroy. Unboxed objects and polymorphic typing. In Conference Record of the Nineteenth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 177–188, Albuquerque, New Mexico, 1992.
[14] Y. Minamide and J. Garrigue. On the runtime complexity of type-directed unboxing. In ICFP ’98: Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, pages 1–12, New York, NY, USA, 1998. ACM.
[15] Y. G. Park and B. Goldberg. Escape analysis on lists. In PLDI ’92: Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, pages 116–127, New York, NY, USA, 1992. ACM.
[16] U. V. Ramana. Some experiments with the performance of LAMP architecture. In Proceedings of CIT 2005, Fifth International Conference on Computer and Information Technology, 21-23 September 2005, Shanghai, China, pages 916–921. IEEE Computer Society, 2005.
[17] A. Rigo. Representation-based just-in-time specialization and the Psyco prototype for Python. In Proceedings of PEPM 2004, the 2004 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, Verona, Italy, August 24-25, 2004, pages 15–26. ACM, 2004.
[18] A. Rigo and S. Pedroni. PyPy’s approach to virtual machine construction. In OOPSLA ’06: Companion to the 21st ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 944–953, New York, NY, USA, 2006. ACM.
[19] Standard Performance Evaluation Corporation. SPECweb2005, 2005. http://www.spec.org/web2005/.
[20] L. Stepanian, A. D. Brown, A. Kielstra, G. Koblents, and K. Stoodley. Inlining Java native calls at runtime. In VEE ’05: Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, pages 121–131, New York, NY, USA, 2005. ACM.
[21] T. Suzumura, M. Tatsubori, S. Trent, A. Tozawa, and T. Onodera. Highly scalable web applications with zero-copy data transfer. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 921–930, 2009.
[22] L. Titchkosky, M. F. Arlitt, and C. L. Williamson. A performance comparison of dynamic web technologies. SIGMETRICS Performance Evaluation Review, 31(3):2–11, 2003.
[23] A. Tozawa, M. Tatsubori, T. Onodera, and Y. Minamide. Copy-on-write in the PHP language. In Proceedings of POPL 2009, the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Savannah, Georgia, USA, January 21-23, 2009, pages 200–212. ACM, 2009.
[24] S. Trent, M. Tatsubori, T. Suzumura, A. Tozawa, and T. Onodera. Performance comparison of PHP and JSP as server-side scripting languages. In Proceedings of Middleware 2008, ACM/IFIP/USENIX 9th International Middleware Conference, Leuven, Belgium, December 1-5, 2008, pages 164–182. Springer, 2008.
[25] S. R. Warner and J. S. Worley. SPECweb2005 in the real world: Using IIS and PHP. In Proceedings of 2008 SPEC Benchmark Workshop, Millbrae, CA, USA, January 27, 2008. Standard Performance Evaluation Corporation (SPEC), 2008.