Distributed Java Compiler: Implementation and Evaluation

A Thesis by Andrew Ryan Dalton

Submitted to the Graduate School, Appalachian State University, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

August 2004 Major Department: Computer Science

Distributed Java Compiler: Implementation and Evaluation

A Thesis by Andrew Ryan Dalton August 2004

APPROVED BY:

Cindy Norris, Chairperson, Thesis Committee
James B. Fenwick Jr., Member, Thesis Committee
Dee Parks, Member, Thesis Committee
Edward G. Pekarek Jr., Chairperson, Computer Science
Judith E. Domer, Dean of Graduate Studies and Research

Copyright © Andrew Ryan Dalton 2004
All Rights Reserved


ABSTRACT

Distributed Java Compiler: Implementation and Evaluation. (August 2004)

Andrew Ryan Dalton, Appalachian State University

Thesis Chairperson: Cindy Norris

Most modern computer programs are written in high-level programming languages that are quite different from the binary code executed by computers. When programs are written in such a language, another program is used to translate the high-level English-like language to machine code. This program is called a compiler. For large programs consisting of many source files, this translation process can be quite time consuming. For the C and C++ programming languages, there exists a program named distcc, the Distributed C Compiler [6], to facilitate the parallel compilation of source into machine code. This thesis discusses the design and evaluation of a distributed Java compiler. The research indicates that distributed compilation of projects consisting of a large number of relatively large source files can see a speed improvement over the traditional implementation of javac.


For Daddy


Contents

1 Introduction

2 Background
  2.1 Introduction
  2.2 The C Programming Language
  2.3 The Java Programming Language
  2.4 distcc - Distributed C Compiler

3 Implementation
  3.1 Introduction
  3.2 Building the Dependence Graph
  3.3 Scheduling Algorithms
  3.4 Compilation Techniques
  3.5 Parallel versus Distributed Compilation

4 Experimental Study
  4.1 Introduction
  4.2 pjavac - Parallel Java Compiler
  4.3 djavac - Distributed Java Compiler
  4.4 cdjavac - C Distributed Java Compiler
  4.5 Summary

5 Summary

Bibliography

Vita

List of Tables

4.1 Benchmark Properties
4.2 Bottom-Up Scheduling with Multiprocessed Compiler and Varying Thread Counts
4.3 Bottom-Up Scheduling with Multithreaded Compiler and Varying Thread Counts
4.4 Speedup of Using a Multithreaded Compiler over a Multiprocessed Compiler
4.5 File Size Scheduling with Two Threads and Varying Minimum File Lengths
4.6 Speedup of Using a File Size Scheduling Algorithm with Two Threads over a Bottom-Up Scheduling Algorithm with Two Threads
4.7 File Size Scheduling with Four djavac Servers and Varying Minimum File Lengths
4.8 Speedup of Using the Distributed Implementation over the Parallel Implementation
4.9 Time Required to Access 10807 Files and Directories on both a Local Filesystem and an NFS Filesystem
4.10 File Size Scheduling with Four cdjavacd Servers and Varying Minimum File Lengths
4.11 Speedup of cdjavac over djavac

List of Figures

1.1 Phases of a Compiler
2.1 C Compilation Stages
2.2 Sample Make Dependences Graph
2.3 Java Source File Dependence
3.1 Compilation Sequence Diagram
3.2 Sample Java Interdependence Graph
3.3 Sample Original Adjacency Matrix
3.4 Sample Transitive Closure Matrix
3.5 djavac Network Environment
3.6 cdjavac Network Environment

List of Listings

1.1 Interdependent Java Source Files
2.1 Sample Makefile
2.2 Sample Use of distcc
3.1 Sample Java Source Files
3.2 ServerMessage Class
3.3 makegen Generated Makefile
3.4 Sample hosts.conf File for cdjavac_proxy
3.5 Sample Use of cdjavac

Chapter 1

Introduction

Compilers are computer programs that translate human-readable source code in some high-level language to the machine code that is directly executed by a computer. Before compilers were developed, programmers had to develop software using assembly language or machine code. Programs written in this way were error prone, difficult to maintain, and could not easily be made to run on different hardware architectures. Compilers allow programmers to develop software in a high-level, machine-independent language. A compiler is used to translate the software, called the source, to the machine language of the target architecture.

The translation process of a compiler is composed of several steps. First, a lexical analyzer takes the source code as a stream of characters and generates a stream of tokens. The phase of the compiler that performs lexical analysis is also called the scanner. Tokens are basic, grammatically indivisible units of the programming language being parsed. Next, a parser takes this stream of tokens and does syntactic checking. Syntactic checking ensures that the stream of tokens corresponds to the grammar for the programming language; this is similar to ensuring that an English sentence is grammatically correct. Next, the compiler performs semantic checking. Semantic checking ensures that statements in the language that may be syntactically correct make "sense." For example, in English, "The rock climbed the lasagna" is a grammatically correct statement; however, it does not make sense to the reader. Next, the compiler builds an internal data structure known as intermediate code that represents the structure of the program being compiled. The intermediate code is then optimized, and the compiler traverses the resultant data structure and translates it into binary machine code that is executed directly by the computer hardware [2]. Figure 1.1 illustrates these phases of a compiler.

Figure 1.1: Phases of a Compiler

The source code for a program may consist of many different files, or translation units, each of which contains some portion of the entire program. The compiler processes each source file independently [9]. When dependences exist between translation units, the compiler must have a method to perform semantic checking. Listing 1.1 shows an example where one source file (A.java) depends on the other (B.java). On line five, a new instance of the class B is created. The compiler must know if such a class exists and, if so, that it has a constructor that takes no parameters. (If a class does not explicitly declare any constructors, a default constructor that takes no parameters is supplied by the compiler; this line makes use of that default constructor.) On line six, the method fubar is called on the object of type B. The compiler must know that class B has a method with this signature. The signature of a function or method is its return type, name, and parameter list. For example, the signature of method fubar in class B is {int, fubar, (int, int, int)}.

Listing 1.1: Interdependent Java Source Files

 1  // A.java
 2
 3  public class A {
 4      public void foo() {
 5          B b = new B();
 6          int x = b.fubar(1, 3, 7);
 7      }
 8  }
 9  // B.java
10  public class B {
11      public int fubar(int x, int y, int z) {
12          return x + y + z;
13      }
14  }

The manner in which the compiler performs semantic checking has an impact on the ways in which compilation can be parallelized. If the compiler can perform semantic checking without any dependences on other files, all files can be compiled in any order and in parallel. However, if the compiler requires other source files, or files generated from other source files, the compilation order is important. The C and C++ programming languages use header files to hold semantic information, allowing source files to be compiled in any order. Without the use of header files, as in the Java programming language, the order in which files can be compiled is restricted. This thesis discusses these issues, as well as the design and experimental evaluation of parallel and distributed Java compilation techniques.

Chapter 2 provides background information related to this thesis. Section 2.2 explores the way inter-file semantic checking is performed for the C programming language, and Section 2.3 describes the compilation of Java programs. A tool for compiling C translation units simultaneously is described in Section 2.4. Chapter 3 gives a description of the software developed to research distributed Java compilation. Section 3.2 describes the process of building a graph to represent inter-file dependences. Section 3.3 describes the different algorithms used for scheduling files for compilation. Section 3.4 describes the methods used to compile source files. Section 3.5 describes the implementations of a parallel and two distributed compilers and discusses the differences between them. Chapter 4 provides the results of experiments conducted using the software described in Chapter 3. Section 4.2 describes the experiments conducted using pjavac – the parallel Java compiler. Section 4.3 describes the experiments conducted using djavac – the distributed Java compiler. Section 4.4 describes the experiments conducted using cdjavac – the C distributed Java compiler. Finally, Chapter 5 provides a summary of this thesis and discusses possibilities for future work.

Chapter 2

Background

2.1 Introduction

When inter-translation unit dependences exist between files, it is necessary for the compiler to have a method to perform semantic checking. The methods for gathering this information are different for different programming languages. Section 2.2 describes the way inter-file semantic checking is performed for the C programming language. Section 2.3 describes the way semantic checking is performed for the Java programming language and how it is different from C. Finally, Section 2.4 describes the distcc tool that takes advantage of the C compilation semantics to distribute C source files to multiple machines for simultaneous compilation.

2.2 The C Programming Language

The C programming language is a procedural programming language developed by Dennis Ritchie at Bell Laboratories in the early 1970s [9]. C was designed to be a general-purpose programming language, but its precise control over the computer makes it especially well suited for systems programming [9]. Shortly after the development of the C language, Dennis Ritchie and Ken Thompson used it to develop the UNIX operating system. Most application software for UNIX operating systems is also written in C [19]. Today C continues to be a favorite language for systems programmers. For example, the Linux operating system, an open-source UNIX clone created by Linus Torvalds and still under active development, is written mostly in C [3].

A program written in C consists of one or more logical units called functions. In order to compile a program, all functions used by the program must be compiled. Before a C compiler can perform semantic analysis, the compiler must know the signature of any function being called [9]. To communicate the signature of a function to the compiler, the function must either be declared or defined textually prior to the function call within a translation unit. Function declarations can be stored separately in "header files" and textually included at the top of a translation unit, before any executable code, via the #include preprocessor directive. Preprocessing is a phase executed prior to compilation. The preprocessor replaces the #include statement with the contents of the file being included and generates a new file. This places the function signatures from the header files at the top of the file being compiled. This new file is then translated into an assembly source file, which is in turn translated by an assembler into an object file, a binary representation of the original source file. The phases of this process are outlined in Figure 2.1.

Figure 2.1: C Compilation Stages

In addition to the machine code equivalent of the source code, each object file contains a linker symbol table. Symbols are functions and variables that are defined in an object file. A symbol table identifies symbols that can be referenced outside that module, symbols that are defined outside that module but are referenced within the module, and symbols that can only be referenced within that module. An application called a linker is responsible for resolving inter-module symbol table entries when generating the executable program [4].

The use of header files to store the declarations of functions that cross translation unit boundaries allows the compilation of source files to be independent of each other. If a subset of source files is modified, only that subset of files and the files dependent upon them need to be recompiled for the project to be re-linked correctly. A separate utility known as make is often used to manage the dependence relationships between the files in a C project [19]. make uses a description file, usually named makefile or Makefile, to represent the static hierarchy of dependences between the files in a C program as a set of make rules. make uses these rules, source and object filenames, and the last-modified times of the source and object files to determine which source files need to be recompiled and in what order to compile them [17]. Listing 2.1 is a makefile for a sample C program that does temperature conversions. The numbers in the left margin are not part of the actual file; they are shown here so that the explanation can refer to individual lines of the file conveniently. Each set of lines composes a make rule of the form:

    target : dependencies
            command

Listing 2.1: Sample Makefile

 1  #
 2  # makefile
 3  #
 4  # Sample makefile for the temperature conversion program
 5  #
 6
 7  CC = gcc
 8
 9  convert : convert.o ctof.o ftoc.o
10          $(CC) convert.o ctof.o ftoc.o -o convert
11
12  convert.o : convert.c
13          $(CC) -c convert.c -o convert.o
14
15  ctof.o : ctof.c ctof.h
16          $(CC) -c ctof.c -o ctof.o
17
18  ftoc.o : ftoc.c ftoc.h
19          $(CC) -c ftoc.c -o ftoc.o

make, if executed without specifying a target, builds the first target found in the makefile, in this case convert on line nine. Alternately, the command make convert can be used whether convert is the first rule in the file or not. The make rule on line nine indicates that the file convert is dependent upon the files convert.o, ctof.o, and ftoc.o. make then searches for rules to build each of those files (found on lines 12, 15, and 18 respectively). This process is repeated until no target matching the dependences is listed. Next, make compares the file modification time of the target file against all of the dependent files. If any dependent file is newer than the target file, the command associated with the rule is executed to update the target file. This process of modification-time comparison and conditional update is repeated until all files that need to be updated for the project to be up to date have been rebuilt. Figure 2.2 illustrates the dependences between the files in this sample project.

Assume the project has been built once, but the file ftoc.c has since been modified. The modification time for ftoc.c is then later than the modification time for ftoc.o. The rule on line 18 causes the command on line 19 to be executed, which generates a new version of ftoc.o with a modification time matching the time it was created by the compiler. This means that the modification time for ftoc.o will be later than the modification time for convert, so the rule on line 9 causes the command on line 10 to be executed. Line 10 re-links the application and generates a new executable named convert.


Figure 2.2: Sample Make Dependences Graph

2.3 The Java Programming Language

The Java programming language is an object-oriented programming language based on C++ and developed by Sun Microsystems in the mid-1990s. Java was initially designed to program smart consumer devices like cable TV boxes. When no high demand was found for such devices, it was decided that the language could be used in more general ways [11]. Java was designed to be a simple, object-oriented, network-savvy, robust, secure, architecture-neutral, portable, interpreted, high-performance, multithreaded, and dynamic programming language [13]. For these reasons, Java is one of the most widely used programming languages today.

A program written in Java consists of one or more classes. In order to compile a program, all classes used by the program must be compiled. Before a Java compiler can perform semantic analysis, the compiler must load all classes upon which the current file is dependent. To facilitate inter-file dependence checking, Java compilers have requirements on filenames and file locations. These requirements are summarized as follows (an illustrative sketch follows below):

1. The name of a file must match exactly the name of the public class in that file, and there can be only one publicly accessible class in a file. For example, a public class Fubar must be defined in a file named Fubar.java.

2. If a source file is in a package, the directory hierarchy the file is contained in, relative to some root source directory, must match the package hierarchy. For example, a public class Fubar in package edu.appstate.cs must be located in the file edu/appstate/cs/Fubar.java. If a package is not explicitly declared, a file is assigned the "default package," which consists of the current directory.

3. The compiler generates a separate class file, whose filename consists of the class name and ends in .class, for every class in the program. For example, a class Fubar would be compiled to a file Fubar.class. If a single file contains multiple classes (only one of which can be public) or any class has classes defined internally (inner classes), a .class file is generated for each class.

4. If a compiled class is in a package, the directory hierarchy in which the .class file is contained, relative to some root class file directory, must match the package hierarchy. For example, if a class Fubar in package edu.appstate.cs were compiled, the resulting Fubar.class file would be stored in the directory edu/appstate/cs/. This directory may or may not be the same as the root source directory.

When compiling a source file, a Java compiler needs information about class types it does not yet recognize. A Java compiler needs type information for every class or interface used, extended, or implemented in the source file, including classes and interfaces not explicitly mentioned in the source file but which provide information through inheritance.
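As an illustration of requirements 1, 2, and 4 above, a source file for the class Fubar in package edu.appstate.cs might look as follows. The class body is an illustrative sketch (the thesis does not list this file); only the names Fubar and edu.appstate.cs come from the requirements above.

// File: edu/appstate/cs/Fubar.java, relative to the source root
package edu.appstate.cs;

public class Fubar {
    // Compiling this file produces Fubar.class; the compiler places it in
    // edu/appstate/cs/ under the chosen class-file root directory (-d).
}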


When a Java compiler needs this information, it searches for a source file or class file which defines the type [14]. This search involves several potential sources of the necessary information. One source is known as the bootstrap classes: library classes that are shipped with the compiler, such as java.lang.Object. Another source is the classpath, which consists of a set of directories where existing compiled classes may reside. The final source is the sourcepath, which consists of a set of directories where source files may reside. If a sourcepath is not explicitly set, it defaults to the same set of directories as the classpath.

A type search may produce a class file, a source file, both, or neither. If the type search produces only a class file, that file is loaded and semantic checking is performed using the information contained in that file; an example of such a case is the use of a third-party library class. If the type search produces a source file but no class file, the source file is scanned, parsed, and scheduled for code generation; an example of this case is a source file in a project that is being compiled for the first time. If the type search produces both a source file and a class file, the compiler determines whether the class file is out of date by comparing the timestamps of the two files. If the modification time of the class file is older than that of the source file, the source file is scanned, parsed, and scheduled for code generation; if the modification time of the class file is later than that of the source file, the class file is loaded. An example of this case is the recompilation of a project that has previously been compiled. Finally, if neither a class file nor a source file is found, semantic checking cannot be performed and an error message is generated [14].

As each class is parsed, the compiler identifies class types using the grammar of the language. If the compiler does not yet recognize a type, that type is added to a list of yet-to-be-processed types. Once a file is processed, the compiler performs a type search for the next type in the list. Once all types have been resolved, semantic checking can be performed for each file being compiled, in the same order the files were parsed. If this checking is successful, the bytecode for that class is generated; otherwise, an error message is generated.

Figure 2.3: Java Source File Dependence

A sample compile sequence is shown in Figure 2.3. Each node A through D in the graph represents one of the classes A through D. The source for class A resides in the file A.java, the source for class B resides in the file B.java, and so on. An arrow from node x to node y in the graph represents class x's dependence on class y (and thereby file x.java's dependence on file y.class or y.java). Assuming that no .class files exist for these classes, the following is the sequence of events that occurs when compiling A.java (this behavior can be confirmed by passing the -verbose command-line option to javac [14]):

1. The programmer invokes the compiler to compile A.java.

2. The source for class A is scanned and parsed by the compiler.

3. Class A does not explicitly extend a class, so it implicitly extends java.lang.Object. The compiler does not yet recognize class java.lang.Object. The class file for that class is found in the bootstrap classes and is loaded.

4. Class A depends on class B, but the compiler does not yet recognize class B. The source for class B is scanned and parsed by the compiler.

5. Class B depends on class D, but the compiler does not yet recognize class D. The source for class D is scanned and parsed by the compiler.

6. Class A depends on class C, but the compiler does not yet recognize class C. The source for class C is scanned and parsed by the compiler.

7. All types have been identified. Semantic checking is performed on class A. If semantic checking is successful, the bytecode for this class is written to A.class.

8. Semantic checking is performed on class B. If semantic checking is successful, the bytecode for this class is written to B.class.

9. Semantic checking is performed on class D. If semantic checking is successful, the bytecode for this class is written to D.class.

10. Semantic checking is performed on class C. If semantic checking is successful, the bytecode for this class is written to C.class.

11. All necessary classes have been successfully processed. Compilation is successful.
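For reference, one possible set of source files consistent with the dependences in Figure 2.3 (A depends on B and C, and B depends on D) is sketched below. The thesis does not list this code, so the member names are illustrative.

// A.java
public class A { B b; C c; }

// B.java
public class B { D d; }

// C.java
public class C { }

// D.java
public class D { }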

A Java compiler's dependences on other files at compile time are quite different from those of a C compiler. With the C programming language, through the use of header files, all translation units are independent of one another and can easily be compiled in parallel. With Java, however, every file being compiled has a dependence upon the source files or class files that contain the symbols it uses. Java has no concept of "header files" to share inter-translation-unit semantic information; the compiler must have direct access to the source files or class files containing the needed symbols.

2.4 distcc - Distributed C Compiler

distcc is a program used to distribute the compilation of a C, C++, Objective C, or Objective C++ project across machines on a local area network [6]. It is organized into two applications: distcc, the client application, and distccd, the server application.

The server application, distccd, is a daemon process that runs on each of the distributed compilation servers. Its primary responsibility is to listen for compile requests from distcc clients. Once a connection is established from a client, distccd extracts the name of the compiler to use, the compiler options, and the source file from the network stream. The source file is uncompressed, if necessary, and written to disk. Then distccd invokes the compiler on that file with the specified options. If the compilation is successful, the resulting object file is optionally compressed and returned to the client via the network connection.

The client application, distcc, is executed on the local machine in place of the normal compiler. Its first parameter is the name of the compiler to be used on the server side for compilation. Its remaining parameters specify the options to be passed to the compiler as well as the name of the file to compile. When distcc is started, it checks the environment variable DISTCC_HOSTS for a list of available distccd servers. It communicates with the servers in this list to identify a server that can compile the file. Once a server is found, the source file is preprocessed to include any necessary header files and macro definitions. The resultant source file is then optionally compressed and sent to the server along with the compilation technique to use and the compiler options. The client application then waits for a response from the distccd server. If the compilation was successful, the response includes the resultant object file, which distcc writes to disk. If the compilation was unsuccessful, the response includes any error messages generated by the compiler.

One of the primary goals when designing distcc was ease of integration into existing projects. As described in Section 2.2, dependency checking for most C source projects is managed by an application called make. distcc gets its parallelism from a version of make known as GNU make, which supports executing multiple rules in parallel. Listing 2.2 shows a sample of how to integrate distcc into an existing C project (using a Bourne-equivalent shell).

Listing 2.2: Sample Use of distcc

1  DISTCC_HOSTS="rich roan grandfather howard"
2  export DISTCC_HOSTS
3  CC="distcc gcc" make -j8

Lines one and two show the environment variable DISTCC_HOSTS being set and exported to the environment. This variable contains a list of hosts running distccd; alternately, it could contain IP addresses if no local name resolution is available. Line three shows the invocation of GNU make. First, a temporary environment variable named CC is created and set to "distcc gcc". (By defining this variable on the same line on which make is invoked, the variable is only available to the environment in which make is being executed [18].) CC is used to specify the C compiler used by the make utility and overrides any assignments to CC in the makefile. This allows the user to change the compiler used by the make dependency file without any modification to the file itself. Next, make is executed with the -j8 option. This instructs GNU make to attempt to execute eight jobs in parallel [20]. This value is normally chosen to be twice the number of processors available in the distributed environment.

When make begins execution, it attempts to find up to eight jobs that are independent of each other. These jobs are all started in parallel. The compile rule executes the C compiler specified by the CC macro (which the user set to "distcc gcc"); this invokes the distcc client application, as described above, for each of the eight jobs. As servers compile and return object files, each distcc client terminates and make attempts to start a new job in its place. If the distcc configuration in Listing 2.2 were used with the makefile in Listing 2.1 to compile the project from scratch, convert.c, ctof.c, and ftoc.c would be compiled in parallel, each on one of the machines listed in DISTCC_HOSTS. Upon successful compilation of these three files, the local machine would link the resultant object files into the convert executable.

Chapter 3

Implementation

3.1 Introduction

Parallelizing the compilation of Java source files requires implementing software to schedule and compile individual files or groups of files in a project. The implementations described in this chapter are written in Java to take advantage of the compilation tools already available with the Java Software Development Kit. File scheduling involves determining the inter-compilation-unit dependences within a project and finding groups of files that can be compiled in parallel. Compilation translates the Java source code to bytecode that is executed by the Java Virtual Machine. The three implementations described in this chapter fit into a common framework; the remainder of this section describes that framework.

Upon startup, the software described in this chapter scans and parses the requested source files using a Java 2 version 1.5 compliant open-source parser generator called CUP [12] to identify the class types used in the files. These types are added to a list of types to be searched for, similar to the behavior of Java compilers described in Section 2.3.


Unlike other Java compilers, however, the compilers described in this chapter, as prototype implementations, do not search for existing .class files. Instead, it is assumed that the source code is being compiled for the first time and thus no .class files exist for user code. Once all source files corresponding to class types are identified, an adjacency matrix is constructed to represent the static inter-file dependences in the program being compiled.

Next, a scheduler is created to determine the compilation order of the files in the program. This scheduler interacts with a user-specified number of compiler threads that are responsible for compiling files in the order the scheduler determines. A thread is a lightweight process that has its own stack and execution context [1]. Each of these threads requests a file to compile from the scheduler. If the scheduler has an available compilation candidate, it returns it to the requesting compiler thread. If the scheduler does not have an available compilation candidate, it causes the requesting compiler thread to go into a "wait state," waiting for an available candidate. When a compiler thread successfully compiles a file, it notifies the scheduler. Upon notification, the scheduler updates the adjacency matrix and wakes up any compiler threads currently in a "wait state." Each compiler thread then attempts to get a candidate from the scheduler. This process is repeated until all files in the project have been compiled.

Figure 3.1 shows a UML sequence diagram representing this process. Each rectangle represents an object in the program, and time progresses from the top of the diagram down. First, the Scheduler invokes a method to get a set of files from the AdjacencyMatrix. The AdjacencyMatrix returns this list of files to the Scheduler. Next, the CompileThread gets the next file to be compiled from the Scheduler, compiles it, and notifies the Scheduler of the successful compilation of that file. The Scheduler then updates the AdjacencyMatrix. This process is repeated until all files have been compiled.


Figure 3.1: Compilation Sequence Diagram
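The scheduler/compiler-thread hand-off described above can be sketched in Java as follows. This is an illustrative reduction, not the thesis implementation: the dependence bookkeeping is collapsed into a queue of "ready" files, and the class and method names are assumptions.

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: compiler threads ask for the next eligible file,
// entering a "wait state" when none is available, and notify the scheduler
// when a compilation succeeds.
public class SchedulerSketch {
    private final Deque<String> ready = new ArrayDeque<>();
    private int remaining;                    // files not yet compiled

    public SchedulerSketch(int totalFiles) {
        this.remaining = totalFiles;
    }

    // Called when dependence analysis makes a file eligible for compilation.
    public synchronized void addCandidate(String file) {
        ready.addLast(file);
        notifyAll();                          // wake any waiting compiler threads
    }

    // Called by a compiler thread; blocks until a candidate exists or
    // everything has been compiled (in which case null is returned).
    public synchronized String getNextFile() throws InterruptedException {
        while (ready.isEmpty() && remaining > 0) {
            wait();                           // the "wait state" described above
        }
        return remaining == 0 ? null : ready.removeFirst();
    }

    // Called by a compiler thread after a successful compilation; in the real
    // system this also updates the adjacency matrix, possibly exposing new leaves.
    public synchronized void fileCompiled(String file) {
        remaining--;
        notifyAll();
    }
}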


The remainder of this chapter provides more detail about the compilation strategies. Section 3.2 describes the process of identifying class types within source files and building the dependence graph using that information. Section 3.3 describes the different algorithms used for scheduling files for compilation. Section 3.4 describes the methods used to compile the source files. Finally, Section 3.5 describes the three compiler implementations and their differences.

3.2 Building the Dependence Graph

In order to identify dependences within a source file, the file must be parsed and a list of the types used must be identified. A tree can be built using package and import statements, along with the sourcepath, to identify which types are semantically available within a program and where the dependent source files are located on disk. If a file corresponding to a type cannot be found, it can be assumed that the class is either part of the standard Java Application Programming Interface (API) or is available in some other library included in the classpath. Once this information is collected for all files in a project, a graph can be generated to represent the static interdependences between files. Listing 3.1 shows an example of eleven Java source files with interdependences. Figure 3.2 is a graphical representation of the interdependences between the files in this listing.

Figure 3.2: Sample Java Interdependence Graph

A dependence is indicated by an arrow originating at one node in the graph and terminating at another node in the graph. An arrow from node x to node y indicates that the file corresponding to node y must be compiled prior to the compilation of the file corresponding to node x. The root node of the graph is the node that has no other nodes depending upon it; node A is the root of the graph. A parent node in a graph is a node that has a dependence on another node. We say that node A is the parent of nodes B, C, G, and I; node B is the parent of nodes D and E; and so on. Inversely, a child node in a graph is a node that is depended upon by another node, the parent. Nodes B, C, G, and I are the children of node A. If a node is depended upon by more than one parent, we say that node has multiple parents; node E is an example of a node with multiple parents. A leaf node in a graph is a node that has no children. Nodes D, E, F, and H are leaf nodes. A node x is a descendant of another node y if there exists a path from the root of the graph to x through y. Two nodes x and y are independent if there does not exist a node z where z is a descendant of both x and y; nodes D and E are independent. A cycle in the graph is identified by a path in the graph that returns to a node that has already been traversed. Nodes I, J, and K form a cycle; node I can be considered to be its own great-great-grandchild.

An adjacency matrix can be built to represent the direct inter-compilation-unit dependences [5]. An adjacency matrix is defined for a graph G = (V, E), where V is the set of nodes in the graph and E = {(i, j) : there is an edge from vertex i to vertex j in G}. It is represented as an n x n matrix, where n is the number of files in the source program. The entry at (i, j) in the matrix is 1 if there is an edge from vertex i to vertex j in the graph, and 0 otherwise. Other values may be used to represent intermediate states. Two adjacency matrices are maintained for each project: the original matrix, representing the true dependences between files, and a non-cyclic matrix, in which any cycles that exist in the original graph are eliminated by removing the edge causing the cycle. This is done to prevent deadlock while scheduling files for compilation. Figure 3.3 shows the original adjacency matrix corresponding to the source code in Listing 3.1. The non-cyclic matrix is almost identical, the only difference being the removal of the dependence of K.java on I.java. The process of detecting dependences between files and building the adjacency matrix to represent these dependences is called external dependence analysis.

Listing 3.1: Sample Java Source Files

// A.java
public class A { B b; C c; G g; I i; }

// B.java
public class B { D d; E e; }

// C.java
public class C { E e; F f; }

// D.java
public class D { }

// E.java
public class E { }

// F.java
public class F { }

// G.java
public class G { H h; }

// H.java
public class H { }

// I.java
public class I { J j; }

// J.java
public class J { K k; }

// K.java
public class K { I i; }

Figure 3.3: Sample Original Adjacency Matrix

In some cases, it is necessary to know both the direct and the indirect external dependences in a program. In these cases, a transitive closure operation can be applied to the existing original adjacency matrix. The transitive closure of a graph G is defined as the graph G' = (V, E'), where E' = {(i, j) : there is a path from vertex i to vertex j in G} [5]. This is represented as an n x n matrix, similar to an adjacency matrix. The entry at (i, j) in the matrix is 1 if there is a path from vertex i to vertex j in the graph, and 0 otherwise. Figure 3.4 shows the resultant matrix after the transitive closure operation is performed on the original adjacency matrix in Figure 3.3.

Figure 3.4: Sample Transitive Closure Matrix
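One standard way to compute such a closure is Warshall's algorithm over a boolean matrix, sketched below. This is an illustrative implementation consistent with the definition above, not the code used in the thesis.

// Compute the transitive closure of a boolean adjacency matrix using
// Warshall's algorithm: closure[i][j] ends up true if there is a path
// from vertex i to vertex j.
public final class TransitiveClosure {
    public static boolean[][] of(boolean[][] adjacency) {
        int n = adjacency.length;
        boolean[][] closure = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            closure[i] = adjacency[i].clone();
        }
        for (int k = 0; k < n; k++) {          // allow paths through vertex k
            for (int i = 0; i < n; i++) {
                if (!closure[i][k]) continue;
                for (int j = 0; j < n; j++) {
                    if (closure[k][j]) closure[i][j] = true;
                }
            }
        }
        return closure;
    }
}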

3.3 Scheduling Algorithms

Scheduling involves determining the order in which the files in a program are compiled and which files can be compiled simultaneously. The goal of scheduling is to minimize the total compilation time of the program being compiled. Scheduling decisions are based on the non-cyclic adjacency matrix, while the original adjacency matrix is used to determine which files are compiled. Bottom-Up Scheduling schedules files corresponding to leaf nodes, giving equal weight to all files. File Size Scheduling takes into account the fact that larger files take more time to compile than smaller files. The following sections describe these scheduling algorithms in detail.

Bottom-Up Scheduling

A bottom-up scheduling approach continuously schedules files corresponding to leaf nodes in the non-cyclic adjacency matrix. Once a file is scheduled for compilation, it is marked as processed within both the original and non-cyclic adjacency matrices by setting the entire row corresponding to that file to a sentinel value. This prevents the same file from being compiled multiple times. Once compilation of a file is complete, any dependences on that file are removed (the node corresponding to that file is no longer considered to be the child of any of its parents). The process of removing dependences creates new leaf nodes in the graph (once all the children of a parent node have been processed). The files corresponding to these new leaf nodes can then be scheduled for compilation. Once all files are compiled, all files in the adjacency matrix are marked as processed and the compilation of the project is complete.

For example, the program in Listing 3.1 would be compiled as follows. First, all files corresponding to leaf nodes in the non-cyclic adjacency matrix are scheduled for compilation, here files D.java, E.java, F.java, H.java, and K.java (K.java is not actually a leaf node, but is considered to be one after cycle elimination). All of these files can be compiled in parallel. Assume that the compilation of E.java completes first. At this time, B.java's and C.java's dependences upon E.java are removed and the scheduler searches for a new leaf node; however, none exists, and that compilation thread waits for a newly available leaf node. Next, assume the compilation of file H.java completes. File G.java's dependence upon H.java is removed, and G.java becomes a new leaf node and is scheduled for compilation. Next, the compilation of files D.java and F.java completes, making files B.java and C.java leaf nodes available for compilation. Now assume the compilation of K.java completes. Because K.java is in a cycle, the entire cycle has been compiled (K.java, I.java, and J.java). All the files that depended upon K.java are marked as complete in the non-cyclic adjacency matrix by using the original adjacency matrix. When the compilation of files B.java, C.java, and G.java is complete, all that remains to be compiled is file A.java. Finally, A.java is compiled and the compilation of the program is successful.
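The candidate-selection step of this approach can be sketched as follows, using an integer adjacency matrix with a sentinel row value as described above. The class, method, and field names are illustrative, not the thesis code.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of bottom-up candidate selection. dep[i][j] == 1 means
// file i depends on file j; a row filled with the sentinel value marks a file
// that has already been scheduled.
public final class BottomUpSelect {
    static final int SCHEDULED = -1;

    // Files whose rows contain no remaining 1 entries are leaves and may compile now.
    static List<Integer> eligibleLeaves(int[][] dep) {
        List<Integer> leaves = new ArrayList<>();
        for (int i = 0; i < dep.length; i++) {
            if (dep[i][0] == SCHEDULED) continue;          // already handed out
            boolean leaf = true;
            for (int j = 0; j < dep[i].length; j++) {
                if (dep[i][j] == 1) { leaf = false; break; }
            }
            if (leaf) leaves.add(i);
        }
        return leaves;
    }

    // Mark file i as scheduled so it is not handed out twice.
    static void markScheduled(int[][] dep, int i) {
        Arrays.fill(dep[i], SCHEDULED);
    }

    // After file i compiles, remove all dependences on it, possibly creating new leaves.
    static void dependencesSatisfied(int[][] dep, int i) {
        for (int[] row : dep) {
            if (row[i] == 1) row[i] = 0;
        }
    }
}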

File Size Scheduling

Bottom-up scheduling gives equal weight to all files and invokes the compiler for every file in the project. Since the compilation of a Java source file automatically compiles any files that the file being compiled depends upon, it is not necessary to explicitly compile every file in the project. If groups of independent files are identified, the groups can be compiled in parallel with less compiler invocation overhead; it may take less time to compile n files as a group than it does to compile the same n files individually. The file size scheduler was developed to take advantage of this behavior. It identifies groups of independent files whose total file length meets some user-specified threshold and schedules those groups for compilation in parallel. Instead of invoking the compiler explicitly for each file in the program, the compiler is invoked once per "group" of files.

In order to isolate independent groups, some additional processing is required. First, all leaf files with multiple parents are scheduled for compilation. This is done to allow the parents to form the roots of independent groups of files. An example of such a file is E.java in Listing 3.1 and the corresponding graph node E in Figure 3.2. With the elimination of E.java, files B.java and C.java become the roots of the independent groups of files {B, D} and {C, F} respectively. Once all leaf files with multiple parents are compiled, the scheduler begins building independent groups of files using the non-cyclic adjacency matrix, starting with leaf files. If the cumulative length of a file meets or exceeds the threshold value, the file is scheduled for compilation. The cumulative length of a file, X.java, is defined to be the length of the file itself plus the lengths of any currently uncompiled files upon which X.java is dependent; the transitive closure matrix is used to gather this information. Compilation of a file is deferred if its cumulative length does not meet the file size threshold. However, if the compilation of a file is deferred, it is marked as processed in the non-cyclic adjacency matrix in order to allow the dependent source files to be considered for compilation. This process is repeated until all source files have been compiled.

For example, assuming all files are 500 bytes in length and the threshold length is 1000 bytes, the program in Listing 3.1 would be compiled as follows. First, only E.java would be scheduled for compilation, since it is a leaf file with multiple parents. Once the compilation of E.java is successful, there are no more leaf files with multiple parents. Next, the scheduler checks all leaf files (D.java, F.java, H.java, and K.java) for files whose length meets the threshold. K.java meets this limit, because its cumulative length is 1500 bytes due to the K.java, I.java, J.java cycle, and it is scheduled for compilation. Files D.java, F.java, and H.java do not meet the threshold length, so they are not scheduled for compilation but are marked as completed. Marking these files as completed removes their parents' dependences upon them, creating new leaf nodes (B.java, C.java, and G.java). Again, the scheduler checks all leaf files for files whose length meets the threshold. B.java, C.java, and G.java meet the threshold limit because each of their cumulative lengths is 1000 bytes, and they are scheduled for compilation. At this point, the files B.java, C.java, G.java, and K.java can be compiled in parallel. The compilation of each of these files compiles the entire group rooted at that file. Once each of these files is successfully compiled, the only file remaining to be compiled is A.java. Finally, A.java is compiled and the compilation of the program is successful.
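The cumulative-length test used by this scheduler can be sketched as a single helper over the transitive closure matrix; again, the names are illustrative rather than taken from the thesis.

// Cumulative length of file i: its own length plus the lengths of all
// not-yet-compiled files reachable from it in the transitive closure.
public final class FileSizeSupport {
    public static long cumulativeLength(int i, boolean[][] closure,
                                        long[] fileLength, boolean[] compiled) {
        long total = fileLength[i];
        for (int j = 0; j < closure.length; j++) {
            if (closure[i][j] && !compiled[j]) {
                total += fileLength[j];
            }
        }
        return total;
    }
}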

3.4 Compilation Techniques

Since the goal of this research is the parallelization of the compilation of Java source files, it was not necessary to re-invent a Java compiler. Instead, existing compilers are invoked in parallel, similar to the approach of distcc. A general interface to a compiler is defined so that any available compiler implementation can be substituted.

Multiprocessed Compilation

The initial compilation method chosen was to execute javac [14] outside of the currently running Java Virtual Machine. Each scheduled file is compiled via a call to the Runtime.getRuntime().exec() method. A thread within the running Java Virtual Machine calls this method passing the file path to the local installation of javac. Any output to either standard output or standard error is read from streams provided by the returned Process object and displayed to the user’s console. The status of the exec’d process is monitored to check for compile-time errors returned from javac. If the given file is successfully compiled, the scheduler is consulted for another file to compile. This process is repeated until all files are compiled.
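A sketch of this external invocation is shown below. The javac path, option string, and error handling are simplified assumptions; in particular, a robust version would drain the output and error streams in separate threads rather than sequentially.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Illustrative sketch of multiprocessed compilation: run the local javac on a
// single file, echo its output, and report its exit status.
public final class ExternalCompile {
    public static int compile(String javacPath, String options, String sourceFile)
            throws IOException, InterruptedException {
        Process p = Runtime.getRuntime()
                           .exec(javacPath + " " + options + " " + sourceFile);
        try (BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
             BufferedReader err = new BufferedReader(new InputStreamReader(p.getErrorStream()))) {
            String line;
            while ((line = out.readLine()) != null) System.out.println(line);
            while ((line = err.readLine()) != null) System.err.println(line);
        }
        return p.waitFor();    // non-zero signals a compile-time error from javac
    }
}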


Multithreaded Compilation

The execution of a javac process outside of the currently executing Java Virtual Machine is time consuming. Also, since javac is itself a Java program, each invocation of it results in the startup of a new Java Virtual Machine (which is itself very time consuming). With these facts in mind, it was decided that a compiler that would execute inside the context of the currently running Java Virtual Machine was needed. All that was necessary was to include the appropriate Java Archive (tools.jar) in the classpath and execute the com.sun.tools.javac.Main.compile() method directly. Any output that the method generates is written to the user's console directly. If an error occurs during compilation, the method returns non-zero; this return value is monitored to check for compile-time errors. If the given file is successfully compiled, the scheduler is consulted for another file to compile. This process is repeated until all files are compiled.
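A sketch of the in-process invocation, assuming tools.jar from a pre-Java 9 JDK is on the classpath; the surrounding class and option values are illustrative.

// Illustrative sketch of in-process compilation through the (unsupported)
// com.sun.tools.javac.Main entry point described above.
public final class InternalCompile {
    public static boolean compile(String sourcePath, String classPath,
                                  String destDir, String sourceFile) {
        String[] args = {
            "-sourcepath", sourcePath,
            "-classpath", classPath,
            "-d", destDir,
            sourceFile
        };
        int status = com.sun.tools.javac.Main.compile(args);
        return status == 0;    // compile() returns non-zero on error
    }
}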

3.5 Parallel versus Distributed Compilation

Three experimental compilers were implemented using the software described in this chapter: one parallel and two distributed. The parallel compiler, called pjavac, is a Java program that compiles files in parallel via threads on a single machine with multiple processors. The two distributed compilers use a shared Network File System (NFS) between multiple hosts to resolve compile-time file dependences. NFS provides a transparent mechanism to share a common filesystem across a network [22].

The first distributed compiler is a client/server application. The server application, djavacd, is a Java program that runs on a remote server. This application listens on a TCP port for incoming compilation requests, compiles the file, and returns the status to the requesting client. The client application, called djavac, is a Java program that runs on the local workstation. This application performs external dependence analysis, if necessary, and distributes files for compilation to remote djavacd servers.

The second distributed compiler consists of four applications in conjunction with the make utility. The first application, called makegen, is a Java program that runs on the local workstation. This application performs external dependence analysis and generates a makefile representing the static inter-file dependences in the program. The second application, called cdjavacd, is a Java program, similar to the above-mentioned djavacd, that runs on a remote server. This application listens on a TCP port for incoming compile requests, compiles the file, and returns the status to the requesting client. The third application, called cdjavac_proxy, is a C program that runs on the local workstation. This application serves as a proxy between multiple clients and the available servers. Finally, the fourth application, called cdjavac, is a C program that runs on the local workstation. Multiple instances of this application are invoked from the makefile to compile single source files. This program communicates with cdjavac_proxy via UNIX domain sockets.

The following subsections describe the parallel compiler and the two distributed compiler implementations in detail.

pjavac – Parallel Java Compiler

pjavac is a multithreaded application that runs on a single machine with multiple processors. The user controls the behavior of this compiler via command line arguments. These command line options include -scheduler, -method, -threadCount, and -minFileLength, each of which requires a single argument. The -scheduler option allows the user to choose which of the scheduling algorithms described in Section 3.3 will be used during compilation. The available arguments to this option are BottomUpScheduler and FileSizeScheduler. The -method option allows the user to choose which of the compilation techniques described in Section 3.4 will be used. The available arguments to this option are external for multiprocessed compilation and internal for multithreaded compilation. The -threadCount option allows the user to specify the number of simultaneous compilations that should be attempted at a time; the argument to this option is an integer value greater than or equal to one. Finally, the -minFileLength option allows the user to specify the file size threshold, in bytes, used in conjunction with the file size scheduling algorithm; the argument to this option is also an integer value greater than or equal to one.
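Putting these options together, a hypothetical invocation might look like the following. The thesis does not show the exact launch command, so the java pjavac prefix and the particular values are assumptions.

java pjavac -scheduler FileSizeScheduler -method internal -threadCount 4 -minFileLength 10000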

Upon startup, this application checks the current working directory for a file named adj.dat, which contains the static inter-file dependence information for the project being compiled. If such a file is found, the inter-file dependence information is loaded from it; otherwise, the compiler must parse all source files associated with the project, as described in Section 3.2, to obtain this information and build the adjacency matrix.

Next, an instance of the user-specified scheduler class is created to schedule files for compilation. The user-specified number of compilation threads is then created. Each of these threads receives the name of a file to compile from the scheduler and executes the user-specified compilation technique to compile the file. If the compilation is successful, the thread notifies the scheduler and requests a new file to compile. If the scheduler determines that there are currently no files eligible for compilation, the compilation thread waits for eligible files. Once all files have been compiled, all compilation threads terminate and the compiler exits.


djavac – Distributed Java Compiler

pjavac runs on a single machine, and the parallelism available is restricted by the number of processors on that machine. A distributed implementation removes this restriction by allowing multiple hosts on a network to be involved in the compilation. djavac is a client/server application written in Java. The server application runs on remote servers while the client application runs on the local workstation. Figure 3.5 shows a diagram of the network environment in which djavac is executed.

Figure 3.5: djavac Network Environment

djavacd must be started prior to the execution of djavac. djavacd accepts the -method command line option to allow the user to specify the compilation method (external or internal) to be used by the server and ignores any other options. djavacd and djavac use serialized objects as their application layer protocol. djavacd creates a ServerSocket to accept incoming compilation requests from a client. The ServerSocket class provides a platform-independent way to create a listening TCP socket on a specified port. Upon receipt of a request, djavacd attempts to read a serialized ServerMessage object from the Socket's input stream. Listing 3.2 shows the class type of this object. The ServerMessage object contains two pieces of information that are of use to djavacd: the fileToCompile and the compilerOptions. The fileToCompile represents the file the client wishes this instance of djavacd to compile. The compilerOptions is a string corresponding to all command line options that should be passed on to the compiler this server is using; such options might include the classpath and the sourcepath. djavacd then uses this information and the specified compilation method to compile the requested file. Upon completion, the exit status of the compiler is copied into the ServerMessage object and that object is written to the output stream of the Socket.

Listing 3.2: ServerMessage Class

public class ServerMessage implements Serializable {
    private File fileToCompile;
    private String compilerOptions;
    private int status;

    public ServerMessage(File fileToCompile, String compilerOptions) {
        this.fileToCompile = fileToCompile;
        this.compilerOptions = compilerOptions;
    }

    public File getFileToCompile() {
        return fileToCompile;
    }

    public String getCompilerOptions() {
        return compilerOptions;
    }

    public void setStatus(int status) {
        this.status = status;
    }

    public int getStatus() {
        return status;
    }
}

djavac is executed by the user from the local workstation. This application accepts the -scheduler command line option as well as the -minFileLength command line option when a FileSizeScheduler is used. Upon startup, this application, like pjavac, attempts to read static inter-file dependence information from the file adj.dat and, if it is not available, parses the source files in the program to build the adjacency matrix. djavac next creates an instance of the user-specified scheduler class to schedule files for compilation. After the scheduler has been created, djavac reads the file named servers.conf, which contains a whitespace-delimited list of machines running djavacd. A thread is created for each host and is responsible for communication with a single remote server. Each of these threads creates a TCP Socket connection to the remote machine with which it is associated, receives the name of a file from the scheduler, and creates a ServerMessage object containing the information needed to compile the file. This object is then written to the output stream associated with the Socket. The thread then blocks, waiting for a response containing a serialized ServerMessage object from djavacd. If the status member in this object is 0, compilation of the file was successful and the thread requests another file from the scheduler. If the status member is non-zero, an error occurred during compilation of the file; in this case, an error message is printed to the user's console. This process is repeated until an error occurs or until all files in the program have been compiled.
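The server-side exchange described above can be sketched as follows. The listening port and the compile hook are illustrative assumptions (the thesis does not specify them); the ServerMessage class is the one shown in Listing 3.2.

import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative sketch of the djavacd request loop: read a serialized
// ServerMessage, compile the named file, set the exit status, and send the
// message back to the djavac client.
public final class DjavacdSketch {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(4321)) {   // port is an assumption
            while (true) {
                try (Socket client = listener.accept();
                     ObjectOutputStream out = new ObjectOutputStream(client.getOutputStream());
                     ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                    ServerMessage msg = (ServerMessage) in.readObject();
                    int status = compile(msg.getFileToCompile().getPath(),
                                         msg.getCompilerOptions());
                    msg.setStatus(status);
                    out.writeObject(msg);
                    out.flush();
                }
            }
        }
    }

    // Stand-in for the configured compilation method (external or internal);
    // the in-process javac entry point is used here for brevity.
    private static int compile(String file, String options) {
        String[] compilerArgs = (options + " " + file).trim().split("\\s+");
        return com.sun.tools.javac.Main.compile(compilerArgs);
    }
}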

cdjavac – C Distributed Java Compiler

djavac, being a Java application, is up to a factor of ten slower than applications written in languages such as C that are compiled for the architecture of the host machine [10]. cdjavac is a client/server application written using a mixture of both C and Java. The server, written in Java, runs on remote servers awaiting compile requests. The client


applications, written in C, send compile requests to the server via a proxy, also written in C. Figure 3.6 shows a diagram of the network environment in which cdjavac is executed.

Prior to starting the distributed compilation using this compiler, makegen must be used to build the makefile that will hold the static inter-file dependences in the program to be compiled. This application accepts the -scheduler command line option, as well as the -minFileLength command line option when a FileSizeScheduler is used. Upon startup, makegen, like pjavac, attempts to read the static inter-file dependence information from a file and, if it is not available, parses the source files in the program to build the adjacency matrix. Next, makegen translates the inter-file dependence information contained in the adjacency matrix into the makefile. A make target is generated for each file returned by the scheduler. Listing 3.3 shows a makefile generated for the source represented by Figure 2.3. (The generated makefile has been slightly modified for readability by splitting long lines using the \ escape character on lines 10, 14, and 17.) After the makefile has been generated, the program is ready to be compiled.
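As a rough illustration of this translation step, the rule that makegen writes for a single class might be emitted as sketched below; the method and parameter names are assumptions, since makegen's actual source is not reproduced here, but the rule shape matches Listing 3.3.

import java.io.PrintWriter;
import java.util.List;

public class RuleEmitter {
    // Emit one make rule: the .class file depends on its .java source and on
    // the .class files of every class the source refers to.
    static void emitRule(PrintWriter out, String className, List<String> dependsOn) {
        StringBuilder prerequisites =
                new StringBuilder("$(SOURCEPATH)/" + className + ".java");
        for (String dep : dependsOn) {
            prerequisites.append(" $(CLASSES)/").append(dep).append(".class");
        }
        out.println("$(CLASSES)/" + className + ".class: " + prerequisites);
        // Recipe lines in a makefile must begin with a tab character.
        out.println("\t$(JAVAC) $(JAVAC_OPTS) $(SOURCEPATH)/" + className + ".java");
        out.println();
    }
}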

Once the makefile has been generated, the applications used by this compiler must be started in the appropriate order. cdjavacd must be started prior to the execution of cdjavac proxy and cdjavac. cdjavacd accepts the -method command line option to allow the user to specify the compilation method (internal or external) to be used by the server and ignores any other options. It then blocks, waiting for incoming TCP connections from the cdjavac proxy.

cdjavac proxy is an intermediate application written in C. It is responsible for receiving connections from multiple clients and forwarding those messages to cdjavacd servers. This is necessary because any given instance of cdjavac cannot be aware of which cdjavacd servers are currently processing a request from one of its peers and which are free for use.


Figure 3.6: cdjavac Network Environment


Listing 3.3: makegen Generated Makefile

 1  #
 2  # makefile.gen
 3  #
 4  # Variables -- modify as necessary
 5  JAVAC      = javac
 6  CLASSES    = ./src/java/example
 7  SOURCEPATH = ./src/java/example
 8  CLASSPATH  = ./src/java/example
 9  JAVAC_OPTS = -sourcepath $(SOURCEPATH) -classpath $(CLASSPATH) \
10               -d $(CLASSES) -deprecation
11  # End of Variables
12
13  all: $(CLASSES)/A.class $(CLASSES)/B.class $(CLASSES)/C.class \
14       $(CLASSES)/D.class
15
16  $(CLASSES)/A.class: $(SOURCEPATH)/A.java $(CLASSES)/B.class \
17                      $(CLASSES)/C.class
18          $(JAVAC) $(JAVAC_OPTS) $(SOURCEPATH)/A.java
19
20  $(CLASSES)/B.class: $(SOURCEPATH)/B.java $(CLASSES)/D.class
21          $(JAVAC) $(JAVAC_OPTS) $(SOURCEPATH)/B.java
22
23  $(CLASSES)/C.class: $(SOURCEPATH)/C.java
24          $(JAVAC) $(JAVAC_OPTS) $(SOURCEPATH)/C.java
25
26  $(CLASSES)/D.class: $(SOURCEPATH)/D.java
27          $(JAVAC) $(JAVAC_OPTS) $(SOURCEPATH)/D.java
28


Listing 3.4: Sample hosts.conf File for cdjavac proxy

3
192.168.1.100 24482
192.168.1.200 24482
192.168.1.300 24482

Upon startup, cdjavac proxy reads a text file named hosts.conf in the current working directory. The first line of this file contains a single integer representing the number of host entries in the file. All remaining lines consist of whitespace-delimited pairs of IP addresses and port numbers. Listing 3.4 shows an example of this file. cdjavac proxy reads the hosts.conf file and creates a new thread for each IP/port pair found in the file. Each server thread created is responsible for communication with one server.

A UNIX domain socket is created by the main thread of cdjavac proxy for client/proxy communication. UNIX domain sockets were chosen because they are often twice as fast as TCP sockets when both peers are on the same host [23]. Next, the main thread of cdjavac proxy enters a loop where it accepts incoming requests from cdjavac clients. The accept() function call made by the main thread returns the descriptor associated with the new socket connection. A descriptor is a small positive integer that the UNIX kernel uses to uniquely identify open files and socket connections [21, 23]. This descriptor is then inserted into a queue of yet-to-be-processed connection descriptors. The server threads dequeue descriptors from this queue and process the requests associated with them. A server thread reads data from the cdjavac client via this descriptor and forwards that data to the cdjavacd server with which it is associated. It then waits for a response from the cdjavacd server and forwards that response back to the client. cdjavac proxy is not aware of the client/server application protocol; it merely forwards


whatever data it receives.

The descriptor queue is shared among multiple threads in the proxy running in parallel, and operations on it are not atomic. A mutex is therefore required to prevent the threads from modifying the queue at the same time and potentially corrupting it. A mutex is an object that allows multiple threads to synchronize access to a shared resource [24, 25, 1]. When a thread is about to access the queue, it attempts to lock a mutex guarding it. If that mutex is not already locked by another thread, the lock succeeds and the thread is able to access the queue. When the thread has finished its interaction with the queue, it unlocks the mutex. If the mutex is locked when a thread attempts to lock it, that thread blocks until the mutex is unlocked.

If a server thread attempts to read a descriptor from the queue and the queue is currently empty, the thread waits on a condition variable for notification that the queue is no longer empty. Condition variables allow threads to release their lock on a mutex and block until some condition has been met [24, 25, 1]. When another thread modifies the state of the queue, that thread performs a signal on the condition variable, which "wakes up" the threads waiting on that condition variable. These threads again attempt to lock the mutex and read a descriptor from the queue. This is repeated until the thread successfully obtains a descriptor to process. If compilation of the project is complete, each thread remains in a blocked state until another project is compiled.
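cdjavac proxy implements this queue in C with POSIX mutexes and condition variables. Purely to illustrate the guarded-queue pattern just described, the same structure is sketched below in Java (this is not the proxy's actual code; Java's intrinsic lock plays the role of the mutex, and wait/notifyAll play the role of the condition variable).

import java.util.LinkedList;

// A queue of connection descriptors guarded by a lock (mutex) and a
// condition (wait/notify), mirroring the pattern used in the proxy.
public class DescriptorQueue {
    private final LinkedList<Integer> descriptors = new LinkedList<Integer>();

    // Called by the accepting thread after accept() returns a new descriptor.
    public synchronized void enqueue(int descriptor) {
        descriptors.addLast(descriptor);
        notifyAll();           // wake any server thread waiting for work
    }

    // Called by a server thread; blocks while the queue is empty.
    public synchronized int dequeue() throws InterruptedException {
        while (descriptors.isEmpty()) {
            wait();            // release the lock and block until signaled
        }
        return descriptors.removeFirst();
    }
}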

Once the cdjavacd servers and the cdjavac proxy are running, the user can compile the program. This is done in a way similar to the distcc application described in Section 2.4. The makefile generated by makegen is used in conjunction with cdjavac to execute multiple instances of cdjavac in parallel. Listing 3.5 shows a sample of how to execute cdjavac in parallel from the generated makefile, makefile.gen, using GNU make.

Listing 3.5: Sample Use of cdjavac

JAVAC="cdjavac" make -j 6 -f makefile.gen

cdjavac is passed all the command line options that would be passed to the traditional compiler. Upon startup, it establishes a connection to the cdjavac proxy via the UNIX domain socket. cdjavac then takes all the command line options it received and the name of the file being compiled and creates a single null-terminated ASCII string. cdjavac sends this string to the cdjavac proxy via the domain socket connection. Then, cdjavac blocks waiting for a response from the cdjavac proxy. The response received consists of a single character. If the character is ‘Y,’ the compilation was successful; otherwise the compilation was unsuccessful. cdjavac then exits with a status reflecting the success of compilation, 0 for success and 1 for failure. If make receives a failure, it will abort further compilations and display an error message to the user. This process is repeated until all files in the program have been compiled or until an error occurs.

Chapter 4

Experimental Study

4.1 Introduction

This chapter explores the performance of the three compiler implementations described in Chapter 3. Section 4.2 evaluates pjavac, the parallel compiler written entirely in Java, which uses multiple threads within a single process for compilation. Section 4.3 evaluates djavac, the first distributed implementation, whose design is based on the parallel implementation. Section 4.4 evaluates the second distributed implementation, which more closely resembles the design of distcc described in Section 2.4: static inter-file dependence information is stored in a makefile and the compiler client is written in C. In this prototype, both distributed implementations require a shared Network File System (NFS) to support Java's inter-file dependence semantics.

Fourteen benchmarks were used to test the performance of each of the compiler implementations; they are summarized in Table 4.1. The number of files that compose the benchmarks ranges from 17 to 287. The total program size in bytes ranges from 12,592 to 22,689,597. The maximum file size ranges from 1,655 bytes to 1,418,051 bytes. The average


path length, which is the number of edges in the dependence graph on the path from the root to a leaf, ranges from 1.00 to 64.00.

The first five benchmarks are existing applications. pjavac is the source code for the parallel compiler implementation. jEdit [15] is an open source text editor written in Java. MegaMek [16] is an unofficial online version of the classic BattleTech board game. ganttproject [8] is an open source project management tool written in Java. emma [7] is an open source code coverage tool written in Java.

The next three benchmarks are contrived to examine the behavior of the compilers on wide, shallow dependence graphs. The dependence graph for each of these benchmarks consists of a root and k independent source files, where k is either 16 or 32. Contrived Wide 16 large consists of 16 large (1418055 bytes) independent source files. Contrived Wide 16 small is similar to Contrived Wide 16 large with the file sizes decreased by roughly half to 698054 bytes. Contrived Wide 32 small consists of 32 independent source files of size 698054 bytes.

The Contrived Tall k benchmarks are contrived to examine the behavior of the compilers on narrow, tall dependence graphs. The dependence graph for each of these benchmarks consists of a root with four children. The length of the path from the root to a leaf node that goes through each child is k, where k is either 16, 32, or 64. For example, for Contrived Tall 64 the path length from the root through each of the four children to a leaf is 64.

The final three benchmarks are randomly contrived to examine the behavior of the compilers on files with varying lengths and interdependences. Contrived Random 1 consists of a small set of source files, each of which has a 90 percent chance of being dependent upon between one and seven other source files. Each of these files has between one and three methods. Thus, Contrived Random 1 consists of a set of small classes with a large number

(file sizes in bytes; path lengths in edges)
Benchmark Name | Number of Files | Total Program Size | Average File Size | Maximum File Size | Minimum File Size | Average Path Length | Maximum Path Length | Minimum Path Length
pjavac | 40 | 775244 | 19381.10 | 612689 | 468 | 4.57 | 7 | 2
jEdit | 287 | 2911349 | 10144.07 | 156469 | 215 | 12.75 | 21 | 1
MegaMek | 179 | 2451604 | 13696.11 | 369096 | 129 | 5.73 | 13 | 1
ganttproject | 141 | 747876 | 5304.09 | 96668 | 146 | 3.84 | 7 | 1
emma | 169 | 1269681 | 7512.91 | 88633 | 805 | 6.87 | 22 | 2
Contrived Wide 16 large | 17 | 22689597 | 1334582.18 | 1418051 | 757 | 1.00 | 1 | 1
Contrived Wide 16 small | 17 | 11170779 | 657104.64 | 698054 | 757 | 1.00 | 1 | 1
Contrived Wide 32 small | 33 | 22340331 | 676979.73 | 698054 | 1445 | 1.00 | 1 | 1
Contrived Tall 64 | 257 | 7415824 | 28855.35 | 28969 | 136 | 64.00 | 64 | 64
Contrived Tall 32 | 129 | 3707796 | 28742.60 | 28969 | 136 | 32.00 | 32 | 32
Contrived Tall 16 | 65 | 1853780 | 28519.69 | 28969 | 136 | 16.00 | 16 | 16
Contrived Random 1 | 52 | 12592 | 242.15 | 1665 | 75 | 2.25 | 32 | 9
Contrived Random 2 | 205 | 155401 | 758.05 | 1655 | 156 | 9.38 | 22 | 2
Contrived Random 3 | 203 | 198721 | 978.92 | 3315 | 82 | 2.22 | 2 | 5

Table 4.1: Benchmark Properties



of dependences between them. Contrived Random 2 consists of a large set of source files, each of which has an 80 percent chance of being dependent upon between one and three other source files. Each of these files has between ten and fifteen methods. Thus, Contrived Random 2 consists of a set of moderately sized classes, also with a fairly large number of dependences between them. Contrived Random 3 consists of a large set of source files, each of which has a 20 percent chance of being dependent upon between one and three other source files. Each of these files has between one and thirty methods. Thus, Contrived Random 3 consists of a set of large classes with relatively few dependences between them.

Experiments conducted using the Parallel Java Compiler were run on a Compaq Alpha ES40 running Tru64 UNIX (OSF1 V5.1). This machine has two Alpha processors, each running at 667 MHz, and has two gigabytes of primary memory. Experiments conducted using the Distributed Java Compiler and the C Distributed Java Compiler were run using two distributed server machines. The first was the Compaq Alpha ES40 described above. The second was a Dell Server P2600 running Red Hat Enterprise Linux AS release 3 (2.4.21-15.ELsmp). This machine has two Hyperthreaded Intel Xeon processors, each running at 3.06 GHz, with two gigabytes of primary memory. The client applications were run on the Dell server. The server applications were run on both the Compaq and Dell servers. Experiments were conducted between the hours of 02:00 and 06:00 in the morning to minimize the influence of other processes on the results. The results do not include the time required to build the dependence graphs for the benchmarks or to run makegen, the program that generates a makefile for running cdjavac. Each experiment was performed three times, and the average execution time of the three runs is what appears in the tables. The performance of each compiler was compared against the performance of the standard implementation of javac executed on a Compaq Alpha ES40 running Tru64 UNIX (OSF1 V5.1).


Section 4.2 examines pjavac’s performance on these benchmarks. Using the performance information from Section 4.2, Sections 4.3 and 4.4 examine the performance of djavac and cdjavac respectively.

4.2 pjavac - Parallel Java Compiler

Table 4.2 shows the results of using pjavac with a bottom-up scheduling algorithm and a multiprocessed compiler implementation, varying the number of processes used for compilation. As the number of processes increases, the total time for compilation decreases. This is especially true for very loosely interconnected benchmarks such as the Contrived Wide benchmarks and Contrived Random 3. The Contrived Tall benchmarks show that the performance does not improve once the number of processes exceeds the number of paths from the root of the dependence graph to a leaf (here, that number is four). Contrived Random 1 shows that there is little overall performance increase from increasing the number of processes for programs whose dependence graphs are tightly interconnected. This is due to the restrictions that dependences between files place on the order in which files can be compiled and on the number that can be compiled in parallel. Even though the total program sizes for Contrived Wide 16 large and Contrived Wide 32 small are roughly the same, Contrived Wide 32 small takes longer to compile. This suggests that compilation of a small number of larger files takes less time than compilation of a large number of smaller files. However, note that all of these compilations take more time than the baseline javac.

Table 4.3 shows the results of using the Parallel Java Compiler with a bottom-up scheduling algorithm and a multithreaded compiler implementation, varying the number of threads used for compilation. Table 4.4 shows the speedup of using a multithreaded compiler instead of a multiprocessed compiler. Note that the multithreaded technique always


(times in seconds; columns 1-16 give the number of compilation processes)
Benchmark Name | 1 | 2 | 4 | 8 | 16 | Baseline javac
pjavac | 135.92 | 116.28 | 104.25 | 96.40 | 95.16 | 3.51
jEdit | 769.28 | 667.04 | 564.87 | 522.67 | 505.42 | 14.18
MegaMek | 537.78 | 463.31 | 383.38 | 349.21 | 333.98 | 9.54
ganttproject | 457.70 | 402.56 | 338.39 | 327.73 | 315.29 | 8.38
emma | 325.18 | 285.48 | 248.88 | 233.47 | 224.64 | 7.34
Contrived Wide 16 large | 186.47 | 167.46 | 145.17 | 131.67 | 126.55 | 118.98
Contrived Wide 16 small | 118.80 | 106.78 | 90.28 | 81.20 | 77.95 | 56.78
Contrived Wide 32 small | 236.07 | 210.63 | 177.35 | 163.21 | 152.81 | 55.56
Contrived Tall 64 | 801.22 | 698.90 | 587.08 | 554.09 | 554.40 | 22.22
Contrived Tall 32 | 393.10 | 351.86 | 276.65 | 277.86 | 277.50 | 13.22
Contrived Tall 16 | 193.19 | 165.06 | 140.92 | 140.72 | 140.21 | 7.66
Contrived Random 1 | 149.40 | 139.72 | 133.96 | 133.98 | 134.52 | 3.23
Contrived Random 2 | 620.85 | 525.97 | 446.71 | 424.79 | 409.30 | 4.10
Contrived Random 3 | 603.44 | 509.67 | 421.33 | 371.72 | 343.73 | 4.13

Table 4.2: Bottom-Up Scheduling with Multiprocessed Compiler and Varying Thread Counts

performs better than the multiprocessed technique. This is due to the startup cost of the Java Virtual Machine for each file being compiled. To estimate the startup cost of the Java Virtual Machine in terms of time, an experiment was conducted in which a single empty Java source file was compiled. The average total time to compile the empty file was 2.66 seconds. Using the multiprocessed compilation technique, each process has to pay this startup cost.

Table 4.3 shows that the compilation times for the Contrived Wide 16 benchmarks are all less than baseline javac's, while the compilation time for Contrived Wide 32 small is greater than baseline. This suggests that parallelization is more effective for programs with loosely interconnected dependence graphs and that the effectiveness decreases as the total number of files increases.

Multithreaded compilation is less effective as the number of threads increases beyond two. Notice that the best performance is achieved with a thread count of two. The machine, which contains two processors, limits the number of compilations that can occur simultaneously. Increasing the thread count beyond the number of processors increases the


(times in seconds; columns 1-16 give the number of compilation threads)
Benchmark Name | 1 | 2 | 4 | 8 | 16 | Baseline javac
pjavac | 16.69 | 14.35 | 14.67 | 14.98 | 14.83 | 3.51
jEdit | 66.32 | 54.03 | 64.77 | 69.96 | 93.92 | 14.18
MegaMek | 46.74 | 36.66 | 38.50 | 39.01 | 40.37 | 9.54
ganttproject | 52.49 | 40.94 | 42.20 | 43.62 | 50.33 | 8.38
emma | 31.35 | 25.79 | 26.37 | 26.53 | 27.88 | 7.34
Contrived Wide 16 large | 93.99 | 73.23 | 77.49 | 89.10 | 92.64 | 118.98
Contrived Wide 16 small | 46.32 | 35.29 | 37.00 | 39.69 | 44.79 | 56.78
Contrived Wide 32 small | 86.66 | 65.88 | 66.45 | 70.55 | 80.17 | 55.56
Contrived Tall 64 | 72.25 | 53.52 | 59.30 | 60.41 | 60.60 | 22.22
Contrived Tall 32 | 36.67 | 29.48 | 31.45 | 31.18 | 31.09 | 13.22
Contrived Tall 16 | 19.76 | 16.59 | 17.45 | 17.26 | 17.34 | 7.66
Contrived Random 1 | 15.43 | 14.72 | 15.14 | 15.55 | 15.92 | 3.23
Contrived Random 2 | 53.30 | 43.86 | 48.47 | 47.38 | 50.56 | 4.10
Contrived Random 3 | 51.41 | 41.97 | 45.06 | 45.51 | 50.79 | 4.13

Table 4.3: Bottom-Up Scheduling with Multithreaded Compiler and Varying Thread Counts

possibility of time being spent swapping between threads, especially when compiling large files.

Table 4.3 shows that the multithreaded compiler is usually much slower than the baseline javac. This is most likely due to additional file I/O operations in pjavac. With baseline javac, each file is only parsed or loaded once. However, with pjavac the bytecode for each class is written to disk, and the compilation of any dependent file must re-load the resulting class file. Interestingly, however, this is not the case with the Contrived Wide 16 benchmarks. Even with a single compilation thread, pjavac with bottom-up scheduling outperforms baseline javac. This is due to the limited size of the heap within the Java Virtual Machine. Since javac keeps class files in memory, more memory is needed as each class is compiled. An experiment was conducted in which the minimum and maximum heap size for the virtual machine running javac was set to an arbitrarily large value (512 MB). With this experiment, the average compilation time for Contrived Wide 16 large was 94.25 seconds, yielding a speedup of 1.26 over baseline javac.
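Although the formula is never stated explicitly, the speedups reported in this chapter are consistent with the usual definition: the speedup of configuration X over configuration Y is Y's time divided by X's time. For the heap-size experiment just described,

\[
  \text{speedup} = \frac{T_{\text{baseline javac}}}{T_{\text{javac, 512 MB heap}}}
                 = \frac{118.98\ \text{s}}{94.25\ \text{s}} \approx 1.26 .
\]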


(columns give the number of processes/threads)
Benchmark Name | 1 | 2 | 4 | 8 | 16
pjavac | 8.14 | 8.10 | 7.11 | 6.44 | 6.42
jEdit | 11.60 | 12.35 | 8.72 | 7.47 | 5.38
MegaMek | 11.51 | 12.64 | 9.96 | 8.95 | 8.27
ganttproject | 8.72 | 9.83 | 8.02 | 7.51 | 6.26
emma | 10.37 | 11.07 | 9.44 | 8.80 | 8.06
Contrived Wide 16 large | 1.98 | 2.29 | 1.87 | 1.48 | 1.37
Contrived Wide 16 small | 2.56 | 3.03 | 2.44 | 2.05 | 1.74
Contrived Wide 32 small | 2.72 | 3.20 | 2.67 | 2.31 | 1.91
Contrived Tall 64 | 11.09 | 13.06 | 9.90 | 9.17 | 9.15
Contrived Tall 32 | 10.72 | 11.94 | 8.80 | 8.91 | 8.93
Contrived Tall 16 | 9.78 | 9.95 | 8.08 | 8.15 | 8.09
Contrived Random 1 | 9.68 | 9.49 | 8.85 | 8.62 | 8.45
Contrived Random 2 | 11.65 | 11.99 | 9.22 | 8.97 | 8.10
Contrived Random 3 | 11.74 | 12.14 | 9.35 | 8.17 | 6.77

Table 4.4: Speedup of Using a Multithreaded Compiler over a Multiprocessed Compiler

The next experiments use the most effective configuration from the previous experiments (a multithreaded compiler with two threads) along with a file size scheduling algorithm, varying the minimum file size. Table 4.5 shows the results of these experiments. These results suggest that the file size scheduling algorithm using two compilation threads is most effective for programs whose dependence graphs are loosely connected, such as the Contrived Tall benchmarks. For most benchmarks, however, there is not a significant performance increase as the minimum file length threshold increases. This is likely due to a relatively large number of files having multiple parents. The file size scheduling algorithm schedules files with multiple parents immediately, without regard for the file size, and can therefore prevent long dependence chains from being compiled as a group.

Table 4.5 also indicates that once the file size threshold approaches or exceeds the total program size, the performance decreases. This performance decrease is due to a decrease in parallelism. When the file size threshold exceeds the total program size, the entire program is compiled in a single thread by compiling the root of the dependence


graph. This is evidenced by the Contrived Wide and Contrived Tall benchmarks with large file length thresholds. The best threshold value tends to be one tenth of the total program size. This allows a large number of files to be compiled in a single chain while maintaining parallelism.

(times in seconds; columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000 | Baseline javac
pjavac | 12.60 | 12.29 | 9.56 | 9.65 | 9.46 | 3.51
jEdit | 18.54 | 18.10 | 18.10 | 18.17 | 18.30 | 14.18
MegaMek | 16.39 | 16.47 | 16.39 | 18.08 | 18.62 | 9.54
ganttproject | 21.40 | 22.60 | 20.79 | 20.70 | 20.82 | 8.38
emma | 17.95 | 14.38 | 15.34 | 15.40 | 15.20 | 7.34
Contrived Wide 16 large | 78.41 | 71.64 | 72.15 | 74.04 | 122.48 | 118.98
Contrived Wide 16 small | 35.75 | 35.54 | 35.30 | 60.83 | 61.67 | 56.78
Contrived Wide 32 small | 66.59 | 66.07 | 66.43 | 66.07 | 65.20 | 55.56
Contrived Tall 64 | 127.08 | 123.62 | 41.57 | 14.76 | 25.39 | 22.22
Contrived Tall 32 | 39.21 | 39.20 | 14.28 | 14.82 | 15.88 | 13.22
Contrived Tall 16 | 18.27 | 17.48 | 8.90 | 10.60 | 9.49 | 7.66
Contrived Random 1 | 5.32 | 5.25 | 5.26 | 5.22 | 5.25 | 3.23
Contrived Random 2 | 9.64 | 9.43 | 9.30 | 9.33 | 9.30 | 4.10
Contrived Random 3 | 15.01 | 15.36 | 15.22 | 15.24 | 15.65 | 4.13

Table 4.5: File Size Scheduling with Two Threads and Varying Minimum File Lengths

Table 4.6 shows the speedup of the file size scheduling algorithm over the bottom-up scheduling algorithm. These results show that the file size scheduling algorithm is generally more effective than the bottom-up scheduling algorithm, even though there is a greater cost associated with file size scheduling. The file size scheduler has to update the cumulative size attached to each node in the dependence graph after each compilation. This cost is acceptable in cases where it would take longer to compile each file separately than it does to compile a dependence chain as a group, such as long chains of files smaller than the file size threshold. Examples of such cases are the jEdit [15], MegaMek [16], and Contrived Tall benchmarks. This cost is not acceptable when long dependence chains cannot be built, such as with the Contrived Wide benchmarks.


(columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000
pjavac | 1.14 | 1.17 | 1.50 | 1.49 | 1.52
jEdit | 2.91 | 2.99 | 2.99 | 2.97 | 2.95
MegaMek | 2.24 | 2.23 | 2.24 | 2.03 | 1.97
ganttproject | 1.91 | 1.81 | 1.97 | 1.98 | 1.97
emma | 1.44 | 1.79 | 1.68 | 1.67 | 1.70
Contrived Wide 16 large | 0.93 | 1.02 | 1.01 | 0.99 | 0.60
Contrived Wide 16 small | 0.99 | 0.99 | 1.00 | 0.58 | 0.57
Contrived Wide 32 small | 0.99 | 1.00 | 0.99 | 1.00 | 1.01
Contrived Tall 64 | 0.42 | 0.43 | 1.29 | 3.63 | 2.11
Contrived Tall 32 | 0.75 | 0.75 | 2.06 | 1.99 | 1.86
Contrived Tall 16 | 0.91 | 0.95 | 1.86 | 1.57 | 1.75
Contrived Random 1 | 2.77 | 2.80 | 2.80 | 2.82 | 2.80
Contrived Random 2 | 4.55 | 4.65 | 4.72 | 4.70 | 4.72
Contrived Random 3 | 2.80 | 2.73 | 2.76 | 2.75 | 2.68

Table 4.6: Speedup of Using a File Size Scheduling Algorithm with Two Threads over a Bottom-Up Scheduling Algorithm with Two Threads

In summary, the results of testing with pjavac show that the multithreaded compiler outperforms the multiprocessed compiler. This is because the multiprocessed compiler has to pay the startup cost of the Java Virtual Machine for each file being compiled, while the multithreaded compiler pays that cost only once. Maximum speedup is achieved using the multithreaded compiler when the number of threads executing simultaneously is equal to the number of processors in the machine. This maximizes processor usage without causing additional contention for shared resources. File size scheduling outperforms the bottom-up scheduling algorithm when the file size threshold is large enough to cause multiple files to be compiled by a single compilation request and small enough to prevent all or most of the program from being compiled by a single compilation request.


4.3 djavac - Distributed Java Compiler

The second implementation associated with this research was the development of a distributed version of the compiler using the software developed for the parallel implementation. The results from Section 4.2 indicate that invoking javac internally via threading with a file size scheduling approach is more efficient; thus, the multithreaded compilation approach with file size scheduling is used on the server side in these djavac experiments.

Table 4.7 shows the results of using djavac with a file size scheduling algorithm, varying the file size threshold, with four server processes, two on each of the two machines. Again, these results suggest that the file size scheduling algorithm is most effective for programs whose dependence graphs are loosely connected, such as the Contrived Wide and Contrived Random 3 benchmarks. However, once the file size threshold approaches or exceeds the total program size, the performance decreases. This decrease does not always drop performance below that of baseline javac; at times, djavac distributes the compilation of the program to the faster machine. The djavac implementation performs better than baseline on the Contrived Wide benchmarks and on the Contrived Tall benchmarks.

Table 4.8 shows the speedup of the distributed implementation over the parallel implementation using a file size scheduling algorithm. These results show that the distributed implementation generally performs better than the parallel implementation on the benchmarks. By off-loading a portion of the compilations to a different machine, djavac is able to decrease the total compilation times for some benchmarks. The benchmarks that see the best performance increase are the Contrived Tall benchmarks, which are especially well-suited for file size scheduling. However, the compile times for other benchmarks with a large number of relatively small files and large numbers of interdependences, such as jEdit [15], MegaMek [16], ganttproject [8], and Contrived Random 2, do not improve. This is


(times in seconds; columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000 | Baseline javac
pjavac | 7.56 | 6.33 | 4.54 | 4.50 | 4.54 | 3.51
jEdit | 39.85 | 38.05 | 38.85 | 38.63 | 38.52 | 14.18
MegaMek | 20.01 | 19.94 | 19.48 | 20.21 | 19.83 | 9.54
ganttproject | 23.75 | 21.26 | 21.99 | 21.18 | 21.97 | 8.38
emma | 16.32 | 11.62 | 11.56 | 11.90 | 12.61 | 7.34
Contrived Wide 16 large | 44.07 | 42.94 | 38.55 | 37.97 | 115.88 | 118.98
Contrived Wide 16 small | 21.01 | 20.97 | 21.86 | 49.18 | 54.27 | 56.78
Contrived Wide 32 small | 46.25 | 44.68 | 45.53 | 56.77 | 58.91 | 55.56
Contrived Tall 64 | 32.97 | 32.87 | 15.26 | 11.07 | 20.82 | 22.22
Contrived Tall 32 | 15.05 | 14.40 | 8.30 | 12.45 | 12.34 | 13.22
Contrived Tall 16 | 7.97 | 7.75 | 5.59 | 6.76 | 7.20 | 7.66
Contrived Random 1 | 4.80 | 4.51 | 5.03 | 4.84 | 5.09 | 3.23
Contrived Random 2 | 11.04 | 10.51 | 11.23 | 12.01 | 10.48 | 4.10
Contrived Random 3 | 11.48 | 11.11 | 12.02 | 13.04 | 9.51 | 4.13

Table 4.7: File Size Scheduling with Four djavac Servers and Varying Minimum File Lengths

potentially due to NFS latency. Table 4.9 shows a comparison of initial filesystem access times for a directory containing 10807 files and directories, using approximately 167 megabytes of disk space, on both a local filesystem and an NFS filesystem. NFS latency can indeed affect the performance of djavac, as the compilers request source files to compile, request class files to perform semantic checking, and write compiled bytecode to file.

4.4 cdjavac - C Distributed Java Compiler

The third and final implementation associated with this research was the development of a distributed version of the compiler with all client runtime software implemented in C. A tool named makegen was written, using software developed for the other two implementations, to generate a makefile to represent the static inter-file dependences. make then uses this makefile to invoke a user-specified number of client applications in parallel. These client


(columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000
pjavac | 1.67 | 1.94 | 2.11 | 2.14 | 2.08
jEdit | 0.47 | 0.46 | 0.47 | 0.47 | 0.48
MegaMek | 0.82 | 0.83 | 0.84 | 0.89 | 0.94
ganttproject | 0.90 | 1.06 | 0.95 | 0.98 | 0.95
emma | 1.10 | 1.24 | 1.33 | 1.29 | 1.21
Contrived Wide 16 large | 1.78 | 1.67 | 1.87 | 1.95 | 1.06
Contrived Wide 16 small | 1.70 | 1.69 | 1.61 | 1.24 | 1.14
Contrived Wide 32 small | 1.44 | 1.48 | 1.46 | 1.16 | 1.11
Contrived Tall 64 | 3.85 | 3.76 | 2.72 | 1.33 | 1.22
Contrived Tall 32 | 2.61 | 2.72 | 1.72 | 1.19 | 1.29
Contrived Tall 16 | 2.29 | 2.26 | 1.59 | 1.57 | 1.32
Contrived Random 1 | 1.11 | 1.16 | 1.05 | 1.08 | 1.03
Contrived Random 2 | 0.87 | 0.90 | 0.83 | 0.78 | 0.89
Contrived Random 3 | 1.31 | 1.38 | 1.27 | 1.17 | 1.65

Table 4.8: Speedup of Using the Distributed Implementation over the Parallel Implementation

Command | Local | NFS
find dir | 1.65 s | 3.03 s
cp -R dir dir2 | 49.93 s | 803.61 s
rm -rf dir2 | 10.03 s | 459.09 s

Table 4.9: Time Required to Access 10807 Files and Directories on both a Local Filesystem and an NFS Filesystem

applications communicate with a single proxy application, which relays the compilation requests to servers specified in the file hosts.conf. Previous results indicate that invoking javac internally via threading with a file size scheduling approach is more efficient; thus, the multithreaded compilation approach with file size scheduling is used on the server side in these cdjavac experiments.

Table 4.10 shows the results of using cdjavac with a file size scheduling algorithm, varying the file size threshold, with four server processes, two on each of the two server machines. cdjavac outperforms javac on more than half the benchmarks in this study. Generally, the performance increases as the file size threshold increases to roughly one-tenth


of the total program size. When the file size threshold exceeds this value, the parallelism decreases as large portions of the entire program are compiled by a single compilation.

(times in seconds; columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000 | Baseline javac
pjavac | 4.44 | 3.52 | 2.68 | 2.67 | 2.77 | 3.51
jEdit | 37.87 | 32.86 | 34.70 | 34.32 | 34.76 | 14.18
MegaMek | 12.03 | 11.20 | 10.89 | 8.96 | 9.18 | 9.54
ganttproject | 13.35 | 12.38 | 12.25 | 11.74 | 10.09 | 8.38
emma | 11.62 | 10.42 | 9.85 | 9.65 | 9.41 | 7.34
Contrived Wide 16 large | 48.48 | 36.93 | 37.08 | 36.89 | 121.37 | 118.98
Contrived Wide 16 small | 20.35 | 20.87 | 21.07 | 53.03 | 58.79 | 56.78
Contrived Wide 32 small | 39.43 | 39.59 | 40.62 | 60.83 | 54.53 | 55.56
Contrived Tall 64 | 24.45 | 24.09 | 11.03 | 7.71 | 18.12 | 22.22
Contrived Tall 32 | 11.04 | 10.62 | 5.32 | 9.83 | 9.45 | 13.22
Contrived Tall 16 | 5.86 | 5.39 | 2.60 | 5.30 | 5.83 | 7.66
Contrived Random 1 | 1.94 | 1.50 | 1.43 | 1.45 | 1.44 | 3.23
Contrived Random 2 | 7.04 | 6.68 | 6.55 | 7.05 | 6.94 | 4.10
Contrived Random 3 | 7.74 | 7.33 | 7.51 | 7.10 | 6.85 | 4.13

Table 4.10: File Size Scheduling with Four cdjavacd Servers and Varying Minimum File Lengths

Table 4.11 shows the speedup of cdjavac over djavac. These results show that the distributed implementation with the client-side applications written in C generally performs better than the distributed implementation with the client side written in Java. cdjavac does not perform file size scheduling at runtime, which eliminates the overhead involved in maintaining the cumulative file lengths. The speedup is small because the most CPU-intensive parts of both applications are on the server side and are written in Java. Again, benchmarks with a large number of small source files or benchmarks whose dependence graphs are highly interconnected perform more poorly than the other benchmarks. This is likely due to NFS latency and the restrictions the dependences place on the number of simultaneous compiles.


(columns give the minimum file length threshold in 10^3 bytes)
Benchmark Name | 1 | 10 | 100 | 1000 | 10000
pjavac | 1.77 | 1.80 | 1.69 | 1.69 | 1.64
jEdit | 1.05 | 1.16 | 1.12 | 1.13 | 1.11
MegaMek | 1.66 | 1.78 | 1.79 | 2.26 | 2.16
ganttproject | 1.78 | 1.72 | 1.80 | 1.80 | 2.18
emma | 1.40 | 1.12 | 1.17 | 1.23 | 1.34
Contrived Wide 16 large | 0.91 | 1.16 | 1.04 | 1.03 | 0.95
Contrived Wide 16 small | 1.03 | 1.00 | 1.04 | 0.93 | 0.92
Contrived Wide 32 small | 1.17 | 1.13 | 1.12 | 0.93 | 1.08
Contrived Tall 64 | 1.35 | 1.36 | 1.38 | 1.44 | 1.15
Contrived Tall 32 | 1.36 | 1.36 | 1.56 | 1.27 | 1.31
Contrived Tall 16 | 1.36 | 1.44 | 2.15 | 1.28 | 1.23
Contrived Random 1 | 2.47 | 3.01 | 3.52 | 3.34 | 3.53
Contrived Random 2 | 1.57 | 1.57 | 1.71 | 1.70 | 1.51
Contrived Random 3 | 1.48 | 1.52 | 1.60 | 1.84 | 1.39

Table 4.11: Speedup of cdjavac over djavac

4.5 Summary

Studies using pjavac indicate that the multithreaded compiler implementation is better than the multiprocessed one, since the multithreaded compiler does not incur the overhead of starting a new Java Virtual Machine with each compile. Results using pjavac also indicate that it is better to use a file size scheduling algorithm to compile a group of dependent files rather than to incur the overhead of performing a bottom-up compilation of those files, unless those files individually are quite large. The best file size threshold found is roughly one tenth of the total program size.

Studies using the distributed Java compilers indicate that a distributed version generally performs better than a parallel version. The additional processors the distributed implementation provides allow more compiles to be performed simultaneously. However, if there are a large number of relatively small files, NFS overhead can decrease performance. cdjavac performs better than djavac because it does not perform file size scheduling at


runtime, which eliminates the overhead involved in maintaining the cumulative file lengths. cdjavac outperforms the baseline javac compiler in cases where the program consists of a number of large, somewhat independent source files. The independence among source files allows them to be compiled in parallel.

Chapter 5

Summary

Translation of high-level languages like C or Java to machine code can be quite time consuming. An application called distcc distributes the compilation of C/C++ source code to machines on a local area network for parallel compilation. This thesis explored the development of a distributed compiler for Java.

Unlike C or C++, Java has compile-time interdependences between translation units that prevent a distributed compiler from blindly distributing files for compilation. In C/C++, header files hold semantic information allowing the source files to be compiled in any order. The Java language does not support header files; thus a file A.java needing semantic information contained in B.java must be compiled after B.java. Parallelizing compilation of Java source files requires performing external dependence analysis to determine a reasonable compilation order that satisfies these interdependences.

This thesis explained the design steps and results of an experimental study undertaken to develop a distributed compiler for Java. First, multiple processes and multiple threads on a single machine were used to determine the best compilation method and the best scheduling algorithm available. Experiments show that using multiple threads within


a single Java Virtual Machine is much more efficient than spawning new virtual machines. Experiments also show that file size scheduling is slightly more efficient than using a bottom-up scheduling approach. The program cdjavac performs especially well on programs where the source file size is relatively large and where the source files are not highly dependent.

In summary, the contributions of this research are the following:

• The design and implementation of a technique to perform external dependence analysis on Java source files. This technique examines Java source files and builds a dependence graph that reflects the dependences between them.

• The design and implementation of an infrastructure for generically handling compilers so that the code can be extended to test other compilation techniques.

• Implementation of a compilation technique that executes javac as an external process for analysis.

• Implementation of a compilation technique that executes javac by calling the static compile method in the code shipped with the Java Software Development Kit.

• Design and implementation of an infrastructure for generically handling compilation scheduling algorithms so that the code can be extended to test other scheduling algorithms.

• Implementation of a bottom-up scheduler that constantly compiles leaf nodes in the dependence graph.

• Implementation of a file size scheduler that, starting with the leaf nodes, works up the dependence graph and schedules files for compilation based on their cumulative file length.


• Implementation of a parallel Java compiler capable of analyzing parallel compiler performance on a multiprocessor machine without a network.

• Implementation of distributed Java compilers capable of analyzing distributed compiler performance on multiple hosts on a TCP/IP network.

• Implementation of an application that is capable of generating a makefile to represent the static inter-file dependences in a Java program. This makefile can potentially be used by other programs for compile-time analysis.

• Performance of an experimental study to evaluate the parallel Java compiler and two distributed Java compilers.

Further research in this area includes distributing the compilation load to more servers on a local area network. This could also include a more in-depth assessment of the effect of network and NFS latency on the efficiency of the distributed compilation. Also, modifying the djavac and cdjavac implementations by assigning a weight to each available server in the hosts.conf file would allow faster machines to be assigned larger files to compile. Another approach would be to compress and transfer the files necessary for compilation, similar to the way distcc distributes C/C++ source files. The Distributed Java Compiler could be modified to support a Windows-based file system so that the Windows file sharing protocol could be used in place of NFS. Also, the grammar of the Java language could be modified to include inter-translation-unit semantic information prior to compilation, similar to the preprocessing that is done to C/C++ source files, to eliminate compile-time interdependences.

Bibliography

[1] Gregory R. Andrews. Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley, 2000.

[2] Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, Cambridge, UK, second edition, 2002.

[3] Daniel P. Bovet and Marco Cesati. Understanding the Linux Kernel. O'Reilly & Associates, Inc., Cambridge, MA, 2001.

[4] Randal E. Bryant and David R. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, Upper Saddle River, NJ, 2003.

[5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.

[6] distcc: a fast, free distributed C/C++ compiler. Online - http://distcc.samba.org/, February 2004. distcc project website.

[7] EMMA: a free Java code coverage tool. Online - http://emma.sourceforge.net/, May 2004. An open source code coverage tool.

[8] GanttProject. Online - http://ganttproject.sourceforge.net/, May 2004. An open source application for planning projects using Gantt charts.

[9] Samuel P. Harbison III and Guy L. Steele Jr. C: A Reference Manual. Prentice-Hall, Upper Saddle River, NJ, fifth edition, 2002.

[10] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Los Altos, CA, third edition, 2003.

[11] Cay Horstmann and Gary Cornell. Core Java 2, Volume I: Fundamentals. Prentice-Hall PTR, Englewood Cliffs, NJ, fifth edition, 2000.

[12] Scott E. Hudson, Frank Flannery, C. Scott Ananian, and Dan Wang. CUP: LALR Parser Generator for Java. Online - http://www.cs.princeton.edu/~appel/modern/java/CUP/, September 2003. CUP parser generator for Java.

[13] The Java Language: An Overview. Online - http://java.sun.com/docs/overviews/java/java-overview-1.html, March 2003. Java language white paper.

[14] javac - The Java programming language compiler. Online - http://java.sun.com/j2se/1.4.2/docs/tooldocs/solaris/javac.html, April 2003. javac documentation website.

[15] jEdit - Open Source programmer's text editor. Online - http://www.jedit.org/, February 2004. An open source text editor written in Java.

[16] MegaMek - an unofficial, online version of the Classic BattleTech board game. Online - http://megamek.sourceforge.net/, May 2004. An open source, unofficial, online version of the Classic BattleTech board game.

[17] Andrew Oram and Steve Talbott. Managing Projects with make. O'Reilly & Associates, Inc., Newton, MA, second edition, 1991.

[18] Chet Ramey and Brian Fox. Bash Reference Manual. Network Theory Limited, Bristol, United Kingdom, 2.5b edition, 2002.

[19] Syed Mansoor Sarwar, Robert Koretsky, and Syed Aqeel Sarwar. UNIX: The Textbook. Addison-Wesley Longman, Harlow, Essex, England, 2001.

[20] Richard M. Stallman and Roland McGrath. GNU Make: A Program for Directing Recompilation, for GNU Make Version 3.80. GNU Press, Boston, MA, 2002.

[21] W. Richard Stevens. Advanced Programming in the UNIX Environment. Addison-Wesley, Reading, MA, 1993.

[22] W. Richard Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley Professional Computing Series. Addison-Wesley, Reading, MA, 1994.

[23] W. Richard Stevens, Bill Fenner, and Andrew M. Rudoff. UNIX Network Programming: The Sockets Networking API, Volume 1. Addison-Wesley Professional Computing Series. Prentice-Hall PTR, Upper Saddle River, NJ, third edition, 2004.

[24] Andrew S. Tanenbaum and Albert S. Woodhull. Operating Systems: Design and Implementation. Prentice-Hall, Upper Saddle River, NJ, second edition, 1997.

[25] Barry Wilkinson and Michael Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice-Hall, Upper Saddle River, NJ, 1999.

Vita

Andrew Ryan Dalton was born in Hickory, North Carolina, on August 4, 1978, son of Beverly Ann Dalton and the late Detroy Briscoe Dalton. He graduated from Hickory High School in June 1996, and entered Appalachian State University the following August. In December 2000, he graduated cum laude with a Bachelor of Science degree in Computer Science. In January of 2001 he accepted a position as a software engineer with Nortel Networks in Research Triangle Park, North Carolina. In August 2002, he returned to Appalachian State University to begin study toward a Master's degree. This degree will be awarded in August 2004, and he will enter Clemson University in the fall of 2004 to pursue a Ph.D. degree in Computer Science.

Mr. Dalton is a member of the Association for Computing Machinery and a member of the Appalachian State University Graduate Student Association Senate. While a graduate student at ASU, he has taught six sections of a Computer Science II Laboratory, one section of Introduction to Computers, and served as a teaching assistant for a course in assembly language and machine operation. He has also assisted in the development of laboratories for the Computer Science I and the Survey of Computer Science courses. In addition, he has given introductory workshops on the vim text editor and the Linux operating system.
