On the Implementation of AD using Elimination Methods via Source Transformation

Mohamed Tadjouddine¹, John D. Pryce², & Shaun A. Forth³

AMOR Report No 2000/8
Cranfield University (RMCS Shrivenham), Swindon SN6 8LA, UK
November 2000, reviewed July 2001

[email protected]
Abstract

In this paper, we investigate the implementation of an Automatic Differentiation (AD) tool based on source transformation, using vertex elimination methods related to sparse matrices. We focus on a restricted class of Fortran codes from CFD applications and aim to provide a near-optimal differentiation approach for such problems. We discuss the algorithms and data structures used in developing this AD tool.
¹ AMOR Group, Engineering Systems Department
² CISE, Department of Informatics & Simulation
³ AMOR Group, Engineering Systems Department
1 Introduction

In inverse problems, data assimilation, and design optimisation we need to linearise a model or to build up the adjoint of a working model. One way to implement this is to use Automatic Differentiation (AD) [1, 2]. AD is an efficient technology for computing large numbers of derivatives of a function represented by a computer program. One mechanism for performing AD is source-to-source translation. In some cases, however, the derived code suffers from poor performance due to a lack of memory locality, caused by a costly order of statement execution or by sparse computations.

A simple example of an ordering problem appears when we have a sequence of two independent blocks A, B to differentiate. We can compute A followed by B, then differentiate A after B. Alternatively, we can compute A followed by its derivative, then compute B followed by its derivative. The latter ordering may be better than the first, because variables used to compute A may still be in cache when A is differentiated; similarly for B and its derivative. Indeed, an inappropriate ordering of operations can prevent effective use of the cache and thus slow down the derived program [3].

To enhance the performance of the AD-derived code, we employ the vertex elimination approach [2, 4, 5]. This approach requires a vertex elimination sequence, chosen to minimise the number of arithmetic operations required to compute the derivatives; it may also take account of the sparsity of the Jacobian [2]. Furthermore, we would like an ordering of operations that optimises cache utilisation. These two targets are the purpose of this work.

In this paper, we detail our efforts on the implementation issues of using the vertex elimination approach in the context of AD based on source transformation. We therefore assume that a vertex elimination sequence, nearly optimal as regards the number of arithmetic operations, has been found by some algorithm (see for instance [5]). After a brief review of AD, we motivate the use of the vertex elimination approach through an example. Then we focus on our AD tool development, covering the following points:

1. the use of ANTLR [6] to build the front-end and the back-end of our AD translator tool;
2. the description of the data structures and algorithms we use throughout our tool, from reading the input code to outputting the corresponding derivative code;
3. a discussion of the implementation issues and the possibilities that will arise as we deal with large-scale problems (e.g. CFD flux computations).

Throughout this paper, we work on Automatic Differentiation of Fortran programs, though the algorithms employed are independent of the input language. Apart from input and output of source code, this work could equally be applied to other high-level imperative languages such as C or C++. In some examples, we use the Fortran language extended by mathematical notations.
2 Automatic Differentiation
Automatic Differentiation (AD) [2] is a set of algorithms for generating derivatives of functions represented by computer programs. AD relies on the chain rule of calculus, applied to elementary operations in an automated fashion. AD can be seen as an example of program transformation: it transforms an input program into a new one by adding derivative statements or by overloading the original operations, in order to support the computation of derivative values. Suppose we have a computer program that represents the following function:
$$F : \mathbb{R}^n \to \mathbb{R}^m, \qquad x \mapsto y$$

Define $z$ to be the vector comprising all the intermediate variables, and the vector $v = (x, z, y)$. Assume that the quantities in the vector $v$ are numbered as follows:

$$(\underbrace{v_{1-n}, \dots, v_0}_{x},\; v_1, \dots, v_{l-m},\; \underbrace{v_{l-m+1}, \dots, v_l}_{y})$$
An execution of such a program can be seen as a code-list [2] consisting of scalar assignments

$$v_i = \varphi_i(u_i), \qquad i = 1-n, \dots, l$$

where $u_i$ is the subvector consisting of those previously computed $v_j$ which are used in computing $v_i$, and $\varphi_i$ represents an elemental function. It is convenient to define

$$\varphi_i(u_i) = x_i, \qquad i = 1-n, \dots, 0.$$

We write

$$u_i = (v_j)_{j \prec i}.$$

The notation $j \prec i$ means that $v_j$ is used in computing $v_i$. This is the data dependency relationship; see [2] for details. The code-list above represents a nonlinear system of equations

$$0 = E(x, v) = \bigl(\varphi_i(u_i) - v_i\bigr)_{i=1-n,\dots,l}. \qquad (1)$$

Assuming the functions $\varphi_i$ have continuous first derivatives, we can differentiate the nonlinear system $E(x, v) = 0$. We then obtain

$$\sum_{j \prec i} \frac{\partial \varphi_i(u_i)}{\partial v_j}\,\frac{\partial v_j}{\partial x} \;-\; \frac{\partial v_i}{\partial x} = 0. \qquad (2)$$
The computation of the Jacobian $\frac{\partial y}{\partial x}$ of the function $F$ (see Section 3) turns out to be the solution of a linear system

$$A\,\frac{\partial v}{\partial x} = b, \qquad \text{followed by} \qquad \frac{\partial y}{\partial x} = \bigl[\,0_{m\times(l-m+n)} \;\; I_m\,\bigr]\,\frac{\partial v}{\partial x}. \qquad (3)$$

The matrix $A$ is the so-called extended Jacobian: the lower triangular matrix $A = C - I_{n+l}$, where $C = (c_{ij})$ with $c_{ij} = \frac{\partial \varphi_i}{\partial v_j}$ for $1-n \le i \le l$, $j \prec i$, and $I_{n+l}$ is the identity matrix of size $(n+l) \times (n+l)$.

Generally, the input variables $x$ with respect to which we need to compute derivatives are called independent variables. The output variables $y$ whose derivatives are desired are called dependent variables. A variable which depends on an independent variable, and on which a dependent variable depends, is called an active variable.

The system of equations (3) is solved by some form of elimination process. One way of viewing this process is to consider each $v_i$ as a vertex of the Computational Graph (CG) and to associate each edge $(v_j, v_i)$ with the partial derivative $c_{ij}$ (see Section 3). We then obtain the so-called linearised CG. We can apply a vertex elimination sequence to this CG to eliminate the intermediate vertices until the graph becomes bipartite. We then retrieve the Jacobian $\frac{\partial y}{\partial x}$.
2.1 Why use AD?

There are at least two reasons to use AD as a way of getting derivatives of functions:

1. Improving the time complexity and memory requirements of derivative computation. To accumulate the Jacobian, there are two modes with predictable complexities [2]: the forward mode, in which derivatives and values are propagated together through the program, and the reverse mode, in which a representation of the program execution is built up and then traversed backwards to get the derivatives. Because the time complexities of the forward and reverse modes depend on the number of inputs and outputs respectively, we should use, for instance, the reverse mode to compute a gradient: the reverse mode enables us to compute a gradient with a time complexity independent of the number of inputs of the function (the classical complexity bounds are quoted at the end of this subsection). Ways of improving derivative calculation using AD are investigated in, for instance, [5, 7, 8, 9, 10, 11]. Compared to the popular finite differences, AD often offers the opportunity to compute a large number of derivatives cheaply.

2. Getting accurate results compared to finite differences (see [12, 13]). In the absence of roundoff, AD produces exact results. Therefore, for convergence problems that require high precision, AD is useful and well adapted.

AD has been successfully applied to academic and industrial applications, see [12, 14]. Still, a lot of work has to be done to improve its efficiency in terms of time complexity and memory requirements. Our present work is intended to provide a near-optimal AD for a class of codes that occur in CFD applications.
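As a rough guide to point 1, the classical complexity bounds (standard results quoted from the AD literature [2] for orientation, not derived from our tool) are

$$\mathrm{OPS}\Bigl(\tfrac{\partial y}{\partial x}\Bigr)_{\text{forward}} = O(n)\,\mathrm{OPS}(F), \qquad \mathrm{OPS}\Bigl(\tfrac{\partial y}{\partial x}\Bigr)_{\text{reverse}} = O(m)\,\mathrm{OPS}(F),$$

so a gradient ($m = 1$) computed by the reverse mode costs a small constant multiple of the cost of evaluating $F$ itself, independently of the number $n$ of inputs.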
2.2 How AD works

We can implement an AD tool using the following methods:
- Operator overloading, which consists in overloading the basic arithmetic operators and intrinsic function calls to allow the propagation of derivatives; the AD tool is built as a library (see [15, 16]).
- Source transformation, which relies on compiler techniques; a source code is transformed into a new source code augmented by new statements that compute derivative values (see [14, 17, 18, 19]).
To use an AD tool which works by operator overloading, one has to change the types of the independent variables and of all the program variables that depend on them; the modified program is then linked to the AD library. To use an AD tool which works by source transformation, one has to specify the independent variables (and optionally the dependent variables); the AD tool then generates the derivative code from the original code.
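Although Java itself lacks operator overloading (see Appendix B), the propagation idea behind the library approach can be sketched with explicit methods. The class below is an illustrative sketch added here; it is not part of our tool nor of the libraries cited above:

class ADouble {
    /* Forward-mode "active" scalar: a value paired with a derivative
       that is propagated by the chain rule at each operation. */
    final double val;  // function value
    final double der;  // derivative value

    ADouble(double val, double der) { this.val = val; this.der = der; }

    ADouble add(ADouble b) {            // d(u+v) = du + dv
        return new ADouble(val + b.val, der + b.der);
    }
    ADouble mul(ADouble b) {            // d(u*v) = v du + u dv
        return new ADouble(val * b.val, der * b.val + val * b.der);
    }
    static ADouble sin(ADouble u) {     // d(sin u) = cos(u) du
        return new ADouble(Math.sin(u.val), Math.cos(u.val) * u.der);
    }
}

Seeding x = new ADouble(2.0, 1.0) and evaluating ADouble.sin(x.mul(x)) returns both sin(x²) and its derivative 2x cos(x²) at x = 2.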
3 Vertex Elimination Methods

This section is part of our previous work [3]. An alternative to the conventional forward and reverse modes of AD is the elimination approach [2, 4, 5]. To illustrate this technique, consider the graph sketched in Figure 1, which shows a Computational Graph (CG) for derivative calculation together with its matrix representation:

$$\begin{bmatrix}
1      &        &        &        &        &   \\
       & 1      &        &        &        &   \\
c_{31} & c_{32} & 1      &        &        &   \\
       & c_{42} & c_{43} & 1      &        &   \\
c_{51} &        & c_{53} &        & 1      &   \\
       &        &        & c_{64} & c_{65} & 1
\end{bmatrix}$$

Figure 1: An example of CG (left, with edges 1→3, 2→3, 2→4, 3→4, 1→5, 3→5, 4→6, 5→6) and its matrix representation (right)

The vertices 1, 2 represent the independents $x_1, x_2$; the vertices 3, 4, and 5 the intermediates; and the vertex 6 the dependent $y$. We have the following equations, relating infinitesimal increments (differentials) in the various variables:
$$\begin{aligned}
dv_1 &= dx_1 & dv_4 &= c_{42}\,dv_2 + c_{43}\,dv_3 \\
dv_2 &= dx_2 & dv_5 &= c_{51}\,dv_1 + c_{53}\,dv_3 \\
dv_3 &= c_{31}\,dv_1 + c_{32}\,dv_2 \qquad & dv_6 = dy &= c_{64}\,dv_4 + c_{65}\,dv_5
\end{aligned}$$
The coefficients $c_{i,j}$, $1 \le i, j \le 6$, represent the partial derivatives $\frac{\partial v_i}{\partial v_j}$. The Jacobian $\frac{\partial y}{\partial (x_1, x_2)}$ is determined by eliminating intermediate vertices or edges from the graph until it becomes bipartite. In terms of the matrix representation, vertex elimination is equivalent to successively choosing a diagonal pivot element from rows 3 to 5, eliminating all the coefficients under that pivot, and leaving the Jacobian as the elements $c_{61}$ and $c_{62}$.

Using the notation $\prec^*$ for the transitive closure of the data dependency relationship (see [2]) and the notation $|S|$ for the number of elements in the set $S$, we define the Markowitz and VLR costs at an intermediate vertex $v_j$ respectively as follows:

$$\mathrm{mark}(v_j) = |\{\,i : i \prec j\,\}| \cdot |\{\,k : j \prec k\,\}| \qquad (4)$$
$$\mathrm{VLR}(v_j) = |\{\,x_i : i \prec^* j\,\}| \cdot |\{\,y_k : j \prec^* k\,\}| - \mathrm{mark}(v_j) \qquad (5)$$
Ordering the elimination process using heuristics based on choosing the vertex with minimum Markowitz or VLR cost generally gives a further improvement [3, 5].
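To make this concrete, here is the elimination arithmetic for the graph of Figure 1 (a worked example added for illustration; it follows directly from the differential equations above). Eliminating vertex 3 combines each in-edge with each out-edge, where absent edges count as zero, so $c_{41}$ and $c_{52}$ are fill-in:

$$c_{41} \mathrel{+}= c_{43}c_{31}, \quad c_{42} \mathrel{+}= c_{43}c_{32}, \quad c_{51} \mathrel{+}= c_{53}c_{31}, \quad c_{52} \mathrel{+}= c_{53}c_{32}.$$

Eliminating vertices 4 and 5 then accumulates the two Jacobian entries:

$$c_{61} = c_{64}c_{41} + c_{65}c_{51}, \qquad c_{62} = c_{64}c_{42} + c_{65}c_{52}.$$

As for the costs: vertex 3 has two predecessors $\{1,2\}$ and two successors $\{4,5\}$, so $\mathrm{mark}(v_3) = 2 \times 2 = 4$ multiplications, whereas $\mathrm{mark}(v_4) = \mathrm{mark}(v_5) = 2 \times 1 = 2$; a minimum-Markowitz heuristic would therefore eliminate vertex 4 or 5 before vertex 3.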
4 An AD Tool by Source Transformation with ANTLR and Java

Our AD tool is intended to provide efficient AD techniques for a restricted class of Fortran codes that is important in CFD applications: subroutines with typically 5 to 30 inputs $x_i$ and outputs $y_i$ and some hundreds of intermediate values $v_i$, with no loops but allowing branches. The aim is to compute the Jacobians associated with such subroutines efficiently. Appendix A gives the specification of the input language, which is a subset of Fortran. In previous work [3], we showed the superiority of source transformation over operator overloading and stated the possible gain from using vertex elimination methods, such as the Markowitz criterion, to achieve our aim. We have taken those results into account in the current AD tool. In developing our AD tool, we have considered the following requirements:
- the tool takes as input a Fortran code and outputs a new Fortran code;
- we need software that allows us to quickly develop a front-end (parser) and a back-end (pretty-printer);
- we need a good tree structure to support symbolic manipulation;
- we need to develop in a language that improves productivity in terms of code development;
- we would like to minimise licensing and legal obstacles that sometimes slow down the coding process.
Given the issues listed above, we have chosen to use ANTLR (Another Tool for Language Recognition), available from www.antlr.com, and to develop our AD tool in the Java programming language. ANTLR has been used for the development of real-life applications; see for instance [20].
4.1 Choice of ANTLR

ANTLR is free software which generates recursive descent parsers (namely LL(k); the first "L" is for left-to-right scanning of the input and the second "L" for producing a leftmost derivation [21]). ANTLR can create parsers with selective k-token lookahead; in other words, the generated parser can look ahead more than one token. ANTLR is written in Java and generates a parser program in Java or C++. An interesting feature of ANTLR is the use of predicates, which allow detailed control of the lexing and parsing. ANTLR has good error reporting compared to the "shift/reduce" or "reduce/reduce" errors produced by YACC on a given grammar rule, even if those errors can be localised with the help of the YACC debugger. Moreover, the parser generated by ANTLR is more readable, which helps in finding errors in the input grammar. Nevertheless, ANTLR has some disadvantages. The software is complex to explore despite good online documentation; fortunately, the included examples help enormously. Also, debugging an ANTLR grammar is not easy, even though there is a debugging tool.

ANTLR can generate abstract syntax trees and tree-walkers. To write an ANTLR grammar, you define a parser class, then a start rule. Each grammar rule corresponds to a method in the parser class. In the same file, you define the lexer class. A rule can be associated with a semantic action expressed as Java code. To launch your parser, you create a class that defines a main(), then start the parsing by defining the data input and invoking the start method.
4.2 Algorithms and Data Structures

Most problems can be formulated in terms of mathematical objects, and solutions to such problems can be outlined in terms of fundamental operations on those objects. Data structures can lead to component reuse, which is an important goal in object-oriented programming. We rely on Java to design object-oriented software. In this section, we describe how we use common data structures such as trees, graphs, stacks, and hashtables to build our AD tool.

4.2.1 The Abstract Syntax Tree

The output of a parser is an Abstract Syntax Tree (AST) that mirrors the input language. ANTLR builds such an abstract syntax tree from a grammar described in a file with the extension .g. In a code generation or program transformation framework, the abstract syntax tree is generally processed to construct appropriate intermediate representations such as control or data flow graphs. In the AD framework, the control flow graph is a suitable basis for certain optimisations. The abstract syntax tree we use is based on the binary trees implemented in the ANTLR software. ANTLR tree parsers use a child-sibling structure; this kind of tree is easy to traverse compared to binary trees with specific children reference fields, see [22]. The basic structure of the abstract syntax tree can be described as follows:

class BasicAST {
    String text;              // token string
    int ttype;                // token type
    protected BasicAST down;  // first child
    protected BasicAST right; // next sibling
}
Figure 2: An example of abstract syntax tree representation (the child-sibling tree for the expression a*b + c/d: the root + has children * and /, which in turn have children a, b and c, d)

For example, the expression a*b + c/d will be converted by an ANTLR parser to the binary tree shown in Figure 2. To traverse the abstract syntax tree, ANTLR provides two important methods:
- getFirstChild: returns a reference to the first child of the sibling list.
- getNextSibling: returns a reference to the next child in the list of siblings.
In order to be able to perform reverse searching, and to ease some operations on the abstract syntax tree, we derive a class MyNode from BasicAST as follows:

class MyNode extends BasicAST {
    /* references for double linking */
    protected MyNode up;    // inverse of down: the parent
    protected MyNode left;  // inverse of right: the previous sibling
    ...
}
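For illustration, a hypothetical helper (addChild is a name we introduce here, not a method of the class above) shows how the inverse links might be kept consistent when growing the tree:

/* Illustrative sketch: append child as the last child of parent while
   keeping the child-sibling links and their doubly-linked inverses. */
static void addChild(MyNode parent, MyNode child) {
    child.up = parent;                       // back-link towards the root
    if (parent.down == null) {               // parent has no children yet
        parent.down = child;
        child.left = null;
    } else {
        MyNode last = (MyNode) parent.down;
        while (last.right != null) {         // walk to the last sibling
            last = (MyNode) last.right;
        }
        last.right = child;                  // forward sibling link
        child.left = last;                   // inverse sibling link
    }
}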
The operations that we use on the abstract syntax tree are: matching specific subtrees; and retrieving, adding, or removing subtrees. Those operations are implemented in a class ASTStat (see Section 4.3, Implementation Issues, for more details).

4.2.2 The Symbol Table

The symbol table is built up while analysing the semantics of the input language. A symbol table maps program identifiers to their types, natures, or locations. For each identifier, we associate the following information:
- type: its declared type (integer, real, etc.) if the declaration is explicit; otherwise its type is implicit;
- species: the species of an identifier can be constant, variable, intrinsicCall, or funCall;
- rank: the number of dimensions if the identifier is a variable, or its number of arguments if it is a function;
- scope: where the identifier is visible; this information tells us whether a variable is local to a subroutine or global.
Our symbol table class is implemented as follows:

public class SymbolTable {
    /** supplies the scopes */
    private Stack stackOfScopes;
    /** table where all defined names are mapped to their related information */
    private Hashtable symTable;
    ...
}
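A minimal sketch of how such a table might be driven (SymbolTableSketch, enterScope, addSymbol, and lookup are hypothetical names introduced here for illustration; they are not the class above):

import java.util.Hashtable;
import java.util.Stack;

public class SymbolTableSketch {
    private final Stack scopes = new Stack();         // stack of scope names
    private final Hashtable symTable = new Hashtable();

    public void enterScope(String name) { scopes.push(name); }
    public void leaveScope()            { scopes.pop(); }

    /** Record an identifier under the current scope, if not already present. */
    public void addSymbol(String ident, Object info) {
        String key = scopes.peek() + ":" + ident;     // qualify by scope
        if (!symTable.containsKey(key)) symTable.put(key, info);
    }

    /** Look an identifier up in the current scope. */
    public Object lookup(String ident) {
        return symTable.get(scopes.peek() + ":" + ident);
    }
}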
We process the program declarations and associate each identifier with its information in the symbol table. When an occurrence of an identifier is used, it is checked against the symbol table; if the statement is valid and the identifier is not already present, the identifier and its related information are pushed into the symbol table.

4.2.3 The Control Flow Graph

The control flow graph plays a central role in code optimisation. A control flow graph describes the control flow of a program. It is a triplet (V, E, s) where (V, E) is a directed graph, s ∈ V is the initial vertex, and there exists a path from s to any other vertex of the graph. A vertex represents a basic block; an edge represents a transfer of control between two vertices of the graph. Figure 3 shows an example of a control flow graph associated with a computer program:

program ...   ! declarations
do i = 1, n
   if (test(i)) then
      ...
   else
      ...
   endif
   if (T) stop
enddo
end

Figure 3: An example of a program (left) and its control flow graph (right), with vertices Entry, A, B, C, D, E, Exit

In Figure 3, we call A, B, and E block-headers, whereas C and D are usually called basic blocks. The data structure we use for the control flow graph is the following:
class CFG { /** Vertices are stored as (key, value): key = number of vertex and value = its AST */ protected Hashtable vertices = new Hashtable(11); /** graph = list of adjacent vertices */ protected Hashtable adjacencies ; ... }
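For illustration, a hypothetical companion sketch (CFGSketch, addVertex, and addEdge are names we introduce here; they are not methods of the CFG class above) shows how the two hashtables might be populated:

import java.util.Hashtable;
import java.util.Vector;

class CFGSketch {
    protected Hashtable vertices    = new Hashtable(11); // vertex number -> its AST
    protected Hashtable adjacencies = new Hashtable(11); // vertex number -> successors

    void addVertex(Integer n, Object ast) {
        vertices.put(n, ast);
        adjacencies.put(n, new Vector());                // empty successor list
    }

    /** Record a control-flow edge from vertex m to vertex n. */
    void addEdge(Integer m, Integer n) {
        ((Vector) adjacencies.get(m)).add(n);
    }
}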
To construct the control flow graph, we have used an algorithm similar to the one described in [21]. The algorithm is based on the following rules:

1. The first statement of the program is a leader; any IF statement is a leader; any loop statement is a leader.
2. For each leader, its basic block consists of the leader followed by all statements up to (but not including) the next leader or the end of the block or program.

4.2.4 Traversing Trees and Graphs

Many algorithms which make use of trees (or graphs) traverse them in some order, and several ways of doing so exist [22, 23, 24]. We have used two common strategies to traverse the abstract syntax tree, which is a binary tree.
- A preorder traversal of the abstract syntax tree, recursively described as follows:
  1. visit the root v;
  2. visit in preorder the left subtree of v and then its right subtree.

- A postorder traversal of the abstract syntax tree, recursively described as follows:
  1. visit in postorder the left subtree of v and then its right subtree;
  2. visit the root v.
A prototype of a class that implements an algorithm for traversing the abstract syntax tree can be defined as follows: class TreeTraverse implements Enumeration { /* stores vertex objects */ protected Stack vertices; ... }
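As an illustration, the two orders can also be written directly over the child-sibling links (a sketch we add here; visit stands for whatever per-node action is required, and the accessors are those described in Section 4.2.1):

class TreeWalkSketch {
    void preorder(MyNode v) {
        if (v == null) return;
        visit(v);                               // root first
        preorder((MyNode) v.getFirstChild());   // then its subtree
        preorder((MyNode) v.getNextSibling());  // then the remaining siblings
    }

    void postorder(MyNode v) {
        if (v == null) return;
        postorder((MyNode) v.getFirstChild());  // subtree first
        visit(v);                               // then the root
        postorder((MyNode) v.getNextSibling()); // then the remaining siblings
    }

    void visit(MyNode v) { System.out.println(v.text); } // per-node action
}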
In our implementation, these two algorithms are implemented in the class named MyFortranTreeWalker. On the other hand, we have used two common algorithms to traverse the control flow graph, which is a cyclic directed graph. A difficulty in traversing graphs is the possible presence of cycles; this is solved by keeping track of which vertices have been visited. The algorithms we used to traverse the control flow graph are:
1. depth-first traversal, in which one visits a vertex v, then traverses in depth-first order its children before its siblings;
2. breadth-first traversal, in which one visits a vertex v, then its siblings and the children of other already-visited vertices, before visiting the children of v.

A prototype of a class that implements the two algorithms above can be defined as follows:

class GraphTraverse implements Enumeration {
    protected List visited;
    /* for depth-first traversal */
    protected Stack stackOfVertices;
    /* or use a queue (here a vector) for breadth-first traversal */
    protected Vector queueOfVertices;
    ...
}
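A sketch of the depth-first variant in this style, reusing the hypothetical CFGSketch from Section 4.2.3 above (getAdjacent-style access goes through its adjacencies table):

import java.util.ArrayList;
import java.util.List;
import java.util.Stack;
import java.util.Vector;

class DFSSketch {
    /** Depth-first traversal of a (possibly cyclic) graph from vertex s. */
    List depthFirst(CFGSketch g, Integer s) {
        List visited = new ArrayList();
        Stack stack = new Stack();
        stack.push(s);
        while (!stack.isEmpty()) {
            Integer v = (Integer) stack.pop();
            if (visited.contains(v)) continue;   // cycle guard
            visited.add(v);
            Vector succ = (Vector) g.adjacencies.get(v);
            for (int i = succ.size() - 1; i >= 0; i--) {
                stack.push(succ.get(i));         // push successors
            }
        }
        return visited;                          // vertices in visit order
    }
}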
In our implementation, the depth-first and breadth-first traversals are coded in the DFSEnumerator and BFSEnumerator classes respectively.

4.2.5 Code-list Construction

The code-list of a program consists of statements that each carry only one unary or binary operation. To build the code-list of a program, we split complex operations into basic operations (arithmetic, intrinsic, or function calls) by introducing temporary variables. This is a program transformation in which each assignment is rewritten into a sequence of simpler assignments that hold basic operations. Figure 4 shows an example of code-list construction:
t(2) = a/b(3)+sqrt(2.*a)   -->   v1 = 2.*a
                                 v2 = sqrt(v1)
                                 v3 = a/b(3)
                                 t(2) = v3+v2
Figure 4: An example of code-list construction

The class CodeList is derived from the class MyNode and is implemented as follows:

class CodeList extends MyNode {
    // AST return value for a newly created assignment
    protected MyNode returnAST = null;
    // access to the symbol table
    protected SymbolTable symbTable = new SymbolTable();
    // stack of subtree vertices used during the traversal
    static final Stack trees = new Stack();
    // factory that creates new vertices
    static final MyNodeFactory factory = new MyNodeFactory();
    ...
}
The algorithm we have developed and implemented uses a parse tree named returnAST of type MyNode, which carries the transformed abstract syntax tree. Subtrees are created using the variable factory and stored in a stack named trees. The created temporary variables are added to the symbol table symbTable. The code-list construction algorithm is based on the control flow graph, which is traversed breadth-first; for each basic block, a code-list is built using a postorder traversal of the basic block's abstract syntax tree. The implemented algorithm can be sketched as in Figure 5.

Input:  a control flow graph CFG of a program P
Output: a code-list returnAST of P

begin
   returnAST := null; Stack := empty
   traverse the CFG in breadth-first order
   foreach basic block bb do
      traverse bb in postorder
      if a unary/binary operation is found then
         build an assignment ass by popping AST subtrees from Stack
         insert ass into returnAST
      else
         push the traversed subtree onto Stack
   od
   foreach header block bh do
      insert its AST into returnAST
   od
end

Figure 5: Code-list construction algorithm

4.2.6 Activity Analysis

In order to eliminate unnecessary partial derivatives during the differentiation process, we need to determine the active variables of the program to be differentiated. In our current implementation we use the assumption that a variable v is active if there is at least one execution path (a chain of data dependencies) from an independent variable to v. We thereby compute an over-approximation of the set of active variables, since we ignore the requirement that there should also be an execution path from v to a dependent variable. For instance, if a and b are the independent variables and we have the following basic block:

x = cos(6.*a)+1.5*b
y = sqrt(10.)*(c+3.)
z = x*y+x+y

then a, b, x, z are found to be the active variables throughout that basic block.

To determine the active variables, we use an algorithm based on fixpoint calculation (fixed point iteration). Consider the set V of all program variables and P(V) the set of subsets of V. Then ⟨P(V), ⊆, ∅, V, ∩, ∪⟩ is a complete lattice (if e₁, e₂ ∈ P(V), then the two elements admit a least element e₁ ∩ e₂ and a greatest element e₁ ∪ e₂). If we consider the function φ : P(V) → P(V) defined such that φ(x) is the set of active variables computed from the already known set x of active variables through the program, then φ is a monotonically increasing mapping and therefore admits a least fixpoint. The proof relies on the facts that every subset x ∈ P(V) admits a lower bound in P(V), and that if a ⊆ x then φ(a) ⊆ φ(x). The algorithm we have implemented is based on the control flow graph of the program. It is coded in the class named ActiveVars:

class ActiveVars {
    protected List actVars = new ArrayList();
    ...
}

Figure 6 describes the algorithm implemented for the activity analysis.

Input:  a CFG of a program P and its independent variables
Output: the set x of P's active variables

begin
   fixpoint := empty; x := the independent variables of P
   while (x <> fixpoint) do
      fixpoint := x
      x := phi(x)
   od
end

funct phi(x)
   traverse the CFG of the program P in depth-first order
   foreach statement S of P do
      /* Use(S) = variables used by S   */
      /* Def(S) = variable defined by S */
      In := Use(S) ∩ x
      if In <> ∅ then x := x ∪ Def(S)
   od
end

Figure 6: Activity analysis algorithm
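A compact Java sketch of this fixpoint iteration (ActivitySketch, Stmt, usedVars, and definedVar are hypothetical names standing in for the Use/Def sets of Figure 6):

import java.util.ArrayList;
import java.util.List;

class ActivitySketch {
    /** Hypothetical statement interface: one defined variable, several used. */
    interface Stmt { List usedVars(); String definedVar(); }

    /** Fixpoint iteration: grow the active set until it stops changing. */
    List activeVariables(List statements, List independents) {
        List active = new ArrayList(independents);
        boolean changed = true;
        while (changed) {                         // iterate to the least fixpoint
            changed = false;
            for (int i = 0; i < statements.size(); i++) {
                Stmt s = (Stmt) statements.get(i);
                // if s uses an already-active variable, its target becomes active
                if (intersects(s.usedVars(), active)
                        && !active.contains(s.definedVar())) {
                    active.add(s.definedVar());
                    changed = true;
                }
            }
        }
        return active;
    }

    private boolean intersects(List a, List b) {
        for (int i = 0; i < a.size(); i++)
            if (b.contains(a.get(i))) return true;
        return false;
    }
}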
4.2.7 The Abstract Computational Graph Construction

The Abstract Computational Graph (ACG) is a representation of the program which takes into account all its execution paths. For the class of codes (without loops) stated earlier, the derived abstract computational graph is a DAG (directed acyclic graph) describing the chain of operations from the data inputs to the outputs. The abstract computational graph can be viewed as a flowchart, i.e. a control flow graph in which each basic block is expanded to a computational graph. If the abstract computational graph contains the local derivatives of the program variables, then it is said to be linearized. A vertex of the abstract computational graph represents a program variable whose derivative information is not identically zero. The data structure we have used to represent an abstract computational graph vertex is the following:
class MyVertex { protected MyNode name; protected MyNode operator; protected List outgoingEdges; protected List incomingEdges; ... }
Since a program constant is not mapped to a vertex of the linearized abstract computational graph, the input (resp. output) vertices are the only ones that have no predecessors (resp. successors). An edge of the abstract computational graph represents a control or data dependency between two vertices. For instance, consider the following code fragment:

if (x>3.) then
   y = sqrt(a)
endif

Here y depends on a by flow (data dependence), whereas y depends on x by control (see [25]). A partial derivative is associated with each edge of the abstract computational graph. An edge records its source and destination vertices. The data structure we have used to represent an edge of the abstract computational graph is the following:

public class MyEdge {
    /* partial derivative */
    protected MyNode deriv;
    /* edge type: control or data */
    protected String edgeType;
    /* source vertex number */
    protected int source;
    /* destination vertex number */
    protected int destination;
    ...
}
The class MyEdge shows the basic item stored in an adjacency list within the class MyVertex. The linearized abstract computational graph is stored as a dictionary in which each entry is of type MyVertex:

class ACG {
    /* describes the ACG graph (key = String name of the vertex variable,
       value = MyVertex object) */
    protected Hashtable vertexMap = new Hashtable();
    ...
}
In the abstract computational graph, a vertex is looked up by its field name, which represents a program variable stored as a String. The construction algorithm is based on the control flow graph of the code-list program. The control flow graph is traversed top-down; the local derivatives are computed using the information on active variables, and the corresponding edges and vertices of the abstract computational graph are created. This algorithm makes use of the basic block structure supplied by the control flow graph. The abstract computational graph construction algorithm can be sketched as in Figure 7.
Input:  the code-list's CFG and the list A of active variables
Output: the Abstract Computational Graph vertexMap of the program

begin
   vertexMap := empty
   traverse the CFG in breadth-first order
   foreach basic block bb of the CFG do
      traverse bb's AST in postorder
      if a traversed variable v ∈ A then
         if a MyVertex mv of v is already in vertexMap then
            update mv's information
         else
            build up a MyVertex mv of v
            add mv to vertexMap
   od
   foreach block header bh of the CFG do
      /* each variable defined within a basic block controlled by bh */
      /* should be connected by a control dependency                 */
      build up a MyVertex vh of bh
      check the control dependencies
      add vh to vertexMap
   od
end

Figure 7: Abstract computational graph construction algorithm
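For illustration, attaching a local partial derivative to a new data edge might look like this (a sketch; ACGSketch and connectEdge are hypothetical names, and a no-argument MyEdge constructor is assumed):

class ACGSketch {
    java.util.Hashtable vertexMap = new java.util.Hashtable();

    /* Record the data dependence src -> dst with its local derivative,
       e.g. for v3 = v1*v2 the edge (v1,v3) carries d(v3)/d(v1) = v2. */
    void connectEdge(MyVertex src, MyVertex dst, MyNode partial,
                     int srcNo, int dstNo) {
        MyEdge e = new MyEdge();       // assumes a no-arg constructor
        e.deriv = partial;             // AST of the local partial derivative
        e.edgeType = "data";
        e.source = srcNo;
        e.destination = dstNo;
        src.outgoingEdges.add(e);      // adjacency lists kept on both ends
        dst.incomingEdges.add(e);
    }
}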
4.3 Implementation Issues

In this section, we present the hierarchy of our AD tool classes and the manner in which we use those classes to run and test the implemented algorithms. We also report some experiments done so far. A class named Test allows us to experiment with our tool on some Fortran program examples. To ease operations for the user, we hope to develop a GUI (Graphical User Interface), intended to simplify the use of the AD tool for a novice user by providing selecting/clicking utilities. The class Test is implemented as follows:

public class Test {
    public static void main(String[] args) {
        try {
            DataInputStream dis = new DataInputStream(System.in);
            ...
        } catch (Exception e) {
            System.err.println("exception: " + e);
            e.printStackTrace();
        }
    }
}
4.3.1 The parser

The parser (front-end) transforms the input Fortran into an abstract syntax tree. The parser specification and the lexical analyzer are defined in the file minifort.g. The lexer MyFortranLexer and the parser MyFortranParser are derived from the ANTLR classes Lexer and Parser respectively. To parse a code, we invoke the start method translationUnit() on an object of type MyFortranParser. The following Java code shows how to launch the parser:

MyFortranLexer lexer = new MyFortranLexer(dis);
lexer.setTokenObjectClass("MyToken");
MyFortranParser parser = new MyFortranParser (lexer); parser.setASTNodeType(MyNode.class.getName()); MyNode.setTokenVocabulary("MyFortranTokenTypes"); // invoke the parser parser.translationUnit(); ...
We intend to extend the parser to include some Fortran 95 features which frequently occur in numerical codes. To date, we have considered only free-format Fortran 77 codes for our study.

4.3.2 The pretty-printer

The printer (back-end), which converts an abstract syntax tree back into Fortran text, is implemented via an ANTLR grammar. The printer specification is defined in the file MyFortranPrinter.g. The class MyFortranPrinter is derived from the ANTLR class TreeParser. To print an abstract syntax tree back into Fortran text, we invoke the start method translationUnit() on an object of type MyFortranPrinter. The following Java code fragment shows an example of use of the printer:

MyFortranPrinter p = new MyFortranPrinter();
p.setASTNodeType(MyNode.class.getName());
// invoke the printer
p.translationUnit(parser.getAST());
...
4.3.3 AD tool skeleton

The AD tool is a collection of classes that respect a certain hierarchy. Figure 8 shows the AD tool skeleton, from the ANTLR classes to the classes currently implemented for our purpose of AD by source transformation based on vertex elimination techniques. The code generation phase, not covered here, will be described in a future report; there is therefore no class responsible for derivative code generation yet. At the same time, we are investigating heuristics based on the Markowitz elimination criterion in order to improve elimination sequences for the derivative code generation. We have tested the currently implemented algorithms on some small test cases: we have successfully parsed and analysed the CFD Roe flux code [12] (200 lines of code), built its code-list, and printed out the transformed program.
5 Conclusions and Future Work

In this paper, we have investigated the implementation of an AD tool based on a graph elimination approach and on source transformation. We have designed the abstract computational graph as a way to achieve that goal. We have described and implemented some algorithms (code-list construction, activity analysis, and computational graph construction) for our ongoing project. Our next tasks will be:
- to investigate algorithms that generate derivative code in which the original code is interspersed with derivative calculations, so as to get a better execution order of operations and to maximise the machine's cache performance;
Figure 8: AD tool skeleton. The Fortran AD tool classes (MyFrParser, MyFrPrinter, MyForLexer, MyNode, ASTStat, SymbolTable, CFG, CodeList, ActiveVars, ACG, TreeWalker, Enumerator, and utilities) are built on the ANTLR classes Parser, Lexer, TreeParser, and CommonAST.
- to find algorithms which yield near-optimal elimination sequences for the derivative code generation.
With this AD approach, we expect to provide a near-optimal AD tool which may beat the performance of hand-coded derivatives for the class of codes arising from CFD flux calculations.
Acknowledgments We would like to thank the ANTLR developers, John K. Reid for his help in this project, and Monty Zukowski & John Mitchell whose GNU C translator enlightened us about ANTLR grammars.
A The input language

As stated earlier, the input language is a subset of Fortran. We describe its syntax using an extension of the standard BNF (Backus-Naur Form) context-free grammars. The notation <object>* means 0 or more instances of object; <object>+ means 1 or more instances; (<object>)? means object is optional. We have defined a reduced syntax based on what we perceive to be the key statements in our intended applications.
<translationUnit> ::= <routine>*
<routine>     ::= <header> <declaration>* <statement>* end
<header>      ::= subroutine <name> ( <reference>* )
<declaration> ::= <basic_type> <variable>*
                | dimension <array>*
                | external/intrinsic <name>*
                | parameter ( <assign>* )
<statement>   ::= <assign>
                | if (<expr>) then <statement>* ( else <statement>* )? endif
                | do <name> = <expr>, <expr> (, <stride>)? <statement>* enddo
                | dowhile (<expr>) <statement>* enddo
                | return
                | <statement>*
<reference>   ::= <name> | <name> ( <expr>+ )
<array>       ::= <name> ( <dim>+ )
<variable>    ::= <name> | <array>
<dim>         ::= <expr> | <expr> : <expr>
<expr>        ::= <constant> | <reference> | ( <expr> )
                | <unaryOp> <expr> | <expr> <binaryOp> <expr>
                | <funcCall> ( <expr>* )
...

Figure 9: The Fortran input language specification
B Some notes on the choice of Java

ANTLR generates Java which is, in our opinion, much simpler than C++. For a wide view of Java, see [26, 27, 28, 29]. Java does not have pointers, which makes the type system safer; Java objects are passed by reference. This produces code that is easy to debug. The programmer does not have to worry about memory management, since this task is done by a garbage collector. Both of these properties are useful in writing compilers [30]. Unlike C++, the Java programming language does not allow operator overloading or multiple inheritance. By design, all defined classes derive from the base class Object. Java supplies certain useful packages (java.io, java.lang, java.util), which are easy to use. Like the STL (Standard Template Library) in C++, which provides a full set of collections as well as sorting/searching algorithms on those collections, there is a JGL (Java Generic Library) freely available at www.ObjectSpace.com. However, Java does not allow parameterized types, whilst C++ does by using templates.

One may question Java performance, since Java applications are generally compiled into a portable bytecode that is interpreted at runtime and executed by the JVM (Java Virtual Machine). Nevertheless, depending on whether or not you need portable code and on the nature of the problem at hand, one can use a native Java compiler (see for instance http://cafe.symantec.com); the use of a native compiler may lead to performance comparable to C++ or C. But, given our needs as stated earlier in this report, Java performance is not a major obstacle for our purpose, which is to transform a Fortran code into a new Fortran code. Only the performance of the generated Fortran code matters in reaching our objective.
References

[1] G. Corliss and A. Griewank, editors. Automatic Differentiation: Theory, Implementation, and Applications. SIAM, 1991.

[2] A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2000.

[3] Mohamed Tadjouddine, Shaun A. Forth, and John D. Pryce. AD tools and prospects for optimal AD in CFD flux Jacobian calculation. In George Corliss, Christèle Faure, Andreas Griewank, Laurent Hascoët, and Uwe Naumann, editors, Automatic Differentiation: From Simulation to Optimization, pages 145–155. Springer, New York, 2001.

[4] A. Griewank and S. Reese. On the calculation of Jacobian matrices by the Markowitz rule for vertex elimination. In G. Corliss and A. Griewank, editors, Automatic Differentiation: Theory, Implementation, and Applications, pages 126–135. SIAM, 1991.

[5] Uwe Naumann. Efficient Calculation of Jacobian Matrices by Optimized Application of the Chain Rule to Computational Graphs. PhD thesis, Technical University of Dresden, February 1999.

[6] T. Parr, J. Lilly, P. Wells, R. Klaren, M. Illouz, J. Mitchell, S. Stanchfield, J. Coker, M. Zukowski, and C. Flack. ANTLR Reference Manual. Technical report, MageLang Institute's jGuru.com, January 2000. Available via http://www.antlr.org/doc/index.html.

[7] Christian H. Bischof, Peyvand M. Khademi, Ali Bouaricha, and Alan Carle. Efficient computation of gradients and Jacobians by dynamic exploitation of sparsity in automatic differentiation. Optimization Methods and Software, 7:1–39, 1996.

[8] Christian Bischof and Mohammad Haghighat. Hierarchical approaches to automatic differentiation. Technical Report CRPC-TR96647, Center for Research on Parallel Computation, Rice University, April 1996.

[9] M. Tadjouddine, F. Eysette, and C. Faure. Sparse Jacobian computation in automatic differentiation by static program analysis. In Proceedings of the Fifth International Static Analysis Symposium, number 1503 in Lecture Notes in Computer Science, pages 311–326, Pisa, Italy, 1998. Springer-Verlag.

[10] Thomas F. Coleman and Arun Verma. The efficient computation of sparse Jacobian matrices using automatic differentiation. SIAM Journal on Scientific Computing, 19(4):1210–1233, July 1998.

[11] D.B. Christianson, A.J. Davies, L.C.W. Dixon, R. Roy, and P. Van Der Zee. Giving reverse differentiation a helping hand. Optimization Methods and Software, 8:53–67, 1997.

[12] S.A. Forth. Automatic differentiation for flux linearisation. AMOR Report 98/1, Cranfield University (RMCS Shrivenham), Swindon SN6 8LA, England, 1998. Poster presentation at the 16th International Conference on Numerical Methods in Fluid Dynamics, July 6th–10th, 1998, Arcachon, France.

[13] Shaun A. Forth and Trevor P. Evans. Aerofoil optimisation via automatic differentiation of a multigrid cell-vertex Euler flow solver. In George Corliss, Christèle Faure, Andreas Griewank, Laurent Hascoët, and Uwe Naumann, editors, Automatic Differentiation: From Simulation to Optimization. Springer, New York, 2001.

[14] C. H. Bischof, L. Roh, and A. J. Mauer-Oats. ADIC: an extensible automatic differentiation tool for ANSI-C. Software—Practice and Experience, 27:1427–1456, 1997.

[15] J.D. Pryce and J.K. Reid. AD01, a Fortran 90 code for automatic differentiation. Technical Report RAL-TR-1998-057, Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, OX11 0QX, England, 1998. Available via ftp://matisa.cc.rl.ac.uk/pub/reports/prRAL98057.ps.gz.

[16] A. Griewank, D. Juedes, and J. Utke. ADOL-C, a package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software, 22:131–167, 1996.

[17] Christian H. Bischof, Alan Carle, Peyvand Khademi, Andrew Mauer, and Paul Hovland. ADIFOR 2.0 users' guide. Technical Report ANL/MCS-TM-192, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA, 1995.

[18] Christèle Faure and Yves Papegay. Odyssée user's guide, version 1.7. Technical Report 0224, INRIA, Unité de Recherche INRIA Sophia Antipolis, 2004 Route des Lucioles, B.P. 93, 06902 Sophia Antipolis Cedex, France, September 1998. Available via http://www.inria.fr/safir/SAM/Odyssee/odyssee.html.

[19] Ralf Giering. Tangent linear and adjoint model compiler. Technical report, Manual Version 1.2, TAMC Version 4.8, Center for Global Change Sciences, Department of Earth, Atmospheric, and Planetary Science, MIT, Cambridge, MA 02139, USA, December 1997.

[20] Gary L. Schaps. Compiler construction with ANTLR and Java. Dr. Dobb's Journal, 1999.

[21] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[22] Donald E. Knuth. The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley, 1997.

[23] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.

[24] Dieter Jungnickel. Graphs, Networks and Algorithms, volume 5 of Algorithms and Computation in Mathematics. Springer-Verlag, 1998.

[25] B. Chapman and H. Zima. Supercompilers for Parallel and Vector Computers. Addison-Wesley, 1991.

[26] S. Cohen, T. Mitchell, A. Gonzalez, L. Rodrigues, and K. Hammil. Professional Java Fundamentals. Wrox Press, 1996.

[27] Mark Allen Weiss. Data Structures & Problem Solving Using Java. Addison-Wesley, 1998.

[28] Liwu Li. Java: Data Structures and Programming. Springer-Verlag, 1998.

[29] Bruce Eckel. Thinking in Java. Prentice Hall PTR, 1998.

[30] Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, 1998.