An Environment for the Design and Performance Evaluation of Portable Parallel Software Report Number : EDPEPPS/22 Deliverable Number : 2.1.4
Final Syntax Definition of SimPVM T. Delaitre, M.J. Zemerly, J. Bourgeois, G. Justo, S. Winter Centre for Parallel Computing, University of Westminster London W1M 8JS Email:
[email protected] Web: http://www.cpc.wmin.ac.uk/edpepps March 28, 1997
1 Introduction
This report defines the final syntax of the SimPVM language used in the EDPEPPS project. SimPVM is used to describe "simulatable" parallel programs. The language is important for the integration of the toolset because it forms the interface between the graphical design tool (PVMGraph) [3] and the simulation utility SES/Workbench [7]. The graphical design tool produces a set of C/PVM files (.c), which are preprocessed by the Tape/PVM tracing facility [4] to obtain instrumented files (_t.c). The SimPVM Translator then translates the instrumented C source files into a queueing network representation in the form of an SES/Workbench graph file (.grf). The graph file is fed into SES/Workbench, which generates an executable model using SES/Workbench utilities, libraries, declarations and the PVM platform model.

Earlier versions of SimPVM (1.0 [10] and 1.1 [11]) provided only a limited set of elements of simulatable parallel programs. The EDPEPPS SimPVM prototype version 1.52 [2] offered substantial improvements over the earlier versions: it could handle regular C files (e.g. pointers, C functions, if-else, loops) and it eliminated features not found in C syntax, such as process and exec. However, SimPVM v1.52 still has some important limitations, which fall broadly under the following items:
- Include files are ignored and not written to the .grf file. Common include files (e.g. stdio.h, stdlib.h, math.h) are already defined in SES/Workbench.
- #define macros are correctly written to the resulting .grf file. However, in the SES preprocessing phase the macros are considered "global", i.e. common to every program of the design. To solve this problem the macros need to be preprocessed and unfolded during or before the translation phase.
- Structures (struct or typedef) are not recognized by the SimPVM translator.
- Preprocessing directives other than #define and #include are not supported (for example #ifdef, #ifndef and #endif).
- The use of "switch" and "case" is not supported, nor are some unary operations (e.g. ? &).

The previous versions of SimPVM were implemented using Lex and Yacc. Recently, a tool called SAGE++ [5], which can handle the regular C grammar, has emerged. This tool will be used in other components of the EDPEPPS environment and will also be used in SimPVM so that it can handle the full C syntax. However, some minor limitations will remain, and these are described in this report. Other planned features in the final version of SimPVM include the use of random number streams and probabilistic distribution functions. These features are available in SES and will be used in a C-like fashion. PVM calls which will be implemented in SES in the final version of the toolset (e.g. group functions) will also be added. A computational model capable of estimating the CPU cost of computational blocks will be added to the final version of the EDPEPPS toolset, and features to handle this model will be added to the final SimPVM version. Other keywords used in SimPVM, such as "cpudelay", "delay", "cputime" and "get_time", will also be discussed. The following section describes the planned features of the final version of SimPVM.
2 Features and Grammar of the SimPVM language
2.1 Features of Sage++
Sage++ is an attempt to provide an object oriented toolkit for building program transformation systems for Fortran 77, Fortran 90, C and C++ languages. Sage++ is intended to be used by researchers interested in building parallelizing compilers, performance analysis tools, and source code optimizers. It is designed as an open C++ class library that provides the user with a set of parsers, a structured parse tree, a symbol and type table and access to programmer annotations embedded in the source text. The heart of the system is a set of functions that allow the tool builder complete freedom in restructuring the parse tree and a mechanism (called unparsing) for generating new source code from the restructured internal form. The library is organized as a class hierarchy that provides access to the parse tree, symbol table and type table for each file in an application project. There are five basic families of classes in the library: Project and Files, Statements, Expressions, Symbols, and Types. In Sage++, it has been decided to add the control flow structures and data dependence analysis primitives on top of the user level class library. In this way, they can be easily modified or extended by the tool user. This aspect of Sage++, however, is not yet complete.
2.1.1 Overview of Sage++
In this section we provide an overview of the Sage++ library. There are five basic families of classes in the library: Projects and Files, which correspond to source files in a multi-source application project; Statements, which correspond to the basic source statements in Fortran90, C and C++; Expressions, which are contained within statements; Symbols, which are the basic user defined identifiers; and Types, which are associated with each identifier and expression. In addition, the SgAttribute class allows the users to add their own information to Sage++ objects. Attributes can be attached to SgStatement, SgExpression, SgSymbol, and SgType objects.

In Sage++, program parsing and program analysis and restructuring are divided into two phases. Application projects in Fortran77, Fortran90, C and C++ are first parsed, one file at a time, to produce a machine independent binary internal format called a .dep file. For example, given an application with source files Main.f, Subs.f, c++funs.C, cfuns.c one invokes the Fortran parser cfp or the C parser pc++ to generate the corresponding .dep files. Finally the user builds a project file, MyProject.proj, which lists each of the .dep files, one per line. In this example, the .proj file is:
    Main.dep
    Subs.dep
    c++funs.dep
    cfuns.dep
The source language type is encoded within the .dep file. It should be noted that the .dep file is a complete translation of the source (including comments), and the original source, up to the line numbers of statements, can be regenerated. Note that pc++ passes the files through a standard preprocessor before actually parsing them, and the comments are discarded by the preprocessor. However, pC++2dep does not include the preprocessing step, and thus comments are retained (but no preprocessing is done). The purpose of the project file is to make inter-procedural analysis possible.
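The project file layout described above (one .dep name per line, each derived from a source file name) can be sketched in C as follows. This is only an illustration: the helpers dep_name and write_proj are hypothetical names introduced here and are not part of Sage++, and the sketch assumes that the .dep name is obtained simply by replacing the text after the last '.' of the source file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "Main.dep" from "Main.f" by replacing the
 * text after the last '.' with "dep". Assumes the name has no directory
 * part containing a dot. Returns 0 on success, -1 if out is too small. */
int dep_name(const char *src, char *out, size_t outsz)
{
    assert(src != NULL && out != NULL);
    const char *dot = strrchr(src, '.');
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);
    if (stem + sizeof(".dep") > outsz)      /* sizeof counts the NUL */
        return -1;
    memcpy(out, src, stem);
    memcpy(out + stem, ".dep", sizeof(".dep"));
    return 0;
}

/* Hypothetical helper: write one .dep name per line to an open stream,
 * which is the layout the text gives for MyProject.proj. */
int write_proj(FILE *fp, const char *const sources[], size_t n)
{
    char dep[256];
    for (size_t i = 0; i < n; i++) {
        if (dep_name(sources[i], dep, sizeof dep) != 0)
            return -1;
        if (fprintf(fp, "%s\n", dep) < 0)
            return -1;
    }
    return 0;
}
```

Calling write_proj on a stream opened for MyProject.proj with the four source names of the example would write the four lines listed above.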
2.1.2 Limitations
Sage++ is a powerful tool, but it still has a number of important limitations. The most important is that it is not easy for users to add language extensions to Fortran or C to the system. In principle this is not difficult: to add a new statement to the language one must extend the parser, which is based on the GNU Bison version of YACC; add a new node type to the internal form and a corresponding subclass to the Sage++ hierarchy; and extend the table-driven unparser module to recognize the new node. In practice, however, this is not an easy task because it requires a complete understanding of the internal parser structures. Another limitation, which will be handled by the SimPVM translator, is that macros (#define) need to be unfolded in subroutines before Sage++ is called. Also, as in the prototype version, include files are handled properly in SES and will be ignored by SimPVM.
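To make the idea of unfolding a macro before translation concrete, the following C sketch replaces every whole-identifier occurrence of one object-like macro in a source string with its value. It is a simplified illustration under stated assumptions, not the method used by the SimPVM translator: the function name unfold_macro is hypothetical, function-like macros are not handled, and string literals and comments are not treated specially.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: copy src to out, replacing each whole-identifier
 * occurrence of the object-like macro `name` with `value`.
 * Returns 0 on success, -1 if out is too small. */
int unfold_macro(const char *src, const char *name, const char *value,
                 char *out, size_t outsz)
{
    assert(src && name && value && out && name[0] != '\0');
    size_t nlen = strlen(name), vlen = strlen(value), o = 0;

    for (size_t i = 0; src[i] != '\0'; ) {
        /* A match must not be the tail of a longer identifier. */
        int at_start = (i == 0) || !is_ident_char(src[i - 1]);
        if (at_start && strncmp(src + i, name, nlen) == 0 &&
            !is_ident_char(src[i + nlen])) {
            if (o + vlen >= outsz)
                return -1;
            memcpy(out + o, value, vlen);   /* substitute the value */
            o += vlen;
            i += nlen;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = src[i++];            /* copy unchanged */
        }
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Applied separately to each program of a design, a substitution of this kind would let two programs use the same macro name with different values, since no #define would remain to be treated as global.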
2.2 Computation Characteriser
2.2.1 The cputime function
A computational block characteriser (or time analyser), which traverses a C/PVM program and is based on Sage++, is being implemented. The characteriser takes the Tape/PVM instrumented files (_t.c) of the C/PVM code generated by PVMGraph and inserts the SimPVM call cputime to characterise the number of machine instructions within each sequential C code fragment. The characteriser is called only in the simulation path, to estimate the time taken by computational blocks within a parallel algorithm. The costs associated with the various instructions are kept in a file in the hardware layer accessible to the SES utilities. These costs will be obtained by benchmarking different instructions on different machines. Assumptions have been made to reduce the number of possible machine instructions to 43; these assumptions will be described in detail in the final simulation model. The output file of the characteriser has the same extension as its input (i.e. _t.c). Figure 1 shows the location of the characteriser (time analyser) call within the EDPEPPS environment.

[Figure 1 diagram: C source files (.c) -> simtapepp -> _t.c -> time analyser (adds cputime function) -> _t.c -> simpvm translator -> .grf]
Figure 1: Location of the time analyser within the EDPEPPS environment

The cputime SimPVM instruction is a simple function call with a fixed number of parameters (31 in total). This differs from the number of machine instructions because the instruction cache duplicates some of the instructions (hit or miss), and only the last parameter of the cputime call is used to select the hit or miss instructions, as we will see later. Each parameter of the cputime function gives the number of times the corresponding instruction is executed within the sequential C code fragment. The only exception is the last parameter, IC. An IC value greater than 0 means that the sequential code has a size of IC assignments; the instruction cache is therefore applied to an entire block of sequential code and not to each instruction. An IC value of 0 indicates that the instruction cache has overflowed. The syntax of the cputime call is given below:
    cputime(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,func,pfunc,cif,cfi,IC)
Two sets of measurements are required for all arithmetic and store operations, depending on whether the instruction cache is set to miss (IC = 0) or hit (IC > 0, subject to the size of the cache: if IC * sizeof(word) > cache size it is a miss, otherwise it is a hit). The parameters are:
a = store/load integer with data cache hit (e.g. "a =", int a;)
b = store/load integer with data cache miss (e.g. "a =", int a;)
c = store/load float with data cache hit (e.g. "a =", float a;)
d = store/load float with data cache miss (e.g. "a =", float a;)
e = access to array of dimension 1 with data cache miss (e.g. "a[i]", int a;)
f = access to array of dimension 1 with data cache hit (e.g. "a[i]", int a;)
g = overhead for array dimension (= array dimension - 1) (e.g. "a[i][j]" gives 1)
h = add integer with data cache miss (e.g. "+/- a", int a)
i = add integer with data cache hit (e.g. "+/- a", int a)
j = multiply integer with data cache miss (e.g. "* a", int a)
k = multiply integer with data cache hit (e.g. "* a", int a)
l = divide integer with data cache miss (e.g. "/ a", int a)
m = divide integer with data cache hit (e.g. "/ a", int a)
n = add float with data cache miss (e.g. "+/- a", float a)
o = add float with data cache hit (e.g. "+/- a", float a)
p = multiply float with data cache miss (e.g. "* a", float a)
q = multiply float with data cache hit (e.g. "* a", float a)
r = divide float with data cache miss (e.g. "/ a", float a)
s = divide float with data cache hit (e.g. "/ a", float a)
t = logarithm (e.g. log(x))
u = exponential (e.g. exp(x))
v = square root (e.g. sqrt(x))
w = trigonometric functions (e.g. sin(x), cos(x), tan^-1(x), cosh(x))
x = power (e.g. power(a, b), i.e. a^b)
y = absolute value (e.g. abs(x))
z = loop overhead (e.g. for(i = 0; i < n; i++))
func = function calls (e.g. dummy())
pfunc = number of parameters in a function call (e.g. "dummy(a)" gives 1)
cfi = cast from float to integer (e.g. "(int)" in a=(int) b, int a and float b;)
cif = cast from integer to float (e.g. "(float)" in b=(float) a, int a and float b;)
IC = determines whether the instruction cache is used. If IC = 0, the instruction cache misses. If IC > 0, the total number of bytes required to store the instructions is computed and compared with the size of the instruction cache: if it is smaller the result is a cache hit, if it is greater a cache miss.

For example, the following cputime call:
    cputime(2,1,3,1,0,2,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0);
is interpreted as follows:
2 store integer with data cache hit
1 store integer with data cache miss
3 store float with data cache hit
1 store float with data cache miss
2 access to array of dimension 1 with data cache hit
1 add integer with data cache miss
1 multiply float with data cache miss
1 exponential
1 absolute value
The last 0 means that all these instructions are executed with an instruction cache miss.
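As a rough illustration of how such a call could be turned into a time estimate, the C sketch below sums count times cost over the first 30 parameters and uses IC to choose between two cost tables. It is a sketch under assumptions, not the EDPEPPS model: the names icache_hit and cputime_estimate are hypothetical, the cost tables stand in for the benchmarked values kept in the hardware layer, every parameter is treated as a simple multiplier, and the size test is read as IC multiplied by the word size compared with the cache size.

```c
#include <assert.h>
#include <stddef.h>

#define CPUTIME_NPARAMS 31   /* a..z, func, pfunc, cif, cfi, IC */
#define CPUTIME_NCOUNTS 30   /* every parameter except the trailing IC */

/* Returns 1 if a block of `ic` assignments fits in the instruction cache.
 * IC == 0 always means a miss; otherwise the block size in bytes is
 * compared with the cache size. word_bytes and cache_bytes are assumed
 * inputs; the text gives no concrete values. */
int icache_hit(unsigned ic, size_t word_bytes, size_t cache_bytes)
{
    if (ic == 0)
        return 0;
    return (size_t)ic * word_bytes <= cache_bytes;
}

/* Estimated time for one cputime call: sum of count * cost over the 30
 * count parameters, using the cost table selected by IC (args[30]).
 * Both cost tables are hypothetical stand-ins for benchmarked values. */
double cputime_estimate(const unsigned args[CPUTIME_NPARAMS],
                        const double cost_ic_hit[CPUTIME_NCOUNTS],
                        const double cost_ic_miss[CPUTIME_NCOUNTS],
                        size_t word_bytes, size_t cache_bytes)
{
    assert(args && cost_ic_hit && cost_ic_miss);
    const double *cost =
        icache_hit(args[CPUTIME_NPARAMS - 1], word_bytes, cache_bytes)
            ? cost_ic_hit : cost_ic_miss;
    double total = 0.0;
    for (size_t k = 0; k < CPUTIME_NCOUNTS; k++)
        total += args[k] * cost[k];
    return total;
}
```

With the example call above, the nonzero counts add up to 13 operations, so the estimate is simply 13 times the unit cost when every entry of the selected table is the same.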
2.2.2 Location of the cputime functions
The cputime functions are inserted at the following locations:
1. Just before a PVM call (except the pvm_mytid() function).
2. Just before the end of the body of a function, a loop or a conditional expression.
3. Just before the beginning of a loop or a conditional expression, except in the case of nested loops.
The following example shows how the characteriser generates the cputime function parameters.
Example: Let taskcpu_t.c be the following program:
    void nothing(int x,int y)
    {
      x=y+z;
      y=z/x;
    }
    main()
    {
      int i,a[10000],b,c,f;
      int info,mytid;
      float d, e;
      /* Task body */
      mytid = pvm_mytid(); /* my task id */
      for (i=0;i