Calculating lenient programs' performance

Paul Roe

Department of Computing Science, The University, 17 Lilybank Gardens, Glasgow G12 8QQ. Email: [email protected]

Abstract

Lenient languages, such as Id Nouveau, have been proposed for programming parallel computers. These languages represent a compromise between strict and lazy languages. The operation of parallel languages is very complex; therefore a formal method for reasoning about their performance is desirable. This paper presents a non-standard denotational semantics for calculating the performance of lenient programs. The semantics is novel in its use of time and time-stamps.

1 Motivation

Recently interest has grown in parallel programming using functional languages, particularly lenient languages such as Id Nouveau [8]. In order to derive and debug the performance of parallel programs, it is necessary to reason about their performance (execution time). Reasoning about the performance of sequential functional programs has always been an informal affair. However, given the inherent operational complexity of parallel functional languages, a more formal approach to performance analysis is desirable. (A possible exception is languages for SIMD machines.) To tackle this problem a non-standard semantics has been developed for calculating the performance of lenient functional programs.

2 Background

How should parallel performance be measured? To measure sequential performance it is sufficient to count the total number of operations (perhaps only certain ones) that are performed during an evaluation. The performance of a parallel program depends on many aspects of the machine on which it is run, including the machine's number of processors and its scheduling policy. Including these aspects in reasoning about performance is too complicated. Eager [2] states that the average parallelism of a program is a useful performance measure. This may be used to bound the performance of a program running on a P processor machine. Eager proves that


the average parallelism is equivalent to the ratio of the single-processor execution time to the execution time with an unbounded number of processors. Thus a good guide to a program's parallel performance may be calculated from its sequential evaluation time and its evaluation time with an unbounded number of processors. Some shortcomings of this method are discussed in section 6.

Lenient languages represent a compromise between strict and lazy languages. Strict languages are simple to reason about because their operational behaviour is compositional. For example, the sequential cost (performance) of the (full) application f E1 E2 is the sum of the costs of E1, E2 and the application. The cost of the application if E1 and E2 are evaluated in parallel (with an unbounded number of processors) is the maximum of the costs of E1 and E2, plus the cost of the application. This kind of technique is called step counting, and it forms the basis of the work in [6, 7], which concerns the automatic complexity analysis of a strict FP language (a sketch of these two cost rules appears at the end of this section). A problem with these languages, and the step counting technique, is that they do not support pipelined parallelism.

Lazy languages are difficult to analyse because their operational behaviour is not compositional. Approaches to the performance analysis of sequential lazy languages, such as [1, 10, 11], are essentially based on strictness analysis. They use strictness analysis to determine the degree to which expressions will be evaluated. This can then be used to determine the costs of evaluating expressions. Unfortunately this approach is not sufficient to analyse parallel lazy languages. This is because it is not sufficient to know to what degree expressions are evaluated; in addition it is necessary to know when expressions are evaluated. Hudak and Anderson [5] have devised an operational semantics for parallel lazy languages, based on partially ordered multisets. This could be used as the basis for a performance semantics. However the approach is extremely complicated and unwieldy, and there are some technical problems with it.
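Returning to step counting, the two cost rules for a full application can be written down directly. The following Haskell sketch is illustrative only (it is not from the paper, and appCost is an assumed unit application cost):

    -- Sequential and parallel cost of the full application f E1 E2,
    -- given the costs c1 and c2 of evaluating E1 and E2.
    seqCost, parCost :: Int -> Int -> Int
    seqCost c1 c2 = c1 + c2 + appCost      -- E1, then E2, then the application
    parCost c1 c2 = max c1 c2 + appCost    -- E1 and E2 side by side

    appCost :: Int
    appCost = 1                            -- assume the application costs one step

Note that both rules combine the costs of the subexpressions compositionally; this is exactly what fails for pipelined parallelism, where a consumer may start before its producer has finished.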

3 A lenient language

This section informally describes the lenient language which will be analysed later. Lenience means that the language is strict in expressions which are evaluated sequentially and lazy in expressions which are evaluated in parallel. Thus the language supports pipelined parallelism, unlike parallel strict languages. However, this language is not as expressive as a lazy one. This lenient language differs from Id Nouveau in that it expresses parallel and sequential evaluation explicitly.

3.1 Rationale

The syntax of the lenient language is shown in Figure 1. The language constructs should be familiar except for plet and {E} (parentheses (E) denote expression grouping). The plet construct is a parallel let; its binding is evaluated in parallel with its main expression.

    c ∈ Con    v, h, t ∈ Var

    E ::= c | v | E E | \v.E
        | let v = E in E | plet v = E in E | letrec v = E in E
        | E:E | case E of [] -> E  (h:t) -> E
        | (E) | {E}

                     Figure 1: Syntax

A more general approach is to evaluate all function applications and other constructs in parallel. However, most parallel programs have only a few places where parallel evaluation is necessary. Therefore, for simplicity all parallelism will be made explicit via plet; there will be no implicit parallelism. The plet construct is the only source of parallelism in the language; however other parallel constructs, such as those proposed in [3], could easily be added and analysed.

Rather than fixing the costs of operations into the semantics, an explicit annotation is used to label expressions whose evaluation should be `counted'. The annotation {E} means that the evaluation of {E} should take one step in addition to the cost of evaluating E. For example, to analyse the cost of a parallel sorting program, only comparisons might be measured; thus applications of the comparison operator would be annotated with curly braces, as in the sketch below.
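As a small illustration in the language itself, here is a map function whose applications of its argument are counted (an assumed fragment, not from the paper; f and l are presumed bound elsewhere):

    letrec map = \f . \l . case l of
                             [] -> []
                             (h:t) -> {f h} : (map f t)
    in map f l

Each {f h} costs one step in addition to the cost of f h itself, so if nothing else is annotated, the calculated times count exactly the applications of f.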

3.2 An informal explanation of the language's evaluation

The evaluation of the lenient language is unusual; therefore, before explaining the semantics, some example expressions are shown and their evaluation is discussed. Note that, providing E1 is completely defined, the two expressions shown below have the same meaning but not necessarily the same performance.

    let v = E1 in E

    plet v = E1 in E

Consider the three expressions shown below:

    let f = \x.[] in let y = ⊥ in f y

    let f = \x.[] in plet y = ⊥ in f y

    let f = \x.[] in plet y = ⊥ in f (case y of [] -> 0 (h:t) -> 1)

The first expression evaluates to bottom, because let is strict and hence let y = ⊥ in f y does not terminate. The second expression evaluates to []. The plet evaluates y and f y in parallel. Applications, like f y, evaluate their arguments. However, as in strict languages, variables need not be evaluated, because:

    In the lenient language all variables' evaluation is started at binding time, but their evaluation is not necessarily completed then.

Thus all variables are either evaluated or being evaluated. The third expression evaluates to bottom. This is because the application of f causes its argument to be evaluated. The evaluation of case needs the value of y. Therefore the case expression fails to terminate, as does the f application.
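For comparison with a lazy language, the first and third examples can be transcribed into Haskell (an illustrative sketch with hypothetical names; Haskell's let is lazy, so its results differ from the lenient ones):

    ex1, ex3 :: [Int]
    -- First example: Haskell's lazy let gives [], but the lenient let
    -- is strict in y, so the lenient result is bottom.
    ex1 = let f = \_ -> [] in
          let y = undefined :: [Int] in
          f y
    -- Third example: Haskell's non-strict application never forces the
    -- case, giving []; lenient application evaluates its argument,
    -- so the lenient result is bottom.
    ex3 = let f = \_ -> [] in
          let y = undefined :: [Int] in
          f (case y of { [] -> 0; (_:_) -> 1 })

Only the second example agrees with Haskell: it evaluates to [] in both.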

4 A performance semantics

4.1 Time

Step counting does not work for lenient languages. Consider the evaluation of

    plet v = E1 in E

The evaluation of E1 and E should proceed in parallel, with no unnecessary synchronisation. Synchronisation only occurs if E tries to access the value of v. When this happens two possibilities arise: either v's evaluation will have completed or it will still be being evaluated. If v's evaluation has completed it should be accessed exactly as if it had been evaluated sequentially by a let. If the evaluation of v has not completed, E should wait for its evaluation to complete. (In an implementation this arises as one task blocking on another.) To reason about the length of one task (v) and the time at which another task (E) requires its value, a notion of time is required. Thus rather than counting the number of steps taken to evaluate an expression, all expressions will be evaluated at some time and the time at which evaluation completes will be calculated. To do this expressions must be evaluated relative to some time, just as expressions are evaluated within an environment.

Two pieces of temporal information are required about evaluations. Firstly, the time spent evaluating an expression is required. Secondly, the time at which values become available is needed. These times may be different, since the result of one task may be several values which do not simultaneously become evaluated. For example, a list whose evaluation starts at time t may become fully evaluated at time t′. However, individual elements of the list may become evaluated at times before or after t′. In a sequential system this is of no consequence; in a parallel system this can affect performance. Specifically, pipelining relies on this; for example, one task may consume elements of a list while another task produces elements of the list (an example appears at the end of this subsection). Times have also been used in real-time functional languages, for example Ruth [4]. However, in these languages times are used for a different purpose; they are used to respond to real-time events and to avoid non-determinism.
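As an illustration of such pipelining in the lenient language, the fragment below is one possible producer/consumer pair (a sketch, not from the paper; add is assumed to be a primitive function, map is the counted map of section 3.1, and f and input are presumed bound elsewhere):

    letrec sum = \l . case l of
                        [] -> 0
                        (h:t) -> add h (sum t)
    in plet l = map f input
       in sum l

Because case synchronises only on the outermost constructor of a list, sum can consume each element of l as soon as map produces it.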

4.2 The semantics

Rather than augment a standard semantics with temporal information, a combined semantics has been defined. In this semantics, the standard semantics and temporal information are mutually dependent. The valuation function M has the form:

    M : E → Env → Time → Ans

Expressions are evaluated within an environment and at a specific time to produce answers. The semantic domains are:

    Ans          = D × Time
    α, β ∈ D     = Basic + Fun + List
    Basic        = B × Time
    Fun          = (Ans → Ans) × Time
    List         = (nil + (D × List)) × Time
    t ∈ Time     = Nat⊥
    ρ ∈ Env      = Var → D
    B            = constants and primitive functions

All values (D) are time-stamped with the time at which they become available. Each evaluation returns a pair (Ans), comprising a value (D) and a Time representing the time at which the evaluation finished. In general, the time required to evaluate an expression is not the same as the time at which the expression's value becomes available. For example, variables already spawned by plet require no evaluation, but they may not yet be available. Also, the elements of a list may become available before the entire list is evaluated; this is necessary for pipelined parallelism.
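These domains can be modelled concretely; the following Haskell sketch is one possible encoding (illustrative, not from the paper; bottom is modelled by non-termination, and basic values are taken to be integers):

    type Time = Int                  -- Nat; bottom is modelled by divergence
    type Ans  = (D, Time)            -- value paired with evaluation-finish time
    type Env  = String -> D          -- Var -> D

    data D = Basic Int Time          -- B x Time
           | Fun (Ans -> Ans) Time   -- (Ans -> Ans) x Time
           | Nil Time                -- (nil + (D x List)) x Time, split into
           | Cons D D Time           -- the two list constructors

Every constructor of D carries the Time at which that value becomes available, separate from the finish time in Ans.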

The meaning of a variable v, in an environment ρ and at time t, is:

    M⟦v⟧ ρ t = ⟨ρ[v], t⟩

The time-stamped value is looked up in the environment. The variable is either evaluated or being evaluated, thus no time is required to evaluate it. Therefore the input time t is returned as the new time after v's evaluation.

The meaning of let is:

    M⟦let v = E1 in E2⟧ ρ t = M⟦E2⟧ ρ[v ↦ α] t′
        where ⟨α, t′⟩ = M⟦E1⟧ ρ t

The let construct evaluates its binding (E1) and then it evaluates its main expression (E2). Thus the binding is evaluated at the current time t and the main expression is evaluated at the time when the evaluation of the binding finishes. The valuation function is strict in its time argument: M⟦E⟧ ρ ⊥ = ⟨⊥, ⊥⟩. Therefore if the let binding evaluates to bottom, t′ will be bottom and hence the whole construct will evaluate to bottom. In this way times are used to ensure the strictness of sequential evaluation. This may be contrasted with plet:

    M⟦plet v = E1 in E2⟧ ρ t = M⟦E2⟧ ρ[v ↦ α] t
        where ⟨α, _⟩ = M⟦E1⟧ ρ t

The difference between plet and let is that for plet, the main expression's evaluation (E2) begins at the same time as the binding's evaluation (E1). Unlike the sequential let, the binding may evaluate to bottom and the main expression may still be defined. Synchronisation occurs when E2 requires v's value.
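The let and plet rules transcribe almost directly into an evaluator over the Haskell model above (an illustrative sketch; Expr is a hypothetical abstract syntax, and the clauses shown here and in the later sketches belong to one definition of eval):

    data Expr = Var String | Let String Expr Expr | PLet String Expr Expr
              | ConsE Expr Expr | Tick Expr
              | CaseE Expr Expr String String Expr
              -- application, lambda, letrec, [] and constants omitted

    eval :: Expr -> Env -> Time -> Ans
    eval (Var v)        env t = (env v, t)     -- no evaluation time needed
    eval (Let v e1 e2)  env t =
      let (a, t') = eval e1 env t              -- binding first; the body
      in  eval e2 (extend v a env) t'          -- starts when it finishes
    eval (PLet v e1 e2) env t =
      let (a, _) = eval e1 env t               -- finish time discarded: the
      in  eval e2 (extend v a env) t           -- body also starts at t

    extend :: String -> D -> Env -> Env
    extend v a env w = if w == v then a else env w

Because Haskell is lazy, it is precisely the threading of the finish time t' that makes the Let clause sequential, mirroring the paper's use of times to enforce strictness.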

The meaning of cons is:

    M⟦E1:E2⟧ ρ t = ⟨⟨cons α β, t⟩, t2⟩
        where ⟨α, t1⟩ = M⟦E1⟧ ρ t
              ⟨β, t2⟩ = M⟦E2⟧ ρ t1

Operationally cons produces a cons cell, then the head of the cons is evaluated and then the tail of the cons is evaluated. Many different patterns of evaluation for cons are possible; for example E1 and E2 could be evaluated in parallel. This cons, although sequential, can give rise to pipelining. Notice that the cons value is time-stamped with the current time. The head and tail will often have different time-stamps from this cons time-stamp.
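In the evaluator sketch this rule reads (illustrative, using the ConsE constructor assumed above):

    eval (ConsE e1 e2) env t =
      let (a, t1) = eval e1 env t     -- head first ...
          (b, t2) = eval e2 env t1    -- ... then the tail
      in  (Cons a b t, t2)            -- cell available at t; head and tail
                                      -- carry their own time-stamps

The cell itself is available immediately, at t, so a consumer can inspect the outermost constructor before the head or tail have finished evaluating; this is what makes pipelining possible.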

The semantics for {E} simply increments the time at which E is evaluated:

    M⟦{E}⟧ ρ t = M⟦E⟧ ρ (t + 1)

The crucial semantics is for case:

    M⟦case E of [] -> E1 (x:xs) -> E2⟧ ρ t =
        case M⟦E⟧ ρ t of
          ⟨⟨nil, t1⟩, t2⟩      : M⟦E1⟧ ρ (max t1 t2)
          ⟨⟨cons α β, t1⟩, t2⟩ : M⟦E2⟧ ρ′ (max t1 t2)
                                   where ρ′ = ρ[x ↦ α, xs ↦ β]

The case construct evaluates E at time t. Since case requires the value of E, it must, if necessary, wait for this value to become available (synchronise). It does not wait for the whole list to become evaluated, but only the top cons or nil. The value of E becomes available at time t1; the evaluation of E takes until t2. Therefore evaluation of E1 or E2 starts at the later of the two times, max t1 t2. All primitive operators, such as addition, must synchronise in this way. That is, all primitive operators must wait for their strict operands to become available. The complete semantics, except for primitive operators and constants, is shown in Figure 2.
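Continuing the evaluator sketch, the {E} and case rules become (illustrative; Tick and CaseE are the hypothetical constructors for {E} and case assumed earlier):

    eval (Tick e) env t = eval e env (t + 1)   -- {E}: one extra counted step

    eval (CaseE e eNil x xs eCons) env t =
      case eval e env t of
        (Nil t1, t2)      -> eval eNil env (max t1 t2)   -- synchronise on the
        (Cons a b t1, t2) -> eval eCons                  -- outermost constructor
                               (extend x a (extend xs b env))
                               (max t1 t2)
        _                 -> error "case: not a list"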

5 A simple proof

Using the semantics, proofs can be constructed about the performance of parallel programs. As with conventional complexity analysis one does not calculate the performance of arbitrary programs; rather, the performance of core algorithms and library functions is calculated. Below, two program fragments are proved to have the same performance. A kind of idempotence is proved. This allows some redundant plets to be removed from programs, which will improve programs' efficiency. Any expression having the form of the left hand side may be replaced by the more efficient form shown on the right (Emain[a/b] denotes Emain with a substituted for b):

    plet a = E in          plet a = E in
    plet b = a in     =    Emain[a/b]
    Emain

The left hand side is equal to, at time t and in an environment ρ:

    M⟦Emain⟧ ρ′[b ↦ fst (M⟦a⟧ ρ′ t)] t
        where ρ′ = ρ[a ↦ fst (M⟦E⟧ ρ t)]

  = { variable semantics }

    M⟦Emain⟧ ρ′[b ↦ ρ′[a]] t

  = { by substitution }

    M⟦Emain[a/b]⟧ ρ′ t
        where ρ′ = ρ[a ↦ fst (M⟦E⟧ ρ t)]

  = { meaning of the right hand side }  □

This proof may seem intuitively obvious; however beware: for example, plet x = E in x and E have the same meaning, but they do not have the same performance.
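To see this, one step of calculation with the plet and variable rules above suffices:

    M⟦plet x = E in x⟧ ρ t = M⟦x⟧ ρ[x ↦ α] t = ⟨α, t⟩
        where ⟨α, t′⟩ = M⟦E⟧ ρ t

    M⟦E⟧ ρ t = ⟨α, t′⟩

Both yield the same value α, but the plet form finishes at time t while E itself finishes at t′: the meanings agree yet the performances differ.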

    M⟦E⟧ ρ ⊥ = ⟨⊥, ⊥⟩

If t ≠ ⊥:

    M⟦v⟧ ρ t                   = ⟨ρ[v], t⟩

    M⟦E1 E2⟧ ρ t               = f (M⟦E2⟧ ρ t1)
                                   where ⟨⟨f, _⟩, t1⟩ = M⟦E1⟧ ρ t

    M⟦\v.E⟧ ρ t                = ⟨⟨λ⟨α, t′⟩. M⟦E⟧ ρ[v ↦ α] t′, t⟩, t⟩

    M⟦let v = E1 in E2⟧ ρ t    = M⟦E2⟧ ρ[v ↦ α] t′
                                   where ⟨α, t′⟩ = M⟦E1⟧ ρ t

    M⟦letrec v = E1 in E2⟧ ρ t = M⟦E2⟧ ρ[v ↦ α] t′
                                   where ⟨α, t′⟩ = fix (λ⟨β, _⟩. M⟦E1⟧ ρ[v ↦ β] t)

    M⟦plet v = E1 in E2⟧ ρ t   = M⟦E2⟧ ρ[v ↦ α] t
                                   where ⟨α, _⟩ = M⟦E1⟧ ρ t

    M⟦[]⟧ ρ t                  = ⟨⟨nil, t⟩, t⟩

    M⟦E1:E2⟧ ρ t               = ⟨⟨cons α β, t⟩, t2⟩
                                   where ⟨α, t1⟩ = M⟦E1⟧ ρ t
                                         ⟨β, t2⟩ = M⟦E2⟧ ρ t1

    M⟦case E of [] -> E1 (x:xs) -> E2⟧ ρ t
                               = case M⟦E⟧ ρ t of
                                   ⟨⟨nil, t1⟩, t2⟩      : M⟦E1⟧ ρ (max t1 t2)
                                   ⟨⟨cons α β, t1⟩, t2⟩ : M⟦E2⟧ ρ′ (max t1 t2)
                                     where ρ′ = ρ[x ↦ α, xs ↦ β]

    M⟦{E}⟧ ρ t                 = M⟦E⟧ ρ (t + 1)

                     Figure 2: The semantics

6 Discussion

The main problem with the semantics is its complexity. In [9] a pipelined version of Quicksort is analysed; this occupies five pages! In many ways this is not surprising; the operational behaviour of parallel programs is very complex. The semantics may be regarded as a specification for a parallel interpreter or simulator. By treating the semantics as a set of transformation rules, parallel program simulation may be performed by program transformation. The semantics may also be augmented to collect other information, for example task length statistics and parallelism profiles. This information is hard to reason about using the semantics, but it may be useful for simulation purposes. The semantics can on some occasions be too abstract. For example, the semantics does not specify that tasks must terminate. Thus speculative tasks may be generated, which are difficult to implement. A more fundamental problem is that average parallelism is not always an accurate measure of performance. The optimal algorithms for some problems, like sorting, depend not only on the input data but also on the number of processors a parallel machine has. Many problems have different optimal algorithms for sequential and parallel evaluation, for example parallel prefix.

7 Conclusions

A semantics for the time analysis of a lenient language has been presented. The semantics can be used to analyse small programs, but the analysis of larger programs is problematic. In general, more theorems concerning the performance of common patterns of computation are required. Alternatively, a higher-level language could be used to simplify the programs which must be reasoned about. For some programs a more detailed analysis involving a finite number of processors is desirable. The lenient language is more expressive than a strict language, but not as expressive as a lazy language. It seems difficult to apply this approach to a parallel lazy language.

8 Acknowledgements

I would like to thank my supervisor, Professor Simon Peyton Jones, for his encouragement and for helping me to debug many of my ideas.

References

[1] B. Bjerner and S. Holmstrom. A compositional approach to time analysis of first order lazy functional programs. In 1989 ACM Conference on Functional Programming Languages and Computer Architecture, London, pages 157-165, 1989.

[2] D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. Technical Report 86-08-01, Dept. of Computational Science, University of Saskatchewan, August 1986.

[3] C. Hankin, G. Burn, and S. L. Peyton Jones. A safe approach to parallel combinator reduction. Theoretical Computer Science, 56:17-36, 1988.

[4] D. Harrison. Ruth: a functional language for real-time programming. In PARLE, pages 297-314, 1987.

[5] P. Hudak and S. Anderson. Pomset interpretations of parallel functional programs. In 1987 ACM Conference on Functional Programming Languages and Computer Architecture, Portland, pages 234-256. Springer Verlag LNCS 274, September 1987.

[6] S. B. Jones. Investigation of performance achievable with highly concurrent interpretations of functional programs. Final Report, ESPRIT project 302, October 1987.

[7] D. Le Métayer. Mechanical analysis of program complexity. ACM SIGPLAN Symposium on Programming Languages and Programming Environments, 20(7), 1985.

[8] R. S. Nikhil, K. Pingali, and Arvind. Id Nouveau. Technical Report Memo 265, Computational Structures Group, Laboratory for Computer Science, MIT, July 1986.

[9] P. Roe. Parallel Programming using Functional Languages. PhD thesis, Department of Computing Science, University of Glasgow, 1991.

[10] M. Rosendahl. Automatic complexity analysis. In 1989 ACM Conference on Functional Programming Languages and Computer Architecture, London, pages 144-156, 1989.

[11] D. Sands. Complexity analysis for a lazy higher order language. In K. Davis and J. Hughes, editors, Functional Programming: Proceedings of the 1989 Glasgow Workshop, 21-23 August 1989, Fraserburgh, Scotland, Springer Workshops in Computing. Springer Verlag, 1990.
