Understanding Eager Evaluation of Imperative Instruction Streams, Part 1

Augustus K. Uht and S. ShouHan Wang
University of California, San Diego
Dept. of Computer Science and Engineering, C-014
La Jolla, California 92093
(619) 534-6181

October 1, 1990

Presented at IBM Yorktown Heights on June 1, 1990, under a slightly different title, with slightly different content. This work was supported in part by the National Science Foundation under Grant No. CCR-8910586.
UNDERSTANDING EAGER EVALUATION OF IMPERATIVE INSTRUCTION STREAMS, PART 1
Abstract

It has been demonstrated that eager evaluation and branch prediction can each be employed to improve the performance of imperative instruction streams. Branch prediction is well known and has been exhaustively analyzed in the literature. Eager evaluation is less well known; to understand it better, exhaustive analyses are undertaken herein. The goal of the motivating analysis is to determine how processor resources should be allocated given an eagerly evaluated branch. The results of this analysis demonstrate that, under certain conditions, branch prediction gives better performance than eager evaluation. The model is examined further; either eager evaluation or branch prediction may be preferred, depending on the circumstances. It is assumed throughout that the branch resolution time is less than the execution time of either path.
Index Terms

Branch or control dependencies; branch prediction; concurrency; control-flow; eager evaluation; general purpose computers; out-of-order execution; parallelism; supercomputers.
1. Introduction

In the execution of an imperative language on a traditional computer, instructions without data and branch dependencies between them can be executed concurrently, enhancing performance. In this paper the basic computer system allows concurrency exploitation of input code. Such exploitation can be done with software-based methods [1, 2, 3], hardware-based methods [8, 12, 13, 14, 16, 17], or a combination of the two [18, 20]. In this work, the precise methods used are unimportant; what matters is that the concurrency in the code is exploited in some way.

We define branch effects loosely as any attribute of a branch which impedes execution of the code past the branch; alternatively, branch effects can be defined as impediments to code execution caused by the uncertainty of branch execution, i.e., uncertainty in which path a branch takes upon execution. There are a variety of methods available to reduce branch effects [5, 6, 7]. Reducing branch or procedural dependencies helps significantly [14, 15, 21]. We are concerned with two other major methods of reducing branch effects: branch prediction, long known in the research community [5, 7, 11] and beginning to be seen in commercial machines; and eager evaluation [10], less well known in the imperative language research community, and on occasion used commercially [9]. We assume throughout this paper that the dynamic instruction stream is used; using the static instruction stream is also possible, and can be beneficial in other respects; see [19].

The particular code execution situation we are concerned with is as follows. There may be an instruction I after a branch B which has only a branch dependency on the branch, with its other dependencies on earlier code resolved, i.e., the earlier dependee code has executed. Then I is kept from executing only by B. If this constraint can be bypassed, the degree of concurrency obtainable can be substantial, as demonstrated in [10]. Also, I can be waiting for more than one unresolved branch to execute, while having no other unresolved dependencies. As also shown in [10], the number of such unresolved branches can be large.

In branch prediction, when a branch is encountered and its condition value is unknown, a guess is made as to how the branch will eventually evaluate, and execution proceeds down the corresponding path; e.g., in Figure 1 the F(alse) condition guess is made at point A, and the F-path is taken. By definition, branch prediction puts all processor resources on the guessed path. If the guess is later found to be wrong, at branch resolution time tB, then the machine state is restored to what it was at point A, and the other path is taken. This backtrack time can be large, and can lead to poor performance of code with hard-to-predict branches. Is it possible to eliminate or lessen the backtrack time? Yes, if the machine executes down both paths from A; this is eager evaluation. Thus, in Figure 2, both T- and F-paths are followed. At tB the unneeded path is abandoned.
Figure 1. Branch prediction model. Left: before tB; right: after tB. Solid line: useful or potentially useful work done.

There is no backtrack time. With minimal data dependencies (flow dependencies) and perfect eager evaluation, maximal performance is obtained. The catch is the exponential growth of resources seemingly required by eager evaluation as the number of branches bypassed increases [10]; however, with a static flow assumption, this is not quite the case; see [19]. Most studies of eager evaluation [10] have assumed unbounded processor resources, i.e., an unlimited number of processing elements, letting the maximum parallelism in the code determine the performance. In this paper, we place the constraint on the system that the number of processing elements is limited to R (processor Resources).

Comparing Figure 2 to Figure 1 illustrates a basic tradeoff we investigate in this paper: eager evaluation is done only at the expense of wasted resources on one of the paths; the desirable fraction of resources to put on an eagerly evaluated path likely depends on the branch probability. This influences the degree to which eager evaluation is superior to branch prediction in a given instance. Note that if in the example above the branch had executed false, then branch prediction would have yielded the superior result. Thus there is a performance motivation to use eager evaluation, tempered by possible ensuing resource wastage. It is the goal of this paper to analyze eager evaluation and come up with some useful characterizations of how well it works. Eliminating the ill effects of branches is the promise of eager evaluation; in this paper we demonstrate that it may not be the best approach in some of the cases encountered.
Figure 2. Eager evaluation model, work done to scale of Figure 1. Left: before tB; right: after tB. Solid line: useful or potentially useful work done.

The rest of this paper is organized as follows. In Section 2 the basic eager evaluation model, having limited resources, unlimited parallelism, and known branch probabilities, is analyzed; the explanatory dictum of greatest marginal benefit is described in Section 3; the model is re-examined for the case of constrained parallelism in Section 4; eager evaluation is compared to branch prediction for constrained-parallelism cases in Section 5; the case of unknown branch probabilities is considered for the model in Section 6; and lastly the results are summarized in Section 7, and general conclusions are drawn.
2. Constrained Resource Model

The goal of the model developed in this section is to determine the optimal split of resources between two eagerly evaluated paths, i.e., to minimize the total expected execution time as a function of the split of resources. We assume that branches are two-way; this shall be relaxed in future work. The basic model is first described, followed by the derivation of the optimal conditions under which eager evaluation should be performed.
2.1. Basic Model

The basic assumptions are:

• There are a limited number of processors able to work on a program.
• There is unlimited parallelism in a program or program fragment, i.e., an additional processor can always be used. Put another way, the program is unsaturable. Such implied large degrees of parallelism are demonstrated in [4, 19].
• The branch probability is known.

Next, the following terms are defined (also see Figure 3):

• R processors are available in toto for the execution of a program.
• p is the probability of the branch executing true. This is an independent variable.
• wi is the constant amount of work to be performed in code section i. Work is defined as the number of processors executing a code section multiplied by the time spent by said processors on the section.
• TTOT is the total weighted execution time of the program. This is what is to be minimized.
• f is the fraction of R to be allocated to the true path. This is an independent variable.
• tB is the branch resolution time, at which point it is known which way the branch evaluates.
• wi^B is the work done down path i up until tB.
• tT, tF are the minimal times to execute the T-path and F-path, depending on the total number of processors available, R.

The full model is pictorially described in Figure 3.
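As a concreteness check, the definitions above can be encoded directly. The sketch below uses the paper's symbols (R, p, f, tB, wT, wF) but arbitrary illustrative values of our own, and evaluates the work done on each path by branch resolution time from the work = processors × time definition.

```python
# Direct encoding of the Section 2.1 terms. All numeric values are
# illustrative, not taken from the paper.

R = 8           # total processors available
p = 0.75        # probability the branch executes true
f = 0.6         # fraction of R allocated to the true (T-) path
t_B = 2.0       # branch resolution time
w_T, w_F = 40.0, 24.0   # total work on the T- and F-paths

# Work done on each path by t_B: processors assigned times time spent.
w_T_B = f * R * t_B          # T-path: f*R processors run for t_B
w_F_B = (1 - f) * R * t_B    # F-path: (1-f)*R processors run for t_B

print(w_T_B, w_F_B)
```

With these numbers the T-path has accumulated 9.6 units of work by tB and the F-path 6.4, out of totals wT = 40 and wF = 24.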
Figure 3. Schematic of the model: from branch point A, fR processors proceed down the T-path (probability p, total work wT, of which wT^B is done by tB), and (1 − f)R processors proceed down the F-path (probability 1 − p, total work wF, of which wF^B is done by tB).
2.2. Derivation of Optimal Conditions

It is desired to minimize the total execution time, TTOT, where:

TTOT = tB + (weighted T-path time after tB) + (weighted F-path time after tB)
     = tB + p (wT − wT^B) / R + (1 − p) (wF − wF^B) / R

At tB, the branch resolution time, the work accomplished so far is:

wT^B = f R tB
wF^B = (1 − f) R tB        (1)
therefore:

TTOT = tB + p (wT − f R tB) / R + (1 − p) (wF − (1 − f) R tB) / R
     = p wT / R + (1 − p) wF / R + (p + f − 2pf) tB
     = p tT + (1 − p) tF + h tB

h = p + f − 2pf is considered to be the branch penalty factor, indicating the cost of the branch to the overall execution of the program. Trying to find a minimum of TTOT with respect to f, we set:

dTTOT/df = 0

and get:

p = 1/2

Since TTOT is linear in f, its derivative with respect to f, (1 − 2p) tB, does not depend on f and vanishes only when p = 1/2; otherwise the minimum lies at an endpoint of f's range. Since h is the only variable penalty part of TTOT with respect to f, it is examined further in Figure 4 as a function of f, parameterized by p.
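Because h = p + f − 2pf is linear in f, its minimum over 0 ≤ f ≤ 1 must lie at an endpoint whenever p ≠ 1/2; this can be checked numerically with a small sketch (the grid of f values and choices of p are illustrative):

```python
# Penalty factor h = p + f - 2*p*f from Section 2.2, evaluated over a
# grid of f for several branch probabilities p. h is linear in f, so
# the minimum lands on an endpoint (f = 0 or f = 1) unless p = 1/2,
# where h is constant at 1/2.

def h(f, p):
    return p + f - 2 * p * f

for p in (0.25, 0.5, 0.75):
    best_h, best_f = min((h(f, p), f) for f in (0.0, 0.25, 0.5, 0.75, 1.0))
    print(f"p = {p}: minimum h = {best_h} at f = {best_f}")
```

For p = 0.75 the minimum is at f = 1, and for p = 0.25 at f = 0, i.e., all resources on the likely path, matching the conclusion drawn from Figure 4.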
Figure 4. Branch resolution time penalty factor h as a function of f, for p = 0, 1/4, 1/2, 3/4, and 1, with h ranging from 0.0 to 1.0; minima are marked.
From the figure, for any given branch probability, h is minimized by assigning all of the resources to one path or the other; specifically:

p > 1/2 → minimum h for f = 1, so all resources are put on the T-path;
p < 1/2 → minimum h for f = 0, so all resources are put on the F-path;
p = 1/2 → minimum h everywhere, so resources can be assigned arbitrarily.

Put another way, all resources should be put on the most likely path. But this is just simple branch prediction! Summarizing, when tB
0.0, the greatest marginal benefit is to place additional processors on the F-path. Thus, if saturation on the likely path occurs, eager evaluation is the preferred policy, with f set to just cause saturation in the likely path.
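The allocation rule just stated can be sketched as follows. Here `sat_T` and `sat_F` (the processor counts at which each path saturates) are our illustrative names, not the paper's notation, and the rule simply reduces to branch prediction when the likely path cannot saturate.

```python
# Sketch of the stated policy: if the likely path saturates, set f to
# just cause saturation there and spend the remaining processors
# eagerly on the other path; otherwise put everything on the likely
# path (plain branch prediction). sat_T / sat_F are assumed,
# illustrative saturation points, not notation from the paper.

def split_resources(R, p, sat_T, sat_F):
    """Return (T-path processors, F-path processors) out of R."""
    likely_sat = sat_T if p > 0.5 else sat_F
    if likely_sat >= R:
        # Likely path can absorb all R processors: branch prediction.
        return (R, 0) if p > 0.5 else (0, R)
    spare = R - likely_sat   # processors the likely path cannot use
    return (likely_sat, spare) if p > 0.5 else (spare, likely_sat)

print(split_resources(R=10, p=0.7, sat_T=6, sat_F=10))   # (6, 4)
print(split_resources(R=10, p=0.7, sat_T=20, sat_F=10))  # (10, 0)
```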
Figure 6. System exhibiting saturation: from branch point A, the T-path (probability .7) is saturated, so its greatest marginal benefit drops from .7 to 0, while the F-path's (.3) remains greater than 0.
5. Eager Evaluation versus Branch Prediction, given Saturation

First, the branch prediction model is given; then the execution times for the eager evaluation model are derived for the following types of saturation: T-path saturated, F-path saturated, unlikely path saturated, and both paths saturated. Included in these derivations are demonstrations of branch prediction as special cases, and explanations of the operation of the particular saturation model for each type using the GMB theory. Lastly and very importantly, determinations of which execution model is preferred for a given type of saturation are included.
5.1. Branch Prediction Model

The type of branch prediction model used is a simple one: choose the path with the higher probability, i.e., the most likely path. How the most likely path is determined is left to machine designers; either static or dynamic models are applicable. We assume that p is known. Referring to Figure 7, assuming the T-path is the most likely path, it is taken. At branch resolution time, if the branch evaluates True, then the guess was right, execution continues down the T-path to the end, and the time taken is tT. If the branch evaluates False at tB, all processor resources switch to the False path, and the time becomes
tF + tB, the latter term being the penalty for having made a wrong guess.
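The expected time under this prediction policy can be evaluated directly; the following sketch does so for illustrative numbers (the function name and values are ours):

```python
# Expected execution time under the simple prediction model above, for
# p > 1/2 (guess the T-path): with probability p the guess is right and
# execution takes t_T; with probability 1-p it takes t_B + t_F, paying
# the wrong-guess penalty t_B. Values are illustrative.

def predicted_time(p, t_T, t_F, t_B):
    assert p > 0.5, "this form assumes the T-path is the likelier one"
    return p * t_T + (1 - p) * (t_B + t_F)

print(round(predicted_time(p=0.8, t_T=10.0, t_F=6.0, t_B=2.0), 6))  # 9.6
```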
Figure 7. Branch prediction scenario: the T-path (p > 1/2) is taken; at tB the branch may instead resolve to the F-path.
The weighted total times are thus:

TTOT = TPK|p>1/2 = p tT + (1 − p)(tB + tF), for p > 1/2        (2)

where PK stands for Predicted with Known probability.

TTOT = TPK|p