Estimation of nested loops execution time by integer arithmetic in ...

Estimation of Nested Loops Execution Time by Integer Arithmetic in Convex Polyhedra*

Nadia Tawbi†
BULL Corporate Research Center
Rue Jean Jaurès
78340 Les Clayes-sous-Bois
France

Abstract

If the loop bounds are linear functions, the iteration space of a loop nest can be viewed as a bounded convex polyhedron (polytope). This polyhedron is represented as a set of inequalities giving the lower and upper bounds of the loop iteration variables. The number of integer points in this polyhedron equals the size of the iteration space. We prove that, in the general case, this number cannot be computed by simply summing over the iteration variables; we have to split the loop nest so that each inner loop is executed at least once for each iteration of the outer loops. This splitting method is based on a characterization of the polyhedra corresponding to this kind of loop nest. Section 2 presents the problem. Section 3 presents some related work. Section 4 describes the splitting method. Section 5 gives the algorithm, its convergence proof and its complexity. Section 6 draws some concluding remarks.

Estimating the execution time of nested loops or the volume of data transferred between processors is necessary to make appropriate processor or data allocations. To achieve this goal, one needs to estimate the execution time of the body and thus the number of nested loop iterations. This work could be a preprocessing step in an automatic parallelizing compiler to enhance the performance of the resulting parallel program. A bounded convex polyhedron can be associated with each loop nest. The number of its integer points corresponds to the size of the iteration space. In this paper, we present an algorithm that approximates this number. The algorithm is not restricted to a fixed dimension. Its worst-case complexity is rarely reached in our context, where the nesting level is rather small and the loop bound expressions are not very complex.

1 Introduction

An algorithm for counting the number of loop nest iterations is presented in the context of the processor allocation problem for shared-memory multiprocessor computers. This algorithm is used to estimate the execution time of program components [7],[8]. The execution time of a single statement is based on the average execution time of each operator or function appearing in the statement. We assume that the execution time of the body is the same for all the iterations of a loop nest. Thus, the execution time of the nest mainly depends on the size of the iteration space and on the estimated execution time of the body. This algorithm can also be used in other contexts, such as estimating the volume of data transferred between processors and analyzing the complexity of algorithms.

* This work is part of the PAF automatic parallelizer project supervised by Prof. Paul Feautrier; it is supported by DRET under contract 87/280 and by PRC C3 of the CNRS.
† email: [email protected]

0-8186-5602-6/94 © 1994 IEEE

2 Problem Presentation

At first glance, the problem of computing the iteration space size of a loop nest seems obvious; the following example shows that, in the general case, the problem is more complex. We assume that the iteration step is 1 throughout this paper.

Example 2.1

    do i = 1, n
      do j = 1, i
        do k = j, m
          S
        end do
      end do
    end do

In this example, one has to be careful before applying the symbolic sum in order to count the loop iteration

[?], [1]. The results on this topic aim at determining the complexity of the problem and at designing polynomial-time algorithms for a fixed dimension. A polynomial-time algorithm for 2 dimensions can be found in [10]. Dyer shows in [1] that only odd dimensions have to be studied, and he presents an algorithm for 3 or 4 dimensions that runs in polynomial time. This algorithm is based on a reduction to the computation of Dedekind sums. The problem of the existence of integer points in convex polyhedra is NP-complete [?], [6]. Feautrier presents in [3] a parametric algorithm that solves this question. This algorithm inspired the work presented in this paper.

number. In fact, we have to be sure that inner loops are executed at least once for all the surrounding iterations in order to apply the symbolic sum safely. In our example, the innermost loop is not executed for some values of i and j unless $m \ge n$; otherwise, the nest is executed as follows:

    do i = 1, m
      do j = 1, i
        do k = j, m
          S
        end do
      end do
    end do
    do i = m+1, n
      do j = 1, m
        do k = j, m
          S
        end do
      end do
    end do

4 Splitting Loop Nests

We assume a nest of loops $L = (L_1, L_2, \ldots, L_n)$, where $L_1$ is the outermost loop and $\mathit{inf}_i$ (resp. $\mathit{sup}_i$) is the lower (resp. upper) bound of loop $L_i$. These bounds are linear functions of the surrounding loop iteration variables and of the parameters. The following notations and definitions are used in the rest of the paper:

- A nest of loops is denoted by $L = (L_1, L_2, \ldots, L_n)$, where the $L_i$ are the loops, $L_1$ being the outermost one.
- $\vec{x} = (x_1, x_2, \ldots, x_n)$ denotes the loop iteration variables corresponding to L.
- $\vec{x}(j) = (x_1, x_2, \ldots, x_j)$ is composed of the first j elements of $\vec{x}$.
- We assume that an order exists over the variables $x_i$ (as well as over the parameters).
- $l_i = \lfloor \mathit{sup}_i \rfloor - \lceil \mathit{inf}_i \rceil + 1$; $l_i$ is a function of $\vec{x}(i-1)$ and of the parameters.
- The number of effective iterations of $L_i$ is $N_i = \max(l_i, 0)$.
- The number of different executions of the body of $L_i$ is $M_i$. Note that if $l_j < 0$ for some j, $1 \le j \le i$, and some value of $\vec{x}(j-1)$, then $M_i$ cannot be computed using the symbolic sum formula.
- A nest of loops is regular iff, $\forall i$, $1 \le i \le n$, $l_i \ge 0$ for all the values of $\vec{x}(i-1)$.

For instance, the nest of loops in Example 2.1 is not regular, because $l_3 = m - j + 1$. This value is negative for $j > m + 1$ (unless $n \le m + 1$, since $j \le n$).

The following set of inequalities sys is associated with a nest L:

$$\mathit{sys} = \{\, x_i - \mathit{inf}_i \ge 0,\ \mathit{sup}_i - x_i \ge 0 \mid 1 \le i \le n \,\} \qquad (1)$$
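The regularity condition can be checked by brute force for concrete parameter values. The following Python sketch is only an illustration for Example 2.1 and is not part of the method, which works symbolically: it scans the (i, j) pairs executed by the two outer loops and tests $l_3 = m - j + 1 \ge 0$. The helper names `l3`, `outer_points`, `first_negative` and `is_regular` are hypothetical.

```python
def l3(m, j):
    """l_3 = floor(sup_3) - ceil(inf_3) + 1 for the innermost loop k = j, m of Example 2.1."""
    return m - j + 1


def outer_points(n):
    """The (i, j) values reached by the loops i = 1, n and j = 1, i."""
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            yield i, j


def first_negative(n, m):
    """First (i, j) for which l_3 < 0, or None if there is none."""
    for i, j in outer_points(n):
        if l3(m, j) < 0:
            return i, j
    return None


def is_regular(n, m):
    """l_1 = n and l_2 = i are never negative here (n >= 0, i >= 1), so only l_3 matters."""
    return first_negative(n, m) is None


if __name__ == "__main__":
    for n, m in [(4, 3), (4, 2), (4, 1)]:
        print(n, m, is_regular(n, m), first_negative(n, m))
```

For non-negative parameters, the scan agrees with the analytical condition $n \le m + 1$; for n = 4 and m = 1, the first offending pair is i = j = 3.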

The number of iterations M can be expressed by the following conditional expression:

$$
M = \begin{cases}
\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{i}\sum_{k=j}^{m} 1 = \frac{mn^2}{2} + \frac{mn}{2} - \frac{n^3}{6} + \frac{n}{6}, & \text{if } m \ge n,\\[8pt]
\displaystyle\sum_{i=1}^{m}\sum_{j=1}^{i}\sum_{k=j}^{m} 1 + \sum_{i=m+1}^{n}\sum_{j=1}^{m}\sum_{k=j}^{m} 1 = \frac{nm^2}{2} + \frac{mn}{2} - \frac{m^3}{6} + \frac{m}{6}, & \text{otherwise.}
\end{cases}
$$

Thus, in the general case, counting the number of loop nest iterations consists of the following steps:

- Splitting the nest into a set of nests such that each iteration of the original nest belongs to exactly one of the generated nests. In each generated nest, each loop is executed at least once for all the values of the surrounding loop iteration variables. This splitting algorithm is presented in Section 5.
- Computing symbolic sums, such as the expression of M in the previous example. Symbolic sums are computed using the formula ([5], [4]):

$$\sum_{i=1}^{n} i^{p} = \frac{n^{p+1}}{p+1} + \frac{n^{p}}{2} + \frac{B_2}{2!}\,p\,n^{p-1} + \frac{B_4}{4!}\,p(p-1)(p-2)\,n^{p-3} + \cdots, \qquad p \in \mathbb{N},\ n \in \mathbb{N}.$$

  The series terminates in n or $n^2$ depending on the parity of p. The $B_k$ are Bernoulli numbers.

If the bounds are fractional, then the computation of the symbolic sum is approximate.
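The two steps can be checked numerically on Example 2.1. The sketch below is an illustration under our own assumptions, not the paper's implementation: `power_sum` evaluates the Bernoulli-number series above with exact fractions, `closed_form` is a factored form of the conditional expression for M, `naive_sum` applies the symbolic sum to the unsplit nest, and `brute_force` enumerates the iterations. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli(kmax):
    """B_0, ..., B_kmax as exact fractions (convention B_1 = -1/2), from the
    recurrence sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, kmax + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def power_sum(p, n):
    """sum_{i=1}^{n} i**p via n^{p+1}/(p+1) + n^p/2 + sum_{k>=2} B_k/k! p(p-1)...(p-k+2) n^{p-k+1}."""
    B = bernoulli(p)
    total = Fraction(n ** (p + 1), p + 1)
    if p >= 1:
        total += Fraction(n ** p, 2)            # the n^p / 2 term
    for k in range(2, p + 1, 2):                # B_k = 0 for odd k >= 3
        falling = 1
        for t in range(k - 1):                  # p(p-1)...(p-k+2): k - 1 factors
            falling *= p - t
        total += B[k] / factorial(k) * falling * n ** (p - k + 1)
    return total


def brute_force(n, m):
    """Count the iterations of Example 2.1 directly."""
    return sum(1 for i in range(1, n + 1) for j in range(1, i + 1) for k in range(j, m + 1))


def closed_form(n, m):
    """Factored form of the conditional expression for M."""
    if m >= n:
        return Fraction(n * (n + 1) * (3 * m - n + 1), 6)
    return Fraction(m * (m + 1) * (3 * n - m + 1), 6)


def naive_sum(n, m):
    """sum_i sum_j (m - j + 1) without splitting: (m+1) S_1(n) - (S_2(n) + S_1(n)) / 2."""
    s1, s2 = power_sum(1, n), power_sum(2, n)
    return (m + 1) * s1 - (s2 + s1) / 2


if __name__ == "__main__":
    for n, m in [(3, 5), (3, 2), (3, 1), (6, 2)]:
        print(n, m, brute_force(n, m), closed_form(n, m), naive_sum(n, m))
```

The naive sum matches the enumeration only when $n \le m + 1$, that is when the nest is regular; for n = 3 and m = 1 it gives 2 instead of 3, because the pair i = j = 3 contributes $l_3 = -1$.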

3 Related Work

The question of determining the number of integer points in convex polyhedra is #P-complete in general

These inequalities define the loop bounds and must be linear in order to apply our method. Actually, this


node are the polyhedra generated at a specific splitting step. The leaves are the polyhedra that constitute the solution. We use the following notations: n is the number of different variables and also the number of loops, r is the current level, $N_r$ is the maximum number of inequalities at the current level r, $F_r$ is the maximum number of polyhedra generated from a node at level r, and R is the number of nodes in the tree, which corresponds to the number of recursive calls of the splitting procedure. To prove that the algorithm terminates, we have to prove that it generates a finite number of nodes, that is to say that $F_r$ is finite. The initial number of inequalities at each level is 2. At each step, we generate a finite number of inequalities to be added to other levels. $F_r$ depends on the number of inequalities at the corresponding level and on their types: $F_r = (N_r/2)^2$, the largest possible value of $\mathit{ni}_r \times \mathit{ns}_r$ when $\mathit{ni}_r + \mathit{ns}_r = N_r$. We can therefore generate at each node only a finite number of polyhedra, and the depth of the tree is n. Thus the convergence of the algorithm is proved.
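The counting argument can be made concrete with a small sketch. Assuming, as in the worst-case analysis of Section 5.3, that level r holds $N_r = n - r + 2$ inequalities, the snippet below takes $F_r$ as the largest integer value of $\mathit{ni}_r \times \mathit{ns}_r$ with $\mathit{ni}_r + \mathit{ns}_r = N_r$ (this equals $(N_r/2)^2$ when $N_r$ is even) and counts the nodes of a tree whose root is at level n and whose nodes at level r have $F_r$ children. The names `worst_case_counts` and `tree_size` are illustrative and do not come from the paper.

```python
def worst_case_counts(n):
    """N_r = n - r + 2 and F_r = max{ni * ns : ni + ns = N_r} for r = 1..n
    (integer version of (N_r / 2)^2)."""
    N = {r: n - r + 2 for r in range(1, n + 1)}
    F = {r: (N[r] // 2) * ((N[r] + 1) // 2) for r in N}
    return N, F


def tree_size(n):
    """Number of calls to split when the root sits at level n and every node
    at level r has F_r children."""
    _, F = worst_case_counts(n)
    total = width = 1          # the root call split(P, n, C)
    for r in range(n, 0, -1):
        width *= F[r]          # number of calls made at level r - 1
        total += width
    return total


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, tree_size(n))
```

The values grow very quickly with n, which is consistent with the exponential worst case; the paper's argument is that such nesting depths and bound counts are rare in practice.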

5.3 Complexity

The worst-case complexity occurs when the number of generated polyhedra is maximal and none of these polyhedra is empty. The number of inequalities in a node at the current level r is maximal if all the inequalities generated from level r + 1 are of level r. Thus $N_r = N_{r+1} + 1$: we initially have 2 inequalities at level r, and we generate at most $N_{r+1} - 1$ inequalities from level r + 1, one by variable elimination and $N_{r+1} - 2$ in order to make only two inequalities significant. Hence $N_n = 2$, $N_r = n - r + 2$, and $F_r = \left(\frac{n - r + 2}{2}\right)^2$. R can be expressed as a function of the $F_r$:

$$R = 1 + F_n + F_n F_{n-1} + \cdots + \prod_{r=1}^{n} F_r = \sum_{k=0}^{n} \prod_{r=n-k+1}^{n} F_r$$

R represents the complexity of the algorithm in the worst case. In practice, the complexity is much smaller, because the loop bound expressions are not very complex and the nesting level is low in almost all cases.

6 Conclusion

This paper introduces an algorithm for computing the size of the iteration space of a loop nest. This algorithm is based on determining the number of integer points in a convex polyhedron. The theoretical complexity of the algorithm is shown to be exponential. However, for our purpose, it is acceptable, since the dimension of the polyhedra considered is always small and the loop bound expressions are rather simple. The algorithm splits the polyhedron into several regular polyhedra whose size is easy to compute using symbolic sums. The resulting size is a function of some of the problem variables, called parameters. This algorithm has been used in a processor allocation problem [7],[8]. It can also be used to estimate the volume of data transferred in distributed systems and for complexity analysis.

References

[1] M. Dyer, On Counting Lattice Points in Polyhedra, SIAM Journal on Computing, Vol. 20, No. 4, August 1991.
[2] M. Dyer, A. M. Frieze, On the Complexity of Computing the Volume of a Polyhedron, SIAM Journal on Computing, Vol. 17, No. 5, October 1988.
[3] P. Feautrier, Parametric Integer Programming, R.A.I.R.O. Recherche opérationnelle/Operations Research, Vol. 22, No. 3, 1988.
[4] D. E. Knuth, The Art of Computer Programming, Vol. 1, Addison-Wesley, 1973.
[5] M. R. Spiegel, Formules et Tables de Mathématiques, McGraw-Hill, Paris, Série Schaum, 1974.
[6] A. Schrijver, Theory of Linear and Integer Programming, Wiley, New York, 1986.
[7] N. Tawbi, Parallélisation Automatique: Estimation des durées d'exécution et allocation statique de processeurs, Ph.D. Thesis, Pierre et Marie Curie University, Paris, France, April 1991.
[8] N. Tawbi, P. Feautrier, Processor Allocation and Loop Scheduling on Multiprocessor Computers, Proceedings of the ICS'92 Conference, July 1992.
[9] R. Shostak, Deciding Linear Inequalities by Computing Loop Residues, JACM 28, pp. 769-779, October 1981.
[10] L. Ya. Zamanskii, V. L. Cherkaskii, Determination of the Number of Integer Points in Polyhedra in R3: Polynomial Algorithms, Dokl. Akad. Ukrain. USSR Ser. A, 4, 1983.

RES := ∅   (* RES is a global variable *)

procedure split(P, r, C)
  if r = 0 then
    if P is not empty then RES := RES ∪ {P};
  else
    1. INF_r := {x_r − inf_r^1 ≥ 0, x_r − inf_r^2 ≥ 0, ..., x_r − inf_r^{ni_r} ≥ 0}
       (* the set of the ni_r inequalities of level r and of type inf *)
    2. SUP_r := {sup_r^1 − x_r ≥ 0, sup_r^2 − x_r ≥ 0, ..., sup_r^{ns_r} − x_r ≥ 0}
       (* the set of inequalities of level r and of type sup; their number is ns_r *)
    3. t := ni_r × ns_r;   (* since the polyhedron is bounded, t ≥ 1 *)
    4. Bounds := INF_r × SUP_r
              = {(x_r − inf_r^i ≥ 0, sup_r^j − x_r ≥ 0) / 1 ≤ i ≤ ni_r and 1 ≤ j ≤ ns_r}
    5. if t = 1 then split(P, r − 1, C)
       else foreach I_ij = (x_r − inf_r^i ≥ 0, sup_r^j − x_r ≥ 0) ∈ Bounds do
         C_ij := C;
         P_ij := P − INF_r − SUP_r ∪ {x_r − inf_r^i ≥ 0} ∪ {sup_r^j − x_r ≥ 0};
         genineq(P_ij, C_ij, I_ij, INF_r, SUP_r, i, j, r);   (* P_ij and C_ij may be modified *)
         if P_ij ≠ ∅ then split(P_ij, r − 1, C_ij)
end;

procedure genineq(P, C, I_ij, INF_r, SUP_r, i, j, r)
  1. if (sup_r^j − inf_r^i ≥ 0) is false then P := ∅; return;
     (* tests the validity of the inequality in obvious cases, e.g. 4 ≤ 0 *)
  2. level := level-of(sup_r^j − inf_r^i ≥ 0);
  3. if level = 0 then C := C ∪ {(sup_r^j − inf_r^i ≥ 0)}
     else P := P ∪ {(sup_r^j − inf_r^i ≥ 0)};
  4. for k := 1, i − 1
       newineq := inf_r^k < inf_r^i; addineq(newineq, P, C);   (* makes x_r − inf_r^k ≥ 0 redundant in P *)
  5. for k := i + 1, ni_r
       newineq := inf_r^k ≤ inf_r^i; addineq(newineq, P, C);   (* makes x_r − inf_r^k ≥ 0 redundant in P *)
  6. for k := 1, j − 1
       newineq := sup_r^k > sup_r^j; addineq(newineq, P, C);   (* makes sup_r^k − x_r ≥ 0 redundant in P *)
  7. for k := j + 1, ns_r
       newineq := sup_r^k ≥ sup_r^j; addineq(newineq, P, C);   (* makes sup_r^k − x_r ≥ 0 redundant in P *)
  8. end;

procedure addineq(newineq, P, C)
  if newineq is false then P := ∅; return;
  level := level-of(newineq);
  if level = 0 then C := C ∪ {newineq} else P := P ∪ {newineq}
end;

Table 1: The Splitting Algorithm
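A minimal Python sketch of the splitting idea of Table 1 follows. It is not the author's implementation and rests on several simplifying assumptions: inequalities are affine expressions e ≥ 0 stored as dictionaries (the key "1" holds the constant), the coefficient of the variable being bounded is always +1 or −1 (so every inf/sup is integer-valued and a strict comparison a < b can be written b − a − 1 ≥ 0), parameter-only inequalities stay in the list instead of a separate context C, and a piece is discarded only when a constant inequality is obviously false (there is no integer emptiness test). It also adds $\mathit{sup}_r^j - \mathit{inf}_r^i \ge 0$ for every pair, including levels with a single pair (t = 1), so that every level of every piece satisfies the conditions of Proposition 4.2. All names (`aff`, `split`, `holds`, `points`, ...) are hypothetical.

```python
from itertools import product

CONST = "1"  # key of the constant term in an affine expression


def aff(**coeffs):
    """Affine expression {name: coefficient}; the keyword c gives the constant term."""
    return {CONST if k == "c" else k: v for k, v in coeffs.items() if v != 0}


def sub(a, b):
    """Return a - b, dropping zero coefficients."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) - v
    return {k: v for k, v in out.items() if v != 0}


def level_of(e, loop_vars):
    """Greatest 1-based index of a loop variable with a non-zero coefficient (0 if none)."""
    return max((r for r, v in enumerate(loop_vars, 1) if e.get(v, 0) != 0), default=0)


def as_bound(e, var):
    """Read e >= 0 as var >= inf ("inf", inf) or var <= sup ("sup", sup).
    Sketch assumption: the coefficient of var in e is +1 or -1."""
    a = e[var]
    rest = {k: v for k, v in e.items() if k != var}
    if a == 1:
        return "inf", {k: -v for k, v in rest.items()}
    if a == -1:
        return "sup", rest
    raise ValueError("this sketch only handles unit coefficients")


def is_constant(e):
    return set(e) <= {CONST}


def split(P, r, loop_vars, res):
    """Append to res the pieces of P obtained by processing levels r, r-1, ..., 1."""
    if r == 0:
        res.append(P)
        return
    var = loop_vars[r - 1]
    bounds = [as_bound(e, var) for e in P if level_of(e, loop_vars) == r]
    rest = [e for e in P if level_of(e, loop_vars) != r]
    infs = [b for kind, b in bounds if kind == "inf"]
    sups = [b for kind, b in bounds if kind == "sup"]
    one = {CONST: 1}
    for i, lo in enumerate(infs):
        for j, hi in enumerate(sups):
            new = [sub(hi, lo)]                                         # sup_j - inf_i >= 0
            new += [sub(sub(lo, infs[k]), one) for k in range(i)]       # inf_k <  inf_i
            new += [sub(lo, infs[k]) for k in range(i + 1, len(infs))]  # inf_k <= inf_i
            new += [sub(sub(sups[k], hi), one) for k in range(j)]       # sup_k >  sup_j
            new += [sub(sups[k], hi) for k in range(j + 1, len(sups))]  # sup_k >= sup_j
            if any(is_constant(e) and e.get(CONST, 0) < 0 for e in new):
                continue  # an obviously false inequality such as -1 >= 0: empty piece
            new = [e for e in new if not is_constant(e)]   # drop trivially true c >= 0
            kept = [sub({var: 1}, lo), sub(hi, {var: 1})]  # the pair kept at level r
            split(rest + kept + new, r - 1, loop_vars, res)


def holds(P, env):
    """True if the point env (loop variables and parameters) satisfies every e >= 0 in P."""
    return all(sum(c * (1 if k == CONST else env[k]) for k, c in e.items()) >= 0 for e in P)


def points(params, loop_vars, hi):
    """All integer points with 0 <= x_r <= hi, merged with the parameter values."""
    for xs in product(range(hi + 1), repeat=len(loop_vars)):
        yield {**params, **dict(zip(loop_vars, xs))}


# System (2) of Example 2.1 with x_1 = i, x_2 = j, x_3 = k and parameters n, m.
LOOPS = ["i", "j", "k"]
EXAMPLE_2_1 = [aff(i=1, c=-1), aff(n=1, i=-1), aff(j=1, c=-1),
               aff(i=1, j=-1), aff(k=1, j=-1), aff(m=1, k=-1)]

if __name__ == "__main__":
    pieces = []
    split(EXAMPLE_2_1, 3, LOOPS, pieces)
    params = {"n": 4, "m": 2}
    total = sum(holds(EXAMPLE_2_1, env) for env in points(params, LOOPS, 5))
    per_piece = [sum(holds(Q, env) for env in points(params, LOOPS, 5)) for Q in pieces]
    print(len(pieces), total, per_piece)
```

For Example 2.1 the sketch returns four pieces: one equivalent to the single nest valid when m ≥ n, the two nests of the case m < n, and one empty piece (it contains both m − 1 ≥ 0 and −m ≥ 0) that an integer emptiness test would remove. Enumerating small parameter values confirms that every integer point of the original polyhedron lies in exactly one piece, as stated in Section 5.1.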

belongs to exactly one $P_i$, and (2) the nested loops associated with $P_1, P_2, \ldots, P_n$ are regular. At step r, the polyhedron P is split into $\mathit{ni}_r \times \mathit{ns}_r$ polyhedra $P_1^1, P_1^2, \ldots, P_i^j, \ldots, P_{\mathit{ni}_r}^{\mathit{ns}_r}$. Let $\mathrm{int}(P_i^j) = \{I \mid I \in P_i^j \text{ and } I \text{ integer}\}$. We have to prove that the sets $\mathrm{int}(P_i^j)$ form a partition of $\mathrm{int}(P)$.

upper bound. Hence $I \in P_i^j$, and since $I \in \mathrm{int}(P)$, $I \in \mathrm{int}(P_i^j)$. Conversely, if $I \in \mathrm{int}(P_i^j)$, then $I \in P$, because all the inequalities of P are satisfied by I (by construction). Since I is integer, $I \in \mathrm{int}(P)$.

A nest of loops associated with a polyhedron in the final solution is regular. Indeed, at level r, only two inequalities are kept: one of type inf and one of type sup. The inequality $\mathit{sup}_r - \mathit{inf}_r \ge 0$, derived from these two inequalities by variable elimination, is then added to the polyhedron. When level 0 is reached, we find at each level i exactly two inequalities, with $\mathit{sup}_i - \mathit{inf}_i \ge 0$ true for all the elements of the polyhedron under the context. This inequality either appears explicitly or has been eliminated because it is redundant.

1. If $P_i^j \neq P_k^l$ ($i \neq k$ or $j \neq l$), then $\mathrm{int}(P_i^j) \cap \mathrm{int}(P_k^l) = \emptyset$. Assume that $i < k$. Then the inequality $\mathit{inf}_r^k \le \mathit{inf}_r^i$ belongs to $P_i^j$ and its negation $\mathit{inf}_r^i < \mathit{inf}_r^k$ to $P_k^l$. Thus $\mathrm{int}(P_i^j) \cap \mathrm{int}(P_k^l) = \emptyset$. The case $i = k$, $j \neq l$ is handled in the same way with the sup inequalities. Notice that the negation of an inequality $\mathit{exp} \ge 0$ can be defined as $\mathit{exp} + 1 \le 0$ only if we are considering integer points.
2. $\bigcup_{i=1}^{\mathit{ni}_r} \bigcup_{j=1}^{\mathit{ns}_r} \mathrm{int}(P_i^j) = \mathrm{int}(P)$. At step r, every $I \in \mathrm{int}(P)$ belongs to $P_i^j$, where i is the index of the expression with the greatest value at I in $\mathit{INF}_r = \{\mathit{inf}_r^1, \ldots, \mathit{inf}_r^{\mathit{ni}_r}\}$. If several expressions have the same greatest value, then i is the smallest such index. The index j is computed as the index of the least

5.2 Convergence Proof

The splitting algorithm can be seen as a depth-first search of a tree whose nodes are polyhedra. The root is the initial polyhedron and the successors of a


assumption is not very restrictive, since loop bound expressions are usually linear. The above system can be written as $A\vec{x} + C\vec{z} + \vec{b} \ge 0$, possibly with $M\vec{z} + \vec{d} \ge 0$, where $\vec{z}$ is the vector of parameters, A, C and M are matrices of rational constants, and $\vec{b}$ and $\vec{d}$ are vectors of rational constants. $M\vec{z} + \vec{d} \ge 0$ is a system of inequalities defining constraints on the parameters; this system is called the context. The different values of $\vec{x}$ are in fact the integer points of the convex polyhedron

$$P = \{\vec{x} \mid A\vec{x} + C\vec{z} + \vec{b} \ge 0,\ \vec{x} \ge 0,\ M\vec{z} + \vec{d} \ge 0\},$$

and $\mathrm{int}(P)$ denotes the set of integer points of P. Each iteration of the nest is associated with an element of $\mathrm{int}(P)$, and conversely. The set of inequalities associated with the nest in Example 2.1 is the following:

$$i - 1 \ge 0,\quad n - i \ge 0,\quad j - 1 \ge 0,\quad i - j \ge 0,\quad k - j \ge 0,\quad m - k \ge 0 \qquad (2)$$
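As a worked instance, take $\vec{x} = (i, j, k)^T$, $\vec{z} = (n, m)^T$, and write the inequalities of Example 2.1 in the order $i - 1 \ge 0$, $n - i \ge 0$, $j - 1 \ge 0$, $i - j \ge 0$, $k - j \ge 0$, $m - k \ge 0$ (the row order is our choice; any order describes the same polyhedron). Row by row, $A\vec{x} + C\vec{z} + \vec{b} \ge 0$ then has

$$
A = \begin{pmatrix} 1 & 0 & 0\\ -1 & 0 & 0\\ 0 & 1 & 0\\ 1 & -1 & 0\\ 0 & -1 & 1\\ 0 & 0 & -1 \end{pmatrix},\qquad
C = \begin{pmatrix} 0 & 0\\ 1 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 1 \end{pmatrix},\qquad
\vec{b} = \begin{pmatrix} -1\\ 0\\ -1\\ 0\\ 0\\ 0 \end{pmatrix}.
$$

For example, the second row reads $-i + n + 0 \ge 0$, i.e. $n - i \ge 0$, and the last row reads $-k + m \ge 0$.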

The proof is based on the fact that a polyhedron P can be represented as P = Q + C, where Q is a polytope and C a polyhedral cone. We also know that P is itself a polytope iff C = {0} (the vector of zeroes).

Proposition 4.2 A nest of loops is regular if:
- there are exactly two inequalities at each level r, $1 \le r \le n$, one of type inf and one of type sup;
- the inequality $\mathit{sup}_i - \mathit{inf}_i \ge 0$, where $\mathit{inf}_i$ (resp. $\mathit{sup}_i$) is the lower (resp. upper) bound of $x_i$, is redundant in sys(i − 1).

The proof is obvious. We are now able to present the splitting method. It consists of keeping at each level only two inequalities, one of type sup and one of type inf, and of adding the inequality derived from them by variable elimination to the polyhedron at the corresponding level. Keeping only two inequalities at some level r consists of splitting the polyhedron into $\mathit{ni}_r \times \mathit{ns}_r$ polyhedra, where $\mathit{ni}_r$ is the number of inf inequalities and $\mathit{ns}_r$ is the number of sup inequalities. Each generated polyhedron corresponds to one pair of inequalities kept at this level. The other inequalities are made redundant by adding new constraints to the polyhedron.

A, C and $\vec{b}$ are easily deduced from this system according to the order over the variables i, j, k, i.e. $x_1 = i$, $x_2 = j$ and $x_3 = k$. The context is empty; M and $\vec{d}$ are not defined. The level of an inequality is the greatest index of a variable with a non-zero coefficient in the inequality. If all the variables have a zero coefficient, i.e. only the parameters have non-zero coefficients, then the level is 0 by convention. After division by a positive rational, every inequality of level i, $1 \le i \le n$, can be written in one of the two following forms, depending on the sign of the coefficient of $x_i$: $\mathit{sup}_i - x_i \ge 0$ or $x_i - \mathit{inf}_i \ge 0$; the inequality defines an upper or a lower bound of the variable $x_i$ and is said to be of type sup or inf, respectively. The inequality $k - j \ge 0$ in (2) is of level 3 and of type inf. The inequality $i - j \ge 0$ is of level 2 and of type sup. sys(i) consists of all the inequalities of sys of level j, $1 \le j \le i$. By convention, $\mathit{sys}(0) = \emptyset$. An inequality of level i is redundant iff it can be derived from the other inequalities in sys(i) and the context, i.e. this inequality is satisfied by all the points of the polyhedron defined by sys(i) and the context.
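The definitions of level, type and sys(i) translate directly into code. The Python sketch below encodes the Example 2.1 inequalities as coefficient dictionaries (a representation chosen only for illustration; the key "1" holds the constant term) and reproduces the two classifications stated above. The label "context" for level-0 inequalities and all function names are our own.

```python
# x_1 = i, x_2 = j, x_3 = k; n and m are parameters
LOOP_VARS = ("i", "j", "k")

# Inequalities e >= 0 of Example 2.1, as {name: coefficient}
SYSTEM_2 = {
    "i - 1": {"i": 1, "1": -1},
    "n - i": {"n": 1, "i": -1},
    "j - 1": {"j": 1, "1": -1},
    "i - j": {"i": 1, "j": -1},
    "k - j": {"k": 1, "j": -1},
    "m - k": {"m": 1, "k": -1},
}


def level(e):
    """Greatest index of a loop variable with a non-zero coefficient (0: parameters only)."""
    return max((r for r, v in enumerate(LOOP_VARS, 1) if e.get(v, 0) != 0), default=0)


def bound_type(e):
    """'inf' if x_r has a positive coefficient (x_r - inf_r >= 0), 'sup' if negative.
    Level-0 inequalities constrain only the parameters and are reported as 'context'."""
    r = level(e)
    if r == 0:
        return "context"
    return "inf" if e[LOOP_VARS[r - 1]] > 0 else "sup"


def sys_upto(i, system=SYSTEM_2):
    """sys(i): names of the inequalities of level j with 1 <= j <= i (sys(0) is empty)."""
    return [name for name, e in system.items() if 1 <= level(e) <= i]


if __name__ == "__main__":
    for name, e in SYSTEM_2.items():
        print(f"{name} >= 0: level {level(e)}, type {bound_type(e)}")
```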

5 Splitting Algorithm

Let $A\vec{x} + C\vec{z} + \vec{b} \ge 0$ be the polyhedron associated with a nest of loops. We assume that all the inequalities are represented as $x_i - \mathit{inf}_i \ge 0$ or $\mathit{sup}_i - x_i \ge 0$, where $\mathit{inf}_i$ and $\mathit{sup}_i$ are linear functions of $\vec{x}(i-1)$. The set of inequalities of level i and of type inf (resp. sup) is denoted $\mathit{Inf}_i$ (resp. $\mathit{Sup}_i$). The splitting algorithm is recursive and is applied to a polyhedron P, a level r, $1 \le r \le n$, and a context C. P is represented as a system S of inequalities. Initially, S is the system associated with the nest of loops, C is either empty or represents constraints on the parameters, and r = n, where n is the number of loops in the nest. The result, denoted RES in the following, is a set of polyhedra. The algorithm is presented in Table 1.

Proposition 4.1 Let P be an n-dimensional polyhedron in the first quadrant (all the variables are positive). P is bounded iff at least two inequalities can be associated with each level r, $1 \le r \le n$, one of type sup and one of type inf. These inequalities are either explicit in the definition of P or can be deduced using variable elimination.

5.1 Correctness Proof

We show that, if a polyhedron P yields polyhedra $P_1, P_2, \ldots, P_n$ after application of the splitting algorithm, then (1) for every $I \in P$ with I integer, I
