Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers

Kuei-Ping Shih†, Jang-Ping Sheu†, and Chua-Huang Huang‡

† Department of Computer Science and Information Engineering, National Central University, Chung-Li 32054, Taiwan. E-mail: [email protected], [email protected]

‡ Department of Computer and Information Science, The Ohio State University, Columbus, OH 43210-1277. E-mail: [email protected]

Abstract: This paper addresses the problem of communication-free partitioning of iteration spaces and data spaces along hyperplanes. We consider statement-level partitioning of the iteration spaces. The technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, the necessary and sufficient conditions for the feasibility of communication-free hyperplane partitions are presented.

Keywords: Communication-free, data communication, distributed-memory multicomputers, hyperplane partitioning, parallelizing compilers.

Corresponding author: Jang-Ping Sheu

Department of Computer Science and Information Engineering National Central University Chung-Li 32054, Taiwan E-mail: [email protected] Phone: +886-3-422-7151 ext. 4474 Fax: +886-3-422-2681

1 Introduction

It has been widely accepted that local memory access is much faster than memory access involving interprocessor communication on distributed-memory multicomputers. If data and computation are not properly distributed across processors, heavy interprocessor communication may result. Although the problem of data distribution is of critical importance to the efficiency of parallel programs on distributed-memory multicomputers, it is known to be a very difficult problem. Mace [14] has proved that finding optimal data storage patterns for parallel processing is NP-complete, even when limited to one- and two-dimensional arrays. In addition, Li and Chen [11, 12] have shown that the problem of finding the optimal data alignment is also NP-complete. Thus, in previous work, a number of researchers have developed parallelizing compilers that require programmers to specify the data storage patterns. Based on the programmer-specified data partitioning, parallelizing compilers can automatically generate parallel programs with appropriate message-passing constructs for multicomputers. Projects using this approach include the Fortran D compiler project [4, 5, 18], the SUPERB project [21], the Kali project [9, 10], and the DINO project [17]. For the same purpose, the Crystal project [11, 12] and the Id Nouveau compiler [16] deal with functional languages and generate parallel programs with message-passing constructs. The parallel programs generated by most of these systems follow the SPMD (Single-Program Multiple-Data) model [8].

Recently, automatic data partitioning has become an attractive research topic in the field of parallelizing compilers. Many researchers have developed systems that help programmers deal with the problem of data distribution by automatically determining the data distribution at compile time. The PARADIGM project [3] and the SUIF project [1, 19] share this goal.
These systems can automatically determine appropriate data distribution patterns to minimize communication overhead and generate SPMD code with suitable message-passing constructs for distributed-memory multicomputers. Since excessive interprocessor communication can offset the benefit of parallelization even when a program has a large amount of parallelism, parallelizing compilers must pay close attention to the distribution of computation and data across processors in order to reduce the communication overhead or, if possible, to eliminate interprocessor communication completely. Communication-free partitioning, therefore, is an issue worth studying for distributed-memory multicomputers.

In recent years, much research has focused on partitioning iteration spaces and/or data spaces to reduce interprocessor communication and achieve high-performance computing. Ramanujam and Sadayappan [15] consider the problem of communication-free partitioning of data spaces along hyperplanes for distributed-memory multicomputers. They present a matrix-based formulation of the problem of determining the existence of communication-free partitions of data arrays. Their approach proposes only the array decompositions and does not take iteration space partitionings into consideration. In addition, they concentrate on fully parallel nested loops and focus on two-dimensional data arrays. Huang and Sadayappan [7] generalize the approach proposed in [15]. They consider the issue of communication-free hyperplane partitioning by explicitly modeling the iteration and data spaces and provide the conditions for the feasibility of communication-free hyperplane partitioning. However, they do not deal with imperfectly nested loops. Moreover, their approach is restricted to loop-level partitioning; i.e., all statements within a loop body must be scheduled together as an indivisible unit. Chen and Sheu [2] first partition the iteration space according to the data dependence vectors obtained by analyzing all the reference patterns in a nested loop, and then group all data elements accessed by the same iteration partition. Two communication-free partitioning strategies, the nonduplicate-data and duplicate-data strategies, are proposed in their work. Nevertheless, they require that the loop contain only uniformly generated references and that the problem domain be restricted to a single perfectly nested loop. They also treat all statements within a loop body as an indivisible unit. Lim and Lam [13] use affine processor mappings for statements to assign statement-iterations to processors and maximize the degree of parallelism available in the program. Their approach does not treat the loop body as an indivisible unit and can assign different statement-iterations to different processors. However, they consider only statement-iteration space partitioning and do not address data space partitioning. Furthermore, their uniform affine processor mappings can cause a large number of idle processors if the affine mappings are non-unimodular transformations.

In this paper, communication-free partitioning of statement-iteration spaces and data spaces along hyperplanes is considered. We explicitly formulate array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, we then present the necessary and sufficient conditions for the feasibility of communication-free hyperplane partitions.
Currently, most existing partitioning schemes take an iteration instance as the basic schedulable unit that can be allocated to a processor. However, when the loop body contains multiple statements, it is very difficult to make the loop execute without communication by allocating whole iteration instances among processors; that is, the chance of finding a communication-free execution with these methods is limited. To gain more flexibility in finding communication-free hyperplane partitions, we treat the statements within a loop body as separate schedulable units. Our method considers not just the iteration space or the data space alone, but both of them. As in [13], our method can be extended to handle more general loop models and can be applied to programs with imperfectly nested loops and affine array references.

The rest of the paper is organized as follows. In Section 2, we introduce the notation and terminology used throughout the paper. Section 3 describes the characteristics of statement-level communication-free hyperplane partitioning. The technique of statement-level communication-free hyperplane partitioning for a perfectly nested loop is presented in Section 4, together with the necessary and sufficient conditions for the feasibility of communication-free hyperplane partitioning. The extension to the general case of sequences of imperfectly nested loops is described in Section 5. Finally, conclusions are given in Section 6.


2 Preliminaries

This section explains the statement-iteration space and the data space. It also defines the statement-iteration hyperplane and the data hyperplane.

2.1 Statement-Iteration Space and Data Space

Let Q, Z, and Z+ denote the set of rational numbers, the set of integers, and the set of positive integers, respectively. The symbol Z^d represents the set of d-tuples of integers. Traditionally, the iteration space is composed of discrete points, each of which represents the execution of all statements in one iteration of a loop [20]. Instead of viewing each iteration as indivisible, an iteration can be divided into the statements enclosed in it; i.e., each statement is a schedulable unit and has its own iteration space. We use the term statement-iteration space to denote the iteration space of a statement in a nested loop. The following example illustrates the notion of iteration spaces and statement-iteration spaces.

Example 1: Consider the following nested loop L1.

do i1 = 1, N
  do i2 = 1, N
    s1: A[i1, -i1-i2-1] = A[i1-i2-1, -i1+i2+1] + B[i1+i2, i1+2i2-1]
    s2: B[i1-i2+1, i1-2i2+1] = A[i2-1, i1-i2] * B[i1+i2-1, i1+i2-2]
  enddo
enddo                                                            (L1)

Fig. 1 illustrates the iteration space and the statement-iteration spaces of loop L1 for N = 5. In Fig. 1(a), a circle means an iteration and includes two rectangles with black and gray colors. The black rectangle indicates statement s1 and the gray one indicates statement s2. In Fig. 1(b) and Fig. 1(c), each statement is an individual unit and the collection of statements forms two statement-iteration spaces. □

Figure 1: Loop (L1)'s iteration space and its corresponding statement-iteration spaces, assuming N = 5. (a) IS(L1), iteration space of loop (L1). (b) SIS(s1), statement-iteration space of statement s1. (c) SIS(s2), statement-iteration space of statement s2.

The representation of statement-iteration spaces, data spaces, and the relations among them is described as follows. Let S denote the set of statements in the targeted problem domain and D the set of array variables referenced by S. Consider a statement s ∈ S enclosed in a d-nested loop. The statement-iteration space of s, denoted SIS(s), is a subspace of Z^d defined as SIS(s) = {[I1, I2, ..., Id]^t | LB_i ≤ I_i ≤ UB_i, for 1 ≤ i ≤ d}, where I_i is a loop index variable and LB_i and UB_i are the lower and upper bounds of I_i, respectively. The superscript t is the transpose operator. The column vector I_s = [I1, I2, ..., Id]^t is called a statement-iteration in SIS(s). On the other hand, from the geometric point of view, an array variable also forms a space in which each array element is a point. To describe an n-dimensional array v exactly, we use the data space DS(v), where v ∈ D. An array element v[D1, D2, ..., Dn] has a corresponding data index in DS(v), denoted by the column vector D_v = [D1, D2, ..., Dn]^t.

The relations between statement-iteration spaces and data spaces can be built via array reference functions. An array reference function is a transformation from a statement-iteration space into a data space. As in most existing methods, we require the array references to be affine functions of the outer loop indices or loop-invariant variables. Suppose statement s is enclosed in a d-nested loop and has an array reference pattern v[a_{1,1}I1 + a_{1,2}I2 + ... + a_{1,d}Id + a_{1,0}, a_{2,1}I1 + a_{2,2}I2 + ... + a_{2,d}Id + a_{2,0}, ..., a_{n,1}I1 + a_{n,2}I2 + ... + a_{n,d}Id + a_{n,0}], where the a_{i,j} are integer constants, for 1 ≤ i ≤ n and 0 ≤ j ≤ d. The array reference function can then be written as

    Ref^{s,v}(I_s) = F^{s,v} · I_s + f^{s,v},

where

    F^{s,v} = | a_{1,1} ... a_{1,d} |          f^{s,v} = | a_{1,0} |
              |   ...   ...   ...   |    and             |   ...   |
              | a_{n,1} ... a_{n,d} |,                   | a_{n,0} |.

We term F^{s,v} the array reference coefficient matrix and f^{s,v} the array reference constant vector. If a data index D_v ∈ DS(v) is referenced in statement-iteration I_s ∈ SIS(s), then Ref^{s,v}(I_s) = D_v. Take the array reference pattern A[i - 2j + 3k - 4, 4i + 3j - 2k + 1] as an example. Its array reference coefficient matrix and constant vector are

    F^{s,A} = | 1 -2  3 |          f^{s,A} = | -4 |
              | 4  3 -2 |    and             |  1 |,

respectively. We define statement-iteration hyperplanes and data hyperplanes in the next subsection.
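The reference function is easy to mechanize. The sketch below (Python, used here purely for illustration) hard-codes the example reference A[i - 2j + 3k - 4, 4i + 3j - 2k + 1] and evaluates Ref^{s,v}(I_s) = F^{s,v} · I_s + f^{s,v}:

```python
# Evaluate an affine array reference Ref(I) = F*I + f.
# F and f encode the reference A[i-2j+3k-4, 4i+3j-2k+1] from the text.

def ref(F, f, I):
    """Map a statement-iteration I to the data index it accesses."""
    return [sum(F[r][c] * I[c] for c in range(len(I))) + f[r]
            for r in range(len(F))]

F = [[1, -2, 3],
     [4, 3, -2]]   # array reference coefficient matrix F^{s,A}
f = [-4, 1]        # array reference constant vector f^{s,A}

print(ref(F, f, [1, 1, 1]))   # -> [-2, 6]
```

Any affine reference fits this shape: one row of F^{s,v} and one entry of f^{s,v} per array dimension.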

2.2 Statement-Iteration Hyperplane and Data Hyperplane

A statement-iteration hyperplane on SIS(s), denoted Φ_h(s), is a hyperspace [6] of SIS(s) defined as Φ_h(s) = {[I1, I2, ..., Id]^t | θ1 I1 + θ2 I2 + ... + θd Id = c_h}, where θ1, ..., θd ∈ Q are the coefficients of the statement-iteration hyperplane and c_h ∈ Q is the constant term of the hyperplane. The formula can be abbreviated as Φ_h(s) = {I_s | Θ · I_s = c_h}, where Θ = [θ1, ..., θd] is the statement-iteration hyperplane coefficient vector. Similarly, a data hyperplane on DS(v), denoted Ψ_g(v), is a hyperspace of DS(v) defined as Ψ_g(v) = {[D1, D2, ..., Dn]^t | δ1 D1 + δ2 D2 + ... + δn Dn = c_g}, where δ1, ..., δn ∈ Q are the coefficients of the data hyperplane and c_g ∈ Q is the constant term of the hyperplane. In the same way, the formula can be abbreviated as Ψ_g(v) = {D_v | Δ · D_v = c_g}, where Δ = [δ1, ..., δn] is the data hyperplane coefficient vector. Only hyperplanes that include at least one integer point are considered in this paper. Statement-iteration hyperplanes and data hyperplanes are used to characterize communication-free partitioning. We discuss some of these characteristics in the next section.

3 Characteristics of Communication-Free Hyperplane Partitioning

A program execution is communication-free if all operations on each processor access only data elements allocated to that processor. A trivial partition strategy allocates all statement-iterations and data elements to a single processor; the program execution under this trivial partitioning is communication-free. However, we are not interested in single-processor execution because it does not exploit the potential of parallelization and conflicts with the goal of parallel processing. Hence, in this paper, we consider only nontrivial partitionings, specifically, hyperplane partitionings. Let a partition group

    G = (∪_{s∈S} Φ_h(s)) ∪ (∪_{v∈D} Ψ_g(v))

be the set of hyperplanes that should be assigned to one processor. The definition of a communication-free hyperplane partition can then be given as follows.

Definition 1 The hyperplane partitions of statement-iteration spaces and data spaces are said to be communication-free if and only if, for any partition group G = (∪_{s∈S} Φ_h(s)) ∪ (∪_{v∈D} Ψ_g(v)),

    ∀ I_s ∈ Φ_h(s): Ref^{s,v}(I_s) ∈ Ψ_g(v), for all s ∈ S, v ∈ D. □

As mentioned above, statement-iterations that access the same array element should be allocated to the same statement-iteration hyperplane. Therefore, it is important to determine which statement-iterations access the same array element. The following lemma states the necessary and sufficient condition for two statement-iterations to access the same array element.
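Definition 1 can be checked by brute force on a bounded grid. The sketch below is illustrative only: the reference A[i+j, i+j], the statement-iteration hyperplane i + j = 4, and the candidate data hyperplane normal [1, 0] are all assumptions made for the example.

```python
# Brute-force check of Definition 1 on a small grid: every
# statement-iteration on the hyperplane Theta . I = c_h must be
# mapped by Ref into a single data hyperplane Delta . D = c_g.
# The reference A[i+j, i+j] and the vectors below are illustrative.

def ref(I):                      # Ref(I) = F*I + f with F = [[1,1],[1,1]], f = 0
    return [I[0] + I[1], I[0] + I[1]]

theta, c_h = [1, 1], 4           # statement-iteration hyperplane: i + j = 4
delta = [1, 0]                   # candidate data hyperplane normal

iters = [(i, j) for i in range(1, 6) for j in range(1, 6)
         if theta[0]*i + theta[1]*j == c_h]
images = {delta[0]*d[0] + delta[1]*d[1] for d in map(ref, iters)}
print(len(images) == 1)          # all images share a single c_g
```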


Lemma 1 For a statement s ∈ S and a referenced array v ∈ D, let I_s and I'_s be two statement-iterations in SIS(s), and let Ref^{s,v} be the array reference function from SIS(s) into DS(v) as defined above. Then

    Ref^{s,v}(I_s) = Ref^{s,v}(I'_s)  ⟺  (I'_s - I_s) ∈ Ker(F^{s,v}),

where Ker(F) denotes the null space of F [6].

Proof. (⇒): Suppose Ref^{s,v}(I_s) = Ref^{s,v}(I'_s). Since Ref^{s,v}(I_s) = F^{s,v} · I_s + f^{s,v},

    Ref^{s,v}(I_s) = Ref^{s,v}(I'_s)
    ⇒ F^{s,v} · I_s + f^{s,v} = F^{s,v} · I'_s + f^{s,v}
    ⇒ F^{s,v} · (I'_s - I_s) = 0
    ⇒ (I'_s - I_s) ∈ Ker(F^{s,v}).

(⇐): Conversely, suppose (I'_s - I_s) ∈ Ker(F^{s,v}) and dim(Ker(F^{s,v})) = n, n ∈ Z. Let {β1, β2, ..., βn} be a basis of Ker(F^{s,v}); every vector in Ker(F^{s,v}) can be represented as a linear combination of β1, ..., βn. Since (I'_s - I_s) ∈ Ker(F^{s,v}), we have I'_s = I_s + c1 β1 + c2 β2 + ... + cn βn for some c1, c2, ..., cn. Then

    Ref^{s,v}(I'_s) = F^{s,v} · I'_s + f^{s,v}
                    = F^{s,v} · (I_s + c1 β1 + ... + cn βn) + f^{s,v}
                    = F^{s,v} · I_s + F^{s,v} · (c1 β1 + ... + cn βn) + f^{s,v}
                    = F^{s,v} · I_s + f^{s,v}
                    = Ref^{s,v}(I_s). □

We illustrate Lemma 1 with the following example.


Example 2: Consider the array reference A[i + j, i + j]. The array reference coefficient matrix is

    F^{s,A} = | 1 1 |
              | 1 1 |.

The null space of F^{s,A} is Ker(F^{s,A}) = {r[1, -1]^t | r ∈ Z}. By Lemma 1, any two statement-iterations whose difference is r[1, -1]^t access the same array element, where r ∈ Z. As Fig. 2 shows, the statement-iterations {(1, 3), (2, 2), (3, 1)} all access the same array element A[4, 4]. □

Figure 2: Statement-iterations whose differences are in Ker(F^{s,v}) access the same array element.

We now explain the significance of Lemma 1 and show how it helps to find communication-free hyperplane partitions. Communication-free hyperplane partitioning requires that statement-iterations accessing the same array element be allocated to the same statement-iteration hyperplane. According to Lemma 1, two statement-iterations access the same array element if and only if their difference belongs to the kernel of F^{s,v}. Hence, Ker(F^{s,v}) should be a subspace of the statement-iteration hyperplane. Since a statement may contain many different array references, partitioning a statement-iteration space must take all of them into account. Thus, the space spanned by the kernels Ker(F^{s,v}) of all array references appearing in the same statement should be a subspace of the statement-iteration hyperplane. The dimension of a statement-iteration hyperplane is one less than the dimension of the statement-iteration space. If there exists a statement s such that the dimension of this spanning space equals the dimension of SIS(s), the spanning space cannot be a subspace of any statement-iteration hyperplane, and therefore no nontrivial communication-free hyperplane partitioning exists. From this observation, we obtain the following theorem.
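The kernel criterion of Lemma 1 can be exercised directly; the sketch below walks along the kernel direction [1, -1]^t of Example 2's reference matrix and confirms that every visited statement-iteration addresses the same element:

```python
# Lemma 1, checked numerically for F = [[1,1],[1,1]] (Example 2):
# iterations differing by a multiple of [1,-1]^t, a basis of Ker(F),
# must reference the same array element.

def ref(F, f, I):
    return tuple(sum(F[r][c] * I[c] for c in range(2)) + f[r] for r in range(2))

F, f = [[1, 1], [1, 1]], [0, 0]
kernel_dir = (1, -1)                        # basis vector of Ker(F)

base = (1, 3)
for r in range(-2, 3):                      # walk along the kernel direction
    I = (base[0] + r*kernel_dir[0], base[1] + r*kernel_dir[1])
    assert ref(F, f, I) == ref(F, f, base)  # same element every time
print(ref(F, f, base))   # -> (4, 4)
```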

Theorem 1 If there exists s ∈ S such that

    dim(span(∪_{v∈D} Ker(F^{s,v}))) = dim(SIS(s)),

then there exists no nontrivial communication-free hyperplane partitioning for S and D. □

Example 3: Consider matrix multiplication.

do i = 1, N
  do j = 1, N
    do k = 1, N
      s: C[i, j] = C[i, j] + A[i, k] * B[k, j]
    enddo
  enddo
enddo

In the above program, three array variables, A, B, and C, with three distinct array references are involved in statement s. The three array reference coefficient matrices are

    F^{s,A} = | 1 0 0 |      F^{s,B} = | 0 0 1 |      F^{s,C} = | 1 0 0 |
              | 0 0 1 |,               | 0 1 0 |,               | 0 1 0 |,

respectively. Thus, Ker(F^{s,A}) = {r1[0, 1, 0]^t | r1 ∈ Z}, Ker(F^{s,B}) = {r2[1, 0, 0]^t | r2 ∈ Z}, and Ker(F^{s,C}) = {r3[0, 0, 1]^t | r3 ∈ Z}. Together, Ker(F^{s,A}), Ker(F^{s,B}), and Ker(F^{s,C}) span Z^3, which has the same dimensionality as the statement-iteration space. By Theorem 1, matrix multiplication has no nontrivial communication-free hyperplane partitioning. □

Theorem 1 is useful for detecting nested loops that have no nontrivial communication-free hyperplane partitioning. Furthermore, when a nontrivial communication-free hyperplane partitioning exists, Theorem 1 is also useful for finding the hyperplane coefficient vectors. We state this result in the following corollary.
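The dimension test of Theorem 1 can be automated with exact rational arithmetic. The helpers below are a from-scratch sketch (not taken from the paper) that computes kernel bases by Gaussian elimination and applies the test to the matrix-multiplication references of Example 3:

```python
from fractions import Fraction

# Theorem 1, sketched for matrix multiplication: if the kernels of all
# reference matrices of a statement together span the whole
# statement-iteration space, no nontrivial communication-free
# hyperplane partition exists.

def rref(M):
    """Row-reduce a matrix of Fractions; returns (rref, pivot columns)."""
    M = [row[:] for row in M]
    pivots, r = [], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def null_space(F):
    """Basis of Ker(F) for an integer matrix F."""
    n = len(F[0])
    R, pivots = rref([[Fraction(x) for x in row] for row in F])
    basis = []
    for fc in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -R[r][fc]
        basis.append(v)
    return basis

def rank(M):
    return len(rref(M)[1]) if M else 0

# references in C[i,j] = C[i,j] + A[i,k] * B[k,j]
F_A = [[1, 0, 0], [0, 0, 1]]
F_B = [[0, 0, 1], [0, 1, 0]]
F_C = [[1, 0, 0], [0, 1, 0]]

kernels = null_space(F_A) + null_space(F_B) + null_space(F_C)
print(rank(kernels))   # -> 3, equal to dim SIS(s): no nontrivial partition
```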

Corollary 1 For any communication-free statement-iteration hyperplane Φ_h(s) = {I_s | Θ · I_s = c_h}, the following two conditions must hold:

(1) span(∪_{v∈D} Ker(F^{s,v})) ⊆ Φ_h(s),
(2) Θ^t ∈ (span(∪_{v∈D} Ker(F^{s,v})))^⊥,

where S^⊥ denotes the orthogonal complement of the space S.

Proof. By Lemma 1, two statement-iterations access the same data element through array reference F^{s,v} if and only if their difference belongs to the kernel of F^{s,v}. Therefore, the kernel of F^{s,v} should be contained in the statement-iteration hyperplane Φ_h(s). This must hold for all array references appearing in the same statement. Hence, span(∪_{v∈D} Ker(F^{s,v})) ⊆ Φ_h(s), which gives the first condition. Since Φ_h(s) = {I_s | Θ · I_s = c_h}, Θ is the normal vector of Φ_h(s); that is, Θ^t is orthogonal to Φ_h(s). (Note that Θ is a row vector; it is Θ^t, not Θ, that is orthogonal to Φ_h(s).) By condition (1), Θ^t is then orthogonal to the subspace span(∪_{v∈D} Ker(F^{s,v})). Thus, Θ^t belongs to the orthogonal complement of span(∪_{v∈D} Ker(F^{s,v})) [6]. □

Corollary 1 gives the range of communication-free statement-iteration hyperplane coefficient vectors and can be used to find them. The range of communication-free data hyperplane coefficient vectors is given below as well. As mentioned before, the relations between statement-iteration spaces and data spaces can be established via array references. Moreover, the statement-iteration hyperplane coefficient vectors and the data hyperplane coefficient vectors are related. The following lemma expresses the relation between these two hyperplane coefficient vectors. A similar result is given in [7].

Lemma 2 For any statement s ∈ S and referenced array v ∈ D, let Ref^{s,v} be the array reference function from SIS(s) into DS(v). Φ_h(s) = {I_s | Θ · I_s = c_h} and Ψ_g(v) = {D_v | Δ · D_v = c_g} are communication-free hyperplane partitions if and only if Θ = α Δ · F^{s,v}, for some α, α ≠ 0.

Proof. (⇒): Suppose Φ_h(s) = {I_s | Θ · I_s = c_h} and Ψ_g(v) = {D_v | Δ · D_v = c_g} are communication-free hyperplane partitions. Let I'_s and I''_s be two distinct statement-iterations on the same statement-iteration hyperplane Φ_h(s), and let D'_v and D''_v be the data indices such that Ref^{s,v}(I'_s) = D'_v and Ref^{s,v}(I''_s) = D''_v. By the above assumptions, D'_v and D''_v must belong to the same data hyperplane Ψ_g(v). Because I'_s and I''_s lie on Φ_h(s), we have Θ · I'_s = c_h and Θ · I''_s = c_h. Therefore,

    Θ · I'_s = Θ · I''_s  ⇒  Θ · (I'_s - I''_s) = 0.

On the other hand, since D'_v and D''_v belong to the same data hyperplane Ψ_g(v), we have Δ · D'_v = c_g and Δ · D''_v = c_g. Thus,

    Δ · D'_v = Δ · D''_v
    ⇒ Δ · (D'_v - D''_v) = 0
    ⇒ Δ · (Ref^{s,v}(I'_s) - Ref^{s,v}(I''_s)) = 0
    ⇒ Δ · (F^{s,v} · (I'_s - I''_s)) = 0
    ⇒ (Δ · F^{s,v}) · (I'_s - I''_s) = 0.

Since I'_s and I''_s are any two statement-iterations on Φ_h(s), (I'_s - I''_s) is a vector on the statement-iteration hyperplane. Because both Θ · (I'_s - I''_s) = 0 and (Δ · F^{s,v}) · (I'_s - I''_s) = 0, we conclude that Θ and Δ · F^{s,v} are linearly dependent, which implies Θ = α Δ · F^{s,v} for some α ≠ 0 [6].

(⇐): Suppose Φ_h(s) = {I_s | Θ · I_s = c_h} and Ψ_g(v) = {D_v | Δ · D_v = c_g} are hyperplane partitions of SIS(s) and DS(v), respectively, and Θ = α Δ · F^{s,v} for some α ≠ 0. We claim that Φ_h(s) and Ψ_g(v) are communication-free. By Definition 1, it suffices to prove that ∀ I_s ∈ Φ_h(s): Ref^{s,v}(I_s) ∈ Ψ_g(v). Let I_s be any statement-iteration on Φ_h(s); then Θ · I_s = c_h. From Θ = α Δ · F^{s,v}, we have

    (α Δ · F^{s,v}) · I_s = c_h
    ⇒ Δ · (F^{s,v} · I_s) = (1/α) c_h
    ⇒ Δ · (F^{s,v} · I_s + f^{s,v}) = (1/α) c_h + Δ · f^{s,v}
    ⇒ Δ · Ref^{s,v}(I_s) = (1/α) c_h + Δ · f^{s,v}.

Let c_g = (1/α) c_h + Δ · f^{s,v}; then Ref^{s,v}(I_s) ∈ Ψ_g(v). We have shown that ∀ I_s ∈ Φ_h(s): Ref^{s,v}(I_s) ∈ Ψ_g(v). It follows that Φ_h(s) and Ψ_g(v) are communication-free partitions. □

By Lemma 2, the statement-iteration hyperplane coefficient vector Θ can be decided once the data hyperplane coefficient vector Δ has been determined. If F^{s,v} is invertible, the statement-iteration hyperplane coefficient vectors can instead be decided first, and the data hyperplane coefficient vectors can then be derived as Δ = α′ Θ · (F^{s,v})^{-1}, for some α′ ≠ 0. The range of communication-free data hyperplane coefficient vectors can be derived from this lemma. Corollary 1 shows the range of statement-iteration hyperplane coefficient vectors; the next corollary provides the range of data hyperplane coefficient vectors.
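The forward use of Lemma 2 can be checked numerically. In the sketch below, the reference A[i+j, i+2j-1], the normal Delta = [1, -1], and the constants are illustrative assumptions; the code verifies that every integer point of the statement-iteration hyperplane maps onto the single data hyperplane with c_g = (1/alpha) c_h + Delta · f:

```python
from fractions import Fraction

# Lemma 2, checked on one assumed affine reference A[i+j, i+2j-1]:
# with Theta = alpha * (Delta . F), every statement-iteration on
# Theta . I = c_h lands on the data hyperplane
# Delta . D = (1/alpha)*c_h + Delta . f.

F = [[1, 1], [1, 2]]                 # reference coefficient matrix (assumed)
f = [0, -1]                          # reference constant vector (assumed)
delta = [1, -1]                      # chosen data hyperplane normal
alpha, c_h = 1, 3

theta = [alpha * (delta[0]*F[0][c] + delta[1]*F[1][c]) for c in range(2)]
c_g = Fraction(c_h, alpha) + (delta[0]*f[0] + delta[1]*f[1])

def ref(I):                          # Ref(I) = F*I + f
    return [F[0][0]*I[0] + F[0][1]*I[1] + f[0],
            F[1][0]*I[0] + F[1][1]*I[1] + f[1]]

# every integer point on the statement-iteration hyperplane must map
# onto the same data hyperplane
for i in range(-5, 6):
    for j in range(-5, 6):
        if theta[0]*i + theta[1]*j == c_h:
            D = ref([i, j])
            assert delta[0]*D[0] + delta[1]*D[1] == c_g
print(theta, c_g)   # -> [0, -1] 4
```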

Corollary 2 For any communication-free data hyperplane Ψ_g(v) = {D_v | Δ · D_v = c_g}, the following condition must hold:

    Δ^t ∈ (∪_{s∈S} Ker((F^{s,v})^t))′,

where S′ denotes the complement of the set S.

Proof. This paper considers nontrivial hyperplane partitionings, which require Θ to be a nonzero vector. By Lemma 2, Θ = α Δ · F^{s,v}. Therefore, Δ · F^{s,v} is not equal to 0, which implies Δ^t ∉ Ker((F^{s,v})^t). This condition must hold for all s ∈ S. Hence, Δ^t ∉ ∪_{s∈S} Ker((F^{s,v})^t). It follows that Δ^t belongs to the complement of ∪_{s∈S} Ker((F^{s,v})^t), i.e., Δ^t ∈ (∪_{s∈S} Ker((F^{s,v})^t))′. □

Example 4: Consider the following loop.

do i = 1, N
  do j = 1, N
    s1: A[i, j] = B[i, i]
    s2: B[i, j] = A[j, j]
  enddo
enddo

The nested loop is communication-free if and only if the statement-iteration hyperplane coefficient vectors for s1 and s2 and the data hyperplane coefficient vectors for v1 and v2 are Θ1 = [r, 0], Θ2 = [0, r], Δ1 = [r, 0], and Δ2 = [0, r], respectively, where v1 = A, v2 = B, and r ∈ Q - {0}. We first show that Θ1 and Θ2 satisfy Corollary 1:

    span(∪_{v∈D} Ker(F^{s1,v})) = {c1[0, 1]^t | c1 ∈ Q}
    ⟹ (span(∪_{v∈D} Ker(F^{s1,v})))^⊥ = {[r1, 0]^t | r1 ∈ Q}
    ⟹ Θ1^t = [r, 0]^t ∈ (span(∪_{v∈D} Ker(F^{s1,v})))^⊥;

    span(∪_{v∈D} Ker(F^{s2,v})) = {c2[1, 0]^t | c2 ∈ Q}
    ⟹ (span(∪_{v∈D} Ker(F^{s2,v})))^⊥ = {[0, r2]^t | r2 ∈ Q}
    ⟹ Θ2^t = [0, r]^t ∈ (span(∪_{v∈D} Ker(F^{s2,v})))^⊥.

The test of Corollary 2 for Δ1 and Δ2 is as follows:

    ∪_{s∈S} Ker((F^{s,v1})^t) = {c3[1, -1]^t | c3 ∈ Q}
    ⟹ (∪_{s∈S} Ker((F^{s,v1})^t))′ = {[r3, r4]^t | r3 ≠ -r4}
    ⟹ Δ1^t = [r, 0]^t ∈ (∪_{s∈S} Ker((F^{s,v1})^t))′;

    ∪_{s∈S} Ker((F^{s,v2})^t) = {c4[1, -1]^t | c4 ∈ Q}
    ⟹ (∪_{s∈S} Ker((F^{s,v2})^t))′ = {[r5, r6]^t | r5 ≠ -r6}
    ⟹ Δ2^t = [0, r]^t ∈ (∪_{s∈S} Ker((F^{s,v2})^t))′. □
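Example 4 can be confirmed mechanically by checking the Lemma 2 relation for each of the four references (taking r = 1):

```python
# Example 4, checked mechanically: with r = 1, Lemma 2 requires each
# Theta_i to equal Delta_j * F^{si,vj} (up to a nonzero multiple) for
# every reference in the loop.

def vecmat(v, M):                       # row vector times matrix
    return [v[0]*M[0][c] + v[1]*M[1][c] for c in range(2)]

F = {('s1', 'A'): [[1, 0], [0, 1]],     # A[i, j]  (left-hand side)
     ('s1', 'B'): [[1, 0], [1, 0]],     # B[i, i]
     ('s2', 'B'): [[1, 0], [0, 1]],     # B[i, j]  (left-hand side)
     ('s2', 'A'): [[0, 1], [0, 1]]}     # A[j, j]

theta = {'s1': [1, 0], 's2': [0, 1]}    # Theta_1, Theta_2 with r = 1
delta = {'A': [1, 0], 'B': [0, 1]}      # Delta_1, Delta_2 with r = 1

ok = all(vecmat(delta[v], F[(s, v)]) == theta[s] for (s, v) in F)
print(ok)   # -> True: the partition is communication-free
```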

The next section describes the communication-free hyperplane partitioning technique. The necessary and sufficient conditions of communication-free hyperplane partitioning for a single perfectly nested loop will be presented.

4 Communication-Free Hyperplane Partitioning for a Perfectly Nested Loop

Each data array has a corresponding data space. However, a nested loop with multiple statements may have multiple statement-iteration spaces. In this section, we consider additional conditions on multiple statement-iteration spaces for communication-free hyperplane partitioning. These conditions are also used in determining the statement-iteration hyperplanes and data hyperplanes.

Suppose S = {s1, s2, ..., sm} and D = {v1, v2, ..., vn}, where m, n ∈ Z+. The number of occurrences of array variable vj in statement si is r_{i,j}, where r_{i,j} ∈ Z+ ∪ {0}, for i = 1, 2, ..., m and j = 1, 2, ..., n; if si does not reference vj, then r_{i,j} = 0. The earlier representation of the array reference function is modified slightly to describe the k-th occurrence of vj in si as Ref_k^{s_i,v_j}(I_{s_i}), where 1 ≤ k ≤ r_{i,j}. The related representations change accordingly; in particular, Ref_k^{s_i,v_j}(I_{s_i}) = F_k^{s_i,v_j} · I_{s_i} + f_k^{s_i,v_j} = D_{v_j}.

In this section, we consider a partition group that contains one statement-iteration hyperplane for each statement-iteration space and one data hyperplane for each data space. Suppose the data hyperplane in data space DS(vj) is Ψ_g(vj) = {D_{v_j} | Δ_j · D_{v_j} = c_{g_j}}, for all j, 1 ≤ j ≤ n. Since D_{v_j} = Ref_k^{s_i,v_j}(I_{s_i}), for i = 1, ..., m, j = 1, ..., n, and k = 1, ..., r_{i,j}, and Δ_j · D_{v_j} = c_{g_j}, we have

    Δ_j · D_{v_j} = c_{g_j}
    ⟺ Δ_j · (F_k^{s_i,v_j} · I_{s_i} + f_k^{s_i,v_j}) = c_{g_j}
    ⟺ (Δ_j · F_k^{s_i,v_j}) · I_{s_i} = c_{g_j} - (Δ_j · f_k^{s_i,v_j}).

Let

    Θ_i = Δ_j · F_k^{s_i,v_j},                         (1)
    c_{h_i} = c_{g_j} - (Δ_j · f_k^{s_i,v_j}).         (2)

As a result, the statement-iterations that access data lying on the data hyperplane Ψ_g(vj) = {D_{v_j} | Δ_j · D_{v_j} = c_{g_j}} are located on the statement-iteration hyperplane Φ_h(si) = {I_{s_i} | (Δ_j · F_k^{s_i,v_j}) · I_{s_i} = c_{g_j} - (Δ_j · f_k^{s_i,v_j})}. To simplify the presentation, we assume that every variable vj appears in every statement si. For each statement-iteration space to contain a unique statement-iteration hyperplane, the following two conditions should be met.

(i) ∀i: Δ_j · F_k^{s_i,v_j} = Δ_{j′} · F_{k′}^{s_i,v_{j′}}, (j ≠ j′ ∨ k ≠ k′), for j, j′ = 1, ..., n, k = 1, ..., r_{i,j}, and k′ = 1, ..., r_{i,j′}.

j

j

j

i

i

j

i

i

j

j

i

j

0

i

j0

0

0

(ii) ∀i: c_{g_j} - (Δ_j · f_k^{s_i,v_j}) = c_{g_{j′}} - (Δ_{j′} · f_{k′}^{s_i,v_{j′}}), (j ≠ j′ ∨ k ≠ k′), for j, j′ = 1, ..., n, k = 1, ..., r_{i,j}, and k′ = 1, ..., r_{i,j′}.

Condition (i) is equivalent to the following two equations:

    Δ_j · F_k^{s_i,v_j} = Δ_j · F_1^{s_i,v_j},                       (3)

for i = 1, ..., m, j = 1, ..., n, and k = 2, ..., r_{i,j};

    Δ_j · F_1^{s_i,v_j} = Δ_1 · F_1^{s_i,v_1},                       (4)

for i = 1, ..., m and j = 2, ..., n. Likewise, condition (ii) is equivalent to the following two equations:

    Δ_j · f_k^{s_i,v_j} = Δ_j · f_1^{s_i,v_j},                       (5)

for i = 1, ..., m, j = 1, ..., n, and k = 2, ..., r_{i,j};

    c_{g_j} = c_{g_1} - Δ_1 · f_1^{s_i,v_1} + Δ_j · f_1^{s_i,v_j},   (6)

for i = 1, ..., m and j = 2, ..., n.

Eq. (6) can be used to evaluate the data hyperplane constant terms once some constant term, say c_{g_1}, is fixed. Furthermore, for each j, c_{g_j} must be the same for all i, 1 ≤ i ≤ m. Therefore,

    c_{g_1} - Δ_1 · f_1^{s_i,v_1} + Δ_j · f_1^{s_i,v_j} = c_{g_1} - Δ_1 · f_1^{s_1,v_1} + Δ_j · f_1^{s_1,v_j},   (7)

for i = 2, ..., m and j = 2, ..., n. Eq. (7) simplifies to

    Δ_j · (f_1^{s_i,v_j} - f_1^{s_1,v_j}) = Δ_1 · (f_1^{s_i,v_1} - f_1^{s_1,v_1}),   (8)

for i = 2, ..., m and j = 2, ..., n. Having described the conditions imposed by the communication-free hyperplane partitioning constraints, we can conclude the following theorem.


j

j

i

Theorem 2 Let S = fs1; s2; : : :; smg and D = fv1; v2; : : :; vng be the sets of statements and array

variables, respectively. Refks ;v is the array reference function for statement si accessing array variables vj at the k-th occurrence in si , where i = 1; 2; : : :; m; j = 1; 2; : : :; n and k = 1; 2; : : :; ri;j . h (Is ) = fIs ji  Is = ch g is the statement-iteration hyperplane in SIS (si), for i = 1; 2; : : :; m. g (Dv ) = fDv jj  Dv = cg g is the data hyperplane in DS (vj ), for j = 1; 2; : : :; n. h (Is ) and g (Dv ) are communication-free hyperplane partitions if and only if the following conditions hold. i

i

i

i

i

i

j

j

j

j

j

j

(C1) (C2) (C3) (C4) (C5) (C6) (C7) (C8) (C9)

8i; j  Fks ;v = j  F1s ;v , for j = 1; 2; : : :; n; k = 2; 3; : : :; ri;j . 8i; j  F1s ;v = 1  F1s ;v1 , for j = 2; 3; : : :; n. 8i; j  fks ;v = j  f1s ;v , for j = 1; 2; : : :; n; k = 2; 3; : : :; ri;j. j  (f1s ;v ? f1s1 ;v ) = 1  (f1s ;v1 ? f1s1 ;v1 ), for i = 2; 3; : : :; m; j = 2; 3; : : :; n. 8j; tj 2 ([mi=1 [rk=1 Ker((Fks ;v )t))0. 8i; i = j  Fks ;v , for some j; k, j 2 f1; 2; : : :; ng; k 2 f1; 2; : : :; ri;j g. 8i; ti 2 (span([nj=1 [rk=1 Ker(Fks ;v )))?. 8j; j = 2; 3; : : :; n, cg = cg1 ? 1  f1s ;v1 + j  f1s ;v , for some i; i 2 f1; 2; : : :; mg. 8i; ch = cg ? (j  fks ;v ), for some j; k, j 2 f1; 2; : : :; ng; k 2 f1; 2; : : :; ri;j g. i

j

i

i

j

i

i

i

i

j

j

j

j

j

i

i

i;j

i

j

j

i;j

i

j

i

i

j

i

i

j

j

12

j

2
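Parts of Theorem 2 can be checked mechanically. The sketch below takes the two occurrences of array B in statement s2 of loop L1, tests conditions (C1) and (C3) for a candidate normal Delta = [-3, 2], and then derives Theta and c_h from Eqs. (1) and (2); the choice of Delta and of the constant c_g are illustrative assumptions:

```python
# A partial mechanical check of Theorem 2, using the two occurrences
# of array B in statement s2 of loop L1: B[i1-i2+1, i1-2i2+1] and
# B[i1+i2-1, i1+i2-2].  Delta and c_g below are illustrative.

def vecmat(v, M):        # row vector times matrix
    return [sum(v[r] * M[r][c] for r in range(len(M))) for c in range(len(M[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

F1, f1 = [[1, -1], [1, -2]], [1, 1]      # first occurrence of B in s2
F2, f2 = [[1, 1], [1, 1]], [-1, -2]      # second occurrence of B in s2
delta, c_g = [-3, 2], 5                  # candidate data hyperplane (assumed)

c1 = vecmat(delta, F1) == vecmat(delta, F2)      # condition (C1)
c3 = dot(delta, f1) == dot(delta, f2)            # condition (C3)
theta = vecmat(delta, F1)                        # Eq. (1)
c_h = c_g - dot(delta, f1)                       # Eq. (2)
print(c1, c3, theta, c_h)   # -> True True [-1, -1] 6
```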

Theorem 2 can be used to determine whether a nested loop is communication-free. It can also be used as a procedure for finding a communication-free hyperplane partitioning systematically. Conditions (C1) to (C4) in Theorem 2 are used for finding the data hyperplane coefficient vectors. Condition (C5) checks whether the data hyperplane coefficient vectors found in the preceding steps are within the legal range. Following the determination of the data hyperplane coefficient vectors, the statement-iteration hyperplane coefficient vectors can be obtained by using Condition (C6). Similarly, Condition (C7) checks whether the statement-iteration hyperplane coefficient vectors are within the legal range. The data hyperplane constant terms and statement-iteration hyperplane constant terms can be obtained by using Conditions (C8) and (C9), respectively. If one of the conditions is violated, the whole procedure stops and concludes that the nested loop has no communication-free hyperplane partitioning.

On the other hand, combining Equations (3) and (5), a necessary condition for communication-free hyperplane partitioning can be derived as follows:

$\Phi_j \cdot (F_1^{s_i,v_j} - F_2^{s_i,v_j},\ F_1^{s_i,v_j} - F_3^{s_i,v_j},\ \ldots,\ F_1^{s_i,v_j} - F_{r_{i,j}}^{s_i,v_j},\ f_1^{s_i,v_j} - f_2^{s_i,v_j},\ f_1^{s_i,v_j} - f_3^{s_i,v_j},\ \ldots,\ f_1^{s_i,v_j} - f_{r_{i,j}}^{s_i,v_j}) = 0,$

for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. To satisfy the constraint that $\Phi_j$ is a non-zero row vector, the following condition should be true:

$\mathrm{Rank}(F_1^{s_i,v_j} - F_2^{s_i,v_j},\ \ldots,\ F_1^{s_i,v_j} - F_{r_{i,j}}^{s_i,v_j},\ f_1^{s_i,v_j} - f_2^{s_i,v_j},\ \ldots,\ f_1^{s_i,v_j} - f_{r_{i,j}}^{s_i,v_j}) < \dim(DS(v_j)), \qquad (9)$

for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. Note that this condition is similar to the result in [7] for loop-level hyperplane partitioning. We conclude the following corollary.

Corollary 3 Suppose $S = \{s_1, s_2, \ldots, s_m\}$ and $D = \{v_1, v_2, \ldots, v_n\}$ are the sets of statements and array variables, respectively. $F_k^{s_i,v_j}$ and $f_k^{s_i,v_j}$ are the array reference coefficient matrix and constant vector, respectively, where $i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$, and $k \in \{1, 2, \ldots, r_{i,j}\}$. If a communication-free hyperplane partitioning exists, then Eq. (9) must hold. $\Box$

Theorem 1 and Corollary 3 can be used to check the absence of communication-free hyperplane partitioning for a nested loop: their conditions are necessary but not sufficient for the existence of such a partitioning, so a violation proves absence. Theorem 1 is the statement-iteration space dimension test and Corollary 3 is the data space dimension test. To determine the existence of a communication-free hyperplane partitioning, we need to check the conditions in Theorem 2. The following example explains the finding of communication-free hyperplanes of statement-iteration spaces and data spaces.
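Corollary 3's test is a plain rank computation. The sketch below (our own helper, not from the paper) evaluates it in exact rational arithmetic for array variable $v_1 = A$ of loop L1, with the matrix formed, as in Eq. (9), by juxtaposing the columns of $F_1 - F_2$ with $f_1 - f_2$.

```python
from fractions import Fraction as Q

def rank(rows):
    """Rank of a matrix given as a list of rows, via Gaussian elimination
    over the rationals."""
    M = [[Q(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                t = M[i][c] / M[r][c]
                M[i] = [a - t * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# v1 = A in loop L1: columns of F1 - F2 followed by f1 - f2, written as rows
A_test = [[0, 1, 1],
          [0, -2, -2]]
print(rank(A_test), "< dim(DS(v1)) =", 2)
```

Since the rank (1) is strictly below the data-space dimension (2), the test does not rule out a communication-free partitioning for $A$.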

Example 5: Reconsider loop L1. The set of statements $S$ is $\{s_1, s_2\}$ and the set of array variables $D$ is $\{v_1, v_2\}$, where $v_1 = A$ and $v_2 = B$. The occurrences of the array variables are $r_{1,1} = 2$, $r_{1,2} = 1$, $r_{2,1} = 1$, and $r_{2,2} = 2$. From Section 2.1, the array reference coefficient matrices and constant vectors for statements $s_1$ and $s_2$ are listed below, respectively.

$$F_1^{s_1,v_1} = \begin{bmatrix} 1 & 0 \\ -1 & -1 \end{bmatrix}, \quad F_2^{s_1,v_1} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \quad F_1^{s_1,v_2} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},$$
$$f_1^{s_1,v_1} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad f_2^{s_1,v_1} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad f_1^{s_1,v_2} = \begin{bmatrix} 0 \\ -1 \end{bmatrix},$$
$$F_1^{s_2,v_2} = \begin{bmatrix} 1 & -1 \\ 1 & -2 \end{bmatrix}, \quad F_1^{s_2,v_1} = \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}, \quad F_2^{s_2,v_2} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$
$$f_1^{s_2,v_2} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad f_1^{s_2,v_1} = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad f_2^{s_2,v_2} = \begin{bmatrix} -1 \\ -2 \end{bmatrix}.$$

Since $\dim(\mathrm{span}(\bigcup_{j=1}^{2} \bigcup_{k=1}^{r_{i,j}} \mathrm{Ker}(F_k^{s_i,v_j}))) = 1$ is less than $\dim(SIS(s_i)) = 2$ for $i = 1, 2$, Theorem 1 shows that a communication-free hyperplane partitioning may exist for loop L1. Again, by Corollary 3, the loop is tested for the possible existence of a nontrivial communication-free hyperplane partitioning. For array variable $v_1$, the following inequality is satisfied:

$\mathrm{Rank}(F_1^{s_1,v_1} - F_2^{s_1,v_1},\ f_1^{s_1,v_1} - f_2^{s_1,v_1}) = 1 < \dim(DS(v_1)) = 2.$

Similarly, with respect to array variable $v_2$, the following inequality is obtained:

$\mathrm{Rank}(F_1^{s_2,v_2} - F_2^{s_2,v_2},\ f_1^{s_2,v_2} - f_2^{s_2,v_2}) = 1 < \dim(DS(v_2)) = 2.$

Although Eq. (9) holds for all array variables, this still cannot ensure that the loop has a nontrivial communication-free hyperplane partitioning. Using Theorem 2, we further check the existence of a nontrivial communication-free hyperplane partitioning; in the meantime, the statement-iteration and data hyperplanes will be derived if they exist. Recall that the dimensions of the data spaces $DS(v_1)$ and $DS(v_2)$ are two, so $\Phi_1$ and $\Phi_2$ can be assumed to be $[\phi_{11}, \phi_{12}]$ and $[\phi_{21}, \phi_{22}]$, respectively. The conditions listed in Theorem 2 are checked to determine the hyperplane coefficient vectors and constants. By Condition (C1) in Theorem 2, the following equations are obtained:

$\Phi_1 \cdot F_2^{s_1,v_1} = \Phi_1 \cdot F_1^{s_1,v_1} \quad (i = 1,\ j = 1,\ k = 2),$
$\Phi_2 \cdot F_2^{s_2,v_2} = \Phi_2 \cdot F_1^{s_2,v_2} \quad (i = 2,\ j = 2,\ k = 2).$

By Condition (C2) in Theorem 2,

$\Phi_2 \cdot F_1^{s_1,v_2} = \Phi_1 \cdot F_1^{s_1,v_1} \quad (i = 1,\ j = 2),$
$\Phi_2 \cdot F_1^{s_2,v_2} = \Phi_1 \cdot F_1^{s_2,v_1} \quad (i = 2,\ j = 2).$

By Condition (C3) in Theorem 2,

$\Phi_1 \cdot f_2^{s_1,v_1} = \Phi_1 \cdot f_1^{s_1,v_1} \quad (i = 1,\ j = 1,\ k = 2),$
$\Phi_2 \cdot f_2^{s_2,v_2} = \Phi_2 \cdot f_1^{s_2,v_2} \quad (i = 2,\ j = 2,\ k = 2).$

By Condition (C4) in Theorem 2,

$\Phi_2 \cdot (f_1^{s_2,v_2} - f_1^{s_1,v_2}) = \Phi_1 \cdot (f_1^{s_2,v_1} - f_1^{s_1,v_1}) \quad (i = 2,\ j = 2).$

Substituting $[\phi_{11}, \phi_{12}]$ and $[\phi_{21}, \phi_{22}]$ for $\Phi_1$ and $\Phi_2$, respectively, the above equations form a homogeneous linear system. Solving this homogeneous linear system, we obtain the general solution $(\phi_{11}, \phi_{12}, \phi_{21}, \phi_{22}) = (2r, r, 3r, -2r)$, where $r \in Q - \{0\}$. Therefore, $\Phi_1 = [2r, r]$ and $\Phi_2 = [3r, -2r]$. Next, we show that $\Phi_1$ and $\Phi_2$ satisfy Condition (C5):

$\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,1}} \mathrm{Ker}((F_k^{s_i,v_1})^t) = \{c_1[1, 1]^t \mid c_1 \in Q - \{0\}\}$
$\Longrightarrow (\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,1}} \mathrm{Ker}((F_k^{s_i,v_1})^t))' = \{[r_1, r_2]^t \mid r_1 \neq r_2,\ r_1, r_2 \in Q - \{0\}\}$
$\Longrightarrow \Phi_1^t = [2r, r]^t \in (\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,1}} \mathrm{Ker}((F_k^{s_i,v_1})^t))'.$

$\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,2}} \mathrm{Ker}((F_k^{s_i,v_2})^t) = \{c_2[1, -1]^t \mid c_2 \in Q - \{0\}\}$
$\Longrightarrow (\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,2}} \mathrm{Ker}((F_k^{s_i,v_2})^t))' = \{[r_1, r_2]^t \mid r_1 \neq -r_2,\ r_1, r_2 \in Q - \{0\}\}$
$\Longrightarrow \Phi_2^t = [3r, -2r]^t \in (\bigcup_{i=1}^{m} \bigcup_{k=1}^{r_{i,2}} \mathrm{Ker}((F_k^{s_i,v_2})^t))'.$
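The homogeneous system above can be solved mechanically. Below is a sketch (our own Gaussian-elimination nullspace over the rationals; the seven equation rows were expanded by hand from Conditions (C1)-(C4) of this example, in the unknowns $(\phi_{11}, \phi_{12}, \phi_{21}, \phi_{22})$):

```python
from fractions import Fraction as Q

def nullspace(rows):
    """Basis of the nullspace of a rational matrix (rows of equal length)."""
    M = [[Q(x) for x in r] for r in rows]
    ncols = len(M[0])
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Q(0)] * ncols
        v[free] = Q(1)
        for row, pc in zip(M, pivots):
            v[pc] = -row[free]
        basis.append(v)
    return basis

# Rows expanded from (C1)-(C4) of Example 5 in (phi11, phi12, phi21, phi22):
eqs = [
    [-1, 2, 0, 0],    # (C1)/(C3) for s1 on A
    [0, 0, 2, 3],     # (C1)/(C3) for s2 on B
    [-1, 1, 1, 1],    # (C2), i = 1, first component
    [0, 1, 1, 2],     # (C2), i = 1, second component
    [0, -1, 1, 1],    # (C2), i = 2, first component
    [-1, 1, -1, -2],  # (C2), i = 2, second component
    [1, -1, 1, 2],    # (C4)
]
sol = nullspace(eqs)
print(sol)   # one basis vector, proportional to (2, 1, 3, -2)
```

The one-dimensional nullspace confirms the general solution $(2r, r, 3r, -2r)$ up to the scalar $r$.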

Now the statement-iteration hyperplane coefficient vectors can be determined using Condition (C6) in Theorem 2:

$\Psi_1 = \Phi_1 \cdot F_1^{s_1,v_1} = \Phi_1 \cdot F_2^{s_1,v_1} = \Phi_2 \cdot F_1^{s_1,v_2} = [r, -r],$
$\Psi_2 = \Phi_2 \cdot F_1^{s_2,v_2} = \Phi_1 \cdot F_1^{s_2,v_1} = \Phi_2 \cdot F_2^{s_2,v_2} = [r, r].$

Note that the statement-iteration hyperplane coefficient vectors may be obtained using many different equations; e.g., $\Psi_1$ can be obtained using $\Phi_1 \cdot F_1^{s_1,v_1}$, $\Phi_1 \cdot F_2^{s_1,v_1}$, or $\Phi_2 \cdot F_1^{s_1,v_2}$. Conditions (C1) and (C2) in Theorem 2 ensure that all the equations lead to the same result. For the statement-iteration hyperplane coefficient vectors, Condition (C7) is satisfied:

$\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{1,j}} \mathrm{Ker}(F_k^{s_1,v_j})) = \{c_3[1, 1]^t \mid c_3 \in Q - \{0\}\}$
$\Longrightarrow (\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{1,j}} \mathrm{Ker}(F_k^{s_1,v_j})))^{\perp} = \{[r_1, -r_1]^t \mid r_1 \in Q - \{0\}\}$
$\Longrightarrow \Psi_1^t = [r, -r]^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{1,j}} \mathrm{Ker}(F_k^{s_1,v_j})))^{\perp}.$

$\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{2,j}} \mathrm{Ker}(F_k^{s_2,v_j})) = \{c_4[1, -1]^t \mid c_4 \in Q - \{0\}\}$
$\Longrightarrow (\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{2,j}} \mathrm{Ker}(F_k^{s_2,v_j})))^{\perp} = \{[r_2, r_2]^t \mid r_2 \in Q - \{0\}\}$
$\Longrightarrow \Psi_2^t = [r, r]^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \bigcup_{k=1}^{r_{2,j}} \mathrm{Ker}(F_k^{s_2,v_j})))^{\perp}.$
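As a quick numerical check (our own sketch, with $r = 1$), the products defining each $\Psi_i$ indeed coincide, and each $\Psi_i$ is orthogonal to the kernel direction of its statement:

```python
def vm(phi, F):
    # row vector times matrix
    return [sum(p * F[r][c] for r, p in enumerate(phi)) for c in range(len(F[0]))]

phi1, phi2 = [2, 1], [3, -2]                     # Phi_1, Phi_2 with r = 1
psi1 = vm(phi1, [[1, 0], [-1, -1]])              # Phi1 . F1^{s1,v1}
assert psi1 == vm(phi1, [[1, -1], [-1, 1]])      # = Phi1 . F2^{s1,v1}
assert psi1 == vm(phi2, [[1, 1], [1, 2]])        # = Phi2 . F1^{s1,v2}
psi2 = vm(phi2, [[1, -1], [1, -2]])              # Phi2 . F1^{s2,v2}
assert psi2 == vm(phi1, [[0, 1], [1, -1]])       # = Phi1 . F1^{s2,v1}
assert psi2 == vm(phi2, [[1, 1], [1, 1]])        # = Phi2 . F2^{s2,v2}
print(psi1, psi2)                                # [1, -1] [1, 1]
# Condition (C7): each Psi_i annihilates the kernel direction of its statement
assert sum(a * b for a, b in zip(psi1, [1, 1])) == 0
assert sum(a * b for a, b in zip(psi2, [1, -1])) == 0
```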

Next, we determine the data hyperplane constant terms. Because the hyperplanes are related to each other, once one hyperplane constant term is determined, the other constant terms are determined accordingly. Assuming $c_{g_1}$ is known, $c_{g_2}$, $c_{h_1}$, and $c_{h_2}$ can be determined using Conditions (C8) and (C9) as below:

$c_{g_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_1,v_1} + \Phi_2 \cdot f_1^{s_1,v_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_2,v_1} + \Phi_2 \cdot f_1^{s_2,v_2} = c_{g_1} + 3r,$
$c_{h_1} = c_{g_1} - \Phi_1 \cdot f_1^{s_1,v_1} = c_{g_1} - \Phi_1 \cdot f_2^{s_1,v_1} = c_{g_2} - \Phi_2 \cdot f_1^{s_1,v_2} = c_{g_1} + r,$
$c_{h_2} = c_{g_2} - \Phi_2 \cdot f_1^{s_2,v_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_2,v_1} = c_{g_2} - \Phi_2 \cdot f_2^{s_2,v_2} = c_{g_1} + 2r.$

Similarly, the statement-iteration and data hyperplane constant terms can be evaluated using many different equations. However, Conditions (C3) and (C4) in Theorem 2 ensure that they all lead to the same values.

It is clear that there exists at least one set of nonzero statement-iteration and data hyperplane coefficient vectors such that the conditions listed in Theorem 2 are all satisfied. By Theorem 2, this fact implies that the nested loop has a nontrivial communication-free hyperplane partitioning. The partition group is defined as the set of statement-iteration and data hyperplanes that are allocated to a processor. The partition group for this example follows. $G = h_1(I_{s_1}) \cup h_2(I_{s_2}) \cup g_1(D_{v_1}) \cup g_2(D_{v_2})$, where

$h_1(I_{s_1}) = \{I_{s_1} \mid [r, -r] \cdot I_{s_1} = c_{g_1} + r\},$
$h_2(I_{s_2}) = \{I_{s_2} \mid [r, r] \cdot I_{s_2} = c_{g_1} + 2r\},$
$g_1(D_{v_1}) = \{D_{v_1} \mid [2r, r] \cdot D_{v_1} = c_{g_1}\},$
$g_2(D_{v_2}) = \{D_{v_2} \mid [3r, -2r] \cdot D_{v_2} = c_{g_1} + 3r\}.$

Given loop bounds $1 \leq i_1 \leq 5$ and $1 \leq i_2 \leq 5$, for $r = 1$, the constant terms $c_{g_1}$ corresponding to the statement-iteration hyperplane coefficient vectors $\Psi_1$ and $\Psi_2$ range from $-5$ to $3$ and from $0$ to $8$, respectively. Where these two ranges intersect, the two statement-iteration hyperplanes are coupled together onto one processor; for the rest, just one statement-iteration hyperplane, of either $s_1$ or $s_2$, is allocated to a processor. The constant terms $c_{g_2}$, $c_{h_1}$, and $c_{h_2}$ evaluate to the following ranges:

$-2 \leq c_{g_2} \leq 11, \qquad -4 \leq c_{h_1} \leq 4, \qquad \text{and} \qquad 2 \leq c_{h_2} \leq 10.$
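These ranges can be reproduced by sweeping the $5 \times 5$ iteration space (a small sketch of our own, $r = 1$): each statement-iteration of $s_1$ lies on $i_1 - i_2 = c_{g_1} + 1$ and each of $s_2$ on $i_1 + i_2 = c_{g_1} + 2$, so the admissible $c_{g_1}$ values per statement are:

```python
N = 5
cg1_s1 = {i1 - i2 - 1 for i1 in range(1, N + 1) for i2 in range(1, N + 1)}  # from s1
cg1_s2 = {i1 + i2 - 2 for i1 in range(1, N + 1) for i2 in range(1, N + 1)}  # from s2
print(min(cg1_s1), max(cg1_s1))   # -5 3
print(min(cg1_s2), max(cg1_s2))   # 0 8
# Derived terms: cg2 = cg1 + 3; ch1 = cg1 + 1 (s1 only); ch2 = cg1 + 2 (s2 only)
all_cg1 = cg1_s1 | cg1_s2
print(min(all_cg1) + 3, max(all_cg1) + 3)   # -2 11
print(min(cg1_s1) + 1, max(cg1_s1) + 1)     # -4 4
print(min(cg1_s2) + 2, max(cg1_s2) + 2)     # 2 10
```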

The corresponding parallelized program is as follows.

doall c = -5, 8
  do i1 = max(c-3, 1), min(c+1, 5)
    i2 = -i1 + c + 2
    B[i1-i2+1, i1-2i2+1] = A[i2-1, i1-i2] * B[i1+i2-1, i1+i2-2]
  enddo
  do i1 = max(c+2, 1), min(c+6, 5)
    i2 = i1 - c - 1
    A[i1, -i1-i2-1] = A[i1-i2-1, -i1+i2+1] + B[i1+i2, i1+2i2-1]
  enddo
enddoall
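This SPMD form can be sanity-checked mechanically (a sketch of our own, not from the paper): iterating the doall as written must execute every original iteration $(i_1, i_2)$ of each of the two statements exactly once.

```python
N = 5
executed_s1 = {}   # instances of the second inner loop (the A-assignment, s1)
executed_s2 = {}   # instances of the first inner loop (the B-assignment, s2)
for c in range(-5, 9):                               # doall c = -5, 8
    for i1 in range(max(c - 3, 1), min(c + 1, N) + 1):
        key = (i1, -i1 + c + 2)                      # i2 = -i1 + c + 2
        executed_s2[key] = executed_s2.get(key, 0) + 1
    for i1 in range(max(c + 2, 1), min(c + 6, N) + 1):
        key = (i1, i1 - c - 1)                       # i2 = i1 - c - 1
        executed_s1[key] = executed_s1.get(key, 0) + 1
full = {(i1, i2) for i1 in range(1, N + 1) for i2 in range(1, N + 1)}
assert set(executed_s1) == full and all(v == 1 for v in executed_s1.values())
assert set(executed_s2) == full and all(v == 1 for v in executed_s2.values())
print("both statements: every iteration executed exactly once")
```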

Fig. 3 illustrates the communication-free hyperplane partitionings for a particular partition group, $r = 1$ and $c_{g_1} = 2$. $\Box$

The communication-free hyperplane partitioning technique for a perfectly nested loop has been discussed in this section. Our method treats statements within a loop body as separate schedulable units and considers both iteration and data spaces at the same time. Partition groups are determined using affine array reference functions directly, instead of using data dependence vectors.

Figure 3: Communication-free statement-iteration hyperplanes and data hyperplanes for a partition group of loop (L1), where $r = 1$ and $c_{g_1} = 2$. (a) Statement-iteration hyperplane of $SIS(s_1)$. (b) Statement-iteration hyperplane of $SIS(s_2)$. (c) Data hyperplane of $DS(A)$. (d) Data hyperplane of $DS(B)$.

5 Communication-Free Hyperplane Partitioning for Sequences of Imperfectly Nested Loops

The conditions presented in Section 4 for communication-free hyperplane partitioning are applicable to the general case of sequences of imperfectly nested loops. In a perfectly nested loop, all statements are enclosed at the same depth of the nested loop; i.e., the statement-iteration space of each statement has the same dimensionality. The statement-iteration spaces of two statements in imperfectly nested loops may have different dimensions. Since each statement-iteration is a schedulable unit and the partitioning technique is independent of the dimensionality of statement-iteration spaces, Theorem 2 can be directly applied to sequences of imperfectly nested loops. The following example demonstrates the technique applied to sequences of imperfectly nested loops.

Example 6: Consider the following sequence of nested loops L2.

do i1 = 1, N
  do i2 = 1, N
s1:   A[i1+i2, 1] = B[i1+i2+1, i1+i2+2] + C[i1+1, -2i1+2i2, 2i1-i2+1]
    do i3 = 1, N
s2:     B[i1+i3+1, i2+i3+1] = A[2i1+2i3, i2+i3] + C[i1+i2+1, -i2+i3+1, i1-i2+1]
    enddo
  enddo
enddo
do i1 = 1, N                                                        (L2)
  do i2 = 1, N
    do i3 = 1, N
s3:     C[i1+1, i2, i2+i3] = A[2i1+3i2+i3, i1+i2+2] + B[i1+i2, i1-i3+1]
    enddo
s4:   A[i1, i2+3] = B[i1-i2, i1-i2+2] + C[i1+i2, -i2, -i2]
  enddo
enddo

The set of statements $S$ is $\{s_1, s_2, s_3, s_4\}$. The set of array variables is $D = \{v_1, v_2, v_3\}$, where $v_1$, $v_2$, and $v_3$ represent $A$, $B$, and $C$, respectively. The values of $r_{1,1}, r_{1,2}, r_{1,3}, \ldots, r_{4,3}$ are all 1. We use Theorem 1 and Corollary 3 to verify whether (L2) has no communication-free hyperplane partitioning. Since $\dim(\sum_{v \in D} \mathrm{Ker}(F_1^{s_i,v})) = 1$, which is smaller than $\dim(SIS(s_i))$ for $i = 1, \ldots, 4$, Theorem 1 cannot ensure that (L2) has no communication-free hyperplane partitioning. Corollary 3 is of no use here because all the values of $r_{i,j}$ are 1, for $i = 1, \ldots, 4$ and $j = 1, \ldots, 3$. Further examination is necessary, because Theorem 1 and Corollary 3 cannot prove that (L2) has no communication-free hyperplane partitioning. From Theorem 2, if a communication-free hyperplane partitioning exists, the conditions listed in Theorem 2 should be satisfied; otherwise, (L2) has no communication-free hyperplane partitioning. Since the dimensions of the data spaces $DS(v_1)$, $DS(v_2)$, and $DS(v_3)$ are 2, 2, and 3, respectively, without loss of generality the data hyperplane coefficient vectors can be assumed to be $\Phi_1 = [\phi_{11}, \phi_{12}]$, $\Phi_2 = [\phi_{21}, \phi_{22}]$, and $\Phi_3 = [\phi_{31}, \phi_{32}, \phi_{33}]$, respectively. In what follows, the requirements for the feasibility of communication-free hyperplane partitioning are examined one by one. There is no need to examine Conditions (C1) and (C3) because all the values of $r_{i,j}$ are 1. By Condition (C2), we obtain

$\Phi_2 \cdot F_1^{s_1,v_2} = \Phi_1 \cdot F_1^{s_1,v_1}, \qquad \Phi_3 \cdot F_1^{s_1,v_3} = \Phi_1 \cdot F_1^{s_1,v_1},$
$\Phi_2 \cdot F_1^{s_2,v_2} = \Phi_1 \cdot F_1^{s_2,v_1}, \qquad \Phi_3 \cdot F_1^{s_2,v_3} = \Phi_1 \cdot F_1^{s_2,v_1},$
$\Phi_2 \cdot F_1^{s_3,v_2} = \Phi_1 \cdot F_1^{s_3,v_1}, \qquad \Phi_3 \cdot F_1^{s_3,v_3} = \Phi_1 \cdot F_1^{s_3,v_1},$
$\Phi_2 \cdot F_1^{s_4,v_2} = \Phi_1 \cdot F_1^{s_4,v_1}, \qquad \Phi_3 \cdot F_1^{s_4,v_3} = \Phi_1 \cdot F_1^{s_4,v_1}.$

By Condition (C4), we obtain

$\Phi_2 \cdot (f_1^{s_2,v_2} - f_1^{s_1,v_2}) = \Phi_1 \cdot (f_1^{s_2,v_1} - f_1^{s_1,v_1}), \qquad \Phi_3 \cdot (f_1^{s_2,v_3} - f_1^{s_1,v_3}) = \Phi_1 \cdot (f_1^{s_2,v_1} - f_1^{s_1,v_1}),$
$\Phi_2 \cdot (f_1^{s_3,v_2} - f_1^{s_1,v_2}) = \Phi_1 \cdot (f_1^{s_3,v_1} - f_1^{s_1,v_1}), \qquad \Phi_3 \cdot (f_1^{s_3,v_3} - f_1^{s_1,v_3}) = \Phi_1 \cdot (f_1^{s_3,v_1} - f_1^{s_1,v_1}),$
$\Phi_2 \cdot (f_1^{s_4,v_2} - f_1^{s_1,v_2}) = \Phi_1 \cdot (f_1^{s_4,v_1} - f_1^{s_1,v_1}), \qquad \Phi_3 \cdot (f_1^{s_4,v_3} - f_1^{s_1,v_3}) = \Phi_1 \cdot (f_1^{s_4,v_1} - f_1^{s_1,v_1}).$

Solving the above linear system, the general solution is $(\phi_{11}, \phi_{12}, \phi_{21}, \phi_{22}, \phi_{31}, \phi_{32}, \phi_{33}) = (t, -t, 2t, -t, t, t, t)$, $t \in Q - \{0\}$. Therefore, $\Phi_1 = [t, -t]$, $\Phi_2 = [2t, -t]$, and $\Phi_3 = [t, t, t]$.
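The matrices below were read off the subscript expressions of loop L2; a quick check of our own (with $t = 1$) that the claimed solution satisfies Condition (C2) for every statement, which also produces the statement-iteration vectors determined next:

```python
def vm(phi, F):
    # row vector times matrix
    return [sum(p * F[r][c] for r, p in enumerate(phi)) for c in range(len(F[0]))]

phi1, phi2, phi3 = [1, -1], [2, -1], [1, 1, 1]        # Phi_1..Phi_3 with t = 1
# F_1^{s_i,v_j} read off the subscripts of loop L2 (keyed by statement i)
FA = {1: [[1, 1], [0, 0]], 2: [[2, 0, 2], [0, 1, 1]],
      3: [[2, 3, 1], [1, 1, 0]], 4: [[1, 0], [0, 1]]}
FB = {1: [[1, 1], [1, 1]], 2: [[1, 0, 1], [0, 1, 1]],
      3: [[1, 1, 0], [1, 0, -1]], 4: [[1, -1], [1, -1]]}
FC = {1: [[1, 0], [-2, 2], [2, -1]], 2: [[1, 1, 0], [0, -1, 1], [1, -1, 0]],
      3: [[1, 0, 0], [0, 1, 0], [0, 1, 1]], 4: [[1, 1], [0, -1], [0, -1]]}
psi = {}
for i in (1, 2, 3, 4):
    products = [vm(phi1, FA[i]), vm(phi2, FB[i]), vm(phi3, FC[i])]
    assert products[0] == products[1] == products[2]   # Condition (C2) holds
    psi[i] = products[0]
print(psi)   # {1: [1, 1], 2: [2, -1, 1], 3: [1, 2, 1], 4: [1, -1]}
```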

The verification of Condition (C5) is as follows:

$\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_1})^t) = \{c_1[0, 1]^t \mid c_1 \in Q - \{0\}\} \Longrightarrow \Phi_1^t \in (\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_1})^t))'.$
$\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_2})^t) = \{c_2[1, -1]^t \mid c_2 \in Q - \{0\}\} \cup \{c_3[1, 1]^t \mid c_3 \in Q - \{0\}\} \Longrightarrow \Phi_2^t \in (\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_2})^t))'.$
$\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_3})^t) = \{c_4[2, -1, -2]^t \mid c_4 \in Q - \{0\}\} \cup \{c_5[0, 1, -1]^t \mid c_5 \in Q - \{0\}\} \Longrightarrow \Phi_3^t \in (\bigcup_{i=1}^{m} \mathrm{Ker}((F_1^{s_i,v_3})^t))'.$

All the data hyperplane coefficient vectors are within the legal range. The statement-iteration hyperplane coefficient vectors can be determined by Condition (C6) as follows:

$\Psi_1 = \Phi_1 \cdot F_1^{s_1,v_1} = \Phi_2 \cdot F_1^{s_1,v_2} = \Phi_3 \cdot F_1^{s_1,v_3} = [t, t],$
$\Psi_2 = \Phi_1 \cdot F_1^{s_2,v_1} = \Phi_2 \cdot F_1^{s_2,v_2} = \Phi_3 \cdot F_1^{s_2,v_3} = [2t, -t, t],$
$\Psi_3 = \Phi_1 \cdot F_1^{s_3,v_1} = \Phi_2 \cdot F_1^{s_3,v_2} = \Phi_3 \cdot F_1^{s_3,v_3} = [t, 2t, t],$
$\Psi_4 = \Phi_1 \cdot F_1^{s_4,v_1} = \Phi_2 \cdot F_1^{s_4,v_2} = \Phi_3 \cdot F_1^{s_4,v_3} = [t, -t].$

The legality of these statement-iteration hyperplane coefficient vectors is then checked by Condition (C7) as follows:

$\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_1,v_j})) = \{c_6[1, -1]^t \mid c_6 \in Q - \{0\}\} \Longrightarrow \Psi_1^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_1,v_j})))^{\perp}.$
$\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_2,v_j})) = \{c_7[1, 1, -1]^t \mid c_7 \in Q - \{0\}\} \Longrightarrow \Psi_2^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_2,v_j})))^{\perp}.$
$\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_3,v_j})) = \{c_8[1, -1, 1]^t \mid c_8 \in Q - \{0\}\} \Longrightarrow \Psi_3^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_3,v_j})))^{\perp}.$
$\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_4,v_j})) = \{c_9[1, 1]^t \mid c_9 \in Q - \{0\}\} \Longrightarrow \Psi_4^t \in (\mathrm{span}(\bigcup_{j=1}^{n} \mathrm{Ker}(F_1^{s_4,v_j})))^{\perp}.$
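A one-line orthogonality check of Condition (C7) for each statement (our own sketch, $t = 1$; one kernel generator per span, as computed above):

```python
psi = {1: [1, 1], 2: [2, -1, 1], 3: [1, 2, 1], 4: [1, -1]}   # Psi_i with t = 1
ker = {1: [1, -1], 2: [1, 1, -1], 3: [1, -1, 1], 4: [1, 1]}  # span generators
for i in psi:
    # Psi_i must annihilate every vector in span(union of kernels)
    assert sum(a * b for a, b in zip(psi[i], ker[i])) == 0
print("all statement-iteration vectors are legal")
```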

From the above observation, all the statement-iteration and data hyperplane coefficient vectors are legal. This fact reveals that the nested loops have communication-free hyperplane partitionings. Next, the data and statement-iteration hyperplane constant terms are decided. First, let one data hyperplane constant term be fixed, say $c_{g_1}$. The rest of the data hyperplane constant terms can be determined by Condition (C8):

$c_{g_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_1,v_1} + \Phi_2 \cdot f_1^{s_1,v_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_2,v_1} + \Phi_2 \cdot f_1^{s_2,v_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_3,v_1} + \Phi_2 \cdot f_1^{s_3,v_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_4,v_1} + \Phi_2 \cdot f_1^{s_4,v_2} = c_{g_1} + t,$
$c_{g_3} = c_{g_1} - \Phi_1 \cdot f_1^{s_1,v_1} + \Phi_3 \cdot f_1^{s_1,v_3} = c_{g_1} - \Phi_1 \cdot f_1^{s_2,v_1} + \Phi_3 \cdot f_1^{s_2,v_3} = c_{g_1} - \Phi_1 \cdot f_1^{s_3,v_1} + \Phi_3 \cdot f_1^{s_3,v_3} = c_{g_1} - \Phi_1 \cdot f_1^{s_4,v_1} + \Phi_3 \cdot f_1^{s_4,v_3} = c_{g_1} + 3t.$

Similarly, the statement-iteration hyperplane constant terms can be determined by Condition (C9) after data hyperplane constant terms have been decided.

$c_{h_1} = c_{g_1} - \Phi_1 \cdot f_1^{s_1,v_1} = c_{g_2} - \Phi_2 \cdot f_1^{s_1,v_2} = c_{g_3} - \Phi_3 \cdot f_1^{s_1,v_3} = c_{g_1} + t,$
$c_{h_2} = c_{g_1} - \Phi_1 \cdot f_1^{s_2,v_1} = c_{g_2} - \Phi_2 \cdot f_1^{s_2,v_2} = c_{g_3} - \Phi_3 \cdot f_1^{s_2,v_3} = c_{g_1},$
$c_{h_3} = c_{g_1} - \Phi_1 \cdot f_1^{s_3,v_1} = c_{g_2} - \Phi_2 \cdot f_1^{s_3,v_2} = c_{g_3} - \Phi_3 \cdot f_1^{s_3,v_3} = c_{g_1} + 2t,$
$c_{h_4} = c_{g_1} - \Phi_1 \cdot f_1^{s_4,v_1} = c_{g_2} - \Phi_2 \cdot f_1^{s_4,v_2} = c_{g_3} - \Phi_3 \cdot f_1^{s_4,v_3} = c_{g_1} + 3t.$
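With the constant vectors read off loop L2's subscripts, Conditions (C8) and (C9) can be checked numerically (a sketch of our own, $t = 1$ and $c_{g_1} = 0$):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

phi1, phi2, phi3 = [1, -1], [2, -1], [1, 1, 1]        # Phi_1..Phi_3 with t = 1
# f_1^{s_i,v_j} read off the subscripts of loop L2 (keyed by statement i)
fA = {1: [0, 1], 2: [0, 0], 3: [0, 2], 4: [0, 3]}
fB = {1: [1, 2], 2: [1, 1], 3: [0, 1], 4: [0, 2]}
fC = {1: [1, 0, 1], 2: [1, 1, 1], 3: [1, 0, 0], 4: [0, 0, 0]}
cg1 = 0
# (C8): every statement yields the same cg2 and cg3
cg2 = {cg1 - dot(phi1, fA[i]) + dot(phi2, fB[i]) for i in (1, 2, 3, 4)}
cg3 = {cg1 - dot(phi1, fA[i]) + dot(phi3, fC[i]) for i in (1, 2, 3, 4)}
assert cg2 == {1} and cg3 == {3}          # cg1 + t and cg1 + 3t
# (C9): the three expressions for each ch_i agree
ch = {}
for i in (1, 2, 3, 4):
    vals = {cg1 - dot(phi1, fA[i]),
            min(cg2) - dot(phi2, fB[i]),
            min(cg3) - dot(phi3, fC[i])}
    assert len(vals) == 1
    ch[i] = vals.pop()
print(ch)   # {1: 1, 2: 0, 3: 2, 4: 3}, i.e., cg1 + t, cg1, cg1 + 2t, cg1 + 3t
```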

The corresponding partition group is as follows. $G = h_1(I_{s_1}) \cup h_2(I_{s_2}) \cup h_3(I_{s_3}) \cup h_4(I_{s_4}) \cup g_1(D_{v_1}) \cup g_2(D_{v_2}) \cup g_3(D_{v_3})$, where

$h_1(I_{s_1}) = \{I_{s_1} \mid [t, t] \cdot I_{s_1} = c_{g_1} + t\},$
$h_2(I_{s_2}) = \{I_{s_2} \mid [2t, -t, t] \cdot I_{s_2} = c_{g_1}\},$
$h_3(I_{s_3}) = \{I_{s_3} \mid [t, 2t, t] \cdot I_{s_3} = c_{g_1} + 2t\},$
$h_4(I_{s_4}) = \{I_{s_4} \mid [t, -t] \cdot I_{s_4} = c_{g_1} + 3t\},$
$g_1(D_{v_1}) = \{D_{v_1} \mid [t, -t] \cdot D_{v_1} = c_{g_1}\},$
$g_2(D_{v_2}) = \{D_{v_2} \mid [2t, -t] \cdot D_{v_2} = c_{g_1} + t\},$
$g_3(D_{v_3}) = \{D_{v_3} \mid [t, t, t] \cdot D_{v_3} = c_{g_1} + 3t\}.$

Fig. 4 illustrates the communication-free hyperplane partitionings for a partition group, where $t = 1$ and $c_{g_1} = 0$. The corresponding parallelized program is as follows.

doall c = -7, 18
  do i1 = max(min(c-4, ⌈(c-4)/2⌉), 1), min(max(c, ⌊(c+4)/2⌋), 5)
    if ( max(c-4, 1) <= i1 <= min(c, 5) ) then
      i2 = c - i1 + 1
      A[i1+i2, 1] = B[i1+i2+1, i1+i2+2] + C[i1+1, -2i1+2i2, 2i1-i2+1]
    endif
    do i2 = max(2i1-c+1, 1), min(2i1-c+5, 5)
      i3 = c - 2i1 + i2
      B[i1+i3+1, i2+i3+1] = A[2i1+2i3, i2+i3] + C[i1+i2+1, -i2+i3+1, i1-i2+1]
    enddo
  enddo
  do i1 = max(c-13, 1), min(c-1, 5)
    do i2 = max(⌈(c-i1-3)/2⌉, 1), min(⌊(c-i1+1)/2⌋, 5)
      i3 = c - i1 - 2i2 + 2
      C[i1+1, i2, i2+i3] = A[2i1+3i2+i3, i1+i2+2] + B[i1+i2, i1-i3+1]
    enddo
  enddo
  do i1 = max(c+4, 1), min(c+8, 5)
    i2 = i1 - c - 3
    A[i1, i2+3] = B[i1-i2, i1-i2+2] + C[i1+i2, -i2, -i2]
  enddo
enddoall

6 Conclusions

This paper presents techniques for finding statement-level communication-free hyperplane partitionings for a perfectly nested loop and for sequences of imperfectly nested loops. The necessary and sufficient conditions for the feasibility of communication-free partitioning along hyperplanes are proposed. The techniques can be applied to loops with affine array references and do not use any information about data dependence distances or direction vectors.

Although our goal is to determine communication-free partitionings for loops, in reality most loops are not communication-free. If a program is not communication-free, the technique can be

Figure 4: Communication-free statement-iteration hyperplanes and data hyperplanes for a partition group of loop (L2), where $t = 1$ and $c_{g_1} = 0$. (a) Statement-iteration hyperplane of $SIS(s_1)$. (b) Statement-iteration hyperplane of $SIS(s_2)$. (c) Statement-iteration hyperplane of $SIS(s_3)$. (d) Statement-iteration hyperplane of $SIS(s_4)$. (e) Data hyperplane of $DS(A)$. (f) Data hyperplane of $DS(B)$. (g) Data hyperplane of $DS(C)$.

used to identify subsets of the statement-iteration and data spaces that are communication-free. For the other statement-iterations, it is necessary to generate communication code. Two important tasks in our future work are to develop heuristics for searching for a subset of statement-iterations that is communication-free and to generate efficient code when communication is inevitable.

References

[1] J. M. Anderson and M. S. Lam, "Global optimizations for parallelism and locality on scalable parallel machines," in Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation, pp. 112-125, June 1993.
[2] T. S. Chen and J. P. Sheu, "Communication-free data allocation techniques for parallelizing compilers on multicomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 924-938, Sept. 1994.
[3] M. Gupta and P. Banerjee, "Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 3, pp. 179-193, Mar. 1992.
[4] S. Hiranandani, K. Kennedy, and C. W. Tseng, "Compiling Fortran D for MIMD distributed-memory machines," Communications of the ACM, vol. 35, pp. 66-80, Aug. 1992.
[5] S. Hiranandani, K. Kennedy, and C. W. Tseng, "Evaluating compiler optimizations for Fortran D," Journal of Parallel and Distributed Computing, vol. 21, pp. 27-45, 1994.
[6] K. Hoffman and R. Kunze, Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., second ed., 1971.
[7] C.-H. Huang and P. Sadayappan, "Communication-free hyperplane partitioning of nested loops," Journal of Parallel and Distributed Computing, vol. 19, pp. 90-102, 1993.
[8] A. H. Karp, "Programming for parallelism," IEEE Computer Magazine, vol. 20, pp. 43-57, May 1987.
[9] C. Koelbel, Compiling Programs for Nonshared Memory Machines. PhD thesis, Dept. of Computer Science, Purdue University, Nov. 1990.
[10] C. Koelbel and P. Mehrotra, "Compiling global name-space parallel loops for distributed execution," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 440-451, Oct. 1991.
[11] J. Li and M. Chen, "Index domain alignment: Minimizing cost of cross-referencing between distributed arrays," in Proceedings of the 3rd Symposium on the Frontiers of Massively Parallel Computation, pp. 424-433, Oct. 1990.
[12] J. Li and M. Chen, "The data alignment phase in compiling programs for distributed-memory machines," Journal of Parallel and Distributed Computing, vol. 13, pp. 213-221, 1991.
[13] A. W. Lim and M. S. Lam, "Communication-free parallelization via affine transformations," in Proceedings of the 7th Workshop on Programming Languages and Compilers for Parallel Computing, Aug. 1994.
[14] M. Mace, Memory Storage Patterns in Parallel Processing. Kluwer Academic Publishers, 1987.
[15] J. Ramanujam and P. Sadayappan, "Compile-time techniques for data distribution in distributed memory machines," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 472-482, Oct. 1991.
[16] A. Rogers and K. Pingali, "Process decomposition through locality of reference," in Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation, pp. 69-80, June 1989.
[17] M. Rosing, R. B. Schnabel, and R. P. Weaver, "The DINO parallel programming language," Journal of Parallel and Distributed Computing, vol. 13, pp. 30-42, 1991.
[18] C.-W. Tseng, An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines. PhD thesis, Dept. of Computer Science, Rice University, Jan. 1993.
[19] M. E. Wolf and M. S. Lam, "A loop transformation theory and an algorithm to maximize parallelism," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 452-471, Oct. 1991.
[20] M. J. Wolfe, High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1996.
[21] H. P. Zima, P. Brezany, and B. M. Chapman, "SUPERB and Vienna Fortran," Parallel Computing, vol. 20, pp. 1487-1517, 1994.

