ACCESS AND ALIGNMENT OF DATA

J. Jorda, A. Mzoughi, D. Litaize
Institut de Recherche en Informatique de Toulouse
Université Paul Sabatier
118 Route de Narbonne, 31062 Toulouse cedex, France

ABSTRACT

Data access is a major problem in array and vector processors. Since one solution is to use numerous independent memory banks in parallel, one question arises: how can we store a matrix of size $N \times N$ in the different memory banks so as to allow conflict-free access to commonly used vectors (rows, columns, diagonals, etc.)? Many solutions have already been discussed. For example, one can skew the data across the different memory banks. The skewing scheme we introduce combines the advantages of the Budnik and Kuck principles and of Lawrie's scheme, without their drawbacks. In fact, the idea is based on part of the study of the scheme proposed by Lawrie. It allows, for a system including $N$ memory banks, conflict-free access to various slices of data. It is characterized by both efficient use of the memory banks and simplicity of implementation. Finally, we suggest an alignment network to be used with this storage scheme. Simulations showed that this network is able to align the accessed vectors.

Keywords: Alignment network, array processors, memory conflict, parallel memory, skewing schemes, vector processors
1. Introduction
Many solutions to the problem of data access have been proposed. Budnik and Kuck [1] introduced a linear storage scheme allowing access without contention to rows, columns, diagonals and square blocks of an $N \times N$ matrix, but using a prime number of memory modules greater than $N$, which implies complex hardware for generating the addresses of vector elements and a crossbar network for data alignment. Lawrie [2] generalized the linear storage scheme with a relatively easy address generation. His scheme enables conflict-free access to the basic vectors of a matrix (rows, columns, diagonals and $\sqrt{N} \times \sqrt{N}$ square blocks). But his solution implies the use of $2N$ memory banks for matrices no larger than $N \times N$. Thus, his model suffers from under-utilization of the memory banks. Harper III and Jump [3] presented an evaluation of linear storage schemes and showed that they improve the average performance of the memory system. More recently, an algebraic theory of periodic skewing schemes was introduced by Wijshoff and van Leeuwen [4] (the fundamentals for two-dimensional schemes are due to Budnik and Kuck [1] and Shapiro [5]).

Non-linear schemes have also been proposed. De-lei Lee [6] introduced a storage scheme that does not allow diagonal access and that requires four times the data in memory. Weiss [7] uses an efficient storage scheme, but no alignment network is presented.

The efficiency of a storage scheme depends on the number of memory banks, the scheme used and the hardware required to perform the necessary operations. A network must be used with such storage to align data coming from memory before processing, and going to memory afterwards. An $N \times N$ crossbar could be used, but its cost is very high. In general, multistage interconnection networks are used in
multiprocessor systems because of their lower cost [8], [2], [9]. These networks can connect any input to any output, but simultaneous connections may lead to conflicts (in such a connection, a switching element would have to route different inputs to the same output). Their properties have been studied with numerous mathematical approaches (permutations, group theory or number theory). An algebraic theory based on tensor products [10] has been presented in order to specify and verify the properties of these networks.

The skewing scheme we introduce allows, for a system including $N$ memory banks ($N = 2^{2n+1}$), conflict-free access to various slices of data for $(N-1) \times (N-1)$ matrices (matrices larger than $(N-1) \times (N-1)$ are subdivided into submatrices of size $(N-1) \times (N-1)$). Our principle is to split an $A(N-1, N-1)$ matrix into two submatrices: the left half $A_l(N-1, N/2)$ and the right half $A_r(N-1, N/2-1)$. Storing the initial matrix $A$ takes two steps. First, we store $A_l$, the first half of $A$ (columns $0$ to $N/2-1$), with Lawrie's classic skewing scheme: $a_{i,j} \in A_l$ is stored at location $(2^n+1)\,i + 2j$, and the first element of $A_l$ is stored in memory module number 0. Then we store $A_r$, the second half of $A$ (columns $N/2$ to $N-2$), with the same linear scheme: $a_{i,j} \in A_r$ is stored at location $(2^n+1)\,i + 2j$, but the basis (the number of the memory bank in which the first element of $A_r$ is stored) is $(2^n+3)$ instead of 0.

In this paper, we demonstrate (using a number-theoretic approach) that this storage scheme allows conflict-free access to rows, columns, the forward diagonal and $M \times M$ square blocks ($M \le \sqrt{N/2}$). It is characterized by both efficient use of the memory banks and simplicity of implementation. We finally suggest an alignment network to be used with this storage scheme. It is a multistage interconnection network with $\log_2 N + 1$ identical stages (each consisting of a perfect shuffle followed by $N/2$ switching elements), followed by an inverse shuffle at the output.
2. General Formula of the Two Basis Skewing Scheme

This skewing scheme is to be used with $2N = 2 \cdot 2^{2n}$ memory banks; it allows the storage of $(2N-1) \times (2N-1)$ matrices and ensures conflict-free access to rows, columns, the forward diagonal and $M \times M$ square blocks (where $M \le \sqrt{N}$). We define the two-basis periodic skewing scheme by the general formula (bank numbers being taken modulo $2N$):

$$\mu_{i,j} = (\sqrt{N}+1)\,i + 2j + b(j), \quad \text{where } b(j) = \begin{cases} 0 & \text{if } 0 \le j \le N-1 \\ \sqrt{N}+3 & \text{if } N \le j \le 2N-2 \end{cases} \qquad (1)$$

It is a $(\sqrt{N}+1,\ 2)$ skewing scheme on the two halves of the matrix: the first one, $A_l$, is stored with basis 0, and the second one, $A_r$, with basis $(\sqrt{N}+3)$. Fig. 1 shows the partition made on matrix $A$. The basis is the number of the memory bank in which the first element of each submatrix is stored. The scheme can be seen as a periodic skewing scheme of the kind studied by Wijshoff and van Leeuwen [4]. In fact, the general formula can be written:

$$\mu_{i,j} = (\sqrt{N}+1)\,i + 2j + (\sqrt{N}+3)\,k, \quad \text{where } k = \begin{cases} 0 & \text{if } 0 \le j \le N-1 \\ 1 & \text{if } N \le j \le 2N-2 \end{cases}$$
[Fig. 1. The partition of matrix A in two submatrices Al and Ar: columns 0 to N-1 form Al, columns N to 2N-2 form Ar.]
That is to say, the two-dimensional matrix $A$ is considered as a three-dimensional matrix, as shown in Fig. 2.

[Fig. 2. The transformation of matrix A into a three-dimensional matrix: Al (columns 0 to N-1) and Ar (columns N to 2N-2) form the two planes.]
For example, for $N = 4$, i.e. $2N = 8$ memory banks, we have (one line per matrix row; "-" marks the bank left unused by that row):

Bank:    0      1      2      3      4      5      6      7
Row 0:  a0,0   a0,6   a0,1    -     a0,2   a0,4   a0,3   a0,5
Row 1:  a1,4   a1,3   a1,5   a1,0   a1,6   a1,1    -     a1,2
Row 2:  a2,1    -     a2,2   a2,4   a2,3   a2,5   a2,0   a2,6
Row 3:  a3,5   a3,0   a3,6   a3,1    -     a3,2   a3,4   a3,3
Row 4:  a4,2   a4,4   a4,3   a4,5   a4,0   a4,6   a4,1    -
Row 5:  a5,6   a5,1    -     a5,2   a5,4   a5,3   a5,5   a5,0
Row 6:  a6,3   a6,5   a6,0   a6,6   a6,1    -     a6,2   a6,4
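The layout above can be regenerated from formula (1). The following is a minimal sketch, not the authors' implementation: the names `bank` and `layout` are illustrative, the parameter `n_half` stands for the paper's $N$ (so there are `2 * n_half` banks), and the bank number is assumed to be the formula's value reduced modulo $2N$.

```python
import math


def bank(i, j, n_half):
    """Bank of a[i][j] under formula (1); n_half is the paper's N (2N banks)."""
    root = math.isqrt(n_half)                      # sqrt(N) = 2**n
    basis = 0 if j <= n_half - 1 else root + 3     # b(j): 0 for A_l, sqrt(N)+3 for A_r
    return ((root + 1) * i + 2 * j + basis) % (2 * n_half)


def layout(n_half):
    """rows[i][b] is the column j of row i stored in bank b, or None if unused."""
    size = 2 * n_half - 1                          # matrix is (2N-1) x (2N-1)
    rows = []
    for i in range(size):
        row = [None] * (2 * n_half)
        for j in range(size):
            row[bank(i, j, n_half)] = j
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the N = 4 layout, one matrix row per line, one bank per column.
    for i, row in enumerate(layout(4)):
        print("  ".join(" -  " if j is None else f"a{i},{j}" for j in row))
```

Each matrix row has $2N-1$ elements spread over $2N$ banks, which is why exactly one bank per row stays empty.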
3. Conflict-Free Access with the Two Basis Skewing Scheme
3.1. Row access

Theorem 1. The two-basis skewing scheme ensures conflict-free access to rows, for matrices of maximal size $(2N-1) \times (2N-1)$.

Proof. Let $a_{i,j}$ and $a_{i,k}$ be two elements of the same matrix row. There will be memory contention while accessing these elements if and only if:

$$\mu_{i,j} \equiv \mu_{i,k} \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+1)\,i + 2j + b(j) \equiv (\sqrt{N}+1)\,i + 2k + b(k) \pmod{2N}$$
$$\Leftrightarrow 2j + b(j) \equiv 2k + b(k) \pmod{2N} \qquad (a)$$

If the two elements are in the same submatrix ($A_l$ or $A_r$), then $b(j) = b(k)$ and condition (a) becomes $2j \equiv 2k \pmod{2N}$. Since $0 \le j, k \le N-1$ or $N \le j, k \le 2N-2$, we have $0 \le 2j \bmod 2N,\ 2k \bmod 2N \le 2N-2$, and these residues are distinct for distinct columns of the same submatrix. The only solution is $j = k$, that is to say there can be no conflict between two elements of a row if they are in the same submatrix.

Now assume that $a_{i,j} \in A_l$ and $a_{i,k} \in A_r$. Condition (a) is now $2(j-k) \equiv \sqrt{N}+3 \pmod{2N}$. Since $0 \le j \le N-1$ and $N \le k \le 2N-2$, setting $\delta = j-k$, there is a conflict between $a_{i,j}$ and $a_{i,k}$ if and only if there exists $\delta \in [2-2N, -1]$ such that $2\delta \equiv \sqrt{N}+3 \pmod{2N}$. Assume this condition holds for a given $\delta$. Then there exists $k' \in \mathbb{Z}$ such that $2\delta = 2^n + 3 + k' \cdot 2^{2n+1}$. Now $2\delta$ is even and $2^n + 3 + k' \cdot 2^{2n+1}$ is odd; therefore the previous equality is always false. □
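Theorem 1 can also be checked exhaustively for small sizes. The sketch below is an illustration only: `bank` and `rows_conflict_free` are hypothetical names, `n_half` stands for the paper's $N$, and the bank number is assumed to be formula (1) reduced modulo $2N$.

```python
import math


def bank(i, j, n_half):
    """Bank of a[i][j] under formula (1); n_half is the paper's N (2N banks)."""
    root = math.isqrt(n_half)                  # sqrt(N) = 2**n
    basis = 0 if j < n_half else root + 3      # b(j)
    return ((root + 1) * i + 2 * j + basis) % (2 * n_half)


def rows_conflict_free(n_half):
    """True if every row of the (2N-1) x (2N-1) matrix hits 2N-1 distinct banks."""
    size = 2 * n_half - 1
    for i in range(size):
        banks = {bank(i, j, n_half) for j in range(size)}
        if len(banks) != size:
            return False
    return True
```

For instance, `rows_conflict_free(4)` and `rows_conflict_free(16)` correspond to the 8-bank and 32-bank systems. The parity argument of the proof is visible in the data: within a row, the elements of $A_l$ and of $A_r$ fall in banks of opposite parity.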
3.2. Column access

Theorem 2. The two-basis skewing scheme ensures conflict-free access to columns, for matrices of maximal size $(2N-1) \times (2N-1)$.

Proof. Let $a_{i,j}$ and $a_{k,j}$ be two elements of the same matrix column. There will be memory contention while accessing these elements if and only if:

$$\mu_{i,j} \equiv \mu_{k,j} \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+1)\,i + 2j + b(j) \equiv (\sqrt{N}+1)\,k + 2j + b(j) \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+1)\,i \equiv (\sqrt{N}+1)\,k \pmod{2N} \qquad (b)$$

If $\gcd(\sqrt{N}+1,\ 2N) = 1$, condition (b) is $i \equiv k \pmod{2N}$. Since $i, k \in [0, 2N-2]$, we get $i = k$. Assume that there exists $d \ne 1$ such that $\gcd(\sqrt{N}+1,\ 2N) = d$. Then we have, for some integers $\alpha$ and $\beta$:

$$2^{2n+1} = d\,\alpha, \qquad 2^n + 1 = d\,\beta$$

Since $d$ divides $2^{2n+1}$, we get $d = 2^{2n+1-\gamma}$ for some integer $\gamma$, and $2^n + 1 = \beta \cdot 2^{2n+1-\gamma}$. Therefore, since $2^n + 1$ is odd and $2^{2n+1-\gamma}$ is even (because $d \ne 1$), this is impossible. □

3.3. Forward diagonal access

Theorem 3. The two-basis skewing scheme ensures conflict-free access to the forward diagonal, for matrices of maximal size $(2N-1) \times (2N-1)$.
Proof. Let $a_{i,i}$ and $a_{j,j}$ be two elements of the matrix forward diagonal. There will be memory contention while accessing these elements if and only if:

$$\mu_{i,i} \equiv \mu_{j,j} \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+3)\,i + b(i) \equiv (\sqrt{N}+3)\,j + b(j) \pmod{2N} \qquad (c)$$

If the two elements are in the same submatrix ($A_l$ or $A_r$), then $b(i) = b(j)$ and condition (c) becomes $(\sqrt{N}+3)\,i \equiv (\sqrt{N}+3)\,j \pmod{2N}$. If $\gcd(\sqrt{N}+3,\ 2N) = 1$, and because $0 \le i, j \le N-1$ or $N \le i, j \le 2N-2$, we get $i = j$. Assume there exists $d \ne 1$ such that $\gcd(\sqrt{N}+3,\ 2N) = d$. Then we have, for some integers $\alpha$ and $\beta$:

$$2^{2n+1} = d\,\alpha, \qquad 2^n + 3 = d\,\beta$$

Since $d$ divides $2^{2n+1}$, we get $d = 2^{2n+1-\gamma}$ for some integer $\gamma$, and $2^n + 3 = \beta \cdot 2^{2n+1-\gamma}$. Since $2^n + 3$ is odd and $2^{2n+1-\gamma}$ is even when $d \ne 1$, we must have $d = 1$, that is to say $i = j$.

Now assume that $a_{i,i} \in A_l$ and $a_{j,j} \in A_r$. Condition (c) is now $(\sqrt{N}+3)\,i \equiv (\sqrt{N}+3)\,(j+1) \pmod{2N}$. Since $\gcd(\sqrt{N}+3,\ 2N) = 1$, this condition is equivalent to $i \equiv j+1 \pmod{2N}$, which is impossible because $a_{i,i} \in A_l$ (so $0 \le i \le N-1$) and $a_{j,j} \in A_r$ (so $N+1 \le j+1 \le 2N-1$). □

3.4. Square block access

We will need the two following lemmas to prove that our skewing scheme is able to access $M \times M$ square blocks without memory contention.

Lemma 1. The two-basis skewing scheme ensures conflict-free access to $a_{x_1,x_2}$ and $a_{y_1,y_2}$, two elements of an $M \times M$ square block ($M \le \sqrt{N}$), if they are both in $A_l$ or both in $A_r$.

Proof. Let $B$ be an $M \times M$ square block of matrix $A$, where $M \le \sqrt{N}$. Let $a_{x_1,x_2}$ and $a_{y_1,y_2}$ be two elements of $B$. There will be memory contention while accessing these elements if and only if:
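$$\mu_{x_1,x_2} \equiv \mu_{y_1,y_2} \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+1)\,x_1 + 2x_2 + b(x_2) \equiv (\sqrt{N}+1)\,y_1 + 2y_2 + b(y_2) \pmod{2N} \qquad (d)$$

Since the two elements are in the same submatrix ($A_l$ or $A_r$), $b(x_2) = b(y_2)$ and therefore condition (d) becomes $(\sqrt{N}+1)(x_1-y_1) \equiv 2(y_2-x_2) \pmod{2N}$. Then there exists $k \in \mathbb{Z}$ such that

$$(\sqrt{N}+1)(x_1-y_1) = 2(y_2-x_2) + 2kN$$
$$\Rightarrow \sqrt{N}\,(x_1-y_1) + (x_1-y_1) - 2(y_2-x_2) = 2kN$$
$$\Rightarrow (x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} - \frac{2}{\sqrt{N}}(y_2-x_2) = 2k\sqrt{N}$$

Now, we have

$$1-\sqrt{N} \le x_1-y_1 \le \sqrt{N}-1$$
$$\frac{1}{\sqrt{N}}-1 \le \frac{x_1-y_1}{\sqrt{N}} \le 1-\frac{1}{\sqrt{N}}$$
$$2\left(\frac{1}{\sqrt{N}}-1\right) \le -\frac{2}{\sqrt{N}}(y_2-x_2) \le 2\left(1-\frac{1}{\sqrt{N}}\right)$$

Therefore

$$\frac{3}{\sqrt{N}} - \sqrt{N} - 2 \le 2k\sqrt{N} \le 2 - \frac{3}{\sqrt{N}} + \sqrt{N}$$
$$\Rightarrow \frac{3}{2N} - \frac{1}{2} - \frac{1}{\sqrt{N}} \le k \le \frac{1}{\sqrt{N}} - \frac{3}{2N} + \frac{1}{2}$$

Assume $\frac{3}{2N} - \frac{1}{2} - \frac{1}{\sqrt{N}} \le -1$. Then $3 - N - 2\sqrt{N} \le -2N$, so that $N - 2\sqrt{N} + 3 \le 0$.
Assume $\frac{1}{\sqrt{N}} - \frac{3}{2N} + \frac{1}{2} \ge 1$. Then $2\sqrt{N} - 3 + N \ge 2N$, so that $N - 2\sqrt{N} + 3 \le 0$.

This is a second-degree inequality in $\sqrt{N}$ which is always false. Thus, if $(x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} - \frac{2}{\sqrt{N}}(y_2-x_2) = 2k\sqrt{N}$ then $k = 0$, and we obtain:

$$(x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} = \frac{2}{\sqrt{N}}(y_2-x_2)$$

First, assume $-\frac{\sqrt{N}}{2} < (y_2-x_2) < \frac{\sqrt{N}}{2}$. We have $-1 < \frac{2}{\sqrt{N}}(y_2-x_2) < 1$, and then, from the previous equation, $x_1-y_1 = 0$ and $y_2-x_2 = 0$.

Now, assume $(y_2-x_2) \ge \frac{\sqrt{N}}{2}$ or $(y_2-x_2) \le -\frac{\sqrt{N}}{2}$. It follows that $x_1-y_1 = 1$ or $x_1-y_1 = -1$, and $y_2-x_2 = \frac{1}{2}(\sqrt{N}+1)$ or $y_2-x_2 = -\frac{1}{2}(\sqrt{N}+1)$, which is impossible since $\sqrt{N}+1$ is odd.

Thus the conflict-free access condition is verified if the square block elements are both in $A_l$ or both in $A_r$. □

Lemma 2. The two-basis skewing scheme ensures conflict-free access to $a_{x_1,x_2}$ and $a_{y_1,y_2}$, two elements of an $M \times M$ square block ($M \le \sqrt{N}$), if one of them is in $A_l$ and the other one in $A_r$.

Proof. Let $B$ be an $M \times M$ square block of matrix $A$, where $M \le \sqrt{N}$. Let $a_{x_1,x_2}$ and $a_{y_1,y_2}$ be two elements of $B$. There will be memory contention while accessing these elements if and only if: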
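$$\mu_{x_1,x_2} \equiv \mu_{y_1,y_2} \pmod{2N}$$
$$\Leftrightarrow (\sqrt{N}+1)\,x_1 + 2x_2 + b(x_2) \equiv (\sqrt{N}+1)\,y_1 + 2y_2 + b(y_2) \pmod{2N} \qquad (e)$$

Assume $a_{x_1,x_2} \in A_l$ and $a_{y_1,y_2} \in A_r$. Condition (e) is now $(\sqrt{N}+1)(x_1-y_1) \equiv 2(y_2-x_2) + \sqrt{N} + 3 \pmod{2N}$. Then there exists $k \in \mathbb{Z}$ such that

$$(\sqrt{N}+1)(x_1-y_1) = 2(y_2-x_2) + \sqrt{N} + 3 + 2kN$$
$$\Rightarrow \sqrt{N}\,(x_1-y_1) + (x_1-y_1) - 2(y_2-x_2) - \sqrt{N} - 3 = 2kN$$
$$\Rightarrow (x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} - \frac{2}{\sqrt{N}}(y_2-x_2) - 1 - \frac{3}{\sqrt{N}} = 2k\sqrt{N}$$

Now, since $x_2 < y_2$ (the two elements lie in $A_l$ and $A_r$ respectively, so $1 \le y_2-x_2 \le \sqrt{N}-1$), we have

$$1-\sqrt{N} \le x_1-y_1 \le \sqrt{N}-1$$
$$\frac{1}{\sqrt{N}}-1 \le \frac{x_1-y_1}{\sqrt{N}} \le 1-\frac{1}{\sqrt{N}}$$
$$2\left(\frac{1}{\sqrt{N}}-1\right) \le -\frac{2}{\sqrt{N}}(y_2-x_2) \le -\frac{2}{\sqrt{N}}$$

Therefore

$$-\sqrt{N} - 3 \le 2k\sqrt{N} \le \sqrt{N} - \frac{6}{\sqrt{N}} - 1$$
$$\Rightarrow -\frac{3}{2\sqrt{N}} - \frac{1}{2} \le k \le \frac{1}{2} - \frac{1}{2\sqrt{N}} - \frac{3}{N}$$

Assume $-\frac{3}{2\sqrt{N}} - \frac{1}{2} \le -1$. Then $-3 - \sqrt{N} \le -2\sqrt{N}$, so that $\sqrt{N} \le 3$.
Assume $\frac{1}{2} - \frac{1}{2\sqrt{N}} - \frac{3}{N} \ge 1$. Then $N - \sqrt{N} - 6 \ge 2N$, so that $N + \sqrt{N} + 6 \le 0$. This is a second-degree inequality in $\sqrt{N}$ which is always false.

We have to study two cases: first, $2N = 8$, which allows the value $-1$ for $k$, and then $2N \ge 32$, which implies $k = 0$ (since $-\frac{3}{2\sqrt{N}} - \frac{1}{2} \le -1 \Rightarrow \sqrt{N} \le 3$).

Assume $2N = 8$. Then $\sqrt{N} = 2$ and $M \le 2$, so $y_2-x_2 = 1$, and with $k = -1$,

$$(x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} - \frac{2}{\sqrt{N}}(y_2-x_2) - 1 - \frac{3}{\sqrt{N}} = 2k\sqrt{N} \Rightarrow 3(x_1-y_1) = -1,$$

which is impossible (the value $k = 0$ gives $3(x_1-y_1) = 7$, which is impossible as well). Thus, no conflict can occur in the case $2N = 8$.

Assume $2N \ge 32$, i.e. $\sqrt{N} \ge 4$. Then, if $(x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} - \frac{2}{\sqrt{N}}(y_2-x_2) - 1 - \frac{3}{\sqrt{N}} = 2k\sqrt{N}$ then $k = 0$, and we obtain:

$$(x_1-y_1) + \frac{x_1-y_1}{\sqrt{N}} = \frac{2}{\sqrt{N}}(y_2-x_2) + 1 + \frac{3}{\sqrt{N}}$$

First, assume $1 \le (y_2-x_2) \le \frac{\sqrt{N}}{2} - 1$. We have $0 < \frac{2}{\sqrt{N}}(y_2-x_2) < 1$, and then, from the previous equation, $x_1-y_1 = 1$ and $y_2-x_2 = -1$, which is inconsistent with the assumptions made in this subsection.

Now, assume $\frac{\sqrt{N}}{2} \le (y_2-x_2) \le \sqrt{N}-1$. It follows that $x_1-y_1 = 2$ and $y_2-x_2 = \frac{1}{2}(\sqrt{N}-1)$, which is inconsistent with the assumptions made in this subsection.

Thus the conflict-free access condition is verified if one of the square block elements is in $A_l$ and the other one in $A_r$. □

Theorem 4. The two-basis skewing scheme ensures conflict-free access to $M \times M$ square blocks ($M \le \sqrt{N}$), for matrices of maximal size $(2N-1) \times (2N-1)$.

Proof. Let $B$ be an $M \times M$ square block of matrix $A$, where $M \le \sqrt{N}$, and let $a_{x_1,x_2}$ and $a_{y_1,y_2}$ be two elements of $B$. If these two elements are in the same submatrix, then by Lemma 1 there can be no memory conflict while accessing them. If one is in $A_l$ and the other one in $A_r$, then by Lemma 2 there is no memory contention either. Conflict-free access to any elements of $B$ is therefore ensured. □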
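Theorems 2 to 4 lend themselves to an exhaustive check on small systems. The following sketch is illustrative rather than part of the paper: the function names are hypothetical, `n_half` stands for the paper's $N$, and bank numbers are assumed to be formula (1) reduced modulo $2N$. Only blocks of side $\sqrt{N}$ are enumerated, since any smaller block inside the matrix is contained in one of them.

```python
import math
from itertools import product


def bank(i, j, n_half):
    """Bank of a[i][j] under formula (1); n_half is the paper's N (2N banks)."""
    root = math.isqrt(n_half)                  # sqrt(N) = 2**n
    basis = 0 if j < n_half else root + 3      # b(j)
    return ((root + 1) * i + 2 * j + basis) % (2 * n_half)


def distinct(cells, n_half):
    """True if the given (i, j) cells all map to different banks."""
    banks = [bank(i, j, n_half) for i, j in cells]
    return len(set(banks)) == len(banks)


def columns_ok(n_half):
    size = 2 * n_half - 1
    return all(distinct([(i, j) for i in range(size)], n_half)
               for j in range(size))


def diagonal_ok(n_half):
    size = 2 * n_half - 1
    return distinct([(i, i) for i in range(size)], n_half)


def blocks_ok(n_half):
    """Check every sqrt(N) x sqrt(N) block, at every position in the matrix."""
    size = 2 * n_half - 1
    m = math.isqrt(n_half)
    for r, c in product(range(size - m + 1), repeat=2):
        cells = [(r + di, c + dj) for di in range(m) for dj in range(m)]
        if not distinct(cells, n_half):
            return False
    return True
```

Calling `columns_ok`, `diagonal_ok` and `blocks_ok` with 4 or 16 covers the 8-bank and 32-bank cases, including the blocks that straddle the boundary between $A_l$ and $A_r$ treated by Lemma 2.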
4. An alignment network for this skewing scheme

Assume we have $2N$ memory modules, where $N = 2^{2n}$. We use a multistage interconnection network, called MINNUS (Multistage Interconnection Network Non Uniformly Structured), with $\log_2 N + 1$ identical stages (each consisting of a perfect shuffle followed by $N/2$ switching elements), followed by an inverse shuffle at the output. It is made of two sub-networks: a shuffle-exchange, followed by an Omega network whose outputs are inverse-shuffled. An example is given for $2N = 8$.

[Figure 1: The network used for 2N = 8. Panels: shuffle-exchange sub-network, Omega sub-network, output inverse shuffle. Line labels: $S_{pr}^i$, $I_{2n}^i \ldots I_1^i I_0^i$, $S_{2n}^i \ldots S_1^i S_0^i$, $O_{2n}^i \ldots O_1^i O_0^i$.]

We denote by $O_{2n}^i \ldots O_1^i O_0^i$ the binary representation of output $i$, by $S_{2n}^i \ldots S_1^i S_0^i$ the binary representation of the Omega sub-network output corresponding to output $i$ (that is to say, $S_{2n}^i \ldots S_1^i S_0^i = O_{2n-1}^i \ldots O_0^i O_{2n}^i$), by $I_{2n}^i \ldots I_1^i I_0^i$ the binary representation of the input to be produced on output $i$, and by $S_{pr}^i$ the pre-alignment stage switch output of the data to be produced on output $i$.

4.1. Alignment of rows
Theorem 5. The MINNUS network is able to align rows stored by the two-basis skewing scheme.

Proof. We can first observe that a conflict on a switch of stage $2n+2$ is impossible, since there is only one data item to produce on a specified output.

Conflict on stage $2n+1$. We will have a conflict on a switch of this stage if the data to be produced on output $i$ and the data to be produced on output $i+N$ are the two inputs of this switch. After this switch, we have the following stage outputs:

$S_{pr}^{i}\, S_{2n}^{i} \ldots S_{1}^{i}$ for the element of column $i$
$S_{pr}^{i+N}\, S_{2n}^{i+N} \ldots S_{1}^{i+N}$ for the element of column $i+N$

Now, since $N = 2^{2n}$, the binary representation of output $i+N$ is $O_{2n}^{i+N} O_{2n-1}^{i} \ldots O_{1}^{i} O_{0}^{i}$, and then $S_{k}^{i+N} = S_{k}^{i}$ for $1 \le k \le 2n$. Thus, the stage $2n+1$ outputs of column elements $i$ and $i+N$ are different if and only if $S_{pr}^{i+N} = \overline{S_{pr}^{i}}$ (the overline denotes the bit complement).
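The bit manipulation behind this step can be illustrated as follows. This is a sketch under stated assumptions, not the authors' simulator: labels are taken to be $(2n+1)$-bit integers, the relation $S^i = O_{2n-1}^i \ldots O_0^i O_{2n}^i$ is modeled as a one-bit left rotation of the output label, and the names `shuffle`, `inverse_shuffle` and `differ_only_in_s0` are hypothetical.

```python
def shuffle(x, width):
    """Rotate the width-bit label x left by one bit (perfect shuffle on labels)."""
    top = (x >> (width - 1)) & 1
    return ((x << 1) & ((1 << width) - 1)) | top


def inverse_shuffle(x, width):
    """Rotate the width-bit label x right by one bit (inverse shuffle)."""
    low = x & 1
    return (x >> 1) | (low << (width - 1))


def differ_only_in_s0(n):
    """Check that outputs i and i+N share S_1..S_2n and differ only in S_0."""
    width = 2 * n + 1          # 2N = 2**(2n+1) lines
    n_half = 1 << (2 * n)      # N = 2**(2n)
    return all(shuffle(i, width) ^ shuffle(i + n_half, width) == 1
               for i in range(n_half))
```

Outputs $i$ and $i+N$ differ only in their top bit $O_{2n}$, which the rotation moves to position $S_0$; this is why only $S_{pr}$ can separate the two data items at stage $2n+1$.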
Conflict on stage $2n$. For a switch of this stage, the following column elements are concerned: $i$, $i+1$, $i+N$, $i+N+1$, with $0 \le i \le N-1$. The binary representations of the stage $2n$ outputs are:

$I_{0}^{i}\, S_{pr}^{i}\, S_{2n}^{i} \ldots S_{2}^{i}$ for the element of column $i$
$I_{0}^{i+1}\, S_{pr}^{i+1}\, S_{2n}^{i+1} \ldots S_{2}^{i+1}$ for the element of column $i+1$
$I_{0}^{i+N}\, S_{pr}^{i+N}\, S_{2n}^{i+N} \ldots S_{2}^{i+N}$ for the element of column $i+N$
$I_{0}^{i+N+1}\, S_{pr}^{i+N+1}\, S_{2n}^{i+N+1} \ldots S_{2}^{i+N+1}$ for the element of column $i+N+1$

We are going to show that there is no conflict on the switches of this stage if, for all $p$ such that $2p \le 2N-2$, we have $\forall i \in [2^{1}p,\ 2^{1}p]$, $S_{pr}^{i+2^{0}} = \overline{S_{pr}^{i}}$ (the overline denotes the bit complement).

1. Conflicts between $i$ or $i+1$ and $i+N$ or $i+N+1$: since, modulo $2N$, $\mu_{l,i} = (2^n+1)\,l + 2i$, $\mu_{l,i+1} = (2^n+1)\,l + 2i + 2$, $\mu_{l,i+N} = (2^n+1)\,l + 2i + \sqrt{N} + 3$ and $\mu_{l,i+N+1} = (2^n+1)\,l + 2i + \sqrt{N} + 5$, we have $I_{0}^{i} = I_{0}^{i+1} \ne I_{0}^{i+N} = I_{0}^{i+N+1}$. Thus, the stage $2n$ outputs of $i$ or $i+1$ and of $i+N$ or $i+N+1$ are different (that is to say, the switches of stage $2n$ used by $i$ and $i+1$ on the one hand, and by $i+N$ and $i+N+1$ on the other hand, are different).

2. Conflicts between $i$ and $i+1$, or between $i+N$ and $i+N+1$: since both cases are similar, we only give the proof for $i$ and $i+1$. If $S_{1}^{i} = 0$ then $S_{1}^{i+1} = 1$ and $I_{0}^{i} S_{pr}^{i} S_{2n}^{i} \ldots S_{2}^{i} = I_{0}^{i+1} S_{pr}^{i+1} S_{2n}^{i+1} \ldots S_{2}^{i+1}$, except if $S_{pr}^{i+1} = \overline{S_{pr}^{i}}$. If $S_{1}^{i} = 1$ then $S_{2}^{i+1} \ne S_{2}^{i}$ and then $I_{0}^{i} S_{pr}^{i} S_{2n}^{i} \ldots S_{2}^{i} \ne I_{0}^{i+1} S_{pr}^{i+1} S_{2n}^{i+1} \ldots S_{2}^{i+1}$.

Conflict on stage $2n-k$ ($1 \le k \le 2n-1$). Assume that, for stages $2n-q$ ($0 \le q \le k-1$), the following properties are verified:

1. $\forall i \in [0, N-1]$ such that $2^{q+1}p \le i \le 2^{q+1}p + 2^{q} - 1$, we have no conflict on the switches of this stage, that is to say $S_{pr}^{i+2^{q}} = \overline{S_{pr}^{i}}$ and $S_{pr}^{i+N} \ne S_{pr}^{i+N+2^{q}}$.
2. $\forall i, j \in [2^{q+1}p,\ 2^{q+1}(p+1)-1]$, there exists $(i_0, \ldots, i_r)$ such that $i_0 = i$, $i_r = j$ and $\forall \alpha \in [0, r[$, $S_{pr}^{i_{\alpha+1}} = \overline{S_{pr}^{i_{\alpha}}}$.
3. $\forall i \in [2^{q+1}p_i,\ 2^{q+1}(p_i+1)-1]$, $\forall j \in [2^{q+1}p_j,\ 2^{q+1}(p_j+1)-1]$ (with $p_i \ne p_j$), there are no conditions between $S_{pr}^{i}$ and $S_{pr}^{j}$ (i.e. $S_{pr}^{i}$ and $S_{pr}^{j}$ are independent).

For a switch of stage $2n-k$, the following column elements are concerned: $i, i+1, \ldots, i+2^{k+1}-1, i+N, i+N+1, \ldots, i+N+2^{k+1}-1$. Since the binary representation of the stage $2n-k$ output for $i$ is $I_{k}^{i} \ldots I_{0}^{i}\, S_{pr}^{i}\, S_{2n}^{i} \ldots S_{k+2}^{i}$, only column element $i+2^{k}$ can produce a conflict with column element $i$. Thus, $\forall i \in [2^{k+1}p,\ 2^{k+1}p + 2^{k} - 1]$ (with $p$ such that $2^{k+1}p + 2^{k} - 1 \le 2N-2$), having $S_{pr}^{i+2^{k}} = \overline{S_{pr}^{i}}$ implies no conflicts on the switches of this stage.

We must now show that there is no contradiction between the new conditions due to stage $2n-k$ and the previous ones. Assume $2^{k+1}p \le i \le 2^{k+1}p + 2^{k} - 1$, and $i_0 = 2^{k+1}p$. There exists $(i_0, \ldots, i_r)$ (with $i_r = i$) such that $\forall \alpha \in [0, r[$, $S_{pr}^{i_{\alpha}} \ne S_{pr}^{i_{\alpha+1}}$. Then $\forall \alpha \in [0, r[$, $S_{pr}^{i_{\alpha+1}+2^{k}} = \overline{S_{pr}^{i_{\alpha}+2^{k}}}$. Since $i_0$ and $i_0 + 2^{k}$ are independent until this stage, having $S_{pr}^{i_1} = \overline{S_{pr}^{i_0}}$, $S_{pr}^{i_0+2^{k}} = \overline{S_{pr}^{i_0}}$ and $S_{pr}^{i_1+2^{k}} = \overline{S_{pr}^{i_0+2^{k}}}$ implies no contradiction between $S_{pr}^{i_1}$ and $S_{pr}^{i_1+2^{k}}$. By induction on $i$, there are no contradictions between the new conditions and the previous ones.

We now have to show the two properties used for the induction on the stages. First, let $i, j \in [2^{k+1}p,\ 2^{k+1}(p+1)-1]$ with $j - i \ge 2^{k}$ (if not, the property is true by hypothesis). There exists $(i_0, \ldots, i_r)$ such that $i_0 = i$, $i_r = j - 2^{k}$ and $\forall \alpha \in [0, r[$, $S_{pr}^{i_{\alpha}} \ne S_{pr}^{i_{\alpha+1}}$. Then $(i_0, \ldots, i_r, i_{r+1})$ (with $i_{r+1} = j$) verifies $\forall \alpha \in [0, r+1[$, $S_{pr}^{i_{\alpha+1}} = \overline{S_{pr}^{i_{\alpha}}}$. Finally, we must prove that $\forall i \in [2^{k+1}p_i,\ 2^{k+1}(p_i+1)-1]$, $\forall j \in [2^{k+1}p_j,\ 2^{k+1}(p_j+1)-1]$ (with $p_i \ne p_j$), there are no conditions between $S_{pr}^{i}$ and $S_{pr}^{j}$. There were previously (for stages $2n-q$, $q < k$) no conditions between $S_{pr}^{i}$ and $S_{pr}^{j}$. Since we have added conditions only inside the $[2^{k+1}p,\ 2^{k+1}(p+1)-1]$ blocks, we still have no conditions between elements of different blocks.

By induction on the number of stages, we prove that this alignment network is able to produce the permutations required for row alignment (since these results hold for the first stage, the conditions on $S_{pr}^{i}$ (for $0 \le i \le 2N-2$) can be applied). □

4.2. Alignment of columns

Theorem 6. The MINNUS network is able to align columns stored by the two-basis skewing scheme.

Proof. We can first observe that a conflict on a switch of stage $2n+2$ is impossible, since there is only one data item to produce on a specified output.

Conflict on stage $2n+1$. We will have a conflict on a switch of this stage if the data to be produced on output $i$ and the data to be produced on output $i+N$ are the two inputs of this switch. After this switch, we have the following stage outputs:

$S_{pr}^{i}\, S_{2n}^{i} \ldots S_{1}^{i}$ for the element of column $i$
$S_{pr}^{i+N}\, S_{2n}^{i+N} \ldots S_{1}^{i+N}$ for the element of column $i+N$
Now, since $N = 2^{2n}$, the binary representation of output $i+N$ is $O_{2n}^{i+N} O_{2n-1}^{i} \ldots O_{1}^{i} O_{0}^{i}$, and then $S_{k}^{i+N} = S_{k}^{i}$ for $1 \le k \le 2n$. Thus, the stage $2n+1$ outputs of column elements $i$ and $i+N$ are different if and only if $S_{pr}^{i+N} = \overline{S_{pr}^{i}}$ (the overline denotes the bit complement). Since $\mu_{i+N,c} - \mu_{i,c} \equiv 2^{2n} \pmod{2N}$, the inputs corresponding to outputs $i$ and $i+N$ are connected to the same switch of the first stage. This condition therefore only ensures that there are no conflicts on the first stage (that is to say, switch elements must be configured in direct or crossed position).

Conflict on stage $2n-k$ ($0 \le k \le 2n-1$). For a switch of stage $2n-k$, the following row elements are concerned: $i, i+1, \ldots, i+2^{k+1}-1, i+N, i+N+1, \ldots, i+N+2^{k+1}-1$. Since the binary representation of the stage $2n-k$ output for $i$ is $I_{k}^{i} \ldots I_{0}^{i}\, S_{pr}^{i}\, S_{2n}^{i} \ldots S_{k+2}^{i}$, only row element $i+N$ could produce a conflict with row element $i$. Since $S_{pr}^{i+N} = \overline{S_{pr}^{i}}$, no conflicts are possible on the switches of this stage.
By induction on the number of stages, we prove that this alignment network is able to produce the permutations required for column alignment. □

4.3. Alignment of forward diagonal

Theorem 7. The MINNUS network is able to align the forward diagonal stored by the two-basis skewing scheme.

Proof. We can first observe that a conflict on a switch of stage $2n+2$ is impossible, since there is only one data item to produce on a specified output.

Conflict on stage $2n+1$. We will have a conflict on this stage if the outputs $i$ and $i+N$ are connected to the same switch in this stage. Since the binary representation of this stage output for output $i$ is $S_{pr}^{i}\, S_{2n}^{i} \ldots S_{1}^{i}$, and because $\forall \alpha \in [1, 2n]$, $S_{\alpha}^{i} = S_{\alpha}^{i+N}$, we must have $S_{pr}^{i+N} = \overline{S_{pr}^{i}}$ (the overline denotes the bit complement) to ensure conflict-free alignment of data in this stage.

Conflict on stage $2n-k$ ($0 \le k \le 2n-1$). The outputs concerned on this stage are $i, i+1, \ldots, i+2^{k+1}-1, i+N, i+N+1, \ldots, i+N+2^{k+1}-1$. The binary representation of the stage $2n-k$ output for $i$ is $I_{k}^{i} \ldots I_{0}^{i}\, S_{pr}^{i}\, S_{2n}^{i} \ldots S_{k+2}^{i}$. We must first know which of the concerned outputs can be in conflict in this stage. Two outputs $i$ and $j$ can be in conflict if $I_{k}^{i} \ldots I_{0}^{i} = I_{k}^{j} \ldots I_{0}^{j}$, that is to say $\mu_{i,i} \equiv \mu_{j,j} \pmod{2^{k+1}}$.

Assume $0 \le \alpha, \beta \le 2^{k+1}-1$. We have $\mu_{i+\alpha,i+\alpha} \equiv \mu_{i+\beta,i+\beta} \pmod{2^{k+1}}$ if and only if $(2^n+3)(\alpha-\beta) = x \cdot 2^{k+1}$; since $\gcd(2^n+3,\ 2^{k+1}) = 1$, this is possible only for the trivial case $\alpha = \beta$. On the other hand, we have $\mu_{i+N+\alpha,i+N+\alpha} \equiv \mu_{i+\beta,i+\beta} \pmod{2^{k+1}}$ if and only if

$$(2^n+3)(\beta-\alpha) \equiv 2^{2n} + 2^n + 3 \pmod{2^{k+1}}$$
$$\Rightarrow (2^n+3)\,x \equiv 2^{2n} + 2^n + 3 \pmod{2^{k+1}}, \qquad x = \beta-\alpha$$
$$\Rightarrow (2^n+3)\,y = 2^{k+1}\,(p - 2^{2n-k-1}), \qquad y = x-1,\ p \in \mathbb{Z}$$

Since $-2^{k+1} \le y \le 2^{k+1}-2$, we have two possibilities:

1. $y = -1$: in this case, $\alpha = \beta$, and we have solved this problem on stage $2n$.
2. $y = 0$: we obtain $\beta = \alpha + 1$.

Thus, in order to ensure correct alignment of the forward diagonal data, we must have $S_{pr}^{i+N} = \overline{S_{pr}^{i+1}}$ and $S_{pr}^{i+N} = \overline{S_{pr}^{i}}$. It is easy to show that the induction on the stage number does not create conflicts between conditions, and that these conditions are allowed by the distribution of the concerned outputs on the inputs of the network (since the case of stage $2n-k$ holds for $k = 2n-1$). □
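The low-order-bit congruence used for the forward diagonal can be explored numerically. The sketch below is an illustration under assumptions: `diag_bank` and `low_bits_collisions` are hypothetical names, `n_half` stands for the paper's $N$, and the bank of $a_{i,i}$ is taken as $(\sqrt{N}+3)\,i + b(i)$ reduced modulo $2N$. It lists the pairs $(\alpha, \beta)$ for which $a_{i+N+\alpha,\,i+N+\alpha}$ and $a_{i+\beta,\,i+\beta}$ share their $k+1$ low-order bank bits.

```python
import math


def diag_bank(i, n_half):
    """Bank of the diagonal element a[i][i]; n_half is the paper's N."""
    root = math.isqrt(n_half)                  # sqrt(N) = 2**n
    basis = 0 if i < n_half else root + 3      # b(i)
    return ((root + 3) * i + basis) % (2 * n_half)


def low_bits_collisions(i, k, n_half):
    """Pairs (alpha, beta) whose diagonal banks agree modulo 2**(k+1)."""
    mod = 1 << (k + 1)
    size = 2 * n_half - 1
    pairs = []
    for alpha in range(mod):
        for beta in range(mod):
            right, left = i + n_half + alpha, i + beta
            # Keep only indices that exist: right in A_r, left in A_l.
            if right >= size or left >= n_half:
                continue
            if diag_bank(right, n_half) % mod == diag_bank(left, n_half) % mod:
                pairs.append((alpha, beta))
    return pairs
```

In the cases tried here, every listed pair satisfies $\beta \equiv \alpha + 1 \pmod{2^{k+1}}$, which matches the pairing of output $i+N$ with output $i+1$ in the condition on $S_{pr}$ above.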
5. Conclusion
The skewing scheme that we introduce combines the advantages of the Budnik and Kuck principles and of Lawrie's scheme, without their drawbacks (a prime number of memory banks and a crossbar network for Budnik and Kuck, under-utilization of the memory banks for Lawrie). In fact, the idea is based on part of the study of the scheme proposed by Lawrie. It allows, for a system including $N$ memory modules, conflict-free access to various slices of data for $(N-1) \times (N-1)$ matrices. This skewing scheme is characterized by both efficient use of the memory banks and simplicity of implementation. We reviewed some networks from the literature which could provide alignment, and we suggest a new network based on the concatenation of two networks: a shuffle-exchange network and an Omega network with inverse-shuffled outputs.
Acknowledgment
The authors are indebted to E. Contensou for many discussions on this subject.
References

1. P. Budnik and D. J. Kuck, "The organization and use of parallel memories", IEEE Trans. Comput., vol. C-20, pp. 1566-1569, Dec. 71.
2. D. H. Lawrie, "Access and alignment of data in an array processor", IEEE Trans. Comput., vol. C-24, pp. 1145-1155, Dec. 75.
3. D. T. Harper III and R. Jump, "Vector access performance in parallel memories using a skewed storage scheme", IEEE Trans. Comput., vol. C-36, pp. 1440-1449, Dec. 87.
4. H. A. G. Wijshoff and J. van Leeuwen, "The structure of periodic storage schemes for parallel memories", IEEE Trans. Comput., vol. C-34, pp. 501-505, June 85.
5. H. D. Shapiro, "Theoretical limitations on the efficient use of parallel memories", IEEE Trans. Comput., vol. C-27, pp. 421-428, May 78.
6. De-lei Lee, "Scrambled storage for parallel memory systems", The 15th Annual International Symposium on Computer Architecture, vol. 16, no. 2, pp. 232-239, May 1988.
7. Shlomo Weiss, "An aperiodic storage scheme to reduce memory conflicts in vector processors", The 16th Annual International Symposium on Computer Architecture, pp. 380-386, 1989.
8. K. E. Batcher, "The flip network in STARAN", 1976 International Conference on Parallel Processing, pp. 65-71, Aug. 1976.
9. M. C. Pease III, "The indirect binary n-cube microprocessor array", IEEE Trans. Comput., vol. C-26, pp. 458-473, May 77.
10. S. D. Kaushik, S. Sharma, C.-H. Huang, J. R. Johnson, R. W. Johnson, and P. Sadayappan, "An algebraic theory for modeling multistage interconnection networks", Technical Research Report OSU-CISRC-1/92-TR-4, The Ohio State University, Jan. 92.
11. Krishnan Padmanabhan and D. H. Lawrie, "A class of redundant path multistage interconnection networks", IEEE Trans. Comput., vol. C-32, pp. 1099-1108, Dec. 83.
12. D. J. Kuck, "Illiac IV software and application programming", IEEE Trans. Comput., vol. C-17, pp. 758-770, 1968.