Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers Scott Thibault, Lenore Mullin, Matt Insall CSC-94-08
April 4, 1994
Department of Computer Science University of Missouri-Rolla Rolla, Missouri 65401
Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers
Scott Thibault, Lenore Mullin, Matt Insall
April 11, 1994

Abstract
Many applications involve arrays whose non-zero components lie in regular geometric partitions, such as triangular, diagonal, tridiagonal, and banded arrays. Arrays of this type are usually stored in a packed form, and computations are performed with only the non-zero components. The packed form requires an indexing function that maps an index of the array to an index of the packed, lexicographically stored array. This paper presents a method of describing regular partitions and of automatically generating an indexing function from that description. These methods enable an array compiler to compile array operations on arrays of this type efficiently.
1 Introduction
Many applications involve arrays in which enough components are zero that it is more efficient to store the arrays in a packed form. The purpose of the methods presented in this paper is to enable a compiler to compile array operations on regularly partitioned arrays in a way that is transparent to the programmer. To compile efficient code for these arrays, they have to be stored in a packed form (zero components are not stored), and computation is done only with the stored components. The first step is to develop a method of describing the partitions that can be used as input to the compiler. Next, a method must be developed by which the compiler uses this description to generate an indexing function for the array.

Some systems store a list of elements to represent only the non-zero components of arrays [JM91]. This may be the only alternative when the non-zero components are randomly located, but for arrays whose non-zero components are regularly located it is not efficient, which motivates the methods presented here. Often lists of elements are not used for these arrays; instead the programmer is required to construct an indexing function by hand and to custom code every operation on the arrays to use it. If this task can be incorporated into a compiler, relieving the programmer of that work, then the idea is worth pursuing. Another difficulty with the list-of-elements representation is parallel computation, which is a difficult problem in general for arrays represented in a packed form [ZR93]. Future work with the methods developed here could apply to this type of parallel computation.
2 Specifying Regularity
The method of describing the regularity has a large impact on which descriptions can be compiled, and how. For this reason, the regularity is specified separately for each dimension. This makes the use of short indexes (indexing a row, a plane, ...) much more elegant for compilation.

Figure 1: An example map of a regularly partitioned array. Spots marked with an x indicate a non-zero entry in the array.

The example in figure 1 has non-zero components in regular partitions. To describe the regularity of the partitions in the first dimension, we would indicate that there are blocks of three elements occurring every fourth element.

The simplest type of regular partition to describe is a blocked array. This type of array has constant-size blocks of non-zeros that occur at constant intervals. We can describe this regularity by saying the blocks have a width W, a period P, and an offset O. The period P partitions the dimension into some number of blocks of size P. Each of these blocks contains W contiguous non-zero elements, where the first element is at an offset O within the block. This description has a limited scope and is only useful for describing simple checkerboard-type regularity. It does, however, have the advantage of being simple to implement; for this reason we start with this method as a base to build upon. It is also a natural generalization of the method for dense arrays (see footnote 1). The section describing the indexing function will show how the indexing function for dense arrays is naturally generalized.

A basic extension to this simple blocking specification is a hierarchical blocking specification. In this case we still have blocks that occur with a constant period; however, the blocks do not need to be dense, and could be specified by yet another blocking specification. To support the idea of leveled blocking, the blocking specification is changed a little. The specification now consists of a block size B, the number N of blocks that contain non-zero elements, and the first block O that contains non-zero elements.
The meaning of this specification is that there are N contiguous blocks, starting at block O, that contain non-zero elements, where each block contains B elements. Only one such run of non-zero blocks is described, so the simple blocking described previously now requires a two-level blocking specification. For example, the following array can be described using a simple blocking scheme:

[110001100011000]

To describe this blocking using the hierarchical method, the array is first partitioned into blocks of 5 elements, with each block containing non-zero elements (O = 0, N = 3, B = 5). The next level specification gives a more detailed description by describing the partitions of each of the blocks of the previous level. In this example the next level specification would be (O = 0, N = 2, B = 1). Although this two-level description is not more powerful than the simple blocking specification, specifications with 3 or more levels cannot be described by a simple blocking scheme. The hierarchical blocking specification is a simple extension that greatly increases the scope of regularity that can be described, and it uses the same method of implementation as the simple blocking specification.

Footnote 1: A dense array is one where every component of the array is stored and used for computation.
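The two descriptions of the example array can be checked against each other with a short sketch. The function names `expand` and `simple_mask` are hypothetical, chosen here for illustration, and the sketch only expands a specification into a 0/1 map; it is not the compiler's method.

```python
def expand(levels, length):
    """Expand a constant hierarchical blocking into a 0/1 list.

    `levels` lists the (O, N, B) triples from the top level down.
    """
    mask = []
    for i in range(length):
        rem = i              # position inside the enclosing block
        nonzero = True
        for o, n, b in levels:
            pos = rem // b   # which block of this level contains i
            if not (o <= pos < o + n):
                nonzero = False
                break
            rem %= b
        mask.append(1 if nonzero else 0)
    return mask


def simple_mask(w, p, o, length):
    """Simple blocking: width w, period p, offset o."""
    return [1 if o <= i % p < o + w else 0 for i in range(length)]


example = expand([(0, 3, 5), (0, 2, 1)], 15)
print("".join(map(str, example)))  # prints 110001100011000
```

Under these assumptions the two-level specification (O = 0, N = 3, B = 5), (O = 0, N = 2, B = 1) and the simple blocking W = 2, P = 5, O = 0 produce the same map.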
[Array map graphic not reproduced. The figure 2 map is annotated with the variables l00 and l01.]
Figure 2: A partitioned view of the example array.

We make one more extension to the specification that still utilizes the same method of implementation as the simple and hierarchical blocking specifications. This last specification is the hierarchical variable blocking. We still have a hierarchy of blocking specifications, only now we use variable blocking instead of simple blocking. Simply put, the offset O and the number of blocks N can now be written as functions of variables instead of constants. The variables that can be used are the positions, in the overall hierarchy, of the particular block being described. For example, if the top level specification is (O = 0, N = 3, B = 5), then the next level describes the detail of the three blocks of 5 elements. This next level description can be a function of its position among the three higher-level blocks. For example, the array (see footnote 2)

[10000|11000|11100]

could be specified with the top level specification (O = 0, N = 3, B = 5) and the next level specification (O = 0, N = $l_{00} + 1$, B = 1). Here $l_{00}$ represents which block, of the blocks in the top level specification, is being described. The top level description indicates that there are three blocks of five elements, and each of these three blocks is described by the next level specification. To determine what one of these blocks looks like, we examine the next level specification. The second level can be interpreted for a particular block of the three by determining the value of $l_{00}$. For example, the second block is in position 1 (indexing from 0) in the partitioning imposed by the top level specification. Thus the second block is described by the specification (O = 0, N = 2, B = 1), where the second level specification has been interpreted for the second block by substituting the value of $l_{00}$ into the specification. The resulting specification indicates that the block is partitioned into blocks of one element and only the first two are non-zero, as shown in the array.
More specifically, a variable specification is one where O and N can be functions represented by expressions, with the following restrictions. The variable $l_{xy}$ represents the location of (that is, which block is being considered in) the $y$th level specification of the $x$th dimension. The top level specification is level 0 and each subsequent level is 1 more than the previous. Thus in the last example $l_{00}$ represented the location in the top level specification of the 0th dimension. Similar subscripts are used to denote the different expressions and values of O, N, and B, written with lowercase letters as $o_{xy}$, $n_{xy}$, and $b_{xy}$. Any expression for O or N can contain any of the operators $+$, $-$, $\times$, and modulo. An expression for $o_{xy}$ or $n_{xy}$ may only refer to the variables $l_{ij}$ with $i = x$ and $j < y$ (higher levels of the same dimension) and the variables $l_{ij}$ with $i < x$ (any level of a preceding dimension). This restriction is necessary: without it, the summations that define the indexing function in the next section would no longer be correct.

Footnote 2: The bars are included only to aid visual inspection; they have no meaning with regard to the array.
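The interpretation of a variable specification described above can be sketched as follows. This is a minimal illustration for one dimension: the name `expand_variable` is hypothetical, and representing a variable O or N as a Python function of the list of enclosing block positions is an assumption of this sketch, not notation from the paper.

```python
def expand_variable(levels, length):
    """Expand a one-dimensional hierarchical variable blocking into a 0/1 list.

    Each level is (O, N, B). O and N are either constants or functions that
    take the list of block positions of the enclosing levels (l[0] is the
    top-level position, playing the role of l00 in the text).
    """
    mask = []
    for i in range(length):
        rem, locs, nonzero = i, [], True
        for o, n, b in levels:
            o_val = o(locs) if callable(o) else o
            n_val = n(locs) if callable(n) else n
            pos = rem // b
            if not (o_val <= pos < o_val + n_val):
                nonzero = False
                break
            locs.append(pos)   # later levels may refer to this position
            rem %= b
        mask.append(1 if nonzero else 0)
    return mask


# Top level (O=0, N=3, B=5); next level (O=0, N=l00+1, B=1).
spec = [(0, 3, 5), (0, lambda l: l[0] + 1, 1)]
m = expand_variable(spec, 15)
print("|".join("".join(map(str, m[k:k + 5])) for k in range(0, 15, 5)))
# prints 10000|11000|11100
```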
The generalization from constant functions to variable functions allows us to keep the same method of implementation; only the computations are more complicated. For notation, let a variable blocking be denoted by the 3-tuple (O, N, B). Let a hierarchical specification be denoted by a list of variable blockings enclosed in brackets, listed from highest level to lowest: [($O_0$, $N_0$, $B_0$), ($O_1$, $N_1$, $B_1$), ...]. Finally, the hierarchical variable blocking specification for an array is a list of hierarchical variable specifications, one for each dimension from first to last, enclosed in parentheses. Using this notation we can describe the regular partitions depicted in our example in figure 1:

$$([(0, 3, 4),\ (0, 3, 1)],\ [(0,\ l_{00} + 1,\ 8),\ (2 - l_{01},\ 2 l_{01} + 1,\ 1)])$$

The first dimension is specified by [(0, 3, 4), (0, 3, 1)]. This indicates which rows (elements of the first dimension) are not all zero. The first level partitions the rows into blocks of four rows, where each of the three blocks contains a non-zero row. Each of these blocks is then further described by the next level specification. This level indicates that the first three rows of each block are non-zero. Notice that the non-zero components in each of these rows depend on which of the three top-level blocks the row is in, as well as on its position in that block. Thus the specification of the second dimension depends on the variables $l_{00}$ and $l_{01}$, where $l_{00}$ is which block and $l_{01}$ is the position in that block. These variables are shown in figure 2.

The first level of the second dimension is specified by $(0, l_{00} + 1, 8)$. This partitions each row into three blocks of eight components. Figure 2 shows the 4x8 partitions of figure 1 that result from the top level specification of each dimension. The first 1, 2, or 3 block(s) contain non-zero components, depending on the value of $l_{00}$ (0, 1, or 2). This results in the lower triangular appearance of the figure. The next level, $(2 - l_{01}, 2 l_{01} + 1, 1)$, describes the components in each of these blocks of eight. Depending on $l_{01}$ there are 1, 3, or 5 non-zero component(s), and they are offset in the block by 2, 1, or 0. This results in the pyramid shape of the 4x8 sub-blocks in the figure.

So this last method, the hierarchical variable blocking specification, is the one used to specify the regular partitions of a particular array. The next section describes the method of generating an indexing function for an array with regular partitions given by a hierarchical variable blocking specification. The section will show that generating the indexing function can be done with simple generalizations of the well known indexing function for dense arrays [Mul88]. Although we have extended the specification to allow more complex descriptions, the generated indexing function is only as complicated as the particular specification. In other words, if you specify simply blocked partitions then the indexing will be that simple, even though the specification language provides for more complicated sparseness.
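As a check on the figure 1 specification, the map can be regenerated from the specification alone. The sketch below is illustrative only: the names `locate`, `is_nonzero`, and `FIG1` are hypothetical, the 12x24 array size is inferred from three blocks of four rows and three blocks of eight columns, and the dictionary keyed by (dimension, level) stands in for the variables $l_{xy}$.

```python
def locate(levels, index, env, x):
    """Walk the levels of dimension x for one index.

    Records each block position as env[(x, y)] and returns False as soon as
    the index falls outside the non-zero blocks of some level.
    """
    rem = index
    for y, (o, n, b) in enumerate(levels):
        o_val = o(env) if callable(o) else o
        n_val = n(env) if callable(n) else n
        pos = rem // b
        if not (o_val <= pos < o_val + n_val):
            return False
        env[(x, y)] = pos
        rem %= b
    return True


def is_nonzero(spec, index):
    """True if the full index addresses a stored (non-zero) component."""
    env = {}
    return all(locate(levels, index[x], env, x) for x, levels in enumerate(spec))


# ([(0,3,4),(0,3,1)], [(0, l00+1, 8), (2-l01, 2*l01+1, 1)])
FIG1 = (
    [(0, 3, 4), (0, 3, 1)],
    [(0, lambda l: l[(0, 0)] + 1, 8),
     (lambda l: 2 - l[(0, 1)], lambda l: 2 * l[(0, 1)] + 1, 1)],
)

rows = ["".join("x" if is_nonzero(FIG1, (r, c)) else "." for c in range(24))
        for r in range(12)]
print("\n".join(rows))
```

The printed map shows one, two, and three pyramid-shaped 4x8 sub-blocks in successive row blocks, matching the lower triangular appearance described in the text.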
3 Generating the Indexing Function
In this section a general definition of an indexing function is given in terms of the specification parameters. First consider the classical indexing function for dense arrays. Given an array with dimension $d$ and $s_i$ denoting the number of elements in the $i$th dimension, where $0 \le i < d$, we can define an indexing function for this dense array in the following way.

The following definitions are used throughout this section. The term component refers to the smallest object in an array. For example, a component of an array of integers would be one integer. An element of an array refers to an object in a particular dimension, which may be a sub-array or a component. For example, the $i$th element of the first dimension of a 3x5x2 array is a sub-array of size 5x2. Given an index into an array, the address is the location of the indexed component in the linearly addressed storage of the array. Let $\vec{i}$ be the index input to the indexing function. The $j$th index is $\vec{i}[j]$ and indexes the $j$th dimension. Let $T_x$ represent the number of components spanned by dimensions $x$ through $d - 1$, so that $T_0$ is the size of the whole array and $T_{x+1}$ is the size of one element of dimension $x$. For example, for a 2-dimensional array with 5 rows and 3 columns, $T_1 = 3$ and $T_0 = 15$. $T_x$ can be written

$$T_x = \begin{cases} \displaystyle\sum_{i=0}^{s_x - 1} T_{x+1} & \text{if } x < d \\ 1 & \text{if } x = d \end{cases}$$
where $s_x$ is the number of elements in the $x$th dimension. Let $S_x$ be the number of components to skip to get to the first component of the $\vec{i}[x]$th element in dimension $x$. $S_x$ is defined to be

$$S_x = \sum_{i=0}^{\vec{i}[x] - 1} T_{x+1}$$
So finally, if we let $\gamma$ be the indexing function we can write

$$\gamma(\vec{i}; s) = \sum_{i=0}^{d-1} S_i$$
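The dense definitions can be transcribed almost literally. The sketch below keeps the summations as written rather than reducing them to products, and compares the result with the familiar row-major offset; the names `T`, `S`, `dense_index`, and `row_major` are labels chosen here for illustration.

```python
from itertools import product


def T(x, s):
    """Number of components spanned by dimensions x .. d-1 of shape s."""
    if x == len(s):
        return 1
    return sum(T(x + 1, s) for _ in range(s[x]))


def S(x, idx, s):
    """Components skipped to reach element idx[x] of dimension x."""
    return sum(T(x + 1, s) for _ in range(idx[x]))


def dense_index(idx, s):
    """Sum of S_x over every dimension: the dense indexing function."""
    return sum(S(x, idx, s) for x in range(len(s)))


def row_major(idx, s):
    """Reference: the usual row-major (lexicographic) offset."""
    offset = 0
    for i, n in zip(idx, s):
        offset = offset * n + i
    return offset


shape = (3, 5, 2)
agree = all(dense_index(i, shape) == row_major(i, shape)
            for i in product(*(range(n) for n in shape)))
print(T(1, (5, 3)), T(0, (5, 3)), dense_index((2, 1), (5, 3)), agree)
# prints 3 15 7 True
```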
The indexing function depends only on the index vector $\vec{i}$ and the regular partition specification $s$. This is the well known indexing function for dense arrays. The only difference is that, since the summations in the dense case are summations of constants, they can be reduced to products. In our case the summands will not always be constants, so the summation form is used.

Now this indexing function will be extended to handle each extension that was made to the specification in the last section. The indexing function is first extended to allow the use of simple blocking in each dimension. To do this, $T_x$ and $S_x$ have to be redefined. Assume that the simple blocking is specified using the hierarchical notation restricted to two levels. Let

$$T_x = \begin{cases} \displaystyle\sum_{i=0}^{n_{x0} - 1} T'_x & \text{if } x < d \\ 1 & \text{if } x = d \end{cases}$$

and

$$T'_x = \sum_{i=0}^{n_{x1} - 1} T_{x+1}$$
The expression for $T'_x$ represents the number of components in a block described by the second level specification. Since $T'_x$ represents the number of components in a block and there are $n_{x0}$ blocks, $T_x$ represents the total number of components in dimension $x$. Let $i_{xy}$ denote the location of the component specified by $\vec{i}$ in the blocks of the specification of level $y$ and dimension $x$:

$$i_{x0} = \left\lfloor \frac{\vec{i}[x]}{b_{x0}} \right\rfloor$$

and

$$i_{x1} = \vec{i}[x] \bmod b_{x0}$$

With $i_{xy}$ defined, $S_x$ can be defined as

$$S_x = \sum_{l_{x0} = o_{x0}}^{i_{x0} - 1} T'_x + \sum_{l_{x1} = o_{x1}}^{i_{x1} - 1} T_{x+1}$$
So finally the indexing function can again be written

$$\gamma(\vec{i}; s) = \sum_{i=0}^{d-1} S_i$$
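A minimal sketch of the two-level case follows, assuming each summation over a level starts at that level's offset and stops just before the block position of the index (the reading that agrees with the dense case). The names `t_prime`, `T`, `S`, and `blocked_index` are hypothetical, and every dimension is given as a pair of constant (O, N, B) triples.

```python
def t_prime(x, spec):
    """T'_x: components in one top-level block of dimension x."""
    _, (_, n1, _) = spec[x]
    return sum(T(x + 1, spec) for _ in range(n1))


def T(x, spec):
    """T_x: stored components spanned by dimensions x .. d-1."""
    if x == len(spec):
        return 1
    (_, n0, _), _ = spec[x]
    return sum(t_prime(x, spec) for _ in range(n0))


def S(x, idx, spec):
    """Stored components skipped in dimension x before idx[x]."""
    (o0, _, b0), (o1, _, _) = spec[x]
    i0, i1 = divmod(idx[x], b0)   # block position, position inside the block
    return (sum(t_prime(x, spec) for _ in range(o0, i0))
            + sum(T(x + 1, spec) for _ in range(o1, i1)))


def blocked_index(idx, spec):
    """Packed address of a stored component under two-level blocking."""
    return sum(S(x, idx, spec) for x in range(len(spec)))


one_dim = [[(0, 3, 5), (0, 2, 1)]]          # the array [110001100011000]
print([blocked_index((i,), one_dim) for i in (0, 1, 5, 6, 10, 11)])
# prints [0, 1, 2, 3, 4, 5]
```

The six non-zero positions of the example array map to consecutive packed addresses, which is the property the indexing function is meant to have. The result is only meaningful for indexes of stored components.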
After taking the first step from the indexing function for dense arrays to that of a packed array with simple blocking, the next step is a straightforward generalization. A more general indexing function is now defined to accommodate the next extension, the hierarchical blocking specification. Let $h_x$ be the number of levels of hierarchy in dimension $x$. First define $T_{xy}$ as

$$T_{xy} = \begin{cases} \displaystyle\sum_{i=0}^{n_{xy} - 1} T_{x(y+1)} & \text{if } x < d \text{ and } y < h_x - 1 \\ \displaystyle\sum_{i=0}^{n_{xy} - 1} T_{(x+1)0} & \text{if } x < d \text{ and } y = h_x - 1 \\ 1 & \text{if } x = d \end{cases} \tag{1}$$
In this definition $T_{x0}$ has a similar meaning to the previous definition of $T_x$, and $T_{xy}$ ($y > 0$) has a similar meaning to $T'_x$ in the previous definitions. The difference is that in the previous case there was a restriction to two levels. For the remaining definitions, $S_{xy}$ and $i_{xy}$ have similar meanings to the previous definitions.

$$i_{xy} = \left\lfloor \frac{i'_{xy}}{b_{xy}} \right\rfloor \tag{2}$$

where

$$i'_{xy} = \begin{cases} i'_{x(y-1)} \bmod b_{x(y-1)} & \text{if } y > 0 \\ \vec{i}[x] & \text{if } y = 0 \end{cases} \tag{3}$$

$$S_{xy} = \begin{cases} \displaystyle\sum_{l_{xy} = o_{xy}}^{i_{xy} - 1} T_{x(y+1)} + S_{x(y+1)} & \text{if } y < h_x - 1 \\ \displaystyle\sum_{l_{xy} = o_{xy}}^{i_{xy} - 1} T_{(x+1)0} & \text{if } y = h_x - 1 \end{cases} \tag{4}$$
With these new definitions the indexing function is again written

$$\gamma(\vec{i}; s) = \sum_{i=0}^{d-1} S_{i0}$$
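The hierarchical definitions can be sketched in the same style for constant specifications with any number of levels. The function names are hypothetical, and the three-level example specification is made up here for illustration; it does not come from the paper.

```python
def T(x, y, spec):
    """T_xy: stored components described by level y of dimension x."""
    if x == len(spec):
        return 1
    _, n, _ = spec[x][y]
    last = y == len(spec[x]) - 1
    inner = T(x + 1, 0, spec) if last else T(x, y + 1, spec)
    return sum(inner for _ in range(n))


def positions(x, idx, spec):
    """Block position i_xy of idx[x] at every level y of dimension x."""
    rem, out = idx[x], []
    for _, _, b in spec[x]:
        out.append(rem // b)
        rem %= b
    return out


def S(x, y, idx, spec):
    """S_xy: stored components skipped at level y and below in dimension x."""
    o, _, _ = spec[x][y]
    i_xy = positions(x, idx, spec)[y]
    if y == len(spec[x]) - 1:
        return sum(T(x + 1, 0, spec) for _ in range(o, i_xy))
    return (sum(T(x, y + 1, spec) for _ in range(o, i_xy))
            + S(x, y + 1, idx, spec))


def hier_index(idx, spec):
    """Packed address: the sum of S_x0 over every dimension."""
    return sum(S(x, 0, idx, spec) for x in range(len(spec)))


# Hypothetical three-level pattern [011011|011011]:
three = [[(0, 2, 6), (0, 2, 3), (1, 2, 1)]]
print([hier_index((i,), three) for i in (1, 2, 4, 5, 7, 8, 10, 11)])
# prints [0, 1, 2, 3, 4, 5, 6, 7]
```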
The next extension made to the specification is to allow variable blocking. Extending the indexing function once again to accommodate variable blocking is even easier than the last extension. Because the indexing function has been defined with summations, the summations are still correct if $o_{xy}$ and $n_{xy}$ are changed to be expressions instead of constants. Some of the resulting expressions will still reference the variables $l_{xy}$; these can be replaced with the variables $i_{xy}$, since $i_{xy}$ represents the location of the particular index to which the indexing function is being applied.

We also make one more simple extension to allow the application of the indexing function to a short index. A short index is one that does not specify an index for every dimension but only for the first $n$. In this case the indexing function should return the address of the first component of the element denoted by the partial index. To accomplish this the indexing function is defined as

$$\gamma(\vec{i}; s) = \sum_{i=0}^{(\tau\vec{i}) - 1} S_{i0} \tag{5}$$
where $\tau\vec{i}$ is the length of the indexing vector. Note also that $T_{(\tau\vec{i})0}$ is the number of components in the element denoted by the partial index. The final definition of a complete indexing function for packed arrays specified by a hierarchical variable blocking specification is given by equations 1, 2, 3, 4, and 5.

There are two things to notice about the limitations on the specification and the mathematical formulation. The first is the limitation on which variables $l_{xy}$ can be used in an expression for $o_{xy}$ or $n_{xy}$: if these restrictions were relaxed, the summations would no longer be correct. Second, $b_{xy}$ must still be a constant in variable blocking. To see the motivation for this, consider how one might compute $i_{xy}$ if the block size varied. While this may be possible, it is not obvious at this time how that might be done.

The last question to answer might be: is it reasonable to execute this complicated expression of summations in a run-time environment? Simply put, no. However, due to the nature and restrictions of the specification language, all of the summations are guaranteed to be eliminated by substituting an equivalent expression. The resulting expression is only as complicated as the regularity being described. In fact, a specification of a dense array should reduce to the exact expression that is used in today's common compilers that support arrays [MT93].
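The complete definition, including short indexes, can be sketched for the figure 1 specification. Two assumptions are made here and are not taken from the paper: each summation in $T_{xy}$ is read as running over the block position variable of its own level (from the offset through the last non-zero block), so that inner expressions can refer to it, and the closed form in `closed_form` was worked out by hand for this one specification. All names are hypothetical.

```python
def val(entry, env):
    """Evaluate a constant or variable specification entry."""
    return entry(env) if callable(entry) else entry


def inner_T(x, y, env, spec):
    """Size of one block of level y: T_x(y+1), or T_(x+1)0 at the last level."""
    if y == len(spec[x]) - 1:
        return T(x + 1, 0, env, spec)
    return T(x, y + 1, env, spec)


def T(x, y, env, spec):
    """T_xy, with the summation variable bound to the block position l_xy."""
    if x == len(spec):
        return 1
    o, n, _ = spec[x][y]
    o, n = val(o, env), val(n, env)
    return sum(inner_T(x, y, {**env, (x, y): l}, spec) for l in range(o, o + n))


def S(x, y, rem, env, spec):
    """S_xy for the part of the index (rem) that lies inside the enclosing block."""
    o, _, b = spec[x][y]
    o = val(o, env)
    i_xy = rem // b
    skipped = sum(inner_T(x, y, {**env, (x, y): l}, spec) for l in range(o, i_xy))
    env[(x, y)] = i_xy            # later levels and dimensions see l_xy = i_xy
    if y < len(spec[x]) - 1:
        skipped += S(x, y + 1, rem % b, env, spec)
    return skipped


def packed_address(idx, spec):
    """Sum of S_x0 over the dimensions present in a full or short index."""
    env = {}
    return sum(S(x, 0, idx[x], env, spec) for x in range(len(idx)))


FIG1 = (
    [(0, 3, 4), (0, 3, 1)],
    [(0, lambda l: l[(0, 0)] + 1, 8),
     (lambda l: 2 - l[(0, 1)], lambda l: 2 * l[(0, 1)] + 1, 1)],
)


def closed_form(r, c):
    """Hand-reduced, summation-free address for the FIG1 specification only."""
    i00, i01 = divmod(r, 4)
    i10, i11 = divmod(c, 8)
    s0 = 9 * i00 * (i00 + 1) // 2 + (i00 + 1) * i01 ** 2
    s1 = i10 * (2 * i01 + 1) + i11 - (2 - i01)
    return s0 + s1


print(packed_address((5, 10), FIG1), closed_form(5, 10))  # prints 15 15
print(packed_address((5,), FIG1))                         # prints 11
```

The short index (5,) gives the address of the first stored component of row 5. The agreement between `packed_address` and `closed_form` illustrates, for this one specification, how the summations can be replaced by an expression without summations.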
4 Conclusion
The purpose of this paper was to develop a means of describing regularly partitioned arrays and a method of deriving an indexing function from such a description. The hierarchical variable blocking specification was described. A method of finding an indexing function for this type of specification was developed from the base of the classical indexing function for dense arrays. It was noted that the resulting indexing function can be reduced to a simple expression without summations. These methods are amenable to machine implementation and have been implemented for later incorporation in a currently operating array reduction compiler [MT93].
References
[JM91] M. Jenkins and L. Mullin. A comparison of array theory and a mathematics of arrays. In Arrays, Functional Languages, and Parallel Systems. Kluwer Academic Publishers, 1991.
[MT93] L. Mullin and S. Thibault. The psi compiler project: Backend to massively parallel scientific programming languages. In Fourth International Workshop on Compilers for Parallel Computers, 1993.
[Mul88] L. M. R. Mullin. A Mathematics of Arrays. Ph.D. dissertation, Syracuse University, December 1988.
[ZR93] E. L. Zapata and L. F. Romers. Data distributions for sparse matrix vector multiplication. In Fourth International Workshop on Compilers for Parallel Computers, December 1993.