How to Lay Out Arrays Spared by Rows and Columns

Laurence E. LaForge
Embry-Riddle Aeronautical University
P.O. Box 623, NAS Fallon, Nevada 89407

Abstract. Perhaps the most common fault tolerant architecture configures a nominal t × at array using bt dedicated spare rows and ct dedicated spare columns. We counterexample an outstanding conjecture by constructively showing how dedicated sparing can be laid out in area proportional to the number of elements. However, we find that dedicated sparing is more costly than homogeneous extraction of a t × at array from a (1+b)t × (a+c)t array.
i) In the presence of failures whose distribution is worst-case, iid, or clustered, the fault tolerance of either architecture is Θ(t^{-1}).
ii) At constant proportion of failures, the area of homogeneous arrays is Θ(exp t), while that of dedicated sparing is Ω(exp t).
iii) The worst-case wirelength of either architecture is Θ(ct).
iv) The best-case wirelength Θ(1) of homogeneous sparing is less than that Θ(t) of dedicated sparing.
v) Probabilistically, homogeneous sparing has O(log t) wirelength, less than that Θ(t) of dedicated sparing.
For large t, moreover, row-column sparing is more costly than local sparing.

Index terms: configuration architectures, fault tolerance, row-column sparing, systolic arrays

1 Introduction

Arrays of processors can enhance the speed of computation [1]. Large scale arrays of charge-coupled sensors provide images of scenes in varying portions of the electromagnetic spectrum [2]. Since the 1980’s, integrated circuit manufacturers have sought to increase yield by configuring memory arrays using spare rows and columns [3], [4]. Despite an extensive literature, two questions about arrays with spare rows and columns appear heretofore unresolved: 1) How to minimize the area of the layout? 2) How to minimize the maximum wirelength? This paper answers these questions, consolidates related results, and describes the implications for the designer. A configuration architecture is a set of elements and switches connected by wires. Dedicated and homogeneous sparing of arrays constitute different configuration architectures, switching rules for which are prescribed in Figure 1. Also known as reducible or degradable, homogeneous arrays are suggested as early as the 1986 paper by Koren and Pradhan [5]. LaForge [6] compares the worst-case fault tolerance of dedicated and homogeneous sparing.

[Figure 1 diagram, panels A and B: (1+b)t × (a+c)t arrays made up of the t × at nominal array, bt spare rows, and ct spare columns. Legend entries: fault (X), not selected.]

A) Adedicated(t;a,b,c):
1. Any at out of (a+c)t columns may be selected from the t × at nominal array and ct spare columns.
2. Any t out of (1+b)t rows may be selected from the t × at nominal array and bt spare rows.
3. The columns of the spare rows may be connected, using at disjoint wires, to the columns selected by Rule 1.

B) Ahomog(t;a,b,c): Any row or column may be bypassed. A fault cover is equivalent to a t × at matrix minor without X’s.

Figure 1: Switching rules and fault covers for a t × at = 4 × 5 array realized by A) dedicated spare rows and columns and B) extraction from a homogeneous array. a = 1.2, b = 0.5 = c.

1. Expanded version of article by the same title, appearing in: Proceedings, IEEE 1997 International Conference on Innovative Systems in Silicon. Austin, TX: 8–10 Oct 1997. L. E. LaForge, H. Bolouri, D. Sciuto, and S. K. Tewksbury, eds. Los Alamitos, CA: IEEE Computer Society Press. pp. 30–40.

A configuration algorithm takes as input a list of faults, otherwise known as an instance (see footnote 2). The output is either “no solution” or a configuration; that is, a set of instructions indicating how switches should be set to achieve the adjacency of the target architecture. Refer to Figure 1. A fault cover, sometimes simply referred to as a cover, is a configuration that achieves the adjacency of the target architecture using good elements only. The purpose of a configuration algorithm is to find a cover. A particular combination of configuration architecture and algorithm is a configuration scheme. The task of identifying faults falls to a diagnosis scheme, about which there is an extensive literature [8]. Figure 2 depicts the relation between diagnosis and configuration. To focus on configuration we presume that faults have been identified with 100% accuracy.

[Figure 2 diagram. Architecture blocks: Diagnosis Architecture, Configuration Architecture, Target Architecture. Across the hardware/software boundary: Syndrome, Diagnosis Algorithm, List of Faults, Configuration Algorithm, Setting of Switches.]

Figure 2: Interaction between architectures and algorithms for fault tolerance by configuration.

We denote dedicated sparing as Adedicated(t;a,b,c), and indicate homogeneous sparing by Ahomog(t;a,b,c). The integer t is our scale parameter, while a, b, and c specify the geometry of the target and configuration architectures; a > 1 is a constant; c may be held at zero but, if not, c is bounded from below by a positive constant and the ratio b/c is a constant d. Since we do not build fractional rows or columns, the values at, bt, and ct must be integers. Rule-based representations, such as those of Figure 1, are useful for determining many of the properties of Adedicated and Ahomog. We do not consider variations, such as islands [9] or forced coupling of spares [3]. Figures 3 and 4 depict layouts for Adedicated and Ahomog. For the sake of simplicity we assume that data paths are one bit wide, and do not explicate lines for controlling switches.
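To make the notion of a fault cover for Ahomog concrete, the following is a minimal sketch that searches exhaustively for a fault-free t × at minor, in the sense of rule B of Figure 1. It is not the configuration algorithm of this paper or of the cited works: the function name `find_homog_cover`, the 0-based indexing, and the brute-force strategy are assumptions made purely for illustration, and the search is exponential in the number of rows.

```python
from itertools import combinations

def find_homog_cover(n_rows, n_cols, faults, t, at):
    """Return (rows, cols) of a fault-free t x at minor, or None.

    n_rows plays the role of (1+b)t and n_cols that of (a+c)t.
    faults is an iterable of 0-based (row, col) positions.
    Hypothetical helper: exhaustive search, for small examples only.
    """
    faulty_cols_by_row = [set() for _ in range(n_rows)]
    for r, c in faults:
        faulty_cols_by_row[r].add(c)
    for rows in combinations(range(n_rows), t):
        # A column is usable only if it is fault free in every selected row.
        blocked = set().union(*(faulty_cols_by_row[r] for r in rows))
        good_cols = [c for c in range(n_cols) if c not in blocked]
        if len(good_cols) >= at:
            return list(rows), good_cols[:at]
    return None

# 6 x 7 array from which a 4 x 5 array is to be extracted.
cover = find_homog_cover(6, 7, {(0, 0), (1, 3), (2, 3), (5, 6)}, t=4, at=5)
no_cover = find_homog_cover(6, 7, {(r, r) for r in range(6)}, t=4, at=5)
```

In the second call every choice of four rows blocks four distinct columns, leaving only three usable columns, so no cover exists even though only six elements are faulty.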

[Figure 3 diagram: a 4 × 7 homogeneous array Ahomog(4;1.2,0,0.5) containing the nominal 4 × 5 array, a Major-Mux_{2 × 7}, and a 2 × 5 homogeneous array Ahomog(4;1.2,-0.5,0). Legend entries: fault, not selected, switch direction.]

Figure 3: Layout and fault cover for Adedicated(4;1.2,0.5,0.5). Compare with Figure 1A.

2. An “instance” in this paper is the same as a “faulted array” in [7], except that we distribute faults in spares.

[Figure 4 diagram. Legend entries: fault, not selected; switch settings: select, bypass.]

Figure 4: Layout and fault cover for Ahomog(4;1.2,0.5,0.5). Compare with Figure 1B.

It would appear that Adedicated is the traditional, prevalent approach to achieving fault tolerant two-dimensional arrays [3], [4], [7], [10], [11], [12], [13]. By contrast, and with respect to quantitative criteria, this paper shows how Ahomog is preferable to Adedicated.

2 Fault Model

Consistent with most other works, we presume that elements are either faulty or good, while switches and wires are fault free. Some recent expositions quantify as well the cost of faulty switches and broken wires [14], [15]. Our sample space consists of instances of Adedicated or Ahomog, with respect to a particular distribution of faults.

Under a Bernoulli model FBernoulli(p), each element fails with independent probability p. If n is the number of elements in the architecture then 1 - 1/(4δ^2 n) bounds from below the probability that the proportion of faults in a Bernoulli instance differs by at most δ from the expected proportion p. For large n, that is, the Bernoulli model converges [16] to a hypergeometric model Fhypergeometric(p), whereby faults are independent and identically distributed (iid) in a fixed proportion p. The discrete uniform model Fdiscrete(f) is similar to Fhypergeometric(p), but each instance instead contains a fixed number f of iid faults. The worst-case Fworst-case(f) and fractional worst-case Ffractional worst-case(p) are subsets of Fdiscrete(f) and Fhypergeometric(p) such that no instance has a cover.

Consistent with most treatments of nonuniform faults, our clustering model Fclustering(p±) postulates the existence of nonoverlapping regions, alternatively known as quadrats or blocks [17]. Within any quadrat fault-causing defects are Poisson with mean density D. To fit production line data, D has been modeled as a random variable whose second moment about the overall mean D̂ is proportional to D̂^2. Several distributions for D have been advanced.
For example, for density D uniform(D̂ ± r), Murphy [18]:

$$\Pr(D \le X) = \int_{\hat{D}-r}^{\min(X,\,\hat{D}+r)} \frac{dx}{2r} \qquad (1)$$

and, for density D gamma(α, α/D̂), Koren and Koren [19]:

$$\Pr(D \le X) = \int_{0}^{X} \frac{\alpha^{\alpha}}{\hat{D}^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-(\alpha x)/\hat{D}}\, dx \qquad (2)$$

In the above, Γ(α) is the gamma function $\int_0^\infty x^{\alpha-1} e^{-x}\,dx$ and α is a constant that, in ways not well understood, depends on the relation between the geometry of quadrats and circuitry [17]. We subsume (1), (2), and similar distributions by way of a truncation approximation: over all quadrats, there is a nonzero minimum mean defect density D– as well as a maximum mean defect density D+. Such a truncation appears reasonable in light of the experimental data, such as it has been reported. Clustered faults are therefore independent but are not identically distributed, and the failure probability p of an element having length l and width w may be bounded: 0


k. Refer to Figure 5. In physical terms, a majorized relation between the columns of Ahomog(t;a,b-1,0) and some at columns of Ahomog(t;a,0,c) prohibits the crossing of wires laid in a straight line between related columns. Alternatively, if ρ(i,j) = 1 then, in the adjacency matrix for ρ, i) (i,j) is the only entry in row i or column j whose value is one; ii) the submatrix to the right of and above entry (i,j) contains no ones; and iii) the submatrix to the left of and below entry (i,j) contains no ones. A majorized relation on U and V is maximal if the relation has maximal order.

Lemma 2. The maximal majorizations of U and V comprise $\binom{n}{m}$ m-matchings of V.

Proof. By definition, any row in the adjacency matrix of a majorization contains at most a single entry whose value is one. Hence the number of edges in any majorization is no greater than m. The matrix containing m ones in the diagonal entries, (j,j): j≤m, is majorized, and represents a matching. Hence the number of edges in any maximal majorization equals m. Further, and again by definition, any majorization on m edges is an m-matching (the converse is not true). It remains to enumerate the m-matchings.

Let T(n,m) be the number of maximal majorizations. We proceed by induction on n. For a basis, if n = m then the maximal majorization is the unique relation whose adjacency matrix has ones in the principal diagonal, (j,j): j≤m; T(m,m) = 1 = $\binom{m}{m}$. Assume the claim holds for values m, …, n-1 and consider the class of m × n adjacency matrices for a maximal majorization. If the first column of the adjacency matrix contains a one then the one must appear in the first row, else the relation is not maximal or not a majorization. By the argument given in the preceding paragraph, exactly m-1 ones must form a majorization in rows 2 through m of the rightmost n-1 columns. By induction, there are T(n-1,m-1) such majorizations. If a one does not appear in the first column then, by the argument given in the preceding paragraph, exactly m ones must form a majorization in the rightmost n-1 columns. By induction, there are T(n-1,m) such majorizations. Subject to the initial condition T(m,m) = 1, the number of maximal majorizations is therefore governed by T(n,m) = T(n-1,m) + T(n-1,m-1), a classical recurrence whose unique solution is the binomial coefficient $\binom{n}{m}$. ❒

We are now in a position to specify the behavior of an optimum multiplexer, Major-Mux(U,V):

For any m-subset S_m ⊆ V, form the edge (i, j), where j is the i-th least index in S_m. (7)

Lemma 3. The relations induced by Major-Mux are the maximal majorizations of U and V.

Proof. The order of the relations induced by Major-Mux equals m; for containment, that is, it suffices to show that all induced relations are majorized. Suppose that Major-Mux induced an unmajorized relation ρ. Applying the definitions, we see that (7) would be violated: there would exist ρ(i,j) = 1 = ρ(h,k) such that a) i < h and j ≥ k; b) i ≥ h and j < k; c) h < i and k ≥ j; or d) h ≥ i and k < j. In particular, equality in a), b), c), or d) would violate (7). For equality note that Major-Mux induces a unique relation for each of the $\binom{n}{m}$ m-subsets of V. ❒

Lemma 4. The set V is perfectly m-matched by the set of maximal majorizations of U and V.

Proof. By Lemma 2, the number of maximal majorizations is the same as the number $\binom{n}{m}$ of m-subsets of V. It therefore suffices to show that any m-subset of V is contained in some maximal majorization. But this follows from Lemma 3 and from the specification (7) of Major-Mux. ❒
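Rule (7) and the counting claims of Lemmas 2 through 4 can be checked on small cases. The sketch below is illustrative only: the names `major_mux`, `is_majorized`, and `all_major_mux_relations` are hypothetical, indices are 1-based as in the text, and "majorized" is interpreted as the non-crossing condition described above (distinct rows and columns, with i < h exactly when j < k).

```python
from itertools import combinations
from math import comb

def major_mux(m, subset):
    """Rule (7): pair element i of U with the i-th least index in subset."""
    assert len(subset) == m
    return [(i, j) for i, j in enumerate(sorted(subset), start=1)]

def is_majorized(edges):
    """True if the edges form a matching in which no two wires cross."""
    for (i, j), (h, k) in combinations(edges, 2):
        if i == h or j == k:        # two ones in a row or column
            return False
        if (i < h) != (j < k):      # wires (i,j) and (h,k) cross
            return False
    return True

def all_major_mux_relations(m, n):
    """One induced relation per m-subset of V = {1, ..., n}."""
    return [major_mux(m, s) for s in combinations(range(1, n + 1), m)]

relations = all_major_mux_relations(4, 7)   # at = 4, (a+c)t = 7
```

For at = 4 and (a+c)t = 7 this yields 35 distinct relations, all of them majorized, in agreement with the binomial count of Lemma 2.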

Lemma 5. To implement the behavior (7) of Major-Mux(U,V), it suffices that element i of U be capable of connecting to elements i through i+n-m of V, for 1 ≤ i ≤ m.

Proof. Suppose there is an m-subset of V such that, consistent with rule (7), the index j of the i-th column selected from V is outside of the range [i,i+n-m] ⊆ [1, n]. Since there are only i-1 columns to the left of this range and m-i elements to the right of this range, the total number of columns selected is at most m-1. But this contradicts the assumption of (7) that m elements of V are selected. Hence j ∈ [i,i+n-m]. Element i of U is therefore capable of connecting to the i-th element selected from V. ❒

Using Lemma 5 we reformulate the behavior (7) of Major-Mux in practical terms. Letting U equal Ahomog(t;a,b-1,0), and setting V to Ahomog(t;a,0,c), we arrive at a parallel to (6):

The i-th column of Ahomog(t;a,b-1,0) can be connected to the i-th column selected from Ahomog(t;a,0,c). (8)
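The range stated in Lemma 5 and the edge count at(ct+1) used in Theorem 1 can be confirmed by enumeration for small m and n. This is a sketch under the assumption that the connection union is the union of the edges of all relations induced by rule (7); the function name `connection_union` is hypothetical.

```python
from itertools import combinations

def connection_union(m, n):
    """Union of edges (i, j) over all relations induced by rule (7).

    U = {1, ..., m}, V = {1, ..., n}; combinations() yields each
    m-subset in increasing order, so enumerate() pairs element i of U
    with the i-th least index of the subset.
    """
    edges = set()
    for subset in combinations(range(1, n + 1), m):
        edges.update(enumerate(subset, start=1))
    return edges

m, n = 4, 7                     # at = 4, (a+c)t = 7, hence ct = 3
edges = connection_union(m, n)
expected = {(i, j) for i in range(1, m + 1) for j in range(i, i + n - m + 1)}
```

With m = at and n = (a+c)t, each element i reaches exactly the n-m+1 = ct+1 columns i through i+ct, giving m(n-m+1) = at(ct+1) = 16 edges in this example.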

Theorem 1. With minimum count at(ct+1) of edges in the connection union, Major-Mux perfectly multiplexes at-matchings between Ahomog (t;a,b-1,0) and Ahomog (t;a,0,c). Proof. The first part is by Lemmas 1 and 5. The second part follows from Lemmas 3 and 4.



The preceding lemmas and theorem motivate an implementation, Major-Mux_{ct × (a+c)t}. Between Ahomog(t;a,b-1,0) and Ahomog(t;a,0,c) make room for ct rows of switches. Locate each switch site on a grid point whose column index j corresponds to the horizontal coordinate of column j of Ahomog(t;a,0,c). Number the rows 1 ≤ i ≤ ct from bottom to top. For 1 ≤ j ≤ at place a switch at (i,j) if and only if i ≤ j ≤ i+ct. Vertically connect the element at the top of column j of Ahomog(t;a,b-1,0) to the bottom of the switch at (1,j). If i = j or i = ct then the switch at (i,j) can connect vertically to the bottom element of column j of Ahomog(t;a,0,c), else the switch at (i,j) can connect vertically to the bottom element of the switch at (i+1,j). If i = ct then the switch at (i,j) can connect horizontally to the vertical wire extending from the bottom element of column j+1 of Ahomog(t;a,0,c), else the switch at (i,j) can connect horizontally to the vertical wire extending from the bottom of the switch at (i+1,j+1).

Figure 3 illustrates Major-Mux_{2 × 7} with switches set, while Figure 6 shows the open-switch construction for Major-Mux_{3 × 7}. Major-Mux_{ct × (a+c)t} has height (ct switches + (ct+1) units of pitch) and width approximately equal to (a+c)t times the width of an element or switch, whichever is greater. Corollary 1.1 summarizes the preceding.

Corollary 1.1. The combination of Ahomog(t;a,b-1,0), Ahomog(t;a,0,c), and Major-Mux_{ct × (a+c)t} implements Adedicated(t;a,b,c) in area O((a+ab+c+c^2)t^2) ≡ O((ct)^2).

[Figure 6 diagram: grid of switch sites with rows i = 1 to 3 and columns j = 1 to 7; height ≈ 2ct+1.]

Figure 6: Open switch construction for Major-Mux_{ct × (a+c)t}, at = 4, (a+c)t = 7.

Theorem 2. Within any multiplexer for Adedicated(t;a,b,c), the number of terminal-to-terminal wires is at least at(ct+1); the number of switches is at least at(ct+1)/2.

Proof. The relations induced by any such multiplexer must span all matchings between the at columns of the spare rows and the (a+c)t nominal and spare columns. By Lemma 1, there are at least at(ct+1) edges in the connection union of these relations. For each edge in the connection union choose a representative path. Each representative path must have at least one distinct terminal-to-terminal wire, where, for c > 0, at least one of the terminals is a switch. The number of wires is therefore at least the number at(ct+1) of distinct terminal-to-terminal wires. Three such wires cannot all be connected to the same switch, else they would not be representative of three paths. Therefore, there is at least one switch for every two distinct terminal-to-terminal wires, each of which is representative of a unique path. That is, the number of switches is at least ½ at(ct+1). ❒

Corollary 2.1. The number 2act^2+at of terminal-to-terminal wires and count act^2 of switches in Major-Mux_{ct × (a+c)t} are each within a factor of 2 of best possible.

Corollary 2.2. Any multiplexer for Adedicated(t;a,b,c) has area Ω(act^2) ≡ Ω(ct^2).

If c is constant then the area of Major-Mux_{ct × (a+c)t} is best possible Θ((ct)^2) ≡ Θ(ct^2) ≡ Θ(t^2). However, if c grows with t (which is allowed under our model) then there is a gap between the bounds of Theorem 1 and Corollary 2.2. In question is whether we can close this gap with a layout for Major-Mux whose area matches that of Corollary 2.2. As Figure 7 illustrates, the answer is “yes”. By eliminating wires and switches from the basic design of an at × (a+c)t crossbar, we arrive at an alternative, Major-Mux_{at × (a+c)t}; details are similar to those for Major-Mux_{ct × (a+c)t}.

Major-Mux_{at × (a+c)t} has the same width and number of switches and wires as Major-Mux_{ct × (a+c)t}, but instead has height (at switches + (at+1) units of pitch). Thus Major-Mux_{at × (a+c)t} is more suitable whenever a < c. In practice, c is often less than a, in which case Major-Mux_{ct × (a+c)t} is preferred. Since a is a constant, on the other hand, we have

Corollary 2.3. The optimal area Θ(ct^2) multiplexer Major-Mux_{at × (a+c)t}, in combination with Ahomog(t;a,b-1,0) and Ahomog(t;a,0,c), implements Adedicated(t;a,b,c) in optimal area Θ((a+ab+c+)t^2) ≡ Θ(ct^2).
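The factor-of-2 claim of Corollary 2.1 is simple arithmetic on the counts quoted in Theorem 2 and Corollary 2.1. The sketch below evaluates those counts for integer values of at and ct; the function name `mux_bounds` and the dictionary keys are hypothetical, and the formulas are taken directly from the statements above rather than from a count of switches in any particular layout.

```python
def mux_bounds(at, ct):
    """Counts from Theorem 2 (lower bounds) and Corollary 2.1 (Major-Mux).

    at and ct are the integer numbers of nominal and spare columns.
    """
    lower_wires = at * (ct + 1)        # at(ct+1)
    lower_switches = lower_wires / 2   # at(ct+1)/2
    mux_wires = 2 * at * ct + at       # 2act^2 + at
    mux_switches = at * ct             # act^2
    return {
        "lower_wires": lower_wires,
        "lower_switches": lower_switches,
        "mux_wires": mux_wires,
        "mux_switches": mux_switches,
        "wire_ratio": mux_wires / lower_wires,           # (2ct+1)/(ct+1)
        "switch_ratio": mux_switches / lower_switches,   # 2ct/(ct+1)
    }

example = mux_bounds(at=4, ct=3)
```

Both ratios simplify to expressions in ct alone, (2ct+1)/(ct+1) and 2ct/(ct+1), each of which is strictly less than 2 for every ct ≥ 0.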

[Figure 7 diagram: grid of switch sites with rows i = 1 to 4 and columns j = 1 to 7; height ≈ 2at+1.]

Figure 7: Open switch construction for Major-Mux_{at × (a+c)t}, at = 4, (a+c)t = 7.

In question is whether the area Θ(ct^2) of Major-Mux_{at × (a+c)t} offsets the extra area Θ((ct)^2) required by the bct^2 elements that are present in the lower right-hand side of Ahomog(t;a,b,c), but which are absent in Adedicated(t;a,b,c). As the next sections show, the answer is, “No, despite comparable area, the longest wire in Adedicated is longer than that contained in Ahomog.”

5.3 Wirelength versus Area, Elements versus Multiplexer

If we select every (1+b)th row and every (1+a/c)th column then the configuration of Ahomog(t;a,b,c) has best case wirelength Θ(1). By contrast, the worst-case wirelength of Ahomog(t;a,b,c) is proportional to ct. To see this, note that at most bt rows and ct columns are bypassed in any t × at configuration, and this is achievable. Since b/c is a constant d, the worst-case wirelength is Θ(ct). These remarks are recorded in Table 2.

In the case of Adedicated(t;a,b,c) we presume that the designer does not wish to extend the layout of the multiplexer beyond the left and right sides of the (a+c)t nominal and spare columns. We call this condition the channel width constraint.

Lemma 6. Every wire of Adedicated(t;a,b,c) that traverses the multiplexer between the at columns of the spare rows and the (a+c)t nominal and spare columns has length Ω(t). This bound is achieved by Major-Mux_{at × (a+c)t}.

Proof. By the channel width constraint, the width of the multiplexer is O(ct). The height is Ω(t). Every column of any configuration makes a connection across this channel, even when no spare rows are selected. Major-Mux_{at × (a+c)t} has height Θ(t). ❒

Lemma 7. A multiplexer whose induced relations are the maximal majorizations minimizes the wirelength in every configuration of any multiplexer of Adedicated(t;a,b,c).

Proof. Consider any multiplexer M that minimizes the wirelength of every at-matching between the columns of the spare rows and the (a+c)t columns of the nominal and spare columns. It suffices to consider only those multiplexers that are at-perfect. If the multiplexer is complete but imperfect then select, for each at-subset of the (a+c)t columns of the nominal and spare columns, a matching whose maximum wirelength is minimized. Suppose that, for terminals (i,j,h,k), the terminal-to-terminal wires (i,j) and (h,k) cross in some configuration. Since all elements are functionally identical, substituting the terminal-to-terminal wires (i,k) and (h,j), over the same tracks, achieves an equivalent matching with wirelength no greater than that originally provided. The lemma follows by applying this procedure to every matching of M. ❒

Theorem 3. Adedicated(t;a,b,c) has best case wirelength Θ(t). For any multiplexer that minimizes the maximum wirelength in Adedicated(t;a,b,c), the worst case wirelength is Θ(ct). These bounds are achieved by Major-Mux_{at × (a+c)t}.

Proof. In the best case, every wire that traverses the multiplexer goes straight, or nearly straight, across the channel. This is achievable with Major-Mux_{at × (a+c)t}; by Lemma 6 the best case wirelength of Adedicated is therefore Θ(t). For the second part of the theorem, Lemma 7 says that we need consider only those multiplexers whose induced relations are the maximal majorizations. By Lemma 3, Major-Mux is the only such multiplexer, apart from the particulars of layout. In the worst case, Major-Mux connects two columns whose indices differ by ct; hence the worst-case horizontal length of the longest wire in any such multiplexer is Ω(ct). By construction, Major-Mux_{at × (a+c)t} achieves this bound. Hence the worst-case wirelength is Θ(ct). ❒

Our results for Ahomog provide the worst-case wirelength Θ(ct) in the ensemble of the nominal array and spare columns (Ahomog(t;a,0,c)) and spare rows (Ahomog(t;a,b-1,0)) of Adedicated.

Corollary 3.1. For any fault distribution: Wdedicated(t;a,b,c) ∈ Ω(t), Wdedicated(t;a,b,c) ∈ O(ct).
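The horizontal part of the argument in Theorem 3 can be illustrated by measuring, for each configuration induced by rule (7), the largest difference between the index of a selected column and the index of the spare-row column wired to it. This is a sketch only: the names `column_shifts` and `shift_extremes` are hypothetical, and the measure ignores the vertical channel height that Lemma 6 shows to be Ω(t).

```python
from itertools import combinations

def column_shifts(subset):
    """Horizontal offsets j - i of the wires induced by rule (7)."""
    return [j - i for i, j in enumerate(sorted(subset), start=1)]

def shift_extremes(m, n):
    """(best, worst) over all m-subsets of the largest offset per configuration."""
    largest = [max(column_shifts(s)) for s in combinations(range(1, n + 1), m)]
    return min(largest), max(largest)

best, worst = shift_extremes(4, 7)   # at = 4, (a+c)t = 7, hence ct = 3
```

In the best configuration every wire goes straight across (offset 0), while in the worst configuration some wire is displaced by n - m = ct columns, consistent with the Ω(ct) horizontal length cited in the proof.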

6 Interpretation of Results

The preceding section shows how to lay out both Ahomog(t;a,b,c) and Adedicated(t;a,b,c) in optimal area proportional to the number of elements. Whenever both spare rows and spare columns are part of the design, moreover, Adedicated(t;a,b,c) must contain a multiplexer. Such a multiplexer must support a complete set of matchings between either the spare columns and nominal and spare rows or, as assumed in this paper, between the spare rows and the nominal and spare columns. Major-Mux_{ct × (a+c)t} provides this function with the least number of matchings $\binom{(a+c)t}{at}$ and fewest edges at(ct+1) in the connection union of these matchings. Major-Mux_{ct × (a+c)t} also minimizes the area, channel height, and maximum wirelength of any such multiplexer. Ahomog(t;a,b,c) requires no such multiplexer, but contains bct^2 more elements than Adedicated(t;a,b,c). The tradeoff nevertheless favors Ahomog(t;a,b,c), as we proceed to show.

For values of c bounded by constants, the area of Ahomog(t;a,b,c) is roughly the same as that occupied by Adedicated(t;a,b,c). The latter does not lend itself to regular layout, and the bounding box for our best implementation of Ahomog(t;a,b,c) is strictly smaller than that for Adedicated(t;a,b,c). Further, Ahomog(t;a,b,c) is more regular in the sense that there is no distinction between nominal and spare rows and columns. Due to the area overhead for decoders and buffers, moreover, the area efficiency for larger monolithic memory arrays tends to be better than for smaller blocks.

From a quantitative perspective, Corollary 3.1 points out how the maximum wirelength of Adedicated(t;a,b,c) grows at least in proportion to t; in the best case this is much worse than the constant wirelength of Ahomog(t;a,b,c). On the other hand, in the worst case the maximum wirelengths in Adedicated(t;a,b,c) and Ahomog(t;a,b,c) are the same order Θ(ct).

A disparity such as this raises the natural question, “What happens for other fault distributions?” In the sections that follow we address this question by applying the figures of merit described in section 4, in combination with results from other works. In some cases we have modified or augmented the mathematical derivations cited. We summarize our findings in Table 2.

6.1 Fault Tolerance and Configuration Heuristics for Arrays Spared by Rows and Columns

The architectural and overall worst-case fault tolerance of Adedicated(t;a,b,c) equals (b+c)t; the original proof of LaForge [6] is amended in [14], with a lower bound independently derived by Blough and Pelc [7], [20]. As observed by LaForge [6], determining the architectural worst-case fault tolerance of Ahomog(t;a,b,c) is equivalent to solving the Problem of Zarankiewicz: “What is the greatest number of zeros in a (1+b)t × (a+c)t binary array that guarantees the existence of a t × at subarray containing ones only?” Here the zeros correspond to faulty elements, and the ones represent good elements. First posed in 1951, the complete solution to this problem has eluded researchers. LaForge [27] derives the solution when the subarray is “large” relative to the embedding array, i.e., where the values of t, a, b, and c tend to be of greatest interest to designers. These formulae are listed in Table 2, and, as illustrated by Figure 8, provide the exact worst case fault tolerance in just over half of all cases. In each of these cases a linear time algorithm finds a cover whenever the number of faults is no greater than the worst case tolerance [27].

Perhaps most remarkable is that the fractional tolerance is the same order Θ(t^{-1}) in the worst and probabilistic cases. Note that Y_arch = 1 if and only if f_{arch, worst-case} = f_{arch, discrete} and p_{arch, frac worst-case} = p_{arch, hypergeometric}. Markov’s inequality says that the probability that a cover exists is less than the expected number of covers, E(K_{hypergeometric, p, homog}) [16].

[Figure 8 plot: worst-case fault tolerance tabulated against t (vertical axis) and at (horizontal axis), with legend entries faulty and good; the individual numeric entries are not recoverable from the extracted text.]

Figure 8: f_{worst-case, homog}(t;a,b,c) computed by formulae of Table 2. (1+b)t = 10, (a+c)t = 14.

E(K_{hypergeometric, p, homog}) is, in turn, no greater than E(K_{Bernoulli, p, homog}), the expected number of homogeneous covers under model FBernoulli(p); the latter is identical to the expected number of Bernoulli covers E(K_{Bernoulli, p, dedicated}) for Adedicated(t;a,b,c) [24, Thms 3.11, 3.12, 3.13]:

$$\binom{(1+b)t}{t}\binom{(a+c)t}{at}(1-p)^{at^{2}} \qquad (9)$$

By Jensen’s inequality, moreover, (9) is an upper bound on the expected number of covers under model Fclustering(p±), where p is the overall expected proportion of faults. Thus (9) can be applied to any of our probabilistic models. If apt > b+c+2 then, at constant redundancy, (9) approaches zero at a rate exponential in t [24, Thm 4.14]. For large t, that is, a nonzero value of Y_arch is impossible to attain. Thus, under any of our models, and for either Adedicated(t;a,b,c) or Ahomog(t;a,b,c), the proportion of faulty elements is at most (b+c+2)/at. This is within 2/at of a lower bound provided by the worst-case fractional fault tolerance (b+c)/at of either dedicated or homogeneous arrays. If the proportion of faults does not exceed (b+c)/at then the extremal algorithms of LaForge [6], [27] are guaranteed to find a fault cover in linear time. Algorithmic refinement of these bounds is the focus of Shi and Fuchs [22] and of Blough and Pelc [7], [20].

In the worst or probabilistic case, the preceding indicates how the fractional tolerance of either Adedicated(t;a,b,c) or Ahomog(t;a,b,c) to faulty elements is Θ(t^{-1}). We would also like to know the normalized tolerance. However, upper and lower bounds on the area of Adedicated(t;a,b,c) were previously unreported. Using Corollary 2.3, we see that the normalized tolerance of each of these architectures is the same magnitude Θ(t^{-1}) as the tolerance to elements. That is, the ratio of area of the faulty elements to the area of the architecture is, to within a constant, the same as the number of faulty elements we can tolerate divided by the total number of elements.
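Expression (9), read as the number of ways to choose t rows and at columns times the probability that all at^2 selected elements are good, can be evaluated numerically. The sketch below works in the logarithmic domain to avoid underflow for large t. The function name `log_expected_covers` and its integer parameters (t, at, bt, ct passed as counts of rows and columns) are assumptions made for illustration.

```python
from math import comb, exp, log

def log_expected_covers(t, at, bt, ct, p):
    """Natural log of expression (9) for integer t, at, bt, ct and 0 <= p < 1."""
    rows = t + bt        # (1+b)t rows in total
    cols = at + ct       # (a+c)t columns in total
    return (log(comb(rows, t))
            + log(comb(cols, at))
            + t * at * log(1.0 - p))   # exponent a*t^2 = t * (at)

small = exp(log_expected_covers(4, 5, 2, 2, 0.5))   # 4 x 5 target in a 6 x 7 array
large = log_expected_covers(40, 80, 40, 40, 0.1)    # a = 2, b = c = 1, t = 40
```

At a constant fault proportion p = 0.1 with a = 2 and b = c = 1, the logarithm of (9) is already strongly negative at t = 40, in line with the exponential decay noted above.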

Table 2: Comparison of arrays spared by rows and columns.

Point of interest: Number of elements, element redundancy
  Adedicated(t;a,b,c): (a+ab+c)t^2, 1+b+c/a. Sec 4.1
  Ahomog(t;a,b,c): (a+ab+c+bc)t^2, 1+b+c/a+bc/a. Sec 4.1

Point of interest: Internal multiplexing: i) number of wires, ii) number of switches, iii) area, iv) longest wire
  Adedicated(t;a,b,c): Major-Mux_{at × (a+c)t} optimum. i) At least at(ct+1), at most 2act^2. ii) At least ½ at(ct+1), at most act^2. iii) Θ(ct^2). iv) Ω(t), O(ct). Thm 1, Cor 2.1, 2.3, 3.1
  Ahomog(t;a,b,c): None required, but contains Θ((ct)^2) area with bct^2 more elements than Adedicated(t;a,b,c). Sec 5.2

Point of interest: Area, normalized redundancy
  Adedicated(t;a,b,c): Θ(ct^2), Θ(c). Cor 2.3
  Ahomog(t;a,b,c): Θ((ct)^2), Θ(c^2). Sec 5

Point of interest: Maximum wirelength, best and worst case
  Adedicated(t;a,b,c): W_best case ∈ Θ(t), Thm 3. W_worst case ∈ Θ(ct), Thm 3
  Ahomog(t;a,b,c): W_best case ∈ Θ(1), Sec 5.3. W_worst case ∈ Θ(ct), Sec 5.3

Point of interest: Worst-case fault tolerance, f_worst-case(t) (architectural and overall)
  Adedicated(t;a,b,c): (b+c)t. [6], [14], [7], [20]
  Ahomog(t;a,b,c): If at > bt( c +1) then bt( c +1) + ct If (ct-1)/t ≤ a/b ≤ c then ct(b+1) + ct mod t if ct ≡ 0 mod t + bt and at > bt( (ct-1)/t+1) or ct ≠ 0 mod + max(b,t) otherwise Always at least bt( c +1) + ct. [6], [27]

Point of interest: Proportional, normalized fault tolerance p_frac worst-case, p_hypergeometric, p_Bernoulli, p±_clustering
  Adedicated(t;a,b,c): At least (b+c)/at, at most (b+c+2)/at. Normalized ∈ Θ(t^{-1}). Sec 6.1, (9), Cor 2.3, [7], [20], [22], [24]
  Ahomog(t;a,b,c): At least (b+c)/at, at most (b+c+2)/at. Normalized ∈ Θ(t^{-1}). Sec 6.1, (9), Sec 5, [24]

Point of interest: Configuration algorithm p_frac worst-case, p_hypergeometric, p_Bernoulli, p±_clustering
  Adedicated(t;a,b,c): Deterministic and linear time whenever p does not exceed p_frac worst-case(t), [6]. Approximation and probabilistic heuristics in [7], [12], [20], [22]
  Ahomog(t;a,b,c): Deterministic and linear time whenever p does not exceed known value or lower bound on p_frac worst-case(t). [6], [27]

Point of interest: Element redundancy, normalized redundancy (constant failure proportion p, least failure probability p–, constant coverage, constants 4 ≥ q1 > q2 ≥ 2, constants q3, q4, q5). Sec 6.2, [24]
  Adedicated(t;a,b,c): At least 1/(at^2 (1-p)^{at/q1}). Normalized ∈ Ω(t^{-2} exp_{1/(1-p)} q4 t)
  Ahomog(t;a,b,c): At least 1/(at^2 (1-p)^{((1+a)t)/q1}). Normalized ∈ Ω(t^{-2} exp_{1/(1-p)} q3 t). For Fhypergeometric(p), FBernoulli(p), Fclustering(p±): at most 1/(1-p+)^{((1+a)t)/q2}. Normalized ∈ O(t^{-2} exp_{1/(1-p)} q5 t)

Point of interest: Wirelength: Fhypergeometric(p), FBernoulli(p), Fclustering(p±)
  Adedicated(t;a,b,c): Θ(t). Thm 4
  Ahomog(t;a,b,c): O(log t). Thm 4

6.2 Redundancy of Arrays Spared by Rows and Columns For a fixed proportion p of faults, inequality (9) can be used to demonstrate a lower bound on the redundancy, as a function of t. For constants 1 > Yarch > 0 and 4≥ q1 >q2 ≥ 2, LaForge [24, Thms 4.16, 4.17] shows that the number of elements of Ahomog (t;a,b,c) must be at least

1 ---------------( 1 – p)

( 1 + a ) t ⁄ q1

and at most

1 -----------------+ (1 – p )

( 1 + a ) t ⁄ q2

. The identical lower bound applies to the

bounding box of Adedicated ; when the actual number of elements is used we obtain the slightly more forgiving

1 ---------------(1 – p)

a t ⁄ q1

. These results apply to the models FBernoulli (p), Fhypergeometric (p),

and Fclustering (p±), where, in the latter, p– is the minimum failure probability (3), (4) but equals p in the hypergeometric and Bernoulli cases. Since the fractional worst case is a special case of the hypergeometric model, moreover, the worst case element redundancy of Adedicated (t;a,b,c) or Ahomog (t;a,b,c) is at least the hypergeometric redundancy: 1/(at²) times an exponential whose base is a constant power q3 of 1/(1−p). We do not know if the worst case element redundancy is at most 1/(at²) times an exponential whose base is a constant power q5 of 1/(1−p), nor are we sure of upper bounds for any of our probabilistic models for Adedicated (t;a,b,c). However, we do know that the probabilistic element redundancy of Ahomog (t;a,b,c), while poor, is effectively no worse than that of Adedicated (t;a,b,c).

Most importantly, Corollary 2.3 enables us to compare the normalized redundancy, utility, or size. That is, our bounds Ω(t⁻² exp_{1/(1−p)}(q3 t)), Ω(t⁻² exp_{1/(1−p)}(q4 t)) or O(t⁻² exp_{1/(1−p)}(q5 t)) on the normalized redundancy are of the same magnitude as those for the respective element redundancy. These bounds become exact orders in the domain of logarithms.

6.3 Wirelength of Arrays Spared by Rows and Columns

Suppose that the redundancy is held constant but, with increasing values of t, the Bernoulli proportion p = p(t) of faulty elements is maintained at a level (b+c)/(at) ≤ p(t) ≤ (b+c+2)/(at) given by section 6.1. Invoke the inequality of Watson and Nehville [28]. For real x and u with x ≥ 1 and |u| ≤ x:

\[ e^{-u} \;\ge\; \left( 1 - \frac{u}{x} \right)^{x} \;\ge\; e^{-u} \left( 1 - \frac{u^{2}}{x} \right) \tag{10} \]
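Inequality (10) can be spot-checked numerically. The sketch below assumes the reading e^(-u) ≥ (1 − u/x)^x ≥ e^(-u)(1 − u²/x) on the domain x ≥ 1, |u| ≤ x; the function names and the sample grid are illustrative assumptions, and a numerical check of this kind is evidence, not a proof.

```python
import math


def inequality_10_terms(u, x):
    """Return (upper, middle, lower) for e^-u >= (1-u/x)^x >= e^-u * (1-u^2/x)."""
    upper = math.exp(-u)
    # Clamp the base at zero to guard against rounding when u is equal to x.
    base = max(0.0, 1.0 - u / x)
    middle = base ** x
    lower = math.exp(-u) * (1.0 - u * u / x)
    return upper, middle, lower


def holds(u, x, tol=1e-12):
    """True if the chain of inequalities holds at (u, x), up to a small tolerance."""
    upper, middle, lower = inequality_10_terms(u, x)
    return upper + tol >= middle >= lower - tol


def check_grid(xs=(1.0, 2.5, 10.0, 100.0), steps=40):
    """Check the inequality for u on an even grid over [-x, x], for each x in xs."""
    for x in xs:
        for k in range(steps + 1):
            u = -x + 2.0 * x * k / steps
            if not holds(u, x):
                return False
    return True


if __name__ == "__main__":
    print("inequality (10) holds on the sample grid:", check_grid())
```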

Applying (10) with x = t and u = t p(t), so that (b+c)/a ≤ u ≤ (b+c+2)/a:

\[ e^{-(b+c)/a} \;\ge\; \bigl( 1 - p(t) \bigr)^{t} \;\ge\; e^{-(b+c+2)/a} \left[ 1 - \frac{\bigl( (b+c+2)/a \bigr)^{2}}{t} \right] \tag{11} \]
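To see how tight (11) is for concrete sizes, the following sketch computes both sides and compares them with (1 − p)^t for proportions p in the admissible range (b+c)/(at) ≤ p ≤ (b+c+2)/(at). The function names and the sample parameters are assumptions made for illustration only.

```python
import math


def fault_free_bounds(t, a, b, c):
    """Return (lower, upper) bounds from (11) on (1 - p(t))**t.

    Assumes (b+c)/(a*t) <= p(t) <= (b+c+2)/(a*t) and t >= (b+c+2)/a,
    so that inequality (10) applies with x = t and u = t*p(t).
    """
    u_lo = (b + c) / a
    u_hi = (b + c + 2) / a
    if t < max(1.0, u_hi):
        raise ValueError("need t >= 1 and t >= (b+c+2)/a")
    upper = math.exp(-u_lo)
    lower = math.exp(-u_hi) * (1.0 - u_hi ** 2 / t)
    return lower, upper


def fault_free_probability(t, p):
    """Probability that t independent elements, each faulty with probability p, are all good."""
    return (1.0 - p) ** t


if __name__ == "__main__":
    t, a, b, c = 100, 1, 1, 1  # illustrative values only
    lower, upper = fault_free_bounds(t, a, b, c)
    for p in ((b + c) / (a * t), (b + c + 1) / (a * t), (b + c + 2) / (a * t)):
        print(f"p={p:.3f}: {lower:.5f} <= {fault_free_probability(t, p):.5f} <= {upper:.5f}")
```

As t grows, the bracketed factor in the lower bound tends to one, so both sides approach constants independent of t.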

For large t, that is, the chance that any t elements contain no faults is bounded from above and below by independent probabilities q+ > q–, the constants on each side of (11). The configurations of Ahomog (t;a,0,c) are just the sets consisting of t columns, hence the probability that the t elements of a column cannot be used is independently bounded by 1 − q+ and 1 − q–. We now apply a result for streaks due to Leighton and Leiserson [29]: for any ε > 0, with increasing t and probability approaching one, the longest left-to-right sequence of faulty columns is at least [log_{1/(1−q+)} t]/(1+ε) and at most [log_{1/(1−q–)} t]/(1−ε). With high probability, that is, the longest wire in Ahomog (t;a,0,c) is Θ(log t). The upper bound O(log t) applies as well to a homogeneous array Ahomog (t;a,b,c) with spare rows and columns, though we do not know if this bound is tight.
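The logarithmic streak length can be illustrated by simulation. The sketch below is a simplified model, not the construction of [29]: it assumes each column is usable independently with a single probability q (rather than a probability bracketed by q– and q+), measures the longest run of unusable columns, and compares it with the logarithm to base 1/(1−q) of the number of columns. All names and parameter values are hypothetical.

```python
import math
import random


def longest_run(flags):
    """Length of the longest run of consecutive True values in flags."""
    best = cur = 0
    for f in flags:
        cur = cur + 1 if f else 0
        best = max(best, cur)
    return best


def simulate_longest_faulty_run(n_columns, q, rng):
    """Mark each column faulty with probability 1 - q, independently; return the longest faulty run."""
    faulty = [rng.random() >= q for _ in range(n_columns)]
    return longest_run(faulty)


def predicted_run(n_columns, q):
    """Logarithm of n_columns to base 1/(1-q): the order of the longest faulty run."""
    return math.log(n_columns) / math.log(1.0 / (1.0 - q))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that runs are repeatable
    q = 0.5                 # illustrative probability that a column is usable
    for n in (2 ** 8, 2 ** 12, 2 ** 16):
        observed = simulate_longest_faulty_run(n, q, rng)
        print(f"columns={n:6d}  longest faulty run={observed:3d}  log estimate={predicted_run(n, q):.1f}")
```

In this model the wire that bypasses a streak of unusable columns has length proportional to the streak, which is why the longest wire scales like log t.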

On the other hand, the exact bound Θ(log t) pertains to the set of nominal and spare columns in Adedicated (t;a,b,c), and, by symmetry, to the set of nominal and spare rows, as long as we neglect the effect of the multiplexer. Taking this effect into account, Theorem 3 establishes that the maximum wirelength Θ(t) of Adedicated (t;a,b,c) is dominated by the multiplexer. A similar conclusion holds for the models Fhypergeometric (p) and Fclustering (p±). This establishes our main result:

Theorem 4. At constant redundancy and maximum proportion of faulty elements, under models FBernoulli (p), Fhypergeometric (p), and Fclustering (p±):
i) An optimal implementation of Adedicated (t;a,b,c) has wirelength Θ(t);
ii) Ahomog (t;a,b,c) has wirelength O(log t).

6.4 Summary and Recommendations

As Table 2 indicates, the Θ(t⁻¹) fault tolerance and algorithmic configurability of Adedicated (t;a,b,c) and Ahomog (t;a,b,c) are comparable. The normalized redundancy O(t⁻² exp t) of Ahomog (t;a,b,c) is no greater than that Ω(t⁻² exp t) of Adedicated (t;a,b,c). The best case wirelength Θ(1) of Ahomog (t;a,b,c) is much better than that Θ(ct) of Adedicated (t;a,b,c). For iid or clustered faults, the wirelength O(log t) of Ahomog (t;a,b,c) remains much less than that Θ(ct) of Adedicated (t;a,b,c). Therefore, the benefit of the bct² extra elements contained in Ahomog (t;a,b,c) outweighs the cost of the Θ(ct²) multiplexer which must accompany any implementation of Adedicated (t;a,b,c) having both spare rows and columns. Although Adedicated (t;a,b,c) is arguably the most popular architecture for fault tolerant arrays, the designer would do well to consider Ahomog (t;a,b,c) as a first choice for implementing an array spared by rows and columns.

Let us amplify a caveat, first mentioned in section 4.4, that governs our conclusions.
The same arguments advanced in section 5.2 establish that a second multiplexer, optimally realized by Major-Mux_{t × (b+1)t}, is required between the t input/output rows and the (b+1)t rows of Adedicated. Forgoing an internal multiplexer, Ahomog nonetheless requires two multiplexers, optimally realized by Major-Mux_{t × (b+1)t} and Major-Mux_{ct × (a+c)t}, in its interface between horizontal and vertical input/output. Therefore, if input/output occurs at about the same rate as internal operations in the t × at array configured, then the effective maximum wirelength of Adedicated is comparable to that of Ahomog. On the other hand, if input/output occurs infrequently with respect to the internal operations in the t × at array configured, then Ahomog (t;a,b,c) might be the preferred architecture.

Whichever the case, our results provide heretofore undisclosed bounds on the area and wirelength of arrays spared by rows and columns. Moreover, these quantities can be used as a basis for comparison with other schemes for configuration. Under the probability model of section 2, for example, an array with h elements per local block delivers normalized tolerance Θ(t⁻¹)^(1/h), can be deterministically configured in linear time, has normalized redundancy Θ(log t), and contains no wire longer than Θ(log t)^(1/2) [15], [25]. Asymptotically, that is, locally spared arrays would appear to outperform both homogeneous arrays and arrays with dedicated spare rows and columns, even when clustering is taken into account. These predictions may run counter to the interpretation of “clustering” in a spatial sense. It remains to test these predictions by way of controlled experiment. In the interim, the designer would do well to consider local sparing as a possible alternative to sparing by rows and columns.

References

[1] L. Lin and V. K. Jain. Complex-argument universal nonlinear cell for rapid prototyping wafer architecture. Proceedings, IEEE International Conference on Wafer Scale Integration. Los Alamitos, CA: IEEE Computer Society Press, 1995. pp 12-21.
[2] P. P. Suni. CCD wafer scale integration. Proceedings, IEEE International Conference on Wafer Scale Integration. Los Alamitos, CA: IEEE Computer Society Press, 1995. pp 123-133.
[3] W. R. Moore. A review of fault-tolerant techniques for the enhancement of integrated circuit yield. Proceedings of the IEEE. 74, 5, May, 1986. pp 684-698.
[4] C. H. Stapper. Development of IBM’s 16MBit dynamic random access memory chip. Address to the Department of Electrical Engineering and Computer Science, University of Vermont, Burlington, VT, April, 1992.
[5] I. Koren and D. K. Pradhan. Yield and performance enhancement in VLSI and WSI multiprocessor systems. Proceedings of the IEEE. 74, 5, May, 1986. pp 699-711.
[6] L. E. LaForge. Extremally fault tolerant arrays. Proceedings, IEEE International Conference on Wafer Scale Integration. Los Alamitos, CA: IEEE Computer Society Press, 1989. pp 365-378.
[7] D. M. Blough. On the reconfiguration of memory arrays containing clustered faults. Digest of Papers: 21st Symposium on Fault-Tolerant Systems. Los Alamitos, CA: IEEE Computer Society Press, 1991. pp 444-451.
[8] A. T. Dahbura. System-level diagnosis: a perspective for the third decade. Concurrent Computation: Algorithms, Architectures, Technologies. New York: Plenum Press, 1988.
[9] C. H. Stapper. A new statistical approach for fault-tolerant VLSI systems. Digest of Papers: 22nd Symposium on Fault-Tolerant Systems. Los Alamitos, CA: IEEE Computer Society Press, 1992. pp 356-365.
[10] M. Tarr, D. Boudreau, and R. Murphy. Defect analysis speeds test and repair of redundant memories. Electronics. January, 1984. pp 175-179.
[11] J. R. Day. A fault-driven, comprehensive redundancy algorithm. IEEE Design and Test. June, 1985. pp 35-44.
[12] S. Y. Kuo and W. K. Fuchs. Efficient spare allocation for reconfigurable arrays. IEEE Design and Test. February, 1987. pp 24-31.
[13] C. L. Wey and F. Lombardi. On the repair of redundant RAM’s. IEEE Transactions on Computer-Aided Design, 6, 2, March, 1987. pp 222-231.
[14] L. E. LaForge. Feasible regions quantify the configuration power of arrays with multiple fault types. In Lecture Notes in Computer Science: Dependable Computing EDDC-1. Berlin: Springer-Verlag, 1994. pp 453-469.
[15] L. E. LaForge. Feasible regions quantify the probabilistic configuration power of arrays with multiple fault types. Proceedings, IEEE International Conference on Innovative Systems in Silicon. Piscataway, N.J.: IEEE Press, 1996. pp 298-312.
[16] M. H. DeGroot. Probability and Statistics. Reading, MA: Addison-Wesley Publishing Company, 1975. Sec. 4.8.
[17] I. Koren, Z. Koren, and C. H. Stapper. A unified negative-binomial distribution for yield analysis of defect-tolerant circuits. IEEE Transactions on Computers, 42, 6, June, 1993. pp 724-733.
[18] B. T. Murphy. Cost-size optima of monolithic integrated circuits. Proceedings of the IEEE. 52, December, 1964. pp 1537-1545.
[19] I. Koren, Z. Koren, and C. H. Stapper. The impact of floorplanning on the yield of fault tolerant ICs. Proceedings, Seventh Annual IEEE International Conference on Wafer Scale Integration. Los Alamitos, CA: IEEE Computer Society Press, 1995. pp 329-338.
[20] D. M. Blough and Andrzej Pelc. New results concerning clustered failure reconfiguration in memory arrays. Technical Report ECE-91-07. University of California, Irvine, Department of Electrical and Computer Engineering. June, 1991.
[21] Via electronic mail, author’s communication with Israel Koren, 7-May-1997.
[22] W. Shi and W. K. Fuchs. Probabilistic analysis of reconfiguration heuristics. Proceedings, IEEE International Workshop on Defect and Fault Tolerance in VLSI Systems. New York: Plenum Press, 1989.
[23] J. D. Ullman. Computational Aspects of VLSI. Rockville, MD: Computer Science Press, 1984.
[24] L. E. LaForge. Fault Tolerant Arrays. PhD dissertation. Montreal: McGill University, 1991.
[25] L. E. LaForge. What designers of wafer scale systems should know about local sparing. Proceedings, Sixth Annual IEEE International Conference on Wafer Scale Integration. Piscataway, N.J.: IEEE Press, 1994. pp 106-131.
[26] T. C. Lee and J. Cong. How interconnection affects submicron design. IEEE Spectrum. March, 1997. p 57.
[27] L. E. LaForge. Some Zarankiewicz numbers, with linear time extremal algorithms for finding bipartite cliques. Technical Report SOCS-94.Z. McGill University School of Computer Science, July, 1994.
[28] D. S. Mitrinovic. Analytic Inequalities. Berlin: Springer-Verlag, 1970.
[29] T. Leighton and C. E. Leiserson. Wafer-scale integration of systolic arrays. IEEE Transactions on Computers. C-34, 5, May, 1985. pp 448-461.