IEEE Transactions on Computers, Volume 48, Number 4, April, 1999
Configuration of Locally Spared Arrays in the Presence of Multiple Fault Types
Laurence E. LaForge, Member, IEEE
The Right Stuff of Tahoe, Incorporated, 3341 Adler Court, Reno, NV 89503-1263
Tel: (775) 322-5186, Email: [email protected]
Abstract. The bulk of results for the performance of configuration architectures treat the case of failed processors, but neglect switches that are stuck open or closed. By contrast, the present work characterizes this multivariate problem in the presence of either iid or clustered faults. Suppose that the designer wishes to assure, with high probability, a fault free s × t array. If local sparing is used then, as we prove, the area of the redundant array is i) Θ(st log st) in the presence of faulty elements or faulty elements and switches stuck open; ii) Θ(st log² st) in the presence of faulty elements and switches stuck closed; iii) Θ([st]² log st) in the presence of faulty elements and switches that may be either stuck open or stuck closed. We also furnish bounds on maximum wirelength and an optimal configuration algorithm.
Key phrases: configuration architectures, fault tolerance, local sparing, systolic arrays
1 Introduction and Definitions
As submicron feature sizes give rise to increasing levels of integration, the orders of magnitude of the area and maximum wirelength eventually dominate constants that might prevail in systems with a more modest number of elements. By extrapolating from the redundancy of simple systems, such as one-dimensional arrays, one might be misled into thinking that boosting the area of a desired system by factors of 2, 3, or even 4 is an unreasonably high price to pay for yield or reliability. To the contrary, such linear growth in area is usually less than what nature exacts. In fact, we already use configuration architectures whose area is superlinear in the number of elements in the system desired. For example, the popular approach of sparing a t × t array by rows and columns requires at least (1/(1 − pε))^(t/4) elements [1]. Here pε is the overall proportion of faulty elements, distributed in a fashion described by Section 2. Both the number of elements and the area of an array spared by rows and columns grow exponentially in the length of one of the sides of the array. By contrast, and under the same model of faulty elements, Subsection 7.1 establishes that the number of elements and the area of a locally spared t × t array are each Θ(t² log t), much less than the cost of row-column sparing.
The constants associated with the layout of configuration architectures are perhaps best revealed by simulation and experiment. However, determining the order of magnitude of the area and maximum wirelength, as a function of the size of the desired system, properly falls within the domain of scaling theory. The latter is the focus of this paper. Our work quantifies the effects of faulty switches coupled with faulty elements, and makes comparisons with the case of faulty elements only. It is not especially surprising that, in addition to faulty elements, the presence of faulty switches increases the overall area and maximum wirelength in a locally spared array. What is surprising is that the area and maximum wirelength increase by orders of magnitude.
By a configuration architecture we mean a collection of elements and switches connected by wires. “Element” as used here is also known as a “processing element” or “PE” in other works. Figure 1 illustrates local sparing, a particularly simple configuration architecture depicted as early as the paper by Koren and Pradhan [2]. The idea is to replace every element in the desired system with a block of h elements. Local sparing is simple and can be applied to any system desired. Under a model where elements fail randomly, LaForge [3] bounds the area and maximum wirelength of locally spared H-trees and hypercubes. Perhaps best-studied are configuration architectures for arrays in one or two dimensions. Chapman et al. [4] describe how local sparing is used to enhance the yield of a two-dimensional thermal pixel array, a test chip for which has recently been fabricated. Under a model of faulty elements only, Ketchen [5] and LaForge [3] characterize the configuration power of any locally spared architecture, in the probabilistic and worst cases. Chen and Upadhyaya [6] compare the reliability of binary tree architectures in the presence of faulty elements and switches stuck open or closed. However, both the target and configuration architectures of [6] are somewhat different from locally spared arrays as addressed in [3], [2], [4], [5], or in the present paper. Chen, Cheng, and Chou [7] use layout-level simulation to assess the effectiveness of redundant arrays in the presence of faulty elements and switches, but make no distinction between switches that are stuck open and those that are stuck closed. Moreover, neither the analysis of [6] nor the simulations of [7] address scaling trends for the ratio of the size of the fault tolerant architecture to the size of the target architecture. Quantifying these trends is an objective of the present paper.
Under a model which admits, in addition to faulty elements, broken wires and switches stuck open or closed, LaForge [8] uses feasible regions to determine the worst-case fault tolerance of locally spared two-dimensional arrays. This paper extends the preceding by deriving analytic cost functions for configuration of locally spared arrays in the presence of probabilistically distributed switches, stuck open or closed, as well as probabilistically distributed faulty elements. We provide a complete technical exposition, not furnished in [9].
Index of Symbols

Symbol | Meaning | Defined in
A(h, k; s, t) | s × t array locally spared with h elements per block and 2k switches between elements from neighboring blocks | Section 2, para 1
bi,j | block at row i, column j | Theorem 1
bq;i,j | element q of the block at row i, column j | Theorem 1
D, D̂ | mean defect density per quadrat, overall defect density | Section 2, para 3
F(·) | iid or clustered fault model, · = the set of failure probabilities | Section 2, para 2
h | number of elements per block, element redundancy | Section 1, para 3
k | number of switches in series between any element and the bus to a neighboring block | Section 2, para 1
m | number of neighboring blocks | Section 5, para 1
pε | iid failure probability, overall proportion of faulty elements | Section 2, para 2
pε^−, pε^+ | minimum, maximum probability of a faulty element | Section 2, para 5
pstuck closed | iid probability, overall proportion of switches stuck closed | Section 2, para 2
pstuck closed^−, pstuck closed^+ | minimum, maximum probability of a switch stuck closed | Section 2, para 5
pstuck open | iid probability, overall proportion of switches stuck open | Section 2, para 2
pstuck open^−, pstuck open^+ | minimum, maximum probability of a switch stuck open | Section 2, para 5
Yalg, Yarch | algorithmic, architectural configuration coverage | Section 2, para 6
Y | overall configuration coverage Yarch · Yalg | Section 2, para 6
[Figure 1: Fault cover for a locally spared 2 × 4 array, 2 elements per block. Legend: faulty element; good elements used; good elements not used.]

An instance of a configuration architecture identifies the faulty elements and switches stuck open or closed. Identification of faulty components falls to a diagnosis scheme [10]. A configuration algorithm takes as input an instance. The output is either the string “no solution” or a set of instructions indicating how switches should be set in order to connect elements. This set of instructions we will refer to as a configuration. Refer to Figure 1. A fault cover is a configuration that achieves the adjacency of the desired system using good elements only. In this paper the desired system is an s × t array of good elements. The term “fault cover” is known as a “repair” in some works [11]. The purpose of a configuration algorithm is to find a fault cover. A particular combination of configuration architecture and algorithm is a configuration scheme.
2 Fault Model and Figures of Merit
Refer to Figures 1 and 2. We parameterize local sparing A(h, k; s, t) of an s × t array by the number h of elements in each of the st blocks. Assuming that data paths are one bit wide, we fabricate 2st − s − t switching matrices. A fault free matrix permits the connection of any of h² pairs of elements from neighboring blocks. Any element’s access to a matrix is gated by a vector of k on-off switches in series. Switches that are stuck closed may prevent us from isolating a fault free array from the remaining elements. Amelioration of this effect is the objective of arranging vectors of k switches in series.
We presume that elements are either faulty or good. Each switch is two-way; a faulty switch is either stuck open or stuck closed. For our basic probability model F(pε, pstuck open, pstuck closed), we assume that the two types of switch failures are mutually exclusive, but otherwise faults are independent and identically distributed (iid): 1) pε is the probability that an element is faulty; 2) pstuck closed is the probability that a switch is stuck closed; 3) pstuck open is the probability that a switch is stuck open. Our development also explores the submodels F(pε), F(pstuck closed), F(pε, pstuck open), F(pε, pstuck closed), and F(pstuck open, pstuck closed).
Consistent with most treatments of nonuniform faults, our clustering model F(pε^±, pstuck open^±, pstuck closed^±), and subsets thereof, postulate the existence of nonoverlapping regions on wafer, called quadrats. Koren, Koren, and Stapper [12] use the term “block” instead of the more venerable “quadrat.” We employ “quadrat” in order to avoid confusion with “block” as pertains to local sparing. The mean spatial density D of fault-causing defects per quadrat is itself a
random variable. In the interest of fitting yield data, D has been presumed to conform to various distributions; e.g.

triangular(0, 2D̂), Murphy [13]:
\[ \Pr(D \le X) = \int_{0}^{\min(X,\hat{D})} \frac{x}{\hat{D}^{2}}\,dx \;+\; \int_{\hat{D}}^{\min(X,2\hat{D})} \frac{2\hat{D}-x}{\hat{D}^{2}}\,dx \tag{1} \]

gamma(α, α/D̂), Koren and Koren [14]:
\[ \Pr(D \le X) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)\,\hat{D}^{\alpha}} \int_{0}^{X} x^{\alpha-1} e^{-(\alpha x)/\hat{D}}\,dx \tag{2} \]

where D̂ is the overall mean defect density, Γ(α) is the gamma function $\int_{0}^{\infty} x^{\alpha-1} e^{-x}\,dx$, and α is a constant that depends on the relation between the geometry of quadrats and circuitry [12].

[Figure 2: Θ(hk) layout for a single block of A(h, k; s, t), (h, k) = (4, 4). Each element is replaced by a block of h elements whose respective connections are isolated by k switches in series. Labels: Θ(h^{1/2} k^{1/2}); Θ(k^{1/2}); north, south, east, and west buses. Legend: wires cross, do not make contact; wires abut, make contact.]

We subsume (1), (2), and similar distributions by way of a truncation approximation: over all quadrats, there is a nonzero minimum mean defect density D^−, as well as a maximum mean defect density D^+. Such a truncation appears reasonable in light of the experimental data, such as it has been reported. Clustered faults are therefore independent but are not identically distributed. We can bound the independent failure probability p of a component having length ℓ and width w as
\[ 0 < p^{-} \stackrel{\text{def}}{=} 1 - e^{-(\ell \times w) D^{-}} \;\le\; p \;\le\; 1 - e^{-(\ell \times w) D^{+}} \stackrel{\text{def}}{=} p^{+} < 1 \]
Since this applies to faulty elements, switches stuck closed, and switches stuck open, we have notation for the range of failure probabilities in each case: pε ∈ [pε^−, pε^+] ⊂ (0, 1), pstuck open ∈ [pstuck open^−, pstuck open^+] ⊂ (0, 1), and pstuck closed ∈ [pstuck closed^−, pstuck closed^+] ⊂ (0, 1). As Figure 5 illustrates, this approximation suffices for the purpose of characterizing scaling trends for locally spared arrays in the presence of clustered faults in elements and switches.
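As a minimal numerical sketch of the truncation bound on p, the following Python fragment evaluates p^− and p^+ for a single component of length ℓ and width w. The function names and the sample dimensions and densities are illustrative assumptions, not values taken from this paper.

```python
import math


def failure_probability(length, width, defect_density):
    """p = 1 - exp(-(length * width) * D) for one component."""
    return 1.0 - math.exp(-(length * width) * defect_density)


def failure_bounds(length, width, d_min, d_max):
    """Return (p_minus, p_plus) for mean defect densities in [d_min, d_max].

    Assumes 0 < d_min <= d_max, mirroring the nonzero minimum and the
    maximum mean defect density of the truncation approximation.
    """
    if not 0.0 < d_min <= d_max:
        raise ValueError("require 0 < d_min <= d_max")
    return (failure_probability(length, width, d_min),
            failure_probability(length, width, d_max))


if __name__ == "__main__":
    # Hypothetical component of unit area; densities chosen for illustration.
    p_minus, p_plus = failure_bounds(1.0, 1.0, 0.05, 0.15)
    print(f"p- = {p_minus:.4f}, p+ = {p_plus:.4f}")
```

Because 1 − e^{−x} is increasing in x, the two returned values bracket the failure probability of the component for any quadrat density between the assumed extremes.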
The architectural configuration coverage Yarch is the probability that a fault cover exists. The algorithmic configuration coverage Yalg is the probability, given the existence of a fault cover, that an algorithm can find a cover. The coverage Y of a configuration scheme provides an estimate for the yield or reliability, and equals the product Yarch · Yalg of the architectural and algorithmic coverages.
The redundancy of a particular type of component is the number of such components in the configuration architecture divided by the number of components of the same type in the system desired. For illustration, the element redundancy of A(h, k; s, t) is the count h of elements in any block. In the case of switches that appear in the configuration architecture, but which have no corresponding switches in the system desired, we obtain the redundancy by dividing by the number of element-to-element connections in the nominal architecture. Thus, for example, the switch redundancy of A(h, k; s, t) equals k. For integrated circuit systems, it makes sense to normalize the overall redundancy by dividing the area attributable to the configuration architecture by the area of the system desired [15].
Under a model where elements fail randomly, increasing the redundancy tends to increase the configuration coverage. For example, LaForge [3] shows how, if local sparing is used, then i) the normalized redundancy of s × t arrays is Θ(log st); ii) the wirelength is O(√(log st)); iii) in optimal time Θ(st log st), a simple algorithm configures a fault free copy of the desired system if and only if a fault free copy exists. Experience with real configuration architectures indicates that the presence of faulty switches, in addition to faulty elements, decreases the configuration coverage and increases the redundancy [17]. On the other hand, the probabilistic scaling trends among coverage, redundancy, and failures in switches and elements are, apparently, heretofore unquantified.
By extending the above-mentioned results (i) through (iii), this paper shows how switch failures can affect the order of magnitude of the cost of scaling.
3 How Feasible Regions Quantify Configuration Power
As Figure 1 illustrates, the components of any fault cover may be divided into three classes: 1) the fraction that are good and used (the utility); 2) the proportion that are faulty and not used (the fractional fault tolerance); and 3) the percentage that are good but not used (the waste). This observation provides the fundamental equation for fault tolerant architectures:
\[ \text{utility} + \text{fractional fault tolerance} + \text{waste} = 1 \tag{3} \]
With respect to elements, for example, Figure 1 illustrates an instance whose utility is 1/2, whose fractional fault tolerance equals 1/8, and whose waste is 3/8. The utility is just the reciprocal of the redundancy, and so maximizing the former is equivalent to minimizing the latter. Because of its simplicity, (3) is a guidepost for analysis of scaling: it suffices to express any two of the three terms on the lefthand side. With coverage 0 < Y < 1, for example, in the case of elements whose iid failure probability equals pε, and where we seek to achieve a target architecture on n elements, local sparing delivers utility $\lceil \log_{1/p_\varepsilon} n - \log_{1/p_\varepsilon}(-\ln Y) \rceil^{-1}$ and fractional fault tolerance $\left(\frac{-\ln Y}{n}\right)^{1/h}$ [3]. Here we would like to stress and briefly illustrate how the terms in (3) depend on both the criteria for acceptable service and the fault distribution.
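The utility expression quoted from [3] can be illustrated numerically. The sketch below assumes faulty elements only, for which the coverage of local sparing is (1 − pε^h)^n; the function names and the sample values of n, pε, and Y are hypothetical and chosen only for illustration.

```python
import math


def element_redundancy(n, p_e, coverage):
    """h = ceil(log_{1/p}(n) - log_{1/p}(-ln Y)), the reciprocal of the utility."""
    base = math.log(1.0 / p_e)
    h = math.ceil(math.log(n) / base
                  - math.log(-math.log(coverage)) / base)
    return max(1, h)


def iid_coverage(n, p_e, h):
    """Coverage (1 - p^h)^n of n blocks, each holding h elements."""
    return (1.0 - p_e ** h) ** n


if __name__ == "__main__":
    n, p_e, target = 1000, 0.1, 0.9   # illustrative values only
    h = element_redundancy(n, p_e, target)
    print("elements per block h =", h)
    print("utility 1/h          =", 1.0 / h)
    print("coverage with h      =", iid_coverage(n, p_e, h))
    print("coverage with h - 1  =", iid_coverage(n, p_e, h - 1))
```

For these sample values, one fewer element per block drops the coverage well below the target, which is the logarithmic growth in redundancy that the utility expression encodes.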
If our model admits faults in elements only, then there is no need for redundancy in the switches, and the worst-case tolerance of A(h, 1; s, t) is given by the interval [0, h − 1]. The fractional fault tolerance is therefore (h − 1)/(hst) ⊆ Θ(st)^{−1}, which, for all
\[ st > \left[ \frac{1}{-\ln Y} \left( \frac{h-1}{h} \right)^{h} \right]^{1/(h-1)}, \]
is less than the Θ(st)^{−1/h} of the probabilistic case [3]. Figure 3 illustrates what happens if, in addition to faulty elements, we admit broken wires as well as switches stuck open or closed: differing fault types may be interdependent. Instead of a one-dimensional interval, the fault tolerance is described by a multivariate region, the interior of which prescribes feasible designs for configuration.

[Figure 3: Worst-case feasible region of tolerance for A(h, k; st > 2); bounds derived in [8]. Axes: number of switches stuck closed; number of broken wires; number of switches stuck open + number of faulty elements.]

Somewhat curiously, the worst-case feasible region of utility (equivalently, of redundancy) for locally spared arrays does not exist whenever st exceeds the reciprocal of the element failure rate [8]. Thus, there are occasions when (3) degenerates. On the other hand, and as we show in this paper, multivariate feasible regions do exist for distributions consistent with contemporary models of fault-causing defects. Focusing on the bivariate (h, k)-redundancy, Sections 4, 5, 6, and 7 establish the boundaries of these feasible regions. By way of emphasis, our treatment neglects broken wires, and it remains to calculate either the fractional fault tolerance or waste. Section 8 points to other areas where there is more work to be done.
4 Linear Time Configuration Algorithm
In this section we derive necessary and sufficient conditions for an instance of A(h, k; s, t) to have a fault cover. These conditions are readily computable and can be used as a deterministic configuration algorithm. An element bq;i,j of an instance of A(h, k; s, t) is available if it is good and none of the switches that gate its access to a matrix is stuck open. A block bi,j is available if at least one of its elements is available.
Theorem 1 In the presence of faulty elements and switches stuck open or closed, an instance of A(h, k; s, t) has a fault cover if and only if, for every block bi,j, either
E1. no switch vector in bi,j is stuck closed and bi,j is available, or
E2. there exists exactly one element bq;i,j ∈ bi,j such that some switch vector of bq;i,j is stuck closed and bq;i,j is available, 1 ≤ q ≤ h.
NECESSARY. E1: If no element of bi,j is available then bi,j cannot be represented by a good element. E2: If some element bq;i,j ∈ bi,j has switch vectors stuck closed and some other element br;i,j ∈ bi,j is selected by a configuration then bq;i,j and br;i,j share some neighbor; bi,j is not represented by a unique element. Therefore, bq;i,j must be selected by any cover, bq;i,j is the only element in bi,j that can be selected, and bq;i,j must be available.
SUFFICIENT. If bi,j satisfies E1 then select any available element from bi,j. If E2 then select from bi,j the element bq;i,j whose switches are stuck closed. Doing this for every block achieves an independent s × t array of good elements. □
The sufficiency proof of Theorem 1 establishes the correctness of configuration algorithm Config-Aelements, stuck open, stuck closed, pseudo-code for which appears below.

Algorithm Config-Aelements, stuck open, stuck closed
% Output cover for A(h, k; s, t): index-of-element-selected
for i = 1 to s
{ for j = 1 to t                                  % At most 1 stuck closed element per block
  { number-stuck-closed-and-available = 0
    index-of-element-selected[i, j] = 0
    for q = 1 to h
    { if all the switches of some vector of bq;i,j are stuck closed then
      { increment number-stuck-closed-and-available
        if number-stuck-closed-and-available > 1 or bq;i,j is not available
        then print “no solution” and STOP
        else index-of-element-selected[i, j] = q
      }
      else if bq;i,j is available                 % Pick highest indexed available element
              and number-stuck-closed-and-available = 0
           then index-of-element-selected[i, j] = q
    }
    if index-of-element-selected[i, j] = 0 then print “no solution” and STOP
  }
}
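The pseudo-code can be exercised with a short runnable sketch. The Python below follows the same block-by-block scan; the data layout is an assumption made for illustration and is not prescribed by this paper: element_good[i][j][q] is a Boolean, and vectors[i][j][q] lists the switch vectors of element q, each vector being a list of k states drawn from the hypothetical constants GOOD, OPEN, and CLOSED. Indices are zero-based, and None stands for “no solution”.

```python
GOOD, OPEN, CLOSED = "good", "open", "closed"


def configure(element_good, vectors):
    """Return selected[i][j], the chosen element index per block, or None.

    Mirrors the scan of the pseudo-code: an element with a stuck-closed
    vector must be the unique such element in its block and must be
    available; otherwise the highest-indexed available element is chosen.
    """
    selected = []
    for good_row, vec_row in zip(element_good, vectors):
        row = []
        for good_blk, vec_blk in zip(good_blk_iter(good_row), vec_row):
            choice = None
            closed_count = 0
            for q, (good, vecs) in enumerate(zip(good_blk, vec_blk)):
                # Available: good element, no gating switch stuck open.
                available = good and all(s != OPEN for v in vecs for s in v)
                # A vector is stuck closed when every switch in it is.
                has_closed = any(all(s == CLOSED for s in v) for v in vecs)
                if has_closed:
                    closed_count += 1
                    if closed_count > 1 or not available:
                        return None
                    choice = q
                elif available and closed_count == 0:
                    choice = q
            if choice is None:
                return None
            row.append(choice)
        selected.append(row)
    return selected


def good_blk_iter(good_row):
    """Identity helper kept separate so the scan above reads block by block."""
    return good_row


if __name__ == "__main__":
    # Hypothetical 1 x 2 instance, h = 2, k = 2, one switch vector per element.
    ok, closed = [[GOOD, GOOD]], [[CLOSED, CLOSED]]
    good = [[[True, True], [True, True]]]
    print(configure(good, [[[closed, ok], [ok, ok]]]))
```

Each switch state is inspected a constant number of times, so the sketch stays within the O(hkst) budget of the scan it imitates.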
Each element of A(h, k; s, t) has at most four switch vectors, so the input size of an instance may be Ω(hkst). Since any algorithm must read its input, an algorithm that runs in O(hkst) steps is, to within a constant factor, fastest possible. Applying straightforward counting principles, we see that this upper bound is achieved:
Corollary 1.1 In optimal time Θ(hkst), Config-Aelements, stuck open, stuck closed computes a fault cover for any instance of A(h, k; s, t), if and only if the instance has a fault cover.
An analytical exposition such as ours considers the running time and correctness of configuration, a practice often omitted when results are obtained by simulation [7]. Deterministic configuration of local sparing compares favorably with approximation algorithms and probabilistic heuristics for other means of sparing arrays, underlying problems for which are often NP-complete [11], [17], [19]. Config-Aelements, stuck open, stuck closed is a natural extension of the Θ(hst) algorithm Naïve of [3], with the latter for the case of faulty elements only. Under a model that admits broken wires as well as faulty elements and switches, the extremal algorithm Config-A of [8] takes time O([h² + k]st) to compute a cover whenever the number of faults guarantees the existence of a fault cover. Under this same four-variate model, the O([h² + hk]st) algorithm Alocal of [16, Appendix A] is guaranteed to find a fault cover for A(h, k; s, t), if and only if such a cover exists.
[Figure 4: The iid configuration coverage of A(h, k; s, t) is log-linear, and decreases with increasing aspect ratio s/t ≤ 1. pε = 0.1, pstuck open = 0.05 = pstuck closed, h = 2 = k. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves: aspect ratio 1:t (linear, s = 1); aspect ratio 1:5; aspect ratio 1:3; aspect ratio 1:1 (square, s = t).]
[Figure 5: Log-linear bounds on configuration coverage in the presence of clustered faults. pε^± ∈ [0.05, 0.06], pstuck open^± ∈ [0.05, 0.06], pstuck closed^± ∈ [0.05, 0.15], s = 1, h = 2 = k. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves are labeled by the triple (pε, pstuck open, pstuck closed): (0.05, 0.05, 0.05), (0.05, 0.05, 0.15), (0.05, 0.06, 0.05), (0.05, 0.06, 0.15), (0.06, 0.05, 0.05), (0.06, 0.05, 0.15), (0.06, 0.06, 0.05), (0.06, 0.06, 0.15).]
[Figure 6: Coverage monotone in h. pε = 0.1, pstuck open = 0.05 = pstuck closed, s = 1, k = 2. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves: h = 1, h = 2, h = 3, h = 4.]
[Figure 7: Coverage not monotone in h. pε = 0.1, pstuck open = 0.05, pstuck closed = 0.25, s = 1, k = 2. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves: h = 1, h = 2, h = 3, h = 4.]
[Figure 8: Coverage monotone in k. pε = 0.1 = pstuck open, pstuck closed = 0.35, s = 1, h = 2. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves: k = 1, k = 2, k = 3, k = 4.]
[Figure 9: Coverage not monotone in k. pε = 0.1, pstuck open = 0.05, pstuck closed = 0.25, s = 1, h = 2. Horizontal axis: number of elements in desired system = st. Vertical axis: log10 of coverage. Curves: k = 1, k = 2, k = 3, k = 4.]
5 Configuration Coverage
In this section we translate Theorem 1 into an explicit formula for the architectural configuration coverage. By Corollary 1.1, the algorithmic coverage of Config-Aelements, stuck open, stuck closed equals one. Thus, our expression for the architectural configuration coverage Yarch is the same as the overall configuration coverage Y of any optimal scheme for A(h, k; s, t) . The availability of an element (resp. block) is the probability that it is available. Let 1 ≤ m ≤ 4 be the number of neighbors of a block bi,j in A(h, k; s, t) . Note that the number of switch vectors adjacent to any element of bi,j equals m . Theorem 2 If E1 and E2 are the events of Theorem 1 then, under model F(pε , pstuck closed , pstuck open ) :
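\[ \Pr(E1) = \left(1 - p_{\text{stuck closed}}^{k}\right)^{mh} \left( 1 - \left[ 1 - (1 - p_\varepsilon) \left( \frac{(1 - p_{\text{stuck open}})^{k} - p_{\text{stuck closed}}^{k}}{1 - p_{\text{stuck closed}}^{k}} \right)^{m} \right]^{h} \right) \tag{4} \]

\[ \Pr(E2) = h\,(1 - p_\varepsilon) \left(1 - p_{\text{stuck closed}}^{k}\right)^{m(h-1)} (1 - p_{\text{stuck open}})^{km} \left( 1 - \left[ 1 - \left( \frac{p_{\text{stuck closed}}}{1 - p_{\text{stuck open}}} \right)^{k} \right]^{m} \right) \tag{5} \]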
PROOF. (4): $(1 - p_{\text{stuck closed}}^{k})^{mh}$ is the probability that bi,j has no switch vector stuck closed. In particular, denote by S the event that at least one switch in any vector belonging to element bq;i,j is not stuck closed. Thus, $\Pr(S) = (1 - p_{\text{stuck closed}}^{k})^{m}$. Let X be the event that no switch in any vector belonging to bq;i,j is stuck open. Thus XS is the set of events whereby, in any vector belonging to bq;i,j, ℓ > 0 switches are good and k − ℓ switches are stuck closed. Each such ℓ determines a disjoint event, hence
\[ \Pr(XS) = \left[ \sum_{\ell=1}^{k} \binom{k}{\ell} (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^{\ell}\, p_{\text{stuck closed}}^{\,k-\ell} \right]^{m} = \left[ (1 - p_{\text{stuck open}})^{k} - p_{\text{stuck closed}}^{k} \right]^{m} \tag{6} \]
where for the second equality we have used the binomial theorem. Element bq;i,j is available (cf. definition at the beginning of Section 4) if and only if bq;i,j is good and the event X occurs. By the Theorem of Bayes, the probability of X given S equals Pr(XS)/Pr(S) [18, p 57]. Under the hypothesis that bi,j has no switch vectors stuck closed, the availability of bq;i,j equals
\[ (1 - p_\varepsilon) \left( \frac{(1 - p_{\text{stuck open}})^{k} - p_{\text{stuck closed}}^{k}}{1 - p_{\text{stuck closed}}^{k}} \right)^{m} \tag{7} \]
Under the hypothesis that bi,j has no switch vectors stuck closed, the second factor on the righthand side of (4) is the probability that some element is available.
(5): Let T be the event that none of the vectors belonging to any particular element bq;i,j are stuck open. Thus $\Pr(T) = (1 - p_{\text{stuck open}})^{km}$. Let Y be the event that one or more vectors of bq;i,j are stuck closed; that is, in ℓ > 0 of bq;i,j’s vectors every switch is stuck closed, and, in the m − ℓ remaining vectors, at least one switch is good and the rest either good or stuck closed. The probability of the latter is the same as (6) with m − ℓ in the exponent. Conditioning on T:
\[ \begin{aligned} \Pr(Y \mid T) &= \frac{1}{(1 - p_{\text{stuck open}})^{km}} \sum_{\ell=1}^{m} \binom{m}{\ell} p_{\text{stuck closed}}^{\,k\ell} \left[ (1 - p_{\text{stuck open}})^{k} - p_{\text{stuck closed}}^{k} \right]^{m-\ell} \\ &= \frac{\left( [1 - p_{\text{stuck open}}]^{k} - p_{\text{stuck closed}}^{k} + p_{\text{stuck closed}}^{k} \right)^{m} - \left[ (1 - p_{\text{stuck open}})^{k} - p_{\text{stuck closed}}^{k} \right]^{m}}{(1 - p_{\text{stuck open}})^{km}} \\ &= 1 - \left[ 1 - \left( \frac{p_{\text{stuck closed}}}{1 - p_{\text{stuck open}}} \right)^{k} \right]^{m} \end{aligned} \tag{8} \]
The status of bq;i,j per se is independent of T and Y, as well as whether some other element has a switch vector stuck closed. The probability of the latter equals $(1 - p_{\text{stuck closed}}^{k})^{m(h-1)}$. Therefore, $(1 - p_\varepsilon) \cdot \Pr(T) \cdot \Pr(Y \mid T) \cdot (1 - p_{\text{stuck closed}}^{k})^{m(h-1)}$ is the probability that element bq;i,j satisfies condition E2. Equation (5) follows by noting there are h ways for this to happen, and by substituting for the values of Pr(T) and Pr(Y | T). □

The step function $0^{\le}(x)$ has value 1 if 0 ≤ x and otherwise equals 0. Using Theorem 2, we write closed forms for the coverage of A(h, k; s, t). By independence:
Corollary 2.1 Under model F(pε, pstuck closed, pstuck open), let E1 and E2 be the events of Theorems 1 and 2. For positive integers 1 < st, s ≤ t, the configuration coverage is
\[ Y = \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2}_{m=1} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{t-2}_{m=2} \quad \text{if } s = 1 \tag{9} \]
and otherwise
\[ Y = \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \tag{10} \]
\[ {} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]_{m=3}^{\overbrace{\scriptstyle 2\left[ (s-2)\,0^{\le}(s-3) \,+\, (t-2)\,0^{\le}(t-3) \right]}^{\text{count of two-dimensional edge blocks}}} \tag{11} \]
\[ {} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]_{m=4}^{\overbrace{\scriptstyle (s-2)\,0^{\le}(s-3) \,\cdot\, (t-2)\,0^{\le}(t-3)}^{\text{count of interior blocks}}} \tag{12} \]
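The closed forms of Theorem 2 and Corollary 2.1 translate directly into a few lines of code. The Python sketch below is one possible transcription, assuming the iid model and 1 < st with s ≤ t; the function and parameter names are illustrative assumptions, not notation from this paper. For s ≥ 2 the factors (s − 2) and (t − 2) are already nonnegative, so the step function is not needed explicitly.

```python
def pr_e1(m, h, k, p_e, p_open, p_closed):
    """Equation (4): no vector stuck closed and some element available."""
    no_closed_vector = (1.0 - p_closed ** k) ** (m * h)
    availability = (1.0 - p_e) * (
        ((1.0 - p_open) ** k - p_closed ** k) / (1.0 - p_closed ** k)) ** m
    return no_closed_vector * (1.0 - (1.0 - availability) ** h)


def pr_e2(m, h, k, p_e, p_open, p_closed):
    """Equation (5): exactly one element has a stuck-closed vector and is available."""
    return (h * (1.0 - p_e)
            * (1.0 - p_closed ** k) ** (m * (h - 1))
            * (1.0 - p_open) ** (k * m)
            * (1.0 - (1.0 - (p_closed / (1.0 - p_open)) ** k) ** m))


def coverage(s, t, h, k, p_e, p_open, p_closed):
    """Equations (9) through (12): coverage of A(h, k; s, t)."""
    if not (1 <= s <= t and s * t > 1):
        raise ValueError("require s <= t and st > 1")

    def block(m):
        # Probability that a block with m neighbors satisfies E1 or E2.
        return (pr_e1(m, h, k, p_e, p_open, p_closed)
                + pr_e2(m, h, k, p_e, p_open, p_closed))

    if s == 1:
        return block(1) ** 2 * block(2) ** (t - 2)
    edge_blocks = 2 * ((s - 2) + (t - 2))
    interior_blocks = (s - 2) * (t - 2)
    return block(2) ** 4 * block(3) ** edge_blocks * block(4) ** interior_blocks


if __name__ == "__main__":
    # Parameters echoing the caption of Figure 7: s = 1, k = 2.
    for h in range(1, 5):
        print(h, coverage(1, 50, h, 2, 0.1, 0.05, 0.25))
```

Setting both switch failure probabilities to zero collapses each block probability to 1 − pε^h, which offers a quick consistency check on the transcription.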
Figures 4 through 9 depict the configuration coverage given by Corollary 2.1; in order to more clearly illustrate trends we have made the failure probabilities somewhat higher than might be expected with a mature process. For constant pε, pstuck closed, and pstuck open, the logarithm of the coverage of local sparing is linear in the number st of elements in the desired array. The extent of the scales reinforces a point made in the first paragraph of Section 1: we cannot hope to achieve non-negligible yield or reliability with constant element redundancy.
Refer to Figure 4. For constant number of elements, the coverage decreases monotonically with increasing aspect ratio s/t. To see this note that the greater an array’s aspect ratio, the more interior blocks there are in its configuration architecture. Thus, a higher aspect ratio affords more opportunity for switch vectors to be stuck open or closed. More precisely, let us compare the configuration coverage of an n-block 1 × t locally spared array against that of a q × r locally spared array having the same number n = qr of blocks, q ≤ r. It suffices to show that (9) is no less than the product of (10), (11), and (12). If r = 2 then q = 2 and we have
\[ \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2}_{m=1} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2}_{m=2} \;\stackrel{?}{\ge}\; \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \]
which holds since Pr(E1) + Pr(E2) is nonincreasing in m. If r = 3 and q = 2 then we have
\[ \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2}_{m=1} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \;\stackrel{?}{\ge}\; \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2}_{m=3} \]
which again holds since Pr(E1) + Pr(E2) is nonincreasing in m. Otherwise, both q and r are greater than or equal to 3; in this case it suffices that the exponents of (10), (11), and (12) satisfy
\[ t - 2 \;\stackrel{?}{\le}\; 2 + 2[(r - 2) + (q - 2)] + (r - 2)(q - 2) \tag{13} \]
Since t = n = qr, the left and right sides of (13) are equal. Hence, the configuration coverage of a locally spared two-dimensional array is no greater than that of a locally spared linear array having the same number of blocks. It remains to answer whether this applies when s > 1; i.e., is the product of (10), (11), and (12) nonincreasing with respect to aspect ratio? For integers 1 < s < q ≤ r < t, st = qr = n, it suffices that
\[ \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2([s-2]+[t-2])}_{m=3} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{([s-2] \cdot [t-2])}_{m=4} \;\stackrel{?}{\ge}\; \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{4}_{m=2} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{2([q-2]+[r-2])}_{m=3} \cdot \bigl[ \Pr(E1) + \Pr(E2) \bigr]^{([q-2] \cdot [r-2])}_{m=4} \tag{14} \]
Relation (14) holds if any of the factors [Pr(E1) + Pr(E2)] equals zero. Otherwise, and since st = qr, (14) reduces to an exponential relation
\[ \left( \frac{[\Pr(E1) + \Pr(E2)]_{m=3}}{[\Pr(E1) + \Pr(E2)]_{m=4}} \right)^{s+t} \;\stackrel{?}{\ge}\; \left( \frac{[\Pr(E1) + \Pr(E2)]_{m=3}}{[\Pr(E1) + \Pr(E2)]_{m=4}} \right)^{q+r} \tag{15} \]
the base of which is strictly greater than one. Noting that s = n/t and q = n/r, inequality (15) is satisfied if n/t + t ≥ n/r + r. But this follows since the real valued function n/x + x decreases with increasing x ∈ (0, √n). Hence, for s ≥ 3, the configuration coverage is nonincreasing (and, for nonzero values of pstuck open or pstuck closed, decreases) as the aspect ratio s/t increases. By similar reasoning we conclude that the coverage of A(h, k; q, r) is no greater than, and, in the presence of faulty switches, is strictly less than, that of A(h, k; 2, t), for 2 < q ≤ r < t and 2t = qr = n.
Having established that coverage is a monotone function of aspect ratio, let us examine the effects of clustering, element redundancy, and switch redundancy. For the sake of simplicity the plots of Figures 5 through 9 pertain to one-dimensional arrays. Refer to Figure 5. The expressions of Theorem 2 and Corollary 2.1 decrease monotonically as any of the failure probabilities pε, pstuck closed, and pstuck open increase. Under clustering model F(pε^±, pstuck open^±, pstuck closed^±), therefore, Y may be bounded from above and below by replacing pε, pstuck closed, and pstuck open with their respective minimum and maximum values.
If pstuck closed = 0 = pstuck open then the coverage given by Corollary 2.1 reduces to Y = (1 − pε^h)^{st}, which is independent of the aspect ratio s/t and which is derived in [3]. In this case the coverage increases with increasing h. Figures 6 and 8 illustrate how the coverage may also increase with increasing h or k when pstuck closed or pstuck open is nonzero. However, and as shown in Figures 7 and 9, it is possible to have too much redundancy. In general, that is, the coverage is not monotone in either h or k. This poses a multivariate optimization problem whereby we seek h and k that achieve a given level of coverage Y, while at the same time minimizing the area of A(h, k; s, t). Section 7 furnishes the solution to this problem.
Section 6 addresses the question of layout area.
6 Discrete versus Normalized Redundancy
In this section we constructively derive how to translate the element and switch redundancies (h, k) of a locally spared s × t array into the normalized redundancy and maximum wirelength. We begin by prescribing our layout model.
The model of Ullman [20] provides that rectangularly bounded cells be laid out on the lattice points of a grid. We adapt this model as follows. Cells are separated in the plane by two layers of interconnect running in the horizontal and, respectively, vertical channels. Wires may make right-angled turns, have a constant nonzero width, and must be separated by constant nonzero pitch. At any point there are at most two wires crossing.
Lemma 1 If switches are implemented as fuses or as sequentially programmable nonvolatile pass gates then $A(h, k; s, t)$ has normalized redundancy $\Theta(hk)$ and wirelength $O(k + \sqrt{hk})$.
PROOF. The total number of constant-area components is $hkst$, so the area is $\Omega(hkst)$; the normalized redundancy is therefore $\Omega(hk)$. The upper bound is by construction. We first calculate the area for technologies, such as fuses, that do not require on-wafer lines to control the setting of switches. Figure 2 illustrates for $(h, k) = (4, 4)$. Focus on arbitrary element $\varepsilon$ in a $\Theta(st)$ layout of the four-neighbor connected desired system $A(1, 0; s, t)$. Replace $\varepsilon$ by a block having at most $4\lceil\sqrt{h}\rceil$ horizontal channels and $4\lceil\sqrt{h}\rceil$ vertical channels. Within this block make the distance between successive parallel channels equal to the width of an element plus the area of two $\Theta(\sqrt{k}) \times \Theta(\sqrt{k})$ rasterized squares, each containing a vector of $k$ switches in series. This gives a block whose $O(hk)$ area contains $\lceil\sqrt{h}\rceil^2 \ge h$ “holes.” Into $h$ of these holes place an element, surrounded on four sides by a switch vector whose rasterized layout is $\Theta(\sqrt{k}) \times \Theta(\sqrt{k})$. In constant area, wire an element to its respective switch vectors. In constant area per switch vector, wire the outgoing end of each switch vector to a horizontal or vertical bus connecting the block to its corresponding connection in the desired array (north, south, east, or west). Verify that the result permits the adjacency of local sparing as prescribed in Section 2. Each block has area $O(hk)$. Since blocks abut, $A(h, k; s, t)$ has overall area $\Theta(hkst)$ and normalized redundancy $\Theta(hk)$.
By a nonvolatile pass gate we mean a switch whose state need not be held by an active signal, but which may be changed by an active signal. The underlying technology may be provided by Fowler-Nordheim floating gate devices [21]. Sequentially activated nonvolatile pass gates may be laid out as follows. In the horizontal interblock channel to the south of row $i$, and in the vertical interblock channel to the east of column $j$, run two lines, row-select$_i$ and column-select$_j$. Noting that the total number of switches in any block equals $hk$, each switch may be addressed by one of the integer pairs $(x, y)$, $0 \le x, y < \lceil\sqrt{hk}\rceil$. Wire each switch whose abscissa is $x$ to switch-select$_x$, one of $\Theta(\lceil\sqrt{hk}\rceil)$ control lines in the horizontal channel to the south of row $i$. Wire each switch whose ordinate is $y$ to switch-select$_y$, one of $\Theta(\lceil\sqrt{hk}\rceil)$ control lines in the vertical channel to the east of column $j$. The area of each block remains $\Theta(hk)$. Wire each switch to a global line, switch-value. Switch $(x, y)$ of the block at row $i$ and column $j$ is set to the value of switch-value if and only if lines row-select$_i$, column-select$_j$, switch-select$_x$, and switch-select$_y$ are active. The interblock channel width is increased from $\Theta(1)$ to $O(\sqrt{hk})$. The overall area, however, remains $\Theta(hkst)$. The normalized redundancy is $\Theta(hk)$.
Regardless of whether we use fuses or sequentially programmable nonvolatile pass gates: 1) the length of any connection through a switch vector is $O(k)$, 2) the length of any connection from the outgoing end of a switch vector to the next block is $O(\sqrt{hk})$, and 3) the wirelength between elements in any configuration is $O(2k + 3\sqrt{hk}) \subseteq O(k + \sqrt{hk})$. □
We should mention that a locally spared array which simultaneously uses at least one active external signal per switch has superlinear area bounded by $\Omega\big((hkst)^{3/2}\big)$ and $O\big((hkst)^2\big)$. It is also worthwhile to note that, when the desired system is other than an array, the order of magnitude of the count of components in the redundant system may be different from the normalized redundancy [3, Section 9].
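The addressing scheme in the proof of Lemma 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ceil_sqrt`, `block_holes`, and `switch_addresses` are hypothetical, and the row-major assignment of pairs $(x, y)$ to switches is one arbitrary choice among many that satisfy $0 \le x, y < \lceil\sqrt{hk}\rceil$.

```python
import math


def ceil_sqrt(n: int) -> int:
    """Smallest integer r with r * r >= n, using exact integer arithmetic."""
    return 0 if n <= 0 else math.isqrt(n - 1) + 1


def block_holes(h: int) -> int:
    """Number of element 'holes' in a block built on a ceil(sqrt(h)) square grid."""
    return ceil_sqrt(h) ** 2


def switch_addresses(h: int, k: int) -> list[tuple[int, int]]:
    """Give each of the h*k switches of one block a distinct pair (x, y).

    Assumption: switches are numbered 0 .. h*k - 1 and mapped row-major onto
    a square of side ceil(sqrt(h*k)); the paper only requires that such
    pairs exist, not this particular ordering.
    """
    side = ceil_sqrt(h * k)
    return [(i % side, i // side) for i in range(h * k)]
```

With this mapping, each select line carries at most $\lceil\sqrt{hk}\rceil$ switches, which is what keeps the number of control lines per channel at $\Theta(\lceil\sqrt{hk}\rceil)$.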
7
Scaling Trends for Redundancy
In this section we bound, as a function of the number st of elements in the desired array, the element and switch redundancy of A(h, k; s, t) . To within a constant, that is, we determine optimum values h = h(s, t) , k = k(s, t) . In some cases we can determine h and k to arbitrary accuracy. Applying these discrete results to Lemma 1 yields the normalized redundancy and maximum wirelength. When manipulating (4), (5), and variants thereof it will be convenient to treat h and k as real numbers. In practice, of course, the counts of elements and switches are nonnegative integers. By considering model F(pε , pstuck open , pstuck closed ) and subsets thereof, our development shows how switches that are stuck closed pose perhaps the greatest challenge for configuration architectures.
7.1
Effect of Faulty Elements with Fault Free Switches
Let us derive the coverage and redundancy under the univariate model $F(p_\varepsilon)$. Since switches are presumed to be good, we build the minimum number $k = 1$ in each element’s vector. Substituting $p_{\text{stuck open}} = 0 = p_{\text{stuck closed}}$ into Corollary 2.1, the coverage reduces to
$$Y = \left(1 - p_\varepsilon^h\right)^{st} \qquad (16)$$
Algebraic manipulation of (16) explicates the element redundancy:
$$h = \frac{\ln\left(1 - Y^{1/st}\right)}{\ln p_\varepsilon} \qquad (17)$$
with the understanding that the actual value of $h$ be $\left\lceil \ln\left(1 - Y^{1/st}\right) / \ln p_\varepsilon \right\rceil$. Figure 10 suggests that the value of $h$ is logarithmic in $st$. This is indeed the case, and is formalized by Theorem 3.
[Plot: element redundancy $h$ (0.0 to 6.0) versus number of elements in desired system $= st$ (100 to 900), at coverage $Y = 0.90$, with one curve for each of $p_\varepsilon = 0.01, 0.05, 0.10, 0.15, 0.20$.]
Figure 10: Redundancy in the presence of faulty elements, but with fault free switches. The actual number $h(st)$ of elements per block equals the least integer not less than the value shown.
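Equations (16) and (17) are easy to evaluate numerically. The following Python sketch is illustrative only; the function names `coverage` and `element_redundancy` are hypothetical, and `math.expm1` is used merely to compute $1 - Y^{1/st}$ without cancellation when $st$ is large.

```python
import math


def coverage(h: float, st: int, p_e: float) -> float:
    """Equation (16): coverage of a locally spared array with fault free switches."""
    return (1.0 - p_e ** h) ** st


def element_redundancy(Y: float, st: int, p_e: float) -> tuple[float, int]:
    """Equation (17): real-valued h, and the least integer not less than it."""
    one_minus_root = -math.expm1(math.log(Y) / st)  # equals 1 - Y**(1/st)
    real_h = math.log(one_minus_root) / math.log(p_e)
    return real_h, math.ceil(real_h)
```

For example, at $Y = 0.90$, $st = 100$, and $p_\varepsilon = 0.10$ the real value of $h$ falls just below 3, so three elements per block suffice while two do not.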
Theorem 3
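Under model $F(p_\varepsilon)$, for real $c > 1$, $Y = \left(1 - p_\varepsilon^h\right)^{st} < 1$, and integer $st > -\log_c Y$:
$$\frac{\ln st - \ln(-\ln Y) - \ln\frac{2}{c+1}}{-\ln p_\varepsilon} \;>\; h \;>\; \frac{\ln st - \ln(-\ln Y)}{-\ln p_\varepsilon} \qquad (18)$$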
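As a numerical sanity check of the sandwich (18), the sketch below compares the real value of $h$ obtained from (17) with the two bounds. It is an illustration under assumptions: the names `theorem3_bounds` and `real_h` are hypothetical, and the default $c = 2$ is an arbitrary admissible choice.

```python
import math


def real_h(Y: float, st: int, p_e: float) -> float:
    """Real-valued element redundancy from equation (17)."""
    return math.log(-math.expm1(math.log(Y) / st)) / math.log(p_e)


def theorem3_bounds(Y: float, st: int, p_e: float, c: float = 2.0) -> tuple[float, float]:
    """Lower and upper bounds on h from inequality (18)."""
    if not (c > 1.0 and 0.0 < Y < 1.0 and st > -math.log(Y) / math.log(c)):
        raise ValueError("requires c > 1, 0 < Y < 1, and st > -log_c(Y)")
    base = math.log(st) - math.log(-math.log(Y))
    lower = base / -math.log(p_e)
    upper = (base - math.log(2.0 / (c + 1.0))) / -math.log(p_e)
    return lower, upper
```

The gap between the two bounds is $\ln\frac{c+1}{2} / (-\ln p_\varepsilon)$, which does not depend on $st$ and shrinks as $c$ approaches one.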
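PROOF. When expanded about zero, the Taylor series of $\ln\left(1 - p_\varepsilon^h\right)$ equals $-\sum_{i=1}^{\infty} \frac{p_\varepsilon^{ih}}{i}$. Use the first term in this series to approximate $\frac{\ln Y}{st}$. By [22, p 646], the remainder is at most zero and at least $\frac{-p_\varepsilon^{2h}}{2(1 - p_\varepsilon^h)}$. Therefore,
$$-st\, p_\varepsilon^h \;>\; \ln Y \;>\; -st\, p_\varepsilon^h \left[ 1 + \frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)} \right] \qquad (19)$$
The lower bound of (18) follows directly from the lefthand side of (19). For the upper bound substitute $p_\varepsilon^h \stackrel{\text{def}}{=} y$ and calculate the slope: $\frac{d}{dy}\left[ 1 + \frac{y}{2(1-y)} \right] = \frac{1}{2(1-y)^2} > 0$. Thus $\frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)}$ is increasing with increasing $p_\varepsilon^h$. The condition $st > -\log_c Y$ allows us to bound the remainder. If $p_\varepsilon^h \ge \frac{c-1}{c}$ then $\left(1 - p_\varepsilon^h\right)^{st}$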
1, if $s > 2$, $t > 2$ and $(s - 2)(t - 2) > -\log_c Y$, then the minimum value of $k$ is $\min(1, h - 1)$ and the minimum value of $h$ is bounded
$$\frac{\ln st - \ln[-\ln Y] - \ln\frac{2}{c+1}}{-\ln\left[1 - (1 - p_\varepsilon)(1 - p_{\text{stuck open}})^4\right]} \;>\; h \;>\; \frac{\ln (s-2)(t-2) - \ln[-\ln Y]}{-\ln\left[1 - (1 - p_\varepsilon)(1 - p_{\text{stuck open}})^4\right]} \qquad (21)$$
PROOF. The configuration coverage is maximized when the block availability (20) is maximized, and this is achieved when k is minimized. If h = 1 then no interblock switching is necessary, and so the minimum value of k is 0 . If h > 1 then each element requires at least one switch in every direction, and so the minimum value of k is 1 . In what follows, and without loss of generality, we presume that Y < (1 − pε )st ; hence h > 1 and k = 1 . For the lower bound of (21), note that each of the (s − 2)(t − 2) blocks in the interior of A has four neighboring blocks, and that each such block must be available. The probability of this is given by (12). With m = 4 , that is, the configuration coverage is bounded above by expression (20) raised to the power (s − 2)(t − 2) . Similarly, the configuration coverage is bounded below by expression (20) raised to the power st , with m = 4 . Using Taylor’s expansion of the natural logarithm, we obtain bounds on ln Y :
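$$(s-2)(t-2)\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h \;<\; -\ln Y \;<\; st\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h \left[ 1 + \frac{\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h}{2\left(1 - \left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h\right)} \right] \qquad (22)$$
Substitute $\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h \stackrel{\text{def}}{=} y$ and calculate the slope: $\frac{d}{dy}\left[ 1 + \frac{y}{2(1-y)} \right] = \frac{1}{2(1-y)^2} > 0$. Thus, the righthand side of (22) is increasing with increasing $\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h$. The condition $(s-2)(t-2) > -\log_c Y$ allows us to bound the remainder. If $\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h \ge \frac{c-1}{c}$ then the coverage is strictly less than $\left(\frac{1}{c}\right)^{-\log_c Y} = Y$. But this contradicts the assumption that the coverage is at least $Y$. Therefore $\left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h < \frac{c-1}{c}$. Since $1 + \frac{y}{2(1-y)}$ is increasing with increasing $y$, it follows that $1 + \frac{y}{2(1-y)} < \frac{c+1}{2}$. Thus, the righthand side of (22) is less than $\frac{c+1}{2}\, st \left[1 - (1-p_\varepsilon)(1-p_{\text{stuck open}})^4\right]^h$. Solve for the lefthand side of (21). □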
The difference between the left and right sides of (21) approaches zero as $st$ approaches infinity and $c$ approaches one. Since the (real) value of $h$ is sandwiched between these bounds, the asymptotic value of the element redundancy is the logarithm, to the base $1/\left[1 - (1 - p_\varepsilon)(1 - p_{\text{stuck open}})^4\right]$, of $\frac{st}{-\ln Y}$. The latter is essentially identical to (18), and underscores how the combined effect of faulty elements and stuck-open switches is equivalent to the effect of faulty elements only. The equivalence is asymptotically exact as long as the availability of an element, in the bivariate case, equals the element failure probability, in the univariate case. If, in addition to faulty elements, our fault model admits stuck open switches, then the discrete redundancy is increased, but only by a constant factor $\log_{1/\left[1 - (1 - p_\varepsilon)(1 - p_{\text{stuck open}})^4\right]}(1/p_\varepsilon)$. By Lemma 1, under either model $F(p_\varepsilon)$ or model $F(p_\varepsilon, p_{\text{stuck open}})$, the normalized redundancy of $A$ is $\Theta(\log st)$, and the wirelength is $O(\sqrt{\log st})$. Somewhat surprisingly, $F(p_\varepsilon, p_{\text{stuck open}})$ is the only combination of faulty elements and switches in our study where the order of magnitude of the redundancy is the same as for the univariate model $F(p_\varepsilon)$. We provide details in the subsections that follow.
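The constant factor quoted above can be evaluated directly. The sketch below is illustrative only: the names `effective_failure` and `stuck_open_factor` are hypothetical, and the default $m = 4$ corresponds to an interior block of a two-dimensional array.

```python
import math


def effective_failure(p_e: float, p_open: float, m: int = 4) -> float:
    """1 - (1 - p_e)(1 - p_open)^m: chance an element or one of its m switch paths is unusable."""
    return 1.0 - (1.0 - p_e) * (1.0 - p_open) ** m


def stuck_open_factor(p_e: float, p_open: float, m: int = 4) -> float:
    """Logarithm of 1/p_e to the base 1/effective_failure(p_e, p_open, m)."""
    return math.log(p_e) / math.log(effective_failure(p_e, p_open, m))
```

When $p_{\text{stuck open}} = 0$ the factor is one, recovering the univariate case; for instance, with $p_\varepsilon = 0.10$ and $p_{\text{stuck open}} = 0.05$ it is roughly 1.74, independent of $st$.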
7.3
Effect of Switches Stuck Closed
Our proposed implementation $A(h, k; s, t)$ may not be the only way to locally spare an $s \times t$ array. We can, however, establish conditions whereby the number of switches in $A(h, k; s, t)$ is least possible among all implementations of locally spared arrays. The thrust of this subsection is to establish implementation-independent lower bounds on the switch redundancy in the presence of stuck-closed faults. To do this we begin with the somewhat contrived model $F(p_{\text{stuck closed}})$; i.e., both $p_\varepsilon$ and $p_{\text{stuck open}}$ are zero. The model is contrived since the analysis allows values of $h$ greater than one, something that a practical designer would not do when elements are known to be free of faults. Nevertheless, our treatment sets the stage for Subsection 7.4, wherein we exploit the independence of faulty elements and switches stuck closed. In ensuing subsections we address the increase of $h$ when $p_\varepsilon$ or $p_{\text{stuck open}}$ are positive. The closed-switch adjacency of a subset of a configuration architecture is the multi-hypergraph (which may reduce to a graph [23]) that results from shorting together all the poles of every switch in the subset. Terminal nodes correspond to elements; branch points correspond to switches.
Lemma 2 Under model $F(p_{\text{stuck closed}})$, for given configuration coverage $0 < Y < 1$, $A(h, k; s, t)$ minimizes the count of switches in any locally spared $s \times t$ array.
PROOF. Consider all elements $b_{0;x,y}, \ldots, b_{h-1;x,y}$ in block $b_{x,y}$ and pick arbitrary element (say) $b_{\ell;x+1,y}$ from a neighboring block (say) $b_{x+1,y}$. Any implementation of local sparing provides switching and routing between every pair of elements from neighboring blocks. The closed-switch adjacency of $b_{0;x,y}, \ldots, b_{h-1;x,y}$ with $b_{\ell;x+1,y}$ therefore contains a tree whose root is at $b_{\ell;x+1,y}$ and whose leaves correspond to $b_{0;x,y}, \ldots, b_{h-1;x,y}$. Consider the branch point that separates paths from $b_{\ell;x+1,y}$ to any two leaves $b_{q;x,y}$ and $b_{r;x,y}$ in this tree. That is, consider the point beyond which, in one direction, a wire runs through $k_q$ switches in the vector belonging to $b_{q;x,y}$ and, in the other direction, a wire runs through $k_r$ switches in the vector belonging to $b_{r;x,y}$. At most one of the adjacent vectors belonging to $b_{q;x,y}$ and $b_{r;x,y}$ has all of its switches stuck closed, else two elements ($b_{q;x,y}$ and $b_{r;x,y}$) from the same block are forced to be adjacent. The former is implied by but does not imply the existence of a fault cover. The probability that no two switch vectors leading to $b_{\ell;x+1,y}$ are stuck closed equals
$$\prod_{i=0}^{h-1} \left(1 - p_{\text{stuck closed}}^{k_i}\right) + \sum_{i=0}^{h-1} p_{\text{stuck closed}}^{k_i} \prod_{j \ne i} \left(1 - p_{\text{stuck closed}}^{k_j}\right) \qquad (23)$$
The function $p_{\text{stuck closed}}^z$ is log-linear, and therefore log-concave. That the function $(1 - p^z)$ is log-concave follows by twice differentiating the Taylor series for $\ln(1 - p^z)$ with respect to $z$; each term of the latter, $-\sum_{i=1}^{\infty} (i \ln p)^2 \frac{(p^z)^i}{i}$, is negative, hence the sum is negative. Since the sum or product of log-concave functions is log-concave [24, Chapter 1], the lefthand side of expression (23) is log-concave. Similarly, the product under the summation on the righthand side of (23) is log-concave. For fixed $p_{\text{stuck closed}}$ and constant value of $\sum_{i=0}^{h-1} k_i$, Jensen’s inequality implies that (23) is maximized when all the $k_i$ equal the average $\hat{k} \stackrel{\text{def}}{=} \frac{1}{h}\sum_{i=0}^{h-1} k_i$ [25]. Rewrite (23) with $k_i$ equal to this average:
$$\left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^h + h\, p_{\text{stuck closed}}^{\hat{k}} \left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^{h-1} \qquad (24)$$
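The claim that (23) is largest when the switches are spread evenly can be spot-checked numerically. The sketch below is a check on a few hand-picked allocations, not a proof; the names `prob_at_most_one_stuck` and `equal_split` are hypothetical.

```python
import math


def prob_at_most_one_stuck(ks: list[int], p_closed: float) -> float:
    """Expression (23): no vector, or exactly one vector, has all of its switches stuck closed."""
    a = [p_closed ** k for k in ks]  # a[i] = chance vector i is entirely stuck closed
    none = math.prod(1.0 - ai for ai in a)
    exactly_one = sum(
        ai * math.prod(1.0 - aj for j, aj in enumerate(a) if j != i)
        for i, ai in enumerate(a)
    )
    return none + exactly_one


def equal_split(h: int, k_hat: float, p_closed: float) -> float:
    """Expression (24): the value of (23) when every vector holds k_hat switches."""
    a = p_closed ** k_hat
    return (1.0 - a) ** h + h * a * (1.0 - a) ** (h - 1)
```

With three vectors and six switches in total at $p_{\text{stuck closed}} = 0.1$, the even split $(2, 2, 2)$ scores higher than $(1, 2, 3)$, which in turn scores higher than $(1, 1, 4)$.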
Note that (24) is the sum of expressions (4) and (5), with $\hat{k} = k$, $m = 1$, and $p_\varepsilon = 0 = p_{\text{stuck open}}$. Now strengthen the above argument to necessary and sufficient conditions for a fault cover when $p_\varepsilon = 0 = p_{\text{stuck open}}$. That is, for fixed count of switches, the probability that a block $b_{i,j}$ with $m$ neighbors has no series of $k_q$ switches stuck closed between $b_{q;i,j}$ and some branch point is maximized when the number of switches $k_q$ equals the average $\hat{k}$, and this is true for all blocks. For any block with $m$ neighbors the probability that no element is stuck to a branch point is at most $\left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^{mh}$. Similarly, the probability that exactly one element is stuck to branch points on at least one but at most $m$ sides cannot exceed $h\left[1 - \left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^m\right]\left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^{m(h-1)}$. The availability of a block is therefore at most
$$\left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^{mh} + h\left[1 - \left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^m\right]\left(1 - p_{\text{stuck closed}}^{\hat{k}}\right)^{m(h-1)} \qquad (25)$$
Expression (25) is just the availability of a block of $A(h, k; s, t)$ with $\hat{k} = k$ and $p_\varepsilon = 0 = p_{\text{stuck open}}$. It follows that, for integer values of $\hat{k}$, the coverage of $A(h, k; s, t)$ is best possible among all implementations of local sparing. Under model $F(p_{\text{stuck closed}})$, that is, and for fixed coverage $0 < Y < 1$, $A(h, k; s, t)$ minimizes the overall count of switches. □
Lemma 3 Under model $F(p_{\text{stuck closed}})$, for given configuration coverage $0 < Y < 1$, any $1 \times t$ array, locally spared with $h$ elements per block, has $k \in \Omega(\log ht)$ switches per element.
PROOF. By Lemma 2 it suffices to show that $k \in \Omega(\log ht)$ is necessary in order that condition (1) or (2) of Theorem 1 hold for every block in the interior of $A(h, k; 1, t + 2)$. That is, it suffices to prove that $k \in \Omega(\log t)$ is necessary to maintain constant value for the $t$th power of the sum of (4) and (5), with $m = 2$, $p_\varepsilon = 0 = p_{\text{stuck open}}$. The probability that (E1) a block has no switch vector stuck closed and at least one element available reduces to $\left(1 - p_{\text{stuck closed}}^k\right)^{2h}$. The probability that (E2) a block has exactly one element whose switch vectors are stuck closed and that this element
is available equals h 1 − (1 − pkstuck closed )2
Y =
i
1 − pkstuck closed
h
2(h−1)
. The coverage is bounded
(1 − pkstuck closed )2(h−1) (1 − pkstuck closed )2 + h 1 − [1 − pkstuck closed ]2
− log2 Y then we must have pkstuck closed < 12 , else the first factor (1 − pkstuck closed )2t(h−1) of the lefthand side of (26) is less than Y . Expand to one term the Taylor series of the logarithm of the righthand side of (27), bound the remainder, and get ln ht + ln c − ln(− ln Y ) − ln pstuck closed
2
< k
For s, t > 1 , the proof is similar to that of Lemma 3: replacing m = 2 by m = 4 , argue the availability of the interior blocks of a two-dimensional array. Corollary 4.1 Under model F(pstuck closed ) , for given configuration coverage 0 < Y < 1 , any s × t array, locally spared with h elements per block, has switch redundancy k ∈ Ω(log hst) .
7.4
Effect of Faulty Elements Combined with Switches Stuck Closed
The upper bound that matches Corollary 4.1 is a special case of the next theorem, proof of which makes use of the independence of faulty elements and switches stuck closed.
Theorem 5 Under model $F(p_\varepsilon, p_{\text{stuck closed}})$, for given configuration coverage $0 < Y < 1$, the redundancy of $A(h, k; s, t)$ is $[h \in \Theta(\log st)] \cdot [k \in \Theta(\log st)] \subseteq \Theta(\log^2 st)$.
PROOF. Under model $F(p_\varepsilon)$, Theorem 3 says that the redundancy of $A(h, k; s, t)$ is $h \in \Theta(\log st)$. By Corollary 4.1, therefore, it suffices to establish the bivariate upper bound for $A(h, k; s, t)$. It suffices to show that $h \in O(\log st)$, $k \in O(\log st)$ maintains coverage no less than the $st$th power of the sum of (4) and (5), at $m = 4$. We accomplish this by considering only the first event E1 of Theorems 1 and 2: every block has at least one element available and no switch vector is stuck closed. We can therefore work with an effective coverage $Y = \left(1 - p_{\text{stuck closed}}^k\right)^{4hst} \left(1 - p_\varepsilon^h\right)^{st}$ that bounds from below the coverage of $A(h, k; s, t)$. It suffices to give $h \in O(\log st)$, $k \in O(\log st)$ such that, for arbitrary $0 < q \le 0.8$:
$$Y^{1-q} = \left(1 - p_\varepsilon^h\right)^{st}, \qquad Y^{q} = \left(1 - p_{\text{stuck closed}}^k\right)^{4hst} \qquad (28)$$
That is, we exploit the fact that $Y$ is the separable product of univariate functions of $p_{\text{stuck closed}}$ and $p_\varepsilon$; the switch redundancy $k$ depends only on the righthand side of (28). Pick any real $c > 1$ and integer $st > (q - 1)\log_c Y$. Use (28) to explicate the (real) values of $h$ and $k$:
$$h = \frac{\ln\left(1 - Y^{\frac{1-q}{st}}\right)}{\ln p_\varepsilon} \;\le\; \frac{\ln st - \ln([q-1]\ln Y) - \ln\frac{2}{c+1}}{-\ln p_\varepsilon} \qquad (29)$$
$$k = \frac{\ln\left(1 - Y^{\frac{q}{4hst}}\right)}{\ln p_{\text{stuck closed}}} \;\le\; \frac{\ln 4hst - \ln(-q\ln Y) - \ln\frac{2}{c+1}}{-\ln p_{\text{stuck closed}}} \qquad (30)$$
The redundancy is therefore $O(\lceil h \rceil \cdot \lceil k \rceil)$. Let us derive the righthand side of (29). When expanded about zero, the Taylor series of $\ln\left(1 - p_\varepsilon^h\right)$ equals $-\sum_{i=1}^{\infty} \frac{p_\varepsilon^{ih}}{i}$. Use the first term in this series to approximate $\frac{\ln Y^{1-q}}{st}$. By [22, p 646], the remainder is at most zero and at least $\frac{-p_\varepsilon^{2h}}{2(1 - p_\varepsilon^h)}$. Therefore,
$$(1 - q)\ln Y \;>\; -st\, p_\varepsilon^h \left[ 1 + \frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)} \right] \qquad (31)$$
Substitute $p_\varepsilon^h \stackrel{\text{def}}{=} y$ and calculate the slope: $\frac{d}{dy}\left[ 1 + \frac{y}{2(1-y)} \right] = \frac{1}{2(1-y)^2} > 0$. Thus $\frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)}$ is increasing with increasing $p_\varepsilon^h$. The condition $st > (q - 1)\log_c Y$ allows us to bound the remainder. If $p_\varepsilon^h \ge \frac{c-1}{c}$ then $\left(1 - p_\varepsilon^h\right)^{st} < \left(\frac{1}{c}\right)^{(q-1)\log_c Y} = Y^{1-q}$. But this contradicts (28). Therefore $p_\varepsilon^h < \frac{c-1}{c}$. Since $\frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)}$ is increasing with increasing $p_\varepsilon^h$, it follows that $1 + \frac{p_\varepsilon^h}{2(1 - p_\varepsilon^h)} < \frac{c+1}{2}$. Thus the righthand side of (31) is greater than $-\frac{c+1}{2}\, st\, p_\varepsilon^h$. Solving gives inequality (29).
A similar manipulation verifies the righthand side of (30), subject to assurance that st > (q − 1) log c Y implies 4hst > −q logc Y . To see this, suppose that st > (q −1) logc Y . Then, since h ≥ 1 , it suffices to consider whether 4st > −q logc Y . The latter holds if 4(1 − q) ≥ q ; that is, if q ≤ 0.8 . But 0 < q ≤ 0.8 by assumption. Therefore, both (29) and (30) hold whenever st > (q−1) logc Y and q ≤ 0.8 . By (29), h ∈ Θ(log st) . Since log(n log n) ∈ O(log n + log log n) ⊆ O(log n) , substituting (29) into (30) results in k ∈ Θ(log st) . Figure 11 illustrates. 2
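The split of the coverage budget in (28) suggests a direct two-step computation: choose $h$ from the element factor, then choose $k$ from the switch factor using that $h$. The Python sketch below follows this order under assumptions: the names `split_redundancy` and `effective_coverage` are hypothetical, $q = 0.5$ is an arbitrary admissible default, and rounding up to integers is a simplification that keeps each factor of (28) at or above its share.

```python
import math


def split_redundancy(Y: float, st: int, p_e: float, p_closed: float, q: float = 0.5) -> tuple[int, int]:
    """Integer (h, k) from the lefthand equalities of (29) and (30), rounded up."""
    if not 0.0 < q <= 0.8:
        raise ValueError("the proof assumes 0 < q <= 0.8")
    # h from Y^(1-q) = (1 - p_e^h)^st
    h = math.ceil(math.log(-math.expm1((1.0 - q) * math.log(Y) / st)) / math.log(p_e))
    # k from Y^q = (1 - p_closed^k)^(4 h st), using the integer h just chosen
    k = math.ceil(math.log(-math.expm1(q * math.log(Y) / (4 * h * st))) / math.log(p_closed))
    return h, k


def effective_coverage(h: int, k: int, st: int, p_e: float, p_closed: float) -> float:
    """Effective coverage (event E1 only) used in the proof of Theorem 5."""
    return (1.0 - p_closed ** k) ** (4 * h * st) * (1.0 - p_e ** h) ** st
```

For $Y = 0.9$, $st = 100$, and $p_\varepsilon = p_{\text{stuck closed}} = 0.1$ this yields $(h, k) = (4, 5)$, and the resulting effective coverage exceeds the target.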
[Plot: element redundancy $h$ versus switch redundancy $k$, showing a (subset of the) feasible region at constant coverage $Y$.]
Figure 11: Feasible region of redundancy, $A(h, k; s, t)$ under model $F(p_\varepsilon, p_{\text{stuck closed}})$.
7.5
Effect of Switches Stuck Open and Stuck Closed
In the presence of switches stuck open and stuck closed we ask whether, in fact, local sparing is asymptotically feasible. This question arises from the conflicting nature of these two fault types.
As introduced in Section 2, the idea of building a vector of $k > 1$ switches in series is to increase the probability that a fault free $s \times t$ array can be isolated from the remaining elements. However, by making $k$ large we also increase the chance that some switch in any vector will fail stuck open; the effect is to reduce the chance that $A(h, k; s, t)$ contains a fault cover. Under model $F(p_{\text{stuck closed}}, p_{\text{stuck open}})$ or $F(p_\varepsilon, p_{\text{stuck open}}, p_{\text{stuck closed}})$, what is the minimum redundancy $hk = h(s, t) \cdot k(s, t)$ that can be achieved? This question is answered in part by the present subsection, and in part by Subsection 7.6.
Theorem 6 Under model $F(p_{\text{stuck closed}}, p_{\text{stuck open}})$, for given configuration coverage $0 < Y < 1$, the redundancy of $A(h, k; s > 2, t > 2)$ is $h \in \Omega(st)$, $k \in \Omega(\log st)$.
PROOF. Corollary 4.1 implies $k \in \Omega(\log st)$. That is, there exists positive constant $c_1$ such that $k \ge c_1 \ln st$ for all $st$ sufficiently large. It suffices to show that $h \in \Omega(st)$ is necessary in order that condition E1 or E2 of Theorems 1 and 2 hold for each of the interior blocks of $s$ rows and $t$ columns, under the hypothesis that the edge blocks are fault free. That is, it suffices to prove that $h \in \Omega(st)$ is necessary to maintain
$$Y \le \left[ \Pr(E1) + \Pr(E2) \right]^{st}_{m=4} \qquad (32)$$
The value of (32) is increasing with decreasing $p_{\text{stuck closed}}$ or $p_\varepsilon$. In particular, $\Pr(E2)$ is zero when $p_{\text{stuck closed}} = 0$. It suffices to show that $h \in \Omega(st)$ is necessary in order that $Y \le \left[ \Pr(E1) \right]^{st}_{m=4,\, p_\varepsilon = 0 = p_{\text{stuck closed}}}$. For all $st$ sufficiently large:
$$Y \le \left[ 1 - \left[ 1 - (1 - p_{\text{stuck open}})^{4 c_1 \ln st} \right]^h \right]^{(s-2)(t-2)} \qquad (33)$$
Linearize the Taylor series of the logarithm of the righthand side of (33):
$$\frac{-\ln Y}{(s-2)(t-2)} \ge \left[ 1 - (1 - p_{\text{stuck open}})^{4 c_1 \ln st} \right]^h \qquad (34)$$
For any real $c_2 > 0$ and $st$ sufficiently large we must have $(1 - p_{\text{stuck open}})^{4 c_1 \ln st} < \frac{c_2 - 1}{c_2}$; use this to bound from below the logarithm of the righthand side of (34):
$$\frac{2 \left[ \ln (s-2)(t-2) - \ln(-\ln Y) \right]}{c_2 + 1} \le (1 - p_{\text{stuck open}})^{4 c_1 \ln st}\, h \qquad (35)$$
Take logarithms one more time and get
$$\frac{\ln h}{-\ln(1 - p_{\text{stuck open}})} \ge 4 c_1 \ln st + \frac{\ln\left[ \ln (s-2)(t-2) - \ln(-\ln Y) \right] + \ln\frac{2}{c_2 + 1}}{-\ln(1 - p_{\text{stuck open}})}$$
Thus $\log h \in \Omega(\log st)$. That is, $h \in \Omega(st)$. □
The asymptotic bounds of Theorem 6 are unchanged in the case of one-dimensional arrays. In contrast to Corollary 4.1, however, Theorem 6 addresses the redundancy of A(h, k; s, t) only. The lower bounds may hold for an arbitrary implementation of locally spared arrays, but this remains to be proved. From the standpoint of problem complexity, that is, Theorem 6 is not as general as our characterization under models F(pε , pstuck open ) , F(pε , pstuck closed ) , or restrictions thereof.
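To get a feel for the tension between the two switch fault types, the linearized condition (34) can be solved for $h$ once $k$ is tied to $\ln st$. The following Python sketch is a numerical illustration only: the name `h_lower_bound` is hypothetical, the constant `c1 = 1.0` is an arbitrary stand-in for the unspecified $c_1$, and $k$ is treated as a real number.

```python
import math


def h_lower_bound(s: int, t: int, Y: float, p_open: float, c1: float = 1.0) -> float:
    """Least real h permitted by (34) when k = c1 * ln(st).

    From (34): x**h <= -ln(Y) / ((s-2)(t-2)), where
    x = 1 - (1 - p_open)**(4k) is the chance that at least one of an
    element's four switch vectors contains a stuck-open switch.
    """
    interior = (s - 2) * (t - 2)
    if interior <= -math.log(Y):
        raise ValueError("array too small for the bound to be informative")
    k = c1 * math.log(s * t)
    x = 1.0 - (1.0 - p_open) ** (4.0 * k)
    return math.log(interior / -math.log(Y)) / -math.log(x)
```

Because a larger array forces a larger $k$, which in turn makes each switch vector more likely to contain an open switch, the bound on $h$ rises with $st$ faster than it would under stuck-open faults alone.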
7.6
Faulty Elements Combined with Switches Stuck Open and Closed
We now present our results for the most general fault model in our study.
Theorem 7 Under model $F(p_\varepsilon, p_{\text{stuck open}}, p_{\text{stuck closed}})$, for given configuration coverage $0 < Y < 1$, if $p_{\text{stuck closed}} < (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2$ then the redundancy of $A(h, k; 1, t)$ is $[h \in \Theta(t)] \cdot [k \in \Theta(\log t)] \subseteq \Theta(t \log t)$.
PROOF. By Theorem 6 it suffices to show that $h \in O(t)$, $k \in O(\log t)$ maintains the configuration coverage of $A(h, k; s, t)$ at a level no less than any fixed constant less than one. As with Theorem 5, we can work with an effective coverage $Y \le \left[ \Pr(E1) \right]^{t}_{m=2}$ that bounds from below the coverage of $A(h, k; 1, t)$. It suffices to give $h \in O(t)$, $k \in O(\log t)$ such that, for arbitrary $0 < q < 1$:
$$Y^{q} = \left(1 - p_{\text{stuck closed}}^k\right)^{2ht} \qquad (36)$$
$$Y^{1-q} = \left[ 1 - \left[ 1 - (1 - p_\varepsilon) \left( \frac{(1 - p_{\text{stuck open}})^k - p_{\text{stuck closed}}^k}{1 - p_{\text{stuck closed}}^k} \right)^{2} \right]^{h} \right]^{t} \qquad (37)$$
As with Theorem 5, we establish conditions that enable us to bound the Taylor expansion of the natural logarithm. i) With $2 \le c \le$ Napier’s constant $e$, and for $t \ge -\log_c Y$, the $t$th root of (36) is no less than $\frac{1}{c}$, else (36) times (37) is less than $Y$. Note that this implies $p_{\text{stuck closed}}^k \le \frac{c-1}{c}$. Similarly, ii) $t \ge -\log_c Y$ implies
$$\left[ 1 - (1 - p_\varepsilon) \left( \frac{(1 - p_{\text{stuck open}})^k - p_{\text{stuck closed}}^k}{1 - p_{\text{stuck closed}}^k} \right)^{2} \right]^{h} \le \frac{c-1}{c}$$
In the analogous situation with Theorem 5, we were able to exploit the independence of faulty elements and switches stuck closed, and simply made $h$ large enough so that $p_\varepsilon^h < \frac{c-1}{c}$. In the present circumstance, however, (36) is increasing with increasing $k$ but decreasing with increasing $h$. By contrast, (37) is decreasing with increasing $k$ but increasing with increasing $h$. We must therefore first prove that it is possible to build $A(h, k; 1, t)$ simultaneously satisfying conditions (i) and (ii). For (i) take the natural logarithm of each side of $\left(1 - p_{\text{stuck closed}}^k\right)^{2h} \ge \frac{1}{c}$ and linearize the Taylor expansion of the lefthand side. Bounding the remainder, it suffices to build $h$ and $k$ satisfying $p_{\text{stuck closed}}^k\, \frac{1+c}{2} \le \frac{\ln c}{2h}$. That is,
$$k + \frac{\ln h}{\ln p_{\text{stuck closed}}} \ge \frac{\ln \ln c - \ln(1 + c)}{\ln p_{\text{stuck closed}}} \qquad (38)$$
Using Taylor’s expansion, condition (ii) can be written in logarithmic form:
$$\sum_{i=1}^{\infty} \frac{1}{i} \left[ (1 - p_\varepsilon) \left( \frac{(1 - p_{\text{stuck open}})^k - p_{\text{stuck closed}}^k}{1 - p_{\text{stuck closed}}^k} \right)^{2} \right]^{i} \ge \frac{1}{h} \ln \frac{c}{c-1} \qquad (39)$$
It therefore suffices that
$$\left( \frac{(1 - p_{\text{stuck open}})^k - p_{\text{stuck closed}}^k}{1 - p_{\text{stuck closed}}^k} \right)^{2} \ge (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^{2k} \ge \frac{1}{h(1 - p_\varepsilon)} \ln \frac{c}{c-1} \qquad (40)$$
where for the lefthand relation we have replaced the numerator by the $k$th term in (6); in the denominator we have substituted $1 > 1 - p_{\text{stuck closed}}^k$. Taking logarithms once again:
$$\ln(1 - p_\varepsilon) + 2k \ln(1 - p_{\text{stuck open}} - p_{\text{stuck closed}}) \ge \ln \ln \frac{c}{c-1} - \ln h \qquad (41)$$
Rewrite the linear relations (38) and (41) in slope-intercept form:
$$\ln h \le \underbrace{-\ln p_{\text{stuck closed}}}_{\text{positive slope}} \cdot k + \underbrace{\ln \ln c - \ln(1 + c)}_{\text{negative intercept}} \qquad (42)$$
$$\ln h \ge \underbrace{-2 \ln(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})}_{\text{positive slope}} \cdot k + \underbrace{\ln \ln \frac{c}{c-1} - \ln(1 - p_\varepsilon)}_{\text{intercept}} \qquad (43)$$
By hypothesis, $(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2 > p_{\text{stuck closed}}$; hence the slope of (42) is greater than the slope of (43). Figure 12 is faithful to these observations: the intersection of half planes as prescribed by constraints (42) and (43) defines a wedge. Solve for the least $k$ in the intersection of (42) and (43): k =
ln(1+c)+ln( 1−ln[1− 1c ] )−ln(1−pε ) 2 ln(1−pstuck open −pstuck closed )−ln pstuck closed
(44)
If desired, backsubstitute (44) into (42) or (43) in order to explicate the least ln h in the intersection of the inequalities. This extremum is the apex of our wedge, the interior of which satisfies conditions (i) and (ii). For the main result we consider real values of k and ln h within this wedge.
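[Plot, log-linear scale: log of element redundancy $\ln h$ versus switch redundancy $k$, for $e \ge c \ge 2$. Two nested wedges are shown: the feasible region for bounds on Taylor expansion, and within it the feasible region for redundancy at constant coverage $Y$. Annotations: inequalities (42) and (46) have identical slopes, with the intercept of (46) decreasing logarithmically in $t$; inequalities (43) and (48) have identical slopes, with an intercept increasing in proportion to $\log \log t$; at the apex, $k \propto \log t$, Eqn (49).]
Figure 12: Feasible region of redundancy, $A(h, k; 1, t)$ under model $F(p_\varepsilon, p_{\text{stuck closed}}, p_{\text{stuck open}})$. The two-dimensional case is similar.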
Use condition (i) to bound the Taylor series for the logarithm of the righthand side of (36), expanded to one term. It therefore suffices to demonstrate $h \in O(t)$ and $k \in O(\log t)$ satisfying
$$k + \frac{\ln h}{\ln p_{\text{stuck closed}}} \ge \frac{\ln(-\ln Y) - \ln(1 + c) + \ln q - \ln t}{\ln p_{\text{stuck closed}}} \qquad (45)$$
which may be expressed in slope-intercept form:
$$\ln h \le -\ln p_{\text{stuck closed}} \cdot k + \ln(-\ln Y) - \ln(1 + c) + \ln q - \ln t \qquad (46)$$
Use condition (ii) to bound the remainder of the Taylor series for the logarithm of the righthand side of (37), expanded to one term. Replace the numerator by the $k$th term in (6); for the denominator substitute $1 > 1 - p_{\text{stuck closed}}^k$. It suffices to produce $h \in O(t)$ and $k \in O(\log t)$ for which
$$h (1 - p_\varepsilon) (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^{2k} \ge \ln t + \ln(1 + c) - \ln 2 - \ln(1 - q) - \ln(-\ln Y) \qquad (47)$$
Once again take logarithms, and express the result in slope-intercept form:
$$\ln h \ge -2 \ln(1 - p_{\text{stuck open}} - p_{\text{stuck closed}}) \cdot k - \ln(1 - p_\varepsilon) + \ln\left[ \ln t + \ln(1 + c) - \ln 2 - \ln(1 - q) - \ln(-\ln Y) \right] \qquad (48)$$
For $t \ge -\log_c Y$ the intercept of (46) is less than the intercept of (42). Since $e \ge c \ge 2$, the intercept of (48) is greater than the intercept of (43) whenever $t \ge -\log_c Y$. The slope of (46) is identical to that of (42). The slope of (48) is the same as the slope of (43). Figure 12 is faithful to these observations: the intersection of the two half planes as prescribed by the constraints above defines a wedge that falls strictly within the feasible region for conditions (i) and (ii). At the apex of the wedge, the (real) value of $k$ takes on its minimum $k_{\min}$ in the feasible region comprising the intersection of (46) and (48):
$$k_{\min} = \frac{\ln t + \ln\left[ \ln t + \ln(1 + c) - \ln 2 - \ln(1 - q) - \ln(-\ln Y) \right] - \ln(1 - p_\varepsilon) - \ln(-\ln Y) + \ln(1 + c) - \ln q}{2 \ln(1 - p_{\text{stuck open}} - p_{\text{stuck closed}}) - \ln p_{\text{stuck closed}}} \qquad (49)$$
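The apex of the wedge is the intersection of two straight lines in the $(k, \ln h)$ plane, so (49) can be checked by solving (46) and (48) as equalities. The sketch below does this in Python under assumptions: the name `wedge_apex` is hypothetical, and the defaults $c = 2$ and $q = 0.5$ are arbitrary admissible choices.

```python
import math


def wedge_apex(t: int, Y: float, p_e: float, p_open: float, p_closed: float,
               c: float = 2.0, q: float = 0.5):
    """Return (k_min, ln_h_at_apex, line46, line48), each line as (slope, intercept)."""
    good = 1.0 - p_open - p_closed
    if not p_closed < good ** 2:
        raise ValueError("hypothesis of Theorem 7 violated")
    # Line (46): ln h <= slope46 * k + icpt46
    slope46 = -math.log(p_closed)
    icpt46 = math.log(-math.log(Y)) - math.log(1.0 + c) + math.log(q) - math.log(t)
    # Line (48): ln h >= slope48 * k + icpt48
    slope48 = -2.0 * math.log(good)
    inner = (math.log(t) + math.log(1.0 + c) - math.log(2.0)
             - math.log(1.0 - q) - math.log(-math.log(Y)))
    icpt48 = -math.log(1.0 - p_e) + math.log(inner)
    # Equation (49): the two lines meet where k = (icpt48 - icpt46) / (slope46 - slope48)
    k_min = (icpt48 - icpt46) / (slope46 - slope48)
    ln_h = slope46 * k_min + icpt46
    return k_min, ln_h, (slope46, icpt46), (slope48, icpt48)
```

The hypothesis $p_{\text{stuck closed}} < (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2$ is exactly what makes the denominator of (49) positive, so that the two lines cross at a finite, positive $k$.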
If desired, backsubstitute (49) into (46) or (48) in order to explicate the value of ln h at the apex of the wedge; i.e., the least value of ln hmin in the intersection of (46) and (48). The product ln hmin · kmin equals the minimum real redundancy in the feasible region. To get integer values for k and h requires a bit more work. If the apex of the feasible region does not correspond to integer values of k and h then translate the intersection of (46) and (48) to the origin: the slopes but not the intercepts make a difference. Let x and ln y be the abscissa and ordinate under this transformation, and consider the corresponding exponentials:
$$y \le \left( \frac{1}{p_{\text{stuck open}}} \right)^{x}, \qquad y \ge \left( \frac{1}{(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2} \right)^{x} \qquad (50)$$
If $x$ is no less than one then there is an integer within distance $x$ from the origin along the abscissa. We seek simultaneously to assure that the difference between the exponentials in (50) is at least one. Write this as
$$\left( \frac{1}{[1 - p_{\text{stuck open}} - p_{\text{stuck closed}}]^2} \right)^{x} \left[ \left( \frac{[1 - p_{\text{stuck open}} - p_{\text{stuck closed}}]^2}{p_{\text{stuck open}}} \right)^{x} - 1 \right] \ge 1 \qquad (51)$$
If $x \ge 1$ then (51) holds whenever
$$\left( \frac{1}{(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2} \right)^{x} \left[ \frac{(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2 - p_{\text{stuck closed}}}{p_{\text{stuck open}}} \right] \ge 1 \qquad (52)$$
If the right factor on the lefthand side of (52) is no less than 1 then $x \ge 1$ implies that the difference between the exponentials in (50) is greater than one. Pick the least integer $k$ that occurs horizontal distance at least 1 (but less than distance 2) from the apex of the feasible region. Within positive vertical distance 1 of the lower exponential at this point there is an integer value of $h$ that lies in the feasible region. If the right factor on the lefthand side of (52) is less than 1 then solve for the horizontal distance $x$ by taking logarithms:
$$x = \frac{\ln \dfrac{p_{\text{stuck open}}}{(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2 - p_{\text{stuck closed}}}}{\ln \dfrac{1}{(1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^2}} \qquad (53)$$
Pick the least integer k that occurs horizontal distance at least x as given by (53) (but less than distance x + 1) from the apex of the feasible region. Within positive vertical distance 1 of the lower exponential at this point there is an integer value of h that lies in the feasible region. The procedure above assures the existence of feasible integers (h, k) within a constant distance of the apex. By (49), a feasible integer value of k near the apex is O(log t) . The value of ln h is proportional to k ; hence a feasible integer value of h near the apex is O(t) . The integer values are the same order of magnitude as the respective coordinates at the apex. Thus h ∈ O(t) and k ∈ O(log t) . 2 The bivariate redundancy of a locally spared two-dimensional array is similar to the one-dimensional case. The ensuing inequalities are changed, but only by a constant: Corollary 7.1 Under model F( pε , pstuck open , pstuck closed ) , for given coverage 0 < Y < 1 , if pstuck closed < (1 − pstuck open − pstuck closed )4 then the redundancy of A(h, k; s, t) is [h ∈ Θ(st)] · [k ∈ Θ(log st)] ⊆ Θ(st log st) .
8
Conclusion
In this section we interpret our results, and point to directions where further investigation may be profitable. By the remarks at the end of Section 5, the (h, k)-redundancy calculated in Section 7 pertains not only to an iid model of faults, but to any distribution whose configuration coverage is bounded by log-linear functions. As Figure 5 illustrates, this includes our clustered fault model $F(p_{\varepsilon}^{\pm}, p_{\text{stuck open}}^{\pm}, p_{\text{stuck closed}}^{\pm})$ and subsets thereof. Under the latter, we have a range of redundancy calculations, but the order of magnitude is unchanged. Thus, Table 1 applies to faults whose distribution is either iid or clustered.

If faults occur in elements only then the redundancy h of locally spared arrays is logarithmic in the number st of elements in the array desired. A non-negligible fraction of switches stuck open acts to reduce the availability of an element, but the order of h remains logarithmic in st. If, in addition, we allow for stuck-closed faults then the number of switches per element k scales with the logarithm of the total number of elements. Moreover, in the presence of switches that may be stuck open or stuck closed, the element redundancy of A(h, k; s, t) jumps to a quantity that is linear in the number of elements in the array desired. From these observations we can conclude: i) the impact of faulty switches is significant; ii) switches that are stuck closed tend to be more damaging than switches that are stuck open and, in the large, more problematic than faulty elements.

| $p_{\varepsilon}^{\pm} > 0$? | $p_{\text{stuck closed}}^{\pm} > 0$? | $p_{\text{stuck open}}^{\pm} > 0$? | redundancy | maximum wirelength | reference | optimal? |
|---|---|---|---|---|---|---|
| yes | no | no | $\Theta(\log st)$ | $\Theta(\log st)^{1/2}$ | Thm 3 | yes |
| yes | no | yes | $\Theta(\log st)$ | $\Theta(\log st)^{1/2}$ | Thm 4 | yes |
| yes | yes | no | $\Theta(\log st)^{2}$ | $\Theta(\log st)$ | Thm 5 | yes |
| yes | yes | yes | $\Theta(st \log st)$ | $\Theta(st \log st)^{1/2}$ | Thm 7, Cor 7.1 | don’t know |

Table 1: Performance of A(h, k; s, t) as a function of fault model. The maximum wirelength is for a two-dimensional implementation. The righthand column pertains to local sparing only.

In addition to redundancy, configuration architectures may be assessed according to at least one other figure of merit: maximum wirelength. For many technologies, the speed of a systolic array decreases as the square of the length of the longest wire. The results of Section 7 enable us to characterize the maximum wirelength as a function of our probabilistic fault model. Refer again to Table 1. If faults occur in elements only then the wirelength is $O(\sqrt{\log st})$. The order of magnitude of the maximum wirelength remains unchanged if we couple faulty elements with switches stuck open. In the presence of faulty elements and switches stuck closed, however, the overall area of A(h, k; s, t) is proportional to $st(\log st)^{2}$. In consequence, our bound on wirelength increases to $O(\log st)$. Under our most general fault model $F(p_{\varepsilon}, p_{\text{stuck open}}, p_{\text{stuck closed}})$, the overall area is proportional to $(st)^{2} \log st$, and our upper bound on maximum wirelength jumps to $O(\sqrt{st \log st})$. In each of the preceding cases a matching lower bound is provided by noting that at least one wire is no shorter than the average wirelength $\Omega((hk)^{1/2})$.

Several issues remain open. Using numerical solutions to recurrence relations, LaForge [16] provides evidence that the switching matrices of A(h, k; s, t) provide sufficient intrinsic redundancy so that the effect of random breaks in wires is negligible.
However, this falls short of closed-form expressions, analogous to those of Corollary 2.1, which account for broken wires.

Note that the lower bound of Theorem 6 does not depend on the condition $p_{\text{stuck closed}} < (1 - p_{\text{stuck open}} - p_{\text{stuck closed}})^{2}$, which Theorem 7 uses to obtain a matching upper bound. This condition stipulates that the likelihood of a switch being stuck closed is considerably less than the chance that it is good, and is reasonable over a realistic range of failure probabilities. For example, if a switch is equally likely to be stuck open or closed then the condition is satisfied as long as $p_{\text{stuck open}} = p_{\text{stuck closed}} < 0.25$. The condition stems from our sufficiency argument for (41) and (47), wherein the configuration uses only switches that are good. We do not know if this condition is essential to the asymptotic feasibility of locally spared arrays. Nor do we know if, under our most general fault model $F(p_{\varepsilon}, p_{\text{stuck open}}, p_{\text{stuck closed}})$, the redundancy (h, k) ∈ O(st; log st) is best possible over all implementations of local sparing.

Even if the above-mentioned questions are definitively answered, it remains to work out the probabilistic feasible regions of tolerance or waste for locally spared arrays. That is, with respect to our fundamental equation (3), more work remains to be done. Yet another challenge would be to extend Leighton and Leiserson’s work for discretionary wiring to include faulty switches in the presence of clustering [17]. Finally, it remains to characterize local sparing for regular structures, such as hypercubes and H-trees, in the presence of faulty switches and clustering.
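Two quantitative statements in this section lend themselves to a quick numerical spot check: the 0.25 threshold for the condition of Theorem 7 when stuck-open and stuck-closed faults are equally likely, and the growth terms of Table 1. The sketch below is illustrative only. The function names and dictionary keys are hypothetical, every constant factor hidden by the Θ notation is set to 1, and natural logarithms are an arbitrary choice that does not affect the orders.

```python
import math


def theorem7_condition(p_open: float, p_closed: float) -> bool:
    """True when p_closed < (1 - p_open - p_closed)^2."""
    good = 1.0 - p_open - p_closed  # probability that a switch is good
    return good > 0.0 and p_closed < good * good


def table1_orders(st: float) -> dict:
    """Growth terms of Table 1 with all constant factors set to 1."""
    log_st = math.log(st)
    # Redundancy per element, one entry per row of Table 1.
    redundancy = {
        "elements only": log_st,
        "elements + stuck open": log_st,
        "elements + stuck closed": log_st ** 2,
        "elements + stuck open + stuck closed": st * log_st,
    }
    return {
        model: {
            "redundancy": hk,
            "wirelength": math.sqrt(hk),  # (hk)^(1/2), as in the text
            "area": st * hk,              # st elements, each replicated hk-fold
        }
        for model, hk in redundancy.items()
    }


if __name__ == "__main__":
    # Equal stuck-open and stuck-closed probabilities on either side of 0.25.
    for p in (0.10, 0.24, 0.25, 0.30):
        print(p, theorem7_condition(p, p))
    # Growth terms for a 1024 x 1024 array.
    for model, row in table1_orders(1024 * 1024).items():
        print(model, row)
```

With p = p_stuck open = p_stuck closed, the condition reads p < (1 − 2p)^2, that is, (4p − 1)(p − 1) > 0, which for p below 1/2 holds exactly when p < 1/4; the first loop probes values on both sides of that threshold.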
References
[1] L. E. LaForge. “How to Lay Out Arrays Spared by Rows and Columns”. Proceedings, IEEE International Conference on Innovative Systems and Silicon. San Francisco: IEEE Computer Society, October, 1997.
[2] I. Koren and D. K. Pradhan. “Yield and Performance Enhancement in VLSI and WSI Multiprocessor Systems.” Proceedings of the IEEE. Vol 74, No 5, May, 1986. pp 699–711.
[3] L. E. LaForge. “What Designers of Wafer Scale Systems Should Know About Local Sparing.” Proceedings, IEEE International Conference on Wafer Scale Integration. San Francisco: IEEE Computer Society, January, 1994. pp 106–131.
[4] G. H. Chapman, L. Carr, M. J. Syrzycki, and B. Dufort. “Test Vehicle for a Wafer-Scale Thermal Pixel Scene Simulator.” In Proceedings, International Conference on Wafer Scale Integration. IEEE Computer Society. San Francisco. January, 1994. pp 1–10.
[5] M. B. Ketchen. “Point-Defect Model for Wafer Scale Integration.” IEEE Circuits and Devices Magazine. July, 1987. pp 24–34.
[6] Y-Y Chen and S. J. Upadhyaya. “Reliability, Reconfiguration, and Spare Allocation Issues in Binary-Tree Architectures Based on Multiple-Level Redundancy.” IEEE Transactions on Computers. Vol 42, No 6, June, 1993. pp 713–723.
[7] Y. U. Chen, C. H. Cheng, and Y. C. Chou. “An Effective Reconfiguration Process for Fault Tolerant VLSI/WSI Array Processors.” In Lecture Notes in Computer Science: Dependable Computing EDDC-1. Berlin: Springer-Verlag. pp 421–438.
[8] L. E. LaForge. “Feasible Regions Quantify the Configuration Power of Arrays with Multiple Fault Types.” In Lecture Notes in Computer Science: Dependable Computing EDDC-1. Berlin: Springer-Verlag. pp 453–469.
[9] L. E. LaForge. “Feasible Regions Quantify the Probabilistic Configuration Power of Arrays with Multiple Fault Types.” Proceedings, IEEE International Conference on Innovative Systems in Silicon. San Francisco: IEEE Computer Society, October, 1996. pp 299–311.
[10] K. Huang, V. K. Agarwal, L. E. LaForge, and K. Thulasiraman. “A Diagnosis Algorithm for Constant Degree Structures and Its Application to VLSI Circuit Testing.” IEEE Transactions on Parallel and Distributed Systems. Vol 6, No 4, April, 1995. pp 363–372.
[11] S. Y. Kuo and W. K. Fuchs. “Efficient Spare Allocation for Reconfigurable Arrays.” IEEE Design and Test. February, 1987. pp 24–31.
[12] I. Koren, Z. Koren, and C. H. Stapper. “A Unified Negative-Binomial Distribution for Yield Analysis of Defect-Tolerant Circuits.” IEEE Transactions on Computers. Vol 42, No 6, June, 1993. pp 724–733.
[13] B. T. Murphy. “Cost-Size Optima of Monolithic Integrated Circuits.” Proceedings of the IEEE. Vol 52, December, 1964. pp 1537–1545.
[14] I. Koren and Z. Koren. “The Impact of Floorplanning on the Yield of Fault Tolerant ICs.” Proceedings, IEEE International Conference on Wafer Scale Integration. San Francisco: IEEE Computer Society, January, 1995. pp 329–338.
[15] A. Jain and J. Rajski. “Probabilistic Analysis of Yield and Area Utilization.” In Proceedings, International Workshop on Defect and Fault Tolerance in VLSI Systems. IEEE Computer Society. October, 1988. pp 7.1.1–12.
[16] L. E. LaForge. Fault Tolerant Arrays. PhD dissertation. Montreal: McGill University, 1991.
[17] T. Leighton and C. E. Leiserson. “Wafer-scale Integration of Systolic Arrays.” IEEE Transactions on Computers. Vol C-34, No 5, May, 1985. pp 448–461.
[18] M. H. DeGroot. Probability and Statistics. Reading, Ma: Addison-Wesley Publishing. 1975.
[19] L. E. LaForge. “Extremally Fault Tolerant Arrays.” Proceedings, IEEE International Conference on Wafer Scale Integration. San Francisco: IEEE Computer Society, January, 1989.
[20] J. D. Ullman. Computational Aspects of VLSI. Rockville, Md: Computer Science Press, 1984.
[21] Intel Memory Products 1992. Mt. Prospect, Illinois. Intel Corporation. 1991.
[22] G. B. Thomas. Calculus and Analytic Geometry. Fourth edition. Reading, Ma: Addison-Wesley Publishing. 1969.
[23] C. Berge. Graphs and Hypergraphs. Amsterdam: North Holland Publishing. 1973.
[24] E. Artin. The Gamma Function. Michael Butler, trans. New York: Holt, Rinehart, and Winston, Inc, 1964. Original monograph “Einführung in die Theorie der Gammafunktion” appeared in Hamburger Mathematische Einzelschriften. Leipzig: Verlag B. G. Teubner, 1931.
[25] D. S. Mitrinović. Analytic Inequalities. Berlin: Springer-Verlag, 1970.