
IIE Transactions (2007) 39, 1073–1083
Copyright © “IIE”
ISSN: 0740-817X print / 1545-8830 online
DOI: 10.1080/07408170701275335

Yield prediction via spatial modeling of clustered defect counts across a wafer map

SUK JOO BAE¹,∗, JUNG YOON HWANG² and WAY KUO³

¹Department of Industrial Engineering, Hanyang University, Seoul, Korea
²System Engineering Team, Memory Division, Samsung Electronics Corporation, Hwasung City, Korea. E-mail: jungyoon.hwang@samsung.com
³Department of Electrical and Computer Engineering, University of Tennessee, Knoxville, TN 37996, USA

Received November 2005 and accepted December 2006

In this paper we propose spatial modeling approaches for clustered defects observed on an Integrated Circuit (IC) wafer map. We use the spatial location of each IC chip on the wafer as a covariate for the corresponding defect count listed in the wafer map. Our models are based on Poisson regression, negative binomial regression, and Zero-Inflated Poisson (ZIP) regression. The analysis results indicate that yield prediction can be greatly improved by capturing the spatial distribution of defects across the wafer map. In particular, the ZIP model with spatial covariates shows considerable promise as a yield model, since it additionally models zero-defect chips. The modeling procedures are tested on a practical example.

Keywords: Generalized linear models, negative binomial regression, spatial clustering, wafer map, yield, zero-inflated Poisson regression

1. Introduction

Integrated circuit (IC) processes and production equipment have undergone tremendous changes over time, fostering rapid technological advances throughout the industry. The semiconductor industry has been able to double the number of transistors on a chip every 2 years. Because the price of semiconductor products declines rapidly over the lifecycle of a technology, the ability to increase yield quickly after a new technology is introduced is fundamental to earning high revenues. As a performance measure of a manufacturing process, yield has been widely used as a key index of business profitability. Yield is defined as the ratio of the number of usable items at the end of production to the number of potentially usable items at the beginning of production. Advances in semiconductor technology continuously shrink design geometries, making physical failure analysis more difficult and a reactive approach prohibitively slow. To secure high revenues in such a challenging environment, accurate yield prediction is essential for evaluating productivity and estimating production costs.

∗Corresponding author.


ICs on a wafer are highly vulnerable to defects generated during the numerous steps of the manufacturing process. A defect located in a defect-sensitive area (or critical area) is called a fatal defect, or fault, and causes the IC to fail (see Kuo et al. (1998)). Semiconductor manufacturers monitor their production processes using database systems such as wafer maps, which provide detailed information about the defects on a wafer. In general, (fatal) defective IC chips occur close to one another on a semiconductor wafer, and defect-free chips are likewise found adjacent to one another. Models that ignore this clustered defect pattern risk grossly underestimating the true yield. For example, the classical Poisson model underestimates the true yield because it assumes that defects are homogeneously distributed over the wafer. To account for the effect of clustered defects, Stapper et al. (1983) introduced a compound Poisson distribution for the number of defects $N$ of the form

$$P(N = k) = \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!}\, p(\lambda)\, d\lambda, \qquad k = 0, 1, 2, \ldots, \tag{1}$$

with a gamma-distributed $p(\lambda)$, that is, $p(\lambda) = \lambda^{a-1} e^{-\lambda/b} \big(\Gamma(a)\, b^a\big)^{-1}$, where $a$ is a shape (or cluster) parameter and $b$ is a scale parameter. The resulting distribution is negative binomial, and the yield can be obtained by calculating the zero-defect probability from this negative binomial distribution.

In the negative binomial model, the cluster parameter $a$ determines the degree of clustering. As $a \to \infty$, which represents the complete absence of clustering, the negative binomial distribution approaches a Poisson distribution. For $a = 1$ the negative binomial yield model is equivalent to Seed's model (Seed, 1967), and for $a = 4.2$ to Murphy's model (Murphy, 1964). Although the negative binomial model has been widely used for IC yield prediction, it has two limitations: (i) it folds a number of different defect sources with various spatial patterns into a single overall clustering parameter $a$; and (ii) it does not describe the spatial arrangement of defects that results from their tendency to cluster at specific locations on a wafer. Furthermore, the cluster parameter $a$ depends strongly on the size of the IC chips fabricated on a wafer. To overcome these limitations, Koren et al. (1993) suggested a unified negative binomial model that incorporates various defect sources, and Tyagi and Bayoumi (1994) considered a generalized Poisson distribution to model clustered defects. However, neither model explicitly captures the spatial characteristics of defects. In IC manufacturing, for example, defective IC chips tend to aggregate around the periphery of the wafer, and yield models that ignore this kind of spatial defect pattern can produce flawed yield predictions. In this article, we propose spatial modeling approaches for clustered defects based on Poisson regression, negative binomial regression, and Zero-Inflated Poisson (ZIP) regression. Our main idea is to use the spatial location of each IC chip on the wafer as a covariate for the defect count listed in a wafer map. Through the analysis of a practical example, we show that yield prediction can be substantially improved by incorporating spatial information about defects into the models.
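The limiting cases above can be made concrete by evaluating the zero-defect probability of the gamma-mixed Poisson model (1) in closed form. The sketch below is a minimal illustration under one assumption: the mean defect count per chip is $\Lambda = ab$, so that $P(N = 0) = (1 + \Lambda/a)^{-a}$. The function names are illustrative and do not come from the paper.

```python
import math


def negative_binomial_yield(mean_defects: float, a: float) -> float:
    """Zero-defect probability P(N = 0) under the gamma-mixed Poisson model (1).

    With gamma shape a and scale b, the mean defect count is a * b and
    P(N = 0) = (1 + b) ** (-a) = (1 + mean_defects / a) ** (-a).
    """
    if mean_defects < 0 or a <= 0:
        raise ValueError("need mean_defects >= 0 and a > 0")
    # log1p keeps the result accurate when a is very large (weak clustering).
    return math.exp(-a * math.log1p(mean_defects / a))


def poisson_yield(mean_defects: float) -> float:
    """Yield of the unclustered (homogeneous) Poisson model, exp(-mean_defects)."""
    return math.exp(-mean_defects)


if __name__ == "__main__":
    lam = 0.6475  # illustrative mean defect count per chip
    # a = 1 (Seed), a = 4.2 (Murphy), very large a (close to Poisson)
    for a in (1.0, 4.2, 1e6):
        print(f"a = {a:>9}: yield = {negative_binomial_yield(lam, a):.4f}")
    print(f"Poisson:      yield = {poisson_yield(lam):.4f}")
```

For any finite $a$ the negative binomial yield exceeds the Poisson yield at the same mean. This follows from Jensen's inequality, $E[e^{-\lambda}] \ge e^{-E[\lambda]}$, and it is the sense in which ignoring clustering underestimates yield.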

2. Spatial modeling of defects on a wafer map

In semiconductor manufacturing, a wafer map provides valuable information about the defects scattered on the wafer, i.e., their location, size and type. Spatial features of defects extracted from a wafer map can improve wafer yield prediction when they are incorporated into a regression model as covariates. Suppose that $m$ integrated circuits are produced per wafer and that $N_i$ defects are observed within the mutually exclusive chip region $A_i$, for $i = 1, \ldots, m$. Then the expected defect count in the $i$th chip region can be modeled as

$$E[N_i] \equiv \Lambda_i = \eta\big(\mathbf{f}(\mathbf{x}_i)^\top \boldsymbol\beta\big), \tag{2}$$

where $\mathbf{f}(\mathbf{x}_i)$ denotes the $(p \times 1)$ covariate vector evaluated at the $i$th chip location $\mathbf{x}_i$, and $\boldsymbol\beta$ is the $(p \times 1)$ vector of unknown coefficients, including an intercept term. For the parameters to be estimable, their number $p$ must be less than the number of chips $m$. The function $\eta(\cdot)$ is a link function that relates the mean to the linear predictor, whose variables are functions of chip location. The covariates may include the Cartesian coordinates of the chip centers, $\mathbf{x}_i = (x_i, y_i)$, or the polar coordinates, $\mathbf{x}_i = (r_i, \phi_i)$, measured from the center of the wafer, which is taken as the origin. In model (2), the dominant effects of defect clustering are expressed through the spatial variables $\mathbf{f}(\mathbf{x})$. For example, radial-angular nonuniformity in defect density can be described with polar coordinates. In semiconductor manufacturing, defects on a wafer have been observed to form clusters rather than to be randomly distributed over the wafer, and they are more likely to occur near the periphery of IC wafers. This phenomenon results largely from nonuniform diffusion and pattern distortion at the periphery, caused by the high thermal and mechanical stresses concentrated at the edge of a wafer. Angular variation in defect density arises mainly during high-temperature heat treatments: thermal gradients generated by variations in gas flow during thermal processing produce angular defect variations over the wafer. To describe this phenomenon, Gupta et al. (1974) introduced a defect density distribution as a function of radial and angular distance. However, their model is based largely on empirical results for one specific wafer size and therefore needs generalization. In this paper, we take a modeling approach first used in spatial epidemiology to evaluate the risk of exposure to a pollution source, and apply it to the problem of describing radial-angular variation in defect density. Lawson (2001) provided an overview of the statistical methods available in spatial epidemiology. Many respiratory, skin and genetic diseases have been linked to environmental pollution sources such as chemical reprocessing plants or solid waste incinerators.
The phenomena of raised disease incidence near the source, or of a directional preference related to a dominant wind direction, were explored by Besag and Newell (1991) and Lawson (1993). Based on the empirical density distribution of Gupta et al. (1974) and the modeling approach of Lawson (2001), we choose the functional form $\eta = \exp(\mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta)$ to model the radial and angular variation in defect density. Here, $\mathbf{x}_i = (r_i, \phi_i)$ are the polar coordinates of the $i$th chip center relative to the wafer center. Each row of $\mathbf{f}(\mathbf{x})$ consists of a selection of the variables {r, cos φ, sin φ, r cos φ, r sin φ}. The first three variables represent the distance (r) and directional (cos φ, sin φ) effects, and the remaining two represent distance-directional correlation effects. The spatial defect pattern can be modeled efficiently by selecting one of these variables or a combination of them. For example, defects that occur frequently around the edge of a wafer can be captured by including r alone. Defects clustered at a specific location can be modeled by combining r with {cos φ, sin φ}, or by using {r cos φ, r sin φ}.
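Below is a minimal sketch of how the candidate covariates could be computed from chip-center coordinates. It makes three assumptions: chip centers are given in Cartesian form relative to the wafer center, φ is measured counterclockwise from the positive x-axis (taken as the φ = 0 line), and chips lie on a regular grid. The grid helper `chip_grid_design` is hypothetical and only illustrates the layout.

```python
import math


def spatial_covariates(x: float, y: float) -> list[float]:
    """Return [1, r, cos φ, sin φ, r cos φ, r sin φ] for a chip centred at (x, y)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x) % (2.0 * math.pi)  # angle in [0, 2π) from the φ = 0 line
    return [1.0, r, math.cos(phi), math.sin(phi), r * math.cos(phi), r * math.sin(phi)]


def chip_grid_design(n: int = 20, pitch: float = 1.0) -> list[list[float]]:
    """Hypothetical helper: design rows for an n × n grid of chips centred on the wafer."""
    offset = (n - 1) / 2.0
    rows = []
    for row in range(n):
        for col in range(n):
            x = (col - offset) * pitch
            y = (offset - row) * pitch  # row 0 is the top of the map
            rows.append(spatial_covariates(x, y))
    return rows


if __name__ == "__main__":
    design = chip_grid_design()
    print(len(design), "chips; first row:", [round(v, 3) for v in design[0]])
```

The terms r cos φ and r sin φ are simply the Cartesian coordinates x and y of the chip center. In a log-linear mean, the distance-directional terms therefore act as a linear trend across the wafer.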


2.1. Poisson regression model

The spatial distribution of defects across a wafer can be regarded as a nonhomogeneous random pattern of points, that is, as a realization of a NonHomogeneous Poisson Process (NHPP). Spatial NHPP models are used extensively in astronomy, epidemiology, and biology to model clustering patterns of events (e.g., observed galaxies, disease occurrences, habitats of trees or animals) within a region of interest (see Cressie (1993, ch. 8) for details). Spatial NHPP models assume that the number of events observed per unit area depends on the spatial coordinates $\mathbf{x}$. Using the spatial NHPP approach to model the number of defects on a wafer, the expected number of defects in the $i$th chip region $A_i \subset \mathbb{R}^2$ with area $|A_i|$ can be written as

$$\Lambda_i = \int_{A_i} \lambda(\mathbf{x})\, d\mathbf{x}, \qquad i = 1, \ldots, m, \tag{3}$$

where $\lambda(\mathbf{x})$ denotes the intensity function at a defect location $\mathbf{x} \in A$, defined as

$$\lambda(\mathbf{x}) = \lim_{|d\mathbf{x}| \to 0} \frac{E[N(d\mathbf{x})]}{|d\mathbf{x}|}$$

for an infinitesimal region $d\mathbf{x}$ containing the location $\mathbf{x}$. When $\lambda(\mathbf{x})$ is constant over $\mathbb{R}^2$, i.e., $\lambda(\mathbf{x}) \equiv \lambda$, the mean number of defects in Equation (3) reduces to $\lambda|A_i|$, which corresponds to a homogeneous Poisson process. For semiconductor defects, the NHPP model offers great flexibility because the intensity function can include explanatory variables that account for spatial variation in defect density, as in $\lambda(\mathbf{x}) = \exp(\mathbf{f}(\mathbf{x})^\top\boldsymbol\beta)$. In general, IC chips on a wafer are all manufactured with the same size (as in the example introduced later). The integral in Equation (3) therefore runs over the same chip area for every chip, and the common area factor can be absorbed into the intercept. Accordingly, the mean defect count for the $i$th chip can be approximated by

$$\Lambda_i \approx \exp\big(\mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta\big). \tag{4}$$
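The approximation (4) can be checked numerically. The sketch below assumes a hypothetical intensity that is log-linear in the Cartesian coordinates, which is equivalent to using r cos φ and r sin φ. It integrates this intensity over a square chip with the midpoint rule and compares the result with the intensity at the chip center times the chip area. Because the exponent is linear, the integral factorizes, and the ratio of the two quantities is the same for every chip. Nonlinear terms such as r alone give only an approximate center-point value.

```python
import math


def intensity(x: float, y: float, beta: tuple[float, float, float]) -> float:
    """Hypothetical intensity λ(x) = exp(b0 + b1 * x + b2 * y)."""
    b0, b1, b2 = beta
    return math.exp(b0 + b1 * x + b2 * y)


def chip_mean_integral(cx, cy, side, beta, n=200):
    """Midpoint-rule approximation of Λ_i = ∫_{A_i} λ(x) dx over a square chip."""
    h = side / n
    total = 0.0
    for i in range(n):
        x = cx - side / 2.0 + (i + 0.5) * h
        for j in range(n):
            y = cy - side / 2.0 + (j + 0.5) * h
            total += intensity(x, y, beta)
    return total * h * h


def chip_mean_center(cx, cy, side, beta):
    """Centre-point approximation: λ(chip centre) × chip area."""
    return intensity(cx, cy, beta) * side * side


if __name__ == "__main__":
    beta = (-0.5, 0.07, -0.15)  # illustrative values, not fitted estimates
    for cx, cy in [(0.0, 0.0), (5.0, -3.0)]:
        exact = chip_mean_integral(cx, cy, 1.0, beta)
        approx = chip_mean_center(cx, cy, 1.0, beta)
        print(f"chip ({cx}, {cy}): ratio integral/centre = {exact / approx:.6f}")
```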

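The parameters $\boldsymbol\beta$ can be estimated by the maximum likelihood (ML) method, which maximizes the log-likelihood

$$\log L(\mathbf{n}; \boldsymbol\beta) \equiv l(\mathbf{n}; \boldsymbol\beta) = \sum_{i=1}^m \log \frac{e^{-\Lambda_i}\Lambda_i^{n_i}}{n_i!} = \sum_{i=1}^m \big\{ n_i\, \mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta - \exp\big(\mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta\big) - \log(n_i!) \big\}. \tag{5}$$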
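As a sketch of the GLM approach, the log-likelihood (5) can be maximized by iteratively reweighted least squares (IRLS). For the canonical log link, IRLS coincides with Newton–Raphson, and the observed information equals $\mathbf{X}^\top \mathrm{diag}(\hat{\boldsymbol\Lambda})\mathbf{X}$. The example makes three assumptions: NumPy is available, the design matrix `X` has rows $\mathbf{f}(\mathbf{x}_i)^\top$ with the intercept in the first column, and the data are simulated. It is an illustration only, not the SAS GENMOD implementation used in the paper.

```python
import math

import numpy as np


def fit_poisson_irls(X, y, tol=1e-10, max_iter=100):
    """Poisson log-linear regression by IRLS; returns (beta_hat, std_errors)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = math.log(max(y.mean(), 1e-8))  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # X^T W with working weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse observed (= expected) information
    return beta, np.sqrt(np.diag(cov))


def poisson_loglik(X, y, beta):
    """Log-likelihood (5): sum of n_i f(x_i)'β − exp(f(x_i)'β) − log(n_i!)."""
    eta = np.asarray(X, dtype=float) @ beta
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * eta - np.exp(eta)) - sum(math.lgamma(v + 1.0) for v in y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=2000)
    X = np.column_stack([np.ones_like(x), x])
    y = rng.poisson(np.exp(0.2 + 0.5 * x))  # simulated counts, not wafer data
    beta, se = fit_poisson_irls(X, y)
    wald = (beta / se) ** 2                 # Wald chi-square statistics
    print("beta:", beta.round(4), "se:", se.round(4), "Wald:", wald.round(2))
    print("log-likelihood:", round(poisson_loglik(X, y, beta), 3))
```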

Maximum likelihood estimates (MLEs) of $\boldsymbol\beta$ in the log-linear model can be derived following the generalized linear model (GLM) approach of McCullagh and Nelder (1989). Sometimes extraneous Poisson variation (or overdispersion) occurs, i.e., the variance of the response exceeds the Poisson variance $\Lambda_i$. Overdispersion can arise in a number of different ways; the simplest and most common mechanism is clustering in a population. If the precise mechanism that produces the overdispersion is known, specific methods may be used. In the absence of such knowledge, it is convenient to assume a relationship between the variance and the mean of the form $\operatorname{Var}(N_i) = \zeta\Lambda_i$, where the constant $\zeta\ (> 1.0)$ is called a "dispersion parameter". The regression parameters $\boldsymbol\beta$ and $\zeta$ can then be estimated jointly by a quasi-likelihood approach, which allows parameter estimation and inferential testing without full knowledge of the probability distribution of the data (Wedderburn, 1974; McCullagh and Nelder, 1989). The dispersion parameter does not introduce a new probability distribution; it simply provides a correction term for testing the parameter estimates under the Poisson model. The models are fit in the usual way and the parameter estimates are not affected by the value of $\zeta$, but the standard errors of the estimated parameters are inflated by the factor $\sqrt{\zeta}$. McCullagh and Nelder (1989) suggested estimating $\zeta$ as the ratio of the deviance, or of Pearson's $\chi^2$ statistic, to its degrees of freedom:

$$\hat\zeta = \frac{\chi^2}{m - p} = \frac{1}{m - p} \sum_{i=1}^m \frac{(n_i - \hat\Lambda_i)^2}{\hat\Lambda_i}. \tag{6}$$

See McCullagh and Nelder (1989, ch. 6.2.3) for further details about the estimation of the dispersion parameter.

2.2. Negative binomial regression model

Defective chips on a wafer tend to deviate substantially from the Poisson model because they commonly occur in clusters or display systematic patterns. As an alternative to the Poisson model, the negative binomial distribution has been widely accepted as a semiconductor yield model that accommodates non-Poisson variation (Stapper, 1973, 1989). Yield prediction might be further improved if spatial covariates are included in the negative binomial model. If the distribution of $N$ given $\rho$ and $\mathbf{x}$ is Poisson with mean $\rho\Lambda(\mathbf{x})$ and $\rho$ is gamma-distributed, then the marginal distribution of $N$ is negative binomial.
Negative binomial models are well known to be much more flexible than Poisson models in accommodating overdispersion. The negative binomial regression model considered here has the form

$$P(N_i = k \mid \mathbf{x}_i) = \frac{\Gamma(k + 1/\zeta)}{\Gamma(k + 1)\,\Gamma(1/\zeta)} \times \frac{\big(\zeta\Lambda(\mathbf{x}_i)\big)^k}{\big(1 + \zeta\Lambda(\mathbf{x}_i)\big)^{k + 1/\zeta}}, \tag{7}$$

for $k = 0, 1, 2, \ldots$ and $i = 1, 2, \ldots, m$, where $\zeta > 0$ is the inverse of the cluster parameter $a$ in the gamma probability density function $p(\lambda)$ of Equation (1). Lawless (1987) refers to $\zeta$ as a dispersion parameter. The mean and variance of $\mathbf{N} = (N_1, \ldots, N_m)$ given the spatial locations $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_m)$ are

$$E(\mathbf{N} \mid \mathbf{x}) = \boldsymbol\Lambda(\mathbf{x}) \quad\text{and}\quad \operatorname{Var}(\mathbf{N} \mid \mathbf{x}) = \boldsymbol\Lambda(\mathbf{x}) + \zeta\boldsymbol\Lambda(\mathbf{x})^2, \tag{8}$$

respectively. Assuming, in addition, a log-linear model for $\Lambda(\mathbf{x})$ as in Equation (4), the yield for the $i$th chip is

$$P(N_i = 0 \mid \mathbf{x}_i) = \big(1 + \zeta\Lambda(\mathbf{x}_i)\big)^{-1/\zeta} = \big(1 + \zeta\exp(\mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta)\big)^{-1/\zeta}. \tag{9}$$

Because it incorporates the spatial locations of defective chips across the wafer as covariates, this yield model has greater potential to estimate the true yield accurately than the yield models of Stapper et al. (1983) and Tyagi and Bayoumi (1994). The parameters $\zeta$ and $\boldsymbol\beta$ can be estimated by the ML or quasi-likelihood (QL) method following a GLM approach (McCullagh and Nelder, 1989, ch. 9). ML estimation for the negative binomial model is relatively straightforward. For fixed values of $\zeta$ the distribution belongs to the exponential family, so the regression parameters $\boldsymbol\beta$ can be estimated by maximizing the log-likelihood

$$l(\mathbf{n}; \boldsymbol\beta, \zeta) = \sum_{i=1}^m \Big\{ n_i\, \mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta + n_i \log\zeta - \big(n_i + 1/\zeta\big) \log\big(1 + \zeta\exp(\mathbf{f}(\mathbf{x}_i)^\top\boldsymbol\beta)\big) + \log\frac{\Gamma(n_i + 1/\zeta)}{\Gamma(n_i + 1)\,\Gamma(1/\zeta)} \Big\} \tag{10}$$

using the standard Iteratively Re-weighted Least-Squares (IRLS) algorithm for GLMs. This gives the estimates $\tilde{\boldsymbol\beta}(\zeta)$ and the profile likelihood $l(\tilde{\boldsymbol\beta}(\zeta), \zeta)$. To estimate $\zeta$, the Newton–Raphson algorithm is applied iteratively, alternating between $\tilde{\boldsymbol\beta}(\zeta)$ and $\zeta$, which finally yields the joint MLEs $(\hat{\boldsymbol\beta}, \hat\zeta)$. Asymptotic properties of the MLEs of $(\boldsymbol\beta, \zeta)$ and inference procedures based on them are examined in detail by Lawless (1987).
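Below is a minimal sketch of Equations (7), (9) and (10), written with log-gamma functions for numerical stability. The function names are illustrative. The sketch evaluates the pmf, the chip yield and the log-likelihood for given $(\boldsymbol\beta, \zeta)$. It does not implement the IRLS and Newton–Raphson estimation described above.

```python
import math


def nb_logpmf(k: int, mu: float, zeta: float) -> float:
    """log P(N = k) from Eq. (7) with mean mu > 0 and dispersion zeta > 0."""
    inv = 1.0 / zeta
    return (math.lgamma(k + inv) - math.lgamma(k + 1.0) - math.lgamma(inv)
            + k * math.log(zeta * mu) - (k + inv) * math.log1p(zeta * mu))


def nb_chip_yield(mu: float, zeta: float) -> float:
    """Chip yield P(N = 0) = (1 + zeta * mu) ** (-1 / zeta), Eq. (9)."""
    return math.exp(-math.log1p(zeta * mu) / zeta)


def nb_loglik(counts, design_rows, beta, zeta) -> float:
    """Log-likelihood (10) with the log link mu_i = exp(f(x_i)'beta)."""
    total = 0.0
    for n_i, row in zip(counts, design_rows):
        mu_i = math.exp(sum(f * b for f, b in zip(row, beta)))
        total += nb_logpmf(n_i, mu_i, zeta)
    return total


if __name__ == "__main__":
    mu, zeta = 0.6475, 5.733  # illustrative values
    probs = [math.exp(nb_logpmf(k, mu, zeta)) for k in range(500)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
    print(f"sum={sum(probs):.6f} mean={mean:.4f} var={var:.4f} "
          f"(Eq. 8: {mu + zeta * mu ** 2:.4f})")
    print(f"P(N=0) = {nb_chip_yield(mu, zeta):.4f}")
```

Summing the pmf numerically confirms the moments in Equation (8): the mean equals $\Lambda$ and the variance equals $\Lambda + \zeta\Lambda^2$.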

2.3. The ZIP regression model

As a result of defect clustering, IC chips without defects are commonly observed in wafer manufacturing. Modeling zero-defect chips is crucial for accurately predicting the true yield, which can then serve as a key indicator of production quality. The Poisson and negative binomial regression models may not adequately account for clustered defect patterns that contain significant numbers of zero-defect chips. As another modeling approach for this situation, we introduce the ZIP regression model proposed by Lambert (1992). For the ZIP distribution, the response $N_i$, $i = 1, \ldots, m$, has the probability mass function (pmf)

$$P(N_i = 0) = p_i + (1 - p_i)\, e^{-\Lambda_i}, \qquad P(N_i = k) = (1 - p_i)\, \frac{e^{-\Lambda_i}\Lambda_i^k}{k!}, \quad k = 1, 2, \ldots, \tag{11}$$

for $0 \le p_i \le 1$. With $p_i = 0$, the ZIP model reduces to the (inhomogeneous) Poisson distribution. The ZIP model is very useful for discrete data with more zeros (called "inflated zeros") than the Poisson pmf can accommodate. Its extension to multivariate cases is given by Li et al. (1999).

It can easily be shown that the mean and variance of $N_i$ in the ZIP model are

$$E(N_i) = (1 - p_i)\Lambda_i \equiv \mu_i, \qquad \operatorname{Var}(N_i) = (1 - p_i)\Lambda_i + p_i(1 - p_i)\Lambda_i^2 = \mu_i + \frac{p_i}{1 - p_i}\,\mu_i^2. \tag{12}$$

Let $\boldsymbol\Lambda = (\Lambda_1, \ldots, \Lambda_m)^\top$ and $\mathbf{p} = (p_1, \ldots, p_m)^\top$. When the ZIP regression model is applied to spatially distributed defects on the wafer, $\log(\boldsymbol\Lambda)$ and $\operatorname{logit}(\mathbf{p}) = \log(\mathbf{p}/(1 - \mathbf{p}))$ are assumed to be linear functions of the spatial covariates:

$$\log(\boldsymbol\Lambda) = \mathbf{f}(\mathbf{x})\boldsymbol\beta \quad\text{and}\quad \operatorname{logit}(\mathbf{p}) = \mathbf{g}(\mathbf{x})\boldsymbol\gamma, \tag{13}$$

where $\mathbf{f}(\mathbf{x})$ and $\mathbf{g}(\mathbf{x})$ are the covariate matrices evaluated at $\mathbf{x}$, and $\boldsymbol\beta$ and $\boldsymbol\gamma$ are $(q \times 1)$ vectors of unknown parameters, so the ZIP model has $2q$ parameters. The log link for $\boldsymbol\Lambda$ follows model (4), and the logit function is the commonly used link for $\mathbf{p}$. In some applications $\mathbf{f}(\mathbf{x})$ and $\mathbf{g}(\mathbf{x})$ may coincide, in which case more parsimonious models can be developed by relating the two linear predictors in some way. Agarwal et al. (2002) gave an ecological application of ZIP regression with spatial covariates within a Bayesian framework. The ZIP model (11) can be fit by ML using the EM algorithm; the convergence of the EM algorithm is discussed in Lambert (1992). Details, including statistical inference and the asymptotic distribution of the MLEs, are summarized in the Appendix.

3. Practical example

Tyagi and Bayoumi (1994) analyzed spatial defect patterns on three IC wafer maps. Using wafer maps with distinctly different defect patterns, they illustrated several descriptive measures of defect clustering, including the variance-to-mean ratio, turning points, mean crowding, and patchiness. However, they concentrated only on quantifying the degree of defect clustering in terms of chip areas, rather than on modeling the spatial pattern of the defects. Assuming that all defects are critical and cause failures, we analyze only one of the three wafer maps (given in Fig. 1), since the same analysis procedure applies to the others. In Fig. 1, the center of the wafer is marked with × and the angular start line (φ = 0) is shown as a dotted line. The number in each of the equally sized 20 × 20 squares (IC chips) is its defect count, and the counts show the spatial clustering of defects according to radial-angular position. Figure 1 also shows that a large number of zero-defect chips aggregate at specific locations on the wafer. These spatial patterns arise mainly for two reasons. An uneven temperature distribution during the rapid thermal annealing process can produce a ring of defective chips around the edge of the wafer. In addition, excess variation in a manufacturing machine can cause all the chips in a contiguous region of a wafer to be defective (Hansen et al., 1997). Taking these spatial features of the defects into consideration, we locate each chip on the wafer by calculating its radial distance (r) and angle (φ) from the center of the wafer. Values or functions of (r, φ) are then incorporated as covariates into several yield models: Poisson, negative binomial, and ZIP regression.

Fig. 1. Defect count data on a wafer map (Tyagi and Bayoumi, 1994).

3.1. Poisson regression analysis

First, a Poisson regression model is fitted to the defect counts. Based on Equation (4), a log link is used for the mean defect count, and $\mathbf{f}(\mathbf{x}_i)$ includes the radial distance, the angular direction, and the distance-directional correlation effects of the defects. The MLEs $\hat{\boldsymbol\beta}$ of the model parameters are derived following a GLM approach.

Table 1 shows the parameter estimates obtained using the SAS GENMOD procedure (Anon, 1999). They are obtained with the dispersion parameter fixed at one, i.e., for an ordinary Poisson regression without overdispersion. The estimates are not affected by the value of $\zeta$, and for a fixed value of $\zeta$ the ML estimates are identical to the QL and WLS estimates. The last two columns of the table give the Wald statistic and its p-value. The Wald statistic tests the null hypothesis $H_0: \beta_i = 0$; for an MLE $\hat\beta_i$ it is defined in Equation (19). Here, the variance of $\hat\beta_i$ is the corresponding diagonal element of the inverse of the observed information matrix, that is,

$$\operatorname{Var}(\hat\beta_i) = \big[\mathbf{I}(\hat{\boldsymbol\beta})^{-1}\big]_{ii}, \qquad \mathbf{I}(\boldsymbol\beta) = -\frac{\partial^2 l(\mathbf{n}; \boldsymbol\beta)}{\partial\boldsymbol\beta\, \partial\boldsymbol\beta^\top}.$$

The Wald statistic is asymptotically distributed as $\chi^2_1$ under $H_0$. Following Myers and Montgomery (1997), we test hypotheses on individual coefficients with the Wald statistic rather than with t-tests, because the Poisson model has a natural (fixed) scale parameter.

Table 1. GLM analysis of the Poisson regression model

Effect     DF  Estimate  Std err  Chi-square  Pr > Chi
Intercept  1   −0.3149   0.1768    3.17       0.0749
r          1   −0.0329   0.0318    1.07       0.3011
cos φ      1   −0.4735   0.2492    3.61       0.0574
sin φ      1    0.5154   0.2455    4.41       0.0358
r cos φ    1    0.0733   0.0444    2.73       0.0984
r sin φ    1   −0.1535   0.0439   12.20       0.0005

The Wald tests show that all effects except distance (r) are significant at the 10% level. This result is consistent with the evident aggregation of defects at specific boundary areas of the wafer (see Fig. 1): the number of defects on an IC chip depends on the radial-angular position rather than on the radial distance alone. The final model for the mean defect counts is

$$\log\hat\Lambda_i = -0.4870 - 0.5035\cos\phi_i + 0.5079\sin\phi_i + 0.0775\, r_i\cos\phi_i - 0.1483\, r_i\sin\phi_i. \tag{14}$$

Note that the parameter estimates in the final reduced model are not identical to those in the full model. Based on the fitted model (14), the zero-defect probability of every IC chip is estimated. The resulting wafer yield is

$$\frac{1}{400}\sum_{i=1}^{400} P(N = 0 \mid \hat\Lambda_i) = \frac{1}{400}\sum_{i=1}^{400} \exp(-\hat\Lambda_i) = 0.5341,$$

or 53.41%, which considerably underestimates the observed wafer yield (79.5%). The Poisson yield without spatial covariates is $\exp(-\hat\Lambda) = 0.5233$, where the MLE is $\hat\Lambda = \sum_{i=1}^{400} n_i/400 = 0.6475$. Including spatial information on the defects therefore yields little improvement in wafer yield prediction under the Poisson model. The Poisson model also underpredicts large defect counts: model (14) predicts that only 1.1% of the chips on the wafer have at least five defects, whereas 5% of the chips actually contain at least five defects.

To take account of overdispersion, we introduce a dispersion parameter $\zeta$ into the relationship between the variance and the mean, $\operatorname{Var}(N_i) = \zeta\Lambda_i$. We use the QL approach with SCALE=DEVIANCE in the SAS GENMOD procedure, which estimates $\zeta$ as the deviance divided by its degrees of freedom (Deviance/DF); this gives $\hat\zeta = 2.1899$. A value of $\hat\zeta$ much larger than one is strong evidence of overdispersion. To incorporate overdispersion into the Poisson model, the standard errors of the coefficients are multiplied by $\sqrt{\hat\zeta}$. The coefficient estimates themselves are the same as those in Table 1, because the dispersion parameter does not affect point estimation. We recalculated the $\chi^2$ test statistics based on the standard errors adjusted for overdispersion. The resulting log-linear model for the mean defect counts at the 10% significance level is

$$\log\hat\Lambda_i = -0.4716 - 0.0649\, r_i\sin\phi_i, \tag{15}$$

with a re-estimated dispersion parameter $\hat\zeta = 2.1776$. Even the Poisson regression model that accounts for overdispersion significantly underestimates the observed wafer yield, giving a predicted yield of 52.06%. This result indicates that the discrepancy between the predicted and observed yields may be caused by an incorrect model assumption rather than by overdispersion.

As a model check, Fig. 2 shows a normal probability plot of the Pearson residuals $r_p$ for the fitted model (14), where $r_p = (n_i - \hat\Lambda_i)/\sqrt{\hat\Lambda_i}$. The interpretation of this plot is directly analogous to that of the normal probability plot in a multiple regression study. In Fig. 2 the Pearson residuals deviate seriously from the normality assumption. The Pearson residuals were also calculated for model (15), and they were likewise found to differ significantly from a normal distribution (the plot is not shown here). This implies that the Poisson regression model is not adequate for the IC defect data on this wafer, so another model is needed.

Fig. 2. Normal probability plot of the Pearson residuals derived from the spatial Poisson regression (vertical axis: Pearson residuals; horizontal axis: quantiles of standard normal).
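The quasi-likelihood adjustment and the yield calculation in this section can be sketched as follows. The inputs are observed counts and fitted Poisson means, both hypothetical here. The sketch computes the Pearson estimate of $\zeta$ from Equation (6) and a deviance-based estimate that mirrors the Deviance/DF ratio described above. It then rescales standard errors by $\sqrt{\zeta}$ and averages $\exp(-\hat\Lambda_i)$ over chips to obtain a Poisson wafer yield.

```python
import math


def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-square divided by its degrees of freedom, Eq. (6)."""
    chi2 = sum((n - mu) ** 2 / mu for n, mu in zip(counts, fitted))
    return chi2 / (len(counts) - n_params)


def deviance_dispersion(counts, fitted, n_params):
    """Poisson deviance divided by its degrees of freedom (Deviance/DF)."""
    dev = 2.0 * sum((n * math.log(n / mu) if n > 0 else 0.0) - (n - mu)
                    for n, mu in zip(counts, fitted))
    return dev / (len(counts) - n_params)


def quasi_poisson_se(std_errors, zeta):
    """Standard errors inflated by sqrt(zeta); point estimates are unchanged."""
    return [s * math.sqrt(zeta) for s in std_errors]


def poisson_wafer_yield(fitted):
    """Wafer yield as the average chip zero-defect probability exp(-Λ_i)."""
    return sum(math.exp(-mu) for mu in fitted) / len(fitted)


if __name__ == "__main__":
    # Without spatial covariates every chip has the same fitted mean 0.6475.
    print(round(poisson_wafer_yield([0.6475] * 400), 4))
```

With the common mean of 0.6475 used when there are no spatial covariates, the last function reproduces the Poisson yield of about 52.3% quoted above.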

3.2. Negative binomial regression analysis

A negative binomial model is introduced to take account of extraneous Poisson variation. First, a negative binomial model without spatial covariates was applied to the example. The negative binomial wafer yield is $P(N = 0) = (1 + \zeta\Lambda)^{-1/\zeta}$, where $\Lambda$ is the mean number of defects per chip and the dispersion parameter is $\zeta = (\operatorname{Var}(N) - \Lambda)/\Lambda^2$. We applied the method of moments discussed in Stapper (1985) to our example, with $\hat\Lambda = \sum_{i=1}^{400} n_i/400 = 0.6475$ and $\widehat{\operatorname{Var}}(N) = \sum_{i=1}^{400} (n_i - \hat\Lambda)^2/399 = 3.0509$. These values give $\hat\zeta = 5.733$ and a wafer yield of 76.31%. The yield prediction error, defined as the observed wafer yield minus the estimated wafer yield, is substantially smaller than that of the Poisson model, because the dispersion (or clustering) of the defects scattered on the wafer is taken into account. However, in this example the negative binomial model gives incorrect results for the probability that a chip has a large number of defects.

Next, a negative binomial regression model with the log link and the same spatial covariates as in the Poisson regression was applied to the wafer map data. The Wald statistics derived from Equation (19) were used to test the significance of the spatial covariates. Only the distance-directional correlation effect $r_i\sin\phi_i$ is significant at the 10% level. The summary is given in Table 2, and the final mean function of the negative binomial model is

$$\log\hat\Lambda_i = -0.4684 - 0.0592\, r_i\sin\phi_i, \tag{16}$$

with $\hat\zeta = 7.5462$. All the parameters, including the dispersion parameter, were estimated by the ML method in the SAS GENMOD procedure. The chip yield was obtained by plugging the mean predicted by Equation (16) and the estimated dispersion parameter into Equation (9), and the wafer yield is taken as the average of the chip yields. The distribution of the chip yields appears approximately normal. Their average is 79.32%, which predicts the observed wafer yield (79.5%) almost exactly. Furthermore, the negative binomial regression model gives better estimates than the Poisson regression for large defect counts. It predicts that 4.11% of the chips on the wafer have at least five defects, which differs by 0.89 percentage points from the observed value. These results show that yield prediction is greatly improved by incorporating the spatial pattern of the defects into the negative binomial model, which accommodates overdispersed defect counts on the wafer.

Table 2. GLM analysis of the negative binomial regression model

Effect     DF  Estimate  Std err  Chi-square  Pr > Chi
Intercept  1   −0.4684   0.1517   9.54        0.0020
r sin φ    1   −0.0592   0.0358   2.74        0.0978
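The method-of-moments calculation at the start of this section can be reproduced in a few lines. The sketch takes the summary statistics reported in the text (mean 0.6475, sample variance 3.0509) as inputs instead of the per-chip counts. The function names are illustrative.

```python
import math


def sample_moments(counts):
    """Sample mean and unbiased sample variance (divisor m - 1) of per-chip counts."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return mean, var


def nb_dispersion_mom(mean, var):
    """Method-of-moments dispersion: zeta = (Var(N) - mean) / mean**2."""
    if var <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return (var - mean) / mean ** 2


def nb_wafer_yield(mean, zeta):
    """Negative binomial yield (1 + zeta * mean) ** (-1 / zeta)."""
    return math.exp(-math.log1p(zeta * mean) / zeta)


if __name__ == "__main__":
    mean, var = 0.6475, 3.0509  # summary statistics reported for the wafer in Fig. 1
    zeta = nb_dispersion_mom(mean, var)
    print(f"zeta = {zeta:.3f}, NB yield = {nb_wafer_yield(mean, zeta):.4f}, "
          f"Poisson yield = {math.exp(-mean):.4f}")
```

The gap between the two printed yields is the clustering effect described in the text. The negative binomial yield lies much closer to the observed 79.5% than the Poisson yield does.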

3.3. ZIP regression analysis

The preceding Poisson regression analysis showed strong evidence of extraneous Poisson variation, so it is necessary to examine whether the overdispersion results from a large number of zero-defect chips. As shown in Fig. 3, there are more zero-defect chips on the wafer than expected under the Poisson distribution with mean $\Lambda = 0.6475$. Clearly, the large number of extra zeros makes the IC defect data deviate significantly from the Poisson distribution in this example. To handle the large number of zero-defect chips on the wafer, the ZIP regression approach was applied to the defect count data in Fig. 1. First, a binomial regression model with a logit link was fitted to the (transformed) binary defect data, with "0" signifying that a chip has no defects and "1" signifying that it has at least one. Because the distribution of good chips also exhibits a spatial pattern (as shown in Fig. 1), the spatial variables {r, cos φ, sin φ, r cos φ, r sin φ} are used as covariates. The final binomial model includes sin φ and r sin φ as important factors for logit(p), which makes it a good candidate for the excess zeros in the ZIP regression model. In practice, this model gave a smaller log-likelihood value than any other model for p in combination with the Poisson covariates for Λ. The extra-Poisson zeros were therefore fitted with these two factors. The Poisson part is also fitted with spatial covariates to account for the spatial clustering of defects. The MLEs of the parameters in the ZIP regression model, obtained using the S code provided by Lambert (1992), are summarized in Table 3. Standard errors of the

Table 3. Zero-inflated Poisson regression analysis
Effect log Λ Intercept r sin φ cos φ r sin φ r cos φ logit p Intercept sin φ r sin φ

DF

Estimate

Std Err

Chi-square

Pr > Chi

1 1 1 1 1 1

1.5164 −0.0788 0.0760 0.2951 −0.0368 −0.0340

0.0445 0.0139 0.2130 0.0755 0.0382 0.0412

1161.35 32.39 0.1273 15.28 0.9280 0.6810