RANDOMNESS AND DEPENDENCE IN ETCH PROCESS AND THEIR EFFECTS ON PREDICTABILITY OF FAILURES AND ON CYCLE TIMES

SIDNEY RESNICK, GENNADY SAMORODNITSKY AND FANG XUE

Abstract. We find significant dependence in machine signatures for etch processes and point out how this may lead to a good prediction method for future failures. The randomness and dependence in etching times allow insights on cycle time simulations because of their effect on average cycle time.

1. Introduction

In the semiconductor manufacturing process, it is desirable to predict machine failures by monitoring in-situ process signals and machine signatures. Such predictions allow real-time tuning (stop and reset the machine if necessary) to prevent faults, which minimizes wasted time and materials, provides a higher yield and improves the quality of the product. Moreover, if we could forecast a subsystem failure, we are likely to improve the throughput by reducing the troubleshooting time. For instance, one might wish to predict vacuum system malfunctions by monitoring the pressure in a plasma reactor's chambers, so that technicians can change a defective gasket or O-ring in one particular chamber to prevent a vacuum system fault without searching through the whole system. Hence, effective equipment diagnostic capability is likely to reduce manufacturing cost dramatically. For example, since plasma etching has become the workhorse process in modern semiconductor manufacturing, Almgren (1997) reports that Texas Instruments loses an estimated $135 million annually in each of its factories due to the lack of adequate real-time control over plasma reactors, and this emphasizes the potential for dramatic savings.

In order to predict failures based on machine signatures, we have to achieve two goals:

1. Derive from machine readings the chance of faults currently or in the near future.
2. Find relationships between machine signatures of the current time and those of the future.

Item 1 means finding in machine readings precursors to failures and/or system regimes where the quality of the product is likely to deteriorate. Item 2 means that signatures now reveal information about signatures in the future so we can forecast the evolution of these machine signatures. Consequently, 1 and 2 together will enable us to predict machine failures well in advance of their actual occurrences, using the in-situ sensor data.
Our study has focused on goal 2, in large part due to the nature of the available data. Lucent Technologies has made available for us records of 8 machine signatures for 2 plasma reactors. Rietman and Beachy (1998), also working on goal 2, have looked at a similar data set and found high correlations in the data. High correlations may mean a significant degree of predictability even over long periods of time. However, the study by Rietman and Beachy (1998) involved little or no data preprocessing. This is important because the data is highly seasonal, as it involves physical readings for each wafer as it goes through the etch process, which is divided into 3 steps. Figure 1 shows the pressure readings for each etch step of 4 consecutive wafers.

This research was partially supported by NSF Grant DMI-9713549 at Cornell University.

[Figure 1. Pressure readings by step for 4 wafers. Axes: pressure versus time.]

The
seasonality comes from predictable variations due to initialization of the etching process and the roughly similar etch time for the same step for different wafers, as well as the difference between the three steps and relatively similar pressure for the same step for different wafers. However, these predictable variations are normal and carry no information about the well-being of the etch process. In our study, we use a number of tools to remove this systematic non-stationarity in order to focus on what we perceive as important in the data. These cleanup procedures are described in detail in subsequent sections of this report.

Interestingly, significant autocorrelation was found in the clean data, sometimes even at fairly high lags. This indicates the existence of dependence for these machine signatures over a fairly long time scale, which is crucial for possible forecasting far into the future. If the signatures were independent, no prediction power would be provided at all!

Besides increasing chip fabrication yield, improving the reliability of processing equipment and maintaining consistent levels of product quality, reducing product cycle time is another important facet in the battle for lower manufacturing cost. Cycle time is the total time the product spends in the system, including waiting time and processing time. Anyone interested in designing the process in such a way as to minimize the cycle time or, at least, to keep the cycle time under control, will most likely resort to simulating a part of the technological process involving plasma etching. Do the randomness and dependencies in processing times matter? Should they be accounted for in a simulation design? So far, this has not been the standard practice in cycle time studies for cluster tools. (A plasma reactor is a cluster tool since it has one robot arm serving up to four different chambers.) However, in our analysis of the Lucent data, we found high correlations not only in machine signatures, but also in the lengths of the first processing step. We present simulation results showing that randomness and dependence can potentially increase the average cycle time in a cluster tool.

2. Reactor, Process and Databases

The plasma reactors used in our study were two Drytek Quad Reactors of Lucent Technologies. Each machine has four etch chambers and all four are supposed to perform identically. Wafers come
in cassettes. Each cassette typically holds a lot of 25 wafers, but the lot size is sometimes less than 25. Once a cassette gets locked in the loading dock of the machine, the machine starts pumping down the air pressure in the loading dock. Then one robot arm puts the wafers, one at a time, into the 4 chambers. Each wafer goes through 3 steps of the etching process in any one of the 4 chambers. The end of step 1 is determined by an optical emission (30 to 35 seconds after the start), step 2 is timed for 40 to 45 seconds and step 3 is timed for 15 seconds. When all 3 steps are finished, the robot brings the wafer back to the dock and loads the next wafer into the chamber that has just become available.

During the processing of wafers, eight machine signatures are recorded in roughly, but not precisely, 5-second time intervals. They are: gas1, gas2, gas3, gas4, rf applied, rf reflected, pressure and dc bias. These signatures are stored in a buffer, along with a time stamp (in seconds, relative to the start of that step) to mark when these signatures are recorded. All readings and the time stamps are rounded to integers. At the end of the etching process, the data in the buffer is written to an ASCII file, via the SECS interface, to a UNIX host computer.

Our study focused on the part of the data that was collected between September 9th, 1997 and February 17th, 1998. This part consists of records for about 1300 wafers for each of the two machines. The entire data set is quite a bit larger, and covers a period of about one year. However, there are several breaks in the data, consisting of periods of time from several days to several weeks, for which no data were available. We were unable to obtain any explanation for the missing data. As a part of our effort to clean up the data and make it stationary, we had to concentrate on the longest continuous piece of the data, which was exactly the period between September 9th, 1997 and February 17th, 1998.
The names of the ASCII files, each one of which corresponds to a wafer, normally start with "TUB" followed by a number of numerical digits with a well specified meaning. But about 5% of the files indicate abnormal lot names that start with "unknown", "TUBRE", "TUBTEST" and "TUBTUNE". The last two may indicate testing and tuning. These lots seem to be smaller: for machine 6662, 36 of the 43 lots with unusual names have fewer than 25 wafers recorded, while only 78 of the other 1286 lots have fewer than 25 records. In our initial analysis of the data we removed the data corresponding to the lots with unusual names as part of the cleaning process. However, doing so had only minimal effect on our findings, and so we kept this part of the data while preparing this report.

3. Summary of Findings

Since vacuum system malfunctions are critical in plasma etching, this report initially focuses on the pressure field, but studying simultaneously pressure and other fields revealed high cross correlations between some of the fields, and this is potentially useful in prediction. Because of the interest in the cycle time, we also investigate the duration of the first processing step for each wafer.

3.1. The Pressure Field. All results presented here are for step 1 of machine number 6662 only. Since most of the time, four different wafers are processed in four different chambers simultaneously, we consider it necessary to split the data set into four separate parts according to chamber number. This is, once again, part of the cleaning process. Different chambers turn out to have different characteristics (which may or may not have been noticed before in statistical context) and combining different chambers together leads precisely to the systematic periodicities we need to avoid. The typical reading of pressure for one wafer versus time looks like the one shown in Figure 2.
[Figure 2. Typical pressure reading for one wafer in step 1. Axes: pressure versus time.]

For valid statistical analysis under the steady state null hypothesis, the smallest unit of data corresponds, in the natural way, to one wafer. To obtain a single number representing the pressure regime for a given wafer, it is natural to look at the level where the pressure reading for a wafer
stabilizes. Due to the nature of the typical pressure curve shown in Figure 2, we used the median of all the readings for pressure for a given step. The median is less sensitive than the mean to the initial bias when the gas is pumped into the chamber and, as long as we have enough observations at each step, the median should be close to the level where the pressure stabilizes. The result is four time series of medians of pressures per wafer, one for each chamber. For chamber 1, the time series plot is shown in Figure 3, and the sample autocorrelation plot is shown in Figure 4 together with the 95% confidence zone (bounded by $\pm 1.96\, n^{-1/2}$, with $n$ being the sample size). It can be seen that sample correlations are big, even for large lags.

An alternative way to look at the data is by selecting the cassette, not the wafer, as the unit of data. One pays the price of effectively reducing the sample size by a factor of about 25. The gains include the possibility of looking at the variability of pressure among the wafers within the same cassette (that are processed in the same chamber). This information may be statistically significant as a precursor of future failures. A side benefit of studying in-cassette variability is a better understanding of how the quality of the product varies with wafer location within a cassette.

For each cassette and each chamber, we found all the wafers in that cassette that were processed in that chamber. For each one of these wafers, we computed the median pressure as before. Finally we computed the mean and standard deviation of these medians (usually 6 or 7 medians per cassette per chamber, since usually each cassette has 25 wafers to be distributed among 4 chambers). So each chamber yielded two time series, one for the mean, one for the standard deviation, with the order of cassette being the serial index. Time series and ACF plots are shown in Figure 5.
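The two summaries described above (per-wafer medians, and per-cassette means and standard deviations of those medians) together with the sample autocorrelation check can be sketched as follows. This is a minimal illustration and not the code used for the report: the function names and the input layout (a list of pressure readings per wafer, and a list of wafer medians per cassette for one chamber) are assumptions made for the example, and the autocorrelation estimator shown is the usual one that divides by the sample size.

```python
import math
from statistics import mean, median, stdev


def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag (autocovariances divided by n)."""
    n = len(series)
    mu = mean(series)
    dev = [x - mu for x in series]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for lag in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / n
        acf.append(ck / c0)
    return acf


def white_noise_band(n):
    """Half-width 1.96 / sqrt(n) of the 95% zone for an independent series of length n."""
    return 1.96 / math.sqrt(n)


def wafer_medians(wafers):
    """One number per wafer: the median of its pressure readings for the step."""
    return [median(readings) for readings in wafers]


def cassette_summary(medians_by_cassette):
    """Per cassette (one chamber): (mean, standard deviation) of the wafer medians.

    The standard deviation is reported as 0.0 when a cassette has a single wafer.
    """
    out = []
    for meds in medians_by_cassette:
        sd = stdev(meds) if len(meds) > 1 else 0.0
        out.append((mean(meds), sd))
    return out
```

A lag whose sample autocorrelation lies outside plus or minus `white_noise_band(n)` is one that would be unusual for an independent series of the same length, which is the comparison made in Figure 4.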
Again, the autocorrelations clearly indicate that the present pressure reading reveals some information about the pressure in the future.

[Figure 3. Time series: pressure by wafer. Axes: pressure versus wafer time.]

[Figure 4. ACF: pressure by wafer. Axes: ACF versus lag.]

3.2. The Time Field. We now turn to analyzing the duration of step 1, which is important for understanding the cycle time. More details are given in a subsequent section. Our task here is different from the one we face while studying the pressure field. Here the very presence of significant
randomness is an issue; if one is convinced that randomness is important, then dependence between durations of step 1 for different wafers and, in particular, correlations, becomes an issue.

[Figure 5. Pressure by Cassette. Panels: series: mean; series: std; acf: mean; acf: std. Axes: mean and std versus cassette time; ACF versus lag.]

An interesting modeling and statistical challenge here is that we do not know exactly how long step 1 lasted for each wafer, but we do know the time stamp of the last observation. We put these time stamps (relative to the start of the corresponding etching step, subsequently rounded to the nearest whole second) into one time series for each of the four chambers. The two left plots in Figure 7 show the time series and autocorrelations, where dependence can be seen. One
might wonder if that is caused by randomness in observation times rather than by randomness in actual processing times. Clearly, the former does not have any effect on the cycle time, while the latter may have such an effect. To remove this doubt, we compared the time series of the last time stamps for step 1 with the corresponding series for steps 2 and 3. From the fourth plot in Figure 6, we can see an obviously larger spread for the series for step 1 than for the other two. This can also be confirmed by the three histograms, as the first one shows a much bigger spread. Since the randomness in the observation times should have a similar effect on all three steps, this phenomenon indicates that the duration of step 1 has more randomness than steps 2 and 3.

[Figure 6. The last time stamp of the three steps. Panels: Histogram: Step 1; Histogram: Step 2; Histogram: Step 3; Series: all 3 steps.]

We also looked at the time series of the time stamp of the fifth observation, since almost all wafers have 5 observations recorded, some with 6 or 7. Randomness in observation times should have a similar effect on this time series as on the time series of the last readings for step 1. For comparison, the time series plot and autocorrelations for this series are shown to the right of the original series in Figure 7. This time series has a spread a bit lower than that of the last reading for step 1. However, it is even more important that the former time series does not exhibit any significant correlations,
and one is forced to conclude that the correlations present in the time series for the last reading are due to the random, and correlated, nature of actual durations of step 1 for different wafers.

[Figure 7. ACF of the last time stamp of step 1 compared with that of the fifth time stamp. Panels: last time stamp and fifth time stamp versus wafer time; ACF versus lag for each series.]

4. Towards Prediction

We have seen that significant correlations are present in the time series obtained from the pressure readings, viewed either by wafer or by cassette. The next step should be model building, verification and actual prediction. It is here that we face the inadequacy of the data. The literature is dominated by black box models: linear models include the AR model applied by Rietman and Beachy (1998), and nonlinear models include those built by neural networks (Rietman and Beachy, 1998; Zhang and May, 1998; May and Spanos, 1993; Baker et al., 1995; Kim and May, 1997). We strongly believe that a structural (i.e. a non-black box) model should be used in the present situation. One of the reasons is that the data has a strongly non-Gaussian character, and so correlations, though indicative of existing dependence, do not describe nearly the whole story of what is going on. The black box models are especially suitable when one looks at correlations or another reasonably small set of parameters.
Here is what one needs in order to build an efficient model with good predictive power.

1. One needs the engineering expertise of a person intimately familiar with the etch process and, preferably, with the tool on which the data was collected. Such a person will supply answers to questions that include, but are not limited to, the following:
   • There are many cassettes with unusual names, fewer than 25 wafers, and wafers with missing steps. Should these be included for model selection or thrown away as artifacts, defects, results from testing, or data recording errors?
   • We have seen jumps in the time series plot. Are they caused by a change of recipe or should they be considered the randomness that needs to be represented by our model?
2. One needs a good maintenance record for the tool on which the data was collected. One needs to know when the failures occurred, and what kind of failures those were. With this information at hand, the modeler can build into the model, and then verify, statistical connections between machine signatures and actual failures.

The combination of 1 and 2 will allow one to build a reliable statistical model of the etch process and associated failures. The autocorrelations we discovered are very promising, and this is an indication
that a real-time statistical prediction procedure is possible, with certain predictive power several cassettes ahead.

In the absence of more informative data, one can fall back on black box modelling and proceed as follows, using the current data to construct a real-time fault prediction procedure. We fit the time series of the means of the median pressures with the following well selected AR(7) model:

$$X_t = 0.26523\,X_{t-1} + 0.11835\,X_{t-2} + 0.15134\,X_{t-3} + 0.10432\,X_{t-4} + 0.07433\,X_{t-5} + 0.05297\,X_{t-6} + 0.06778\,X_{t-7} + Z_t,$$

where the $Z_t$ are iid normal and $X_t$ is the mean of medians of the $t$th cassette with the mean of all the means removed. The top two plots of Figure 8 show the partial ACF plot of the data as well as the prediction error of the AR(7) model. Without any knowledge of how the pressure reading affects machine operation, we go with the naive fault warning policy: use this AR model to make predictions 10 cassettes into the future. If any one of those 10 predictions goes out of the 95 percentile of the empirical marginal distribution of our fitted model, an alarm should be sent out. The bottom plot of Figure 8 shows the result when such a policy is applied to the first 100 readings of the series. One can see that two alarms (marked in the plot) are sent out; one starts at the 14th cassette, the other at the 83rd.

5. Cycle Time Study

Should randomness and dependence be taken into account when we conduct cycle time studies for cluster tools? To answer this question, we ask two other questions:

1. Are randomness and dependence present in the duration of an etch step?
2. Do they make a difference in average cycle time?

From Section 3.2, we have an affirmative answer to the first question. For the second one, we did some exploratory simulation. The model we simulated is an oversimplified model of a cluster tool. There is no claim whatsoever that this is a faithful model of any real system. However, we believe that this model retains enough common features with a real etch cluster tool.
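As an aside on the fallback procedure of Section 4, the AR(7) forecast and the naive alarm policy can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the report gives neither the innovation variance nor the numerical alarm thresholds, so the thresholds are passed in as parameters, and reading "out of the 95 percentile" as leaving a central 95% band (the 2.5% and 97.5% empirical quantiles) is an assumption of this sketch. The helper names `forecast`, `central_band` and `alarm` are hypothetical.

```python
from statistics import quantiles

# AR(7) coefficients as reported in the text, for lags 1..7.
AR7 = [0.26523, 0.11835, 0.15134, 0.10432, 0.07433, 0.05297, 0.06778]


def forecast(history, coeffs, horizon):
    """Iterated predictions: future noise terms are replaced by their mean, zero."""
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least as many past values as AR coefficients")
    window = list(history[-p:])  # oldest first, most recent value last
    preds = []
    for _ in range(horizon):
        # coeffs[0] multiplies the most recent value, coeffs[1] the one before, ...
        nxt = sum(c * x for c, x in zip(coeffs, reversed(window)))
        preds.append(nxt)
        window = window[1:] + [nxt]
    return preds


def central_band(sample):
    """Empirical 2.5% and 97.5% quantiles of a sample (assumed alarm thresholds)."""
    cuts = quantiles(sample, n=40)  # 39 cut points at multiples of 2.5%
    return cuts[0], cuts[-1]


def alarm(history, coeffs, lower, upper, horizon=10):
    """True if any of the next `horizon` predictions leaves the band [lower, upper]."""
    return any(x < lower or x > upper for x in forecast(history, coeffs, horizon))
```

Here `history` holds the mean-removed cassette series up to the present, and `lower`, `upper` could be obtained by applying `central_band` to a long path simulated from the fitted model or to the observed series; which of the two matches the report's "empirical marginal distribution of our fitted model" is not settled by the text.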
[Figure 8. The AR(7) model: the partial autocorrelation function, the prediction error and the first 100 readings with two failure alarms. Panels: PACF versus lag; error versus prediction distance (cassettes); pressure versus cassette time.]

If one gauges the effect of randomness and dependence on cycle time in the model, it is almost certain that their effect on cycle times in the real system will not be negligible.

The model is a 4 chamber cluster tool. Cassettes of wafers arrive at the machine after deterministic intervals. Suppose pump down time is deterministically 120 seconds for each cassette. There are 25 identical wafers in each cassette. Each needs two steps of etching. Step 1 happens in chamber 1 and step 2 can happen in any one of chambers 2, 3 and 4. For each wafer, step 1 must be done before step 2. It takes the robot 10 seconds to load chamber 1, 10 seconds to unload chamber 2, 3 or 4, and 20 seconds to transfer one wafer from chamber 1 to chamber 2, 3 or 4. Processing time for step 2 is deterministically 100 seconds. For step 1, the processing time is $N(40, \sigma)$. Simulation results are shown in Figure 9 for different $\sigma$'s. It can be seen that bigger variance in per-wafer processing time results in bigger average cycle time. This is a vivid demonstration of the effect of randomness on the cycle time.

In our next model, we add dependence to the processing times in chamber 1. Once again, we use a simple black box model, an AR(1), not because we believe that this is a good model for step durations, but rather because it allows us in a simple setup to see what dependence does to cycle times. It may well happen that an even more significant effect will result from other types of models. Let the processing time of the $t$'th wafer in chamber 1 form an AR(1) process $\{Y_t\}$ with mean of
40 seconds. That is,

$$Y_t = 40 + X_t \quad\text{and}\quad X_t = Z_t + \phi X_{t-1},$$

where the $Z_t$ are iid $N(0, \sigma)$, and $X_t$ is the deviation of the processing time of the $t$'th wafer from its mean ($t$ can be bigger than 25; $t = 26$ represents the first wafer of the second cassette). In the simulation, for each $\phi$, $\sigma$ is chosen accordingly so that $X_t$ is $N(0, 10)$. As we can see from Figure 10, as $\phi$ changes from 0 to 0.9, the average cycle time increases drastically when the arrival rate is high. As a comparison, when $\phi$ is $-0.9$, the system is as good as the one with deterministic processing time. Therefore, we see the effect of dependence on cycle time.
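To make the role of the AR(1) coefficient concrete, here is a small sketch that generates correlated processing times with a fixed marginal distribution and feeds them to a single-server queue. Two things are assumptions of the sketch and not statements of the report: the second parameter in N(0, 10) is read as a variance, so the innovation variance is 10 times (1 minus the squared AR coefficient), which follows from Var(X) = Var(Z) / (1 - coefficient squared) for a stationary AR(1); and the queue is a one-chamber stand-in driven by the Lindley recursion, not the four-chamber cluster tool with robot moves that was actually simulated. The function names and the interarrival value used in the usage note are hypothetical.

```python
import math
import random


def ar1_times(n, phi, mean=40.0, marginal_var=10.0, seed=0):
    """Processing times Y_t = mean + X_t, where X_t = Z_t + phi * X_{t-1}.

    The innovation variance is marginal_var * (1 - phi**2), so the stationary
    variance of X_t equals marginal_var for every |phi| < 1.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(marginal_var * (1.0 - phi * phi))
    x = rng.gauss(0.0, math.sqrt(marginal_var))  # start in the stationary law
    times = []
    for _ in range(n):
        times.append(mean + x)
        x = phi * x + rng.gauss(0.0, innov_sd)
    return times


def mean_wait(service_times, interarrival):
    """Average waiting time from the Lindley recursion W_{k+1} = max(0, W_k + S_k - A).

    Jobs arrive every `interarrival` seconds to one server; W_k is the time
    job k waits before its service starts.
    """
    w, total = 0.0, 0.0
    for s in service_times:
        total += w
        w = max(0.0, w + s - interarrival)
    return total / len(service_times)
```

For example, comparing `mean_wait(ar1_times(50000, phi), 42.0)` for coefficients -0.9, 0 and 0.9 keeps the mean and marginal variance of the processing times fixed and changes only the dependence, which is the comparison Figure 10 makes for the cluster tool model.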

[Figure 9. Cycle time simulation: iid case. Axes: average cycle time (seconds) versus cassette arrival rate ((1630 seconds)^-1). Curves labeled 0, 4, 8 and 12.]

[Figure 10. Cycle time simulation: AR1 case. Axes: average cycle time (seconds) versus cassette arrival rate ((1980 seconds)^-1). Curves labeled -0.9, 0, 0.45, 0.6, 0.75 and 0.9.]

6. Conclusions And Suggestions For Future Work

We have demonstrated that machine signatures in the etch process involve deviations from the average that can be viewed as random, and that possess significant dependence that remains for many lags, measured in wafers or even cassettes. This carries promise of good predictability of future failures and, hence, significant capital savings once there is a reliable statistical model of the process and failures. We have suggested what kind of data and expertise is needed to build
such a statistical model. It is our hope that such data and expertise will come together in the not too distant future. In the absence of better data, we have suggested, as a fallback position, a black box approach for predicting failures. We have, finally, demonstrated that etch step durations are nontrivially random and dependent, and demonstrated the potential effect of such randomness and dependence on cycle time. It is our belief that our findings should be incorporated in future simulations designed to study ways to reduce the cycle time.

References

C. Almgren (1997): The role of measurements in plasma etching. Semiconductor International.
M. D. Baker, C. D. Himmel and G. S. May (1995): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. IEEE Trans. Semiconduct. Manufact. 8:62-71.
B. Kim and G. S. May (1997): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. IEEE Trans. Comp., Packag., Manufact. Technol. C 20:39-47.
G. May and C. Spanos (1993): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. Trans. Semiconduct. Manufact. 6:28-40.
E. A. Rietman and M. Beachy (1998): A Study on Failure Prediction in a Plasma Reactor. IEEE Transactions on Semiconductor Manufacturing 11:670-680.
B. Zhang and G. S. May (1998): Towards real-time fault identification in plasma etching using neural networks. In Proc. 9th Adv. Semicon. Manufac. Conf., Boston, MA.

Sidney Resnick and Gennady Samorodnitsky, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853


Fang Xue, Center for Applied Mathematics, Cornell University, Ithaca, NY 14853
