RANDOMNESS AND DEPENDENCE IN ETCH PROCESS AND THEIR EFFECTS ON PREDICTABILITY OF FAILURES AND ON CYCLE TIMES

SIDNEY RESNICK, GENNADY SAMORODNITSKY AND FANG XUE

Abstract. We find significant dependence in machine signatures for etch processes and point out how this may lead to a good prediction method for future failures. The randomness and dependence in etching times allow insights on cycle time simulations because of their effect on average cycle time.

1. Introduction

In the semiconductor manufacturing process, it is desirable to predict machine failures by monitoring in-situ process signals and machine signatures. Such predictions allow real-time tuning (stopping and resetting the machine if necessary) to prevent faults; they minimize wasted time and materials, provide a higher yield and improve the quality of the product. Moreover, if we could forecast a subsystem failure, we would be likely to improve the throughput by reducing the troubleshooting time. For instance, one might wish to predict vacuum system malfunctions by monitoring the pressure in a plasma reactor's chambers, so that technicians can change a defective gasket or O-ring in one particular chamber to prevent a vacuum system fault without searching through the whole system. Hence, effective equipment diagnostic capability is likely to reduce manufacturing cost dramatically. For example, plasma etching has become the workhorse process in modern semiconductor manufacturing, and Almgren (1997) reports that Texas Instruments loses an estimated $135 million annually in each of its factories due to the lack of adequate real-time control over plasma reactors. This emphasizes the potential for dramatic savings.

In order to predict failures based on machine signatures, we have to achieve two goals:

1. Derive from machine readings the chance of faults currently or in the near future.
2. Find relationships between machine signatures of the current time and those of the future.

Item 1 means finding in machine readings precursors to failures and/or system regimes where the quality of the product is likely to deteriorate. Item 2 means that signatures now reveal information about signatures in the future, so we can forecast the evolution of these machine signatures. Consequently, 1 and 2 together will enable us to predict machine failures well in advance of their actual occurrences, using the in-situ sensor data.
Our study has focused on part 2, in large part due to the nature of the available data. Lucent Technologies has made available to us records of 8 machine signatures for 2 plasma reactors. Working on goal 2, Rietman and Beachy (1998) looked at a similar data set and found high correlations in the data. High correlations may mean a significant degree of predictability even over long periods of time. However, the study by Rietman and Beachy (1998) involved little or no data preprocessing. This is important because the data is highly seasonal, as it involves physical readings for each wafer as it goes through the etch process, which is divided into 3 steps. Figure 1 shows the pressure readings for each etch step of 4 consecutive wafers. The seasonality comes from predictable variations due to initialization of the etching process and the roughly similar etch time for the same step for different wafers, as well as the difference between the three steps and the relatively similar pressure for the same step for different wafers. However, these predictable variations are normal and carry no information about the well-being of the etch process. In our study, we use a number of tools to remove this systematic non-stationarity in order to focus on what we perceive as important in the data. These cleanup procedures are described in detail in subsequent sections of this report. Interestingly, significant autocorrelation was found in the clean data, sometimes even at fairly high lags. This indicates the existence of dependence for these machine signatures over a fairly long time scale, which is crucial for possible forecasting far into the future. If the signatures were independent, no prediction power would be provided at all!

Figure 1. Pressure readings by step for 4 wafers (pressure versus time).

This research was partially supported by NSF Grant DMI-9713549 at Cornell University.

Besides increasing chip fabrication yield, improving the reliability of processing equipment and maintaining consistent levels of product quality, reducing product cycle time is another important facet in the battle for lower manufacturing cost. Cycle time is the total time the product spends in the system, including waiting time and processing time. Anyone interested in designing the process in such a way as to minimize the cycle time or, at least, to keep the cycle time under control, will most likely resort to simulating a part of the technological process involving plasma etching. Do the randomness and dependencies in processing times matter? Should they be accounted for in a simulation design? So far, this has not been the standard practice in cycle time studies for cluster tools. (A plasma reactor is a cluster tool since it has one robot arm serving up to four different chambers.)
However, in our analysis of the Lucent data, we found high correlations not only in machine signatures but also in the lengths of the first processing step. We present simulation results showing that randomness and dependence can potentially increase the average cycle time in a cluster tool.

2. Reactor, Process and Databases

The plasma reactors used in our study were two Drytek Quad Reactors of Lucent Technologies. Each machine has four etch chambers and all four are supposed to perform identically. Wafers come
in cassettes. Each cassette typically holds a lot of 25 wafers, but the lot size is sometimes less than 25. Once a cassette gets locked in the loading dock of the machine, the machine starts pumping down the air pressure in the loading dock. Then one robot arm puts the wafers, one at a time, into the 4 chambers. Each wafer goes through 3 steps of the etching process in any one of the 4 chambers. The end of step 1 is determined by an optical emission (30 to 35 seconds after the start), step 2 is timed for 40 to 45 seconds and step 3 is timed for 15 seconds. When all 3 steps are finished, the robot brings the wafer back to the dock and loads the next wafer into the chamber that has just become available.

During the processing of wafers, eight machine signatures are recorded at roughly, but not precisely, 5-second time intervals. They are: gas1, gas2, gas3, gas4, rf applied, rf reflected, pressure and dc bias. These signatures are stored in a buffer, along with a time stamp (in seconds, relative to the start of that step) marking when the signatures were recorded. All readings and the time stamps are rounded to integers. At the end of the etching process, the data in the buffer is written to an ASCII file, via the SECS interface, on a UNIX host computer.

Our study focused on the part of the data that was collected between September 9th, 1997 and February 17th, 1998. This part consists of records for about 1300 wafers for each of the two machines. The entire data set is quite a bit larger, and covers a period of about one year. However, there are several breaks in the data, consisting of periods of time from several days to several weeks, for which no data were available. We were unable to obtain any explanation for the missing data. As a part of our effort to clean up the data and make it stationary, we had to concentrate on the longest continuous piece of the data, which was exactly the period between September 9th, 1997 and February 17th, 1998.
The names of the ASCII files, each one of which corresponds to a wafer, normally start with "TUB" followed by a number of numerical digits with a well specified meaning. But about 5% of the files indicate abnormal lot names that start with "unknown", "TUBRE", "TUBTEST" and "TUBTUNE". The last two may indicate testing and tuning. These lots seem to be smaller: for machine 6662, 36 of the 43 lots with unusual names have fewer than 25 wafers recorded, while only 78 of the other 1286 lots have fewer than 25 records. In our initial analysis of the data we removed the data corresponding to the lots with unusual names as part of the cleaning process. However, doing so had only minimal effect on our findings, and so we kept this part of the data while preparing this report.

3. Summary of Findings

Since vacuum system malfunctions are critical in plasma etching, this report initially focuses on the pressure field, but studying pressure and other fields simultaneously revealed high cross correlations between some of the fields, and this is potentially useful in prediction. Because of the interest in the cycle time, we also investigate the duration of the first processing step for each wafer.

3.1. The Pressure Field. All results presented here are for step 1 of machine number 6662 only. Since, most of the time, four different wafers are processed in four different chambers simultaneously, we consider it necessary to split the data set into four separate parts according to chamber number. This is, once again, part of the cleaning process. Different chambers turn out to have different characteristics (which may or may not have been noticed before in a statistical context), and combining different chambers together leads precisely to the systematic periodicities we need to avoid. The typical reading of pressure for one wafer versus time looks like the one shown in Figure 2.
For valid statistical analysis under the steady state null hypothesis, the smallest unit of data corresponds, in the natural way, to one wafer. To obtain a single number representing the pressure regime for a given wafer, it is natural to look at the level where the pressure reading for a wafer
stabilizes. Due to the nature of the typical pressure curve shown in Figure 2, we used the median of all the pressure readings for a given step. The median is less sensitive than the mean to the initial bias when the gas is pumped into the chamber and, as long as we have enough observations at each step, the median should be close to the level where the pressure stabilizes.

Figure 2. Typical pressure reading for one wafer in step 1 (pressure versus time).

The result is four time series of medians of pressures per wafer, one for each chamber. For chamber 1, the time series plot is shown in Figure 3, and the sample autocorrelation plot is shown in Figure 4 together with the 95% confidence zone (bounded by $\pm 1.96\, n^{-1/2}$, with $n$ being the sample size). It can be seen that the sample correlations are large, even for large lags.

An alternative way to look at the data is by selecting the cassette, not the wafer, as the unit of data. One pays the price of effectively reducing the sample size by a factor of about 25. The gains include the possibility of looking at the variability of pressure among the wafers within the same cassette (that are processed in the same chamber). This information may be statistically significant as a precursor of future failures. A side benefit of studying in-cassette variability is an improved understanding of the variability in the quality of the product depending on wafer location within a cassette.

For each cassette and each chamber, we found all the wafers in that cassette that were processed in that chamber. For each one of these wafers, we computed the median pressure as before. Finally we computed the mean and standard deviation of these medians (usually 6 or 7 medians per cassette per chamber, since usually each cassette has 25 wafers to be distributed among 4 chambers). So each chamber yielded two time series, one for the mean and one for the standard deviation, with the order of the cassettes being the serial index. Time series and ACF plots are shown in Figure 5.
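The reductions described above (a median per wafer, a mean and standard deviation of medians per cassette, and a sample autocorrelation compared with the $\pm 1.96\, n^{-1/2}$ band) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code used in the study; the function names, the input layout (lists of readings) and the use of the sample standard deviation (`ddof=1`) are assumptions made here for the example.

```python
import numpy as np

def wafer_median(readings):
    """Median of all pressure readings recorded for one wafer in one step."""
    return float(np.median(readings))

def cassette_stats(medians_by_cassette):
    """Mean and standard deviation of per-wafer medians, one pair per cassette.

    medians_by_cassette: list of lists, each holding the medians of the wafers
    of one cassette processed in one chamber (hypothetical layout).
    ddof=1 (sample standard deviation) is an assumption of this sketch; a
    cassette with a single wafer in the chamber would then give NaN.
    """
    means = np.array([np.mean(m) for m in medians_by_cassette])
    stds = np.array([np.std(m, ddof=1) for m in medians_by_cassette])
    return means, stds

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:n - h], d[h:]) / denom for h in range(max_lag + 1)])

def white_noise_band(n):
    """Half-width of the 95% zone, 1.96 / sqrt(n), for a sample of size n."""
    return 1.96 / np.sqrt(n)
```

A lag $h \geq 1$ would be flagged as significant in this sketch when `abs(sample_acf(x, max_lag)[h])` exceeds `white_noise_band(len(x))`.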
Again, the autocorrelations clearly indicate that the present pressure reading reveals some information about the pressure in the future.

3.2. The Time Field. We now turn to analyzing the duration of step 1, which is important for understanding the cycle time. More details are given in a subsequent section. Our task here is different from the one we faced while studying the pressure field. Here the very presence of significant
randomness is an issue; if one is convinced that randomness is important, then dependence between durations of step 1 for different wafers and, in particular, correlations, becomes an issue.

Figure 3. Time series: pressure by wafer (pressure versus wafer time).

Figure 4. ACF: pressure by wafer (ACF versus lag).

An interesting modeling and statistical challenge here is that we do not know exactly how long step 1 lasted for each wafer, but we do know the time stamp of the last observation. We put these time stamps (relative to the start of the corresponding etching step, subsequently rounded to the nearest whole second) into one time series for each of the four chambers. The two left plots in Figure 7 show the time series and autocorrelations, where dependence can be seen. One
might wonder if that is caused by randomness in observation times rather than by randomness in actual processing times. Clearly, the former does not have any effect on the cycle time, while the latter may have such an effect.

Figure 5. Pressure by cassette. Panels: series of the means and series of the standard deviations (versus cassette time), and the ACF of each series (versus lag).

To remove this doubt, we compared the time series of the last time stamps for step 1 with the corresponding series for steps 2 and 3. From the fourth plot in Figure 6, we can see an obviously larger spread for the series for step 1 than for the other two. This can also be confirmed by the three histograms, as the first one shows a much bigger spread. Since the randomness in the observation times should have a similar effect on all three steps, this phenomenon indicates that the duration of step 1 has more randomness than those of steps 2 and 3. We also looked at the time series of the time stamp of the fifth observation, since almost all wafers have 5 observations recorded, some with 6 or 7. Randomness in observation times should have a similar effect on this time series as on the time series of the last readings for step 1. For comparison, the time series plot and autocorrelations for this series are shown to the right of the original series in Figure 7. This time series has a spread a bit lower than that of the last reading for step 1. However, it is even more important that the former time series does not exhibit any significant correlations,
and one is forced to conclude that the correlations present in the time series for the last reading are due to the random, and correlated, nature of the actual durations of step 1 for different wafers.

Figure 6. The last time stamp of the three steps. Panels: histograms for step 1, step 2 and step 3, and the time series for all 3 steps.

4. Towards Prediction

We have seen that significant correlations are present in the time series obtained from the pressure readings, viewed either by wafer or by cassette. The next step should be model building, verification and actual prediction. It is here that we face the inadequacy of the data.

The literature is dominated by black box models: linear models include the AR model applied by Rietman and Beachy (1998), and nonlinear models are built with neural networks (Rietman and Beachy, 1998; Zhang and May, 1998; May and Spanos, 1993; Baker et al., 1995; Kim and May, 1997). We strongly believe that a structural (i.e. a non-black box) model should be used in the present situation. One of the reasons is that the data has a strongly non-Gaussian character, and so correlations, though indicative of existing dependence, do not describe nearly the whole story of what is going on. The black box models are especially suitable when one looks at correlations or another reasonably small set of parameters.

Figure 7. ACF of the last time stamp of step 1 compared with that of the fifth time stamp. Panels: time series of the last time stamp and of the fifth time stamp (versus wafer time), and the ACF of each series (versus lag).

Here is what one needs in order to build an efficient model with good predictive power.

1. One needs the engineering expertise of a person intimately familiar with the etch process and, preferably, with the tool on which the data was collected. Such a person will supply answers to questions that include, but are not limited to, the following. There are many cassettes with unusual names, fewer than 25 wafers, and wafers with missing steps. Should these be included for model selection or thrown away as artifacts, defects, results from testing, or data recording errors? We have seen jumps in the time series plot. Are they caused by a change of recipe, or should they be considered randomness that needs to be represented by our model?
2. One needs a good maintenance record for the tool on which the data was collected. One needs to know when the failures occurred, and what kind of failures those were. With this information at hand, the modeler can build into the model, and then verify, statistical connections between machine signatures and actual failures.

The combination of 1 and 2 will allow one to build a reliable statistical model of the etch process and associated failures. The autocorrelations we discovered are very promising, and this is an indication
that a real-time statistical prediction procedure is possible, with certain predictive power several cassettes ahead.

In the absence of more informative data, one can fall back on black box modelling and proceed as follows, using the current data to construct a real-time fault prediction procedure. We fit the time series of the means of the median pressures with the following well selected AR(7) model:

$$X_t = 0.26523\, X_{t-1} + 0.11835\, X_{t-2} + 0.15134\, X_{t-3} + 0.10432\, X_{t-4} + 0.07433\, X_{t-5} + 0.05297\, X_{t-6} + 0.06778\, X_{t-7} + Z_t,$$

where the $Z_t$ are iid normal and $X_t$ is the mean of medians of the $t$-th cassette with the mean of all the means removed. The top two plots of Figure 8 show the partial ACF plot of the data as well as the prediction error of the AR(7) model. Without any knowledge of how the pressure reading affects machine operation, we go with a naive fault warning policy: use this AR model to make predictions 10 cassettes into the future. If any one of those 10 predictions falls outside the 95th percentile of the empirical marginal distribution of our fitted model, an alarm should be sent out. The bottom plot of Figure 8 shows the result when such a policy is applied to the first 100 readings of the series. One can see that two alarms, marked in the plot, are sent out; one starts at the 14th cassette, the other at the 83rd.

5. Cycle Time Study

Should randomness and dependence be taken into account when we conduct cycle time studies for cluster tools? To answer this question, we ask two other questions:

1. Are randomness and dependence present in the duration of an etch step?
2. Do they make a difference in average cycle time?

From Section 3.2, we have an affirmative answer to the first question. For the second one, we did some exploratory simulation. The model we simulated is an oversimplified model of a cluster tool. There is no claim whatsoever that this is a faithful model of any real system. However, we believe that this model retains enough common features with a real etch cluster tool.
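Returning briefly to the fallback policy of Section 4, the AR(7) forecast and the percentile alarm can be sketched as below. This is a hedged illustration rather than the procedure actually run for Figure 8: the coefficients are those quoted in the text, while the function names, the one-sided reading of "outside the 95th percentile" and the use of the centered observed series to estimate that percentile are assumptions of this sketch.

```python
import numpy as np

# AR(7) coefficients quoted in the text for the centered cassette means.
PHI = np.array([0.26523, 0.11835, 0.15134, 0.10432, 0.07433, 0.05297, 0.06778])

def ar_forecast(history, phi=PHI, horizon=10):
    """Recursive forecasts, 1..horizon steps ahead, of a zero-mean AR(p) series.

    history: centered past values, oldest first; at least p values are needed.
    Future innovations are replaced by their mean, zero.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    window = [float(v) for v in history[-p:]]  # oldest first
    forecasts = []
    for _ in range(horizon):
        # phi[0] multiplies the most recent value, so reverse the window.
        nxt = float(np.dot(phi, window[::-1]))
        forecasts.append(nxt)
        window = window[1:] + [nxt]
    return np.array(forecasts)

def empirical_threshold(centered_series, q=95.0):
    """q-th percentile of a centered sample (assumed stand-in for the
    empirical marginal distribution of the fitted model)."""
    return float(np.percentile(centered_series, q))

def alarm(history, threshold, phi=PHI, horizon=10):
    """True if any forecast over the horizon exceeds the threshold
    (one-sided rule, an assumption of this sketch)."""
    return bool(np.any(ar_forecast(history, phi, horizon) > threshold))
```

In use, one would call `alarm(x[:t], empirical_threshold(x))` at each cassette index `t`, where `x` is the centered series of cassette means; a two-sided rule would compare absolute forecasts with a symmetric threshold instead.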
If randomness and dependence have a noticeable effect on cycle time in the model, it is almost certain that their effect on cycle times in the real system will not be negligible.

The model is a 4 chamber cluster tool. Cassettes of wafers arrive at the machine at deterministic intervals. Suppose pump down time is deterministically 120 seconds for each cassette. There are 25 identical wafers in each cassette. Each needs two steps of etching. Step 1 happens in chamber 1 and step 2 can happen in any one of chambers 2, 3 and 4. For each wafer, step 1 must be done before step 2. It takes the robot 10 seconds to load chamber 1, 10 seconds to unload chamber 2, 3 or 4, and 20 seconds to transfer one wafer from chamber 1 to chamber 2, 3 or 4. The processing time for step 2 is deterministically 100 seconds. For step 1, the processing time is $N(40, \sigma)$. Simulation results are shown in Figure 9 for different values of $\sigma$. It can be seen that a bigger variance in per-wafer processing time results in a bigger average cycle time. This is a vivid demonstration of the effect of randomness on the cycle time.

In our next model, we add dependence to the processing times in chamber 1. Once again, we use a simple black box model, an AR(1), not because we believe that this is a good model for step durations, but rather because it allows us to see, in a simple setup, what dependence does to cycle times. It may well happen that even more significant effects will result from other types of models. Let the processing time of the $t$-th wafer in chamber 1 form an AR(1) process $\{Y_t\}$ with mean of
40 seconds. That is,

$$Y_t = 40 + X_t \quad \text{and} \quad X_t = Z_t + \rho X_{t-1},$$

where the $Z_t$ are iid $N(0, \sigma)$, and $X_t$ is the deviation from the mean of the processing time of the $t$-th wafer ($t$ can be bigger than 25; $t = 26$ represents the first wafer of the second cassette). In the simulation, for each $\rho$, $\sigma$ is chosen accordingly so that $X_t$ is $N(0, 10)$. As we can see from Figure 10, as $\rho$ changes from 0 to 0.9, the average cycle time increases drastically when the arrival rate is high. As a comparison, when $\rho$ is $-0.9$, the system is as good as the one with deterministic processing time. Therefore, we see the effect of dependence on cycle time.

Figure 8. The AR(7) model: the partial autocorrelation function (PACF versus lag), the prediction error (error versus prediction distance in cassettes) and the first 100 readings (pressure versus cassette time) with two failure alarms.
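The way the innovation scale is matched to a fixed marginal distribution in the AR(1) model for step 1 can be made concrete. For a stationary AR(1) with coefficient `rho` and innovation variance $s^2$, the marginal variance is $s^2 / (1 - \rho^2)$, so a target marginal variance $v$ requires $s^2 = v(1 - \rho^2)$. The sketch below generates such processing times; it assumes that the second argument of $N(\cdot, \cdot)$ in the text is a variance, and the function names, the random seed handling and the stationary start are choices made for this illustration, not details taken from the simulation behind Figure 10.

```python
import numpy as np

def innovation_variance(rho, marginal_var=10.0):
    """Innovation variance giving a stationary AR(1) the marginal variance
    marginal_var: s^2 = v * (1 - rho^2). Requires |rho| < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return marginal_var * (1.0 - rho ** 2)

def ar1_processing_times(n, rho, mean=40.0, marginal_var=10.0, seed=None):
    """Simulate Y_t = mean + X_t with X_t = rho * X_{t-1} + Z_t.

    X_0 is drawn from the stationary law N(0, marginal_var), so every X_t has
    that marginal law. Treating the second parameter of N(.,.) as a variance
    is an assumption of this sketch. Normal draws can be negative; no
    truncation is applied here.
    """
    rng = np.random.default_rng(seed)
    sd_z = np.sqrt(innovation_variance(rho, marginal_var))
    x = np.empty(n)
    x[0] = rng.normal(0.0, np.sqrt(marginal_var))
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal(0.0, sd_z)
    return mean + x
```

With `rho = 0.9` the innovation variance is $10 \times 0.19 = 1.9$, and with `rho = 0` it is 10, which reduces to the iid case. The series returned for `n` larger than 25 runs across cassette boundaries, matching the indexing convention in the text.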

Figure 9. Cycle time simulation: iid case. Average cycle time (seconds) versus cassette arrival rate (in units of (1630 seconds)^{-1}); curves labeled 0, 4, 8 and 12.

Figure 10. Cycle time simulation: AR(1) case. Average cycle time (seconds) versus cassette arrival rate (in units of (1980 seconds)^{-1}); curves labeled -0.9, 0, 0.45, 0.6, 0.75 and 0.9.

6. Conclusions And Suggestions For Future Work

We have demonstrated that machine signatures in the etch process involve deviations from the average that can be viewed as random, and that possess significant dependence that remains for many lags, measured in wafers or even cassettes. This carries the promise of good predictability of future failures and, hence, significant capital savings once there is a reliable statistical model of the process and failures. We have suggested what kind of data and expertise is needed to build
such a statistical model. It is our hope that such data and expertise will come together in the not too distant future. In the absence of better data, we have suggested, as a fallback position, a black box approach for predicting failures. We have, finally, demonstrated that etch step durations are nontrivially random and dependent, and demonstrated the potential effect of such randomness and dependence on cycle time. It is our belief that our findings should be incorporated in future simulations designed to study ways to reduce the cycle time.

References

C. Almgren (1997): The role of measurements in plasma etching. Semiconductor International.
M. D. Baker, C. D. Himmel and G. S. May (1995): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. IEEE Trans. Semiconduct. Manufact. 8:62-71.
B. Kim and G. S. May (1997): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. IEEE Trans. Comp., Packag., Manufact. Technol. C 20:39-47.
G. May and C. Spanos (1993): Automated malfunction diagnosis of semiconductor fabrication equipment: a plasma etch application. IEEE Trans. Semiconduct. Manufact. 6:28-40.
E. A. Rietman and M. Beachy (1998): A Study on Failure Prediction in a Plasma Reactor. IEEE Transactions on Semiconductor Manufacturing 11:670-680.
B. Zhang and G. S. May (1998): Towards real-time fault identification in plasma etching using neural networks. In Proc. 9th Adv. Semicon. Manufac. Conf. Boston, MA.

Sidney Resnick and Gennady Samorodnitsky, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853
E-mail address :
[email protected]
[email protected]
Fang Xue, Center for Applied Mathematics, Cornell University, Ithaca, NY 14853
E-mail address :
[email protected]