Plasma etch modeling using optical emission spectroscopy

Roawen Chen,a) Herb Huang,b) and C. J. Spanosc)
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, California 94720-1770

Michael Gatto Advanced Micro Devices, Austin, Texas 78741

(Received 4 October 1995; accepted 22 January 1996)

Plasma etching is often considered a yield limiter in the manufacturing of submicron integrated circuit devices. Much effort has been devoted to developing reliable models that relate the process outputs to variations in real-time sensor signals. These models, called chamber state models, allow semiconductor manufacturers to predict etch behavior. In this article, we propose to use optical emission spectroscopy (OES) as a real-time sensor to quantify and predict the etching performance in an integrated circuit manufacturing line. This method is especially useful in plasma processing because it provides in situ, real-time analysis without disturbing the plasma or interfering with the process. This study is based on an OES system installed on an Applied Materials 5300 Centura dielectric etcher, with a single optical fiber connecting the reactor viewport to a spectrograph. A designed experiment was performed on oxide test wafers. Several etch characteristics, including etch rate, within-wafer uniformity, and aspect-ratio dependent etching (ARDE), were modeled in this study. Multivariate modeling techniques such as principal component analysis and partial least squares were employed to relate the OES signatures to etching performance. The results show that 87% of etch rate variation and more than 95% of the variation in within-wafer uniformity can be explained by these models, although the OES signals can only explain 65% of the variation in ARDE. © 1996 American Vacuum Society.

I. INTRODUCTION

Plasma etching has been widely used in the manufacturing of submicron integrated circuit (IC) devices. In the past, empirical optimization of plasma parameters produced acceptable etching and deposition “recipes.”1 However, with the advent of smaller feature sizes, the requirements for better machine monitoring and effective diagnostic techniques are increasing. Numerical simulations based on fundamental physical principles have been developed, but they are too computationally expensive for real-time manufacturing applications.2 Because of this, recent efforts have focused on the statistical methods needed to provide in-line plasma process diagnosis and control. A great deal of effort has been spent developing reliable models that relate the response of process outputs (e.g., etch rate or etching uniformity) to variations in real-time sensor readings.3 These models, called chamber state models, are needed to predict etch behavior precisely over a wide range of operating conditions. Recently, optical emission spectroscopy (OES) has emerged as a powerful diagnostic technique for the real-time monitoring of plasma processes.4,5 This method provides in situ, real-time analysis without disturbing the plasma or interfering with the process. In this article, we propose to estimate and predict postetch wafer characteristics using OES sensor readings (see Fig. 1). Spectral data are difficult to interpret using conventional linear regression because of the large number of variables involved. Consequently, principal component analysis (PCA) and partial least squares (PLS) have been employed to compress the spectral data and relate them to the etching performance.

a)Electronic mail: [email protected]
b)Electronic mail: [email protected]
c)Electronic mail: [email protected]

J. Vac. Sci. Technol. A 14(3), May/Jun 1996

II. BACKGROUND

A. Data acquisition

An SC Technologies optical emission spectrograph and controller were installed on the Applied Materials 5300 Centura dielectric etching chamber. An optical fiber was mounted on the reactor viewport and directed toward the entrance slit of an imaging spectrograph. The grating dispersed the incident light and projected the spectrum onto a photodiode sensor array. Because the photodiode array has 501 pixels, the intensities of 501 different wavelengths could be monitored simultaneously during the etching process. The system was controlled from an IBM 486 PC, and the subsequent data conversion and analysis were done on a Hewlett-Packard (HP) workstation. The S-plus commercial software package was used for the statistical analysis. Since the intensity ratio between C and F is strongly related to the etching performance,6 the spectral data were acquired in the 200–500 nm range to ensure that most of the CxFy spectral features were monitored. In this work, we used the 600 groove/mm grating with a 0.6 nm/pixel resolution and an acquisition time of less than 1 s per frame.
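With 501 pixels at 0.6 nm/pixel, the array spans 500 × 0.6 = 300 nm, which matches the 200–500 nm acquisition window. The sketch below maps pixel indices to nominal wavelengths so that spectral lines (e.g., those in Table I) can be located in a 501-point spectrum. It assumes a purely linear dispersion with pixel 0 at exactly 200 nm; a real instrument would use its own wavelength calibration, and the helper names are hypothetical.

```python
import numpy as np

# Assumed nominal calibration (illustrative, not measured): linear dispersion
# of 0.6 nm/pixel with pixel 0 at 200 nm.
N_PIXELS = 501
START_NM = 200.0
STEP_NM = 0.6

def pixel_wavelengths(n_pixels=N_PIXELS, start_nm=START_NM, step_nm=STEP_NM):
    """Nominal wavelength (nm) of each photodiode pixel."""
    return start_nm + step_nm * np.arange(n_pixels)

def nearest_pixel(wavelength_nm, wavelengths):
    """Index of the pixel whose nominal wavelength is closest to wavelength_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths) - wavelength_nm)))

wl = pixel_wavelengths()
print(wl[0], wl[-1])             # first and last nominal wavelengths
print(nearest_pixel(251.6, wl))  # pixel closest to the 251.6 nm CF2 line
```

A measured calibration (e.g., against known emission lines) would replace the linear formula without changing the lookup logic.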

0734-2101/96/14(3)/1901/6/$10.00


FIG. 1. Chamber state modeling scheme using OES sensor readings.

B. Experimental design

A designed experiment was conducted in order to develop the chamber state models. The experiment was a five-factor, 26-run central composite design plus four center points;7 this is a design of resolution V without blocking. The input variables in this experimental design were source power (W), bias power (W), C2F6 flow rate (sccm), roof temperature (°C), and throttle opening ratio (%). The input settings deviated by about ±15% from their center values. This range of input settings provides a sufficient range of real-time data and output wafer responses. The monitor wafers used in this experiment were covered with 2.7 μm TEOS and 1.35 μm photoresist (see Fig. 2) patterned and developed for narrow line openings. In-line sensor data were captured during the processing of each wafer, and the responses were measured for each of the trials. The output wafer measurements (i.e., the responses of this experiment) were oxide etch rate, oxide etching uniformity,8 aspect-ratio dependent etching9 (ARDE) near the center of the wafer, and ARDE near the edge of the wafer. Film thickness was measured both pre- and postetch on the Prometrix UV 1050 at 17 points on each wafer; etch rate and uniformity were calculated from these thickness measurements. ARDE values were determined from scanning electron microscopy (SEM) photographs.

C. Data reduction techniques

Since a typical OES data set contains 501 correlated variables in each spectrum, it is not practical to use all of them, and a key challenge is deciding which wavelengths should be chosen to represent the entire set of OES variables. Several data filters are introduced in this work. The simplest method is to filter the wavelengths based on spectral identification prior to the data analysis. In addition, PCA and PLS are two multivariate data reduction techniques for compressing a large number of variables down to a small number (<10) of orthogonal variables. These reduced variables can then be used as the input matrix for a regression. All three methods were applied in this work and are discussed in some detail below.

FIG. 2. Cross section of the test structure for a monitor wafer.

FIG. 3. A typical spectrum collected from the Applied Materials 5300 during the main etch step (etchant is C2F6), with spectral lines labeled with their corresponding species.

1. Preselected spectral lines

A typical optical emission spectrum acquired from the Applied Materials 5300 is shown in Fig. 3, along with the chemical species associated with selected spectral lines. Only those spectral peaks associated with the important plasma species are chosen as input variables for regression modeling. An F-distribution test is also employed to confirm whether these intensities vary significantly during the experiment. This test compares the variance of the real-time signals collected during the factorial experiment with that of the signals collected from the baseline runs (i.e., the center points of the DOE). Variables that vary substantially relative to the baseline data are considered to be sensitive to the equipment settings. More specifically, the F statistic is calculated by

$$F = \frac{s^2_{\mathrm{all}}/\nu_{\mathrm{all}}}{s^2_{\mathrm{ctr}}/\nu_{\mathrm{ctr}}} \sim F_{\alpha,\,\nu_{\mathrm{all}},\,\nu_{\mathrm{ctr}}}, \qquad (1)$$

where $s^2_{\mathrm{all}}$ is the estimated variance of the signals collected during the factorial experiment, $s^2_{\mathrm{ctr}}$ is the estimated variance of the signals collected during the center-point runs, $\nu_{\mathrm{all}}$ is the number of degrees of freedom in the factorial experiment, and $\nu_{\mathrm{ctr}}$ is the number of degrees of freedom in the center-point runs. In our case, $\nu_{\mathrm{all}} = 29$ and $\nu_{\mathrm{ctr}} = 3$. Wavelengths with high F values are deemed to show statistically significant variation. Using this approach, we selected eight wavelengths representing the dominant chemical species, listed in Table I, as the input variables for building regression models.
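The screening step can be sketched per wavelength as below. This is a minimal illustration, not the authors' S-plus code: the function names are hypothetical, and each mean square is computed as a sum of squared deviations divided by its degrees of freedom, which is the conventional variance-ratio form of the test. With 30 runs in total and 4 center points this gives 29 and 3 degrees of freedom, as in the text. A critical value, if wanted, can be obtained from `scipy.stats.f.ppf(1 - alpha, nu_all, nu_ctr)`; the sketch only ranks wavelengths, and the final choice of lines still rests on species identification.

```python
import numpy as np

def f_screen(spectra_all, spectra_ctr):
    """Per-wavelength F ratio: variation in all DOE runs vs. the center-point runs.

    spectra_all: (n_all_runs, n_wavelengths) intensities from every run.
    spectra_ctr: (n_ctr_runs, n_wavelengths) intensities from the center points only.
    """
    a = np.asarray(spectra_all, dtype=float)
    c = np.asarray(spectra_ctr, dtype=float)
    nu_all, nu_ctr = a.shape[0] - 1, c.shape[0] - 1
    # Mean square = (sum of squared deviations) / (degrees of freedom),
    # i.e., the unbiased variance estimate of each wavelength's signal.
    ms_all = ((a - a.mean(axis=0)) ** 2).sum(axis=0) / nu_all
    ms_ctr = ((c - c.mean(axis=0)) ** 2).sum(axis=0) / nu_ctr
    return ms_all / ms_ctr, nu_all, nu_ctr

def top_wavelengths(f_values, k=8):
    """Column indices of the k largest F values, largest first."""
    return np.argsort(f_values)[::-1][:k]

# Hypothetical usage with 30 runs (4 of them center points) and 501 wavelengths:
# f, nu_all, nu_ctr = f_screen(X_all, X_ctr)   # nu_all = 29, nu_ctr = 3
# candidates = top_wavelengths(f, k=8)
```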



TABLE I. The distinct wavelengths selected for the ordinary least-squares regression model.

Wavelength (nm)   Possible species
248               CF or NO
251.6             CF2
258               CF2
288               CF2
385               CN or He
437               C2
440               SiF
467               C2

FIG. 4. The flow chart of chamber state modeling.
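Regressing a response on the eight Table I intensities is an ordinary least-squares fit on an n × 8 matrix (n = 30 wafers). Because several of these lines belong to the same species (three CF2 lines, two C2 lines), the inputs are likely to be strongly correlated. The sketch below pairs an OLS fit with variance inflation factors (VIFs), one standard way to quantify such multicollinearity; the authors do not state that they used VIFs, and `ols_fit` and `variance_inflation_factors` are hypothetical helper names.

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept column; returns (coefficients, R^2), intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dev = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (dev @ dev)
    return coef, r2

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2_j = ols_fit(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2_j))
    return np.array(vifs)

# Hypothetical usage: X_lines holds the eight Table I intensities (30 x 8),
# y the measured etch rate; large VIFs flag the collinear wavelengths.
# coef, r2 = ols_fit(X_lines, y)
# vif = variance_inflation_factors(X_lines)
```

VIFs far above 10 are a common rule-of-thumb warning that individual coefficients are poorly determined even when R² is high.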

2. Principal component analysis

If spectral identification is not practical, an alternative data reduction technique is PCA. The purpose of PCA is to explain most of the variance in the original data set with only a few principal components, without losing too much information. PCA transforms the input variables into a set of orthogonal variables known as principal components (PCs), which are linear combinations of the original variables.10,11 Since $X^{T}X$ (proportional to the covariance matrix of the column-centered input matrix $X$) is symmetric, it can be decomposed as $X^{T}X = V \Lambda V^{T}$, where the diagonal elements of $\Lambda$ are the eigenvalues and the columns of $V$ are the eigenvectors of $X^{T}X$. The coefficients of the original variables are given by the eigenvectors $V$. The PCA transformation is thus

$$Z = V^{T}(X - \bar{X}), \qquad (2)$$

where $\bar{X}$ is the vector of average values of each variable in $X$, and $Z$ contains the coordinates in the transformed space. A large eigenvalue means that the variation along the corresponding eigenvector is large. Typically, the first few eigenvalues account for most of the variation in the input data set (i.e., $X$). Once the principal component analysis is complete, the reduced PC variables can be used as the input matrix for modeling.
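A minimal NumPy sketch of Eq. (2), assuming rows of `X` are wafers and columns are wavelengths. Scores are computed row-wise as $(X - \bar{X})V$, which is the same transformation written for a data matrix. The helper `n_components_for` picks the smallest number of PCs whose cumulative eigenvalue fraction reaches a target; a 0.999 default mirrors the 99.9% figure quoted in Section III.B, but both function names are hypothetical.

```python
import numpy as np

def pca(X):
    """PCA of X (rows = wafers, columns = wavelengths) via eigendecomposition of Xc^T Xc.

    Returns eigenvalues (descending), eigenvectors V (columns), scores Z, and column means.
    """
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    # Xc^T Xc is symmetric, so numpy.linalg.eigh applies; it returns ascending eigenvalues.
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Row-wise form of Eq. (2): row i of Z is V^T (x_i - x_bar).
    Z = Xc @ V
    return eigvals, V, Z, x_bar

def n_components_for(eigvals, target=0.999):
    """Smallest k whose first k eigenvalues explain at least `target` of the total."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # drop round-off negatives
    frac = np.cumsum(lam) / lam.sum()
    return int(min(np.searchsorted(frac, target) + 1, len(lam)))

# Hypothetical usage on the autoscaled 30 x 501 OES matrix:
# eigvals, V, Z, _ = pca(X_scaled)
# k = n_components_for(eigvals)      # number of PCs kept
# Z_k = Z[:, :k]                     # inputs for the regression
```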

3. Partial least-squares analysis

Another statistically based technique for data reduction is PLS. Like PCA, PLS projects information from a high-dimensional space onto a low-dimensional space defined by a small number of basis vectors, known as latent variables.11–13 These latent variables summarize the important information contained in the original data set. Unlike PCA, the PLS algorithm uses both the response variables ($Y$) and the input variables ($X$). The scaled $X$ and $Y$ can be decomposed as

$$X = \sum_{a=1}^{A} t_a p_a^{T} + E, \qquad (3)$$

$$Y = \sum_{a=1}^{A} t_a q_a^{T} + F, \qquad (4)$$

where $t_a$ is the latent vector, calculated as $t_a = X_{a-1} w_a / (w_a^{T} w_a)$, and $w_a$ is the loading weight vector that maximizes the covariance between $X_{a-1}$ and $Y_{a-1}$. Finally, $E$ and $F$ are the residuals, and $p_a$ and $q_a$ are loading vectors whose elements $p_{aj}$ and $q_{aj}$ express the contribution of each variable $x_j$ and $y_j$, respectively. The total number of PLS components (i.e., $A$) needed to extract the information from $X$ and $Y$ is usually low (typically 2–5) and can be determined by cross validation to prevent model overfitting. This is done by minimizing a criterion called the prediction error sum of squares (PRESS),13 defined as

$$\mathrm{PRESS} = \sum_{i=1}^{m} \left(\hat{Y}_i - Y_i\right)^2, \qquad (5)$$

where $m$ is the number of observations and $\hat{Y}_i$ is the cross-validated prediction of $Y_i$ from a model fitted without observation $i$. The first few dimensions are usually the most important and dominate the model. Because the remaining dimensions are discarded, PLS can be viewed as a biased regression method, and the final model of Eq. (4) can be expressed in terms of the $X$ data as the regression model13

$$Y = X\beta + F, \qquad (6)$$

where the matrix of regression coefficients is given by

$$\beta = W (P^{T} W)^{-1} Q^{T}, \qquad (7)$$

and $W$, $P$, and $Q$ are the matrices whose columns are the vectors $w_a$, $p_a$, and $q_a$ for $a = 1, 2, \ldots, A$. A flow chart of the modeling scheme using the OES signals is illustrated in Fig. 4.
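The decomposition in Eqs. (3)–(7) can be sketched with the NIPALS algorithm for a single response, together with a leave-one-out PRESS for choosing $A$. This is a hedged illustration rather than the authors' implementation: modeling one response at a time, leave-one-out (rather than some other cross-validation split), and the names `pls1_fit`, `press_loo`, and `choose_n_components` are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS for one response. X (n x p) and y (n,) must already be centered
    (and scaled, if desired). Returns the coefficient vector beta of Eq. (7)."""
    X = np.array(X, dtype=float)  # copies, because both are deflated below
    y = np.array(y, dtype=float)
    p = X.shape[1]
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y                   # weight direction: maximal covariance with y
        w /= np.linalg.norm(w)
        t = X @ w / (w @ w)           # latent vector t_a = X_{a-1} w_a / (w_a^T w_a)
        tt = t @ t
        p_a = X.T @ t / tt            # X loading p_a
        q_a = (y @ t) / tt            # y loading q_a
        X -= np.outer(t, p_a)         # deflation: X_a = X_{a-1} - t_a p_a^T
        y -= q_a * t                  # Y_a = Y_{a-1} - t_a q_a
        W[:, a], P[:, a], q[a] = w, p_a, q_a
    return W @ np.linalg.solve(P.T @ W, q)   # beta = W (P^T W)^{-1} Q^T

def press_loo(X, y, n_components):
    """Leave-one-out PRESS, Eq. (5): each observation is predicted by a model
    fitted (including centering and scaling) without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        mu, sd = X[keep].mean(axis=0), X[keep].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0
        y_mu = y[keep].mean()
        beta = pls1_fit((X[keep] - mu) / sd, y[keep] - y_mu, n_components)
        y_hat = y_mu + ((X[i] - mu) / sd) @ beta
        press += (y_hat - y[i]) ** 2
    return press

def choose_n_components(X, y, max_components):
    """Number of latent variables (1..max_components) that minimizes PRESS."""
    scores = [press_loo(X, y, a) for a in range(1, max_components + 1)]
    return int(np.argmin(scores)) + 1, scores
```

With as many components as there are (linearly independent) input columns, PLS reproduces the ordinary least-squares solution; the benefit comes from stopping early, at the $A$ that minimizes PRESS.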

III. RESULTS AND DISCUSSION

For each wafer, five spectral frames were collected during the 150 s main etch step. The signals fluctuated whenever the rf power was applied or removed, and these transient signals must be removed before any further statistical analysis. For this study, the OES data for each wafer were therefore taken from the third spectrum, collected at about the 80th second of processing. The OES light intensity data were placed into a two-dimensional 30×501 data matrix X (one row per wafer and one column per wavelength). Each column was centered by subtracting its mean and then scaled to unit variance by dividing by its standard deviation. Ordinary least-squares regression14 was employed to relate the three reduced input data sets to the etching performance. The results obtained with each of the three data reduction methods (summarized in Table II) are compared in the following sections.
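The preprocessing just described (keep the third frame per wafer, stack into a 30×501 matrix, autoscale each column) can be sketched as follows. The input layout is an assumption: `frames_per_wafer` is a hypothetical sequence with one array of shape (5, 501) per wafer. The sample standard deviation (`ddof=1`) and the guard for a flat channel are also choices made here, not details given in the text.

```python
import numpy as np

def build_oes_matrix(frames_per_wafer, frame_index=2):
    """Stack one frame per wafer into X (n_wafers x n_wavelengths) and autoscale columns.

    frames_per_wafer: sequence of arrays shaped (n_frames, n_wavelengths), one per wafer.
    frame_index=2 selects the third frame (about 80 s into the 150 s main etch).
    Returns the autoscaled matrix plus the column means and standard deviations.
    """
    X = np.vstack([np.asarray(f, dtype=float)[frame_index] for f in frames_per_wafer])
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)       # sample standard deviation (assumption)
    std[std == 0] = 1.0               # guard: a flat channel would divide by zero
    return (X - mean) / std, mean, std
```

Keeping `mean` and `std` lets new wafers be scaled with the same constants before a fitted model is applied to them.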


TABLE II. Summary of the results of chamber state models.

Response           Data reduction method    No. of input variables   R²     Adj. R²
Oxide ER           Species identification   8                        0.88   0.834
                   PCA                      7                        0.89   0.851
                   PLS                      5                        0.9    0.872
Oxide uniformity   Species identification   8                        0.94   0.92
                   PCA                      7                        0.96   0.951
                   PLS                      5                        0.96   0.955
ARDE at center     Species identification   8                        0.56   0.392
                   PCA                      7                        0.62   0.5
                   PLS                      2                        0.42   0.38
ARDE at edge       Species identification   8                        0.64   0.507
                   PCA                      7                        0.74   0.66
                   PLS                      3                        0.64   0.6

A. Regression models using the intensities of the preselected wavelengths

The results reveal that the OES signals correlate strongly with oxide etch rate and uniformity, as indicated by R² values greater than 0.8, although the OES signals can only explain 65% of the variation in ARDE. (The R² statistic is a common measure of model goodness of fit: it is the proportion of the total variation explained by the regression model, and a perfect fit is indicated by an R² of 1.) However, despite the high R² values, the models resulting from this method are not suitable for prediction. The high degree of correlation among these input OES variables induces a multicollinearity problem, and as a result the predictive capability of the model could be very poor.

B. Regression models using PCA reduced variables

As detailed previously, PCA reduces the dimensionality of the data by projecting them onto a low-dimensional space and converting them into an uncorrelated data set. One important task is to determine how many PCs should be retained in the model. An empirical method is to make a scree plot that indicates the percentage of variation explained by each principal component.10 In this work, we found that seven PCs are adequate to explain 99.9% of the variation in the OES data. Using these seven PCs, the PCA models can explain 85% and 95% of the variation in etch rate and uniformity, respectively. Also, an R² of 0.75 is achieved for the ARDE models. The loadings plots of the first four PCs (Fig. 5) reveal that these PCs consist mostly of contributions from only 10–20 spectral peaks. The remaining wavelengths have negligible weights in the PCA modeling, which agrees with our previous observation that only 10–20 of the 501 monitored wavelengths can be related to the variation of the process outputs.

C. Regression models using PLS reduced variables

PLS also reduces the number of terms in the final model. The main difference is that, while in PCA the transformation depends only on the variability in the input matrix X, in PLS the transformation depends on both the input X and the response Y. To determine which variables are significant and should be retained in the modeling, the PRESS statistic [see Eq. (5)] is minimized; it indicates that two to six latent variables (depending on the wafer response) are sufficient to describe the input data, and these are then used as the regressors for the wafer responses. Like PCA, the PLS models exhibit good R² values for oxide etch rate and uniformity, and moderate R² values for ARDE. Table II compares all three data reduction and modeling methods, showing the number of variables used in each model and the performance of each model in terms of adjusted R² values. (The adjusted R² is a modified version of the R² statistic that accounts for the number of parameters in the model and is given by $\mathrm{Adj}\,R^2 = 1 - (1 - R^2)(n - 1)/(n - p)$, where $n$ is the number of observations and $p$ is the number of parameters, including the intercept.) This comparison reveals that PLS generally uses fewer parameters than PCA to achieve the same results. The overall limited success in predicting ARDE from OES signals might be due to the inaccuracy of the ARDE measurements.
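As a quick check of the adjusted R² column, the formula can be evaluated directly. In this sketch `p` counts every fitted parameter, including the intercept; with n = 30 wafers, this form reproduces the tabulated values, e.g., 8 wavelengths plus an intercept with R² = 0.56 gives 0.392 (ARDE at center, species identification). The function name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p); p counts all parameters incl. intercept."""
    if n <= p:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# Table II, ARDE at center, species identification: 8 inputs + intercept, 30 wafers.
print(round(adjusted_r2(0.56, n=30, p=9), 3))  # 0.392
```

Because the penalty grows with p, a PLS model with two or three latent variables can match a seven-PC model in adjusted R² even with a slightly lower raw R².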

IV. CONCLUSION

Run-to-run chamber state modeling using real-time OES signals is effective in capturing process variation and explaining the final wafer characteristics, especially etch rate and within-wafer uniformity. This method is potentially powerful for real-time control and monitoring because it uses noninvasive, in situ sensor readings for prediction while the wafer is being processed. In this study, three data reduction techniques were compared. First, ordinary least-squares regression was performed on wavelengths selected based on species identification. Two other modeling techniques, principal component analysis and partial least squares, were also introduced to eliminate the correlation among input variables and reduce the size of the input matrix. The resulting models of oxide etch rate and within-wafer uniformity were very good: 85% of the etch rate variation and more than 95% of the uniformity variation were explained by these models. However, only 65% of the ARDE variation can be captured by OES signals, which might be due to the inaccuracy of the ARDE measurements. No modeling technique is overwhelmingly better than the others in terms of R² values; nevertheless, PLS generally used fewer parameters to achieve the same results. Although the methods in this study are based on OES sensor data collected from an Applied Materials 5300, the methodology presented is general and can be applied to other types of equipment and sensor readings. For example, multivariate data collected from rf monitor sensors or residual gas analysis (RGA) sensors can be used in the same manner. Moreover, this methodology can also be applied to other semiconductor equipment that can be monitored by a multivariate sensor.

FIG. 5. The loading vectors of the first four principal components.

V. FUTURE WORK

Although the OES sensor is capable of continuously collecting spectra in real time, this work only looked at a single spectrum per wafer. In future work we plan to take advantage of the wealth of real-time data that an OES system can provide. We are presently developing methods for characterizing the time behavior of the plasma. These methods allow us to incorporate real-time data into our models, thus increasing the accuracy of our predictions. In addition, one specific OES problem has not been addressed here: “window clouding,” which might attenuate the optical emission spectrum over time. This problem did not interfere with our short-term experiment, but proper data preprocessing and calibration will be necessary in longer runs.

ACKNOWLEDGMENTS

The authors are grateful to Advanced Micro Devices, Texas Instruments, and the Semiconductor Research Corporation (Grant No. FP-700) for funding this work.

1 J. A. Chan, Semicond. Int. 18, 85 (1995).
2 C. P. Ho, J. D. Plummer, S. E. Hansen, and R. W. Dutton, IEEE Trans. Electron Devices ED-30, 1438 (1983).
3 S. F. Lee and C. Spanos, IEEE Trans. Semicond. Manuf. 8, 252 (1995).
4 B. Wangmaneerat, Ph.D. thesis, University of New Mexico, 1992.
5 H. M. Anderson and M. P. Splichal, Proc. SPIE 2091, 333 (1993).
6 S. Wolf and R. Tauber, Silicon Processing for the VLSI Era (Lattice, Sunset Beach, CA, 1986), Vol. 1, pp. 550–551.
7 G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters (Wiley, New York, 1978).
8 Within-wafer uniformity is defined as (standard deviation of etch rate over 17 selected sites on wafer)/(average of etch rate over the 17 sites on wafer)×100.
9 Aspect-ratio dependent etching (ARDE) is determined from SEM readings and is defined as 100×(depth of contact hole − depth of open area)/(depth of contact hole).
10 J. E. Jackson, A User’s Guide to Principal Components (Wiley, New York, 1991).
11 I. Frank and J. Friedman, Technometrics 35, 109 (1993).
12 J. MacGregor, C. Jaeckle, C. Kiparisside, and M. Koutoudi, Proc. Sys. Eng. 40, 826 (1994).
13 D. M. Haaland and E. V. Thomas, Anal. Chem. 60, 1193 (1988).
14 K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis (Academic, London, 1979).