Dirichlet Process Gaussian Mixture

Submitted manuscript for review Accepted by IEEE Trans. J. of Automation and Science (in press)

1

Dirichlet Process Gaussian Mixture (DPGM) Models for Real-Time Monitoring and its Application to Chemical Mechanical Planarization (CMP) Jia (Peter) Liu, Omer F. Beyca, Prahalad K. Rao, Zhenyu (James) Kong, Member, IEEE, and Satish T.S. Bukkapatnam Abstract—The goal of this work is to use sensor data for online detection and identification of process anomalies (faults). In pursuit of this goal we propose Dirichlet process Gaussian mixture (DPGM) models. The proposed DPGM models have two novel outcomes: (i) Dirichlet process (DP)-based SPC chart for anomaly detection; and (ii) unsupervised recurrent hierarchical Dirichlet process (RHDP) clustering model for identification of specific process anomalies. The presented DPGM models are validated using numerical simulation studies, as well as wireless vibration signals acquired from an experimental semiconductor chemical mechanical planarization (CMP) testbed. Through these numerically simulated and experimental sensor data we test the hypotheses that DPGM models have significantly lower detection delays compared to statistical process control charts in terms of the average run length (ARL1); and higher defect identification accuracies (Fscore) than popular clustering techniques, such as mean shift. For instance, the DP-based SPC chart detects pad wear anomaly in CMP within 50 milliseconds, as opposed to over 140 milliseconds with conventional control charts. Likewise, DPGM models were able to classify Note to Practitioners—This work forwards novel Dirichlet process Gaussian mixture (DPGM) models for online process quality monitoring. The practical outcome is that the deleterious impact of process drifts on product quality are identified in their early stages using the presented DPGM models. For instance, sensor signal patterns from contemporary advanced manufacturing processes rarely follow distribution symmetry or normality assumptions endemic to traditional SPC methods. These assumptions limit the effectiveness of traditional SPC methods for detection of process anomalies from complex, heterogeneous sensor data. In comparison, the proposed DP-based SPC is capable of detecting process changes in the sensor data notwithstanding the characteristics of the underlying distributions. Moreover, we show that recurrent hierarchical Dirichlet process (RHDP) clustering model identifies process anomalies with higher fidelity compared to traditional methods, such as mean shift clustering. Index Terms—Chemical mechanical planarization (CMP), Dirichlet process (DP), Dirichlet process Gaussian mixture (DPGM) models, fault detection, online process monitoring, recurrent hierarchical Dirichlet process (RHDP) clustering. This research is supported in part by the National Science Foundation (NSF) under grants CMMI-1000978, CMMI-1131665 and CMMI-1401511. Jia (Peter) Liu is with Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061 USA. (email: [email protected]). Omer F. Beyca is with Department of Industrial Engineering, Istanbul Technical University, Istanbul, 34357, Turkey. (email: [email protected]). Prahalad K. Rao is with Mechanical and Materials Engineering Department, University of Nebraska-Lincoln, Lincoln, NE, 68588-0526. (email: [email protected]).

I. INTRODUCTION A. Motivation his work addresses two research questions from the perspective of sensor-based online process monitoring: 1) How to detect the onset of anomalous process states? 2) How to identify/classify different types of anomalous process states? The first falls under the category of anomaly (fault) detection; the second under the purview of anomaly (fault) diagnosis. These questions are resolved using Dirichlet process Gaussian mixture (DPGM) models. The utility of the proposed DPGM models are demonstrated in the context of process monitoring in chemical mechanical planarization (CMP) using in situ wireless vibration sensor signals [1]. This research is valuable for mitigating process anomalies in the manufacture of ultraprecision, high-value components. For instance, CMP related defects, e.g., dishing and erosion, are attributed to be among the prominent reasons inhibiting semiconductor device yield rates [2, 3]. Therefore, there is a burgeoning need for in situ sensor-based quality monitoring approaches in ultraprecision manufacturing processes, such as CMP, so that incipient process anomalies can be identified and corrected at an earlier stage [1].

T

B. Challenges Monitoring of complex manufacturing processes, e.g., CMP, from sensors signals, is challenging due to the inherent nonlinear and non-Gaussian dynamics [4]. The complex process dynamics manifest prominently in the acquired sensor signals. For instance, Fig. 1 shows a representative wireless MEMS vibration sensor signal obtained from the experimental CMP testbed used in this work (see Sec. IV). The signal time series in Fig. 1 has aspects that are evocative of nonlinear quasi-periodic dynamics, as revealed in our recent publication [1].

Zhenyu (James) Kong is with Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061 USA. (corresponding author, email: [email protected]). Satish T.S. Bukkapatnam is with Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843 USA. (email: [email protected]).

2 DPGM models are applied to experimentally acquired vibration signals from CMP process in Sec. IV; and conclusions and avenues for further research are summarized in Sec. V. II. REVIEW OF RELATED RESEARCH The review of the relevant literature is conducted in the following two parts: (1) Recent developments in SPC. (2) Research in techniques such as neural networks, wavelet analysis and Gaussian mixture models as enhancements to traditional SPC.

Fig. 1. Characteristics of a typical experimental CMP wireless vibration sensor signal sampled at ~670Hz (see Sec. IV). (a) The temporal signal with prominent cycles occurring at 1 second intervals and intermittent spikes, (b) the state space plot of the signal data at three consecutive time instances (X(t), X(t+1) and X(t+2)) depicts quasi-periodic dynamics, (c) histogram of vibration signal data reveals that the distribution is asymmetric and has bimodal tendency.

Traditional statistical process control (SPC) charts and clustering techniques (e.g., mean shift, k-means, expectation maximization (EM)) are ill-suited for monitoring of such complex dynamics (e.g., Fig. 1) due to the following reasons: • Parametric SPC charts have limited applicability for data with the following salient characteristics: the underlying distribution is non-Gaussian or multimodal, with persistent autocorrelation, or marked periodicity [5]. • Most distribution-free (nonparametric) SPC charts for nonGaussian data are based on data ranking and constrained by the symmetry assumption of data distribution [6]. • Control charts cannot identify/localize the different process anomalies [7]. • Classical clustering techniques, for instance, mean shift requires sufficient high data density with clear gradients to locate the cluster centers, and k-means and EM methods require a priori assumptions regarding the number of clusters in the data [8-10]. C. Contributions and Novelty We seek to overcome the aforementioned shortcomings in traditional SPC and clustering techniques using Dirichlet process Gaussian mixture (DPGM) models for signal analysis [11]. DPGM models surmount limitations, such as normality and symmetry of data inherent to SPC charts. Furthermore, DPGM models inherently account for temporal autocorrelation in the data. Accordingly, the main contributions from DPGM models presented in this work are: 1) An SPC chart using Dirichlet process (DP) mixture model for detection of anomalies. 2) Unsupervised clustering of process states based on recurrent hierarchical Dirichlet process (RHDP) clustering model, which is used for identifying/classifying specific process anomalies. The rest of the paper is structured as follows: a review of the pertinent literature is presented in Sec. II; the methodology of DPGM models is detailed in Sec. III, which includes delineation of constitutive equations for DP-based SPC and RHDP clustering, and further corroboration with numerical studies;

A. Statistical Process Control (SPC) for Process Monitoring Traditional parametric SPC charts, such as Shewhart Xbar and R, cumulative sum (CUSUM), and exponentially weighted moving average (EWMA), have been widely used in various scenarios ranging from manufacturing to service industries for process improvement [7]. Despite the underlying normality and independence assumptions (NID) the effectiveness of Shewhart control charts have been attested; they are particularly useful for situations where sub-grouped measurements can be made and the process shifts are significant (> 1 standard deviation) [7]. CUSUM and EWMA control charts can be applied for both sub-grouped and individual measurements, and are particularly suited for detecting small drifts. However, the latter (EWMA) are not directionally invariant, i.e., the control chart has a certain inertia effect in reacting to process drifts [7]. To overcome these restrictive assumptions with traditional parametric control charts, researchers devised nonparametric SPC charts, which are also called distribution-free SPC charts. Chakroborti, et al. [12] provided a comprehensive review of nonparametric SPC charts. Although a specific type of distribution does not restrain nonparametric charts, nonetheless, most are based on data ranking methods, which entail that the data is implicitly assumed to be symmetric about the median. To overcome this drawback, Qiu and Li [13] proposed a categorization-based nonparametric SPC chart for univariate data sequences. Their method relaxes the data symmetry assumption and is shown to be effective for non-Gaussian data. However, relying on a priori categorization of data for analysis results in information loss. Particularly, the selection of the number of groups for categorization, which is a heuristic parameter, is critical to the performance of control chart. In another article, Qiu and Li [14] devised nonparametric SPC charts leveraging Gaussian transformations, i.e., transforming data belonging to an unknown distribution to approximately Gaussian. However, the normality of transformed data cannot be universally guaranteed for cases where the data is patently multimodal and complex, such as in the CMP vibration signals used in this work. To overcome these challenges, researchers have explored wavelet and neural network-based SPC. These techniques can accommodate complex process dynamics, and have also been applied in CMP process [15].

3 B. Wavelet and Neural Network-based Monitoring Wavelet analysis has been successfully implemented in modeling and monitoring of functional data in advanced manufacturing [16]. For instance, Ganesan, et al. [17] developed the wavelet-based SPC approach for real-time identification of delamination defects in CMP process. Guo et al. [5] presented an approach that uses wavelet coefficients in an SPC setting for detecting process drifts. Their method involves multi-scale decomposition of a signal using a predetermined Harr wavelet basis function. Subsequently, they tracked the wavelet coefficients at a predetermined optimal (wavelet) level using CUSUM and EWMA control charts. Jeong, et al. [18] described a similar wavelet-SPC procedure using the Symlet-8 wavelet basis function for functional data analysis of radio antenna reception patterns. Their approach uses a customized control chart with control limits derived from a statistic resembling the multivariate Hotelling’s T2 [7]. Pugh [19] showed that feed-forward neural networks (NN) has significantly lower type I and type II errors compared to traditional Shewhart X-bar and R charts, and therefore could be valuable for process monitoring applications. Subsequently, several researchers [20] have developed methods that employ neural networks (NN) for process monitoring applications. As an example of NN-based process monitoring, Rao et al. [21, 22] integrated a feedback-delay embedded recurrent neural network (RNN) with Bayesian particle filtering (PF) for real-time detection of mean shift in ultraprecision diamond turning process. The evolving surface morphology of diamond turned workpieces is predicted in real-time from in situ heterogeneous sensor data using PF-updated RNN weights. The network weights are subsequently monitored in an SPC setting using mean shift clustering [23]. Although these wavelet and NN-based SPC methods are applicable to complex signals without being constrained by the underlying assumptions of data distribution, they are nonetheless computationally demanding and engender a large number of variables that have to be tracked simultaneously. Moreover, these approaches require a predetermined model or basis function, such as the structure of the NN, and the basis and scaling function for wavelet decomposition. Therefore, decision uncertainty due to model selection remains a contentious challenge. In contrast, SPC methods with Gaussian mixture modeling (GMM) overcome these aforementioned data distribution and model selection limitations. In this context, Choi et al. [24] and Thissen et al. [25] proposed PCA-based monitoring techniques, where GMM-derived models constructed via EM algorithms are used to approximate the data pattern. Similarly, Chen et al. [26] utilized infinite Gaussian mixture models (IGMM) to construct the control chart. The Dirichlet process Gaussian mixture (DPGM) models presented in this work extend GMM-based SPC techniques towards both process anomaly detection and identification. It is noted that Dirichlet process (DP) mixture models are the basis for constructing DPGM models. DP mixture modeling has been previously applied for data in several areas, such as medical images, online documents [27, 28]. DP mixture models approximate an empirical (arbitrary) data distribution via a mixture of

finite Gaussian distributions without a priori knowledge of the number of mixture components. For instance, Liu et al. [29] applied DP mixture model to identify process anomalies in fused filament fabrication (FFF) - an additive manufacturing process. III. RESEARCH METHODOLOGY AND VERIFICATION WITH NUMERICAL SIMULATIONS In this research, we first propose DP-based SPC for detecting process anomalies. Then towards identifying different types of process anomalies from a continuous sensor data stream, two extensions to DP models are forwarded to accommodate the following aims: i. Estimating distribution characteristics within a contiguous time epoch given the autocorrelation in the data. ii. Tracking the evolution of the data distribution between time epochs. The first aim is realized using the recurrent Dirichlet process (RDP) method proposed by Ahmed and Xing [30]. The second involves using the hierarchical Dirichlet process (HDP) developed by Teh et al. [31]. The integration of these two entities is accomplished in this work, and is termed recurrent hierarchical Dirichlet process (RHDP). Specifically, the RDP and HDP parts resolve the following questions: • RDP – What are the characteristics of the data distribution at the current time epoch, given the knowledge of the distribution characteristics at the previous time epochs? • HDP – What category is the process state (anomaly/fault) at the current time epoch, given the distribution characteristics estimated using RDP? In other words, RDP determines characteristics of data distribution at the current time epoch by including information from previous time epochs, while HDP enables data to be classified by allowing distributions falling under the same DP model to be clustered together. The overall framework of the proposed methodology is summarized in Fig. 2.

Fig. 2. Overall methodology of DPGM modeling for Online Monitoring.

First, the DP mixture model is introduced, which approximates an empirical data distribution as a mixture of Gaussian components, and develop DP-based SPC (Sec. III.A). Next, the novel recurrent hierarchical Dirichlet process (RHDP) model is elucidated for clustering process states and identifying process anomalies (Sec. III.B). A. Dirichlet process (DP)-based SPC for monitoring complex, non-Gaussian waveforms For time series data with complex nonlinear dynamics, such as the vibration sensor data acquired from the CMP process (Fig.

4 1), the data distribution may not be Gaussian. This poses a significant challenge for process modeling and monitoring with traditional methods based on the presumption of normality and symmetry of data. Pertinently, since a non-Gaussian distribution can be modeled by using a mixture of Gaussian distributions, such impediments could be overcome. The essential concept of our DPGM models is to represent a non-Gaussian probability distribution as a mixture of multiple Gaussian distributions. This implication can be stated mathematically as follows, 𝑘𝑘

𝑝𝑝(𝑥𝑥) = � 𝜋𝜋𝑗𝑗 ℕ�𝑥𝑥�𝜃𝜃𝑗𝑗 �

(1)

𝑗𝑗=1

where 𝑥𝑥 represents a time series collected by sensors from the process, 𝑝𝑝(𝑥𝑥) is its data distribution, 𝑘𝑘 is the number of Gaussian components ℕ(∙) in the mixture, each of which is modeled with weight 𝜋𝜋𝑗𝑗 and parameters 𝜃𝜃𝑗𝑗 (mean 𝜇𝜇𝑗𝑗 and variance 𝜎𝜎𝑗𝑗 2 ). In reality 𝑘𝑘 may be unknown. We apply the Dirichlet process (DP) mixture model, which is a data-driven nonparametric Bayesian approach to approximate a non-Gaussian distribution without any a priori knowledge of 𝑘𝑘 [11]. The procedure is broadly elucidated in the forthcoming section. 1) Theoretical development of DP mixture model In Dirichlet process, the limit for the number of clusters k goes to infinity [32]. In other words, when k→∞, the conditional prior distribution for the component indicators reaches its limit as follows, 𝑝𝑝(𝑐𝑐𝑖𝑖 = 𝑗𝑗|𝒄𝒄−𝑖𝑖 , 𝛼𝛼) 𝑛𝑛−𝑖𝑖,𝑗𝑗 if exsisting component 𝑗𝑗 is chosen (2) 1 + 𝛼𝛼 ∝ �𝑁𝑁 − 𝛼𝛼 if a new component is created 𝑁𝑁 − 1 + 𝛼𝛼 where 𝒄𝒄 = (𝑐𝑐1 , … , 𝑐𝑐𝑁𝑁 ) are indicators of data points for components, 𝛼𝛼 is the concentration parameter, 𝑛𝑛𝑗𝑗 is the number of data points in Gaussian component j, N is the number of data points, 𝑁𝑁 = ∑𝑘𝑘𝑗𝑗=1 𝑛𝑛𝑗𝑗 . The subscript −𝑖𝑖 indicates all indices except 𝑖𝑖, and similarly, 𝑛𝑛−𝑖𝑖,𝑗𝑗 indicates the number of observations in component 𝑗𝑗 for all data points except point 𝑖𝑖. For each component indicator 𝑐𝑐𝑖𝑖 drawn conditioned on all other component indicators from the multinomial distribution, there is a corresponding component parameter 𝜃𝜃𝑖𝑖 drawn from a base distribution 𝐺𝐺0 . This result signifies a Dirichlet process (DP) mixture model, which can be used to model a set of observations (𝑥𝑥1 , … , 𝑥𝑥𝑖𝑖 , … , 𝑥𝑥𝑁𝑁 ) , with latent variables of 𝜽𝜽 = (𝜃𝜃1 , … , 𝜃𝜃𝑖𝑖 , … , 𝜃𝜃𝑁𝑁 ) as follows, 𝐺𝐺~ 𝐷𝐷𝐷𝐷(𝛼𝛼, 𝐺𝐺0 ) 𝜃𝜃𝑖𝑖 ~ 𝐺𝐺 𝑥𝑥𝑖𝑖 ~ ℕ(. |𝜃𝜃𝑖𝑖 )

(3)

where 𝐷𝐷𝐷𝐷(𝛼𝛼, 𝐺𝐺0 ) is the Dirichlet process (DP) with base distribution 𝐺𝐺0 and concentration parameter 𝛼𝛼; 𝐺𝐺 is a random discrete distribution drawn from 𝐷𝐷𝐷𝐷(𝛼𝛼, 𝐺𝐺0 ) ; each 𝜃𝜃𝑖𝑖 is drawn from the discrete distribution 𝐺𝐺; and each data point 𝑥𝑥𝑖𝑖 (which may include statistical features, e.g., mean, variation, etc., from the sensor data) is drawn from a normal distribution with parameter 𝜃𝜃𝑖𝑖 . Because the empirical distribution G is discrete, the same values can be assigned to multiple 𝜃𝜃𝑖𝑖 . Data points which

have the same latent value belong to the same component [11, 32]. Furthermore, on integrating out 𝐺𝐺, the following conditional distribution for 𝜃𝜃𝑖𝑖 is obtained [33], 𝛼𝛼 𝜃𝜃𝑖𝑖 |𝜽𝜽−𝑖𝑖 , G0 , 𝛼𝛼~ G + 𝑁𝑁 − 1 + 𝛼𝛼 0

𝑁𝑁−1

�

𝑗𝑗=1 ∪ 𝑗𝑗≠𝑖𝑖

1 𝛿𝛿(𝜃𝜃𝑗𝑗 ) 𝑁𝑁 − 1 + 𝛼𝛼

(4)

where 𝛿𝛿(𝜃𝜃𝑗𝑗 ) is the Dirac delta function peaked on 𝜃𝜃𝑗𝑗 . Subsequently, combining the prior distribution for 𝜃𝜃𝑖𝑖 of data 𝑖𝑖 in (4) and the likelihood function in (3) results in the following posterior distribution for Gaussian components parameters, 𝑝𝑝(𝜃𝜃𝑖𝑖 = 𝑗𝑗|𝜽𝜽−𝑖𝑖 , 𝑥𝑥𝑖𝑖 )~

𝑛𝑛−𝑖𝑖,𝑗𝑗 ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃𝑗𝑗 � if existing component 𝑗𝑗 is chosen (a) (5) 𝑁𝑁 − 1 + 𝛼𝛼 � 𝛼𝛼𝛼𝛼 𝐻𝐻(𝜃𝜃|𝑥𝑥𝑖𝑖 ) if a new component is created (b) 𝑁𝑁 − 1 + 𝛼𝛼 𝐺𝐺0 (𝜃𝜃)ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃 � where 𝑞𝑞 = ∫ 𝐺𝐺0 (𝜃𝜃)ℕ(𝑥𝑥𝑖𝑖 |𝜃𝜃)𝑑𝑑(𝜃𝜃) , 𝐻𝐻(𝜃𝜃|𝑥𝑥𝑖𝑖 ) = ∫ 𝐺𝐺 (𝜃𝜃) . ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃 �𝑑𝑑(𝜃𝜃) 0

Equation (5(a)) shows the probabilities of 𝜃𝜃𝑖𝑖 having the same value with existing Gaussian component parameter 𝜃𝜃𝑗𝑗 , and (5(b)) is the posterior probability of 𝜃𝜃𝑖𝑖 choosing a new value which is randomly generated from 𝐻𝐻(𝜃𝜃|𝑥𝑥𝑖𝑖 ).

2) Dirichlet process (DP)-based statistical process control (SPC) chart A control chart is a visual tool that is used for monitoring whether a process or system at a given time is under the influence of common cause (chance) variation or special cause (assignable) variation [6, 7]. The limits of the control charts represent thresholds that are obtained when a system operates wholly under common cause variation (in-control (IC) condition). In IC condition, the monitoring statistic falls within the control limits threshold. If special causes take effect, the control chart should presumably signal a change in terms of the monitoring statistic drifting outside the control limits (out-of-control (OOC) condition). Thus, the control chart is effectively a twostate or binary classifier as it signals only IC or OOC process states. The control chart does not identify, explicitly, the type of anomaly/special cause. In DP-based SPC, the likelihood values of new data are calculated under IC data distribution acquired by DP mixture model. Process changes are detected once the likelihood values drop, indicating such data are not likely generated under IC condition. An effective way to detect the OOC operation is by monitoring average log-likelihood value in a subgroup of incoming data under IC data distribution as in (6), 1 log�𝐿𝐿�𝑥𝑥1 , … , 𝑥𝑥𝑖𝑖 , … , 𝑥𝑥𝑤𝑤 �𝜃𝜃1 , 𝜃𝜃2 , … , 𝜃𝜃𝑗𝑗 �� 𝑤𝑤 𝑗𝑗 𝑤𝑤 (6) 1 = � log �� 𝜋𝜋𝑘𝑘 ℕ(𝑥𝑥𝑖𝑖 |𝜃𝜃𝑘𝑘 )� 𝑤𝑤 𝑖𝑖=1

𝑘𝑘=1

where 𝑥𝑥𝑖𝑖 is the incoming data, j is the number of components for mixture distribution for IC condition, w is the subgroup size of testing data. The larger the value of w, the more reliable the detection of OOC operation, but a longer delay is caused to detect process changes. Based on empirical results, we choose w as the minimal number of observations to achieve the average run length (ARL0) for type I error of likelihood values below a

5 certain value, e.g., 0.05, in order to balance fast detection and detecting accuracy. By the central limit theorem, the average log-likelihood values of incoming data are approximately normally distributed. Therefore, the problem of monitoring original complex nonGaussian data reduces to a scheme of monitoring normally distributed average log-likelihood values. For simplicity, we construct the DP-based SPC by closely emulating the framework of the CUSUM chart with the average log-likelihood values as the monitoring target. Therefore, representing the average loglikelihood value in time epoch t as 𝑦𝑦𝑡𝑡 , we have monitoring statistics for DP-based SPC: �

+ ] 𝐶𝐶𝑡𝑡+ = 𝑚𝑚𝑚𝑚𝑚𝑚[0, 𝑦𝑦𝑡𝑡 − (𝜇𝜇0 + 𝐾𝐾) + 𝐶𝐶𝑡𝑡−1 − − ] 𝐶𝐶𝑡𝑡 = 𝑚𝑚𝑚𝑚𝑚𝑚[0, (𝜇𝜇0 − 𝐾𝐾) − 𝑦𝑦𝑡𝑡 + 𝐶𝐶𝑡𝑡−1

and variance 1, ℕ(μ,1); and the Chi-squared distribution with one degree of freedom, 𝜒𝜒12 . The mean of these distributions will be shifted from the in-control state of zero mean, and the ARL1 will be evaluated for CUSUM, EWMA, and DP-based SPC. We note that the latter distribution (𝜒𝜒12 ) is inherently asymmetric (right skewed), and is theoretically equivalent to a F distribution, F(1,∞). Both the Gaussian and Chi-squared distributions are approximated by mixtures of Gaussian components using DP mixture model, as exemplified in Fig. 3.

(7)

with the control limits (threshold) for the chart set at H = 𝐿𝐿𝐿𝐿

(8)

where 𝜎𝜎 and 𝜇𝜇0 are the standard deviation and mean of the sequential data 𝑦𝑦𝑡𝑡 under IC condition; the parameters 𝐾𝐾 and 𝐿𝐿 are adjusted for a given average run length criteria (ARL0) [7]; the average log-likelihood value 𝑦𝑦𝑡𝑡 is obtained from (6). The cumulative sums 𝐶𝐶𝑡𝑡+ and 𝐶𝐶𝑡𝑡− are tracked over time, if these quantities are greater than H, then OOC status is signaled. The cumulative sums 𝐶𝐶𝑡𝑡+ and 𝐶𝐶𝑡𝑡− are never negative. The implementation of DP-based SPC is introduced in Appendix I. 3) Application of DP-based SPC for process monitoring - numerical case studies In this section, we show that DP-based SPC can capture changes in the data despite the underlying distribution being asymmetric and multimodal. We compare the results with two conventional control charts, namely, exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) [7]. The traditional control charts monitor the raw data values, while the DP-based SPC uses the average log-likelihood values in (6) within the CUSUM framework in (7). For comparison purposes, we use the following two average run length criteria (ARL) as widely used for performance evaluation of control charts: ARL0 and ARL1 [7]. We now test the hypothesis that DP-based SPC has superior ability (i.e., smaller ARL1) in capturing the changes in incoming data compared to EWMA and CUSUM given identical ARL0. The following three scenarios are investigated: • Case N1: Detecting mean shifts in univariate, unimodal Gaussian and non-Gaussian distributions. • Case N2: Detecting mean shifts in univariate, multimodal non-Gaussian distributions. • Case N3: Detecting shifts in a multivariate, nonlinear, quasi-periodic data from the Rössler chaotic attractor [34]. a)

Case N1: DP-based SPC for data from univariate, unimodal Gaussian and non-Gaussian distributions

The aim of this study is to ascertain the ARL1 performance of DP-based SPC towards detecting a shift in mean (location parameter) of a distribution. Furthermore, we contrast the ARL1 performance of DP-based SPC with that of EWMA and CUSUM control charts. This study is conducted with data generated from two basic univariate distributions: the Gaussian distribution with mean μ

Fig. 3. True pdf, Gaussian components, and approximated distribution by DP mixture model for data generated from (a) ℕ(μ,1) and (b) 𝜒𝜒12 .

Fig. 4. Case N1 results - OOC ARL1 values of different control charts when ARL0 is fixed at 500, and actual IC data generated from (a) ℕ(μ,1) and (b) 𝜒𝜒12. Scale on the y-axis is in natural logarithm.

The out-of-control (OOC) data are obtained by mean shift, ranging from -1.0 to 1.0 with a step of 0.2. The control limit is acquired by adjusting parameter L in (8) to obtain average ARL0 in 5,000 repetitions at 500 under IC condition; the ARL1 values are reported based on 10,000 replications. The average ARL1 results for the EWMA, CUSUM, and DP-based SPC are reported in Fig. 4. The following observations can be tendered from Fig. 4: i. Fig. 4(a): When the normality assumption is not violated, as the case with the Gaussian distribution ℕ(μ,1), CUSUM and EWMA perform better (lower ARL1) than DP-based SPC. ii. Fig. 4(b): If the data is patently non-Gaussian, i.e., the normality condition as in the case of 𝜒𝜒12 is violated, then the ARL1 of DP-based SPC is smaller than EWMA and

6 CUSUM control charts. b)

Case N2: DP-based SPC for data from univariate, multimodal, non-Gaussian distributions

In this case study, the data are obtained from an underlying bimodal distribution consisting of ℕ(10,1) and 𝜒𝜒12 . As evident in Fig. 5, DP mixture model closely approximates the data distribution, which corroborates our assertion that DP mixture model can capture complex distributions. As in the previous case (Case N1) OOC data are obtained by shifting the mean of the data in the range of -1.0 to 1.0 with a step of 0.2. Once again, the control limit is acquired by adjusting parameter L in (8) to obtain average ARL0 of 5,000 repetitions at 500 under IC condition, and ARL1 results from a 10,000-replication study are reported (Fig. 6). It can be inferred from Fig. 6 that under a multimodal distribution, and when the data is patently non-Gaussian and asymmetric, the performance of EWMA and CUSUM is considerably inferior to DP-based SPC; the ARL1 of DP-based SPC is smaller than EWMA and CUSUM. Indeed, the performance of the DP-based SPC is almost identical to Fig. 4 (b), thus further affirming that the DP-based SPC is not influenced by symmetry and modes of the underlying data.

ical system, which exhibits chaotic nonlinear behavior predicated by the choice of three parameters, namely, a, b , and c in (9) [34]. 𝑑𝑑𝑑𝑑 = −𝑦𝑦(𝑡𝑡) − 𝑧𝑧(𝑡𝑡) 𝑑𝑑𝑑𝑑 𝑑𝑑𝑑𝑑 = 𝑥𝑥(𝑡𝑡) + 𝑎𝑎 ⋅ 𝑦𝑦(𝑡𝑡) (9) 𝑑𝑑𝑑𝑑 𝑑𝑑𝑑𝑑 = 𝑏𝑏 + 𝑧𝑧(𝑡𝑡) ⋅ [𝑥𝑥(𝑡𝑡) − 𝑐𝑐] 𝑑𝑑𝑑𝑑 We fix the parameters as follows: a = 0.2, b = 0.2, c = 5. The Rössler system depicts prominent chaotic dynamics under these conditions; the dynamics of the Rössler system has been extensively investigated by Crutchfield et al. [34]. The Rössler attractor state-phase diagram obtained as a result of (9) are shown in Fig. 7(a). We note that, in this simulation, data generated from the Rössler system of (9) are purposely contaminated with Gaussian white noise ℕ(0, 𝜎𝜎 2 𝑰𝑰𝟑𝟑 ), where 𝑰𝑰𝟑𝟑 is the identity matrix of order 3; the effect of variance 𝜎𝜎 2 on ARL1 of DP-based SPC is tested in this case study. Shown in Fig. 7(b) is a sample 1,000 data points from the contaminated Rössler attractor. Next, DP mixture model is used to approximate the data distribution of Rössler contractor using a mixture of multivariate Gaussian distributions. 1,000 new data points are generated from the DP approximated distribution of the contaminated Rössler attractor as shown in Fig. 7(c). It is apparent from Fig. 7(c) that the data generated from the DP approximated distribution resembles closely the data sampled from the contaminated Rössler attractor Fig. 7(b). The Chi-square goodness of fit (GoF) test attests that there is no significant difference between the actual and DP approximated data in Fig. 7(b) and (c), respectively.

Fig. 5. True pdf, Gaussian components and fitted distribution by DP model for data generated from a bimodal distribution consisting of ℕ(10,1) and 𝜒𝜒12 .

Fig. 7. (a) The Rössler attractor delineated in (9), (b) a sample 1000 data points from the Rössler attractor contaminated with white noise ℕ(0, 𝑰𝑰𝟑𝟑 ), (c) a new sample from the approximated distribution of the Rössler attractor.

Fig. 6. Case N2 results - OOC ARL1 values of three SPC methods when ARL0 is 500, and actual IC data are generated from a bimodal distribution consisting of ℕ(10,1) and 𝜒𝜒12 . Scale on the y-axis is in natural logarithm.

c)

Case N3: DP-based SPC for multivariate, nonlinear,

quasi-periodic data Real-world signals customarily portray strong nonlinearity and high dimensionality; such behavior has been observed in several practical instances in manufacturing processes, including CMP [22, 35, 36]. In this case study, we show that DP mixture model can accommodate multidimensional data depicting nonlinear quasi-periodic dynamics [36]. The three-dimensional Rössler system, as delineated in (9), is used in this case study [34]; it consists of three coupled ordinary differential equations to define a continuous-time dynam-

In order to detect the effects of mean and variance shifts, OOC data are generated as follows (the IC state is the data obtained from (9) with white noise ℕ(0,𝑰𝑰𝟑𝟑 ) : • for the mean shift case, OOC data are obtained by translating the original data from (9) in all directions (𝑥𝑥(𝑡𝑡), 𝑦𝑦(𝑡𝑡), 𝑧𝑧(𝑡𝑡)) in the range of 0.5 to 2.5 (step size 0.5); • for variance shifts, the OOC data are obtained by contaminating original data with different levels of Gaussian noise ℕ(0, 𝜎𝜎 2 𝑰𝑰𝟑𝟑 ) with variance (𝜎𝜎 2 ) ranging from 1.5 to 4 (step size 0.5). The ARL0 of the multivariate extension of the EWMA (MEWMA), Hotelling’s T2 multivariate control chart, and DPbased SPC is maintained at 500 to obtain the control limit, and ARL1 is assessed [7]. We use the Hotelling’s T2 instead of the CUSUM, because the Hotelling’s T2 is easier to implement than to extend the CUSUM to the multivariate case, and it is also considered one of the standard multivariate control charts [37].

7

Fig. 8. Case N3 results - OOC ARL1 values of three SPC methods when IC ARL0 is 500; scale on the y-axis is in natural logarithm. (a) ARL1 results for mean shifts, (b) ARL1 detection for data variation 𝜎𝜎 2 .

As in previous cases (Case N1 and N2), the ARL1 results from 10,000 replications are reported for the three control charts. The ARL1 results of the DP-based SPC are compared with Hotelling’s T2 and MEWMA in Fig. 8. It can be inferred, based on the evidence presented in Fig. 8, that the DP-based SPC has significantly smaller ARL1, i.e., DP-based SPC is able to detect data shifts and variability earlier than either of the conventional control charts compared (Hotelling’s T2 and MEWMA) for multivariate, nonlinear, quasi-periodic data. Note on Computational Time: For univariate data (Case N1 and Case N2), the computational time of DP-based SPC is about 0.02ms per data point, and that of EWMA and CUSUM is about 0.005ms per data point; for the complex multivariate data (Case N3), the computational time of DP-based SPC is about 0.2ms per data point, and that of MEWMA and Hotelling’s T2 is about 0.02ms per data point (with Intel® Core™ i7-4770 CPU@ 3.40GHz). Although DP-based SPC is slower than traditional SPC charts, it is fast enough (~50KHz for one-dimensional data, and ~5KHz for three-dimensional multivariate data) to handle many manufacturing processes (e.g., in CMP, the sampling frequency of vibration sensors is ~670Hz), and it is superior in monitoring complex signal data.

a) Recurrent Dirichlet process (RDP) model The basic DP modeling delineated in Sec. III.A assumes that data points (𝑥𝑥1 , 𝑥𝑥2 , … , 𝑥𝑥𝑁𝑁 ) are fully exchangeable. In other words, the autocorrelation of the data is not accounted in the basic DP modeling. This assumption is not detrimental for DPbased SPC, since the IC condition model does not rely on the temporal order of IC data. However, it is critical to overcome this exchangeability limitation for classifying process states from sequential data. Specifically, the exchangeability limitation can be surmounted using the recurrent Dirichlet process (RDP) model proposed by Ahmed and Xing [30], which divides time series data into contiguous sequential epochs (windows); data points within the same epoch are assumed to be exchangeable, while the temporal order is maintained across epochs. Thus, the autocorrelation in consecutive epochs is accounted in the RDP model. In the implementation of RDP, the incoming sensor data is divided using a sliding window technique; the data inside a sliding window is an epoch. Furthermore, the time lag of two consecutive epochs is controlled by arranging the data overlap between the two windows. For instance, if the first epoch consists of a set of data points (𝑥𝑥1 , 𝑥𝑥2 , … , 𝑥𝑥𝑁𝑁−1 ), then the second epoch will consist of data points (𝑥𝑥2 , 𝑥𝑥3 , … , 𝑥𝑥𝑁𝑁 ) by using time lag of one. In RDP model, the data point 𝑥𝑥𝑖𝑖𝑡𝑡 at time epoch t is generated from a distribution with latent variable 𝜃𝜃𝑖𝑖𝑡𝑡 . The conditional distribution of 𝜃𝜃𝑖𝑖𝑡𝑡 can be formulated by using the information from an immediately preceding epoch as follows [30], 𝑡𝑡 𝜃𝜃𝑖𝑖𝑡𝑡 |𝜽𝜽𝑡𝑡−1 , 𝜃𝜃1𝑡𝑡 , … , 𝜃𝜃𝑖𝑖−1 , 𝐺𝐺0 , 𝛼𝛼~ ∙

1 � 𝑁𝑁 𝑡𝑡−1 + 𝑖𝑖 − 1 + 𝛼𝛼

�

𝑗𝑗∈𝐽𝐽𝑡𝑡−1 ∪ 𝐽𝐽𝑡𝑡

𝑡𝑡 �𝑛𝑛∙𝑗𝑗𝑡𝑡−1 + 𝑛𝑛−𝑖𝑖,𝑗𝑗 � 𝛿𝛿�𝜙𝜙𝑗𝑗𝑡𝑡 � + 𝛼𝛼𝐺𝐺0 �

(10)

B. Recurrent hierarchical Dirichlet process (RHDP) for evolutionary clustering of process states For complex manufacturing processes, each process state manifests in unique signal distributions. A control chart cannot classify the differences in process states because the control limits are estimated based on the so-called, in-control state alone. In order to identify the specific process anomalies (drifts), we herewith propose recurrent hierarchical Dirichlet process (RHDP) clustering. And we demonstrate the utility of the RHDP clustering via a simulation-based study.

where 𝜃𝜃𝑖𝑖𝑡𝑡 is the distribution parameter for data 𝑥𝑥𝑖𝑖𝑡𝑡 at time epoch stands for distribution parameters for all data at previous t, 𝜽𝜽𝑡𝑡−1 ∙ time epoch t-1; 𝜙𝜙𝑗𝑗𝑡𝑡 is the distribution parameter value of mixture component 𝑗𝑗 at time epoch 𝑡𝑡 ; 𝛿𝛿( 𝜙𝜙𝑗𝑗𝑡𝑡 ) is the Dirac delta function peaked on 𝜙𝜙𝑗𝑗𝑡𝑡 ; 𝑁𝑁 𝑡𝑡−1 denotes the number of data points at time epoch t-1; 𝑛𝑛∙𝑗𝑗𝑡𝑡−1 is the number of data points associated 𝑡𝑡 with mixture component 𝑗𝑗 at time epoch t-1; 𝑛𝑛−𝑖𝑖,𝑗𝑗 is the number of data points associated with mixture component 𝑗𝑗 at time epoch 𝑡𝑡 with data point 𝑖𝑖 excluded; and 𝐽𝐽𝑡𝑡 includes all the unique mixture components at time epoch t. In accordance with these conditions, in a given epoch t, an observation belonging to component j is proportional to the number of observations in component j at that epoch t, plus the number of observations in component j at previous epoch t-1. Subsequently, the distribution parameters of a particular epoch can be estimated by invoking the RDP model. Component assignments are made by using Gibbs sampling algorithm.

1) Theoretical Development of RHDP model As noted previously at the beginning of Sec. III, there are two elements that are critical towards formulating RHDP model, namely, recurrent Dirichlet process (RDP) model, and hierarchical Dirichlet Process (HDP) model. We will cover these elements in the forthcoming sub-sections, and finally demonstrate the development of the RHDP model.

b) Recurrent hierarchical Dirichlet process (RHDP) model Although we have estimated the data distribution at each time epoch in a sequential manner using RDP, we have not yet considered the evolution of distributions between time epochs. For instance, during the physical process, as depicted in Fig. 9, the following four possibilities exist: 1) the signals dynamics remain stationary (no change);

8 new components may emerge; the parameters of mixture components may change over time; and 4) an existing component may die. Therefore, to classify the process, it is essential to track the evolution of mixture components between time epochs. 2) 3)

We can estimate marginal distributions of mixture component at two levels of DP by integrating out 𝐺𝐺0 and 𝐺𝐺 𝑡𝑡 . The conditional distribution for 𝜃𝜃𝑖𝑖𝑡𝑡 can be calculated by integrating out 𝐺𝐺 𝑡𝑡 as follows, 𝑡𝑡 𝜃𝜃𝑖𝑖𝑡𝑡 |𝜽𝜽𝑡𝑡−1 , 𝜃𝜃1𝑡𝑡 , … , 𝜃𝜃𝑖𝑖−1 , 𝐺𝐺0 , 𝛼𝛼~ .

𝑁𝑁𝑡𝑡−1

Fig. 9. Possible evolutions of data distribution in a physical process.

The concept of hierarchical Dirichlet process (HDP), which is essentially a multiple-level Dirichlet process, as proposed by Teh et al. [31], could be adopted in this context. Unlike with DP, in a two-level HDP model, the parent Dirichlet process 𝐺𝐺0 is a random variable distributed with concentration parameter 𝛾𝛾 and base distribution H. The so-called child Dirichlet processes 𝐺𝐺𝑗𝑗 ′𝑠𝑠 have concentration parameter 𝛼𝛼 and base distribution 𝐺𝐺0 . Since 𝐺𝐺0 is discrete, the child Dirichlet processes 𝐺𝐺𝑗𝑗 ′𝑠𝑠 share atoms (mixture components) with each other. Data distributions parametrized by 𝐺𝐺𝑗𝑗 ′𝑠𝑠 with the same atoms, will have the similar Gaussian components, and therefore could be clustered together [31]. In this way, clustering of data distributions can be achieved by HDP. In order to monitor the evolution in RDP-formulated data distributions between consecutive time epochs, we propose a novel approach, which integrates RDP with HDP, and consequently term the resulting approach as recurrent hierarchical Dirichlet process (RHDP). Given a temporal dataset, RHDP model could be used to monitor the distribution evolution among multiple sequential epochs, subsequently the time epochs with similar distribution characteristics can be grouped/clustered together [31, 38]. The RHDP model is formulated as, 𝐺𝐺0 |𝛾𝛾 ~ 𝐷𝐷𝐷𝐷(𝛾𝛾, 𝐻𝐻)

𝐺𝐺 𝑡𝑡 |𝛼𝛼 ~ 𝐷𝐷𝐷𝐷(𝛼𝛼, 𝐺𝐺0 ) 𝜃𝜃𝑖𝑖𝑡𝑡 |𝐺𝐺 𝑡𝑡 ~ 𝐺𝐺 𝑡𝑡 𝑡𝑡 𝑡𝑡 𝑥𝑥𝑖𝑖 |𝜃𝜃𝑖𝑖 ~ ℕ(∙ |𝜃𝜃𝑖𝑖𝑡𝑡 )

(11)

where 𝑥𝑥𝑖𝑖𝑡𝑡 for 𝑖𝑖 =1,…, 𝑁𝑁 𝑡𝑡 are observations in time epoch t; ℕ(∙ |𝜃𝜃𝑖𝑖𝑡𝑡 ) denotes the Gaussian component parameterized by 𝜃𝜃𝑖𝑖𝑡𝑡 , which is sampled from the child Dirichlet process 𝐺𝐺 𝑡𝑡 . Data in different epochs are modeled by using Gaussian mixture distributions with parameters {𝜃𝜃1𝑡𝑡 … , 𝜃𝜃𝑁𝑁𝑡𝑡 𝑡𝑡 } sampled from 𝐺𝐺 𝑡𝑡 . If the process is stationary, the parameters of the mixture distribution would remain constant. However, if there is a change in the underlying process, entailing a change in the data distribution, the current data distribution will not suit the new data, i.e., the existing parameters drawn from 𝐺𝐺 𝑡𝑡 will not appropriately model the new data. Accordingly, new samples for 𝐺𝐺0 will be drawn from the base function 𝐻𝐻 of parent Dirichlet process.

1 � + 𝑖𝑖 − 1 + 𝛼𝛼

�

𝑗𝑗∈𝐽𝐽𝑡𝑡−1 ∪𝐽𝐽𝑡𝑡

𝑡𝑡 �𝑛𝑛∙𝑗𝑗𝑡𝑡−1 + 𝑛𝑛−𝑖𝑖,𝑗𝑗 � 𝛿𝛿�𝜙𝜙𝑗𝑗𝑡𝑡 � + 𝛼𝛼𝐺𝐺0 �

(12)

where 𝜙𝜙𝑗𝑗𝑡𝑡 represents the distribution parameter of mixture component 𝑗𝑗 at time epoch 𝑡𝑡. Although (12) appears to be exactly the same as (10), they have one significant difference - note that in (12), 𝐺𝐺0 is not fixed, but distributed as Dirichlet process. The subsequent step is to integrate out 𝐺𝐺0 to get the conditional distribution for 𝜙𝜙𝑗𝑗𝑡𝑡 . Since 𝐺𝐺0 is distributed as Dirichlet process, it can be integrated out as follows, 𝑡𝑡 𝜙𝜙𝑗𝑗𝑡𝑡 |𝝓𝝓𝑡𝑡−1 , 𝜙𝜙1𝑡𝑡 , … , 𝜙𝜙𝑗𝑗−1 , H, 𝛾𝛾~ ∙

𝑀𝑀𝑡𝑡−1

1 � + 𝑗𝑗 − 1 + 𝛾𝛾

�

𝑙𝑙∈𝐿𝐿𝑡𝑡−1 ∪ 𝐿𝐿𝑡𝑡

𝑡𝑡 �𝑚𝑚∙𝑙𝑙𝑡𝑡−1 + 𝑚𝑚−𝑗𝑗,𝑙𝑙 � δ(𝜏𝜏𝑙𝑙 ) + 𝛾𝛾𝛾𝛾�

(13)

where 𝜏𝜏𝑙𝑙 denotes a value drawn from base distribution H, 𝑀𝑀𝑡𝑡−1 is the number of all Gaussian components in epoch t-1, 𝑚𝑚∙𝑙𝑙𝑡𝑡−1 is the number of Gaussian components associated with 𝜏𝜏𝑙𝑙 at time 𝑡𝑡 is the number of the Gaussian components exepoch t-1, 𝑚𝑚−𝑗𝑗,𝑙𝑙 cept components j associated with 𝜏𝜏𝑙𝑙 at time epoch 𝑡𝑡, and 𝐿𝐿𝑡𝑡 denotes the collection of samples drawn from H at epoch t. Subsequently, we obtain the posterior probability distributions for both the component values of Dirichlet process 𝐺𝐺0 in (14), and its child Dirichlet process 𝐺𝐺 𝑡𝑡 in (15), 𝑡𝑡 𝑝𝑝�𝜙𝜙𝑗𝑗𝑡𝑡 = 𝑙𝑙�𝝓𝝓𝑡𝑡−1 , 𝜙𝜙1𝑡𝑡 , … , 𝜙𝜙𝑗𝑗−1 , {𝑐𝑐(𝑥𝑥𝑖𝑖𝑡𝑡 ) = 𝑗𝑗}�~ ∙

𝑡𝑡 �𝑚𝑚∙𝑙𝑙𝑡𝑡−1 + 𝑚𝑚−𝑗𝑗,𝑙𝑙 �𝐹𝐹�𝜙𝜙𝑗𝑗𝑡𝑡 �𝜏𝜏𝑙𝑙 � if component 𝑙𝑙 is chosen � 𝛾𝛾𝛾𝛾𝛾𝛾�𝜏𝜏�𝜙𝜙𝑗𝑗𝑡𝑡 � if a new component is created

where 𝑠𝑠 = ∫ 𝐻𝐻(𝜏𝜏)𝐹𝐹�𝜙𝜙𝑗𝑗𝑡𝑡 �𝜏𝜏�𝑑𝑑(𝜏𝜏), 𝑇𝑇(𝜏𝜏|𝜙𝜙𝑗𝑗𝑡𝑡 ) =

𝑡𝑡 𝐻𝐻(𝜏𝜏)𝐹𝐹�𝜙𝜙𝑗𝑗 �𝜏𝜏 � 𝑡𝑡 ∫ 𝐻𝐻(𝜏𝜏)𝐹𝐹�𝜙𝜙𝑗𝑗 �𝜏𝜏 �𝑑𝑑(𝜏𝜏)

(14)

; and

𝐹𝐹�𝜙𝜙𝑗𝑗𝑡𝑡 �𝜏𝜏𝑙𝑙 � is the probability of 𝜙𝜙𝑗𝑗𝑡𝑡 getting the value of 𝜏𝜏𝑙𝑙 , which can be represented by likelihood of all data belonging to component j in the mixture distribution at epoch t (i.e. all data with indicator 𝑐𝑐(𝑥𝑥𝑖𝑖𝑡𝑡 ) = 𝑗𝑗). 𝑡𝑡 𝑝𝑝(𝜃𝜃𝑖𝑖𝑡𝑡 = 𝑗𝑗|𝜽𝜽𝑡𝑡−1 , 𝜃𝜃1𝑡𝑡 , … , 𝜃𝜃𝑖𝑖−1 , 𝑥𝑥𝑖𝑖𝑡𝑡 )~ ∙

𝑡𝑡 (𝑛𝑛𝑡𝑡−1 + 𝑛𝑛−𝑖𝑖,𝑗𝑗 )ℕ�𝑥𝑥𝑖𝑖𝑡𝑡 �𝜃𝜃𝑗𝑗𝑡𝑡 � if component 𝑗𝑗 is chosen (15) � ∙𝑗𝑗 𝑡𝑡 𝛼𝛼𝛼𝛼𝛼𝛼(𝜃𝜃|𝑥𝑥𝑖𝑖 ) if a new component is created

where 𝑞𝑞 = ∫ 𝐺𝐺0 (𝜃𝜃)ℕ(𝑥𝑥𝑖𝑖𝑡𝑡 �𝜃𝜃)𝑑𝑑(𝜃𝜃), 𝑅𝑅(𝜃𝜃|𝑥𝑥𝑖𝑖𝑡𝑡 )=

𝑡𝑡 𝐺𝐺0 (𝜃𝜃)ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃 � . 𝑡𝑡 ∫ 𝐺𝐺0 (𝜃𝜃)ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃 �𝑑𝑑(𝜃𝜃)

In (15), 𝑥𝑥𝑖𝑖𝑡𝑡 represents data observation 𝑖𝑖 during time epoch t; represents the distribution parameter for data 𝑥𝑥𝑖𝑖𝑡𝑡 at time epoch t; 𝜙𝜙𝑗𝑗𝑡𝑡 represents the jth atom value of child Dirichlet process 𝐺𝐺 𝑡𝑡 (i.e. mixture component 𝑗𝑗 at time epoch 𝑡𝑡). If the base distributions H is Gaussian, i.e. it is conjugate with distribution of observations, then the integrals in (14) and (15) have analytical solutions. 𝜃𝜃𝑖𝑖𝑡𝑡

9 RHDP could attain unsupervised clustering of process states by monitoring the change of mixture components among time epochs, i.e. the evolution of sequential data distributions. RHDP clustering includes the following two major steps: 1) RDP modeling is used for sequential process data, which are segregated into sliding windows. Gibbs sampling is adopted to update the data distribution, and Pearson’s Chi-square goodness of fit (GoF) test is used to evaluate the accuracy of distribution modeling. 2) Cluster data of which the mixture distributions are from the same realizations in HDP. The average log-likelihood value of current data under previous distribution is continuously calculated and monitored as follows, 1 log�𝐿𝐿�𝑥𝑥1𝑡𝑡 , … , 𝑥𝑥𝑖𝑖𝑡𝑡 , … , 𝑥𝑥𝑤𝑤𝑡𝑡 �𝜃𝜃1𝑡𝑡−1 , 𝜃𝜃2𝑡𝑡−1 , … , 𝜃𝜃𝑗𝑗𝑡𝑡−1 �� 𝑤𝑤 𝑗𝑗 𝑤𝑤 (16) 1 = � log �� 𝜋𝜋𝑘𝑘 ℕ(𝑥𝑥𝑖𝑖𝑡𝑡 |𝜃𝜃𝑘𝑘𝑡𝑡−1 )� 𝑤𝑤 𝑥𝑥𝑖𝑖𝑡𝑡

𝑖𝑖=1

𝑘𝑘=1

where is the incoming data in epoch t, j is the number of components in epoch t-1. If the average log-likelihood values calculated as (16) remain stable and without significant drop, it indicates that the data in these consecutive windows have the similar distribution, therefore, could be clustering as one process state. This is computationally amenable than tracking the change of mixture components. The implementation of RHDP clustering is introduced in Appendix II. In this way, by tracking the evolution of mixture distributions at consecutive time epochs using RHDP model, process drifts in complex manufacturing processes, such as semiconductor CMP, can be monitored, and different process states (e.g. different anomalies) can be identified. We demonstrate this assertion herewith using a numerical example. 2) RHDP clustering analysis for simulated data in sequential epochs The aim of this case study is to demonstrate the ability of the RHDP clustering to group non-Gaussian, nonstationary sequentially acquired time series data. We show that by using numerically generated data the unsupervised clustering technique of RHDP identifies specific process states contingent on their data distributions. As noted in the preceding section, we continuously monitor the average log-likelihood value of new data as in (16). For a stationary process, the data distribution does not change over time, therefore average log-likelihood values remain stable. If the average log-likelihood values were to change dramatically, it indicates that the current data is not generated from the existing distribution but from a new one. Therefore, all the time epochs preceding the change of log-likelihood values are grouped into the same cluster, given their distribution similarity. Additionally, we note that a transition period between two process states is inevitable, because RHDP model splits the data into time epochs (windows), consequently, some windows will contain data from two temporally adjacent process states. In this study, we define following three mixture distributions from which the data is sequentially generated (Fig. 10(a)): • D1: 𝑥𝑥𝑡𝑡 ~0.5ℕ(0,0.2) + 0.5ℕ(1.5,1) • D2: 𝑥𝑥𝑡𝑡 ~0.2ℕ(0,0.5) + 0.8ℕ(3,0.5)

•

D3: 𝑥𝑥𝑡𝑡 ~0.5ℕ(0,0.5) + 0.5ℕ(2,0.7)

Fig. 10. (a) Generated three-part data from different Gaussian mixture distributions, (b) average log-likelihood values of data in time epochs. Three different shades indicate data from distributions D1, D2 and D3, and the white areas between different parts of data are transition periods.

Referring to Fig. 10(a), the data naturally clusters into three parts, as shaded by different colors. The corresponding average log-likelihood values as estimated using RHDP are shown in Fig. 10(b). The unshaded parts indicate the transition periods. We report results from a ten-replication study, and compare the clustering results from RHDP with mean shift method [23], a frequently used unsupervised clustering method; mean shift uses the raw data as opposed to utilizing average log-likelihood values by RHDP. Since the labels of sequential data are known, in order to evaluate the effectiveness of RHDP in terms of percentage of correctly clustering data, we use the F-score (precision and sensitivity) as the evaluation metric [39]. The higher the F-score, the more accurate the model is. The clustering results are presented in Table I, and it is evident that RHDP clustering has both higher precision and sensitivity compared to mean shift, consequently, the F-score for RHDP clustering is significantly higher than mean shift (98% vs 85%). This is because RHDP utilizes all the characteristics of data distribution to compute average log-likelihood values (see (16)), while mean shift only uses the average values of data. TABLE I F-SCORE RESULTS FOR DATA SERIES WITH THREE DISTRIBUTIONS – COMPARISON OF RHDP CLUSTERING VS. MEAN SHIFT (THE VALUES IN THE PARENTHESIS ARE THE STANDARD DEVIATION) RHDP Clustering D1

D2

D3

Precision 0.9872 0.9784 0.9826 (0.0117) (0.0150) (0.0097) Sensitivity

0.9913 (0.0194)

1 (0)

1 (0)

F-score

0.9892

0.9891

0.9912

Average F-score

0.9898

Mean Shift D1 1 (0)

D2

D3

0.6573 0.9825 (0.2644) (0.0304)

0.9263 0.9889 0.6852 (0.0798) (0.0248) (0.3060) 0.9617

0.7897

0.8074

0.8592

Note on Computational Time: Due to continuous updates to the distribution estimates on sequential data, the computational time of RHDP clustering is about 1.3ms per data point (200point window with 10-point overlap is used in this simulation), and that of mean shift is about 0.03ms per data point (with Intel® Core™ i7-4770 CPU@ 3.40GHz). Still the computational

10 time of RHDP clustering is fast enough (with sampling frequency ~700Hz) for our application in CMP. IV. APPLICATION OF DPGM MODELS FOR ONLINE MONITORING OF CMP PROCESS The aim of this section is to verify the effectiveness of the proposed Dirichlet process Gaussian mixture (DPGM) models in a practical advanced manufacturing scenario, namely, for monitoring a semiconductor chemical mechanical planarization (CMP) process [1, 40]. DPGM models will be used towards attaining two specific goals in the context of CMP: i. Detection of process anomalies using the DP-based SPC as explained in Sec. III.A. ii. Identification of anomalies using RHDP clustering as described in Sec. III.B. A. Experimental Setup CMP is a vital back-end-of-line (BEOL) process in semiconductor manufacturing. Semiconductor wafer defects resulting from CMP process drifts can lead to high yield losses [3]. It is therefore desirable to ensure defect-free operation in CMP by employing real-time in situ sensor-based process monitoring approaches [1]. Various sensors, such as acoustic emission, force, and vibration sensors, have been applied for CMP process monitoring [1, 41-46]. Miniature wireless MEMS devices are particularly attractive for in situ monitoring applications due to their weight, and energy efficiency. MEMS vibration sensors have been successfully used hitherto for model-based monitoring, material removal rate estimation, and endpoint detection in CMP process [45, 46]. In this work, we use a Buehler Automet 250 benchtop CMP apparatus for our experiments. Further details of the setup and experimental outcomes are available in our recent publication (Ref. [1]). A tri-axis MEMS vibration sensor (ADXL 335) manufactured by Analog Devices Inc. is mounted on the apparatus to collect sensor data. The sensor signals are sampled at 670 Hz and transmitted wirelessly to a desktop computer with a matching wireless receiver unit. The CMP setup and wireless sensor network are shown in Fig. 11(a) and (b). Blanket copper wafer disks of Φ1.625 inch (40.625 mm) are polished in KOH-based alkaline colloidal silica slurry medium, which has a constant flow rate of 20ml/min. Near-optical (arithmetic average roughness, Sa ~ 5 nm) quality surface finish blanket copper wafers are obtained by polishing with a priori identified optimal processing conditions (Fig. 11(c) and (d)). However, as described previously in Fig. 1, sensor signals acquired from CMP process are complex; they may violate normality and linearity conditions. Consequently, traditional SPC and mean shift clustering approaches may not lend towards detection of CMP process anomalies as demonstrated in the numerical case studies presented in the foregoing Sec. III. B. CMP Experimental Tests and Case Studies In our experimental tests, certain CMP process parameters are deliberately changed to induce precisely controlled defects on the semiconductor wafer (e.g., scratches on the wafer). The following practical case studies are illustrated in this section.

(1) Case E1 – changes in polishing load or downforce; (2) Case E2 – wear of the polishing pad; and (3) Case E3 – sequential changes in processing conditions. The first two of the above cases are instances where DPbased SPC will be applied for detecting process anomalies; the last case, Case E3, involves identification of specific anomalies using RHDP clustering.

Fig. 11. (a) (b) Buehler Automet® 250 experimental CMP setup with the integrated wireless sensor, (c) (d) near-specular CMP finished copper wafers.

1) Case E1 – Capturing changes in CMP polishing load (downforce) with data slightly violating normal assumption The polishing load is one of the most significant factors in CMP and determines not only physical aspects, such as the nature of tribological contact, but also key process output variables, namely, material removal rate, within wafer non-uniformity, surface quality, etc. [3]. In this experiment, a change in polishing load (downforce) is monitored based on acquired vibration sensor data. As depicted in Fig. 12(a), after a low load (5 lb.) is active for the first half time, the load is suddenly increased to a high load (8 lb.) condition. All other factors, namely, head speed and base speed are maintained constant at 60 RPM and 150 RPM, respectively. We acquire 4000 data points in total, amounting to about 6 seconds of polishing, during which the change of load occurs approximately midway. A visible prominent shift in signal mean, as well as variation, is evident; the signal mean and variation increase with an increase in downforce. CUSUM, EWMA, and DP-based SPC are applied to the same time series data, allowing us to compare their ARL1 results. The control limits are adjusted a priori to maintain identical type I error probabilities (α-error) at 5%, this translates to an ARL0 of 200. The results from a ten-replication study are presented in Fig. 12(b). Moreover, it is observed that the CMP vibration data departs from Gaussian distribution as indicated by the Anderson-Darling goodness-of-fit test. Depending on the severity (p-value) of the non-Gaussian nature of the data distribution, the DP-based SPC charts are faster (low ARL1) compared to the CUSUM and EWMA charts. For instance, referring to Table II, the DP-based SPC detects the change in polishing load within ~ 21ms (14 data points) on average, whereas

11 CUSUM and EWMA require ~ 27ms (18 data points).

Fig. 12. (a) Representative vibration signal patterns obtained under changing load conditions, (b) comparison of ARL1 in changing load conditions.

2) Case E2 – Capturing wear of CMP polishing pad with data severely violating normal assumption Degradation of the polishing pad is caused by wear overtime, selection of sub-optimal process conditions, or improper postprocess handling [1]. For instance, inadequate post-process cleaning allows the residual slurry to dry and coagulate on the pad. Also, some portions of the polishing pad may be sheared away during polishing, thus exposing the underlying hard layer. Such polishing pads are glazed, i.e., the fibers of the polishing pad become entangled and lose the ability to retain slurry abrasives [1]. Polishing with a glazed pad leads to deep scratches and non-uniform wafer morphology [1]. The DP-based SPC aims to detect a degraded pad condition. It is constructed by training the in-control (IC) mixture distribution with operational data using good pads, then applied to monitor CMP runs. This is akin to building a Phase I control chart based on an a priori in-control process state [7]. The degraded pad is treated as the shifted process state. In the case study, in order to verify the efficiency of DPbased SPC in detecting a degraded pad, we combine data from two experiments. The first half of the data (2000 data points, ~ 3 sec) is obtained from an experiment where a new pad is used, while the second half is gathered from an experiment conducted with a glazed pad. DP-based SPC is compared with CUSUM and EWMA in terms of detection of the pad condition change.

CUSUM requires ~ 56ms (37 data points), and EWMA over 140ms (99 data points). In addition, another polishing experiment is conducted to show the effectiveness of DP-based SPC in detecting signal change with a mildly-used pad (neither brand new nor glazed). From Fig. 14(a), it is noticed that there is a slight mean shift of vibration signals after switching to the mildly-used pad. Comparing with the results in Fig. 13(b), the time of detecting pad degradation by DP-based SPC increases to ~ 65ms (43 data points) on average, and CUSUM increases to ~ 71ms (48 data points) on average. Yet DP-based SPC still outperforms EWMA and CUSUM.

Fig. 14. (a) Representative vibration signal patterns obtained for pad degradation experiments, (b) comparison of ARL1 for pad degradation.

The ARL1 results from the foregoing cases are summarized in Table II. The following inferences can be obtained: • If the in-control data slightly deviate from Gaussian distributed as in Case E1, then DP-based SPC detects the process anomalies nearly as quickly as EWMA and CUSUM. • If the normality or symmetry conditions for the data distributions are violated severely as in Case E2, then DP-based SPC significantly outperforms CUSUM and EWMA. These results agree closely with the implications from the numerical studies discussed in Sec. III.A. It is noticed that the relative performance of DP-based SPC against traditional SPC drops with experimental data: while numerical case studies are generated from highly non-Gaussian and/or nonlinear systems (𝜒𝜒12 , bimodal data, or Rössler attractor), the experimental data manifest a modicum of similarity to Gaussian data [1, 21]. TABLE II COMPARISON OF ARL1 VALUES FOR TWO PREDEFINED PROCESS ANOMALIES WITH TRADITIONAL SPC AND DP-BASED SPC. THE UNITS ARE IN MILLISECONDS (MS). CASE DP-SPC EWMA CUSUM

Fig. 13. (a) Representative vibration signal patterns obtained for pad wear experiments, (b) comparison of ARL1 for pad wear.

Fig. 13(a) depicts the vibration time series data gathered under the following CMP conditions: 8 lb. contact load, 150 RPM base speed and 60 RPM head speed. We discern from Fig. 13(a) that, not only does the mean of the vibration signal change, but also the variance of the signal slightly increases. Moreover, compared to the previous in this instance, the vibration signals are found to depart more severely from Gaussian behavior. Therefore, DP-based SPC significantly outperforms the other two methods; it detects the pad wear earlier than CUSUM, and twice as quicker than EWMA control charts. Referring to Table II, the DP-based SPC detects the change in pad wear within ~ 47ms (31 data points) on average, whereas

CASE E1: LOAD CHANGE

21

26

28

PAD WEAR CASE E2: PAD DEGRADATION

47 65

140 102

56 69

3) Case E3 – Identifying multiple sequentially occurring CMP process anomalies using RHDP clustering After having demonstrated the utility of DP-based SPC for detecting process anomalies in the last two cases, we now apply the RHDP unsupervised evolutionary clustering approach explained in Sec. III.B for identifying multiple sequentially occurring faults in CMP. This is important from an application standpoint for CMP process, since if the process is out of control, it is valuable to know what type of anomaly is prevalent so that the appropriate corrective action can be taken. In this study, three different kinds of CMP process operating

12 conditions are sequentially activated during a single experimental run, these are: • The normal condition (C1) occurs under nominally optimal process conditions, viz., 5 lb. polishing load, 150 RPM base speed, and 60 RPM head speed. • Condition C2 occurs after 3 seconds of operation (~2000 data points), the polishing load is increased to 8 lb. (the other settings are maintained at constant) for 3 seconds. • Condition C3 is when the slurry feed is low for 3 seconds (2000 data points) while the polishing load is kept at 8 lb. Vibration signal patterns acquired for this experiment are presented in Fig. 15.

methods under conditions where the sensor signal patterns are nonlinear and non-Gaussian. Practical outcomes from this research are as follows: (1) DP-based SPC detects the onset of CMP process anomalies, such as changes in pad wear, within 50 milliseconds of their inception. In contrast, the traditional methods, such as exponentially weighted moving average (EWMA) control chart has a delay of over 140 milliseconds. (2) RHDP clustering model classifies with about 80% fidelity (F-score) multiple, sequential process drifts; traditional mean shift clustering accounts for F-score under 70%. Consequently, this work addresses one of the significant challenges for process monitoring in ultraprecision manufacturing applications. As part of our future research, we aim to improve DPGM modeling in the following manner: • Increase the accuracy of distribution approximation by using extracted features instead of raw data in DP model. • Improve the computational tractability of RHDP clustering model for high dimensional data by incorporating dimension reduction techniques.

Fig. 15. Vibration data time series for the multiple process states (Case E3), including normal condition (C1), high load (C2) and low slurry (C3).

Comparison between RHDP clustering and mean shift method is also based on F-score (precision and sensitivity) borrowed from classification. The results for Case E3, presented in Table III, indicate that despite the continual change in CMP operating conditions, RHDP clustering identifies the different process states with higher precision and sensitivity compared to the conventional mean shift clustering method. For our three process states, RHDP clustering achieves an average F-score of 0.7923, which is about 10% higher than mean shift method. Moreover, the sampling frequency of RHDP clustering is ~700Hz, faster than the sampling frequency of vibration sensors (~670Hz) in CMP experiments.

ACKNOWLEDGEMENTS One of the authors (SB) also thanks the Rockwell International Chair Professorship at Texas A&M for additional financial support. The authors dedicate this article to the fond memory of Dr. Ranga Komanduri (1942-2011), a scholar and a mentor whose presence would be deeply missed. REFERENCES CITED [1]

[2]

TABLE III CLUSTERING RESULTS FOR MULTIPLE PROCESS STATES IN CMP EXPERIMENT – COMPARISON OF RHDP CLUSTERING VS. MEAN SHIFT (THE VALUES IN THE PARENTHESIS ARE THE STANDARD DEVIATION) RHDP CLUSTERING C1

C2

C3

MEAN SHIFT C1

C2

C3

PRECISION

0.8834 0.7699 0.8181 (0.0746) (0.1942) (0.1696)

0.9765 0.7121 0.7075 (0.0132) (0.1639) (0.1888)

SENSITIVITY

0.7750 0.7822 0.7367 (0.1554) (0.1911) (0.2557)

0.0740 0.5644 0.5422 (0.0548) (0.1679) (0.2253)

F-SCORE AVERAGE F-SCORE

0.8256

0.7760 0.7923

0.7752

0.8419

0.6297

[3]

0.6139

0.6952

V. CONCLUSIONS AND FUTURE RESEARCH In this paper, we devised Dirichlet process (DP)-based statistical process control (SPC) and recurrent hierarchical Dirichlet process (RHDP) unsupervised clustering for sensor-based process monitoring by the concept of Dirichlet process Gaussian mixture (DPGM) modeling. We validated these approaches using numerical simulations and real-world wireless vibration signals acquired from an experimental chemical mechanical planarization (CMP) setup. Based on these studies, it is evident that DP-based SPC and RHDP clustering outperform traditional

[4]

[5]

[6] [7] [8] [9] [10]

[11] [12]

[13]

P. K. Rao, M. B. Bhushan, S. T. Bukkapatnam, Z. Kong, S. Byalal, O. F. Beyca, et al., "Process-Machine Interaction (PMI) Modeling and Monitoring of Chemical Mechanical Planarization (CMP) Process Using Wireless Vibration Sensors," IEEE Transactions on Semiconductor Manufacturing, vol. 27, pp. 1-15, 2014. J. D. Morillo, T. Houghton, J. M. Bauer, R. Smith, and R. Shay, "Edge and bevel automated defect inspection for 300mm production wafers in manufacturing," presented at the 2005 IEEE/SEMI Advanced Semiconductor Manufacturing Conference and Workshop, Munich, Germany, 2005. J. M. Steigerwald, S. P. Murarka, and R. J. Gutmann, Chemical mechanical planarization of microelectronic materials. Weinheim, Germany: Wiley-VCH, 2008. S. Bukkapatnam, S. Kamarthi, Q. Huang, A. Zeid, and R. Komanduri, "Nanomanufacturing systems: opportunities for industrial engineers," IIE Transactions, vol. 44, pp. 492-495, 2012. H. Guo, K. Paynabar, and J. Jin, "Multiscale monitoring of autocorrelated processes using wavelets analysis," IIE Transactions, vol. 44, pp. 312-326, 2012. P. Qiu, Introduction to statistical process control. Boca Raton, FL: CRC Press, 2013. D. C. Montgomery, Introduction to Statistical Quality Control, 6 ed. New York, NY: John Wiley & Sons, 2008. T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, vol. 13, pp. 47-60, 1996. A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM computing surveys (CSUR), vol. 31, pp. 264-323, 1999. D. Comaniciu and P. Meer, "Mean shift analysis and applications," in Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 1999, pp. 1197-1203. C. E. Rasmussen, "The infinite Gaussian mixture model," Advances in Neural Information Processing Systems, vol. 12, pp. 554-560, 2000. S. Chakraborti, P. Van der Laan, and S. Bakir, "Nonparametric control charts: an overview and some results," Journal of Quality Technology, vol. 33, 2001. P. Qiu and Z. Li, "On nonparametric statistical process control of univariate processes," Technometrics, vol. 53, 2011.

13 [14] P. Qiu and Z. Li, "Distribution-free monitoring of univariate processes," Statistics & Probability Letters, vol. 81, pp. 1833-1840, 2011. [15] R. Ganesan, T. K. Das, and V. Venkataraman, "Wavelet-based multiscale statistical process monitoring: A literature review," IIE Transactions, vol. 36, pp. 787-806, 2004. [16] J.-C. Lu, S.-L. Jeng, and K. Wang, "A review of statistical methods for quality improvement and control in nanotechnology," Journal of Quality Technology, vol. 41, pp. 148 - 164, 2009. [17] R. Ganesan, T. K. Das, A. K. Sikder, and A. Kumar, "Wavelet-based identification of delamination defect in CMP (Cu-low k) using nonstationary acoustic emission signal," IEEE Transactions on Semiconductor Manufacturing, vol. 16, pp. 677-685, 2003. [18] M. K. Jeong, J.-C. Lu, and N. Wang, "Wavelet-based SPC procedure for complicated functional data," International Journal of Production Research, vol. 44, pp. 729-744, 2006. [19] G. A. Pugh, "A comparison of neural networks to SPC charts," Computers & Industrial Engineering, vol. 21, pp. 253-255, 1991. [20] F. Zorriassatine and J. Tannock, "A review of neural networks for statistical process control," Journal of Intelligent Manufacturing, vol. 9, pp. 209-224, 1998. [21] P. K. Rao, "Sensor-based monitoring and inspection of surface morphology variations in ultraprecision manufacturing processes," PhD Dissertation, Industrial Engineering and Management, Oklahoma State University, Stillwater, OK, 2013. [22] P. Rao, S. Bukkapatnam, O. Beyca, Z. J. Kong, and R. Komanduri, "Real-Time Identification of Incipient Surface Morphology Variations in Ultraprecision Machining Process," Journal of Manufacturing Science and Engineering, vol. 136, pp. 021008-1 - 021008-11, 2014. [23] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 603-619, 2002. [24] S. W. Choi, J. H. Park, and I.-B. Lee, "Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis," Computers & Chemical Engineering, vol. 28, pp. 1377-1387, 2004. [25] U. Thissen, H. Swierenga, A. De Weijer, R. Wehrens, W. Melssen, and L. Buydens, "Multivariate statistical process control using mixture modelling," Journal of Chemometrics, vol. 19, pp. 23-31, 2005. [26] T. Chen, J. Morris, and E. Martin, "Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 55, pp. 699-715, 2006. [27] A. R. Ferreira da Silva, "A Dirichlet process mixture model for brain MRI tissue classification," Medical Image Analysis, vol. 11, pp. 169-182, 2007. [28] J. Zhang, Z. Ghahramani, and Y. Yang, "A probabilistic model for online document clustering with application to novelty detection," presented at the Advances in Neural Information Processing Systems 17, 2005. [29] P. K. Rao, J. Liu, D. Roberson, Z. Kong, and C. B. Williams, "Online Real-time Quality Monitoring in Additive Manufacturing Processes using Heterogeneous Sensors," Journal of Manufacturing Science and Engineering, vol. 137, p. 061007, 2015. [30] A. Ahmed and E. P. Xing, "Dynamic non-parametric mixture models and the recurrent chinese restaurant process," in Proceedings of the 2008 SIAM International Conference on Data Mining, Atlanta, GA, 2008, pp. 219-230. [31] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical dirichlet processes," Journal of the American statistical Association, vol. 101, pp. 1566-1581, 2006. [32] M. D. Escobar and M. West, "Bayesian Density Estimation and Inference Using Mixtures," Journal of the American Statistical Association, vol. 90, pp. 577-588, 1995. [33] D. Blackwell and J. B. MacQueen, "Ferguson distributions via Pólya urn schemes," The Annals of Statistics, vol. 1, pp. 353-355, 1973. [34] J. Crutchfield, D. Farmer, N. Packard, R. Shaw, G. Jones, and R. Donnelly, "Power spectral analysis of a dynamical system," Physics Letters A, vol. 76, pp. 1-4, 1980. [35] S. Bukkapatnam, P. Rao, and R. Komanduri, "Experimental dynamics characterization and monitoring of MRR in oxide chemical mechanical planarization (CMP) process," International Journal of Machine Tools and Manufacture, vol. 48, pp. 1375-1386, 2008. [36] H. Kantz and T. Schreiber, Nonlinear time series analysis, 2 ed. vol. 7. Cambridge, UK, New York: Cambridge University Press, 2004. [37] C. A. Lowry and D. C. Montgomery, "A review of multivariate control charts," IIE Transactions, vol. 27, pp. 800-810, 1995.

[38] A. Rodriguez, D. B. Dunson, and A. E. Gelfand, "The nested Dirichlet process," Journal of the American statistical Association, vol. 103, pp. 1131-1154, 2008. [39] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Flinders University, Adelaide, Australia, Online SIE-07-001, 2007. [40] P. B. Zantye, A. Kumar, and A. Sikder, "Chemical mechanical planarization for microelectronics applications," Materials Science and Engineering: R: Reports, vol. 45, pp. 89-220, 2004. [41] Z. Kong, A. Oztekin, O. F. Beyca, U. Phatak, S. T. S. Bukkapatnam, and R. Komanduri, "Process performance prediction for chemical mechanical planarization (CMP) by integration of nonlinear bayesian analysis and statistical modeling," IEEE Transactions on Semiconductor Manufacturing, vol. 23, pp. 316 - 327, 2010. [42] H. Jeong, H. Kim, S. Lee, and D. Dornfeld, "Multi-sensor monitoring system in chemical mechanical planarization (CMP) for correlations with process issues," CIRP Annals-Manufacturing Technology, vol. 55, pp. 325-328, 2006. [43] A. Sikder, F. Giglio, J. Wood, A. Kumar, and M. Anthony, "Optimization of tribological properties of silicon dioxide during the chemical mechanical planarization process," Journal of Electronic Materials, vol. 30, pp. 1520-1526, 2001. [44] J. Tang, D. Dornfeld, S. K. Pangrle, and A. Dangca, "In-process detection of microscratching during CMP using acoustic emission sensing technology," Journal of Electronic Materials, vol. 27, pp. 10991103, 1998. [45] Z. Kong, O. Beyca, S. Bukkapatnam, and R. Komanduri, "Nonlinear Sequential Bayesian Analysis-Based Decision Making for End-Point Detection of Chemical Mechanical Planarization (CMP) Processes," IEEE Transactions on Semiconductor Manufacturing, vol. 24, pp. 523532, 2011. [46] U. Phatak, S. Bukkapatnam, Z. Kong, and R. Komanduri, "Sensor-based modeling of slurry chemistry effects on the material removal rate (MRR) in copper-CMP process," International Journal of Machine Tools and Manufacture, vol. 49, pp. 171-181, 2009.

APPENDIX I Step 1

Step 2

Step 3

Step 4

(IMPLEMENTATION OF DP-BASED SPC) Assign conjugate prior to parameter 𝜃𝜃𝑖𝑖 of Gaussian component (mean 𝜇𝜇𝑖𝑖 and variance 𝜎𝜎𝑖𝑖2 ) by 𝐺𝐺0 in DP. 𝜎𝜎𝑖𝑖2 ~𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼ℎ𝑎𝑎𝑎𝑎𝑎𝑎 (𝑆𝑆0 , 𝑛𝑛0 ) 𝜇𝜇𝑖𝑖 |𝜎𝜎𝑖𝑖2 ~𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝜇𝜇0 , 𝑘𝑘0 𝜎𝜎𝑖𝑖2 ) 𝐺𝐺0 = 𝑝𝑝(𝜃𝜃𝑖𝑖 ) = 𝑝𝑝(𝜇𝜇𝑖𝑖 , 𝜎𝜎𝑖𝑖2 ) where 𝑆𝑆0 , 𝑛𝑛0 , 𝜇𝜇0 , 𝑘𝑘0 are hyperparameters. Apply Gibbs sampling to acquire parameters of each Gaussian components from posterior conditional distributions in (5). 𝑛𝑛−𝑖𝑖,𝑗𝑗 ℕ�𝑥𝑥𝑖𝑖 �𝜃𝜃𝑗𝑗 � 𝛼𝛼𝛼𝛼𝛼𝛼(𝜃𝜃|𝑥𝑥𝑖𝑖 ) (𝜃𝜃𝑖𝑖 = 𝑗𝑗|𝜽𝜽−𝑖𝑖 , 𝑥𝑥𝑖𝑖 )~ + 𝑁𝑁 − 1 + 𝛼𝛼

𝑁𝑁 − 1 + 𝛼𝛼

Choose average log-likelihood values as monitoring statistic, and calculate average log-likelihood values 𝑦𝑦𝑡𝑡 for IC data under the trained mixture distribution in (6). 𝑤𝑤

𝑗𝑗

𝑖𝑖=1

𝑘𝑘=1

1 𝑦𝑦𝑡𝑡 = � log �� 𝜋𝜋𝑘𝑘 ℕ(𝑥𝑥𝑖𝑖 |𝜃𝜃𝑘𝑘 )� 𝑤𝑤

Establish the control limit of monitoring statistic under the scheme of CUSUM, and perform anomaly detection. APPENDIX II (IMPLEMENTATION OF RHDP CLUSTERING)

14 Step 1 Step 2

Initialize data distribution by using DP mixture model on the initial window of data. Calculate the average log-likelihood value of data in next window 𝑦𝑦 𝑡𝑡 under current mixture distribution in (16).

Step 3

Step 4

Step 5

𝑤𝑤

𝑗𝑗

𝑖𝑖=1

𝑘𝑘=1

1 𝑦𝑦 = � log �� 𝜋𝜋𝑘𝑘 ℕ(𝑥𝑥𝑖𝑖𝑡𝑡 |𝜃𝜃𝑘𝑘𝑡𝑡−1 )� 𝑤𝑤 𝑡𝑡

Update the mixture distribution on next window of data by using RDP in (15). 𝑡𝑡 (𝜃𝜃𝑖𝑖𝑡𝑡 = 𝑗𝑗|𝜽𝜽𝑡𝑡−1 , 𝜃𝜃1𝑡𝑡 , … , 𝜃𝜃𝑖𝑖−1 , 𝑥𝑥𝑖𝑖𝑡𝑡 ) ∙ 𝑡𝑡 ~(𝑛𝑛∙𝑗𝑗𝑡𝑡−1 + 𝑛𝑛−𝑖𝑖,𝑗𝑗 )ℕ�𝑥𝑥𝑖𝑖𝑡𝑡 �𝜃𝜃𝑗𝑗𝑡𝑡 � + 𝛼𝛼𝛼𝛼𝛼𝛼(𝜃𝜃|𝑥𝑥𝑖𝑖𝑡𝑡 ) Investigate whether consecutive mixture distributions are from the same draws in HDP by detecting a drop in average log-likelihood values. Cluster data sharing similar mixture distributions and achieve different states identification.

Jia (Peter) Liu received the B.S. and M.S. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 2005 and 2007, respectively. He is currently pursuing the Ph.D. degree in industrial and systems engineering at Virginia Tech, Blacksburg, VA, USA. His research interest includes the development of nonparametric Bayesian statistical models and spatial statistical models for online monitoring and heterogeneous sensor data analysis in manufacturing processes. Mr. Liu is a member of the Institute of Industrial Engineers, and the Institute for Operations Research and the Management Sciences. Omer F. Beyca received the B.S. degree from Fatih University and Ph.D. degree from Oklahoma State University in industrial engineering and management. He currently is an Assistant Professor in the Industrial Engineering Department at Istanbul Technical University, Turkey. His research interests include dynamic modeling of nonlinear systems, quality control in micromachining, and sensor-based modeling. Prahalad K. Rao received the B. Eng. degree (First Class) from Victoria Jubilee Technical Institute (VJTI), Bombay University, India, in 2003, and the M.S. and Ph.D. degrees in industrial engineering from Oklahoma State University (OSU), Stillwater, OK, USA, in 2006 and 2013, respectively. He is currently Assistant Professor in the Mechanical and Materials Engineering Department at University of NebraskaLincoln (UNL). Prior to joining UNL, from 2014-2016 he held the position of Assistant Professor, System Science and Industrial Engineering (SSIE) Department at State University of New

York at Binghamton (Binghamton University). His research focuses on sensor-based monitoring of manufacturing processes (ultraprecision diamond turning, chemical mechanical planarization, and additive manufacturing processes); his research has attracted funding from the National Science Foundation. Dr. Rao is a member of Institute of Industrial Engineers (IIE), American Society for Quality (ASQ), and Institute for Operations Research and the Management Sciences (INFORMS). Zhenyu (James) Kong received the B.S. and M.S. degrees in mechanical engineering from the Harbin Institute of Technology, Harbin, China, in 1993 and 1995, respectively, and the Ph.D. degree from the Department of Industrial and System Engineering, University of Wisconsin-Madison, Madison, WI, USA, in 2004. He is currently an Associate Professor with the Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA. Prior to that, he was a Faculty Member with the School of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK, USA. His research focuses on sensing and analytics for advanced manufacturing, and automatic quality control for large and complex manufacturing systems. Dr. Kong received the Halliburton Outstanding Faculty Award from the College of Engineering, Architecture and Technology at the Oklahoma State University in 2013. He is a member of Institute of Electrical and Electronics Engineers, the Institute of Industrial Engineers, and the American Society of Mechanical Engineers. Satish T. S. Bukkapatnam received his Ph.D. degree in industrial and manufacturing engineering from the Pennsylvania State University. He currently serves as Rockwell International Professor with Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, USA. He is also the Director of Texas A&M Engineering Experimentation Station (TEES) Institute for Manufacturing Systems, and has joint appointments with Biomedical and Mechanical Engineering departments. His research addresses the harnessing of high-resolution nonlinear dynamic information, especially from wireless MEMS sensors, to improve the monitoring and prognostics, mainly of ultraprecision and nanomanufacturing processes and machines, and cardiorespiratory processes. His research has led to 133 peer-reviewed publications (74 published/ accepted in journals and 59 in conference proceedings); five pending patents; $5 million in grants as PI/Co-PI from the National Science Foundation, the U.S. Department of Defense, and the private sector; and 10 best-paper/poster recognitions. Prof. Bukkapatnam is a fellow of the Institute for Industrial and Systems Engineers (IISE), and he has been recognized with Oklahoma State University regents distinguished research, Halliburton outstanding college of engineering faculty, IISE Eldin outstanding young industrial engineer and the Society of Manufacturing Engineers Dougherty outstanding young manufacturing engineer awards.

Dirichlet Process Gaussian Mixture

Dirichlet Process Gaussian Mixture

Suggest Documents

A hierarchical Dirichlet process mixture of generalized

Spiked Dirichlet Process Priors for Gaussian Process Models

Fast Bayesian Inference in Dirichlet Process Mixture Models - CiteSeerX

Nonparametric empirical Bayes for the Dirichlet process mixture model

Dirichlet Process Mixture Models on Symmetric Positive Definite ...

Spike Sorting Using Time-Varying Dirichlet Process Mixture Models

Nonparametric empirical Bayes for the Dirichlet process mixture model ...

Active learning for constrained Dirichlet process mixture ... - CiteSeerX

Spike Sorting Using Time-Varying Dirichlet Process Mixture Models

Gaussian mixture models

Adaptive Collaborative Gaussian Mixture

Variational Mixture of Gaussian Process Experts - NIPS Proceedings

Dirichlet Process - Google Sites

Continuous Gaussian Mixture Modeling - CiteSeerX

GAUSSIAN SCALE MIXTURE MODELS FOR

Gaussian Mixture Probability Hypothesis Density

Hierarchical Dirichlet Scaling Process - CiteSeerX

Generalized spatial Dirichlet process models

Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein ...

Iterative Mixture Component Pruning Algorithm for Gaussian Mixture ...

An Improved Gaussian Mixture CKF Algorithm under Non-Gaussian ...

Application of incremental Gaussian mixture ... - Stanford University

Expectation-Maximization Gaussian-Mixture Approximate Message ...

An Unsupervised Gaussian Mixture Classification Mechanism Based ...