Computers & Industrial Engineering 64 (2013) 280–289


Concurrent control chart patterns recognition with singular spectrum analysis and support vector machine †

Liangjun Xie a, Nong Gu b,*, Dalong Li c, Zhiqiang Cao d, Min Tan d, Saeid Nahavandi b

a Schlumberger Limited, 1310 Rankin Road, Houston, TX 77073, USA
b Centre for Intelligent Systems Research, Deakin University, 75 Pigdons Road, Waurn Ponds, VIC 3216, Australia
c Hewlett–Packard, 11445 Compaq Center Dr. West, Houston, TX 77070, USA
d State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China

Article info

Article history: Received 23 January 2012; Received in revised form 6 October 2012; Accepted 8 October 2012; Available online 6 November 2012

Keywords: Control charts; Concurrent patterns; Singular spectrum analysis; Support vector machine

Abstract

Since abnormal control chart patterns (CCPs) are indicators of production processes being out of control, it is a critical task to recognize these patterns effectively based on process measurements. Most methods for CCP recognition assume that the process data suffer from only a single type of unnatural pattern. In reality, the observed process data could be a combination of several basic patterns, which leads to severe performance degradation in these methods. To address this problem, some independent component analysis (ICA) based schemes have been proposed. However, some limitations are observed in these algorithms, such as the lack of the capability to monitor univariate processes with only one key measurement, misclassifications caused by the inherent permutation and scaling ambiguities, and inconsistent solutions. This paper proposes a novel hybrid approach based on singular spectrum analysis (SSA) and support vector machine (SVM) to identify concurrent CCPs. In the proposed method, the observed data are first separated by SSA into multiple basic components, and these separated components are then classified by SVM for pattern recognition. The scheme is suitable for univariate concurrent CCPs identification, and the results are stable since it does not have the shortcomings found in the ICA-based schemes. Furthermore, it has good generalization performance when dealing with small samples. Superior performance of the proposed algorithm is demonstrated in simulations.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

In manufacturing processes, statistical process control (SPC) is one of the most important tools for monitoring process disturbances, equipment malfunctions, and other abnormalities. Control charts, invented by Shewhart (1931), are key components for implementing statistical process control and have been widely used to detect changes in the performance of a manufacturing process by observing variations of quality characteristics (Cheng, Ma, & Bu, 2011; Psarakis, 2011; Yu & Liu, 2011). The basic Shewhart charts are the X and R charts, which plot a sequence of process measurements together with upper and lower control limits. The X chart is used for monitoring the process mean, while the R chart is for the process variance. If the process measurements fall outside the control limits, or a series of sampled values of process variables exhibits an unnatural pattern, the manufacturing process is considered out of control.

† This manuscript was processed by Area Editor Min Xie.
* Corresponding author. Tel.: +61 3 52272268; fax: +61 3 52271046.
E-mail addresses: [email protected] (L. Xie), [email protected] (N. Gu), [email protected] (D. Li), [email protected] (Z. Cao), [email protected] (M. Tan), [email protected] (S. Nahavandi).
http://dx.doi.org/10.1016/j.cie.2012.10.009

In order to pull the process back to normal, it is important to recognize the abnormal control chart patterns (CCPs), since they are normally associated with special causes (Nelson, 1984; Western Electric, 1956). Traditionally, recognition relies on the run rules (Nelson, 1984; Western Electric, 1956) applied by experienced process experts, which may sometimes lead to false or missing alarms (Davis & Woodall, 1998). Hence, developing intelligent methods to automatically identify the abnormal CCPs is in high demand.

Recently, many studies have been conducted to recognize the CCPs (Cheng, 1995; Ebrahimzadeh & Ranaee, 2010; Gauri, 2010; Gauri & Chakraborty, 2007, 2009; Guh & Tannock, 1999; Guh, 2005, 2010; Lin, Guh, & Shiue, 2011; Perry, JK, & Velasco, 2001; Pham & Oztemel, 1992; Pham & Wani, 1997; Purintrapiban & Corley, 2012; Yang & Yang, 2002; Yu, 2011). Most of them are developed to handle a single unnatural process variation, i.e., they assume that the observed process data contain only one basic abnormal CCP. However, the observed process data can be a mixed result of several abnormal patterns that coexist in the manufacturing process. Since the mixed data cannot be represented by any single CCP, the underlying causes cannot be discovered with these approaches, which usually results in serious performance degradation for pattern recognition.


Moreover, identifying concurrent CCPs is more complicated than identifying a single CCP. As a consequence, pattern classifiers with the capability of recognizing concurrent CCPs are more desirable and more challenging to build. Unfortunately, there is limited research on this issue in the literature. The common objective in the concurrent CCPs problem is to recover the basic abnormal patterns from the mixed observation without any knowledge of how they are mixed.

Guh and Tannock (1999) proposed a BP neural network to recognize concurrent CCPs, but it loses the capability to handle single-type CCPs. Moreover, the network complexity and training time increase significantly due to the addition of the concurrent-pattern training samples. Chen, Lu, and Lam (2007) applied the wavelet transform to decompose concurrent CCPs into different levels of basic patterns, and then used BP network classifiers to classify the patterns. Wang, Kuo, and Qi (2007) also explored the idea of using wavelet filtering for decomposition and an ART network for recognition. In the wavelet pyramid structure, each signal is down-sampled and decomposed into an approximation and a detail at the next level; thus, for a three-level wavelet decomposition, the mixed signal is decomposed into multiple signals of different lengths. For example, if the original signal contains a periodic series, its period differs across levels. Consequently, the complexity is significantly increased. Another technical challenge of these wavelet-based methods is to determine the proper decomposition level; because the details (high-frequency components) are discarded, the reconstructed patterns may not be identified correctly.

Independent component analysis (ICA) methods, which have been widely used in fields such as mobile communication (Gu, Xiang, Tan, & Cao, 2007, 2011), are effective methods for recognizing concurrent CCPs. Wang, Dong, and Kuo (2009) designed a scheme that employs an ICA to separate the mixed data and a decision tree (DT) to identify the abnormal CCPs. Lu, Shao, and Li (2011) proposed an ICA–SVM scheme in which the FastICA algorithm (Hyvarinen, 1999) was first applied to decompose the mixed process data; a trained support vector machine (SVM) network was then employed to recognize the pattern of each component. Several shortcomings of this scheme can be noticed. FastICA cannot provide a unique solution, as different initial conditions will result in different solutions. Moreover, these ICA-based methods require the number of observed manufacturing process measurements to be equal to or greater than the number of independent components (ICs). Thus, they cannot successfully monitor univariate processes with only one key measurement. Inherent permutation and scaling ambiguities are also common issues for those approaches (Xiang, Nguyen, & Gu, 2006), which can result in incorrectly estimated signs of the recovered ICs. For concurrent CCPs, a typical example is that up-trend patterns will be identified as down-trend patterns. Hence, these conventional ICA-based methods may be incapable of handling the situation in which one of the original patterns present in the mixture has a counterpart pattern.

In this study, a novel hybrid SSA–SVM scheme is proposed by integrating singular spectrum analysis (SSA) (Takens, 1981) and SVM. The SSA allows decomposition of a mixed signal given only one observation, and thus it is suitable for univariate manufacturing processes.
Compared with the traditional ICA-based methods, the SSA does not introduce any inherent permutation or scaling ambiguity into the ICs; therefore, it has an advantage in the separation of concurrent CCPs. Furthermore, it is easy to reconstruct the ICs by tuning only two parameters. Gu et al. (2012) adopted SSA and learning vector quantization (LVQ) for concurrent CCPs recognition and achieved satisfactory results with reasonably large training sample sets. However, the parameter selection of the SSA has not been discussed in detail. Moreover, it is time consuming and costly to accumulate enough samples to train an LVQ network in practice.


Further tests indicate that the solution of LVQ is not stable, in that different results will be derived even with the same training parameters. These shortcomings inspire us to investigate a new method based on the support vector machine (SVM) for this problem. The SVM is a state-of-the-art technique for classification (Vapnik, 1995) and regression (Xie, Li, & Simske, 2011). It is theoretically well established and has a convex objective function for nonlinear SVM classifiers. As a consequence, the solution to an SVM is global and unique, while neural network approaches suffer from the existence of multiple local-minimum solutions (Burges, 1998). Unlike neural network methods, the SVM approach is not sensitive to dimensionality, since its model complexity does not depend on the number of features, and it is thus suitable for high-dimensional data. Moreover, the SVM has good generalization performance when dealing with small samples and is less prone to overfitting. Another advantage of the SVM is that it has a simple geometric interpretation, optimizing the margin, and gives a sparse solution. It has been shown that the SVM has accuracy and computational advantages over its contenders (Guyon et al., 2002). Li et al. (2004) compared various classification methods for tissue classification based on gene expression and found the SVM to be the best classifier. Based on the features mentioned above, the proposed SSA–SVM scheme should provide an effective alternative for concurrent CCPs recognition.

The remainder of this paper is organized as follows. The basic CCPs and the concurrent CCPs are formulated in Section 2. Section 3 briefly explains the foundations of the SSA and multi-class SVMs, and then details the proposed SSA–SVM scheme for concurrent CCPs recognition. Numerical simulations evaluating the proposed method are presented in Section 4. Finally, conclusions are drawn in the last section.

2. Basic and concurrent CCP patterns

The common patterns appearing in SPC charts have been formalized in several research papers. By observing systematic patterns on control charts, Western Electric (1956) first presented a guideline to identify an out-of-control process, and Nelson (1984) further developed a systematic summary of the out-of-control patterns. Most researchers (Guh & Tannock, 1999; Guh, 2005; Wang et al., 2007; Yang & Yang, 2002; Yu & Hassen, 2010; Yousef, 2004) use the following seven common types of basic CCPs in their research. Six of these patterns indicate unnatural variation, i.e., a process that is out of statistical control. Details of these basic CCPs and their causes are listed below.

Up and Down Trend Patterns (UTP and DTP) are sequences of points that gradually climb or fall on the control chart. The trend patterns on the X and R charts have different meanings. Gradual introduction of a new material, tool wear, operator fatigue, and gradual machine adjustment are typical causes of trends on the X chart. A deteriorating process caused by gradually increasing material variability, deteriorating operator skills, or loosening fixtures is represented by an upward trend on the R chart, while process improvement due to better operator training or a better maintenance program is indicated by a downward trend on the R chart.

Up and Down Shift Patterns (USP and DSP) are composed of consecutive points whose values suddenly jump from one level to another at a certain time index and then remain at the new level afterwards. Shifts can indicate improvements as well as problems in the process. The parts being measured and charted are too large or too small when a shift away from the target line is observed on the X chart. An up shift on the R chart is a trouble indicator of increased production variation; in contrast, a down shift on the R chart indicates process improvement, normally caused by a sudden change in the machine setting or material.


Cyclic Patterns (CP) on a control chart are patterns that repeat periodically on a regular basis, indicating that something is periodically affecting the manufacturing process. Possible causes are the periodic rotation of operators, seasonal or environmental changes, measurement gauge rotation, and power fluctuations in the manufacturing process.

A systematically occurring point-to-point fluctuation is the characteristic of Systematic Patterns (SP): a low point is always followed by a high point and vice versa. Possible causes include differences between test sets and differences between production lines from which the product is sampled in rotation.

Natural Patterns (NP) consist of a random distribution of points on a control chart associated with in-control processes; this natural variation always exists no matter how well the manufacturing process is designed and maintained.

The equations used to generate these basic CCPs in this paper are given in Table 1, where y(k) denotes the sample value of a process measurement at the kth time index and r(k) is a standard normal process. These basic CCPs on the X chart are illustrated in Fig. 1. In this study, we consider the concurrent CCPs as a linear combination of the basic CCPs, which can be represented by the following equation:

x(k) = A s(k)    (1)

where x(k) is the observed vector of measurement variables for the manufacturing process at time index k; it is a linear combination of the y(k) in Table 1. s(k) is the vector of original sources at time k, i.e., any element of s(k) is one of the y(k) in Table 1, and A is the mixing matrix. The common objective in identifying concurrent CCPs is to recover the original sources s from the mixed observations x without any knowledge of A, and then to recognize the abnormal patterns.
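For illustration, the basic CCPs of Table 1 and the linear mixing of Eq. (1) can be generated as in the following sketch. This is an illustrative reconstruction rather than the authors' code; the particular gradient, shift, amplitude, period, and mixing weights below are example choices within the ranges listed in Table 1, and the function name basic_ccp is hypothetical.

```python
import numpy as np

def basic_ccp(pattern, K=40, sigma=1.0, rng=None):
    """Generate one basic CCP of length K, following the equations of Table 1."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, K + 1)
    r = rng.normal(0.0, sigma, K)                 # natural variation r(k) ~ N(0, sigma^2)
    if pattern == "NP":                           # natural pattern
        return r
    if pattern == "UTP":                          # up trend: r(k) + k*g
        return r + 0.075 * sigma * k
    if pattern == "DTP":                          # down trend: r(k) - k*g
        return r - 0.075 * sigma * k
    if pattern == "USP":                          # up shift of magnitude g at P = 20
        return r + 2.0 * sigma * (k >= 20)
    if pattern == "DSP":                          # down shift of magnitude g at P = 20
        return r - 2.0 * sigma * (k >= 20)
    if pattern == "CP":                           # cyclic: amplitude 2*sigma, period T = 16
        return r + 2.0 * sigma * np.sin(2 * np.pi * k / 16)
    if pattern == "SP":                           # systematic: d * (-1)^k
        return r + 2.0 * sigma * (-1.0) ** k
    raise ValueError(f"unknown pattern: {pattern}")

# Concurrent CCP as a linear mixture x(k) = A s(k), Eq. (1).
# Here one observed univariate series mixes a UTP and a CP with unit weights.
rng = np.random.default_rng(0)
s = np.vstack([basic_ccp("UTP", rng=rng), basic_ccp("CP", rng=rng)])
A = np.array([[1.0, 1.0]])                        # 1 x 2 mixing matrix
x = (A @ s).ravel()
```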

Fig. 1. Basic CCPs: (a) NP, (b) CP, (c) UTP, (d) DTP, (e) USP, (f) DSP, and (g) SP.

3. Methodology

3.1. Overview of singular spectrum analysis

SSA is a powerful non-parametric technique for time series analysis, which decomposes a signal into several independent components whose sum is the original signal. In principle, the original signal should contain information about the dynamics of all important variables involved in the evolution of the system. Ruelle (1980) and Takens (1981) suggested that a discrete time series and its successive shifts should be enough to describe the dynamics of the system. The SSA consists of four steps: embedding, singular value decomposition, grouping, and diagonal averaging.

The aim of the embedding step is to construct the trajectory matrix, which contains the complete record of patterns that have occurred within a sliding window of size L; L is referred to as the dimension of embedding in the SSA literature. Given a discrete time series x(k), where k = 1, 2, ..., K, the trajectory matrix X can be constructed as

X = \begin{pmatrix} x(1) & x(2) & \cdots & x(J) \\ x(2) & x(3) & \cdots & x(J+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(L) & x(L+1) & \cdots & x(K) \end{pmatrix}    (2)

where J = K − L + 1. Denote by X_{i,j} the element of X in the ith row and jth column. X is a Hankel matrix, since X_{i−1,j} = X_{i,j−1} for all i, j > 1, and the columns of X are far from linearly independent.

The second step is the singular value decomposition of the covariance matrix S, which is defined as

S = X^{T} X    (3)

S is a real and symmetric matrix with J × J dimensions.

Table 1. Equations and parameters for basic CCPs.

Control chart pattern | Pattern equation | Pattern parameters | Parameter values
Natural Patterns (NP) | y(k) = r(k), r(k) ∈ N(μ, σ²) | mean (μ); standard deviation (σ) | μ = 0; σ = 1
Up Trend Patterns (UTP) | y(k) = r(k) + kg | gradient (g) | g ∈ [0.05σ, 0.1σ]
Down Trend Patterns (DTP) | y(k) = r(k) − kg | gradient (g) | g ∈ [0.05σ, 0.1σ]
Up Shift Patterns (USP) | y(k) = r(k) + a(k)g, with a(k) = 1 if k ≥ P, else a(k) = 0 | shift magnitude (g); shift position (P) | g ∈ [1.5σ, 2.5σ]; P = 20
Down Shift Patterns (DSP) | y(k) = r(k) − a(k)g, with a(k) = 1 if k ≥ P, else a(k) = 0 | shift magnitude (g); shift position (P) | g ∈ [1.5σ, 2.5σ]; P = 20
Cyclic Patterns (CP) | y(k) = r(k) + K sin(2πk/T) | amplitude (K); period (T) | K ∈ [1.5σ, 2.5σ]; T = 8, 16
Systematic Patterns (SP) | y(k) = r(k) + d × (−1)^k | systematic departure (d) | d ∈ [1σ, 3σ]

Denote by λ₁, ..., λ_J the eigenvalues of S in decreasing order of magnitude, by U₁, ..., U_J the eigenvectors corresponding to these eigenvalues, and by N the number of non-zero eigenvalues. X can be decomposed as

X = \sum_{i=1}^{N} X_i    (4)

where each component X_i, i = 1, ..., N, can be determined by

X_i = \sqrt{\lambda_i}\, U_i V_i^{T}, \qquad V_i = X^{T} U_i / \sqrt{\lambda_i}    (5)

Each X_i has rank 1 and is an elementary matrix, i.e., ‖X_i‖² = λ_i. X₁, corresponding to the largest eigenvalue, contributes much more to ‖X‖² than any other individual matrix. Similarly, the last matrix, with the smallest eigenvalue, is insignificant in practice and is usually interpreted as the noise in the signal.

In the grouping step, the N elementary matrices X_i are categorized into several groups and the matrices within each group are summed individually. For example, the set of indices I = {1, ..., N} is partitioned into S disjoint subsets, where the value of S depends on the specific application. Let I₁ = {i₁, i₂, ..., i_M} be the first subset of indices and X_{I₁} = \sum_{i=i_1}^{i_M} X_i be the matrix corresponding to the subset I₁. Repeating the same procedure for the remaining subsets I_t, t = 2, ..., S, leads to the final decomposition of the trajectory matrix as follows:

X = X_{I_1} + \cdots + X_{I_S}    (6)

For a given subset I_i, one can obtain the contribution of the component X_{I_i} from the share of the corresponding eigenvalues, \sum_{i \in I_i} \lambda_i / \sum_{i=1}^{N} \lambda_i.

The last step is the diagonal averaging, which transfers each matrix X_I into a time series. The time series \{\tilde{x}_I(k)\} corresponding to X_I can be obtained by the diagonal averaging algorithm as

\tilde{x}_I(k) = \begin{cases} \frac{1}{k+1} \sum_{f=1}^{k+1} \tilde{X}_{f,\,k-f+2}, & 0 \le k < L^{*}-1 \\ \frac{1}{L^{*}} \sum_{f=1}^{L^{*}} \tilde{X}_{f,\,k-f+2}, & L^{*}-1 \le k < K^{*} \\ \frac{1}{K-k} \sum_{f=k-K^{*}+2}^{K-K^{*}+1} \tilde{X}_{f,\,k-f+2}, & K^{*} \le k < K \end{cases}    (7)

where L* = min(L, J), K* = max(L, J), and \tilde{X} = X_I if L < J and \tilde{X} = X_I^{T} otherwise.
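As a concrete illustration of the four steps above, the following sketch implements a basic SSA decomposition. It is a simplified reconstruction under the assumption that the SVD of the trajectory matrix is used directly (equivalent to the eigen-decomposition of Eqs. (3)-(5)); ssa_decompose and its grouping argument are hypothetical names, with the caller supplying the subsets I_1, ..., I_S.

```python
import numpy as np

def ssa_decompose(x, L, groups):
    """Decompose series x into one reconstructed series per index group.

    x      : 1-D array of length K
    L      : embedding dimension (window length)
    groups : list of lists of eigentriple indices, e.g. [[0], [1, 2, 3, 4], [5]]
    """
    x = np.asarray(x, dtype=float)
    K = x.size
    J = K - L + 1
    # 1) Embedding: L x J trajectory (Hankel) matrix, Eq. (2)
    X = np.column_stack([x[j:j + L] for j in range(J)])
    # 2) Singular value decomposition of the trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    series = []
    for g in groups:
        # 3) Grouping: sum the rank-one matrices X_i within each subset, Eq. (6)
        Xg = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in g)
        # 4) Diagonal averaging: back to a series of length K, Eq. (7)
        y = np.zeros(K)
        for k in range(K):
            f = np.arange(max(0, k - J + 1), min(L, k + 1))   # anti-diagonal rows
            y[k] = Xg[f, k - f].mean()
        series.append(y)
    return series

# Example grouping for L = 6: I1 = {1}, I2 = {2,...,5}, I3 = {6}
# components = ssa_decompose(x, L=6, groups=[[0], [1, 2, 3, 4], [5]])
```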

3.2. Overview of support vector machine

For a set of training samples (x_i, y_i), i = 1, ..., l, the SVM decision function can be written in kernel form as

f(x) = \sum_{i=1}^{l} W_i\, y_i\, k(x_i, x) + b    (9)

where W is the weight vector (the normal vector of the separating hyperplane) and b is a bias term. The traditional linear formulation seeks the hyperplane W^{T} x + b = 0 that separates the data of two classes with maximal margin width 2/‖W‖₂; the points lying on the margin boundaries are called support vectors. The binary classification rule is

q(x) = sign(f(x))    (10)

In a high-dimensional feature space, over-fitting usually occurs. To limit over-fitting, a soft margin and a regularization parameter C are introduced into the objective function, which (Vapnik, 1995) then takes the following form:

\min_{W,b,\xi} \; \frac{1}{2} W^{T} W + C \sum_{i=1}^{l} \xi_i^{p}
\text{s.t. } y_i (W^{T} \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l,

where ξ_i is a non-negative slack variable used to relax the inequalities in the case of non-separable data, and C is a tradeoff balancing the margin against the number of training-set errors. For p = 1 the L1-soft-margin SVM is obtained, and for p = 2 the L2-soft-margin SVM. This soft-margin formulation improves the robustness of the SVM. In the above objective function, ½ W^{T} W is a regularization term that smooths the function W^{T} φ(x_i) + b in order to limit over-fitting; effectively, it constrains the decision function to be as flat as possible, where flatness is measured by the norm W^{T} W. It is more efficient to solve the dual problem

\min_{W} \; -\sum_{i=1}^{l} W_i + \frac{1}{2} \sum_{i,j=1}^{l} y_i\, y_j\, W_i\, W_j\, k(x_i, x_j)
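To make the classification stage concrete, the sketch below trains soft-margin SVMs of the kind formulated above using scikit-learn. This is an assumed implementation detail, since the paper does not state which SVM software was used; the placeholder training data, and the mapping of σ_rbf to scikit-learn's gamma parameter, are illustrative assumptions only.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: one row per separated component (length-40 series)
# and its basic CCP label; in the actual scheme these come from the SSA step.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(70, 40))
y_train = np.repeat(["NP", "UTP", "DTP", "USP", "DSP", "CP", "SP"], 10)

# Soft-margin SVM with a linear kernel and C = 0.03 (the value reported to work
# best with the linear kernel in Section 4).
clf = SVC(kernel="linear", C=0.03)
clf.fit(X_train, y_train)

# Alternative rbf (Gaussian) kernel with C = 30; mapping sigma_rbf = 0.0001 to
# scikit-learn's gamma parameter is an assumption about the kernel convention.
clf_rbf = SVC(kernel="rbf", C=30, gamma=1e-4).fit(X_train, y_train)
```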



Fig. 2. Selection of optimal embedding dimension.

Hence, the optimal embedding dimension L shall be selected as 6. It shall be noted that L elementary matrices will be derived from the second step of the SSA. The second issue relates to allocating these elementary matrices into different subsets, as indicated in the third step of the SSA. For concurrent CCPs with two unnatural patterns, these matrices shall be partitioned into three subsets. The first subset contains only the elementary matrix corresponding to the largest eigenvalue, i.e., I1 = {1}. The second subset I2 is composed of several elementary matrices {2, ..., p}, and the remaining {L − p + 1, ..., L} matrices are grouped into the third subset I3. The first two subsets shall correspond to the two unnatural patterns respectively, while the third one represents the contribution of a noise signal.

In this study, different embedding dimensions and grouping policies were evaluated in terms of the classification accuracy, which is summarized in Table 4. L is selected from the range [5, 7], and the optimal L is 6. It shall be noted that all the results in Table 4 were obtained with the DAGSVM with a linear kernel; the "Accuracy" row represents the average accuracy rate. From Table 4, we find that the average accuracy rate improves with an increasing number of elements in the second group, and that the best average accuracy is achieved with L = 6. Table 4 also reveals that the accuracy rate of the concurrent CCPs mixed from the USP and the CP is lower than that of the other concurrent CCPs. Further investigation shows that 75 samples of the USP are mis-classified as the UTP among the total of 105 mis-classified samples in the 5th column. The possible reason, we believe, is that the shift magnitude of the USP is quite low and close to the slope of the UTP; this tendency is worsened by the random term, and the pattern is thus mis-identified. Similarly, for the concurrent CCPs mixed from the DSP and the CP, 80 samples of the DSP are mis-classified as the DTP among the total of 86 mis-classified samples in the 5th column. These kinds of errors also appear in the ICA–SVM scheme.

Fig. 3 illustrates the performance of the proposed method on the mixture of a UTP pattern and a CP pattern. The original sample data of the concurrent CCPs is shown in Fig. 3a. The sequences recovered by the SSA are plotted in Fig. 3b-d respectively, with L = 6 and I2 = {2, 3, 4, 5}. These series are fed into the trained DAGSVM network with the linear kernel and are classified as the UTP pattern, the CP pattern, and the NP pattern respectively. The result shows that this concurrent pattern is successfully identified by the proposed algorithm. The classification results show that the SSA–SVM scheme can achieve a high recognition accuracy rate for any listed combination.

In order to compare the performance over different kernel functions of the DAGSVM, both the linear and the rbf (Gaussian) kernels are tested, with parameters optimized respectively for L = 6. The results are shown in Table 5, where the two columns correspond to the linear and the rbf (Gaussian) kernels respectively. The best accuracy rate with the rbf kernel is achieved when its tuning parameters are C = 30 and σ_rbf = 0.0001.
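Putting the pieces together, the recognition flow of Fig. 3 can be sketched as follows, reusing the hypothetical ssa_decompose function and trained classifier clf from the earlier snippets; the grouping I1 = {1}, I2 = {2, 3, 4, 5}, I3 = {6} matches the best-performing configuration in Table 4.

```python
# Hypothetical end-to-end use of the SSA-SVM scheme on one observed series x,
# reusing ssa_decompose and the trained classifier clf from the earlier sketches.
components = ssa_decompose(x, L=6, groups=[[0], [1, 2, 3, 4], [5]])
labels = [clf.predict(c.reshape(1, -1))[0] for c in components]
# For a UTP + CP mixture the expected outcome is ['UTP', 'CP', 'NP'], as in Fig. 3.
```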


Table 4. Accuracy rate (%) of the SSA–SVM scheme with the DAGSVM (linear kernel) for concurrent CCPs vs. the subset of the second group.

CCPs | I2 = {2} (L = 6) | I2 = {2, 3} (L = 6) | I2 = {2, 3, 4} (L = 6) | I2 = {2, 3, 4, 5} (L = 6) | I2 = {2, 3, 4} (L = 5) | I2 = {2, 3, 4, 5, 6} (L = 7)
UTP + CP | 84.2 | 91.2 | 92.4 | 92.6 | 90 | 93.2
USP + CP | 67.2 | 75.6 | 78 | 79 | 72 | 73.2
DTP + CP | 88.4 | 92.8 | 98.2 | 94.4 | 99.4 | 99.2
DSP + CP | 89.2 | 81.8 | 94.6 | 82.8 | 89.4 | 93.4
SP + CP | 76 | 93.6 | 83 | 98.4 | 87.4 | 81.4
UTP + SP | 93.8 | 83.2 | 92.2 | 92.6 | 83.8 | 86
USP + SP | 82.2 | 96.4 | 83.6 | 84.8 | 82.6 | 81.4
DTP + SP | 97.4 | 80.8 | 95.2 | 97.6 | 89.8 | 90.6
DSP + SP | 77.2 | 82.4 | 83 | 85.6 | 84.4 | 83.6
Accuracy (%) | 83.9 | 88.1 | 88.9 | 89.31 | 86.5 | 87.1

Fig. 3. Original mixed data of a UTP and a CP and the series separated by the SSA: (a) original mixture of a UTP and a CP, (b) first series recovered by the SSA, (c) second series recovered by the SSA, and (d) third series recovered by the SSA.

Table 5. Accuracy rate (%) of the SSA–SVM scheme with the DAGSVM (L = 6) for the concurrent CCPs vs. different kernels.

CCPs | I2 = {2, 3, 4, 5}, linear kernel | I2 = {2, 3, 4, 5}, rbf (Gaussian) kernel
UTP + CP | 92.6 | 92.2
USP + CP | 79 | 79.8
DTP + CP | 94.4 | 95
DSP + CP | 82.8 | 80.4
SP + CP | 98.4 | 98.4
UTP + SP | 92.6 | 91.8
USP + SP | 84.8 | 86
DTP + SP | 97.6 | 97.8
DSP + SP | 85.6 | 85.8
Accuracy (%) | 89.31 | 89.24

Table 6. Results of the SSA–SVM scheme with the DAGSVM (L = 6) for different levels of the parameters of USP + CP with the linear kernel.

(g, K) | USP + CP | UTP + CP | USP + NP | UTP + NP | CP + NP | NP
(1.5, 1.5) | 317 | 60 | 44 | 8 | 47 | 24
(1.5, 2.0) | 354 | 76 | 5 | | 65 |
(1.5, 2.5) | 360 | 79 | 1 | | 60 |
(2.0, 1.5) | 370 | 80 | 37 | 10 | 1 | 2
(2.0, 2.0) | 424 | 69 | 1 | 1 | 5 |
(2.0, 2.5) | 407 | 81 | | | 12 |
(2.5, 1.5) | 352 | 82 | 56 | | | 10
(2.5, 2.0) | 418 | 79 | 3 | | |
(2.5, 2.5) | 432 | 67 | 1 | | |
Accuracy (%) | 76.3 | | | | |

Table 5 shows that both kernels produce a satisfying average accuracy rate and that the differences between them are quite small. The advantage of the linear kernel over the rbf kernel is that it has only one parameter, C, to be tuned; when C = 0.03, the linear kernel generates the best result. All the above experimental results indicate the overall performance for the concurrent CCPs generated with the settings in Table 1.

The following experiments were conducted to evaluate the performance of the proposed method for different levels of the pattern parameters. For simplicity, the case of a concurrent CCP mixed from the USP and the CP is considered. The USP has two parameters: the shift magnitude g and the shift position. For the CP, the parameters are the amplitude K and the period.

Both g of the USP and K of the CP lie in the range [1.5σ, 2.5σ]. Without loss of generality, g and K were chosen for testing. Here, both g and K take three values: 1.5σ, 2.0σ, and 2.5σ. 500 testing samples were generated for each fixed g of the USP and each fixed K of the CP. Table 6 summarizes the performance of our method for different g and K. The first column (g, K) shows the combination of g and K used to generate the 500 concurrent CCP samples. The second column is the total number of samples for which both the USP and the CP were correctly identified by the proposed approach. The third column shows the total number of samples for which the USP was misidentified as the UTP while the CP was identified correctly. The remaining columns report the other cases of mis-classification. From Table 6, it is easy to see that when K or g is small, the signal-to-noise ratio is small, and the patterns are thus more likely to be mis-classified.
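In outline, the test data for this parameter-level experiment can be generated as below; the 500 samples per (g, K) combination follow the description above, while the generator itself is an illustrative sketch, not the authors' code.

```python
import numpy as np

# Generate the 500 concurrent USP + CP test samples for every (g, K) level;
# each set would then be passed to the SSA separation and DAGSVM classification.
rng = np.random.default_rng(2)
k = np.arange(1, 41)
levels = [1.5, 2.0, 2.5]                       # in units of sigma (sigma = 1)

test_sets = {}
for g in levels:                               # shift magnitude of the USP
    for K_amp in levels:                       # amplitude of the CP
        usp = rng.normal(size=(500, 40)) + g * (k >= 20)
        cp = rng.normal(size=(500, 40)) + K_amp * np.sin(2 * np.pi * k / 16)
        test_sets[(g, K_amp)] = usp + cp       # Eq. (1) with A = [1, 1]
```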

4.4. Implementing the SSA–SVM scheme for single CCP recognition

The proposed hybrid approach not only works well for the concurrent CCPs, but also works well when the input data sequence contains only one basic pattern. In this case, A equals [1 0]. Using the same DAGSVM classifier specified in Section 3.3 with the linear kernel, satisfactory results can be achieved for a single basic pattern. It shall be noted that most existing methods for concurrent CCPs identification are not robust enough to work for both cases. The following experiments were conducted to assess the robustness of the proposed scheme for a single CCP. All the 500 samples of each basic pattern generated for testing in Section 4.3 were fed into the proposed approach. The linear kernel was chosen, the embedding length was L = 6, and the indices in the second group were I2 = {2, 3, 4, 5}. The results are shown in Table 7.

Table 7. Results of the SSA–SVM scheme with the DAGSVM (L = 6) for the single CCP with the linear kernel.

Input | Correct identi. | NP | UTP | DTP | USP | DSP | CP | SP | Type I error (%) | Type II error (%)
NP | 500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
UTP | 491 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | | 1.8
DTP | 500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0
USP | 446 | 0 | 54 | 0 | 0 | 0 | 0 | 0 | | 10.8
DSP | 435 | 0 | 0 | 65 | 0 | 0 | 0 | 0 | | 13
CP | 487 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | | 2.6
SP | 500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0
Accuracy (%) | 95.97 | | | | | | | | |

Table 8. Accuracy (%) of the LVQ2.

Index of experiment | 500 training samples | 200 | 100 | 50
1 | 70.29 | 92.89 | 78.49 | 69.83
2 | 84.66 | 70.26 | 89.91 | 72.03
3 | 97.43 | 94 | 67.74 | 66.4
4 | 71.34 | 71.03 | 77.86 | 58.11
5 | 84.62 | 79.69 | 80.57 | 72.89
6 | 97.54 | 70.23 | 78.6 | 76.77
7 | 84.74 | 93.46 | 68 | 73.71
8 | 84.74 | 93.91 | 89.17 | 70.83
9 | 84.74 | 70.23 | 74.94 | 73.11
10 | 70.4 | 82.34 | 76.83 | 71.89
Worst | 70.29 | 70.23 | 67.74 | 58.11
Best | 97.54 | 94 | 89.91 | 76.77

The first column gives the input basic pattern, while column 2 ("Correct identi.") is the number of samples correctly identified out of the 500 testing samples. The other columns list the numbers of samples that have been mis-identified. From Table 7, we find that the biggest error comes from mis-classifying the DSP as the DTP, followed by recognizing the USP as the UTP. For the CP, when the amplitude is small, it tends to be mis-identified as the NP. Define the Type I error as the frequency with which an unnatural pattern is detected although none is present, and the Type II error as the frequency with which an unnatural pattern is present but not detected. Table 7 also reports these two types of errors. The proposed approach does not produce any Type I error on this data set; the Type II errors of the unnatural patterns are listed in the table. In short, the proposed approach also works well when there is only one basic CCP. Furthermore, each unnatural pattern has an additive Gaussian noise term, which can be regarded as an NP, and the sum of two independent Gaussians can be considered as one NP. Thus, the combination of an NP and another unnatural CCP can be evaluated by identifying the basic unnatural CCP with the proposed method.
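Given a confusion table such as Table 7, the two error rates defined above can be computed as in the following small sketch; the counts shown are the NP and USP rows of Table 7.

```python
# Type I error: an unnatural pattern is detected although none is present (NP input).
# Type II error: an unnatural pattern is present but not detected as that pattern.
np_row = {"NP": 500, "UTP": 0, "DTP": 0, "USP": 0, "DSP": 0, "CP": 0, "SP": 0}
usp_row = {"NP": 0, "UTP": 54, "DTP": 0, "USP": 446, "DSP": 0, "CP": 0, "SP": 0}

type_i = 1 - np_row["NP"] / sum(np_row.values())           # 0.0, as in Table 7
type_ii_usp = 1 - usp_row["USP"] / sum(usp_row.values())   # 54/500 = 0.108 (10.8%)
```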

4.5. LVQ vs. SVM

Gu et al. (2012) applied LVQ to classify concurrent CCPs, where both LVQ1 and LVQ2 were investigated for this problem. Simulation results indicate that LVQ2 outperforms LVQ1 marginally. However, as mentioned in Section 1, LVQ requires large sample sets to achieve satisfactory results. In practice, such a requirement may be too restrictive, as collecting enough qualified samples is a time-consuming and costly process. Moreover, LVQ suffers from converging towards a local optimum of a non-convex objective. These shortcomings can be mitigated by applying the SVM, an argument supported by the simulation results presented in this subsection. To make a fair comparison, LVQ2 and the DAGSVM were considered. The LVQ2 network was implemented with the best settings given by Gu et al. (2012).

Table 9. Accuracy (%) of the DAGSVM.

Num. of training samples | 500 | 200 | 100 | 50
Accuracy (%) | 98.31 | 98 | 97.49 | 96.83

Table 10. Identification accuracy (%) of different methods for concurrent CCPs.

Method | 500 training samples | 200 | 100 | 50
SSA + LVQ2 | 89.08 | 82.54 | 78.4 | 76.71
SSA + DAGSVM | 89.31 | 88.62 | 87.35 | 86.6

The DAGSVM network with a linear kernel and C = 0.03 was employed, which generated the best performance among the tests in Section 4.3. Ten independent experiments were conducted with training, validation and testing data sets, and these data sets were used by both the LVQ2 and the SVM methods. In each experiment, the number of training samples varies from 500 down to 50. Although the training samples differ across experiments, the number of testing samples is the same (500 samples for each single CCP). Tables 8 and 9 summarize the training performance of LVQ2 and the SVM respectively; the rows Best and Worst reveal the spread of the recognition rate of LVQ2 over the 10 experiments.

Several observations are in order. First, the training results of LVQ are not consistent across these independent tests, which implies that one has to train LVQ several times to select the best network. Unlike LVQ, the SVM always produces the same result, owing to the unique solution of its convex objective function (Burges, 1998); this advantage relieves the burden of network training. Second, the performance of LVQ degrades significantly with a decreasing number of training samples. When the number of training samples of each CCP pattern drops from 500 to 50, the recognition accuracy of LVQ2 drops from 97.54% to 76.77%, while the accuracy of the DAGSVM remains almost the same. This small change demonstrates that the SVM has good generalization performance when dealing with small samples and is less prone to overfitting.

Table 10 presents the identification results of applying LVQ2 and the DAGSVM to concurrent CCPs. 4500 samples were generated for the nine concurrent CCPs (500 samples for each category) from the seven basic CCPs. It shall be noted that only the best trained LVQ2 networks listed in Table 8 were considered; for example, when the training dataset has 500 samples, the 6th trained network in Table 8 was selected. Unsurprisingly, the identification accuracy of SSA + LVQ decreases significantly with fewer training samples. Clearly, SSA + DAGSVM enjoys more robustness against the varying number of training samples, as its accuracy only drops from 89.31% to 86.6%.
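A minimal outline of this sample-size experiment is sketched below; the training-set sizes match Tables 8-10, but the data here is a random placeholder (a real run would use CCP samples generated as in Section 2), so it only illustrates the mechanics, not the reported accuracies.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
patterns = ["NP", "UTP", "DTP", "USP", "DSP", "CP", "SP"]

def make_set(n_per_class):
    """Random placeholder data; a real run would draw CCP samples as in Section 2."""
    X = rng.normal(size=(n_per_class * len(patterns), 40))
    y = np.repeat(patterns, n_per_class)
    return X, y

X_test, y_test = make_set(500)                 # 500 test samples per CCP
for n_train in (500, 200, 100, 50):            # training-set sizes of Tables 8-10
    X_tr, y_tr = make_set(n_train)
    acc = SVC(kernel="linear", C=0.03).fit(X_tr, y_tr).score(X_test, y_test)
    print(n_train, round(100 * acc, 2))
```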


5. Conclusion

In the manufacturing process, the observed process data can exhibit a combination of different basic patterns simultaneously. Since each unnatural CCP is associated with certain causes of an out-of-control process, it is highly desirable to develop an efficient system that recovers the original sources from the observed process measurements and identifies them correctly. In this paper, a hybrid scheme for the concurrent CCPs was proposed by integrating an SSA and an SVM. The proposed scheme first applies an SSA to the mixture patterns to recover the components of the basic CCP series. After comparing various SVM-based multi-class classifiers for the basic CCPs, the DAGSVM model is applied to each component for pattern recognition.

Several advantages of the proposed method can be observed. First, the proposed scheme can be applied to a univariate process, while the ICA–SVM scheme needs the mixing matrix A to have at least as many rows as columns, i.e., at least as many measurements as components in the mixed CCPs. Furthermore, unlike the ICA, the SSA does not suffer from the permutation and scaling ambiguities. Moreover, the SSA is easy to implement by tuning the embedding dimension L and the grouping sets. Experimental results show that the proposed approach achieves an average accuracy rate of 89.3% on the tested data set for the nine concurrent CCPs, and it significantly outperformed the comparable ICA–SVM scheme. Other experiments on single basic CCPs and on different levels of the parameters for concurrent CCPs also confirmed the efficiency of the proposed scheme. According to the simulation results, we conclude that the proposed scheme can not only effectively identify the concurrent CCPs but also works for a single basic pattern. Finally, a series of experiments confirms that the SVM can achieve more robust identification than LVQ, especially when the number of training samples is limited.

The SSA requires the components to be uncorrelated in order to separate them successfully. This restriction makes the proposed scheme infeasible for CCPs mixed from two correlated basic CCPs, such as the USP and the UTP. This limitation also applies to the ICA–SVM scheme. Further investigation should be done to address this limitation.

Acknowledgements

The authors would like to thank the anonymous referees for their constructive comments, which improved the quality of this paper significantly. This research was supported under the Australian Research Council's Linkage Projects funding scheme (project number: LP110200364) and the National Natural Science Foundation of China (project numbers: NSFC61273352, 61227804). The corresponding author would also like to acknowledge the support of the Endeavour Australia Cheung Kong Research Fellowship Programme.

References

Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 955–974.
Cao, L. Y. (1997). Practical method for determining the minimum embedding dimension of a scalar time series. Physica D, 110(1–2), 43–50.
Cheng, C. (1995). A multi-layer neural network model for detecting changes in the process mean. Computers & Industrial Engineering, 28(1), 51–61.
Cheng, Z., Ma, Y., & Bu, J. (2011). Variance shifts identification model of bivariate process based on LS–SVM pattern recognizer. Communications in Statistics – Simulation and Computation, 40(2), 286–296. http://dx.doi.org/10.1080/03610918.2010.535625.
Chen, Z., Lu, S., & Lam, S. (2007). A hybrid system for SPC concurrent pattern recognition. Advanced Engineering Informatics, 21(3), 303–310.
Davis, R. B., & Woodall, W. (1998). Performance of the control chart trend rule under linear shift. Journal of Quality Technology, 20(4), 260–262.
Ebrahimzadeh, A., & Ranaee, V. (2010). Control chart pattern recognition using an optimized neural network and efficient features. ISA Transactions, 49(3), 387–393.

Franc, V., & Hlavac, V. (2002). Multi-class support vector machine. In 16th international conference on pattern recognition (Vol. 2, pp. 236–239).
Gauri, S. (2010). Control chart pattern recognition using feature based learning vector quantization. International Journal of Advanced Manufacturing Technology, 48(9–12), 1061–1073.
Gauri, S. K., & Chakraborty, S. (2007). A study on the various features for effective control chart pattern recognition. International Journal of Advanced Manufacturing Technology, 34(3), 385–398.
Gauri, S. K., & Chakraborty, S. (2009). Recognition of control chart patterns using improved selection of features. Computers & Industrial Engineering, 56, 1577–1588.
Gu, N., Cao, Z., Xie, L., Creighton, D., Tan, M., & Nahavandi, S. (2012). Identification of concurrent control chart patterns with singular spectrum analysis and learning vector quantization. Journal of Intelligent Manufacturing. http://dx.doi.org/10.1007/s10845-012-0659-0.
Guh, R. (2005). A hybrid learning-based model for on-line detection and analysis of control chart patterns. Computers & Industrial Engineering, 49(4), 35–62.
Guh, R. (2010). Simultaneous process mean and variance monitoring using artificial neural networks. Computers & Industrial Engineering, 58(4), 739–753.
Guh, R., & Tannock, J. (1999). A neural network approach to characterize pattern parameters in process control charts. Journal of Intelligent Manufacturing, 10(5), 449–462.
Gu, N., Nahavandi, S., Yu, W., & Creighton, D. (2011). Family of blind source separation methods based on generalised constant modulus criterion. Electronics Letters, 47(10), 595–597.
Gu, N., Xiang, Y., Tan, M., & Cao, Z. (2007). A new blind equalization algorithm for an FIR SIMO system driven by MPSK signal. IEEE Transactions on Circuits and Systems II – Express Briefs, 54(3), 227–231.
Guyon, I. et al. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13, 415–425.
Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626–634.
Keerthi, S. S., Shevade, S. K., Bhattacharya, C., & Murthy, K. R. K. (2000). A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Transactions on Neural Networks, 11(1), 124–136.
Knerr, S., Personnaz, L., & Dreyfus, G. (1990). Single-layer learning revisited: A stepwise procedure for building and training a neural network. In Fogelman-Soulie & Herault (Eds.), Neurocomputing: Algorithms, architectures and applications. Springer.
Kressel, U. (1999). Pairwise classification and support vector machines. In B. Scholkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in kernel methods: Support vector learning (pp. 255–268). Cambridge, MA: MIT Press.
Li, T. et al. (2004). A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics, 20, 2429–2437.
Lin, S., Guh, R., & Shiue, Y. (2011). Effective recognition of control chart patterns in autocorrelated data using a support vector machine based approach. Computers & Industrial Engineering, 61(4), 1123–1134.
Lu, C. J., Shao, Y. E., & Li, P. H. (2011). Mixture control chart patterns recognition using independent component analysis and support vector machine. Neurocomputing. http://dx.doi.org/10.1016/j.neucom.2010.06.036.
Nelson, L. S. (1984). The Shewhart control chart tests for special causes. Journal of Quality Technology, 16(4), 237–239.
Penland, C., & Weickmann, K. M. (1991). Adaptive filtering and maximum entropy spectra with application to changes in atmospheric angular momentum. Journal of Geophysical Research, 22, 659–671.
Perry, M., JK, S., & Velasco, T. (2001). Control chart pattern recognition using back propagation artificial neural networks. International Journal of Production Research, 39(15), 3399–3418.
Pham, D., & Oztemel, E. (1992). Control chart pattern recognition using neural networks. Journal of Systems Engineering, 2(4), 256–262.
Pham, D., & Wani, M. A. (1997). Feature-based control chart recognition. International Journal of Production Research, 35(7), 1875–1890.
Platt, J. C., Shawe-Taylor, J., & Cristianini, N. (2000). Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, & K. R. Muller (Eds.), Advances in neural information processing systems (pp. 547–553). MIT Press.
Psarakis, S. (2011). The use of neural networks in statistical process control. Quality and Reliability Engineering International, 27(5), 641–650. http://dx.doi.org/10.1002/qre.1227.
Purintrapiban, U., & Corley, H. W. (2012). Neural networks for detecting cyclic behavior in autocorrelated process. Computers & Industrial Engineering, 62(4), 1093–1108.
Ruelle, D. (1980). Strange attractors. The Mathematical Intelligencer, 2(1), 126–137.
Seo, N. (2007). A comparison of multi-class support vector machine methods for face recognition.
Shewhart, W. A. (1931). Economic control of quality of manufactured product. New York: D. Van Nostrand Company.
Takens, F. (1981). Detecting strange attractors in turbulence. In Dynamical systems and turbulence (pp. 366–381). New York: Springer-Verlag.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Wang, C., Dong, T., & Kuo, W. (2009). A hybrid approach for identification of concurrent control chart patterns. Journal of Intelligent Manufacturing, 20(4), 409–419.

Wang, C., Kuo, W., & Qi, H. (2007). An integrated approach for process monitoring using wavelet analysis and competitive neural network. International Journal of Production Research, 45(1), 227–244.
Western Electric (1956). Statistical quality control handbook. Indianapolis: Western Electric Corporation.
Weston, J., & Watkins, C. (1998). Multi-class support vector machines. Technical report CSD-TR-98-04.
Xiang, Y., Nguyen, V., & Gu, N. (2006). Blind equalization of nonirreducible systems using the CM criterion. IEEE Transactions on Circuits and Systems II – Express Briefs, 53(8), 758–762.
Xie, L., Li, D., & Simske, S. J. (2011). Feature dimensionality reduction for example-based image super-resolution. Journal of Pattern Recognition Research, 2, 130–139.
Yang, M., & Yang, J. (2002). A fuzzy-soft learning vector quantization for control chart pattern recognition. International Journal of Production Research, 40(12), 2721–2731.


Yousef, A. (2004). Recognition of control chart patterns using multiresolution wavelets analysis and neural networks. Computers & Industrial Engineering, 47, 17–29.
Yu, J. (2011). Pattern recognition of manufacturing process signals using Gaussian mixture models-based recognition systems. Computers & Industrial Engineering, 61(3), 881–890.
Yu, T., & Hassen, J. (2010). Automatic beamforming for blind extraction of speech from music environment using variance of spectral flux-inspired criterion. IEEE Journal of Selected Topics in Signal Processing, 4(5), 785–797.
Yu, J., & Liu, J. (2011). LRProb control chart based on logistic regression for monitoring mean shifts of auto-correlated manufacturing processes. International Journal of Production Research, 49(28), 2301–2326. http://dx.doi.org/10.1080/00207541003694803.
