© Heldermann Verlag ISSN 0940-5151
Economic Quality Control Vol 16 (2001), No. 1, 5 – 15
ARL Comparisons Between Neural Network Models and x̄-Control Charts for Quality Characteristics that are Nonnormally Distributed
Junsub Yi, Victor R. Prybutok and Howard R. Clayton
Abstract: One widely used control chart, the x̄-chart, is based on the assumption that means of samples drawn from the process are normally distributed. When the normality assumption is not valid, control chart users may choose from several different courses of action. These include using Box-Cox power transformations on the original data to yield an approximate normal distribution, increasing the size of the samples drawn from the process until the distribution of the sample means is considered normal, and modifying the x̄-chart to employ asymmetric control limits instead of limits that are equidistant from the process target mean. Since none of the remedies for handling nonnormal processes is completely satisfactory, we build on previous neural network research by developing a neural network to control nonnormal processes. Comparison of the performance of our neural network model with that of traditional x̄-control charts shows that the neural network model is superior to the traditional x̄-control charts.
Key Words: Statistical Quality Control, Statistical Process Control, Control Charts, Neural Networks, Simulation
1 Introduction
The control chart was introduced in 1925 by Shewhart in the Journal of the American Statistical Association. Since then, control charts have become a powerful tool to monitor process means and variances [12]. When used appropriately, control charts help determine if processes are “in control” or “out of control.” A process found to be out-of-control can be corrected, thereby reducing the variability of the process means [1]. One of the most widely used control charts, the x̄-chart, is based on the assumption that the distribution of the means of samples drawn from the process is sufficiently close to a normal distribution. However, in many industrial processes, this assumption is invalid. The nonnormality of sample means can have a significant effect on the performance of the x̄-chart [21]. If the x̄-chart is used when the normality assumption is invalid, the chance of wrongly interfering with an in-control process or allowing an out-of-control process to go uncorrected can be significant. Consequently, design considerations for the x̄-chart must include recognition of the degree of nonnormality of the underlying data [18, 23].
When the normality assumption is not valid, control chart users may choose from several different courses of action. Rigdon [19] suggested using Box-Cox power transformations on the original data to yield an approximate normal distribution. The usual charts could then be used on the transformed data. The shortcomings of the transformation approach are first, the difficulties in determining the appropriate transformation, and second, when a transformation has been chosen, the difficulties in implementing the procedure and justifying the transformation [13, 23]. Yourstone and Zimmer [23] suggested another course of action that requires increasing the size of the samples drawn from the process until the distribution of the sample means is considered normal. However, large sample sizes are often not operationally feasible and can be costly. A third course of action is to modify the x̄-chart. One way to modify the chart is to employ asymmetric control limits instead of limits that are equidistant from the process target mean. The difficulty with this approach is that it requires at least a good approximation of the underlying distribution of the process data. The cumulative sum (CUSUM) control chart is an alternative to the x̄-chart for controlling nonnormal data [9]. Despite its usefulness in detecting small shifts in the process mean, the CUSUM chart has been found to be inferior to the x̄-chart in detecting relatively large shifts. In general, none of the remedies for handling nonnormal processes has been found to be completely satisfactory. Previous researchers have developed economic designs for x̄, CUSUM, and asymmetric control charts. For example, the economic design of an x̄-chart entails determining the optimal sample size, sampling interval, and control limit coefficients that minimize the costs associated with designing the chart [5, 9, 17].
As in the case of the previously mentioned remedies, none of these designs is uniformly satisfactory for dealing with the nonnormality problem. Several authors, including [7, 10, 14, 15, 16], have used neural networks to monitor process means when data are normally distributed. We build on the previous research by developing a neural network to control nonnormal processes. We compare the performance of our neural network model with that of traditional x̄-control charts. In the next section, we begin with a description of the model for generating process data; we then describe our neural network model, the training of the neural network, and the performance measures for the control charts. In the last section we present and discuss the results of our research.
2 Research Methodology
2.1 Data Generation
For our investigation we generated process data from the Burr distribution, which was utilized in a previous simulation study by Yourstone and Zimmer [23]. The distribution function $F_X(x)$ of the Burr distribution is given by

$$
F_X(x) =
\begin{cases}
0 & \text{for } x < 0 \\
1 - \dfrac{1}{(1 + x^{c})^{k}} & \text{for } x \ge 0
\end{cases}
\tag{1}
$$

where c, k ≥ 1. A wide variety of probability distributions, which may be used in industrial environments, can be approximated with the Burr by judicious choice of c and k values. In accordance with Yourstone and Zimmer, we used skewness and kurtosis to measure the degree of departure from normality of a given distribution. Burr [3, 4] tabled c and k values that relate to various combinations of skewness and kurtosis. We found this table very useful for generating distributions with a desired skewness and kurtosis.

Table 1: The Simulated Processes

| Case | Description | Skewness | Kurtosis | c | k |
|---|---|---|---|---|---|
| 1 | Almost uniform distribution | 0 | 2 | -18.1484 | 0.0629 |
| 2 | Normal | 0 | 3 | -11.2519 | 0.1463 |
| 3 | Symmetric and peaked | 0 | 4 | 27.0689 | 1.3258 |
| 4 | Positively skewed | 1 | 3 | -13.0680 | 0.0301 |
| 5 | Positively skewed and peaked | 1 | 4 | -7.1760 | 0.0787 |
| 6 | Positively skewed and peaked | 1 | 5 | 2.3471 | 4.4286 |
| 7 | Highly skewed, highly peaked | 2 | 6.2 | -21.4163 | 0.0074 |
| 8 | Highly skewed, severely peaked | 2 | 7.2 | -7.5077 | 0.0270 |
| 9 | Almost exponential distribution | 2 | 8.8 | -5.8643 | 0.0442 |

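As an illustration of how observations might be drawn from the distribution in (1), the following Python sketch uses inverse-transform sampling. It is not the authors' generator: the function names, the use of NumPy, the seed, and the moment-based skewness and kurtosis estimators are assumptions made here. The sketch is restricted to the Case 6 parameters, where both c and k are positive and (1) can be inverted directly.

```python
import numpy as np

def burr_cdf(x, c, k):
    """Distribution function (1): F(x) = 1 - (1 + x**c)**(-k) for x >= 0, else 0."""
    x = np.asarray(x, dtype=float)
    # np.maximum keeps the power well defined for negative inputs.
    return np.where(x < 0, 0.0, 1.0 - (1.0 + np.maximum(x, 0.0) ** c) ** (-k))

def burr_ppf(u, c, k):
    """Inverse of (1); assumes c > 0, k > 0 and 0 <= u < 1."""
    u = np.asarray(u, dtype=float)
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)

def burr_sample(size, c, k, rng):
    """Inverse-transform sampling: push uniform draws through the inverse CDF."""
    return burr_ppf(rng.random(size), c, k)

def skewness_kurtosis(x):
    """Moment estimators of skewness and (non-excess) kurtosis."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4))

rng = np.random.default_rng(2001)   # arbitrary seed, for reproducibility only
c, k = 2.3471, 4.4286               # Case 6 in Table 1
x = burr_sample(200_000, c, k, rng)
skew, kurt = skewness_kurtosis(x)
print(f"sample skewness = {skew:.2f}, sample kurtosis = {kurt:.2f}")
```

The printed estimates can be compared with the skewness and kurtosis listed for the chosen case. Observations would still need to be centered and scaled before shifts in units of the standard deviation of the mean are applied; that step is left out here.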
Table 1, above, describes the skewness and kurtosis of nine simulated distributions investigated in our study along with the c and k values used in their generation from the Burr distribution. As can be seen, we chose distributions that were symmetrical, moderately skewed and peaked, as well as severely skewed and peaked. Although some of these situations may not regularly be encountered in monitoring real processes, we included them for the sake of comparing the neural network and control charts.
2.2 Neural Network Description
In this study we used a fully-connected feed-forward network with five input neurons, five hidden neurons, and three output neurons. Choosing five input neurons was logical once we decided upon using samples of size five because each neuron would represent an observation in the sample. The simplicity of this neural network architecture guaranteed a realistic training time. We implemented the back-propagation learning algorithm for our neural network. Although the slowness of learning is usually a major concern with back-propagation networks, the learning speed of the training sets was fast enough for use in our study. In addition to the architectural features of number of hidden layers, number of hidden layer neurons, and number of output neurons, factors such as the learning rate, momentum, learning rule, and type of transfer function are crucial to the performance of a trained neural network. We used a learning rate of 0.1 and a momentum of 0.6 after testing more than 30 different combinations of the parameters. A sigmoid transfer function was used because of its desirable features. It is monotonic, differentiable, and semi-linear and is partially insensitive to noise. We pilot-tested on the data the two most popular learning rules: the generalized delta and cumulative delta. We employed the generalized delta learning rule because it yielded better results.
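The architecture described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NeuralWare implementation used in the study: the class name `TinyMLP`, the bias terms, the Gaussian weight initialization, and the squared-error loss are all choices made here. Only the 5-5-3 layout, the sigmoid transfer function, the learning rate of 0.1, and the momentum of 0.6 come from the text. The update is a per-example gradient step with a momentum term, which is one common way of writing the generalized delta rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """Hypothetical 5-5-3 feed-forward network trained by back-propagation."""

    def __init__(self, n_in=5, n_hidden=5, n_out=3, lr=0.1, momentum=0.6, seed=0):
        rng = np.random.default_rng(seed)
        # Small random starting weights (an assumption; the paper gives none).
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr, self.momentum = lr, momentum
        # One "previous change" buffer per parameter array, for the momentum term.
        self.vel = [np.zeros_like(p) for p in (self.W1, self.b1, self.W2, self.b2)]

    def forward(self, x):
        self.h = sigmoid(x @ self.W1 + self.b1)       # hidden activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)  # output activations
        return self.y

    def train_step(self, x, target):
        """One weight update for a single example; returns the loss before the update."""
        y = self.forward(x)
        # Error terms: output error times the sigmoid derivative y * (1 - y).
        delta_out = (y - target) * y * (1.0 - y)
        delta_hid = (delta_out @ self.W2.T) * self.h * (1.0 - self.h)
        grads = [np.outer(x, delta_hid), delta_hid,
                 np.outer(self.h, delta_out), delta_out]
        params = [self.W1, self.b1, self.W2, self.b2]
        for i, (p, g) in enumerate(zip(params, grads)):
            # New change = momentum * previous change - learning rate * gradient.
            self.vel[i] = self.momentum * self.vel[i] - self.lr * g
            p += self.vel[i]  # in-place update of the stored weights
        return 0.5 * float(np.sum((y - target) ** 2))

net = TinyMLP()
sample = np.array([0.3, -1.2, 0.5, 0.1, -0.4])  # one sample of size five
print(net.forward(sample))                       # three values in (0, 1)
```

Each of the five inputs receives one observation from a sample, and the three outputs correspond to the three process states used as training targets.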
3 Training of the Neural Network
For the neural network to satisfactorily monitor a process, the range of shifts in the process mean utilized in the training procedure should be representative of the whole spectrum of the shifts of interest. However, the determination of the range of mean shifts used in training is subject to the required recognition capability and the expected training time. The magnitude of shifts in the process mean was measured in units of the standard deviation of the mean (σ_x̄). When small magnitudes of process changes were included in the training set, the neural network performed poorly, registering a high false alarm rate (that is, more Type I errors). This poor performance for small shifts in the process mean was probably due to the high overlap between the in-control observations and out-of-control observations. As a result, data with small shifts (0.25, 0.5, and 0.75 σ_x̄) were not included in the training sets. Consistent with previous simulation studies on control charts [6, 20, 22], we used a target mean (μ_0) of zero in this study. The number of training examples was determined through experimentation. For each combination of skewness and kurtosis we used 900 examples in a training set. The 900 comprised 100 examples of in-control processes and 100 examples each of out-of-control processes with positive and negative shifts ranging from ±1 to ±4 units. The values of the three output neurons were set to [0, 0, 1] to signal an in-control process, [0, 1, 0] to signal an out-of-control process with a positive shift, and [1, 0, 0] to signal an out-of-control process with a negative shift. In neural network training, the weights connecting the nodes were adjusted after a cycle consisting of a fixed number of presentations of training samples. The number of training samples between successive adjustments is defined as the epoch, which can be related to the speed of the training process. We tested several epochs.
Our results showed that the epoch size had no effect on the speed of learning. Another concern, related to training time, is the determination of the optimal stopping point of the learning process. During the training process, the performance of the neural network on the training data set continuously improves. However, the longer the neural network learns the training data set, the closer the neural network comes to memorizing the training data set. Thus, the neural network may make a very accurate prediction on the training data set but may not be able to generalize well on a fresh data set. There exists no method of optimizing the selection of a stopping point or the selection of parameters. This means that any neural network solution might be sub-optimal. We relied on the software (SAVEBEST utility in NeuralWare 1993) to determine the stopping point.
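The composition of a training set can be sketched as follows. Several points are assumptions made for illustration: the shift levels are taken to be 1, 2, 3 and 4 (inferred from 900 = 100 + 8 × 100), a shift is applied by adding the same constant to every observation in a sample, and standard normal draws stand in for the Burr-generated data. The function and variable names are hypothetical.

```python
import numpy as np

N = 5                    # observations per sample, one per input neuron
TARGET_IN = (0, 0, 1)    # in-control
TARGET_POS = (0, 1, 0)   # out-of-control, positive shift
TARGET_NEG = (1, 0, 0)   # out-of-control, negative shift

def make_training_set(draw, sigma, rng, per_class=100, shifts=(1, 2, 3, 4)):
    """Return (X, T): samples of size N and their target output vectors.

    draw(rng, size) must return in-control observations with mean zero;
    sigma is the in-control standard deviation of a single observation.
    """
    sigma_xbar = sigma / np.sqrt(N)  # standard deviation of the sample mean
    X, T = [], []

    def add(shift, target):
        # Shift every observation so the mean moves by shift * sigma_xbar.
        X.append(draw(rng, (per_class, N)) + shift * sigma_xbar)
        T.append(np.tile(target, (per_class, 1)))

    add(0.0, TARGET_IN)
    for s in shifts:
        add(+s, TARGET_POS)
        add(-s, TARGET_NEG)
    return np.vstack(X), np.vstack(T)

def standard_normal(rng, size):
    """Stand-in generator; the study drew from the Burr distribution instead."""
    return rng.standard_normal(size)

X, T = make_training_set(standard_normal, sigma=1.0, rng=np.random.default_rng(0))
print(X.shape, T.shape)  # (900, 5) (900, 3)
```

Under these assumptions the set holds 100 in-control examples and 400 examples for each shift direction.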
3.1 Data Collection and Performance Measures
Following the convention established in previous research, comparison of the neural network and x̄-control charts was based on average run length (ARL), which is defined as the average number of subgroups (samples) that are observed from a process before an out-of-control signal is detected. For a given process, this average was based on 10,000 simulated replications. When the process is in-control (not shifted), it is desirable to have a large ARL. However, if the mean of the process is out-of-control (shifted), it is desirable to have a small ARL. In other words, the monitoring process should generate signals as quickly as possible (i.e., give shorter ARLs) if the production process is out of control, and as late as possible (i.e., give longer ARLs) if the production process is in control. As mentioned in the training of the neural network, the binary digits (0, 1) were used on the output neurons to signal in-control and out-of-control states. We recognized that, owing to the use of the sigmoid function to train the neural network, the output neurons could assume any value between 0 and 1. Therefore we had to determine cut-off values for output neurons a and b in the output vector [a, b, c] that could be deemed close enough to 1 to signal, respectively, an out-of-control process due to a negative or a positive shift. (Recall that target values a = 1, b = 0, c = 0 were selected to designate out-of-control due to a negative shift and a = 0, b = 1, c = 0 to designate out-of-control due to a positive shift.)

Table 2: Cutoff points for neural network models

| Case | Neural network cutoff point (a) | Neural network cutoff point (b) |
|---|---|---|
| 1 | 0.8236 | 0.9224 |
| 2 | 0.7675 | 0.8738 |
| 3 | 0.5235 | 0.8578 |
| 4 | 0.7001 | 0.8718 |
| 5 | 0.8700 | 0.4850 |
| 6 | 0.8375 | 0.5170 |
| 7 | 0.5185 | 0.6370 |
| 8 | 0.6400 | 0.5330 |
| 9 | 0.8400 | 0.3820 |

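The cut-off rule described above can be sketched as a small decision function. The function name, the use of "greater than or equal to", and the tie-break when both outputs exceed their cut-offs are assumptions, since the text does not specify them; the cut-off values in the usage lines are illustrative and are not taken from Table 2.

```python
def classify(output, cut_a, cut_b):
    """Map an output vector [a, b, c] to a process state using two cut-offs."""
    a, b, _ = output
    over_a, over_b = a >= cut_a, b >= cut_b
    if over_a and over_b:
        # Tie-break not specified in the text: assume the larger output wins.
        return "negative shift" if a >= b else "positive shift"
    if over_a:
        return "negative shift"   # output a targets 1 for a negative shift
    if over_b:
        return "positive shift"   # output b targets 1 for a positive shift
    return "in control"

print(classify([0.91, 0.05, 0.10], cut_a=0.80, cut_b=0.85))  # negative shift
print(classify([0.10, 0.20, 0.88], cut_a=0.80, cut_b=0.85))  # in control
```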
Selecting low cutoff values would increase the probability of committing a Type I error (a false alarm; concluding the process is out-of-control when it is really in-control), but would reduce the probability of committing a Type II error (concluding the process is in-control when it is really out-of-control). On the other hand, selecting high cutoff values would increase the probability of a Type II error, but reduce the probability of a Type I error. To obtain fair comparisons between the x̄-chart and neural network model, the cutoff values for a and b were determined by trial-and-error simulations so that the in-control ARLs of the neural network would match those of the x̄-charts for each one of the nine simulated processes. The cutoff values are shown in Table 2.
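To make the ARL measure concrete, the sketch below estimates the ARL of an x̄-chart by simulation and compares it with the closed-form value for normally distributed sample means. It rests on assumptions not stated in the text: symmetric limits at three standard deviations of the mean, standard normal observations, and samples of size five. Function names and the block size are hypothetical. Because each subgroup signals independently with probability p, the run length is geometric and the ARL equals 1/p.

```python
import math
import numpy as np

def theoretical_arl(shift, limit=3.0):
    """ARL of an x-bar chart with +/- limit sigma limits for normal sample means.

    shift is the mean shift in units of the standard deviation of the mean.
    """
    def phi(z):  # standard normal distribution function
        return 0.5 * math.erfc(-z / math.sqrt(2.0))
    p_signal = phi(-limit - shift) + phi(shift - limit)
    return 1.0 / p_signal

def simulated_arl(shift, rng, reps=2000, n=5, limit=3.0, block=512):
    """Average number of subgroups until the first point outside the limits."""
    sigma_xbar = 1.0 / math.sqrt(n)
    total = 0
    for _ in range(reps):
        run = 0
        while True:
            # Draw a block of subgroups at once and look for the first signal.
            xbar = rng.standard_normal((block, n)).mean(axis=1) + shift * sigma_xbar
            hits = np.flatnonzero(np.abs(xbar) > limit * sigma_xbar)
            if hits.size:
                run += int(hits[0]) + 1
                break
            run += block
        total += run
    return total / reps

print(f"in-control ARL (closed form):  {theoretical_arl(0.0):.1f}")  # about 370.4
print(f"ARL at shift 1.0 (closed form): {theoretical_arl(1.0):.1f}")  # about 43.9
print(f"ARL at shift 1.0 (simulated):   {simulated_arl(1.0, np.random.default_rng(1)):.1f}")
```

The same simulation loop could be reused for a neural network by replacing the limit check with the network's cut-off rule, which is how in-control ARLs of the two procedures could be matched by adjusting the cut-offs.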
4 Simulation Results
For the sake of brevity, only the results for Cases 2, 4, 6, and 9 will be presented and discussed. These cases were chosen to illustrate any effect caused by (1) increasing the skewness at a given kurtosis level (Case 2 vs. Case 4), (2) increasing the kurtosis at a given skewness level (Case 4 vs. Case 6), and (3) increasing both skewness and kurtosis levels (Case 6 vs. Case 9). The results for the four chosen cases are illustrated in Tables 3 - 6, which contain the ARL values for the neural network and x̄-control chart for various shifts in the process. In addition, the tables list ratios (denoted by ρ) of each x̄-control chart ARL to the corresponding neural network ARL. For any shift in the process, values of ρ substantially larger than one indicate that the neural network is better than the x̄-control chart while values of ρ substantially smaller than one indicate the x̄-control chart to be better than the neural network. In all the cases the in-control ARLs of the two control procedures were kept the same for ease of comparison of the out-of-control ARLs.

Table 3: Comparison of ARL between x̄-chart and neural network (skewness=0, kurtosis=3)

| Shift | x̄-chart | NN | ρ | Shift | x̄-chart | NN | ρ |
|---|---|---|---|---|---|---|---|
| 0.00 | 372.28 | 371.96 | | 0.00 | | | |
| 0.25 | 280.16 | 272.85 | 1.03 | -0.25 | 282.48 | 292.57 | 0.96 |
| 0.50 | 157.1 | 151.09 | 1.04 | -0.50 | 153.81 | 163.16 | 0.94 |
| 0.75 | 83.27 | 79.48 | 1.05 | -0.75 | 81.06 | 85.70 | 0.94 |
| 1.00 | 44.9 | 43.48 | 1.03 | -1.00 | 43.94 | 45.87 | 0.96 |
| 1.50 | 15.01 | 14.73 | 1.02 | -1.50 | 14.99 | 15.68 | 0.95 |
| 2.00 | 6.28 | 6.17 | 1.02 | -2.00 | 6.31 | 6.51 | 0.97 |
| 3.00 | 1.98 | 1.97 | 1.01 | -3.00 | 2.02 | 2.05 | 0.99 |
| 4.00 | 1.19 | 1.19 | 1.00 | -4.00 | 1.18 | 1.19 | 0.99 |

where ρ = (ARL of x̄-chart) / (ARL of neural network).

4.1 Case 2 (skewness=0, kurtosis=3), Table 3
In Case 2, we examine a normal process in which process shifts in positive and negative directions must be detected equally well because of the distribution’s symmetry. This property is known as the directional invariance property. Table 3 shows roughly similar out-of-control ARLs for the neural network and x̄-control chart for both positive and negative shifts in the process mean. As a result the ρ values are all close to one, indicating no advantage in utilizing the neural network over the x̄-control chart.
4.2 Case 4 (skewness=1, kurtosis=3), Table 4
Here we have a process distribution with the same kurtosis as in the normal (previous) case but with a positive skew. In Table 4 we observe ρ values close to one for positive shifts but much larger than one for negative shifts. This indicates that for positively skewed processes the neural network model gives an out-of-control signal much quicker than the x̄-control chart when the process mean shifts off target in a direction opposite the skew. For example, the ρ value of 4.97 at mean shift -0.25 tells us the neural network model reacts nearly five times as fast to a genuine shift as the standard x̄-control chart. Comparing the results in Table 3 and Table 4 points to the effect on the match-up between the neural network and x̄-control chart of a change only in the skewness of the process. We observe an impressive ascendance of the neural network model.

Table 4: Comparison of ARL between x̄-chart and neural network (skewness=1.0, kurtosis=3.0)

| Shift | x̄-chart | NN | ρ | Shift | x̄-chart | NN | ρ |
|---|---|---|---|---|---|---|---|
| 0.00 | 256.70 | 256.38 | | 0.00 | | | |
| 0.25 | 142.29 | 142.80 | 1.00 | -0.25 | 477.90 | 96.25 | 4.97 |
| 0.50 | 81.20 | 81.71 | 0.99 | -0.50 | 936.22 | 29.52 | 31.71 |
| 0.75 | 48.70 | 49.01 | 0.99 | -0.75 | 1876.82 | 13.70 | 136.99 |
| 1.00 | 30.23 | 30.79 | 0.98 | -1.00 | 392.54 | 7.73 | 50.78 |
| 1.50 | 12.74 | 12.79 | 1.00 | -1.50 | 19.17 | 3.46 | 5.54 |
| 2.00 | 6.13 | 6.22 | 0.99 | -2.00 | 5.99 | 2.07 | 2.89 |
| 3.00 | 2.15 | 2.16 | 1.00 | -3.00 | 1.88 | 1.25 | 1.50 |
| 4.00 | 1.20 | 1.21 | 0.99 | -4.00 | 1.21 | 1.05 | 1.15 |

4.3 Case 6 (skewness=1.0, kurtosis=5.0), Table 5
The effect of a change in only the kurtosis of the process data can be observed by comparing the results in Table 5 with those in Table 4. Table 5 shows ρ values larger than one for negative shifts but not nearly as large as those in Table 4. We can infer that although the neural network once more exhibits superiority over the x̄-control chart, signaling as much as 82% faster for a -0.75 shift, the neural network’s ascendancy over the x̄-control chart is only moderate for an increase in kurtosis.

Table 5: Comparison of ARL between x̄-chart and neural network (skewness=1.0, kurtosis=5.0)

| Shift | x̄-chart | NN | ρ | Shift | x̄-chart | NN | ρ |
|---|---|---|---|---|---|---|---|
| 0.00 | 193.76 | 193.81 | | 0.00 | | | |
| 0.25 | 121.69 | 123.69 | 0.98 | -0.25 | 297.93 | 267.52 | 1.11 |
| 0.50 | 76.02 | 77.54 | 0.98 | -0.50 | 336.35 | 223.12 | 1.51 |
| 0.75 | 48.06 | 48.73 | 0.99 | -0.75 | 203.60 | 112.00 | 1.82 |
| 1.00 | 30.60 | 31.45 | 0.97 | -1.00 | 88.08 | 49.66 | 1.77 |
| 1.50 | 13.23 | 13.66 | 0.97 | -1.50 | 19.19 | 12.77 | 1.50 |
| 2.00 | 6.41 | 6.57 | 0.98 | -2.00 | 6.46 | 4.85 | 1.33 |
| 3.00 | 2.11 | 2.15 | 0.98 | -3.00 | 1.90 | 1.69 | 1.12 |
| 4.00 | 1.18 | 1.19 | 0.99 | -4.00 | 1.18 | 1.14 | 1.04 |

Table 6: Comparison of ARL between x̄-chart and neural network (skewness = 2, kurtosis = 8.8)

| Shift | x̄-chart | NN | ρ | Shift | x̄-chart | NN | ρ |
|---|---|---|---|---|---|---|---|
| 0.00 | 116.86 | 117.04 | | 0.00 | | | |
| 0.25 | 77.69 | 77.55 | 1.00 | -0.25 | 175.65 | 57.31 | 3.06 |
| 0.50 | 52.81 | 52.43 | 1.01 | -0.50 | 265.88 | 12.72 | 20.90 |
| 0.75 | 36.03 | 35.74 | 1.01 | -0.75 | 400.02 | 6.31 | 63.39 |
| 1.00 | 24.84 | 24.72 | 1.00 | -1.00 | 597.13 | 3.9 | 153.11 |
| 1.50 | 12.28 | 12.32 | 1.00 | -1.50 | 74.26 | 2.2 | 33.75 |
| 2.00 | 6.36 | 6.36 | 1.00 | -2.00 | 6.16 | 1.54 | 4.00 |
| 3.00 | 2.27 | 2.27 | 1.00 | -3.00 | 1.77 | 1.13 | 1.57 |
| 4.00 | 1.2 | 1.20 | 1.00 | -4.00 | 1.18 | 1.03 | 1.15 |

4.4 Case 9 (skewness=2, kurtosis=8.8), Table 6
Comparing the results in Table 6 with those in Table 5 gives an opportunity to observe the effect of an increase in both skewness and kurtosis of the process data. Table 6 shows a substantial increase in ρ values for negative shifts. Because the extent of the increase in ρ values is about the same as was observed when only the skewness was increased (Case 2 vs. Case 4), it is difficult to delineate the role of increased kurtosis here. We can say, however, that once more the neural network shows vast superiority over the x̄-control chart in detecting shifts in the process opposite to the direction of the skew. As in all the other cases, we observe no difference between the two monitoring systems in detecting positive shifts in the process.
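The ratio ρ reported in the tables can be recomputed directly from the tabulated ARLs. The short sketch below does this for the negative-shift entries of Table 6; the variable and function names are chosen here for illustration.

```python
# Negative-shift entries of Table 6: (shift, x-bar chart ARL, neural network ARL)
rows = [
    (-0.25, 175.65, 57.31),
    (-0.50, 265.88, 12.72),
    (-0.75, 400.02, 6.31),
    (-1.00, 597.13, 3.9),
    (-1.50, 74.26, 2.2),
    (-2.00, 6.16, 1.54),
    (-3.00, 1.77, 1.13),
    (-4.00, 1.18, 1.03),
]

def arl_ratio(arl_chart, arl_nn):
    """rho = ARL of the x-bar chart divided by ARL of the neural network."""
    return arl_chart / arl_nn

ratios = [arl_ratio(chart, nn) for _, chart, nn in rows]
for (shift, _, _), rho in zip(rows, ratios):
    print(f"shift {shift:+.2f}: rho = {rho:.2f}")
```

A value of ρ near 153 at a shift of -1.00 means the x̄-chart needed, on average, roughly 153 times as many subgroups as the neural network to signal that shift.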
5 Discussion and Conclusion
In this paper the performance of the neural network was evaluated by estimating the ARLs. The neural network approach presented here offers a competitive alternative to existing control schemes. Because the performance of the neural network depends on the selection of training samples, the results reported in this research are not necessarily optimal. However, we have demonstrated the feasibility of applying a neural network model to monitoring a process. The comparisons showed that the neural network model appeared to be a better control procedure for detecting sudden changes in the process mean than the x̄-control chart. The neural network developed in this study was primarily designed to detect a sudden shift in the process mean. Nevertheless, there are other characteristics of a process, such as trends, cycles, systematic and stratification patterns and mixtures, that may cause one to regard the process as out-of-control. In future research these characteristics ought to be considered and might need to be redefined under nonnormality.
Many authors [2, 8, 14] suggest that neural networks offer advantages over traditional statistical methods. Among these advantages is the employment of a pattern recognition approach by neural networks as opposed to a complex statistical approach. Other advantages of neural networks over control charts for detecting out-of-control conditions are their flexibility and high-speed computation. In practical terms this flexibility allows an analyst to design a desired control scheme corresponding to the particular production process. According to Hwarng and Hubele [11], flexibility in training and high-speed computation resulting from the parallel structure of the networks enable neural network models to be used in real-time applications. Even though several researchers [7, 10, 11, 14, 15, 16] have developed neural networks as alternatives to the standard x̄-control charts for monitoring process means, these studies are limited in number and in the extent of their coverage. The studies were conducted under the assumption that the process being monitored is normally distributed, while in many manufacturing environments such an assumption may be invalid. An infinite number of potentially complicated nonnormal situations can develop from process data possessing different combinations of skewness and kurtosis. The need to better define approaches for handling nonnormal data was a major motivation for our research. One of the major contributions of this paper is to demonstrate a competitive alternative to standard control charts under conditions of nonnormality.
References
[1] Aft, L. S. (1988): Quality Improvement Using Statistical Process Control. Harcourt Brace Jovanovich, San Diego.
[2] Archer, N. and Wang, S. (1993): Application of the Back Propagation Neural Network Algorithm with Monotonicity Constraints for Two-Group Classification Problem. Decision Sciences 24, 60-75.
[3] Burr, I. W. (1967): The effect of non-normality on constants for x̄ and R charts. Industrial Quality Control, May, 563-569.
[4] Burr, I. W. (1973): Parameters for a general system of distributions to match a grid of α3 and α4. Communications in Statistics 2, 1-21.
[5] Collani, E. v. (1997): Determination of the Economic Design of Control Charts Simplified. In Optimization in Quality Control. Eds. Khaled S. Al-Sultan and M. A. Rahim. Kluwer, Boston, 89-143.
[6] Crowder, S. V. (1987): A Simple method for studying run length distributions of exponentially weighted moving average charts. Technometrics 29, 401-407.
[7] Davis, S. and Illingworth, B. (1989): Neural Network Simulation Applied to Statistical Process Control. Internal Paper, Texas Instruments Incorporated, Dallas.
[8] Hill, T., Marquez, L., O’Connor, M. and Remus, W. (1994): Artificial Neural Network Models for Forecasting and Decision Making. International Journal of Forecasting 10, 5-15.
[9] Ho, C. and Case, K. E. (1994): Economic Design of Control Charts: A Literature Review for 1981-1991. Journal of Quality Technology 26, 39-53.
[10] Hwarng, H. B. and Hubele, N. F. (1993): Back-propagation Pattern Recognizers for x̄ Control Charts: Methodology and Performance. Computers & Industrial Engineering 24, 219-235.
[11] Hwarng, H. B. and Hubele, N. F. (1993): x̄ control chart pattern identification through efficient off-line neural network training. IIE Transactions 25, 27-38.
[12] Keats, J. B. and Hubele, N. F. (1989): In Statistical Process Control in Automated Manufacturing. Marcel Dekker, New York.
[13] Montgomery, D. C. (1991): In Design and Analysis of Experiments. 3rd ed, Wiley, New York.
[14] Prybutok, V. R., Sanford, C. C. and Nam, K. T. (1994): A Comparison of Neural Network to Shewhart X-bar Control Chart Applications. Economic Quality Control 9, 143-164.
[15] Pugh, G. A. (1989): Synthetic Neural Networks for Process Control. Proceedings of the 13th Annual Conference of Computers & Industrial Engineering 17 (1-4), 24-26.
[16] Pugh, G. A. (1991): A Comparison of Neural Networks to SPC Charts. Proceedings of the 15th Annual Conference of Computers & Industrial Engineering 21 (1-4), 253-255.
[17] Rahim, M. A. (1985): Economic Model of x̄-Charts Under Non-normality And Measurement Errors. Computers and Operations Research 12, 291-299.
[18] Ramsey, P. P. and Ramsey, P. H. (1990): Simple tests of normality in small samples. Journal of Quality Technology 22, 299-309.
[19] Rigdon, S. E., Cruthis, E. N. and Champ, C. W. (1994): Design Strategies for Individuals and Moving Range Control Charts. Journal of Quality Technology 26, 274-287.
[20] Runger, G. C. and Pignatiello, H. J. Jr. (1991): Adaptive sampling for process control. Journal of Quality Technology 23, 135-155.
[21] Ryan, T. P. (1989): Statistical Methods for Quality Improvement. Wiley, New York.
[22] Walker, E., Philpot, J. W. and Clement, J. (1991): False signal rates for the Shewhart control chart with supplementary runs tests. Journal of Quality Technology 23, 247-252.
[23] Yourstone, S. A. and Zimmer, W. J. (1992): Non-normality and the Design of Control Charts for Averages. Decision Sciences 23, 1099-1113.
Junsub Yi
Dept. of Management Information Systems
Kyungsung University
110-1 Daeyeon-dong, Nam-gu, Pusan 608-736, South Korea
[email protected]

Victor R. Prybutok
University of North Texas
Denton TX 76203-5249
[email protected]

Howard R. Clayton
Auburn University
Auburn, AL 36849-5241
[email protected]