NONPARAMETRIC QUALITY CONTROL CHARTS BASED ON THE SIGN STATISTIC

Key Words: Average Run Length, Cumulative Sum, Process Control, Sign Test, Sign Chart, Curtailed Sampling.

Raid W. Amin, Dept. of Math. & Statistics, U. of West Florida, Pensacola, FL 32514
Marion R. Reynolds, Jr., Dept. of Statistics, Virginia Polytechnic Instit. & State Univ., Blacksburg, VA 24061
and
Saad Bakir, Dept. of Business Administration, Alabama State University, Montgomery, AL 36101-0271
ABSTRACT Nonparametric procedures are presented for the problem of detecting changes in the process median (or mean), or changes in the process variability when samples are taken at regular time intervals. The proposed procedures are based on sign-test statistics computed for each sample, and are used in Shewhart and cumulative sum control charts. When the process is in control the run length distributions for the proposed nonparametric control charts do not depend on the distribution of the observations. An additional advantage of the non-parametric control charts is that the variance of the process does not need to be established in order to set up a control chart for the mean. Comparisons with the corresponding parametric control charts are presented. It is also shown that curtailed sampling plans can considerably reduce the expected number of observations used in the Shewhart control schemes based on the sign statistic. 1. INTRODUCTION For many years control charts have been used for detecting a shift in the center of the distribution of observations taken as a sequence of samples at regular intervals from a process. Most control charts for this problem are based on sample means and are designed on the assumption that the distribution of the observations is normal which in turn will imply that the distribution of the sample mean is normal. If the distribution of the observations is not normal then the central limit theorem is usually used to justify the assumption that the distribution of the sample mean is approximately normal. There are, however, several potential disadvantages with control charts based on sample means. When the actual distribution of the observations is not close to the normal distribution, the distribution of the sample mean can be far enough from the normal distribution so that normal theory calculations of properties of the chart are no longer reliable. 
Another problem is that the efficiency of the sample mean can be quite low for some non-normal distributions in the sense that the expected time to detect out-of-control situations is larger for the sample mean than for some other statistics. In addition, control charts based on the sample mean require that the variance of the process be either known
or accurately estimated. When a process is just starting up there may not be enough data to obtain a good estimate of the variance. This paper develops and compares control charts that use a sign statistic in place of the sample mean when the objective is to detect a shift in the center of the distribution of the observations. A chart based on a sign statistic is also developed for detecting increases in process variability. These nonparametric control charts are easy to understand and use and they avoid the disadvantages discussed above for the control charts based on the sample mean. In quality control applications there has been relatively little development and use of nonparametric control chart procedures. Attributes control charts have been widely used to control the fraction defective but apparently not to control the center of the process distribution. Parent (1965) and Reynolds (1972) developed control chart procedures based on the signed sequential ranks of the observations. These procedures involve the ranking of long sequences of observations, and the exact properties of the procedures are not easy to evaluate. McGilchrist and Woodyer (1975) developed a distribution-free CUSUM technique for use in monitoring rainfall amounts. Bakir (1977) and Bakir and Reynolds (1979) developed a nonparametric procedure using the Wilcoxon signed-rank statistic in a cumulative sum (CUSUM) control chart. The ranking was done within groups where the groups were either samples taken at each point or else artificially defined groups. Park and Reynolds (1987) developed procedures based on linear placement statistics for nonparametric Shewhart and CUSUM procedures. Amin and Searcy (1991) studied the behavior of the EWMA procedure for the Wilcoxon signed-rank statistic. Hackl and Ledolter (1991, 1992) also considered nonparametric control charts. 
The nonparametric procedures discussed in this article for controlling the process median (or mean) are based on signs computed within samples and used in place of the sample means in the Shewhart chart and the CUSUM chart. In the case of procedures based on signs, the in-control ARL will be the same for all distributions with median $\mu_0$. If the distribution is symmetric, then the mean and the median are the same if the mean exists. The nonparametric procedures thus have the advantage that the assumption of normality is not necessary for calculating the control limits of the charts. Another advantage is that the nonparametric procedures are usually more efficient than the procedure based on $\bar{X}$ when the distribution of the observations is "heavy-tailed", that is, when observations in the tails of the distribution have a higher probability than for the normal distribution. This situation is common in many applications where occasional extreme values seem to occur more frequently than would be expected if the distribution were normal. An additional advantage of these nonparametric procedures is that the variance of the process does not need to be known or estimated in order to apply the control chart. In fact, the nonparametric procedures for controlling the median are not affected by changes in the variance as long as $\mu$ is constant. Ghosh, Reynolds and Hui (1981) and Quesenberry (1993) have shown that using a small number of observations to estimate the process variance when setting action limits for the $\bar{X}$-chart can drastically alter the properties of the $\bar{X}$-chart. Thus, nonparametric procedures may be particularly useful when a process is just starting up and it is desirable to apply a control chart before there is enough data to get a reasonable estimate of the variance. The effects of erroneously assuming normality are more severe in control charts for variability than in control charts for the center.
To our knowledge there is no published work on nonparametric control procedures for the variance. This article proposes a variation of the sign statistic for purposes of controlling process variability.
2. THE EFFECT OF NON-NORMALITY ON $\bar{X}$ CHARTS

In order to develop the terminology and framework for the problem, consider the situation in which samples of n observations are taken at regular intervals from a process. Let $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})$ be the sample taken at the ith time point. The distribution of the observations will be assumed to be continuous with median $\mu$. The objective of the control chart is to detect quickly any shift in the value of $\mu$ away from a specified control value, say $\mu_0$, by using the information in the samples up to the current time. Control charts are usually evaluated and compared in terms of the average run length (ARL), which is the expected number of samples required by the procedure to signal that a shift in $\mu$ has occurred. As long as $\mu$ remains at $\mu_0$ the ARL should be large so that the frequency of false signals is low, but if $\mu$ shifts from $\mu_0$ the ARL (counted from the time of shift) should be small so that the change is quickly detected. The Shewhart $\bar{X}$-chart is based on the sample means $\bar{X}_1, \bar{X}_2, \ldots$, where $\bar{X}_i = \sum_{j=1}^{n} X_{ij}/n$. This chart signals that $\mu$ has shifted from $\mu_0$ at the first i for which $\bar{X}_i$ falls outside $\mu_0 \pm a_1$, where the constant $a_1$ is frequently taken to be $3\sigma_{\bar{X}}$. The ARL, say $L(\mu)$, expressed as a function of $\mu$, can be computed as
\[ L(\mu) = 1/P(|\bar{X}_i - \mu_0| \ge a_1 \mid \mu), \qquad (2.1) \]
as long as the distribution of $\bar{X}_i$ is known. If interest is in deviations in one direction only, then one of the action limits $\mu_0 + a_1$ or $\mu_0 - a_1$ is used. When an upper limit only is used, the ARL will be denoted by $L^{+}(\mu)$. The design of Shewhart charts using $\bar{X}_i$ is usually based on the assumption that the distribution of $\bar{X}_i$ will be approximately normal even if the distribution of the observations is not normal. Studies by Burr (1967) and Schilling and Nelson (1976) indicate that, unless the distribution of the observations is far from normal, the probability of a signal when the action limits are set under normality assumptions will still be small when $\mu = \mu_0$. Although this probability of a signal may appear to be small in comparison with the commonly used probabilities for a Type I error in statistical hypothesis testing, there can be rather large differences between the ARL for non-normal distributions and the ARL calculated under the normality assumption. For distributions which are markedly non-normal, the distribution of $\bar{X}_i$ may converge very slowly to the normal distribution or may not converge at all to the normal, as in the case of the Cauchy distribution. Bradley (1971, 1973) gives examples where the convergence of $\bar{X}_i$ to the normal is very slow. The normal approximation to the distribution of $\bar{X}_i$ is usually particularly poor in the extreme tails of the distribution, which are used to set the limits of the control chart. An additional problem with the design of $\bar{X}$-charts arises when the variance of the process must be estimated from current data. Hillier (1969) and Yang and Hillier (1970) have shown how to choose the constant $a_1$ so that $P(|\bar{X}_i - \mu_0| \ge a_1 \mid \mu_0)$ is a specified value when the variance is estimated. This specified value can, for example, be taken as .0027, the probability of a signal when $3\sigma_{\bar{X}}$ limits are used and the variance is known. Although a specified probability of a signal may be obtained, the run length distribution is not geometric and the ARL is not the inverse of the signal probability. An additional problem is that the calculation of the constant $a_1$ (or $a_1$ and $a_1'$ for asymmetric distributions) for the case of unknown variance is based on the t-distribution, which assumes that the process distribution is normal. Thus if the distribution is far from normal, or the sample size used to estimate the variance is not large, the properties of these procedures are unknown.

Table 1 gives the ARL of the Shewhart $\bar{X}$-chart with n=10 and $a_1 = 3\sigma_{\bar{X}}$ for the uniform, normal, gamma, exponential, and Cauchy distributions. An upper action limit only is used and various shifts in $\mu$ are considered. For simplicity, the ARL was calculated under the assumption that the variance of the distribution was known and not estimated. For the normal distribution the variance is taken as one without loss of generality so that $\sigma_{\bar{X}} = 1/\sqrt{10}$. The ARL for the uniform distribution was included as an example of the effect of a light tailed distribution on the ARL. The density of the uniform distribution can be expressed as
\[ f_X(x) = \frac{1}{2\lambda}, \quad \mu - \lambda < x < \mu + \lambda, \qquad (2.2) \]
where $\mu$ is the mean and $\lambda > 0$ is the scale parameter. The cdf of $\sum_{j=1}^{n} X_{ij}$ is given by Johnson and Kotz (1970) for the case $\mu = \lambda = 1/\sqrt{2}$. The cdf of $\bar{X}_i$ for the general case is
\[ F_{\bar{X}}(x) = \frac{1}{n!} \sum_{j<z} (-1)^j \binom{n}{j} (z-j)^n, \quad 0 < z < n, \qquad (2.3) \]
where $z = n(x - \mu + \lambda)/(2\lambda)$ and the summation is over all nonnegative integers $j$ strictly less than $z$. In the computations $\lambda$ was taken as $\sqrt{3}$ so that the variance is one. The ARL was also computed for the double exponential distribution, which is a symmetric distribution with heavier tails than the normal. The density of this distribution is
\[ f_X(x) = \frac{1}{2\lambda} e^{-|x-\mu|/\lambda}, \quad -\infty < x < \infty, \qquad (2.4) \]
where $\lambda > 0$ is the scale parameter and $\mu$ is the mean. A distribution with variance equal to one is obtained by letting $\lambda = 1/\sqrt{2}$. The density of $\bar{X}_i$ is given in Johnson and Kotz (1970) and the cumulative distribution function can be obtained from the density by integration and expressed as
\[ F_{\bar{X}}(x) = 1 - \sum_{j=0}^{n-1} 2^{\,j+1-2n} \binom{2n-j-2}{n-1} P\{G_{j+1} > x\} \qquad (2.5) \]
when $x > 0$ and $\mu = 0$, where $G_{j+1}$ is a gamma random variable with scale parameter $\lambda/n$ and shape parameter $j + 1$. The gamma distribution was also used since it is an asymmetrical distribution for which the distribution of $\bar{X}_i$ can be easily evaluated. The density is
\[ f(x) = \frac{e^{-(x-\delta)/\lambda} (x-\delta)^{\nu-1}}{\lambda^{\nu}\, \Gamma(\nu)}, \quad x > \delta, \qquad (2.6) \]
where $\delta$ is the location parameter, $\lambda > 0$ is the scale parameter and $\nu > 0$ is the shape parameter. The parameter values used in Table 1 are $\lambda = 1/2$ and $\nu = 4$, which gives a variance equal to one. The action limit is set at $a_1$ units above the mean. Using an action limit at $a_1$ units above the median would give lower ARL values. The exponential distribution has $\lambda = 1$, $\nu = 1$ and the control value is $a_1$ units above the mean. When the underlying distribution is the gamma given by (2.6) then the distribution of $\bar{X}_i$ is also a gamma distribution with scale parameter $\lambda/n$ and shape parameter $n\nu$. The Cauchy distribution (t-distribution with 1 degree of freedom) was also used because it is a symmetric distribution with extremely heavy tails and the distribution of $\bar{X}_i$ is the same as the distribution of $X_{ij}$. The density of this distribution is
\[ f(x) = \frac{\lambda}{\pi[\lambda^2 + (x-\mu)^2]}, \quad -\infty < x < \infty \]
[...] $> 0$ is a constant. Equivalently, it is possible to give decision rules based on the statistic $T_i$. Using $SN_i$ has the advantage of keeping the control limits symmetric about 0, while $T_i$ has the advantage of being a standard binomial random variable. A one-sided scheme can be obtained by using an action limit in one direction only. The ARL for the two-sided case is $1/P(|SN_i| \ge a_2)$,
where the probability can be easily computed for any distribution for which p can be evaluated. The in-control ARL will be the same for all distributions with median $\mu_0$. Since the distribution of the observations is assumed to be continuous, $P(X_{ij} - \mu_0 = 0) = 0$ and the situation where sign$(X_{ij} - \mu_0) = 0$ ("zeros") should, in theory, never arise. In practice, of course, the observations may be rounded off and occasional zeros may be observed. As long as the zeros do not occur too often it is probably safe to just compute $SN_i$ as defined and use the limits computed under the assumption that there are no zeros. The sign test is very easy to apply since the numerical values of the observations need not be determined exactly, but only whether the observation is above or below $\mu_0$. This test should be very useful when little is known about the distribution of the observations or when the distribution is known to be markedly non-normal. Since the magnitudes of the observations are not used, $SN_i$ will be relatively inefficient compared with $\bar{X}_i$ when the observations are near normal. On the other hand the efficiency advantage may be reversed for distributions with very heavy tails. The efficiency of Shewhart charts for detecting small shifts in the parameter of interest can usually be improved by introducing warning limits inside the action limits. For Shewhart charts using $\bar{X}_i$, the warning limits are $\mu_0 \pm w_1$ where $w_1$ is a constant satisfying $0 \le w_1 < a_1$. A signal is given if r consecutive points fall between $\mu_0 + w_1$ and $\mu_0 + a_1$ or if any r consecutive points fall between $\mu_0 - a_1$ and $\mu_0 - w_1$ or if any point falls outside the action limits. The properties of $\bar{X}$-charts with warning limits have been studied by Page (1962), Weindling, Littauer, and Oliveira (1979), and Champ and Woodall (1987). In the case where $SN_i$ is used in the Shewhart chart, the warning limits are integers $\pm w_2$ satisfying $0 \le w_2 < a_2$.
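As a rough illustration of the ARL expression for the Shewhart sign chart without warning limits, the sketch below evaluates the signal probability from the binomial distribution. It assumes, consistent with the remark that $T_i$ is a standard binomial random variable, that $SN_i$ is the sum of the n signs, so that $T_i = (SN_i + n)/2$ is binomial$(n, p)$ with $p = P(X_{ij} > \mu_0)$. The function names are illustrative only.

```python
from math import comb


def sn_pmf(n, p):
    """Return {value: probability} for SN = 2*T - n, with T ~ Binomial(n, p).

    Assumption: SN is the sum of n signs (+1 or -1), with no zeros.
    """
    return {2 * t - n: comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)}


def arl_two_sided(n, a2, p=0.5):
    """ARL = 1 / P(|SN| >= a2) for the two-sided Shewhart sign chart."""
    prob_signal = sum(pr for sn, pr in sn_pmf(n, p).items() if abs(sn) >= a2)
    return 1.0 / prob_signal


def arl_upper(n, a2, p=0.5):
    """ARL = 1 / P(SN >= a2) when only the upper action limit is used."""
    prob_signal = sum(pr for sn, pr in sn_pmf(n, p).items() if sn >= a2)
    return 1.0 / prob_signal


if __name__ == "__main__":
    # In control (p = 0.5) with n = 10 and the action limit at a2 = 10.
    print(arl_upper(10, 10))      # one-sided chart
    print(arl_two_sided(10, 10))  # symmetric two-sided chart
    # A shift that raises p to 0.7 shortens the ARL.
    print(arl_upper(10, 10, p=0.7))
```

With p = 1/2, n = 10 and $a_2 = 10$ the upper-limit chart signals only when all ten signs are positive, which has probability $2^{-10}$ and gives an ARL of 1024; the symmetric two-sided chart has half that value.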
A signal is given if there are r consecutive points satisfying $w_2 \le SN_i < a_2$, or r consecutive points satisfying $-a_2 < SN_i \le -w_2$, or any point on or outside the action limits determined by $a_2$. The ARL of a one-sided chart with warning and action lines in the positive direction only can be determined as in Page (1962) as $L^{+}(\mu) =$ [...] $P\{SN_i > -w_2 \mid \mu\}$ substituted for $p_0$ and $P\{-a_2 < SN_i \le -w_2 \mid \mu\}$ substituted for $p_1$. Table 2 gives $L^{+}(\mu_0)$ for various values of $a_2$, $w_2$, and r. The value for n was taken as 10 in the sign charts for controlling the median. ARL values for other values of n can be constructed using the methods described here. Because $L^{+}(\mu_0)$ for the Shewhart chart using $SN_i$ depends on the distribution of the observations only through p, the values of $L^{+}(\mu_0)$ are the same for any distribution with $p = \tfrac{1}{2}$. The ARL of a symmetric two-sided scheme is $L^{+}(\mu_0)/2$ when $p = \tfrac{1}{2}$. Unless n is of moderate size, it may be difficult to achieve even approximately a specified value of $L^{+}(\mu_0)$. In fact, the largest possible value of $L^{+}(\mu_0)$ when n = 10 is 1024, which is obtained when $a_2 = 10$ and there is no warning line. If $w_2$ is close to $a_2$ and r is reasonably large, the introduction of warning limits will have little effect on $L^{+}(\mu_0)$ but can significantly reduce $L^{+}(\mu)$ for small shifts in $\mu$. Table 3 compares three procedures, one without warning limits and two with warning limits. The ARL is expressed as a function of p. The values of $\mu$ that give a specific value of p will, of course, depend on the form of the underlying distribution of the observations. The presence of zeros alters the distribution of $SN_i$ but the effect should not be large as long as the probability of a zero or tie is small. Thus, in this case, the same control limits can be used without significantly affecting the properties of the procedure.

3.2 A Shewhart Sign Chart for Controlling Process Variability

When it is necessary to control the variance as well as the center of the distribution, then a chart for variability will usually be used, such as the R, S, or the $S^2$ charts for monitoring the process variance $\sigma^2$ (see Wetherill and Brown 1991). The R chart is less efficient than the corresponding $S^2$ chart when the underlying distribution is normal. The $S^2$ chart for controlling an increase in process variability gives a signal if $S^2$ exceeds the control limit $a_3 = \hat{\sigma}^2 \chi^2_{\alpha}/(n-1)$, where $\chi^2_{\alpha}$ denotes the upper $\alpha$ percentage point of the chi-square distribution with (n-1) degrees of freedom, and $\hat{\sigma}^2$ is an estimate of the process variance. It is a common practice to estimate $\sigma$ from process data, and then to use the appropriate percentage point of the chi-square distribution in setting up the control limit. The probability limits above are only appropriate with normal data, and the ARL for the $S^2$ chart depends heavily on using the correct constants in setting up the control limits.
We are unaware of any published work on the appropriate control limits of control charts for variability when the underlying distribution is non-normal. The effects of non-normality are more severe in control charts for variability than in the case of charts for location. The variance of $S^2$ is given by
\[ V(S^2) = \frac{2\sigma^4}{n-1}\left(1 + \frac{\gamma_2 (n-1)}{2n}\right), \qquad (3.5) \]
where $\gamma_2$ is the coefficient of kurtosis. Wetherill and Brown (1991) point out that the kurtosis of the original distribution can have a large effect on the variance of $S^2$, and that this effect does not disappear with an increase of sample size. Nonparametric charts for variability clearly are needed here, but little work on this topic has been published. It is possible to adapt nonparametric tests for the equality of two variances (see Lehmann (1975)) for use as control statistics in nonparametric control charts for variability. Control charts using test statistics for comparing two variances would require obtaining an initial sample (of size m) when the process is considered to be in control. Then at each sample time i, a sample of size n is obtained from the process, and the pooled sample of size m+n is formed. The observations in the pooled sample then are ranked from smallest to largest, and some statistic based on the ranks of the observations is calculated. Such control procedures would be complicated from a user's perspective, and it does not seem possible to obtain in this way a charting procedure that is simple to use for purposes of controlling process variability. Another approach to this problem is Westenberg's Interquartile Range Test as given in Bradley (1968). This test is based on the number of observations above the third quartile ($Q_3$) and below the first quartile ($Q_1$), where $Q_1$ and $Q_3$ correspond to the in-control distribution. In practice, $Q_1$ and $Q_3$ would need to be specified by process engineers or more likely estimated from process data when the process is in control. Let $U_{ij} = 1$ if $X_{ij} < Q_1$ or $X_{ij} > Q_3$, 0 if $X_{ij} = Q_1$ or $X_{ij} = Q_3$, and $-1$ if $Q_1 < X_{ij} < Q_3$. [...]
\[ \sum_{i=1}^{t} (SN_i - k) - \min_{0 \le u \le t} \sum_{i=1}^{u} (SN_i - k) \ge h, \qquad (4.1) \]
where h > 0 and k > 0 are parameters of the procedure. A one-sided procedure for detecting negative deviations signals at the first t for which
\[ \max_{0 \le u \le t} \sum_{i=1}^{u} (SN_i + k) - \sum_{i=1}^{t} (SN_i + k) \ge h. \qquad (4.2) \]
The corresponding two-sided procedure signals at the first t for which either of the one-sided procedures signals. An alternate and equivalent way to apply the CUSUM chart involves the use of a graphical V-mask scheme (see, for example, Van Dobben de Bruyn (1968)). If k and h are non-negative integers then the above one-sided positive procedure is equivalent to a discrete time Markov chain $\{SN^*_t,\ t = 0, 1, 2, \ldots\}$ with the state space a subset of $\{0, 1, 2, \ldots, h\}$, where $SN^*_0 = 0$ and
\[ SN^*_t = \min\{h, \max\{0, SN^*_{t-1} + SN_t - k\}\}, \qquad (4.3) \]
where the state h is an absorbing state, and absorption corresponds to a signal by the procedure. The ARL of the CUSUM chart using $SN_i$ can thus be determined from the mean absorption times for the state h. Let $\mathbf{m}' = (m_0, m_1, \ldots, m_{h-1})$, where $m_j$ is the mean absorption time, given that the chain started initially in state j. If the CUSUM chart starts with $SN^*_0 = 0$, then the ARL is just $m_0$. If Q is the $h \times h$ matrix of transition probabilities for the nonabsorbing states of the Markov chain, I is the $h \times h$ identity matrix, and $\mathbf{1}$ is an $h \times 1$ vector with all elements being unity, then it is well known that $\mathbf{m}$ is given by $(I-Q)^{-1}\mathbf{1}$. The transition probabilities for the Markov chain can be easily computed since the distribution of $SN_i$ can be obtained from the binomial distribution. The value of the ARL for the CUSUM chart using $SN_i$ depends on the values of the parameters h and k. One approach to selecting h and k is to choose the parameter values that minimize $L^{+}(\mu_1)$ subject to maintaining a specified value of $L^{+}(\mu_0)$, where $\mu_1$ is a value of $\mu$ that is considered as a significant shift. The optimal value for k is then approximately $k = \tfrac{1}{2} E[SN_i \mid \mu_1]$ (see Reynolds (1975)). Using this value of k, the value of h should then be chosen to achieve the desired value of $L^{+}(\mu_0)$. Table 4 gives the optimal values of k for various values of $\mu_1 - \mu_0$ when n=10 for the uniform, normal, double exponential, Cauchy, and gamma distributions. The scale parameter values are the same as the values used in the section on $\bar{X}$-charts so that the values of $\mu_1 - \mu_0$ can be considered to be in units of standard deviation. Except for the Cauchy distribution, the values of k do not differ very much for the various distributions. The specification of $\mu_1$ is usually not that precise in practice, so using the k-values for the normal distribution will not lead to large errors. The normal k-values rounded to the nearest integer are given in the last column of Table 4. Table 5 gives values of $L^{+}(\mu_0)$ for various values of h and k when n = 10. Note that when k is an even integer the state space consists of even integers, and the ARL for odd values of h will be the same as the ARL for the next even integer. Recall that for the Shewhart charts using $SN_i$ it was necessary to have n of moderate size in order to have a reasonably large value of $L^{+}(\mu_0)$. For the CUSUM chart, however, the value of n can be smaller since the procedure is based on a cumulative sum of statistics from individual samples and h can be chosen large enough to give arbitrarily large values of $L^{+}(\mu_0)$. The disadvantage of small samples for the CUSUM chart using $SN_i$ is that it is not possible for the procedure to signal after only one sample if n < h+k. A similar CUSUM procedure can be developed for controlling process variability after substituting $V_i$ for $SN_i$ in (4.1) and (4.2). The problem of choosing an appropriate value for k needs a separate investigation.

5. COMPARISONS OF PROCEDURES

Comparisons of the procedures that have been proposed were carried out by computing the ARL of each of the nonparametric procedures along with the ARL of the corresponding parametric procedure based on $\bar{X}_i$ and $S^2_i$ for several underlying distributions. When $\mu = \mu_0$ the ARL for the sign chart for controlling the process center is
independent of the underlying distribution, whereas the sign chart for controlling process variability usually requires that $Q_1$ and $Q_3$ be estimated from the data when the process is in control. The limits for the parametric procedures were adjusted for each distribution so that the procedures being compared in each case have the same in-control ARL value. When simulation was used to estimate the ARL, the values of $L^{+}(\mu_0)$ were only approximately equal. By keeping $L^{+}(\mu_0)$ constant and comparing values of $L^{+}(\mu)$ for $\mu > \mu_0$ or $\sigma > \sigma_0$, the relative efficiency of procedures in detecting shifts of various magnitudes can be assessed. It should be noted that one of the main advantages of the sign chart is that the correct in-control ARL can be obtained easily for any underlying distribution, whereas the parametric control charts were matched to the corresponding sign charts for comparison purposes. It is reasonable to assume that many users of control charts would have used the incorrect control limits that are based on normality, even when in fact the true distribution is non-normal. Table 6 gives the ARL values for two-sided Shewhart charts, and Table 7 gives values of $L^{+}(\mu)$ for Shewhart charts using $SN_i$ and $\bar{X}_i$ for various distributions for the case n=10. These distributions were scaled in the same way as in Section 2 where, except for the Cauchy distribution, the variances were taken as one. The values of $L^{+}(\mu)$ will also apply to situations where the variance is not one if the values of $a_1$ and $\mu - \mu_0$ are considered to be in units of the standard deviation of the underlying distribution. From Tables 6 and 7 it is clear that the chart using $\bar{X}_i$ is much more efficient than the chart using $SN_i$ for the normal and the light tailed uniform distributions. However, for detecting small shifts for the heavy tailed distributions such as the double exponential, Cauchy and gamma, the chart using $SN_i$ is more efficient.
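One way to see why the sign chart behaves differently across distributions is to look at how quickly $p = P(X_{ij} > \mu_0)$ moves away from 1/2 as the median shifts, since the sign chart depends on the distribution only through p. The sketch below is illustrative only: it assumes the unit-variance scalings of Section 2 (normal with $\sigma = 1$, double exponential with $\lambda = 1/\sqrt{2}$), a one-sided chart that signals only when all n signs are positive ($a_2 = n$, a choice made here for simplicity and not taken from the tables), and hypothetical function names.

```python
from math import erf, exp, sqrt


def p_normal(shift):
    """P(X > mu0) when X is normal with unit variance and median mu0 + shift."""
    return 0.5 * (1.0 + erf(shift / sqrt(2.0)))


def p_double_exponential(shift, lam=1.0 / sqrt(2.0)):
    """P(X > mu0) for a double exponential with scale lam and median mu0 + shift."""
    if shift >= 0:
        return 1.0 - 0.5 * exp(-shift / lam)
    return 0.5 * exp(shift / lam)


def arl_all_positive(n, p):
    """One-sided sign chart with a2 = n: signal only if all n signs are +1."""
    return 1.0 / p**n


if __name__ == "__main__":
    n = 10
    for shift in (0.0, 0.25, 0.5, 1.0):
        pn, pd = p_normal(shift), p_double_exponential(shift)
        print(shift, round(pn, 4), round(arl_all_positive(n, pn), 1),
              round(pd, 4), round(arl_all_positive(n, pd), 1))
```

For small shifts p grows faster under the double exponential than under the normal, so the same sign chart reacts sooner. This says nothing by itself about the $\bar{X}$-chart, whose ARL has to be computed separately for each distribution.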
When the distribution is clearly asymmetric (gamma2 and gamma3), the nonparametric chart is more efficient. For detecting large shifts, $SN_i$ is not as efficient as $\bar{X}_i$ for the double exponential but more efficient for the Cauchy. Table 8 is similar to Table 7 except that warning limits are used and $L^{+}(\mu_0)$ is 593.7 for both procedures. The warning limits are set so that $p_0$ and $p_1$ are the same for both procedures. Again $\bar{X}_i$ is more efficient than $SN_i$ for the uniform and normal distributions and less efficient for the Cauchy distribution. For the double exponential distribution $\bar{X}_i$ is more efficient for large shifts and less efficient for small shifts. For the gamma distribution there is not much difference between the performance of $\bar{X}_i$ and $SN_i$. In Table 9 the ARL values for the Shewhart nonparametric chart for variability are compared to those of a corresponding $S^2$ chart for the normal, gamma, and double exponential distributions for n=7. The ARL values of the $S^2$ chart for non-normal distributions were obtained by simulations based on 10,000 runs each. It is illustrated in (*) that the in-control ARL for the $S^2$ chart changes from 128 for the normal distribution to 39.1 and 4.9 for the gamma and double exponential distributions respectively, when $\sigma$ is estimated correctly but the control limits are based on constants that assume normality. The ARL values for the cases $\sigma > \sigma_0$ seem to indicate that the $S^2$ chart is more efficient than the proposed nonparametric chart, but this is only true if the in-control ARL is correct. The numbers in (**) give the in-control ANOS values when curtailed sampling plans are used. Clearly, the nonparametric procedure with curtailed sampling has the advantage of using fewer observations on the average when the process is in control. In the normal case, an in-control ARL of 128 requires on the average 254 observations for the sign chart compared to 896 observations for the $S^2$ chart.
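The observation counts quoted for the normal case can be checked with a few lines of arithmetic. The sketch assumes that the average number of observations to signal is the in-control ARL multiplied by the expected number of observations per sample, and that the curtailed one-sided sign chart with c = n = 7 has in-control expected sample size $2 - (0.5)^{n-1}$, the closed form reached in the Appendix. Variable names are illustrative.

```python
n = 7
in_control_arl = 128

# Expected observations per sample under curtailed sampling with c = n, p = 0.5
# (closed form from the Appendix; taken here as an assumption).
expected_obs_sign = 2 - 0.5 ** (n - 1)

anos_sign = in_control_arl * expected_obs_sign  # curtailed sign chart
anos_s2 = in_control_arl * n                    # S^2 chart uses all n observations

print(anos_sign, anos_s2)  # 254.0 896
```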
The one-sided Shewhart sign charts with c = n have an in-control expected sample size $E(\eta) \le 2$. It is the in-control case in which most of the observations would be taken if the process is in control most of the time, and it is reasonable to compare the nonparametric chart for variability to a corresponding $S^2$ chart such that both charts have the same in-control ANOS. The $S^2$ chart with n=2 has slightly higher efficiency than the nonparametric chart, as shown in the last column of Table 9. Table 10 gives a comparison of ARL values for the Shewhart $\bar{X}$ chart and the sign chart for location with curtailed sampling for the normal distribution and the gamma distribution. The $\bar{X}$ chart was matched to the sign chart with curtailed sampling such that both procedures had the same in-control ARL and roughly the same in-control ANOS. Clearly the sign chart is more efficient than the $\bar{X}$ chart in this comparison. It is true that $E(\eta \mid \mu) > E(\eta \mid \mu_0)$ for $\mu \ne \mu_0$, but most savings in observations are obtained when the process is in control. The sign chart does considerably better than the $S^2$ chart when the underlying distribution is non-normal, as in the case of the (slightly) asymmetric gamma distribution with shape parameter $\nu = 4$ and scale parameter $\lambda = 1/2$. ARL values for the CUSUM chart using $SN_i$ and $\bar{X}_i$ are given in Table 11. The values of k were chosen to be optimal for a shift of approximately one unit and the value of $L^{+}(\mu_0)$ for $SN_i$ was taken as 1074.2. The ARL for the CUSUM chart using $\bar{X}_i$ was simulated using 1000 runs. For this reason it was not possible to adjust the parameter h in the CUSUM chart to get $L^{+}(\mu_0)$ exactly equal to 1074.2. The efficiency of $SN_i$ relative to $\bar{X}_i$ in the CUSUM chart follows the same general pattern as the efficiency of $SN_i$ relative to $\bar{X}_i$ in the Shewhart chart. However, one anomaly is that $SN_i$ is more efficient than $\bar{X}_i$ for small shifts for normal observations. It was found that $L^{+}(\mu)$ for values of $\mu$ away from $\mu_0$ was sensitive to the choice of h and k used to achieve the specified value of $L^{+}(\mu_0)$. By using a smaller k and larger h the performance of $\bar{X}_i$ for small shifts can be improved considerably, although performance for large shifts will deteriorate slightly.
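The ARL of the CUSUM sign chart can be obtained from the Markov chain representation (4.3) by solving $(I - Q)\mathbf{m} = \mathbf{1}$. The following sketch is a minimal illustration and not the authors' program: it assumes integer h and k, assumes that $SN_i = 2T_i - n$ with $T_i$ binomial$(n, p)$, and uses a small hand-written Gaussian elimination so that no external library is needed. All function names are hypothetical.

```python
from math import comb


def sn_pmf(n, p):
    """pmf of SN = 2*T - n with T ~ Binomial(n, p) (assumed form of the sign statistic)."""
    return {2 * t - n: comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def cusum_sign_arl(n, h, k, p=0.5):
    """ARL of the one-sided (positive) CUSUM sign chart started at state 0."""
    pmf = sn_pmf(n, p)
    # Q holds transitions among the non-absorbing states 0, 1, ..., h-1.
    q = [[0.0] * h for _ in range(h)]
    for s in range(h):
        for sn, pr in pmf.items():
            nxt = min(h, max(0, s + sn - k))  # recursion (4.3)
            if nxt < h:
                q[s][nxt] += pr
    i_minus_q = [[(1.0 if r == c else 0.0) - q[r][c] for c in range(h)] for r in range(h)]
    m = solve(i_minus_q, [1.0] * h)  # mean absorption times m_0, ..., m_{h-1}
    return m[0]


if __name__ == "__main__":
    print(cusum_sign_arl(n=10, h=2, k=8))   # signals only when SN = 10
    print(cusum_sign_arl(n=10, h=10, k=2))  # in-control ARL for another (h, k) pair
    print(cusum_sign_arl(n=10, h=10, k=2, p=0.7))
```

With h = 2 and k = 8 the chart can only signal on a sample with $SN_i = 10$, so it reduces to the one-sided Shewhart sign chart with $a_2 = 10$ and the in-control ARL should come out as 1024, which is a convenient sanity check on the matrix computation.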
The results also indicate that the CUSUM chart using $SN_i$ is more efficient than Shewhart charts using $SN_i$ whether or not warning limits are used.

6. DISCUSSION AND CONCLUSIONS

From the results that have been presented here as well as from results in other studies, it seems clear that if the distribution of the observations is close to the normal and the sample size is not too small, then the distribution of $\bar{X}$ will be close to the normal and the resulting ARL calculations based on the normal will be approximately correct. In addition $\bar{X}$ will be reasonably efficient in detecting shifts in $\mu$. This is not the case with the distribution of $S^2$ when the distribution of the observations is non-normal. In cases where the distribution of the observations is heavy tailed, the nonparametric procedures based on signs have the advantage of fixed ARL when in control and high efficiency in detecting shifts in $\mu$. Thus it seems reasonable to recommend procedures based on signs when dealing with a very heavy tailed or highly skewed distribution. Note that the sign test can be applied to control the mean instead of the median, but then the in-control ARL will not be exactly the calculated value, in the same way that the actual in-control ARL for the $\bar{X}$-chart will not be exactly equal to the normal theory value. These nonparametric procedures appear to be most useful when relatively large samples are being used to detect small changes in $\mu$. Duncan (1974, p. 449) points out that samples larger than the usual 4 or 5 are desirable for detecting small changes.

An additional advantage of the nonparametric procedures for monitoring the process center is the fact that the variance does not need to be known or estimated in order to set up the procedures. Problems caused by an incorrect specification of the variance in setting up the parametric procedures are thus eliminated. As long as $\mu$ remains at $\mu_0$, the nonparametric procedures are completely unaffected by changes in the variance of the process. If it is necessary to control the variance as well as the center of the distribution, then the nonparametric procedures, as well as the $\bar{X}$-chart, need to be used in conjunction with a chart for the variance. Charts using the sample range or standard deviation have traditionally been used with the $\bar{X}$-chart, but these charts are very sensitive to the normality assumption.

In this article we have introduced a simple nonparametric chart for the variance. The nonparametric chart for variability is very useful for controlling process variability when the distribution of the observations is non-normal and/or skewed. The effect of non-normality on the ARL values of the control charts for variability is far more severe than the effect on charts for location. No properties of nonparametric control charts for variability have been published so far, and no suggestions have been offered on how to adjust the constants for the control limits when the distributions are non-normal.

As far as convenience of use is concerned, the proposed nonparametric procedures should be as easy to set up and apply as the corresponding parametric procedures. In fact the sign test is by far the easiest procedure to use. Thus the nonparametric procedures seem to offer an attractive alternative to the standard procedures for cases where normality cannot reasonably be assumed.
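To illustrate how little is needed to run a Shewhart sign chart, the following is a minimal sketch in Python. It assumes the sign statistic is the number of observations above the target minus the number below it, and that a two-sided chart signals when its absolute value reaches the action limit; the function names `sign_statistic` and `first_signal` are hypothetical and are not taken from the paper. Note that no variance estimate appears anywhere.

```python
from typing import Iterable, Optional, Sequence


def sign_statistic(sample: Sequence[float], mu0: float) -> int:
    """Assumed form of SN: (# observations above mu0) - (# below mu0)."""
    return sum((x > mu0) - (x < mu0) for x in sample)


def first_signal(
    samples: Iterable[Sequence[float]], mu0: float, a2: int
) -> Optional[int]:
    """Return the 1-based index of the first sample with |SN| >= a2, else None."""
    for i, sample in enumerate(samples, start=1):
        if abs(sign_statistic(sample, mu0)) >= a2:
            return i
    return None


if __name__ == "__main__":
    # Illustrative data only: three samples of size 4 with target value 10.
    data = [
        [9.8, 10.3, 10.1, 9.7],
        [10.2, 9.9, 10.4, 9.6],
        [10.5, 10.2, 10.9, 10.1],
    ]
    print(first_signal(data, mu0=10.0, a2=4))
```

Only the target value and the action limit are supplied, which is the practical point made in the discussion above.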
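APPENDIX

Proof that $E(\eta \mid \mu_0) \le 2$ when p=0.5 in one-sided Shewhart sign charts with c=n: Setting c=n in equation (3.7) gives

\[
\begin{aligned}
E(\eta \mid \mu_0) &= \sum_{k=1}^{n} k\,(0.5)^{k} + n\,(0.5)^{n} \\
&= (0.5)\sum_{k=1}^{n} k\,(0.5)^{k-1} + n\,(0.5)^{n} \\
&= (0.5)\sum_{k=1}^{n} k\,h^{k-1}\Big|_{h=0.5} + n\,(0.5)^{n} \\
&= (0.5)\sum_{k=1}^{n} \frac{d}{dh}\,h^{k}\Big|_{h=0.5} + n\,(0.5)^{n} \\
&= (0.5)\,\frac{d}{dh}\Big\{\sum_{k=1}^{n} h^{k}\Big\}\Big|_{h=0.5} + n\,(0.5)^{n} \\
&= (0.5)\,\frac{d}{dh}\Big\{\frac{h - h^{n+1}}{1-h}\Big\}\Big|_{h=0.5} + n\,(0.5)^{n} \\
&= (0.5)\Big\{\frac{\big(1-(n+1)h^{n}\big)(1-h) - \big(h-h^{n+1}\big)(-1)}{(1-h)^{2}}\Big\}\Big|_{h=0.5} + n\,(0.5)^{n} \\
&= (0.5)\,\frac{1-(n+1)(0.5)^{n} + n\,(0.5)^{n+1}}{0.25} + n\,(0.5)^{n} \\
&= 2 - (n+1)(0.5)^{n-1} + n\,(0.5)^{n} + n\,(0.5)^{n} \\
&= 2 - (0.5)^{n-1} < 2 \quad \text{for } n \ge 1 .
\end{aligned}
\]

\[
\lim_{n\to\infty} E(\eta \mid \mu_0) = 2 - \lim_{n\to\infty} (0.5)^{n-1} = 2 .
\]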
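The closed form in this appendix can be checked numerically. The sketch below uses exact rational arithmetic in Python and assumes the reading of the sum used in the proof: with c=n and p=0.5, sampling of a subgroup stops at observation k with probability $(0.5)^k$, or runs to all n observations with probability $(0.5)^n$. The function names are hypothetical.

```python
from fractions import Fraction


def expected_obs_curtailed(n: int) -> Fraction:
    """Direct evaluation of sum_{k=1}^{n} k (1/2)^k + n (1/2)^n."""
    half = Fraction(1, 2)
    return sum(k * half**k for k in range(1, n + 1)) + n * half**n


def closed_form(n: int) -> Fraction:
    """Closed form from the appendix: 2 - (1/2)^(n-1)."""
    return 2 - Fraction(1, 2) ** (n - 1)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 14):
        direct = expected_obs_curtailed(n)
        # The two expressions agree exactly and stay below 2.
        print(n, direct, direct == closed_form(n), float(direct))
```

Because the arithmetic is exact, agreement between the two functions is an identity check rather than a floating-point approximation.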
REFERENCES

Amin, R. W. and Searcy, A. J., "A Nonparametric Exponentially Weighted Moving Average Control Scheme," Communications in Statistics: Simulation and Computation, Vol. 20, No. 4, 1991, pp. 1049-1072.
Arnold, H. J., "Small Sample Power of the One Sample Wilcoxon Test for Non-Normal Shift Alternatives," Annals of Mathematical Statistics, Vol. 36, No. 6, 1965, pp. 1767-1778.
Bakir, S. T., "Nonparametric Procedures for Process Control," Ph.D. Dissertation, 1977, Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.
Bakir, S. T., and Reynolds, M. R., Jr., "A Nonparametric Procedure for Process Control Based on Within-Group Ranking," Technometrics, Vol. 21, No. 2, 1979, pp. 175-183.
Bradley, J. V., Distribution-Free Statistical Tests, Prentice-Hall, New Jersey, 1968.
Bradley, J. V., "A Large-Scale Sampling Study of the Central Limit Effect," Journal of Quality Technology, Vol. 3, No. 2, 1971, pp. 51-68.
Bradley, J. V., "The Central Limit Effect for a Variety of Populations and the Influence of Population Moments," Journal of Quality Technology, Vol. 5, No. 4, 1973, pp. 171-177.
Burr, I. W., "The Effect of Non-Normality on Constants for $\bar{X}$ and R Charts," Industrial Quality Control, Vol. 23, No. 11, 1967, pp. 563-568.
Champ, C. W. and Woodall, W. H., "Exact Results for Shewhart Control Charts with Supplementary Runs Rules," Technometrics, Vol. 29, 1987, pp. 393-399.
Duncan, A. J., Quality Control and Industrial Statistics, 4th Ed., Richard D. Irwin, Inc., Homewood, Illinois, 1974.
Ghosh, B. K., Reynolds, M. R., Jr. and Hui, Y. V., "Shewhart $\bar{X}$-Charts With Estimated Process Variance," Communications in Statistics, Vol. A10, No. 18, 1981, pp. 1797-1822.
Hackl, P. and Ledolter, J., "A Control Chart Based on Ranks," Journal of Quality Technology, Vol. 23, 1991, pp. 117-124.
Hackl, P. and Ledolter, J., "A New Nonparametric Quality Control Technique," Communications in Statistics, Vol. B21, 1992, pp. 423-443.
Hillier, F. S., "$\bar{X}$- and R-Chart Control Limits Based on a Small Number of Subgroups," Journal of Quality Technology, Vol. 1, No. 1, 1969, pp. 17-26.
Johnson, N. L., and Kotz, S., Continuous Univariate Distributions-2, Houghton Mifflin Company, Boston, Massachusetts, 1970.
Lehmann, E. L., Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco, California, 1975.
McGilchrist, C. A. and Woodyer, K. D., "Note on a Distribution-free CUSUM Technique," Technometrics, Vol. 17, 1975, pp. 321-325.
Page, E. S., "Continuous Inspection Schemes," Biometrika, Vol. 41, 1954, pp. 100-114.
Page, E. S., "A Modified Control Chart with Warning Lines," Biometrika, Vol. 49, 1962, pp. 171-176.
Parent, E. A., Jr., "Sequential Ranking Procedures," Technical Report No. 80, 1965, Department of Statistics, Stanford University, Stanford, California.
Reynolds, M. R., Jr., "A Sequential Nonparametric Test for Symmetry with Application to Process Control," Technical Report No. 148, 1972, Department of Operations Research and Department of Statistics, Stanford University, Stanford, California.
Reynolds, M. R., Jr., "Approximations to the Average Run Length in Cumulative Sum Control Charts," Technometrics, Vol. 17, No. 1, 1975, pp. 65-71.
Schilling, E. G., and Nelson, P. R., "The Effect of Non-Normality on the Control Limits of $\bar{X}$ Charts," Journal of Quality Technology, Vol. 8, No. 4, 1976, pp. 183-188.
Van Dobben de Bruyn, C. S., Cumulative Sum Tests, Hafner, New York, 1968.
Wetherill, G. B. and Brown, D. W., Statistical Process Control, Chapman and Hall, New York, 1991.
Weindling, J. I., Littauer, S. B., and De Oliveira, J. T., "Mean Action Time of the $\bar{X}$ Control Chart with Warning Limits," Journal of Quality Technology, Vol. 2, No. 2, 1970, pp. 79-85.
Williams, W. W., Looney, S. W., and Peters, M. H., "Use of Curtailed Sampling Plans in the Economic Design of np-Control Charts," Technometrics, Vol. 27, No. 1, 1985, pp. 57-63.
Yang, C., and Hillier, F. S., "Mean and Variance Control Chart Limits Based on a Small Number of Subgroups," Journal of Quality Technology, Vol. 2, No. 1, 1970, pp. 9-16.
Quesenberry, C. P., "The Effect of Sample Size on Estimated Limits for $\bar{X}$ and X Control Charts," Journal of Quality Technology, Vol. 25, No. 4, 1993, pp. 237-247.
LIST OF TABLES

TABLE 1. Values of $L^{+}(\mu)$ for Shewhart Charts Using $\bar{X}_i$ when n=10 and $a_1 = 3\sigma_{\bar{X}}$ for Various Distributions.
TABLE 2. Values of $L(\mu_0)$ for Shewhart Charts Using $SN_i$ with n=10.
TABLE 3. Values of $L(\mu)$ as a Function of p for Shewhart Charts Using $SN_i$ with n=10.
TABLE 4. Approximate Optimum Values of k for the Cusum Chart Using $SN_i$ when n=10.
TABLE 5. Values of $L(\mu_0)$ for the Cusum Chart Using $SN_i$ when n=10.
TABLE 6. ARL Values for Two-sided Shewhart $SN_i$ and $\bar{X}_i$ Charts for Various Distributions when n=10 and $L(\mu_0)$ = 512.
TABLE 7. ARL Values for One-sided Shewhart $SN_i$ and $\bar{X}_i$ Charts for Various Distributions when n=10 and $L(\mu_0)$ = 1024.
TABLE 8. ARL Values for Shewhart $SN_i$ and $\bar{X}_i$ Charts with Warning Limits for Various Distributions when n=10 and $L(\mu_0)$ = 593.7.
TABLE 9. ARL Values for Shewhart $SN_i$ and $S^2$ Charts for Variability for Various Distributions when n=7 and $L(\mu_0)$ = 128.
TABLE 10. ARL Values of the Two-sided SN Chart with Curtailed Sampling Plans.
TABLE 11. ARL Values for the Cusum Chart using $SN_i$ and $\bar{X}_i$ for Various Distributions when n=10 and $L^{+}(\mu_0)$ = 1074.2.
TABLE 1. Values of $L^{+}(\mu)$ for Shewhart Charts Using $\bar{X}_i$ when n=10 and $a_1 = 3\sigma_{\bar{X}}$ for Various Distributions.

                                        Distribution
Shift                                Double
($\mu - \mu_0$)   Uniform   Normal   Exponential   Exponential   Cauchy   Gamma
   .00             1068.7    740.8      441.9         148.9        11.7    268.3
   .25               78.2     73.7       65.8          38.0         8.8     49.1
   .50               12.7     12.8       13.2          11.4         6.0     11.9
  1.00                1.8      1.8        1.8           1.9         1.8      1.8
  2.00                1.0      1.0        1.0           1.0         1.1      1.0
TABLE 2. Values of $L(\mu_0)$ for Shewhart Charts Using $SN_i$ when n=10.

                 $a_2$ = 8                            $a_2$ = 10
        $w_2$ =                              $w_2$ =
  r       0       2       4       6            2        4        6        8
  2      4.1     9.2    30.2    79.4          9.6     38.6    269.2    933.7
  3      7.9    23.0    70.1    92.4         27.8    194.7    890.3   1023.0
  4     13.5    44.7    88.4    93.1         73.0    593.7   1015.8   1023.0
  5     21.2    66.9    92.3    93.1        175.4    911.2   1023.6   1024.0
  6     31.0    81.5    93.0    93.1        364.4   1002.0   1024.0   1024.0
  7     42.3    88.5    93.1    93.1        609.8   1020.3   1024.0   1024.0
TABLE 3. Values of $L(\mu)$ as a Function of p for Shewhart Charts Using $SN_i$ when n=10.

                                                   p
Chart                              0.50     0.60    0.70    0.80    0.90    0.95
$a_2$=10                         1024.0    165.4    35.4     9.3     2.9     1.7
$a_2$=10, $w_2$=4, r=6           1002.8    127.8    19.5     5.9     2.7     1.7
$a_2$=10, $w_2$=6, r=4           1015.8    151.1    25.9     6.3     2.5     1.6
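The first row of Table 3 (no warning limits) can be reproduced from the binomial distribution. The sketch below is a hedged illustration in Python, assuming that the sign statistic equals $2T - n$ where T is the number of positive signs with $T \sim \text{Binomial}(n, p)$, and that the one-sided chart signals when the statistic is at least $a_2$; the function name `sign_chart_arl` is hypothetical.

```python
from math import comb


def sign_chart_arl(p: float, n: int = 10, a2: int = 10) -> float:
    """ARL of a one-sided Shewhart sign chart without warning limits.

    Assumes SN = 2*T - n with T ~ Binomial(n, p) and a signal when SN >= a2,
    so the ARL is the reciprocal of the per-sample signal probability.
    """
    signal_prob = sum(
        comb(n, t) * p**t * (1 - p) ** (n - t)
        for t in range(n + 1)
        if 2 * t - n >= a2
    )
    return 1.0 / signal_prob


if __name__ == "__main__":
    for p in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95):
        print(f"p={p:.2f}  ARL={sign_chart_arl(p):.1f}")
```

With n=10 and $a_2$=10 the signal probability reduces to $p^{10}$, which gives 1024.0 at p=0.5; setting `a2=8` gives 1024/11, in line with the limiting value 93.1 in Table 2.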
TABLE 4. Approximate Optimum Values of k for the Cusum Chart Using $SN_i$ when n=10.

                                    Distribution
Shift                                 Double                        Rounded
($\mu_1 - \mu_0$)   Uniform   Normal   Exponential   Cauchy   Gamma   Normal Values
   .25               0.72      0.99       1.49        1.22     1.09        1
   .50               1.44      1.91       2.53        1.74     2.20        2
  1.00               2.88      3.41       3.78        2.09     4.11        3
  2.00               5.00      4.77       4.70        2.29     5.00        5
  3.00               5.00      4.98       4.93        2.36     5.00        5
TABLE 5. Values of $L(\mu_0)$ for the Cusum Chart Using $SN_i$ when n=10.

                                        k
  h        1         2          3          4          5          6          7
  3       5.6      14.4       17.8       78.0       92.4      930.0     1023.0
  4       9.1      14.4       44.6       78.0      521.5      930.0    50083.8
  5      11.8      36.8       67.0      464.9      875.6    45525.1
  6      16.9      36.8      148.0      464.9     4756.3    45525.1
  7      22.8      91.6      282.5     3166.5    23143.0
  8      30.1      91.6      519.0     3166.5    41364.0
  9      39.9     216.3     1074.2    17931.9
 10      51.1     216.3     1886.4    17931.9
 11      65.9     499.5     3663.9
 12      83.7     499.5     6968.4
 13     105.8    1147.8    13030.2
 14     133.3    1147.8    25236.3
 15     166.8    2623.3    47128.0
 16     208.3    2623.3    89762.1
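In-control values of this kind can be obtained from a Markov chain on the CUSUM states. The following Python sketch is an illustration under stated assumptions, not the authors' program: it assumes the upper CUSUM recursion $S_t = \max(0, S_{t-1} + SN_t - k)$ started at zero, a signal when $S_t \ge h$, integer k and h, and $SN_t = 2T - n$ with $T \sim \text{Binomial}(n, p)$. The function name `cusum_sign_arl` is hypothetical.

```python
from fractions import Fraction
from math import comb


def cusum_sign_arl(k: int, h: int, n: int = 10,
                   p: Fraction = Fraction(1, 2)) -> float:
    """Zero-start ARL of an upper CUSUM of the sign statistic.

    Assumed scheme: S_t = max(0, S_{t-1} + SN_t - k), signal when S_t >= h.
    """
    # Distribution of SN = 2*T - n, T ~ Binomial(n, p).
    pmf = {2 * t - n: comb(n, t) * p**t * (1 - p) ** (n - t)
           for t in range(n + 1)}
    size = h  # transient states S = 0, 1, ..., h-1
    # Augmented system (I - Q) x = 1, where Q holds transient transitions.
    A = [[Fraction(int(i == j)) for j in range(size)] + [Fraction(1)]
         for i in range(size)]
    for s in range(size):
        for sn, prob in pmf.items():
            nxt = max(0, s + sn - k)
            if nxt < h:
                A[s][nxt] -= prob
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(size):
        piv = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return float(A[0][size])  # expected run length starting from S = 0


if __name__ == "__main__":
    for k, h in ((7, 3), (6, 3), (6, 4)):
        print(k, h, round(cusum_sign_arl(k, h), 1))
```

Under these assumptions the sketch gives values close to 1023.0 for k=7, h=3 and 930.0 for k=6, h=3 and h=4, which agrees with the corresponding entries above; other entries can be checked the same way.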
TABLE 6. ARL Values for Two-sided Shewhart $SN_i$ and $\bar{X}_i$ Charts for Various Distributions when n=10 and $L(\mu_0)$ = 512.

                                                               Shift ($\mu - \mu_0$)
Distribution         Chart          Limits                      0      .25     .50    1.00    2.00
Double Exponential   $SN_i$         $a_2$=10                  512.0    75.4    17.0    3.7     1.4
                     $\bar{X}_i$    $a_1$=.945                512.0   135.9    24.3    2.3     1.0
Normal               $SN_i$         $a_2$=10                  512.0   166.0    40.0    5.6     1.3
                     $\bar{X}_i$    $a_1$=.979                512.0    94.5    15.4    1.9     1.0
Gamma1               $SN_i$         $a_2$=10                  512.0   167.2    43.4    7.2     1.6
                     $\bar{X}_i$    $a_1$=1.198, $a_1'$=.762  512.0   137.7    25.6    2.6     1.0
Gamma2               $SN_i$         $a_2$=10                  512.0   164.1    43.4    7.5     1.7
                     $\bar{X}_i$    $a_1$=1.288, $a_1'$=.675  512.0   157.0    31.1    3.1     1.2
Gamma3               $SN_i$         $a_2$=10                  512.0   156.1    41.8    7.8     1.9
                     $\bar{X}_i$    $a_1$=1.412, $a_1'$=.555  512.0   184.8    40.1    3.8     1.0

Gamma1: $\nu = 8$, $\lambda = 1/\sqrt{8}$. Gamma2: $\nu = 4$, $\lambda = 1/2$. Gamma3: $\nu = 2$, $\lambda = 1/\sqrt{2}$.

TABLE 7. ARL Values for One-sided Shewhart $SN_i$ and $\bar{X}_i$ Charts for Various Distributions when n=10 and $L(\mu_0)$ = 1024.

                                                      Shift ($\mu - \mu_0$)
Distribution         Chart          Limits          0        .25      .50     1.00     2.00
Uniform              $SN_i$         $a_2$=10      1024.0    265.9     81.1    10.7      1.0
                     $\bar{X}_i$    $a_1$=.945    1024.0     75.9     12.4     1.8      1.0
Normal               $SN_i$         $a_2$=10      1024.0    169.0     40.0     5.6      1.3
                     $\bar{X}_i$    $a_1$=.979    1024.0     94.9     15.4     1.9      1.0
Double Exponential   $SN_i$         $a_2$=10      1024.0     75.5     17.0     3.7      1.4
                     $\bar{X}_i$    $a_1$=1.05    1024.0    137.9     24.3     2.3      1.5
Cauchy               $SN_i$         $a_2$=10      1024.0     19.4      5.3     2.3      1.5
                     $\bar{X}_i$    $a_1$=84.9    1024.0   1021.0   1018.0  1011.9    999.9
Gamma2               $SN_i$         $a_2$=10      1024.0    143.1     26.6     2.5      1.0
                     $\bar{X}_i$    $a_1$=1.12    1024.0    157.2     31.1     3.0      1.0
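For normal observations, the $\bar{X}_i$ entries of the one-sided table can be approximated from the normal tail probability. The sketch below is illustrative only and rests on assumptions: unit process variance, a shift expressed in standard deviation units, and a signal when the sample mean exceeds $\mu_0 + a_1$. The function name `xbar_chart_arl` is hypothetical.

```python
from math import erfc, sqrt


def xbar_chart_arl(shift: float, a1: float, n: int = 10) -> float:
    """ARL of a one-sided X-bar chart for N(mu0 + shift, 1) observations.

    Assumes a signal when the sample mean exceeds mu0 + a1.
    """
    z = (a1 - shift) * sqrt(n)       # standardized distance to the limit
    tail = 0.5 * erfc(z / sqrt(2))   # P(Z > z) for standard normal Z
    return 1.0 / tail


if __name__ == "__main__":
    for shift in (0.0, 0.25, 0.50, 1.00, 2.00):
        print(f"shift={shift:.2f}  ARL={xbar_chart_arl(shift, a1=0.979):.1f}")
```

Because $a_1$ is quoted to three decimals, the computed values match the tabled ones only approximately; the in-control value comes out somewhat below 1024 for this reason.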
TABLE 8. ARL Values for Shewhart $SN_i$ and $\bar{X}_i$ Charts with Warning Limits for Various Distributions when n=10 and $L(\mu_0)$ = 593.7.

                                                                     Shift ($\mu - \mu_0$)
Distribution         Chart          Limits                           0      .25     .50    1.00    2.00
Uniform              $SN_i$         $a_2$=10, $w_2$=4, r=4         593.7   116.5    23.5    4.6     1.0
                     $\bar{X}_i$    $a_1$=.945, $w_1$=.303, r=4    593.7    31.2     6.4    1.7     1.0
Normal               $SN_i$         $a_2$=10, $w_2$=4, r=4         593.7    82.8     7.9    1.6     1.3
                     $\bar{X}_i$    $a_1$=.979, $w_1$=.299, r=4    593.7    33.0     6.9    1.6     1.0
Double Exponential   $SN_i$         $a_2$=10, $w_2$=4, r=4         593.7    21.7     6.1    2.7     1.3
                     $\bar{X}_i$    $a_1$=1.05, $w_1$=.292, r=4    593.7    34.8     7.1    2.1     1.0
Cauchy               $SN_i$         $a_2$=10, $w_2$=4, r=4         593.7     6.7     3.2    2.1     1.5
                     $\bar{X}_i$    $a_1$=84.9, $w_1$=.435, r=4    593.7   145.2    18.6    5.9     4.6
Gamma2               $SN_i$         $a_2$=10, $w_2$=4, r=4         593.7    44.6     8.4    2.2     1.0
                     $\bar{X}_i$    $a_1$=1.12, $w_1$=.296, r=4    593.7    41.2     7.9    2.4     1.0

TABLE 9. ARL Values for Shewhart $SN_i$ and $S^2$ Charts for Variability for Various Distributions when n=7 and $L(\mu_0) \approx 128$.

                       Normal                    Gamma               Double Exponential     Normal (n=2)
                   $V_i$     $S^2$          $V_i$     $S^2$           $V_i$     $S^2$           $S^2$
$\sigma_1/\sigma_0$   c=7   $a_3$=2.906      c=7   $a_3$=3.86(1)      c=7   $a_3$=9.0(1)     $a_3$=7.08
(*)                         (128.0)                  (39.1)                    (4.9)
(**)              (254.0)   (896.0)       (248.0)   (878.5)        (254.0)   (886.0)         (256.0)
1.0                128.0     128.0         128.0     125.5          128.0     126.6           128.0
1.1                 74.9      39.4          72.6      50.0           82.4      60.3            64.1
1.2                 48.7      16.8          44.4      24.5           57.0      33.4            37.5
1.3                 34.1       8.9          29.0      14.5           41.8      20.6            24.6
1.4                 25.4       5.7          20.1       9.6           32.0      13.5            17.4
1.5                 19.8       3.9          14.7       6.8           25.4       9.5            13.1
1.6                 15.9       3.0          11.1       5.3           20.8       7.0            10.4
1.7                 13.2       2.4           9.1       4.4           17.4       5.6             8.5
1.8                 11.2       2.0           7.7       3.7           14.8       4.5             7.2
1.9                  9.7       1.8           6.7       3.2           12.9       3.8             6.2
2.0                  8.6       1.6           6.0       2.8           11.3       3.2             5.5

(1) The constant $a_3$ and the ARL were obtained by simulations based on 10,000 runs.
(*) In-control ARL values when the control limits are based on the normality assumption.
(**) In-control ANO values for each distribution.
TABLE 10. ARL Values of the Two-sided Sign Chart with Curtailed Sampling Plans and $\bar{X}$ Chart with corresponding n.

                                 Normal                                        Gamma(3)
Shift            $SN_i$(1)  $\bar{X}_i$  $SN_i$(2)  $\bar{X}_i$    $SN_i$(1)  $\bar{X}_i$  $SN_i$(2)  $\bar{X}_i$
($\mu - \mu_0$)    n=10       n=3        n=14       n=6           n=10       n=3        n=14       n=6
                   c=10                  c=13                     c=10                  c=13
0.00              512.0      512.0      546.1      546.1         512.0      512.0      546.1      546.1
0.25              166.0      246.0      125.7      160.4         164.1      423.9      124.0      273.1
0.50               40.0       77.7       24.2       34.2          43.4      184.2       26.5       77.2
1.00                5.6       11.6        3.1        4.0           7.5       39.0        4.1        9.3
2.00                1.3        1.6        1.0        1.0           1.7        3.4        1.5        1.1

(1): $E(\eta \mid \mu_0) \le 3.0$
(2): $E(\eta \mid \mu_0) \le 5.5$
(3): Gamma with shape parameter 4 and scale parameter 0.5.
TABLE 11. ARL Values for the Cusum Chart using $SN_i$ and $\bar{X}_i$ for Various Distributions when n=10 and $L^{+}(\mu_0)$ = 1074.2.

                                                         Shift ($\mu - \mu_0$)
Distribution         Chart          Parameters           0        .25      .50     1.00    2.00
Uniform              $SN_i$         h=9, k=3          1074.2     94.8     17.3     3.8     2.0
                     $\bar{X}_i$    h=.415, k=.55     1091.8     57.3      7.8     1.6     1.0
Normal               $SN_i$         h=9, k=3          1074.2     46.2      8.6     2.9     2.0
                     $\bar{X}_i$    h=.451, k=.55     1074.31    68.76     8.95    1.67    1.00
Double Exponential   $SN_i$         h=9, k=3          1074.2     16.0      4.8     2.5     2.0
                     $\bar{X}_i$    h=.530, k=.55     1053.0    103.0     11.5     1.8     1.0
Cauchy               $SN_i$         h=9, k=3          1074.2      5.2      2.8     2.2     2.0
                     $\bar{X}_i$    h=15.62, k=.55    1074.2    581.1    211.1    33.9    11.4
Gamma                $SN_i$         h=9, k=3          1076.5     36.2      6.3     2.3     2.0
                     $\bar{X}_i$    h=.61, k=.55      1048.6    104.5     13.9     2.1     1.0