Research Article (wileyonlinelibrary.com) DOI: 10.1002/qre.1488

Published online in Wiley Online Library

A New Distribution-free Control Chart for Joint Monitoring of Unknown Location and Scale Parameters of Continuous Distributions

S. Chowdhury,a A. Mukherjeeb and S. Chakrabortic*†

While the assumption of normality is required for the validity of most of the available control charts for joint monitoring of unknown location and scale parameters, we propose and study a distribution-free Shewhart-type chart based on the Cucconi1 statistic, called the Shewhart-Cucconi (SC) chart. We also propose a follow-up diagnostic procedure useful to determine the type of shift the process may have undergone when the chart signals an out-of-control process. Control limits for the SC chart are tabulated for some typical nominal in-control (IC) average run length (ARL) values; a large sample approximation to the control limit is provided which can be useful in practice. Performance of the SC chart is examined in a simulation study on the basis of the ARL, the standard deviation, the median and some percentiles of the run length distribution. Detailed comparisons with a competing distribution-free chart, known as the Shewhart-Lepage chart (see Mukherjee and Chakraborti2), show that the SC chart performs just as well or better. The effect of estimation of parameters on the IC performance of the SC chart is studied by examining the influence of the size of the reference (Phase-I) sample. A numerical example is given for illustration. Summary and conclusions are offered. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: Cucconi test; effects of estimation; Monte-Carlo simulation; nonparametric; Phase-I; Phase-II; Shewhart-Lepage chart; run length distribution

1. Introduction

Most of the available parametric control charts (see for example, Gan,3 Chao and Cheng,4 Chen and Cheng,5 Chen et al.,6 Costa and Rahim7 and Zhang et al.8) for jointly monitoring the mean and the variance of a distribution are based on the assumption of normality. Cheng and Thaga9 provided a review of this literature up until about 2005. Given the huge amount of activity in this area, McCracken and Chakraborti10 recently provided an updated overview through 2011. Whereas the normal theory charts are useful, it is well recognized that the assumption of normality can be hard to justify in some practical situations. In those contexts, distribution-free (or nonparametric) charts are useful and expected to be superior. For a detailed discussion on nonparametric control charts, see Chakraborti et al.11–13 Some nonparametric joint monitoring schemes that are available in the literature are one-chart schemes. Zou and Tsung14 proposed an EWMA control chart based on Zhang’s goodness-of-fit test. In practice, it would be useful to use a test (or a chart) that works well in general (including nonnormal data) for jointly monitoring location and scale changes. The best known and most used rank test for location and scale is due to Lepage15 which is a combination of the Wilcoxon and Ansari-Bradley statistics (see for example, Gibbons and Chakraborti16 for a discussion of these and other rank tests). Mukherjee and Chakraborti2 proposed a chart for joint monitoring of location and scale based on the Lepage15 statistic, called the Shewhart-Lepage (SL) chart. In contrast with the Lepage test, Cucconi1 proposed a distribution-free test based on the concepts of rank and status for testing the equality of two location and scale parameters.

Despite some interesting features, which include its geometric interpretation (it does not use the conventional quadratic form of a test on location and scale), easy computational form and comparative inferential features, the test has remained relatively unknown outside of the Italian scientific community. As a consequence, and as recently pointed out by Marozzi,17 it appears that most authors in the literature refer to the test by Lepage15 for jointly testing the location and the scale parameters. Motivated by this, in this paper, we introduce a single distribution-free Shewhart-type chart based on the Cucconi1 statistic, called the Shewhart-Cucconi (SC) chart. We also propose a follow-up procedure in case the chart gives an out-of-control (OOC) signal. Like the SL chart, the SC chart is also distribution-free, so that all of its in-control (IC) properties remain the same and known for all continuous distributions.

a Army Institute of Management, Kolkata, India
b Department of Mathematics, Indian Institute of Technology Madras, Chennai 600036, India
c University of Alabama, Tuscaloosa, USA
*Correspondence to: S. Chakraborti, University of Alabama, Tuscaloosa, USA.
† E-mail: [email protected]


Qual. Reliab. Engng. Int. 2013

S. CHOWDHURY, A. MUKHERJEE AND S. CHAKRABORTI

The rest of the paper is organized as follows. Section 2 provides a brief background including the Lepage chart. Statistical framework and preliminaries are outlined in Section 3. The proposed SC control chart is introduced in Section 4. Section 5 is devoted to the practical implementation of the chart including a tabulation of the control limits. Performance of the chart, along with a detailed comparison with the SL chart, is presented in Section 6 based on various run length distribution characteristics obtained via Monte-Carlo simulation. In Section 7, the effects of estimation of the parameters are studied by examining the influence of the size of the reference sample on the performance of the SC chart. An illustration is given in Section 8 with a data set from Montgomery.18 Section 9 concludes with a summary.

2. Background

While studying the testing problem for the equality of the location and scale parameters of two continuous distributions in the distribution-free framework, Marozzi17 showed that the Cucconi1 test performs like the more well-known Lepage test in many cases and in fact is better in certain situations. Many nonparametric tests for jointly testing location and scale parameters are based on the combination of two tests, one for location and one for scale. The most popular combination is the sum of the respective squared standardized test statistics, as used in the Lepage test. The solution presented by Cucconi is different, as it is not a quadratic form combining a test for location and a test for scale differences, and it addresses the location-scale problem by considering the squares of ranks and ‘contrary ranks’. It is important to note that the Cucconi test may be computationally preferable as one only needs to compute the ranks of the observations from one of the samples in the combined sample, whereas for the Lepage test, one also needs to compute the Ansari-Bradley scores. Marozzi17 performed a detailed power simulation study including distributions of different shapes and showed that the Cucconi test maintains its size very close to the nominal level and is more powerful than the Lepage test in some situations. It was also seen that the presence of ties did not lower the performance of the Cucconi test contrary to the Lepage test. Motivated by these observations, we consider an adaptation of the Cucconi test to propose a control chart for the joint monitoring of location and scale parameters.

3. Statistical framework and preliminaries

Let U1, U2, . . ., Um and V1, V2, . . ., Vn be independent random samples from two populations with continuous cumulative distribution functions (cdf) F(x) and G(x) = F((x − θ)/d), θ ∈ (−∞, ∞), d > 0, where F is some unknown continuous cdf. The constants θ and d represent the unknown location and scale parameters, respectively. Introduce an indicator variable Ik that equals 0 or 1 according to whether the kth order statistic of the combined sample of N (= m + n) observations is a U or a V. Define the following statistics:

$$T_1 = \sum_{k=1}^{N} k I_k \tag{3.1}$$

$$S_1 = \sum_{k=1}^{N} k^2 I_k \tag{3.2}$$

Clearly, T1 is the sum of the ranks of the Vi’s in the combined sample and represents the well-known Wilcoxon rank sum test statistic for location. Similarly, S1 represents the sum of the squares of the ranks of the Vi’s in the combined sample. Further, note that the quantities (N + 1 − k)Ik, for k = 1, 2, . . ., N, may be considered as the ‘anti-ranks’ of the Vi’s. The sum of squares of the anti-ranks of the Vi’s in the combined sample, say S2, is given by

$$S_2 = \sum_{k=1}^{N} (N + 1 - k)^2 I_k = n(N + 1)^2 - 2(N + 1)T_1 + S_1 \tag{3.3}$$

In the IC case, θ = 0 and d = 1, so that F = G. In this case, it is well known (see Gibbons and Chakraborti16) that

$$E(T_1 \mid IC) = \frac{n(N + 1)}{2} \quad \text{and} \quad \mathrm{Var}(T_1 \mid IC) = \frac{mn(N + 1)}{12} \tag{3.4}$$

Also, it is easy to see that

$$E(S_1 \mid IC) = E(S_2 \mid IC) = \frac{n(N + 1)(2N + 1)}{6} \tag{3.5}$$

and

$$\mathrm{Var}(S_1 \mid IC) = \mathrm{Var}(S_2 \mid IC) = \frac{mn(N + 1)(2N + 1)(8N + 11)}{180} \tag{3.6}$$

Define the standardized statistics

$$U = \frac{S_1 - E(S_1 \mid IC)}{\sqrt{\mathrm{Var}(S_1 \mid IC)}} = \frac{6S_1 - n(N + 1)(2N + 1)}{\sqrt{\frac{mn}{5}(N + 1)(2N + 1)(8N + 11)}} \tag{3.7}$$

$$V = \frac{S_2 - E(S_2 \mid IC)}{\sqrt{\mathrm{Var}(S_2 \mid IC)}} = \frac{6S_2 - n(N + 1)(2N + 1)}{\sqrt{\frac{mn}{5}(N + 1)(2N + 1)(8N + 11)}} \tag{3.8}$$

Note that when θ > 0 and d = 1, E(U) > 0 and E(V) < 0; when θ = 0 and d > 1, E(U) > 0 and E(V) > 0; and in general, when θ ≠ 0 and d ≠ 1, E(U) ≠ 0 and E(V) ≠ 0. Similar inequalities may be observed in other possible cases, when either θ differs from 0, or d differs from 1, in any direction. Also, E(U|IC) = E(V|IC) = 0 and Var(U|IC) = Var(V|IC) = 1. Marozzi17 argued that when F = G, the correlation coefficient between U and V equals

$$r = \mathrm{Corr}(U, V \mid IC) = \frac{2(N^2 - 4)}{(2N + 1)(8N + 11)} - 1 \tag{3.9}$$

Cucconi1 proposed an interesting rank statistic for simultaneous testing of shift in location and scale parameters, based on

$$C = \frac{U^2 + V^2 - 2rUV}{2(1 - r^2)} \tag{3.10}$$

It is worth mentioning that since U is based on the sum of squares of the usual ranks and V is based on the sum of squares of the anti-ranks of the second sample, the labeling of the samples (first or second) does not matter while computing C. Note that, for the j-th test sample, the statistic Cj computed from (3.10) is a quadratic form in (U, V) whose level sets are ellipses and whose minimum attainable value is 0. Further, larger values of Cj may indicate a shift in location, or scale, or both, that is, an OOC process.
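As an illustration of (3.2), (3.3) and (3.7) to (3.10), the following is a minimal Python sketch of the computation, not the authors' code. The function name `cucconi_statistic` is hypothetical, and the sketch assumes continuous data, so ties are not handled.

```python
import math


def cucconi_statistic(reference, test):
    """Cucconi statistic C of (3.10); `test` plays the role of the V sample.

    Assumes continuous data (no ties between or within the samples).
    """
    m, n = len(reference), len(test)
    N = m + n
    # Order the combined sample; the flag 1 marks a test (V) observation.
    combined = sorted([(x, 0) for x in reference] + [(y, 1) for y in test])
    ranks = [k for k, (_, is_test) in enumerate(combined, start=1) if is_test]

    s1 = sum(k * k for k in ranks)                # (3.2): squared ranks
    s2 = sum((N + 1 - k) ** 2 for k in ranks)     # (3.3): squared anti-ranks

    centre = n * (N + 1) * (2 * N + 1)
    denom = math.sqrt(m * n * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5)
    u = (6 * s1 - centre) / denom                 # (3.7)
    v = (6 * s2 - centre) / denom                 # (3.8)
    r = 2 * (N * N - 4) / ((2 * N + 1) * (8 * N + 11)) - 1   # (3.9)
    return (u * u + v * v - 2 * r * u * v) / (2 * (1 - r * r))  # (3.10)


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8, 1.9, -0.4, 0.1]
    y = [2.5, 3.1, -2.2]
    # Swapping the roles of the two samples should leave C unchanged.
    print(cucconi_statistic(x, y), cucconi_statistic(y, x))
```

Swapping the two samples changes the signs of both U and V, so the quadratic form in (3.10) is unchanged; the usage lines print the two values side by side as a quick check of that labeling invariance.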

4. Proposed charting procedure

The proposed SC control chart is constructed as follows.

Step 1. Collect and establish, through some Phase-I analysis, a reference sample Xm = (X1, X2, . . ., Xm) of size m from an IC process.
Step 2. Let Yj,n = (Yj1, Yj2, . . ., Yjn) be the jth Phase-II (test) sample of size n, j = 1, 2, . . ..
Step 3. Identify the U’s with the X’s and the V’s with the Y’s, respectively. Calculate S1j and S2j using (3.2) and (3.3) for the jth test sample.
Step 4. Calculate Uj and Vj for the jth test sample using (3.7) and (3.8).
Step 5. Calculate the Cucconi plotting statistic Cj given in (3.10) for the jth test sample.
Step 6. Plot Cj against an upper control limit (UCL) H (> 0).
Step 7. If Cj exceeds H, the process is declared OOC at the jth test sample, and we move to Step 8. If not, the process is thought to be IC, and testing continues to the next test sample.
Step 8. Follow-up: When the process is declared OOC at the jth test sample, compute the p-values for the Wilcoxon test for location and the Mood test for scale (see Gibbons and Chakraborti16), based on the two samples: one with the m Phase-I observations, and the other with the n observations from the jth test sample. Denote the p-values as p1 and p2, respectively. If p1 is very low but p2 is not, a shift in location only is indicated. If p1 is relatively high but p2 is low, only a shift in scale is suspected. If both p-values are very low, a shift in both location and scale is declared. Note that it might happen that neither p1 nor p2 is very small even though Cj is high, either because of an interaction between the location and scale changes or because of a false alarm. In such a scenario, one can combine the jth and the (j − 1)th test samples and recalculate p1 and p2 to make a decision.
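Steps 3 to 8 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names (`cucconi`, `follow_up`, `monitor`) are hypothetical, ties are ignored, and the follow-up p-values use the usual large-sample normal approximations to the Wilcoxon rank-sum and Mood statistics, which can be rough for small test samples; exact or permutation p-values could be substituted.

```python
import math


def _test_ranks(reference, test):
    """Ranks of the test observations in the combined sample (no ties assumed)."""
    combined = sorted([(x, 0) for x in reference] + [(y, 1) for y in test])
    return [k for k, (_, is_test) in enumerate(combined, start=1) if is_test]


def cucconi(reference, test):
    """Plotting statistic C_j of (3.10) for one test sample (Steps 3 to 5)."""
    m, n = len(reference), len(test)
    N = m + n
    ranks = _test_ranks(reference, test)
    s1 = sum(k * k for k in ranks)
    s2 = sum((N + 1 - k) ** 2 for k in ranks)
    centre = n * (N + 1) * (2 * N + 1)
    denom = math.sqrt(m * n * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5)
    u, v = (6 * s1 - centre) / denom, (6 * s2 - centre) / denom
    r = 2 * (N * N - 4) / ((2 * N + 1) * (8 * N + 11)) - 1
    return (u * u + v * v - 2 * r * u * v) / (2 * (1 - r * r))


def _two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def follow_up(reference, test):
    """Step 8: approximate p-values (p1: Wilcoxon location, p2: Mood scale)."""
    m, n = len(reference), len(test)
    N = m + n
    ranks = _test_ranks(reference, test)
    t1 = sum(ranks)
    z_loc = (t1 - n * (N + 1) / 2) / math.sqrt(m * n * (N + 1) / 12)
    mood = sum((k - (N + 1) / 2) ** 2 for k in ranks)
    mood_mean = n * (N * N - 1) / 12
    mood_var = m * n * (N + 1) * (N * N - 4) / 180
    z_scale = (mood - mood_mean) / math.sqrt(mood_var)
    return _two_sided_p(z_loc), _two_sided_p(z_scale)


def monitor(reference, test_samples, H):
    """Steps 6 to 8: return (j, C_j, p1, p2) at the first signal, else None."""
    for j, sample in enumerate(test_samples, start=1):
        c = cucconi(reference, sample)
        if c > H:
            p1, p2 = follow_up(reference, sample)
            return j, c, p1, p2
    return None


if __name__ == "__main__":
    reference = [i / 10 for i in range(-25, 25)]          # m = 50, illustrative only
    samples = [[-0.55, 0.15, 0.75, -1.05, 0.35],          # looks in control
               [5.0, 5.5, 6.0, 6.5, 7.0]]                 # large upward shift
    print(monitor(reference, samples, H=5.25))            # H for m = 50, n = 5, ARL0 = 500
```

The reference values in the usage lines are made up for illustration and do not come from the paper.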

5. Practical implementation

In order to implement the SC chart, we need H, the charting constant. This first requires a discussion of the chart’s run length distribution and related characteristics such as the ARL.

5.1. Run length distribution and ARL

Let R be the random variable denoting the run length of the proposed SC control chart. Note that the Cj, j = 1, 2, . . ., are not independent, since each is computed with the same reference sample. However, given the Phase-I sample Xm, the Cj, j = 1, 2, . . ., are independent. Therefore, following the same argument as in Mukherjee and Chakraborti,2 the conditional run length distribution is geometric with success probability Prob[Cj > H|Xm], where success is the OOC signal. Therefore all moments, percentiles, etc., of the conditional run length distribution can be obtained directly from the properties of the geometric distribution; e.g. the conditional average run length (ARL) is given by 1/Prob[Cj > H|Xm]. Hence, all properties of the unconditional run length distribution can be obtained by averaging over the distribution of the reference sample; e.g. the unconditional ARL (UARL), which is the familiar ARL, can be found from E[1/Prob[Cj > H|Xm]]. It is worth mentioning that both the conditional and the unconditional run length distributions provide useful information about the performance of the chart.
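For a given signal probability p = Prob[Cj > H|Xm], the conditional run length summaries follow from standard geometric distribution formulas. The short Python sketch below illustrates this; the function name `geometric_run_length_summary` and the example value of p are illustrative assumptions, not taken from the paper.

```python
import math


def geometric_run_length_summary(p, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """ARL, SDRL and percentiles of a geometric run length with signal probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    arl = 1.0 / p                      # mean of the geometric distribution
    sdrl = math.sqrt(1.0 - p) / p      # its standard deviation
    # q-th percentile: smallest r with P(R <= r) = 1 - (1 - p)**r >= q.
    percentiles = {
        q: max(1, math.ceil(math.log(1.0 - q) / math.log(1.0 - p)))
        for q in quantiles
    }
    return arl, sdrl, percentiles


if __name__ == "__main__":
    # A conditional signal probability of 0.002 corresponds to a conditional ARL of 500.
    print(geometric_run_length_summary(0.002))
```

Because the median of a geometric distribution is roughly 0.69 times its mean, a right-skewed run length distribution is expected even conditionally on the reference sample.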


Thus, by writing U(Xm, H) = Prob[Cj ≤ H|Xm] for any j = 1, 2, . . . (a function of the reference sample, not the standardized statistic U of (3.7)), the unconditional run length distribution is given by

$$\mathrm{Prob}(R = r) = E\left\{[U(X_m, H)]^{r-1}\,[1 - U(X_m, H)]\right\} = E\left\{[U(X_m, H)]^{r-1}\right\} - E\left\{[U(X_m, H)]^{r}\right\}, \quad r = 1, 2, \ldots \tag{5.1}$$

Moments of the unconditional run length distribution can be conveniently calculated by conditioning on the reference sample and using properties of the geometric distribution. For example, the ARL is given by

$$ARL = E\left[\frac{1}{1 - U(X_m, H)}\right] = \int_{X_m} \frac{1}{1 - U(X_m, H)}\, dF(X_m) \tag{5.2}$$

The IC ARL can be obtained from (5.2), substituting F = G:

$$ARL_0 = E_{IC}\left[\frac{1}{1 - U(X_m, H)}\right] = \int_{X_m} \frac{1}{1 - U_{F=G}(X_m, H)}\, dF(X_m) \tag{5.3}$$

Other moments and percentiles of the run length distribution can be calculated in a similar manner.

5.2. Determination of H

Using the analytical form of the IC ARL in (5.3), we can set ARL0 equal to some desired (nominal) value and determine the constant H by solving

$$ARL_0 = \int_{X_m} \frac{1}{1 - U_{F=G}(X_m, H)}\, dF(X_m) \tag{5.4}$$

Such a formulation is direct in principle but difficult to implement in practice because there is no standard form for the integral in (5.4). An alternative is to use numerical computations based on Monte-Carlo simulations, and we use this approach here. Since the chart is distribution free, for the Phase-I sample, we generate m observations from the standard normal distribution, and for each test sample, we generate n observations from the same distribution. We use the R 2.14.1 software and 50,000 replicates, which provide a stable estimate of the ARL0 and other percentiles of the IC run length distribution. As in Mukherjee and Chakraborti,2 we chose m = 30, 50, 100 and 150 for the reference sample size and n = 5, 11 and 25 for the test sample size. The results are shown in Table I. To find H, for any given pair of (m, n) values, a search is conducted with different values of H, and that value of H is obtained for which the ARL0 is equal to a nominal (target) value. The third to the fifth columns of Table I give the required H values for target ARL0 = 250, 370 and 500, respectively. Thus, for example, when 30 reference observations and test samples of size 5 are available and an ARL0 of 500 is desired, the UCL H for the SC chart is given by 4.48. One can easily verify by Monte-Carlo simulation, drawing both the Phase-I and Phase-II samples from any other common continuous (for example, non-normal) distribution, that for a given combination of (m, n, ARL0) the same H values as in Table I are valid. This also justifies the nonparametric nature of the proposed chart.
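The Monte-Carlo evaluation of the IC ARL for a candidate H can be sketched as below. This is a small Python illustration, not the R code used in the paper: the function names, the seed, the replicate counts and the `max_len` cap on a single run are assumptions made for the sketch. A search over H (for example by bisection on the estimated ARL0) would sit on top of `estimate_arl0`.

```python
import bisect
import math
import random


def cucconi_sorted_ref(sorted_ref, test):
    """Cucconi statistic C of (3.10) given an already sorted reference sample."""
    m, n = len(sorted_ref), len(test)
    N = m + n
    # Rank of the i-th smallest test value = (# reference values below it) + i.
    ranks = [bisect.bisect_left(sorted_ref, y) + i
             for i, y in enumerate(sorted(test), start=1)]
    s1 = sum(k * k for k in ranks)
    s2 = sum((N + 1 - k) ** 2 for k in ranks)
    centre = n * (N + 1) * (2 * N + 1)
    denom = math.sqrt(m * n * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5)
    u, v = (6 * s1 - centre) / denom, (6 * s2 - centre) / denom
    r = 2 * (N * N - 4) / ((2 * N + 1) * (8 * N + 11)) - 1
    return (u * u + v * v - 2 * r * u * v) / (2 * (1 - r * r))


def simulate_run_length(m, n, H, rng, max_len=100_000):
    """One IC run length: draw a N(0,1) reference sample, then test samples until C > H."""
    sorted_ref = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
    for j in range(1, max_len + 1):
        test = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if cucconi_sorted_ref(sorted_ref, test) > H:
            return j
    return max_len  # censored: no signal within max_len test samples


def estimate_arl0(m, n, H, replicates=1000, seed=1):
    """Unconditional IC ARL estimate: average run length over fresh reference samples."""
    rng = random.Random(seed)
    runs = [simulate_run_length(m, n, H, rng) for _ in range(replicates)]
    return sum(runs) / replicates


if __name__ == "__main__":
    # Small H values keep this demonstration fast; the tabulated H values
    # would need far more computing time in pure Python.
    for H in (1.0, 2.0, 3.0):
        print(H, estimate_arl0(30, 5, H, replicates=500))
```

Each replicate draws a new reference sample, so the average over replicates estimates the unconditional ARL in (5.3) rather than a conditional one.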
Table I. Charting constant H (upper control limit) for the SC chart, for various values of m and n, and for some standard (target) values of ARL0

Reference sample size m   Test sample size n   Target ARL0 = 250   Target ARL0 = 370   Target ARL0 = 500
 30                        5                   3.99                4.26                4.48
 30                       11                   4.09                4.30                4.45
 30                       25                   3.78                4.01                4.18
 50                        5                   4.62                4.97                5.25
 50                       11                   4.38                4.63                4.80
 50                       25                   4.25                4.50                4.70
100                        5                   5.22                5.64                5.98
100                       11                   4.79                5.11                5.34
100                       25                   4.73                5.03                5.25
150                        5                   5.44                5.90                6.25
150                       11                   5.01                5.38                5.67
150                       25                   4.94                5.25                5.49

Figure 1. ARL0 profiles for the different choices of n and H for some specified m (panels A: m = 30; B: m = 50; C: m = 100; D: m = 150)

Further, in Figure 1A through 1D, we show the IC ARL values (profiles) for various H values for certain given m, for three choices of n (n = 5, 11 and 25). These figures can be used to estimate the necessary H value for other target ARL0 values such as 300 and 400. This is discussed further in the next paragraph. Interestingly, we see from Figures 1A through 1D that the ARL0 profiles become steeper when n increases for a given m, for larger values of H. However, for smaller values of H, profiles with smaller n have larger slopes for a given m. Further, for a given number of reference sample observations m (> 30), we need a smaller H value to attain a specified ARL0 value, for increasing n. Only a minor

deviation from this pattern is observed in case of m = 30 with ARL0 = 250 or 370, as can be seen from Table I. We observe that, in general, H increases with m when both ARL0 and n are held fixed. The Monte-Carlo simulations show that the H values become more stable when both m and n get larger.

5.3. Approximation for H in large samples

Earlier, we have mentioned that Table I and Figure 1 are useful for determining H, for small to moderately large values of m and n. Now, we consider how to obtain the necessary H for any given m, n that are not covered by Table I or Figure 1. In fact, it may be worthwhile to consider an approximation to H for general use in practice. This may be a reasonable approach to consider in some SPC applications today where large volumes of data may be available so that the time spent in determining the ‘exact’ H may be saved. To this end, it is not difficult to establish that as m and n get larger

$$U(X_m, H) \to E\left[\mathrm{Prob}_{IC}\left(C_j \le H \mid X_m\right)\right] = \mathrm{Prob}_{IC}\left(C_j \le H\right) \tag{5.5}$$

in probability. Therefore, for large m, n

$$ARL_0 = E_{X_m, IC}\left[\frac{1}{1 - U(X_m, H)}\right] \to \frac{1}{\mathrm{Prob}_{IC}\left(C_j > H\right)} \tag{5.6}$$

in probability. Now, Cucconi1 and Marozzi20 claimed that as m, n tend to ∞, so that m/(m + n) → λ ∈ (0, 1),

$$\mathrm{Prob}[U \le u, V \le v \mid IC] \to \int_{-\infty}^{v} \int_{-\infty}^{u} \frac{1}{2\pi\sqrt{1 - r_0^2}} \exp\left\{-\frac{w^2 + z^2 - 2 r_0 w z}{2(1 - r_0^2)}\right\} dw\, dz \tag{5.7}$$

where r0 denotes the limiting value of r in (3.9). Therefore, using a property of the exponent of a bivariate normal distribution (see, for example, Johnson and Kotz19), we can approximate the large sample IC distribution of 2C as a chi-square with 2 degrees of freedom. Hence,

$$\mathrm{Prob}\left(C_j > H \mid IC\right) = \mathrm{Prob}\left(2C_j > 2H \mid IC\right) \approx \exp(-H) \tag{5.8}$$

As a consequence, using (5.4) and (5.6), a large sample approximation to H is equal to ln(ARL0). Thus, for example, for ARL0 = 250, 370 or 500, the approximate H values are found to be 5.52, 5.91 and 6.21, respectively. A Monte-Carlo simulation study reveals that for m = 200, n = 5 and H = 5.52, the estimated ARL0 is 242.04 with SDRL0 = 298.68, where SDRL0 denotes the IC standard deviation of the run length distribution. Similarly, for the same m and n with H = 5.91, the estimated ARL0 is 337.69 with SDRL0 = 431.18; and when H = 6.21, the estimated ARL0 is 435.15 with SDRL0 = 560.80. Thus, the actual ARL0 attained is somewhat less than the target while using the approximate H when m = 200. In other words, the Monte Carlo H values are a little higher than the approximate H values when m = 200, n = 5. On the other hand, Table I indicates that the large sample approximations to H differ only marginally from the tabulated values for m = 150, n = 5 (5.44, 5.90 and 6.25, respectively). These provide evidence that for practical purposes, the large sample approximation may be satisfactory when m ≥ 150 and the ratio m/(m + n) is around 150/(150 + 5) = 0.9677 but not exceeding 200/(200 + 5) = 0.9756. As a rule of thumb, we conclude that the large sample results are useful when the reference sample size (m) is 30 to 35 times larger than the test sample size (with n ≥ 5).
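The large-sample rule H ≈ ln(ARL0) is simple enough to check directly. The following Python lines are a small illustration; the function name `approximate_ucl` is hypothetical.

```python
import math


def approximate_ucl(target_arl0):
    """Large-sample approximation H = ln(ARL0), from Prob(C > H | IC) ~ exp(-H)."""
    if target_arl0 <= 1:
        raise ValueError("target ARL0 must exceed 1")
    return math.log(target_arl0)


if __name__ == "__main__":
    for target in (250, 370, 500):
        print(target, round(approximate_ucl(target), 2))
```

Since the approximation depends only on the target ARL0, it cannot reflect the dependence of H on m and n that is visible in Table I, which is consistent with recommending it only for large reference samples.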

6. Performance analysis

The performance of a control chart is commonly studied via its run length distribution, and because the run length distribution is skewed to the right, it is useful to look at various summary measures such as the mean, the standard deviation (SD) and several percentiles including the first and the third quartiles to characterize the distribution. We consider the IC and the OOC setups in Subsections 6.1 and 6.2, respectively.

6.1. IC performance of the SC chart

For the IC setup, we simulate both the reference and the test samples from the standard normal distribution. We chose m = 30, 50, 100, and 150 and n = 5, 11, and 25, as before. For a given pair of (m, n) values, we obtain H from Table I, for a nominal (target) ARL0 of 500 and simulate various characteristics of the IC run length distribution. The findings are shown in Table II. It is seen that the IC run length distribution is highly skewed to the right; the target ARL0 of 500 is much larger than the medians for all m, n combinations. When m = 100 or 150, the median is about half of the target ARL0 = 500. As the reference sample size m increases from 30 to 150, for a fixed n, all the percentiles including the median increase except the 95th percentile, and the SD decreases. The 95th percentile is seen to be more or less stable around 3.5 to 4 times the target ARL0 = 500. It is also seen that in the majority of the cases, the third quartiles of the run length distributions are closer to the nominal IC ARL0 of 500. This indicates that the IC run length distribution of the proposed chart, like that of many other control charts, is heavily skewed with a long right tail.

Table II. IC performance characteristics of the SC chart for ARL0 = 500 (simulated values)

  m    n     H     ARL0    SDRL0   5th percentile   1st quartile   Median   3rd quartile   95th percentile
 30    5   4.48   516.4   1834.7         6               37          118        371             1946
 30   11   4.45   505.8    981.2         7               51          176        535             2077
 30   25   4.18   492.2    998.2         5               39          147        500             2168
 50    5   5.25   497.3   1235.3        10               61          179        479             1824
 50   11   4.80   492.5    960.7         8               57          181        522             1969
 50   25   4.70   502.3    866.9         8               56          192        566             2046
100    5   5.98   509.4    788.4        15               95          253        604             1831
100   11   5.34   496.2    886.1        12               74          210        547             1900
100   25   5.25   506.7    743.8        12               78          243        621             1899
150    5   6.25   508.2    723.9        18              107          277        629             1737
150   11   5.67   496.8    821.2        15               87          237        575             1780
150   25   5.49   496.1    688.5        15               89          254        622             1780

6.2. OOC performance of the SC chart in comparison to SL chart

In order to investigate the OOC performance of the proposed SC chart, we consider two popular distributions under the general location-scale family: (i) the thin-tailed symmetric normal distribution (N(θ,d)) having the pdf

$$f(u) = \frac{1}{d\sqrt{2\pi}} \exp\left\{-\frac{1}{2d^2}(u - \theta)^2\right\}$$

over the support u ∈ (−∞, ∞); (ii) the heavy-tailed symmetric Laplace distribution (Laplace(θ,d)) having pdf f(u) = (1/(2d))exp(−|u − θ|/d) over the support u ∈ (−∞, ∞). We examine the performance characteristics of the run length distribution when the IC sample in each case is taken from the corresponding standard distribution with θ = 0 and d = 1. Thus for case (i), the samples are taken from a N(θ,d) distribution, with the IC sample coming from



a N(0,1) distribution. To examine the effects of shifts in the mean and the variance, 30 combinations of (θ,d) values are considered, viz. θ = 0, 0.25, 0.5, 1, 1.5 and 2 and d = 1, 1.25, 1.5, 1.75 and 2.0. To facilitate a sound comparison, results are also obtained for the SL chart of Mukherjee and Chakraborti2 for the same combinations of shifts. For brevity, only the results for n = 5 are presented in Table III. In case (ii), we examine the chart performance characteristics for the heavy-tailed and symmetric Laplace distribution for both the SC and SL charts, using the same combinations of the reference and test sample sizes and location and scale parameters (θ and d), with the IC sample coming from a Laplace(0,1) distribution. These results are shown in Table IV. Note that in Tables III and IV, the first row of each of the cells shows the ARL and (SDRL) values, whereas the second row shows the 5th, 25th, 50th, 75th and 95th percentiles (in this order). In general, the simulation results reveal that the OOC run length distributions are also skewed to the right, and this can be observed from Tables III and IV for both charts. Moreover, except for minor sampling fluctuations, for a fixed m, n and a given ARL0, the OOC ARL values as well as the percentiles all decrease sharply with the increasing shift in the location and also with the increasing shift in the scale. This (expected) phenomenon is seen for both the SC and SL charts, and this indicates that both distribution-free charts are reasonably effective in detecting shifts in the location and/or in the scale. However, the effectiveness of the chart (speed of detection) varies depending on the type of shift and the type of chart being considered. Both SC and SL charts detect a shift in the scale faster than that in the location.
For example, from Table III, we see that for a 25% increase in the location parameter (θ) when the scale parameter (d) is in IC, there is about a 42% reduction in the ARL for both SC and SL charts with m = 50, and nearly a 49% reduction in ARL when m = 100. However, for a 25% increase in the scale parameter when the location parameter is in IC, there is about 86% reduction in ARL for the SC chart and about 79% reduction for the SL chart with m = 50. Almost similar observations can be made with m = 100. Finally, when both the location and scale parameters increase by 25%, the ARL decreases by 89.3% for the SC chart while the ARL decreases by 85.3% for the SL chart for m = 50. The pattern is quite similar for the SDRL for both charts: it decreases with an increasing shift in either parameter, but decreases at a faster rate for a shift in the scale parameter. For example, when m = 100, for a 25% increase only in the location, the SDRL decreases by 42.1% for the SC chart and 44.5% for the SL chart, respectively, while for a 25% increase in the scale parameter only, the SDRL decreases by 88.3% for the SC chart and 83.2% for the SL chart, respectively. Further, we note from Table III that for the normal distribution, the SC chart clearly outperforms the SL chart if there is a shift in the scale parameter along with some shift in location. For example, when d = 1.25, the SC chart has a shorter ARL than that of the SL chart. The ARL is shorter by 24.7% (when θ = 0) to 22.43% (when θ = 0.25) to 12.66% (when θ = 0.5). As θ increases, the two charts become more or less equally effective; that is, their ARL values become close to each other.
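The percentage reductions quoted for m = 50 can be recomputed from the ARL entries of Table III, as in the following Python sketch. The helper names are hypothetical, only four (θ, d) cells per chart are included, and small differences in the last digit relative to the text can arise because the tabulated ARL values are themselves rounded.

```python
def percent_reduction(ic_value, ooc_value):
    """Percentage reduction from the IC value to the OOC value."""
    return 100.0 * (1.0 - ooc_value / ic_value)


# ARL values for m = 50, n = 5 taken from Table III, keyed by (theta, d).
ARL_SC = {(0, 1): 497.3, (0.25, 1): 288.6, (0, 1.25): 71.1, (0.25, 1.25): 53.5}
ARL_SL = {(0, 1): 499.6, (0.25, 1): 292.7, (0, 1.25): 106.2, (0.25, 1.25): 73.6}


def reductions(arl_table):
    """Percentage ARL reduction of each shifted case relative to the IC case (0, 1)."""
    base = arl_table[(0, 1)]
    return {shift: round(percent_reduction(base, arl), 1)
            for shift, arl in arl_table.items() if shift != (0, 1)}


if __name__ == "__main__":
    print("SC:", reductions(ARL_SC))
    print("SL:", reductions(ARL_SL))
```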
On the other hand, if a shift in location takes place while the scale parameter is IC or if the shift in the location parameter is very large accompanied by some shift in the scale parameter, the SL chart may be slightly better, or the two charts are almost equally effective. For example, when d = 1 and θ = 0.25, the ARL of the SL chart is 403.1, which is 3.8% smaller than that of the SC chart, whose ARL is 419, but when θ = 2.0, the two ARLs are 2.1 and 2.0, respectively. The nonparametric charts are IC robust by definition; that is, their IC performance remains the same for all continuous distributions. However, it is useful to examine their OOC performance for underlying distributions that have tails heavier than the normal, since heavier-tailed symmetric distributions under the location-scale family, such as the Laplace distribution, arise in applications where extreme values can occur with higher probability. Keeping this in mind, we first repeat the simulation study with data from the Laplace distribution. The performance characteristics of the run length distribution were evaluated when the IC sample is from a Laplace(0,1) distribution that has a mean of 0 and a variance of 2, but the test samples are from a Laplace(θ,d) distribution, where θ denotes the location (the median or the mean) and d denotes the scale. Recall that for a Laplace distribution with scale parameter d, the variance is 2d². To study the impact of a shift in the location and scale, as in the normal case, we studied the same 30 combinations of θ and d values. From Table IV, it is seen that for the Laplace distribution, the general patterns in the OOC ARL values remain the same as in the case of the normal distribution, but the magnitudes of the ARL values are slightly higher for a similar shift in the location or in the scale parameter, indicating a slightly slower detection of shifts under the heavier tailed distribution.
For example, when m = 100 and the location shifts to 0.25 and the scale parameter d increases 50% to 1.5, the ARL is 42.1 compared to 20.4 in the normal case for the SC chart. Moreover, the percentiles as well as the SDRL values all increase under the Laplace distribution. However, since a shift in the scale parameter in a normal distribution and in a Laplace distribution results in different amounts of shift in the process SD, with the Laplace registering a higher shift in process SD, such comparisons are not very meaningful, in general. In summary, Table IV reveals that if there is a small shift in the location along with some shift in the scale parameter, the proposed SC chart outperforms the SL chart. However, the SL chart performs slightly better than the SC chart if only a shift in the location parameter takes place while the scale parameter remains IC. If there is a moderately high shift in location accompanied by a shift in scale, the SL chart still works better than the SC chart for heavy-tailed distributions. If the amount of shift in location is large, both charts are equally effective.

7. Effects of estimation and the reference sample size

Estimation of parameters using a reference sample can seriously impact the performance of a Phase-II chart. To investigate the extent of the impact, first consider the IC case, and note that although the proposed SC chart is calibrated to achieve an unconditional IC ARL equal to some nominal value, say, 500, the actual performance of the chart (the attained IC ARL) would depend on the ‘quality’ of the particular Phase-I (the reference) sample in a specific application. Thus, it would be helpful to examine the variation in the attained IC ARL. Mukherjee and Chakraborti2 examine this issue in the context of the SL chart and show that the attained IC ARL can vary, significantly, from the nominal value even though the chart is designed for that nominal IC ARL and the Phase-I sample is obtained from an IC process. To investigate the same question for the SC chart, we study the conditional IC ARL and the SDRL taking a Phase-I sample of various sizes, m = 30, 50, 100


Table III. Performance comparisons between the SC and SL charts for the Normal (θ, d) distribution with ARL0 = 500. Each cell shows the ARL, the SDRL in parentheses, and five percentiles of the run length distribution in increasing order.

θ | d | SC chart (m = 50, n = 5) | SL chart (m = 50, n = 5) | SC chart (m = 100, n = 5) | SL chart (m = 100, n = 5)
0 | 1 | 497.3 (1235.3); 10, 61, 179, 479, 1824 | 499.6 (918.9); 13, 78, 215, 534, 1886 | 509.4 (788.4); 15, 95, 253, 604, 1831 | 513.0 (738.9); 18, 106, 276, 635, 1792
0.25 | 1 | 288.6 (731.8); 5, 32, 97, 273, 1099 | 292.7 (641.3); 7, 40, 116, 308, 1124 | 253.6 (456.4); 8, 43, 116, 285, 935 | 257.6 (410.3); 9, 47, 127, 303, 917
0.5 | 1 | 92.2 (285.0); 2, 11, 31, 84, 336 | 94.7 (253.9); 2, 12, 34, 91, 351 | 68.6 (117.2); 3, 14, 34, 80, 239 | 66.5 (98.6); 3, 13, 35, 80, 237
1.0 | 1 | 8.5 (14.3); 1, 2, 4, 10, 28 | 9.3 (18.6); 1, 2, 5, 11, 31 | 7.7 (9.4); 1, 2, 5, 10, 24 | 7.7 (8.8); 1, 2, 5, 10, 24
1.5 | 1 | 2.2 (2.1); 1, 1, 1, 3, 6 | 2.3 (2.2); 1, 1, 2, 3, 6 | 2.1 (1.7); 1, 1, 1, 3, 5 | 2.1 (1.7); 1, 1, 2, 3, 5
2 | 1 | 1.2 (0.6); 1, 1, 1, 1, 2 | 1.3 (0.6); 1, 1, 1, 1, 2 | 1.2 (0.5); 1, 1, 1, 1, 2 | 1.2 (0.5); 1, 1, 1, 1, 2
0 | 1.25 | 71.1 (116.2); 3, 13, 35, 81, 252 | 106.2 (197.8); 4, 21, 54, 124, 369 | 74.5 (92.2); 3, 18, 45, 96, 243 | 102.9 (124.1); 5, 25, 62, 133, 337
0.25 | 1.25 | 53.5 (86.8); 2, 10, 26, 60, 196 | 73.6 (116.1); 3, 14, 37, 86, 261 | 54.9 (70.8); 3, 13, 32, 69, 183 | 70.6 (92.3); 3, 17, 41, 89, 232
0.5 | 1.25 | 27.6 (43.6); 1, 6, 14, 32, 97 | 35.4 (55.5); 2, 7, 18, 42, 123 | 26.2 (32.4); 1, 7, 15, 34, 86 | 30.9 (38.8); 2, 8, 18, 40, 101
1.0 | 1.25 | 6.6 (8.9); 1, 2, 4, 8, 21 | 7.4 (9.3); 1, 2, 4, 9, 23 | 6.2 (6.6); 1, 2, 4, 8, 19 | 6.7 (7.0); 1, 2, 4, 9, 20
1.5 | 1.25 | 2.4 (2.1); 1, 1, 2, 3, 6 | 2.6 (2.4); 1, 1, 2, 3, 7 | 2.4 (2.0); 1, 1, 2, 3, 6 | 2.5 (2.1); 1, 1, 2, 3, 7
2 | 1.25 | 1.4 (0.8); 1, 1, 1, 2, 3 | 1.4 (0.9); 1, 1, 1, 2, 3 | 1.3 (0.7); 1, 1, 1, 2, 3 | 1.4 (0.8); 1, 1, 1, 2, 3
0 | 1.5 | 22.8 (29.3); 1, 6, 13, 28, 78 | 36.82 (46.98); 2, 9, 22, 47, 118 | 24.3 (27.1); 2, 6, 16, 32, 76 | 37.5 (42.2); 2, 10, 24, 50, 118
0.25 | 1.5 | 19.7 (26.0); 1, 5, 11, 25, 65 | 30.64 (38.81); 2, 7, 18, 39, 102 | 20.4 (22.7); 1, 6, 13, 27, 64 | 29.9 (34.2); 2, 8, 19, 39, 91
0.5 | 1.5 | 13.3 (16.0); 1, 3, 8, 17, 44 | 19.0 (24.77); 1, 5, 11, 24, 64 | 13.4 (15.1); 1, 4, 9, 17, 42 | 17.8 (19.7); 1, 5, 12, 24, 55
1.0 | 1.5 | 5.2 (5.6); 1, 2, 3, 7, 15 | 6.5 (7.2); 1, 2, 4, 8, 19 | 5.3 (5.3); 1, 2, 4, 7, 15 | 6.1 (6.1); 1, 2, 4, 8, 18
1.5 | 1.5 | 2.4 (2.1); 1, 1, 2, 3, 7 | 2.8 (2.5); 1, 1, 2, 4, 8 | 2.4 (1.9); 1, 1, 2, 3, 6 | 2.7 (2.2); 1, 1, 2, 3, 7
2 | 1.5 | 1.5 (0.9); 1, 1, 1, 2, 3 | 1.6 (1.1); 1, 1, 1, 2, 4 | 1.5 (0.9); 1, 1, 1, 2, 3 | 1.6 (1.0); 1, 1, 1, 2, 4
0 | 1.75 | 10.9 (12.4); 1, 3, 7, 14, 34 | 18.5 (20.7); 1, 5, 11, 24, 59 | 11.7 (12.0); 1, 3, 8, 16, 36 | 19.1 (20.3); 1, 5, 13, 26, 59
0.25 | 1.75 | 10.0 (11.2); 1, 3, 6, 13, 31 | 16.7 (20.4); 1, 4, 11, 22, 53 | 10.7 (11.2); 1, 3, 7, 14, 32 | 16.4 (17.2); 1, 5, 11, 22, 50
0.5 | 1.75 | 8.1 (8.9); 1, 2, 5, 10, 25 | 12.1 (13.9); 1, 3, 8, 16, 37 | 8.4 (8.9); 1, 3, 6, 11, 25 | 12.1 (12.5); 1, 4, 8, 16, 37
1.0 | 1.75 | 4.4 (4.4); 1, 1, 3, 6, 13 | 5.7 (5.8); 1, 2, 4, 7, 17 | 4.4 (4.1); 1, 2, 3, 6, 12 | 5.5 (5.2); 1, 2, 4, 7, 16
1.5 | 1.75 | 2.5 (2.0); 1, 1, 2, 3, 6 | 2.9 (2.6); 1, 1, 2, 4, 8 | 2.4 (1.9); 1, 1, 2, 3, 6 | 2.8 (2.4); 1, 1, 2, 4, 7
2 | 1.75 | 1.6 (1.0); 1, 1, 1, 2, 4 | 1.8 (1.2); 1, 1, 1, 2, 4 | 1.6 (1.0); 1, 1, 1, 2, 4 | 1.8 (1.2); 1, 1, 1, 2, 4
0 | 2 | 6.6 (6.6); 1, 2, 4, 9, 20 | 11.3 (12.3); 1, 3, 7, 15, 34 | 7.1 (6.9); 1, 2, 5, 10, 21 | 11.5 (11.9); 1, 3, 8, 15, 35
0.25 | 2 | 6.2 (6.3); 1, 2, 4, 8, 19 | 10.3 (11.0); 1, 3, 7, 14, 32 | 6.8 (6.6); 1, 2, 5, 9, 20 | 10.8 (10.9); 1, 3, 7, 15, 32
0.5 | 2 | 5.5 (5.6); 1, 2, 4, 7, 16 | 8.5 (9.0); 1, 3, 6, 11, 25 | 5.8 (5.7); 1, 2, 4, 8, 17 | 8.6 (8.5); 1, 3, 6, 11, 25
1.0 | 2 | 3.7 (3.4); 1, 1, 3, 5, 10 | 4.9 (4.8); 1, 2, 3, 7, 14 | 3.8 (3.4); 1, 1, 3, 5, 11 | 4.8 (4.5); 1, 2, 3, 6, 14
1.5 | 2 | 2.4 (1.9); 1, 1, 2, 3, 6 | 2.9 (2.5); 1, 1, 2, 4, 8 | 2.4 (1.9); 1, 1, 2, 3, 6 | 2.9 (2.5); 1, 1, 2, 4, 8
2 | 2 | 1.7 (1.1); 1, 1, 1, 2, 4 | 1.9 (1.4); 1, 1, 1, 2, 5 | 1.7 (1.1); 1, 1, 2, 3, 6 | 1.9 (1.3); 1, 1, 1, 2, 4

Table IV. Performance comparisons between the SC and SL charts for the Laplace (θ, d) distribution with ARL0 = 500. Each cell shows the ARL, the SDRL in parentheses, and five percentiles of the run length distribution in increasing order.

θ | d | SC chart (m = 50, n = 5) | SL chart (m = 50, n = 5) | SC chart (m = 100, n = 5) | SL chart (m = 100, n = 5)
0 | 1 | 492.7 (1088.2); 11, 61, 179, 488, 1917 | 493.2 (851.0); 13, 79, 220, 551, 1853 | 509.6 (817.7); 15, 93, 251, 589, 1909 | 508.3 (728.1); 18, 106, 277, 626, 1742
0.25 | 1 | 419.0 (1072.6); 7, 43, 132, 375, 1648 | 403.1 (830.1); 8, 52, 156, 426, 1596 | 381.6 (758.8); 11, 63, 175, 429, 1351 | 366.9 (589.4); 11, 64, 177, 440, 1318
0.5 | 1 | 240.4 (733.2); 3, 20, 63, 194, 963 | 235.2 (674.5); 4, 21, 66, 200, 938 | 191.0 (482.9); 5, 25, 73, 191, 722 | 159.2 (286.2); 5, 25, 68, 174, 608
1.0 | 1 | 41.4 (149.1); 1, 4, 10, 29, 150 | 36.1 (153.7); 1, 4, 9, 26, 123 | 26.5 (86.1); 1, 4, 11, 27, 95 | 19.9 (37.3); 1, 4, 9, 21, 73
1.5 | 1 | 7.2 (53.9); 1, 1, 3, 6, 23 | 5.93 (20.3); 1, 1, 3, 5, 18 | 4.8 (7.8); 1, 1, 3, 5, 16 | 4.1 (5.5); 1, 1, 2, 5, 12
2 | 1 | 2.1 (3.4); 1, 1, 1, 2, 5 | 2.0 (2.9); 1, 1, 1, 2, 5 | 1.8 (1.5); 1, 1, 1, 2, 5 | 1.7 (1.4); 1, 1, 1, 2, 4
0 | 1.25 | 118.0 (222.5); 4, 20, 52, 129, 429 | 156.8 (263.1); 5, 28, 75, 179, 557 | 124.5 (173.2); 5, 27, 68, 150, 437 | 153.2 (198.4); 6, 35, 88, 197, 512
0.25 | 1.25 | 99.9 (197.3); 3, 16, 43, 106, 370 | 128.8 (239); 4, 22, 60, 146, 467 | 100.6 (145.6); 4, 21, 54, 122, 345 | 121.5 (161.6); 5, 26, 68, 153, 422
0.5 | 1.25 | 69.7 (150.5); 2, 10, 27, 69, 262 | 79.8 (155.6); 2, 11, 32, 86, 303 | 61.7 (101.9); 3, 12, 31, 73, 215 | 66.19 (103.7); 3, 13, 35, 78, 227
1.0 | 1.25 | 20.1 (69.3); 1, 3, 7, 18, 69 | 19.9 (72.3); 1, 3, 8, 19, 68 | 14.6 (22.2); 1, 3, 8, 17, 50 | 14.0 (19.6); 1, 3, 8, 17, 47
1.5 | 1.25 | 5.1 (10.7); 1, 1, 3, 5, 17 | 5.2 (10); 1, 1, 3, 6, 16 | 4.4 (5.9); 1, 1, 3, 5, 14 | 4.2 (5.6); 1, 1, 3, 5, 12
2 | 1.25 | 2.1 (2.6); 1, 1, 1, 2, 6 | 2.2 (3.2); 1, 1, 1, 3, 6 | 2.0 (1.7); 1, 1, 1, 2, 5 | 2.0 (1.5); 1, 1, 1, 2, 5
0 | 1.5 | 43.3 (63.1); 2, 9, 23, 53, 150 | 65.9 (91.6); 3, 14, 36, 81, 226 | 47.8 (59.7); 2, 12, 29, 61, 156 | 66.8 (80.5); 4, 17, 41, 86, 218
0.25 | 1.5 | 39.2 (61.0); 2, 8, 20, 46, 141 | 59.1 (92.6); 2, 12, 31, 69, 208 | 42.1 (52.4); 2, 10, 25, 53, 139 | 55.2 (65.6); 3, 14, 33, 72, 184
0.5 | 1.5 | 29.3 (48.2); 1, 6, 15, 34, 104 | 42.1 (76.2); 2, 8, 20, 48, 152 | 29.6 (39.1); 2, 7, 17, 37, 101 | 36.8 (46.7); 2, 8, 22, 46, 124
1.0 | 1.5 | 12.0 (28.0); 1, 2, 6, 13, 41 | 14.2 (37.3); 1, 3, 7, 15, 48 | 10.7 (15.2); 1, 3, 6, 13, 34 | 11.1 (14.2); 1, 3, 7, 14, 36
1.5 | 1.5 | 4.5 (7.2); 1, 1, 3, 5, 14 | 4.7 (6.9); 1, 1, 3, 5, 15 | 4.0 (4.2); 1, 1, 3, 5, 12 | 4.1 (4.4); 1, 1, 3, 5, 12
2 | 1.5 | 2.2 (2.2); 1, 1, 1, 3, 6 | 2.3 (2.3); 1, 1, 2, 3, 6 | 2.1 (1.7); 1, 1, 1, 3, 5 | 2.2 (1.8); 1, 1, 1, 3, 6
0 | 1.75 | 22.8 (30.8); 1, 6, 13, 29, 74 | 35.6 (46.2); 2, 8, 21, 45, 119 | 24.4 (29.8); 2, 6, 15, 31, 77 | 36.4 (40.5); 2, 10, 23, 48, 116
0.25 | 1.75 | 20.1 (27.0); 1, 5, 11, 25, 66 | 33.0 (45.2); 2, 7, 18, 41, 110 | 22.0 (25.7); 1, 6, 14, 28, 71 | 32.7 (36.8); 2, 9, 21, 43, 103
0.5 | 1.75 | 16.7 (24.6); 1, 4, 9, 20, 55 | 24.5 (33.7); 1, 5, 13, 30, 84 | 16.9 (20.8); 1, 4, 10, 22, 54 | 23.2 (27.4); 1, 6, 15, 30, 75
1.0 | 1.75 | 8.5 (12.6); 1, 2, 5, 10, 28 | 10.4 (15.5); 1, 3, 6, 12, 35 | 7.9 (9.5); 1, 2, 5, 10, 25 | 9.2 (10.9); 1, 3, 6, 12, 29
1.5 | 1.75 | 4.0 (5.0); 1, 1, 2, 5, 12 | 4.5 (6.3); 1, 1, 3, 5, 14 | 3.7 (3.8); 1, 1, 2, 5, 11 | 4.0 (4.1); 1, 1, 3, 5, 12
2 | 1.75 | 2.2 (2.3); 1, 1, 1, 3, 6 | 2.4 (2.3); 1, 1, 2, 3, 7 | 2.1 (1.7); 1, 1, 2, 3, 5 | 2.3 (1.9); 1, 1, 2, 3, 6
0 | 2 | 13.8 (17.4); 1, 4, 8, 17, 44 | 22.1 (26.6); 1, 6, 14, 28, 71 | 14.5 (16.2); 1, 4, 9, 19, 45 | 22.9 (25.3); 2, 6, 15, 31, 71
0.25 | 2 | 13.1 (16.6); 1, 3, 8, 17, 42 | 20.7 (27.1); 1, 5, 12, 26, 69 | 13.6 (15.1); 1, 4, 9, 18, 43 | 21.1 (23.5); 1, 6, 14, 28, 66
0.5 | 2 | 11.1 (14.2); 1, 3, 7, 14, 37 | 17.0 (22.3); 1, 4, 10, 21, 55 | 11.3 (12.4); 1, 3, 7, 15, 35 | 16.6 (18.1); 1, 5, 11, 22, 51
1.0 | 2 | 6.5 (8.0); 1, 2, 4, 8, 21 | 8.6 (10.5); 1, 2, 5, 11, 28 | 6.3 (6.7); 1, 2, 4, 8, 19 | 7.9 (8.7); 1, 2, 5, 10, 25
1.5 | 2 | 3.5 (4.1); 1, 1, 2, 4, 10 | 4.3 (4.9); 1, 1, 3, 5, 13 | 3.5 (3.4); 1, 1, 2, 4, 10 | 3.9 (3.9); 1, 1, 3, 5, 11
2 | 2 | 2.2 (2.0); 1, 1, 1, 3, 6 | 2.5 (2.3); 1, 1, 2, 3, 7 | 2.1 (1.7); 1, 1, 2, 3, 5 | 2.3 (1.9); 1, 1, 2, 3, 6

and 150, respectively, with n = 5 and H determined as earlier. To this end, 50,000 different Phase-I samples, each of size m, are generated from a standard normal distribution, and from these, the IC ARL of the SC chart is calculated. Even though the samples all come from an IC process, they vary as expected, and the differences among them can be expressed in terms of the respective sample mean and sample variance. For a systematic investigation of the impact of the reference sample on the IC performance of the chart, it is convenient to simulate observations from the distribution of the sample mean, which is known to be normal, and to classify the values of the reference sample mean into one of seven categories: (i) 5th percentile or lower (so the mean has a very high downward bias); (ii) between the 5th and the 25th percentiles (so the mean has a moderately high downward bias); (iii) between the 25th and the 45th percentiles (so there is a relatively low downward bias in the mean); (iv) between the 45th and the 55th percentiles (so the mean nicely represents the true mean); (v) between the 55th and the 75th percentiles (so the mean has a relatively low upward bias); (vi) between the 75th and the 95th percentiles (so the mean has a moderately high upward bias); and (vii) above the 95th percentile (so the mean has a very high upward bias). Seven similar categories can be defined for the reference sample SD with respect to the percentiles of its sampling distribution. Therefore, the entire 50,000 replications of the simulation study may be categorized in a 7 × 7 table depending on the values of the observed Phase-I sample mean and SD. The run length distribution in each of these 49 cells is studied by recording the proportion of observations in a particular cell along with the ARL and the SDRL of that cell.
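The 7 × 7 classification described above can be sketched in a few lines of Python. This is only an illustration, not the authors' simulation code: the function names are hypothetical, the cut points are estimated empirically from the simulated means and SDs rather than taken from their known sampling distributions, the number of replications is reduced from 50,000 to keep the run short, and the layout (rows for SD categories, columns for mean categories) is an assumption.

```python
import random
import statistics
from bisect import bisect_right

# Percentile cut points that separate the seven categories (i)-(vii).
CUTS = [5, 25, 45, 55, 75, 95]

def percentile_edges(values, cuts=CUTS):
    """Empirical cut points of `values` at the given percentiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[min(last, int(round(c / 100 * last)))] for c in cuts]

def classify(value, edges):
    """Category index 0..6 of `value`, given six increasing cut points."""
    return bisect_right(edges, value)

def seven_by_seven(m=50, replications=2000, seed=1):
    """Proportion of simulated reference samples in each (SD, mean) cell."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(m)]  # IC Phase-I sample
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    mean_edges = percentile_edges(means)
    sd_edges = percentile_edges(sds)
    counts = [[0] * 7 for _ in range(7)]
    for mu, sd in zip(means, sds):
        # Row = SD category, column = mean category (an assumed layout).
        counts[classify(sd, sd_edges)][classify(mu, mean_edges)] += 1
    return [[c / replications for c in row] for row in counts]

table = seven_by_seven()
for row in table:
    print(" ".join(f"{p:.3f}" for p in row))
```

Since the sample mean and sample SD are independent under normality, each cell proportion should be close to the product of the corresponding marginal proportions (0.05, 0.20, 0.20, 0.10, 0.20, 0.20 and 0.05).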
Note that the ARL and SDRL values in each cell are the conditional ARL and SDRL values, respectively, given the specified percentiles of the Phase-I sample mean and SD with respect to their distributions. Findings for m = 50, 100 and 150 are presented in Table V (panels VA, VB and VC, respectively). To understand the effects of estimation on the performance of the chart, for a nominal ARL0 of 500, we consider four categories of the conditional ARL0: (i) less than 100, which gives rise to early false alarms at a much higher rate; (ii) greater than or equal to 100 but less than 250, which leads to a moderately high false alarm rate; (iii) between 250 and 900, both values included, where a strong control over the false alarm rate is achieved without much compromise on the sensitivity of detecting a true shift; and (iv) greater than 900, which controls early false alarms but largely at the expense of the early detection of a true shift. From a practical standpoint, it seems desirable that the conditional ARL0 belongs to the third category; that is, the conditional ARL0 lies in the closed interval [250, 900]. We call this interval the ‘safe’ interval. From Table V, we see that if the observed mean of the reference sample lies between the first quartile and the 95th percentile and the observed SD lies between the 45th and the 75th percentiles of their respective sampling distributions, the conditional ARL0 lies within the ‘safe’ interval. Interestingly, if the observed sample mean lies below the first quartile or above the third quartile and the sample SD lies between the 75th and the 95th percentiles, then the conditional ARL0 also belongs to the safe interval. Cells with combinations of the observed reference sample mean and SD for which the safe interval is attained are marked with a darker gray shading.
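The conditional ARL0 for one particular reference sample, and its assignment to the four categories above, can be sketched as follows. The sketch rests on two assumptions that are not spelled out in this section: the Cucconi statistic is written in the form given by Marozzi17, and, given the reference sample, the Phase-II run length is treated as geometric, so that ARL = 1/p and SDRL = sqrt(1 − p)/p, where p is the conditional signal probability. The function names are hypothetical, and the value H = 6.1 (with m = 125, n = 5) is borrowed from the illustrative example in Section 8.

```python
import math
import random
from bisect import bisect_right

def cucconi(reference_sorted, test):
    """Cucconi statistic of a test sample against a sorted reference sample."""
    m, n = len(reference_sorted), len(test)
    N = m + n
    # Rank of each test value in the pooled sample (continuous data, no ties).
    ranks = [bisect_right(reference_sorted, y) + j + 1
             for j, y in enumerate(sorted(test))]
    centre = n * (N + 1) * (2 * N + 1)
    scale = math.sqrt(m * n * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5)
    u = (6 * sum(r * r for r in ranks) - centre) / scale
    v = (6 * sum((N + 1 - r) ** 2 for r in ranks) - centre) / scale
    rho = 2 * (N * N - 4) / ((2 * N + 1) * (8 * N + 11)) - 1
    return (u * u + v * v - 2 * rho * u * v) / (2 * (1 - rho * rho))

def conditional_arl(reference, H, n=5, trials=20000, seed=2):
    """Monte Carlo estimate of (p, ARL, SDRL) given one reference sample."""
    rng = random.Random(seed)
    ref = sorted(reference)
    signals = sum(
        cucconi(ref, [rng.gauss(0.0, 1.0) for _ in range(n)]) > H
        for _ in range(trials)
    )
    p = signals / trials
    if p == 0:
        return p, math.inf, math.inf
    return p, 1 / p, math.sqrt(1 - p) / p

def arl_category(arl0):
    """Map a conditional ARL0 to the four categories used in the text."""
    if arl0 < 100:
        return "early false alarms"
    if arl0 < 250:
        return "moderate risk"
    if arl0 <= 900:
        return "safe"
    return "delayed detection"

rng = random.Random(7)
reference = [rng.gauss(0.0, 1.0) for _ in range(125)]  # one IC Phase-I sample
p, arl, sdrl = conditional_arl(reference, H=6.1)
print(f"p = {p:.5f}, ARL0 = {arl:.1f}, SDRL0 = {sdrl:.1f}, {arl_category(arl)}")
```

Repeating this for many reference samples and averaging within each of the 49 cells would give a table of the same kind as Table V, up to Monte Carlo error.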
On the other hand, a lighter gray shading denotes the combinations for which the conditional ARL0 lies within [100, 250), the interval of moderate risk. This region forms a border above the darker shaded region. Any unshaded cell above the lighter shaded region has a conditional ARL0 of less than 100, which indicates that those combinations of the observed mean and SD of the reference sample might lead to an early false alarm. Table V clearly shows that, whatever the observed value of the mean, if the SD of the reference sample falls at or below the 5th percentile of its distribution, the conditional ARL0 is less than 100, which is of course a problem. Again, a cell in the unshaded region below the darker gray region has a conditional ARL0 greater than 900 and poses a serious threat of delayed or no detection of a true shift. Therefore, in working with m as small as 50, one has to be careful and has to establish the suitability of the reference sample with a proper Phase-I analysis. A Phase-I analysis is a broad area of quality monitoring research and has been studied by several authors; see, for example, the review paper by Chakraborti et al.21 The bottom line is that if the reference sample is chosen suitably and tested appropriately, even m = 50 may be considered adequate for the Phase-II SC chart without much problem.

Table V. Conditional ARL0, SDRL0 and the proportion of samples belonging to various categories of the distribution of sample mean and variance defined by percentiles for m = 50, m = 100 and m = 150. For each cell, the first value (in italics) shows the proportion, the middle value (in bold) shows the conditional ARL0 and the last value (within the brackets) shows the conditional SDRL0.

In general, we see that for a given range of values of the mean of the reference sample, the larger the SD, the larger the conditional ARL0. By contrast, given a specific range of values of the SD of the reference sample, the conditional ARL0 initially increases with an increase in its sample mean, reaches a peak somewhere between the first and the third quartiles of its distribution and then decreases. Table VB shows that there are extensions in both the darker and the lighter gray-shaded regions. Some of the lighter shaded cells of Table VA fall into the darker shaded region in Table VB, and some new areas come under the lighter shaded region. Even though the conditional ARL0 values in some of the cells below the darker shaded region are still quite high in Table VB, there is a noticeably sharp decline in the conditional ARL0 with respect to the values in the corresponding cells of Table VA, for a reference sample SD that comes from the upper 5th percentile of its sampling distribution along with any reference sample mean except that from the lower 5th percentile of its distribution. Moreover, Table VB shows a steady rise of the conditional ARL0 in its first two rows compared to the first two rows of Table VA. This indicates that when the reference sample SD comes from below the first quartile of its sampling distribution, the problem of a very low conditional ARL0 diminishes as m increases. Needless to say, there are significant expansions in both the lighter and the darker regions in Table VC compared to Table VB.
In fact, if the SD of the reference sample comes from within the first and the third quartiles of its sampling distribution, whatever the sample mean may be, m = 150 ensures a conditional ARL0 between 225 and 600. Moreover, the last row of Table VC shows that the conditional ARL0 never exceeds 1850, which provides better control over delayed detection of a shift compared to the earlier cases with smaller reference sample sizes. Thus, except for some sampling fluctuations, in general, we see that the conditional ARL0 values that fall below the target unconditional ARL0 increase with an increase in the reference sample size, and the conditional ARL0 values that fall above the target unconditional ARL0 decrease with an increase in the reference sample size. As the reference sample size increases, the conditional ARL0 values converge to the unconditional ARL0, and, obviously, the larger the reference sample size, the better is the IC chart performance. Nevertheless, our performance analysis and the analysis of the impact of the reference sample size show that m ≥ 150 is a reasonably good practical choice.

8. Illustrative example

In this section, we illustrate the proposed distribution-free SC chart using the well-known piston ring data discussed in Montgomery18 (Table V.1 and V.2, respectively; Figure 3). Piston rings for an automotive engine are produced by a metallic workshop process. The aim is to maintain statistical control of the inside diameters of the rings manufactured by this process. Twenty-five samples, each of size 5 (shown in Table V.1 of Montgomery18), are assumed to have been observed a priori. A Phase-I analysis in Montgomery18 concluded that one may consider the data set with 125 observations as a reference sample. Therefore, in the present context, m = 125. Moreover, Montgomery18 also provides 15 Phase-II samples (test samples), each of size 5, so that n = 5. Through Monte-Carlo simulations, we find H = 6.1 for a target ARL0 of 500 with m = 125 and n = 5. The values of the 15 SC plotting statistics are obtained as: 2.29, 0.068, 2.17, 0.39, 1.19, 0.76, 0.57, 1.28, 2.46, 2.59, 0.25, 7.53, 8.24, 12.34 and 2.54, respectively. These are displayed in Figure 2 along with the UCL of 6.1. The control chart shows that the process remains IC for the first 11 test samples and goes OOC for the first time at sample number 12. In fact, all three consecutive test samples 12, 13 and 14 seem to come from an OOC process, indicating possibly a shift in location, or scale, or both. In this context, it is worth mentioning that the SL chart as in Mukherjee and Chakraborti2 also declared a shift at the same time point. Following the signal from the chart at sample 12, it is of interest to see if the signal is due to a shift in location, scale or both.
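The charting decision in this example reduces to comparing each plotting statistic with the UCL. A minimal sketch, using only the 15 values and H = 6.1 reported above (the variable names are illustrative):

```python
# Upper control limit and the 15 SC plotting statistics reported in the text.
H = 6.1
plotting_stats = [2.29, 0.068, 2.17, 0.39, 1.19, 0.76, 0.57, 1.28,
                  2.46, 2.59, 0.25, 7.53, 8.24, 12.34, 2.54]

# The chart signals whenever a plotting statistic exceeds H.
signals = [i for i, c in enumerate(plotting_stats, start=1) if c > H]
first_signal = signals[0] if signals else None

print(signals)       # [12, 13, 14]
print(first_signal)  # 12
```

Test sample 15 falls back below the UCL, which the listing makes easy to see.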
For this post-signal diagnostic stage, we use step 8 of Section 4, which is easier to use than the one described in Mukherjee and Chakraborti.2 We first carry out a two-sided two-sample Wilcoxon rank-sum test for location between the 125 observations from the reference sample (Phase-I) and the 5 (Phase-II) observations from the 12th test sample. This test yields a p-value p1 = 0.002714. Next, we conduct a two-sided two-sample Mood’s test for scale using the same data and find the p-value as p2 = 0.01287. Thus, while p1 is less than 1%, p2 is marginally higher than 1% but well below 5%. Hence, we conclude that there is stronger evidence of a shift in location and some evidence of a shift in scale at test sample number 12. Note, however, that Mood’s test is applicable when no change in location is assumed. Since we have already detected a location shift, we can reapply Mood’s test to the 125 centered reference observations (original observations minus the observed Phase-I median) and the 5 centered Phase-II observations (original observations minus the corresponding sample median) recorded for the 12th test sample. We see that the p-value of Mood’s test with centered observations increases to 0.2567, which shows no evidence of a scale shift. It is worth mentioning that for these data, only a shift in the location was indicated at sample number 12, whereas the scale was thought to be IC, on the basis of three-sigma X and R charts, run separately, in Montgomery18 (p. 220; Figures 5–6). However, an application of these charts requires the tacit assumption of normality, which may or may not hold. The advantage of the distribution-free charts is that the nominal ARL0 and the false alarm rates are fixed with no model assumption being necessary.
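The follow-up procedure just described (a Wilcoxon rank-sum test, then Mood’s test on median-centered data) can be sketched as below. This is an illustration under stated assumptions rather than the computation used in the paper: both tests use their large-sample normal approximations without continuity or tie corrections, the function names are hypothetical, and the data are synthetic stand-ins because the piston ring measurements are not reproduced here. The p-values it prints will therefore not match those reported above.

```python
import math
import random
import statistics

def midranks(values):
    """Ranks of `values`, with tied observations sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def wilcoxon_rank_sum_p(x, y):
    """Normal-approximation p-value of the rank-sum test for location."""
    m, n = len(x), len(y)
    N = m + n
    w = sum(midranks(list(x) + list(y))[m:])  # rank sum of the y sample
    return two_sided_p((w - n * (N + 1) / 2) / math.sqrt(m * n * (N + 1) / 12))

def mood_p(x, y):
    """Normal-approximation p-value of Mood's test for scale."""
    m, n = len(x), len(y)
    N = m + n
    ranks = midranks(list(x) + list(y))[m:]
    stat = sum((r - (N + 1) / 2) ** 2 for r in ranks)
    mean = n * (N * N - 1) / 12
    var = m * n * (N + 1) * (N * N - 4) / 180
    return two_sided_p((stat - mean) / math.sqrt(var))

def diagnose(reference, test):
    """Location p-value, and scale p-value after centering at the medians."""
    p_location = wilcoxon_rank_sum_p(reference, test)
    ref_med, test_med = statistics.median(reference), statistics.median(test)
    p_scale = mood_p([v - ref_med for v in reference],
                     [v - test_med for v in test])
    return p_location, p_scale

# Synthetic stand-in data: 125 IC reference values, 5 location-shifted values.
rng = random.Random(3)
reference = [rng.gauss(0.0, 1.0) for _ in range(125)]
test_sample = [rng.gauss(2.0, 1.0) for _ in range(5)]
p_location, p_scale = diagnose(reference, test_sample)
print(f"location p = {p_location:.4f}, centered scale p = {p_scale:.4f}")
```

Centering both samples at their own medians always creates one tie at zero when both sample sizes are odd, which is why the ranks are computed as midranks.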

Figure 2. Nonparametric SC chart for the piston ring data




9. Summary and concluding remarks

A single distribution-free Shewhart-type control chart, called the SC chart, is proposed, based on the Cucconi1 statistic, for joint monitoring of location and scale parameters of any continuous distribution. A new follow-up decision rule is proposed for post-signal diagnostics. Charting constants are provided for the SC chart and an illustrative example is given. Both the IC and the OOC performance of the chart are studied in detail. Various performance characteristics of the proposed SC chart, viz. the mean, the median and some percentiles of the run length distribution, are examined and compared in a simulation study with the distribution-free SL chart by Mukherjee and Chakraborti.2 It is observed that the SC chart performs as well as or better than the SL chart. In summary, the proposed distribution-free SC chart monitors the unknown location and scale parameters simultaneously on a single chart, maintains the nominal ARL0 for all continuous distributions and comes with a follow-up diagnostic procedure in case of a signal to determine whether the shift is in location or scale or both. Most importantly, it does all of this without requiring the practitioner to make the important assumption of normality or any other distribution. Therefore, the proposed chart can be useful in a wide range of practical applications.

Acknowledgements

The authors would like to thank the editor Professor Douglas Montgomery and two anonymous reviewers for their positive remarks and encouraging comments. A part of this work was supported under the SARCHI Chair at the University of Pretoria, South Africa. Partial support was also provided by the College of Commerce and Business Administration, the University of Alabama, Tuscaloosa, Alabama, U.S.A. A large part of this work was carried out during the second author’s stay at Aalto University, Finland.

References

1. Cucconi O. Un nuovo test non parametrico per il confronto tra due gruppi campionari. Giornale degli Economisti 1968; XXVII:225–248.
2. Mukherjee A, Chakraborti S. A distribution-free control chart for joint monitoring of location and scale. Quality and Reliability Engineering International 2012; 28:335–352.
3. Gan FF. Joint monitoring of process mean and variance, Nonlinear Analysis. Proceedings of the 2nd World Congress of Nonlinear Analysis, USA, 30, 1997; 4017–4024.
4. Chao MT, Cheng SW. Semicircle control chart for variables data. Quality Engineering 1996; 8:441–446.
5. Chen G, Cheng SW. Max chart: Combining X-bar and S chart. Statistica Sinica 1998; 8:263–271.
6. Chen G, Cheng SW, Xie H. Monitoring process mean and variability with one EWMA chart. Journal of Quality Technology 2001; 33:223–233.
7. Costa AFB, Rahim MA. Monitoring process mean and variability with one non-central chi-square chart. Journal of Applied Statistics 2004; 31:1171–1183.
8. Zhang J, Zou C, Wang Z. A control chart based on likelihood ratio test for monitoring process mean and variability. Quality and Reliability Engineering International 2010; 26:63–73.
9. Cheng SW, Thaga K. Single variables control charts: an overview. Quality and Reliability Engineering International 2006; 22:811–820.
10. McCracken AK, Chakraborti S. Control charts for joint monitoring of mean and variance: an overview. Quality Technology & Quantitative Management 2013; 10:17–35.
11. Chakraborti S, Van der Laan P, Bakir ST. Nonparametric control charts: an overview and some results. Journal of Quality Technology 2001; 33:304–315.
12. Chakraborti S, Graham MA. Nonparametric control charts. Encyclopedia of Statistics in Quality and Reliability. John Wiley: New York, 2007, 1; 415–429.
13. Chakraborti S, Human SW, Graham MA. Nonparametric (distribution-free) quality control charts. In Handbook of Methods and Applications of Statistics: Engineering, Quality Control, and Physical Sciences, Balakrishnan N (ed.). John Wiley & Sons: New York, 2011; 298–329.
14. Zou C, Tsung F. Likelihood ratio-based distribution-free EWMA control charts. Journal of Quality Technology 2010; 42:174–196.
15. Lepage Y. A combination of Wilcoxon’s and Ansari-Bradley’s statistics. Biometrika 1971; 58:213–217.
16. Gibbons JD, Chakraborti S. Nonparametric Statistical Inference, 5th edn. Taylor and Francis: Boca Raton, FL, 2010.
17. Marozzi M. Some notes on the location-scale Cucconi test. Journal of Nonparametric Statistics 2009; 21:629–647.
18. Montgomery DC. Introduction to Statistical Quality Control, 4th edn. John Wiley: New York, NY, 2001.
19. Johnson NL, Kotz S. Continuous Multivariate Distributions. John Wiley: New York, 1972.
20. Marozzi M. New results on the Cucconi test. www.sis-statistica.it/files/pdf/atti/rs08_spontanee_2_1.pdf
21. Chakraborti S, Human SW, Graham MA. Phase-I Statistical process control chart: an overview and some results. Quality Engineering 2009; 21:52–62.

Authors' biographies

Dr. Shovan Chowdhury is currently an Assistant Professor in the Army Institute of Management, Kolkata. He also serves as a visiting faculty member at the University of Calcutta, Kolkata and the Visva-Bharati University, Santiniketan, India. His research interests lie in the areas of SQC, Reliability, Probability distributions and Inferential aspects of queueing models. He is about to join as a faculty member in the area of Quantitative Methods and Operations Management in the Indian Institute of Management, Kozhikode, Kerala.

Dr. Amitava Mukherjee is currently a faculty member in the area of Operations Management, Quantitative Methods and Information Systems in the Indian Institute of Management, Udaipur, Rajasthan. He was a visiting Professor in Aalto University, Finland and an Assistant Professor at the Indian Institute of Technology Madras, India. He has published several articles in international peer-reviewed journals in the area of Statistical Quality Control and Sequential Monitoring with a focus on Geostatistics and Market Research/Stock Price problems. Dr. Mukherjee is an active member of several learned societies including the International Indian Statistical Association. He was awarded the U.S. Nair Young Statistician Award by the Indian Society for Probability and Statistics in 2010. He serves as an Associate Editor of Statistical Methodology.

Subhabrata Chakraborti is Professor of Statistics and a Faculty Excellence Fellow at the University of Alabama. He is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute and has been a Fulbright Senior Scholar to South Africa. Professor Chakraborti has authored and co-authored over one hundred publications in a variety of international journals and outlets. He is the co-author of the book Nonparametric Statistical Inference, fifth edition (2010), with Jean D. Gibbons, published by CRC Press/Taylor and Francis. His current research interests include applications of statistical methods, particularly nonparametric statistical methods, to the area of statistical process control. He has supervised more than fifteen Masters and Ph.D. students. Professor Chakraborti has been the winner of the Burlington Northern Faculty Achievement Award for excellence in teaching at the University of Alabama and has been cited for research contributions, mentoring and collaborative work with students and colleagues from around the world. He is currently serving his fifteen year term as an Associate Editor of Communications in Statistics. He is a member of the American Statistical Association, the International Statistical Association and the Institute of Mathematical Statistics.