Aplimat – Journal of Applied Mathematics, volume 4 (2011), number 3

DETECTION OF CHANGE POINT IN STATISTICAL PROCESS CONTROL

JAROŠOVÁ Eva, (CZ)

Abstract. The paper deals with the Bayesian approach to statistical process control. A shift of the process mean is detected via a high value of the posterior probability that a change point has occurred. The average run length and the risk of false alarm are computed numerically by simulation for various levels of the shift and for different sample sizes. Both known and unknown process standard deviation are considered. The simulation results show that the method performs better than the Shewhart control chart and confirm its usability in short run processes, with the exception of individual values from a process with an unknown standard deviation.

Key words. Bayesian approach, average run length, risk of false alarm, Shiryayev-Roberts statistic.

Mathematics Subject Classification: 62P10, 62-07
1 Introduction
Statistical control of industrial processes is one of the most frequently used tools in quality control. A process is monitored through samples of relatively small size drawn at regular intervals. Sample characteristics are plotted against their order and compared to the limits in the control chart. When a point falls beyond the control limits, a signal is given that parameters of the process may have changed and that the process is out of control. Shewhart control charts for averages are applied very often. The control limits are positioned at $3\sigma/\sqrt{n}$ away from the central line, which corresponds e.g. to the target process mean; the standard deviation $\sigma$ represents the variation of the process under control and $n$ denotes the size of subgroups. When $\sigma$ is not known, 20 or 25 subgroups should be taken before the control limits are constructed. This cannot be accomplished in short run processes, which are typical for modern business strategies. Various approaches have been suggested to solve this problem. They include the self-starting CUSUM chart [3], Q-charts [4] and others. Some methods are based on the Bayesian approach [2], and two of them were examined in [1].

To assess the performance of different methods, several characteristics are evaluated. The average run length (ARL) is the average number of subgroups taken until a point indicates an out-of-control condition. ARL is determined for several levels of the shift $\delta$ in the process mean, including $\delta = 0$. For $\delta = 0$ a fairly long ARL is desirable, while for a non-zero shift ARL should be as short as possible. The risk of false alarm (RFA) is the probability that a signal occurs without the process mean being shifted. RFA must be fairly small to avoid overcontrol. These characteristics usually have to be determined numerically by simulation. The aim of this paper is to examine one of the Bayesian methods in more detail. Some findings from [1] are used, and further simulations are performed to explore the effect of the sample size and of an unknown process $\sigma$.
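As an illustration of the Shewhart rule described above, the following Python sketch computes the centre line and the limits $\mu_0 \pm 3\sigma/\sqrt{n}$ and reports the first subgroup mean that falls outside them. The function names `xbar_limits` and `first_signal` and the data are illustrative assumptions, not part of the paper.

```python
import math


def xbar_limits(mu0, sigma, n, L=3.0):
    """Return (LCL, centre line, UCL) of a Shewhart chart for averages."""
    half_width = L * sigma / math.sqrt(n)  # L standard errors of the subgroup mean
    return mu0 - half_width, mu0, mu0 + half_width


def first_signal(xbars, mu0, sigma, n):
    """Return the 1-based index of the first subgroup mean beyond the limits, or None."""
    lcl, _, ucl = xbar_limits(mu0, sigma, n)
    for t, x in enumerate(xbars, start=1):
        if x < lcl or x > ucl:
            return t
    return None


# Illustrative values: target mean 10, sigma 3, subgroups of size 4.
print(xbar_limits(10.0, 3.0, 4))  # (5.5, 10.0, 14.5)
print(first_signal([10.3, 9.1, 11.8, 14.9, 12.0], 10.0, 3.0, 4))  # 4
```

Counting the subgroups until `first_signal` returns a value, and averaging over many simulated sequences, is exactly what the ARL measures.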
2 Detection of the change point
Suppose that a process is monitored at regular intervals and that means are determined in samples of size $n$. The sample means are assumed to have a normal distribution with mean $\mu_0$ and variance $\sigma^2/n$ when the process is under control. Suppose that the process mean changed from $\mu_0$ to $\mu_1$ at some time $t_0$ and has remained at this level since then. The time $t_0$ of the change in the process mean is called the change point. Kenett and Zacks [2] present the following approach. The probability that the change point occurred before or at sampling time $t$ is determined repeatedly, and a large value of this probability indicates the existence of a change point. A random discrete parameter $\tau$ is defined, where

$\tau = 0$ when the change point occurred before the first sampling time,
$\tau = i$ $(1 \le i \le t-1)$ when the change point occurred between the $i$-th and the $(i+1)$-st sampling time,
$\tau = t$ when the change point occurred after time $t$.

The modified geometric prior distribution of this parameter at time $t$ is used, defined by
$$
\pi_t(i) = \begin{cases} \pi, & i = 0,\\ (1-\pi)\,p\,(1-p)^{i-1}, & 1 \le i \le t-1,\\ (1-\pi)(1-p)^{t-1}, & i = t. \end{cases} \tag{1}
$$
Here $\pi$ denotes the probability that the shift in the process occurred before the first sampling time, and $p$ is the probability of success on each trial (i.e. the probability that the shift occurs within the time interval between two successive samplings). Contrary to the ordinary geometric distribution, the set of values of $\tau$ is finite. We will assume that no shift occurred before the process started to be monitored. Then $\pi = 0$ and formulas (1) become simpler:
$$
\pi_t(i) = \begin{cases} 0, & i = 0,\\ p\,(1-p)^{i-1}, & 1 \le i \le t-1,\\ (1-p)^{t-1}, & i = t. \end{cases} \tag{2}
$$
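A short Python sketch of the prior (1) makes the role of the last, lumped value $\tau = t$ concrete: the probabilities always sum to one, and setting $\pi = 0$ gives (2). The function name `modified_geometric_prior` is a hypothetical helper, not taken from [1] or [2].

```python
def modified_geometric_prior(t, p, pi=0.0):
    """Return [pi_t(0), ..., pi_t(t)] of formula (1); pi = 0 gives formula (2)."""
    if t < 1:
        raise ValueError("t must be at least 1")
    prior = [pi]                                 # tau = 0: shift before the first sample
    for i in range(1, t):                        # tau = i, 1 <= i <= t - 1
        prior.append((1 - pi) * p * (1 - p) ** (i - 1))
    prior.append((1 - pi) * (1 - p) ** (t - 1))  # tau = t: no shift up to time t
    return prior


prior = modified_geometric_prior(t=6, p=0.05)
print([round(v, 4) for v in prior])
print(sum(prior))  # 1 up to floating-point rounding
```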
The posterior probability function of $\tau$ at sampling time $t$, given the sample means $X_1, X_2, \ldots, X_t$, is
$$
\pi_t(\tau \mid X_1, \ldots, X_t) = \frac{\pi_t(\tau)\, L_t(\tau; X_1, \ldots, X_t)}{\sum_{k=0}^{t} \pi_t(k)\, L_t(k; X_1, \ldots, X_t)}, \tag{3}
$$
where the likelihood $L_t(\tau; X_1, \ldots, X_t)$ is given by
$$
L_t(\tau; X_1, \ldots, X_t) = \begin{cases} \prod_{j=1}^{t} f(X_j; \mu_1), & \tau = 0,\\ \prod_{j=1}^{\tau} f(X_j; \mu_0) \prod_{j=\tau+1}^{t} f(X_j; \mu_1), & 1 \le \tau \le t-1,\\ \prod_{j=1}^{t} f(X_j; \mu_0), & \tau = t. \end{cases} \tag{4}
$$

Functions $f(X_j; \mu_0)$ and $f(X_j; \mu_1)$ are densities of the normal distributions $N(\mu_0, \sigma^2/n)$ and $N(\mu_1, \sigma^2/n)$, respectively. At sampling time $t$ we are interested in the posterior probability $P(\tau < t \mid X_1, \ldots, X_t)$ that the change point has occurred. Using equations (2), (3) and (4), we have
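$$
P(\tau < t \mid X_1, \ldots, X_t) = \frac{\sum_{i=1}^{t-1} p(1-p)^{i-1} \prod_{j=1}^{i} f(X_j; \mu_0) \prod_{j=i+1}^{t} f(X_j; \mu_1)}{\sum_{i=1}^{t-1} p(1-p)^{i-1} \prod_{j=1}^{i} f(X_j; \mu_0) \prod_{j=i+1}^{t} f(X_j; \mu_1) + (1-p)^{t-1} \prod_{j=1}^{t} f(X_j; \mu_0)}. \tag{5}
$$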
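Formula (5) can be evaluated directly. The Python sketch below does so in log space to avoid underflow of the products of densities, assuming $\pi = 0$ as in prior (2). The function names and the subgroup means are hypothetical and only illustrate the computation.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def posterior_change_prob(xbars, mu0, mu1, sigma, n, p):
    """P(tau < t | X_1, ..., X_t) computed directly from formula (5), prior (2)."""
    t = len(xbars)
    if t < 2:
        return 0.0  # with pi = 0 no shift can precede the first sample
    var = sigma ** 2 / n  # variance of a subgroup mean
    l0 = [log_normal_pdf(x, mu0, var) for x in xbars]
    l1 = [log_normal_pdf(x, mu1, var) for x in xbars]
    # log[prior(i) * likelihood(i)] for tau = i: samples 1..i in control, i+1..t shifted
    log_terms = [
        math.log(p) + (i - 1) * math.log(1 - p) + sum(l0[:i]) + sum(l1[i:])
        for i in range(1, t)
    ]
    log_no_change = (t - 1) * math.log(1 - p) + sum(l0)  # tau = t
    m = max(log_terms + [log_no_change])  # subtract the maximum before exponentiating
    num = sum(math.exp(v - m) for v in log_terms)
    return num / (num + math.exp(log_no_change - m))


xbars = [10.2, 9.5, 10.8, 14.1, 13.6]  # illustrative subgroup means, n = 5
print(posterior_change_prob(xbars, mu0=10.0, mu1=13.0, sigma=3.0, n=5, p=0.05))
```

This brute-force form costs $O(t^2)$ operations per sampling time, which is what the recursive form used later avoids.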
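Dividing the numerator and the denominator of (5) by $(1-p)^{t-1}\prod_{j=1}^{t} f(X_j; \mu_0)$, expression (5) can be rewritten as

$$
P(\tau < t \mid X_1, \ldots, X_t) = \frac{\dfrac{p}{(1-p)^{t-1}} \sum_{i=1}^{t-1} (1-p)^{i-1} \prod_{j=i+1}^{t} R_j}{\dfrac{p}{(1-p)^{t-1}} \sum_{i=1}^{t-1} (1-p)^{i-1} \prod_{j=i+1}^{t} R_j + 1}, \tag{6}
$$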
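where

$$
R_j = \frac{f(X_j; \mu_1)}{f(X_j; \mu_0)} = \exp\left\{ \frac{n\delta}{\sigma^2}(X_j - \mu_0) - \frac{n\delta^2}{2\sigma^2} \right\}, \qquad \delta = \mu_1 - \mu_0. \tag{7}
$$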
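The closed form (7) and the rewritten posterior (6) can be checked numerically. In this Python sketch, `likelihood_ratio` and `posterior_from_ratios` are hypothetical names; `statistics.NormalDist` from the standard library supplies the normal densities for the comparison.

```python
import math
from statistics import NormalDist


def likelihood_ratio(x, mu0, delta, sigma, n):
    """R_j of formula (7) for a subgroup mean x; delta = mu1 - mu0."""
    return math.exp(n * delta / sigma ** 2 * (x - mu0) - n * delta ** 2 / (2 * sigma ** 2))


def posterior_from_ratios(ratios, p):
    """P(tau < t | X_1, ..., X_t) from formula (6); ratios = [R_1, ..., R_t]."""
    t = len(ratios)
    total = 0.0
    for i in range(1, t):
        prod = 1.0
        for r in ratios[i:]:  # R_{i+1} * ... * R_t
            prod *= r
        total += (1 - p) ** (i - 1) * prod
    odds = p / (1 - p) ** (t - 1) * total
    return odds / (odds + 1.0)


mu0, delta, sigma, n = 10.0, 3.0, 3.0, 5
se = sigma / math.sqrt(n)  # standard deviation of a subgroup mean
x = 11.7
direct = NormalDist(mu0 + delta, se).pdf(x) / NormalDist(mu0, se).pdf(x)
print(likelihood_ratio(x, mu0, delta, sigma, n), direct)  # the two values agree

xbars = [10.2, 9.5, 10.8, 14.1, 13.6]  # illustrative subgroup means
ratios = [likelihood_ratio(v, mu0, delta, sigma, n) for v in xbars]
print(posterior_from_ratios(ratios, p=0.05))
```

Note that $R_1$ never enters (6) when $\pi = 0$, because every product starts at $j = i + 1 \ge 2$.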
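Kenett and Zacks [2] use an approximate expression

$$
P(\tau < t \mid X_1, \ldots, X_t) \approx \frac{\sum_{i=1}^{t-1} \prod_{j=i+1}^{t} R_j}{\sum_{i=1}^{t-1} \prod_{j=i+1}^{t} R_j + 1}, \tag{8}
$$

where

$$
W_t = \sum_{i=1}^{t-1} \prod_{j=i+1}^{t} R_j
$$

is the Shiryayev-Roberts statistic.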
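The Shiryayev-Roberts statistic does not need the double sum: splitting off the term $i = t-1$ gives the standard recursion $W_t = R_t(W_{t-1} + 1)$ with $W_1 = 0$. This identity is not stated in the paper; the Python sketch below checks it against the definition and then forms the approximation (8). Function names and the ratios are illustrative.

```python
def shiryayev_roberts(ratios):
    """Return [W_1, ..., W_t] using W_t = R_t * (W_{t-1} + 1), W_1 = 0."""
    w = 0.0  # W_1 is an empty sum
    path = [w]
    for r in ratios[1:]:
        w = r * (w + 1.0)
        path.append(w)
    return path


def shiryayev_roberts_direct(ratios):
    """W_t = sum_{i=1}^{t-1} prod_{j=i+1}^{t} R_j evaluated term by term."""
    total = 0.0
    for i in range(1, len(ratios)):
        prod = 1.0
        for r in ratios[i:]:
            prod *= r
        total += prod
    return total


ratios = [0.5, 0.8, 1.2, 3.0, 4.0]  # illustrative R_1, ..., R_5
w = shiryayev_roberts(ratios)[-1]
print(round(w, 6), round(shiryayev_roberts_direct(ratios), 6))  # 41.92 41.92
print(w / (w + 1.0))  # approximate posterior (8)
```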
In [1], the original expression (6) was used. Putting

$$
\frac{p}{(1-p)^{t-1}} \sum_{i=1}^{t-1} (1-p)^{i-1} \prod_{j=i+1}^{t} R_j = p Z_t, \tag{9}
$$
$Z_t$ can be determined recursively as

$$
Z_t = \frac{R_t (Z_{t-1} + 1)}{1-p}, \qquad Z_1 = 0, \tag{10}
$$
and the probability $P(\tau < t \mid X_1, \ldots, X_t)$ is given by

$$
P(\tau < t \mid X_1, \ldots, X_t) = \frac{p Z_t}{p Z_t + 1}. \tag{11}
$$
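Recursion (10) with (11) turns the detector into a constant-cost update per subgroup. The Python sketch below assumes a known $\sigma$, fixed $\mu_0$ and $\delta$, and uses the values $p = 0.05$ and $P^* = 0.99865$ from the simulation study as defaults; `bayes_change_detector` is a hypothetical name and the data are made up.

```python
import math


def bayes_change_detector(xbars, mu0, delta, sigma, n, p=0.05, threshold=0.99865):
    """Apply recursion (10)-(11); return (signal time or None, [P_1, ..., P_t])."""
    z = 0.0        # Z_1 = 0
    probs = [0.0]  # P(tau < 1 | X_1) = 0 because no shift precedes the first sample
    for t in range(2, len(xbars) + 1):
        x = xbars[t - 1]
        r = math.exp(n * delta / sigma ** 2 * (x - mu0) - n * delta ** 2 / (2 * sigma ** 2))  # (7)
        z = r * (z + 1.0) / (1.0 - p)  # (10)
        prob = p * z / (p * z + 1.0)   # (11)
        probs.append(prob)
        if prob > threshold:           # signal when the posterior exceeds P*
            return t, probs
    return None, probs


xbars = [10.2, 9.5, 10.8, 14.1, 13.6, 14.4]  # illustrative subgroup means, n = 5
signal, probs = bayes_change_detector(xbars, mu0=10.0, delta=3.0, sigma=3.0, n=5)
print(signal)  # 6
print([round(q, 4) for q in probs])
```

The posterior after five subgroups (about 0.9947) coincides with the one obtained from (5) or (6) for the same data, as the algebra requires, but it has not yet crossed the threshold.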
If $P(\tau < t \mid X_1, \ldots, X_t)$ is larger than some stopping threshold $P^*$, a signal is given that a change point has occurred, i.e. that the process mean has shifted. When $\sigma$ of the process must be estimated, the recursive formula for sample size $n \ge 2$

$$
w_t^2 = \frac{1}{t}\left[(t-1)\, w_{t-1}^2 + s_t^2\right] = \frac{1}{t} \sum_{l=1}^{t} s_l^2 \tag{12}
$$

can be used, where $s_l^2$ is the sample variance at the $l$-th sampling time.
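A minimal Python sketch of recursion (12), assuming that $s_l^2$ is the usual sample variance with divisor $n - 1$. How $w_t$ then enters (7) is not spelled out in this section; using $w_t$ in place of $\sigma$ is an assumption. The function name and data are hypothetical.

```python
import statistics


def pooled_variance_path(subgroups):
    """Return [w_1^2, ..., w_t^2] from recursion (12); each subgroup needs n >= 2 values."""
    w2 = 0.0
    path = []
    for t, sample in enumerate(subgroups, start=1):
        s2 = statistics.variance(sample)  # sample variance s_t^2 (divisor n - 1)
        w2 = ((t - 1) * w2 + s2) / t      # w_t^2 = [(t-1) w_{t-1}^2 + s_t^2] / t
        path.append(w2)
    return path


subgroups = [[9.8, 10.4, 11.1], [8.9, 10.2, 10.0], [12.3, 9.7, 10.9]]  # illustrative data
path = pooled_variance_path(subgroups)
print(path)
print(path[-1] ** 0.5)  # w_t, a running estimate of sigma
```

Because (12) averages within-subgroup variances, a shift of the mean between subgroups does not inflate $w_t^2$.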
3 Simulation study
The prior distribution (2) with $p = 0.05$ was used, based on the simulation study in [1]. To compute $R_j$ according to (7), the deviation from $\mu_0$ which is to be identified, i.e. $\delta = \mu_1 - \mu_0$, has to be set. The size of the shift was set successively to $\sigma$, $1.5\sigma$ and $2\sigma$, where $\sigma$ is the standard deviation of the process; with $\sigma^2 = 9$ used below, this gives $\delta = 3$, $4.5$ and $6$. Based on [1], the stopping threshold $P^* = 0.99865$ was chosen. This value, which imitates the one-sided risk of false alarm in the Shewhart control chart ($1 - 0.99865 = 0.00135$), seemed to guarantee a sufficiently low risk of a false signal. The aim of the simulation study was to evaluate ARL for different sample sizes $n$ and for both known and unknown $\sigma$. The Monte Carlo method was used to simulate drawing subgroups from a process monitored by SPC. Three situations were considered:
a) a process under control with the mean equal to the target value; in this case 1000 samples from $N(10, 9)$ were generated in one cycle,
b) a process with a shift of the mean equal to $\delta$ that occurred between sampling times $t = 5$ and $t = 6$; the first 5 samples came from $N(10, 9)$, the remaining 95 samples from $N(10 + \delta, 9)$,
c) a process with a shift of the mean equal to $\delta$ that occurred between sampling times $t = 10$ and $t = 11$; the first 10 samples came from $N(10, 9)$, the remaining 90 samples from $N(10 + \delta, 9)$.
The sample size ranged from 2 to 5 and, in the case of known $\sigma$, individual values ($n = 1$) were also considered. For each combination of conditions, 1000 cycles were performed, and the number of samples until $P(\tau < t \mid X_1, \ldots, X_t) > P^*$ was recorded. The results are given in Tables 1 to 4.
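The following Python sketch imitates situation b) for known $\sigma$. It is a minimal illustration, not the code behind Tables 1 to 4. Assumptions: each subgroup mean is the average of $n$ draws from $N(10, 9)$ (i.e. $\sigma = 3$), before or after the shift; the same $\delta$ is used for the true shift and in (7); and the detection delay is counted as the signal time minus the first shifted sampling time ($t = 6$). This counting convention is our own choice. The names `run_length` and `simulate` are hypothetical.

```python
import math
import random


def run_length(delta, n, change_after, horizon, rng,
               mu0=10.0, sigma=3.0, p=0.05, threshold=0.99865):
    """Simulate one cycle; return the first sampling time t with P > threshold, or None."""
    z = 0.0  # Z_1 = 0
    for t in range(1, horizon + 1):
        mean = mu0 + delta if t > change_after else mu0  # shift after sample `change_after`
        xbar = sum(rng.gauss(mean, sigma) for _ in range(n)) / n
        if t == 1:
            continue  # P(tau < 1) = 0 under prior (2), so no test at t = 1
        r = math.exp(n * delta / sigma ** 2 * (xbar - mu0) - n * delta ** 2 / (2 * sigma ** 2))
        z = r * (z + 1.0) / (1.0 - p)          # recursion (10)
        if p * z / (p * z + 1.0) > threshold:  # formula (11) compared with P*
            return t
    return None


def simulate(delta, n, change_after, horizon=100, cycles=1000, seed=1):
    """Repeat run_length; split signals into false alarms and detection delays."""
    rng = random.Random(seed)
    times = [run_length(delta, n, change_after, horizon, rng) for _ in range(cycles)]
    false_alarms = sum(1 for t in times if t is not None and t <= change_after)
    # Hypothetical convention: delay 0 means the first shifted sample already signals.
    delays = [t - (change_after + 1) for t in times if t is not None and t > change_after]
    return times, false_alarms, delays


# Situation b): shift of 2*sigma = 6 between t = 5 and t = 6, subgroups of size 5.
times, false_alarms, delays = simulate(delta=6.0, n=5, change_after=5, cycles=300)
print(false_alarms, sum(delays) / len(delays))
```

Setting `change_after` to at least `horizon` leaves the process in control for the whole cycle, so the same functions could be used to estimate in-control quantities in the spirit of Tables 1 and 2.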
Table 1. Empirical $\mathrm{ARL}_0$ based on 1000 subgroups.

         σ known                 σ unknown
  n    δ=3   δ=4.5   δ=6      δ=3   δ=4.5   δ=6
  1    956    976    982       -      -      -
  2    982    988    991      923    932    952
  3    988    991    996      963    977    984
  4    987    992    994      979    986    989
  5    988    994    997      978    990    996
Table 2. Empirical RFA based on 1000 subgroups.

         σ known                 σ unknown
  n    δ=3   δ=4.5   δ=6      δ=3   δ=4.5   δ=6
  1     83     42     33       -      -      -
  2     40     21     14      100     79     54
  3     26     16      6       49     34     20
  4     27     14      9       36     19     14
  5     22      9      6       31     13      8
Table 3. Empirical ARL for a shift δ, change point between the 5th and 6th sampling times.

            σ known                     σ unknown
  n     δ=3     δ=4.5   δ=6        δ=3     δ=4.5   δ=6
  1    14.052   6.964   4.067        -       -       -
  2     7.661   3.601   1.936      7.305   3.430   1.926
  3     5.275   2.416   1.219      5.163   2.287   1.202
  4     3.953   1.736   0.792      4.002   1.686   0.795
  5     3.258   1.260   0.492      3.146   1.285   0.522
Table 4. Empirical ARL for a shift δ, change point between the 10th and 11th sampling times.

            σ known                     σ unknown
  n     δ=3     δ=4.5   δ=6        δ=3     δ=4.5   δ=6
  1    13.687   6.741   3.974        -       -       -
  2     7.527   3.578   2.000      7.439   3.522   2.007
  3     5.295   2.351   1.193      5.258   2.395   1.213
  4     4.052   1.740   0.787      4.075   1.701   0.823
  5     3.200   1.305   0.478      3.202   1.308   0.521
Table 5. ARL in the Shewhart control chart, standards given.

  n    δ=σ   δ=1.5σ   δ=2σ
  1     44     15       6
  2     18      5       2
  3     10      3       1
  4      6      2       1
  5      4      2       1
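The entries of Table 5 follow from a standard result for a Shewhart chart with known parameters: after a shift of $k\sigma$, each subgroup mean falls beyond the limits $\mu_0 \pm 3\sigma/\sqrt{n}$ independently with probability $1 - \Phi(3 - k\sqrt{n}) + \Phi(-3 - k\sqrt{n})$, so the run length is geometric and ARL is the reciprocal of that probability. The Python sketch below, with the hypothetical name `shewhart_arl`, reproduces the table after rounding.

```python
import math
from statistics import NormalDist


def shewhart_arl(k, n, L=3.0):
    """ARL of a Shewhart chart for averages (standards given) after a shift of k*sigma."""
    phi = NormalDist().cdf
    shift = k * math.sqrt(n)  # shift of the mean in units of sigma/sqrt(n)
    prob_signal = (1.0 - phi(L - shift)) + phi(-L - shift)  # P(point beyond either limit)
    return 1.0 / prob_signal  # geometric run length, so ARL = 1 / probability


for n in range(1, 6):
    print(n, [round(shewhart_arl(k, n)) for k in (1.0, 1.5, 2.0)])
print(round(shewhart_arl(0.0, 1)))  # in-control ARL0, about 370
```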
4 Conclusion

The simulation study confirmed good properties of the method. All values of ARL are smaller than those of the classical Shewhart control chart (Table 5). It is important that estimating σ has practically no effect on ARL. Based on the two simulated alternatives, with the change point located between the 5th and the 6th sampling times or between the 10th and the 11th sampling times, the Bayesian method appears to perform well even for quite short sequences of samples. For individual observations, ARL of the Bayesian method is much better than ARL of the Shewhart control chart when σ of the process is known. A problem arises, though, when σ has to be estimated. The estimate based on moving ranges, used in control charts for individuals, is not applicable in the recurrent formula, because the change of the process mean at the change point is expected to induce a large value of the corresponding moving range and thus to bias the estimate of σ. Excluding this “unsuitable” moving range seems to be quite intricate.

References
[1] JAROŠOVÁ, E.: Bayesian approach to the short run process control. Demanovská Dolina, 25.-29.8.2010. AMSE 2010 Applications of Mathematics and Statistics in Economy, to be published.
[2] KENETT, R.S., ZACKS, S.: Modern Industrial Statistics. Brooks/Cole Publishing Company, Pacific Grove, 1998.
[3] MONTGOMERY, D.C.: Statistical Quality Control: A Modern Introduction. John Wiley & Sons, Hoboken, 2009.
[4] QUESENBERRY, Ch.: On Properties of Q Charts for Variables. Journal of Quality Technology 27, pp. 204-213, 1995.
Current address
doc. Ing. Eva Jarošová, CSc.
Skoda Auto University
Tr. Vaclava Klementa 864
293 60 Mlada Boleslav
Czech Republic
Phone Number 732469892
e-mail:
[email protected]