Tamkang Journal of Science and Engineering, Vol. 3, NO. 3, pp. 131-137 (2000)
Control Charts for Lognormal Data
Smiley W. Cheng and Hansheng Xie
Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2
E-mail: [email protected]
Abstract
In dealing with positively skewed data, the direct logarithmic transformation may result in a control chart with inappropriate control parameters in the application of quality control. When a specific interval for the lognormal mean is given, a new method is introduced to set up two control charts, and these two charts can monitor a process for which the underlying distribution of the quality characteristic is lognormal.
Key Words: lognormal distribution, control chart, quality control, mean, variance, normal distribution
1. Introduction
Since many kinds of real-life data have a positively skewed distribution, the lognormal distribution has been widely applied in many areas. Morrison [2] first applied the lognormal distribution to quality control and proposed a modified quality control scheme that can process skewed data in the original scale of measurement when the assumption of normality cannot be made. Ferrell [1] also suggested using this control scheme for computing and plotting control charts when data are from a badly skewed distribution that can be approximated by a lognormal distribution. Based on the basic relationship between the normal and lognormal distributions, Morrison derived control limits for the lognormal variable from the corresponding control limits for a normal variable, using the exponential transformation. However, because of the complexity of the lognormal distribution, its application to quality control cannot be reduced to that of the normal distribution by simply taking the direct transformation, which may result in a control chart with inappropriate control parameters.

For simplicity, in the following a lognormal process refers to a process whose quality characteristic follows a lognormal distribution, and a normal process refers to a process whose quality characteristic follows a normal distribution. In this article, quality control schemes for lognormal processes based on direct transformations, either the exponential transformation or the logarithmic transformation, are critically examined, and a new quality control scheme is proposed. To monitor a lognormal process, the corresponding normal process is obtained through the logarithmic transformation. When it is given that the lognormal process mean lies in a specific interval, two control charts are set up for the lognormal process. The control of a complex lognormal process is thereby simplified to that of a normal process, for which good control schemes are available and which is much easier to implement. Properties of the new control charts are discussed, and an example is given to illustrate the implementation of the new charts.
2. The Direct Transformations
2.1 The Exponential Transformations
Suppose that $X_{ij}$, $i = 1, 2, \ldots$ and $j = 1, 2, \ldots, n$, represent the quality characteristic of a process and follow a lognormal distribution with parameters $\mu$ and $\sigma$, i.e., $LN(\mu, \sigma^2)$. Let $Y_{ij} = \ln X_{ij}$; then $Y_{ij}$, $i = 1, 2, \ldots$ and $j = 1, 2, \ldots, n$, follow a normal distribution $N(\mu, \sigma^2)$, where $\mu$ is the normal process mean and $\sigma$ is the normal process standard deviation. In the modified quality control scheme proposed by Morrison [2], the statistics for a lognormal process and the corresponding 3-sigma control limits can be obtained from the following derivations. Notice that
$$P\left(\left|\frac{\ddot{Y}_i - E(\ddot{Y}_i)}{\sigma_{\ddot{Y}}}\right| \le 3\right) = P\left(\exp(\mu - 3\sigma_{\ddot{Y}}) \le \exp(\ddot{Y}_i) \le \exp(\mu + 3\sigma_{\ddot{Y}})\right) = P\left(\exp(\mu - 3\sigma_{\ddot{Y}}) \le \sqrt{X_{i(1)} X_{i(n)}} \le \exp(\mu + 3\sigma_{\ddot{Y}})\right) \tag{1}$$

where $\ddot{Y}_i$ is the sample midrange for a normal process, and $X_{i(1)}$ and $X_{i(n)}$ are the minimum and maximum of the $i$th sample, respectively. Similarly,

$$P\left(\left|\frac{R_i - E(R_i)}{\sigma_{R_i}}\right| \le 3\right) = P\left(\exp((d_2 - 3d_3)\sigma) \le \exp(R_i) \le \exp((d_2 + 3d_3)\sigma)\right) = P\left(\exp((d_2 - 3d_3)\sigma) \le \frac{X_{i(n)}}{X_{i(1)}} \le \exp((d_2 + 3d_3)\sigma)\right) \tag{2}$$

where $R_i$ is the sample range for a normal process, and $d_2$ and $d_3$ are control chart constants. Thus, from (1), the geometric sample mean $\sqrt{X_{i(1)} X_{i(n)}}$ is used as a measure of the lognormal process mean and the exact control limits are $\exp(\mu \pm 3\sigma_{\ddot{Y}})$. With the available sample results, the control limits are estimated by

$$\exp(\ddot{Y})\,\bar{r}^{\pm A_2} = \sqrt{X_{(1)} X_{(nm)}} \left(\frac{1}{m}\sum_{i=1}^{m} \frac{X_{i(n)}}{X_{i(1)}}\right)^{\pm A_2} \tag{3}$$

where $X_{(1)}$ and $X_{(nm)}$ are the minimum and maximum in the $nm$ sample values for a lognormal process, $\bar{r}$ is the average ratio of the maximum to the minimum from $m$ samples of size $n$ for a lognormal process, and $A_2$ is a control chart constant. Similarly, from (2), the sample ratio $X_{i(n)}/X_{i(1)}$ is used as a measure of the lognormal process variability and the exact control limits are $\exp[(d_2 \pm 3d_3)\sigma]$, which are estimated by

$$\bar{r}^{D_3} = \left(\frac{1}{m}\sum_{i=1}^{m} \frac{X_{i(n)}}{X_{i(1)}}\right)^{D_3} \quad \text{and} \quad \bar{r}^{D_4} = \left(\frac{1}{m}\sum_{i=1}^{m} \frac{X_{i(n)}}{X_{i(1)}}\right)^{D_4} \tag{4}$$

where $D_3$ and $D_4$ are control chart constants.
From the derivations, it is seen that the in-control probability of the derived statistics for a lognormal process is the same as that for its normal counterpart, but the control limits for lognormal control charts are inaccurate. Because the control parameters for lognormal and normal processes are different, the direct exponential transformations may not assure that the statistical state of a lognormal process is the same as that of the corresponding normal process. Morrison's chart actually sets the target as the normal process mean, and therefore a normal process is monitored through its lognormal counterpart. Moreover, because the parameter estimators of (3), (4) and (5) resulting from the exponential transformation are biased, the control charts have neither the proper probability nor proper 3-sigma control limits.

2.2 The Logarithmic Transformation
In statistical analysis, a logarithmic transformation is often applied to a set of positively skewed data before proceeding with the analysis. This approach works well for the usual statistical analysis. However, the direct logarithmic transformation may result in a control chart with inappropriate control parameters in the application of quality control. Let $\mu_*$ be the lognormal process mean and $\sigma_*$ be the lognormal process standard deviation. For a lognormal process, it is of interest to control the parameters $\mu_*$ and $\sigma_*$, while, for a normal process, $\mu$ and $\sigma$ are of interest. It can be shown that, for a specified significance level $\alpha$, the control limits for individual measurements of a lognormal process are different from those of the corresponding normal process. Without loss of generality, assume that $X \sim LN(0, 1)$; then $Y = \ln(X) \sim N(0, 1)$. For $\alpha = 0.0027$, it follows from

$$P\left(\frac{X - \mu_*}{\sigma_*} > x_{0.00135}\right) = 0.00135$$

that the upper percentile $x_{0.00135}$ can be found as $x_{0.00135} = \exp(3.5)$. Hence, the upper control limit for the X chart is

$$UCL_X = \mu_* + x_{0.00135}\,\sigma_* = \exp(0.5) + \exp(3.5)\sqrt{\exp(1) - 1} = 45.06 \tag{5}$$

and $\ln(UCL_X) = 3.81$.
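The arithmetic behind (5) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the right-hand side of (5) as written, takes its logarithm, and compares it with the 3-sigma upper limit of the corresponding N(0, 1) variable. The variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

# X ~ LN(0, 1), so Y = ln(X) ~ N(0, 1).
mu, sigma = 0.0, 1.0

# Upper control limit on the original scale, evaluated as written in (5).
ucl_x = math.exp(0.5) + math.exp(3.5) * math.sqrt(math.exp(1.0) - 1.0)
log_ucl_x = math.log(ucl_x)

# Upper 3-sigma limit on the log scale, for comparison.
ucl_y = mu + 3.0 * sigma

# Upper-tail probability of Y beyond each limit.
y_dist = NormalDist(mu, sigma)
tail_at_log_ucl_x = 1.0 - y_dist.cdf(log_ucl_x)
tail_at_ucl_y = 1.0 - y_dist.cdf(ucl_y)

print(f"UCL_X = {ucl_x:.2f}, ln(UCL_X) = {log_ucl_x:.2f}, UCL_Y = {ucl_y:.2f}")
print(f"tail beyond ln(UCL_X): {tail_at_log_ucl_x:.6f}")
print(f"tail beyond UCL_Y:     {tail_at_ucl_y:.6f}")
```

The two limits differ on the log scale, so a point can signal on one chart and not on the other, which is the mismatch this subsection describes.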
The upper control limit for the Y chart is

$$UCL_Y = \mu + z_{0.00135}\,\sigma = 3.00.$$

Obviously, $\ln(UCL_X)$ is not equal to $UCL_Y$, so the direct logarithmic transformation may result in a different control state for the corresponding normal process. Therefore, some restrictions have to be imposed in order to guarantee that the control state of the corresponding normal process is equivalent to that of the original lognormal process.

3. New Control Charts for Lognormal Processes
It is difficult to directly construct a control chart for a lognormal process since sampling properties associated with the lognormal statistics are not easy to derive. Making use of the relationship between the normal and lognormal distributions, and given a specific interval for the mean of a lognormal process, a new method is proposed to avoid the complexity of the lognormal distribution. Two control charts can be constructed to monitor a lognormal process.

3.1 A Specific Interval for the Lognormal Process Mean
Suppose that $m$ samples are randomly drawn from a lognormal process, and $\mu_*$ is known to lie in an interval $(\mu_{*L}, \mu_{*U})$, which is given for the process according to technical specifications. It could be either a given margin of error or specification limits for a single measurement. The margin of error is given by

$$-a_1 \le X_{ij} - \mu_* \le a_2, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n \tag{6}$$

where $a_1$ and $a_2$ are known positive constants. Equation (6) can be rewritten as

$$-a_2 + X_{ij} \le \mu_* \le a_1 + X_{ij}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n \tag{7}$$

which implies

$$X_{(mn)} - a_2 \le \mu_* \le X_{(1)} + a_1 \tag{8}$$

and

$$X_{(mn)} - X_{(1)} \le a_1 + a_2.$$

Hence, an interval for possible values of $\mu_*$ is

$$\mu_{*U} = X_{(1)} + a_1, \qquad \mu_{*L} = X_{(mn)} - a_2.$$

If specification limits are available, the upper and lower specification limits can be used as $\mu_{*U}$ and $\mu_{*L}$, respectively.

3.2 Derivation of Intervals for Parameters
The control parameters for a lognormal process are $\mu_*$ and $\sigma_*$. The control parameters for the corresponding normal process are $\mu$ and $\sigma$. The parameters $\mu_*$ and $\sigma_*$ are functions of $\mu$ and $\sigma$ given by

$$\mu_* = \exp(\mu + 0.5\sigma^2) \tag{9}$$

$$\sigma_*^2 = \exp(2\mu + \sigma^2)\,[\exp(\sigma^2) - 1] \tag{10}$$

The $m$ preliminary samples collected from the in-control process can be used to estimate $\sigma^2$, i.e., $\hat{\sigma}^2 = \bar{S}^2$. From (9), an interval for $\mu$ is obtained as below:

$$\mu_U = \ln(\mu_{*U}) - 0.5\hat{\sigma}^2 \tag{11}$$

$$\mu_L = \ln(\mu_{*L}) - 0.5\hat{\sigma}^2 \tag{12}$$

From (10), an interval for $\sigma_*$ is obtained as below:

$$\sigma_{*U}^2 = \exp(2\mu_U + \hat{\sigma}^2)\,[\exp(\hat{\sigma}^2) - 1] \tag{13}$$

$$\sigma_{*L}^2 = \exp(2\mu_L + \hat{\sigma}^2)\,[\exp(\hat{\sigma}^2) - 1] \tag{14}$$

Because the normal distribution is symmetric about its mean, the target for the corresponding normal process is

$$\mu_0 = 0.5(\mu_U + \mu_L) = 0.5\ln(\mu_{*U}\,\mu_{*L}) - 0.5\hat{\sigma}^2 \tag{15}$$

which implies that the target for the lognormal process is the geometric mean of $\mu_{*U}$ and $\mu_{*L}$, i.e., $\mu_{*0} = \sqrt{\mu_{*U}\,\mu_{*L}}$.
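The mapping in (11) to (15) from an interval for the lognormal mean to the normal-scale quantities can be sketched as below. This is an illustrative sketch only: the function name, the dictionary keys and the input values are hypothetical and do not come from the paper.

```python
import math

def normal_scale_intervals(mu_star_l, mu_star_u, sigma2_hat):
    """Map an interval for the lognormal mean to the quantities in (11)-(15)."""
    mu_u = math.log(mu_star_u) - 0.5 * sigma2_hat             # (11)
    mu_l = math.log(mu_star_l) - 0.5 * sigma2_hat             # (12)
    spread = math.exp(sigma2_hat) - 1.0
    var_star_u = math.exp(2.0 * mu_u + sigma2_hat) * spread   # (13)
    var_star_l = math.exp(2.0 * mu_l + sigma2_hat) * spread   # (14)
    mu_0 = 0.5 * (mu_u + mu_l)                                # (15)
    # Lognormal target obtained by applying (9) to mu_0.
    mu_star_0 = math.exp(mu_0 + 0.5 * sigma2_hat)
    return {
        "mu_L": mu_l, "mu_U": mu_u, "mu_0": mu_0,
        "var_star_L": var_star_l, "var_star_U": var_star_u,
        "mu_star_0": mu_star_0,
    }

# Hypothetical inputs chosen only to exercise the formulas.
result = normal_scale_intervals(mu_star_l=2.0, mu_star_u=8.0, sigma2_hat=0.1)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

With these inputs the lognormal target comes out as the geometric mean of the two interval endpoints, as (15) implies.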
3.3 Constructing Control Charts for Lognormal Processes
When $m$ preliminary samples are taken from a lognormal process, the logarithms of the observations form the $m$ initial samples of the corresponding normal process. To determine whether the process variability is stabilized, an S chart can be set up with control limits:

$$UCL_S = \sqrt{\frac{\chi^2_{\alpha_4,\,n-1}}{n-1}}\;\frac{\bar{S}}{c_4} \tag{16}$$

$$LCL_S = \sqrt{\frac{\chi^2_{\alpha_3,\,n-1}}{n-1}}\;\frac{\bar{S}}{c_4} \tag{17}$$

where $\alpha_3$ and $\alpha_4$ are type-I error probabilities for the lower and upper tails, respectively. If all the standard deviations of these samples plot inside the control limits, then the process variability appears to be in control. Otherwise, each of the out-of-control points for which assignable causes can be found is discarded and the control limits are recalculated. Then these control limits can be used for controlling current or future production, and $\sigma^2$ is estimated from the formula $\hat{\sigma}^2 = \bar{S}^2$.

The percentiles for the $\bar{Y}$ chart can be obtained by setting

$$\mu_U - \mu_0 = z_{\alpha_2}\,\sigma \tag{18}$$

$$\mu_L - \mu_0 = z_{\alpha_1}\,\sigma \tag{19}$$

Then,

$$z_{\alpha_1} = \frac{\mu_L - \mu_0}{\sigma} \tag{20}$$

$$z_{\alpha_2} = \frac{\mu_U - \mu_0}{\sigma} \tag{21}$$

where $\alpha_1$ and $\alpha_2$ are type-I error probabilities for the lower and upper tails, respectively. A $\bar{Y}$ chart can be set up with the following control limits:

$$CL_{\bar{Y}} = \mu_0 \tag{22}$$

$$UCL_{\bar{Y}} = \mu_0 + \frac{z_{\alpha_2}\,\sigma}{\sqrt{n}} = \left(1 - \frac{1}{\sqrt{n}}\right)\mu_0 + \frac{\mu_U}{\sqrt{n}} \tag{23}$$

$$LCL_{\bar{Y}} = \mu_0 + \frac{z_{\alpha_1}\,\sigma}{\sqrt{n}} = \left(1 - \frac{1}{\sqrt{n}}\right)\mu_0 + \frac{\mu_L}{\sqrt{n}} \tag{24}$$
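The control limits (16), (17) and (22) to (24) can be sketched using only the Python standard library. Several points are assumptions made for this illustration rather than statements from the paper: $\chi^2_{\alpha_4, n-1}$ is read as the point with upper-tail area $\alpha_4$ and $\chi^2_{\alpha_3, n-1}$ as the point with lower-tail area $\alpha_3$; the chi-square distribution function is evaluated by a textbook series with a bisection search for its quantiles; $c_4$ is the usual bias-correction constant for the sample standard deviation; and all function names and input values are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * df, 0.5 * x
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c4(n):
    """Bias-correction constant for the sample standard deviation (normal data)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(0.5 * n) / math.gamma(0.5 * (n - 1))

def s_chart_limits(s_bar, n, alpha3, alpha4):
    """Probability limits (16)-(17) for the S chart."""
    sigma_hat = s_bar / c4(n)
    ucl = math.sqrt(chi2_ppf(1.0 - alpha4, n - 1) / (n - 1)) * sigma_hat
    lcl = math.sqrt(chi2_ppf(alpha3, n - 1) / (n - 1)) * sigma_hat
    return lcl, ucl

def ybar_chart_limits(mu_l, mu_u, n):
    """Centre line and limits (22)-(24) for the Y-bar chart."""
    mu_0 = 0.5 * (mu_l + mu_u)
    root_n = math.sqrt(n)
    ucl = (1.0 - 1.0 / root_n) * mu_0 + mu_u / root_n
    lcl = (1.0 - 1.0 / root_n) * mu_0 + mu_l / root_n
    return lcl, mu_0, ucl

# Hypothetical inputs, chosen only to exercise the formulas.
print(s_chart_limits(s_bar=0.35, n=5, alpha3=0.00135, alpha4=0.00135))
print(ybar_chart_limits(mu_l=0.0, mu_u=2.0, n=4))
```

Writing the $\bar{Y}$ limits in the second form of (23) and (24) avoids needing $\sigma$ explicitly, since the limits are fixed by $\mu_L$, $\mu_U$ and $n$.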
4. Properties of the New Control Charts
When a specific interval for $\mu_*$ is given, the derivations of control limits for the control charts monitoring the two related processes are reversible, and hence the statistical control state of the lognormal process can be inferred from that of the corresponding normal process. As a result, it is necessary to study the properties of the two control charts for the normal process and the effects of normal parameter changes on the lognormal parameters.

4.1 The Calculations of the Average Run Length (ARL)
Assume that $Y_{ij} \sim N(\mu_0, \sigma_0^2)$ independently, where $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$. Suppose that the normal process mean changes from $\mu_0$ to $\mu_0 + a\sigma_0$ ($a \ne 0$) and the normal process standard deviation changes from $\sigma_0$ to $b\sigma_0$ ($b > 0$). The probability of type-II error for the $\bar{Y}$ chart can be computed from

$$\beta_{\bar{Y}} = P\left(LCL_{\bar{Y}} \le \bar{Y} \le UCL_{\bar{Y}} \mid \mu = \mu_0 + a\sigma_0;\ \sigma = b\sigma_0\right) = \Phi\left[\frac{1}{2\sigma_0}\ln\left(\frac{\mu_{*U}}{\mu_{*L}}\right) - a\sqrt{n}\right] - \Phi\left[-\frac{1}{2\sigma_0}\ln\left(\frac{\mu_{*U}}{\mu_{*L}}\right) - a\sqrt{n}\right] \tag{25}$$

When $a = 0$, the probability of type-I error for the $\bar{Y}$ chart is

$$\alpha_{\bar{Y}} = 1 - \beta_{\bar{Y}} = 2\Phi\left[-\frac{1}{2\sigma_0}\ln\left(\frac{\mu_{*U}}{\mu_{*L}}\right)\right] \tag{26}$$

from which it is noted that $\alpha_{\bar{Y}}$ is a function of $\mu_{*U}/\mu_{*L}$. To achieve a small $\alpha_{\bar{Y}}$, note that the ratio $\mu_{*U}/\mu_{*L}$ is usually not large for high-precision products, so the process variability $\sigma_0$ has to be small, while $\sigma_0$ is allowed to be somewhat larger for medium- or low-precision products.
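The ARL's for the $\bar{Y}$ chart can be easily obtained from

$$ARL_{\bar{Y}} = \frac{1}{1 - \beta_{\bar{Y}}} \tag{27}$$

When $\alpha_{\bar{Y}}$ is fixed, the ARL will decrease as $a$ and $n$ increase. The ARL's for the S chart can be computed from

$$ARL_S = \frac{1}{1 - \beta_S} \tag{28}$$

where

$$\beta_S = H\left(\frac{\chi^2_{\alpha_4,\,n-1}}{b^2}\right) - H\left(\frac{\chi^2_{\alpha_3,\,n-1}}{b^2}\right)$$

with $H(\cdot)$ the chi-square distribution function with $n - 1$ degrees of freedom, and $\alpha_S = \alpha_3 + \alpha_4$.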
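Formulas (25), (27) and (28) can be evaluated numerically as sketched below. The sketch rests on assumptions that are not stated in the paper: $H$ is taken to be the chi-square distribution function with $n - 1$ degrees of freedom, $\alpha_S$ is split equally between the two tails, and the helper names are hypothetical. The quantity `half_width` stands for $\ln(\mu_{*U}/\mu_{*L})/(2\sigma_0)$, which equals $z_{\alpha_2}$ when the limits are symmetric.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * df, 0.5 * x
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ybar_detection(half_width, a, n):
    """1 - beta from (25), with half_width = ln(mu_*U / mu_*L) / (2 sigma_0)."""
    phi = NormalDist().cdf
    shift = a * math.sqrt(n)
    beta = phi(half_width - shift) - phi(-half_width - shift)
    return 1.0 - beta

def s_detection(b, n, alpha3, alpha4):
    """1 - beta_S, reading the percentile points as lower/upper tail points."""
    df = n - 1
    upper = chi2_ppf(1.0 - alpha4, df)
    lower = chi2_ppf(alpha3, df)
    beta = chi2_cdf(upper / b**2, df) - chi2_cdf(lower / b**2, df)
    return 1.0 - beta

# Values taken from the example in Section 5: z = 3.0449, a = 2, b = 3, n = 5.
p_mean = ybar_detection(3.0449, a=2.0, n=5)
p_var = s_detection(b=3.0, n=5, alpha3=0.00135, alpha4=0.00135)
print(f"Y-bar chart: detection = {p_mean:.4f}, ARL = {1.0 / p_mean:.4f}")
print(f"S chart:     detection = {p_var:.4f}, ARL = {1.0 / p_var:.4f}")
```

Under these assumptions the detection probabilities come out at roughly 0.92 and 0.74, in line with the 0.9236 and 0.7398 reported in the example of Section 5.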
4.2 Effects of Changes in Parameters
When there are changes in the normal process mean and/or process variability, the lognormal parameters will be changed to

$$\mu_{1*} = \exp[\mu_0 + a\sigma_0 + 0.5(b\sigma_0)^2]$$

$$\sigma_{1*}^2 = \exp[2(\mu_0 + a\sigma_0) + (b\sigma_0)^2] \cdot [\exp((b\sigma_0)^2) - 1]$$

where $a \ne 0$ and $b > 0$. The derivatives of $\mu_{1*}$ with respect to $a$ and $b$ are

$$\frac{\partial \mu_{1*}}{\partial a} = \sigma_0\,\mu_{1*} > 0 \tag{29}$$

$$\frac{\partial \mu_{1*}}{\partial b} = b\,\sigma_0^2\,\mu_{1*} > 0 \tag{30}$$

so $\mu_{1*}$ is a monotone increasing function of $a$ and $b$, and $\sigma_{1*}$ can be written as a monotone increasing function of $\mu_{1*}$:

$$\sigma_{1*} = \mu_{1*}\sqrt{\exp((b\sigma_0)^2) - 1} \tag{31}$$

since $\exp((b\sigma_0)^2) - 1 > 0$. Then $\sigma_{1*}$ is also a monotone increasing function of $a$ and $b$. Thus, the direction of an out-of-control signal from a lognormal process can be identified from the corresponding normal process.

5. Charting Procedure and Example
The steps to set up the two charts are summarized below:
1. Determine the values of $\mu_{*U}$ and $\mu_{*L}$. (a) Use values provided by technical specifications, or if not available, (b) use $\mu_{*U}$ and $\mu_{*L}$ obtained from preliminary data as follows: if the overall range of the data is less than or equal to $a_1 + a_2$, calculate $\mu_{*U}$ and $\mu_{*L}$; however, if the overall range is greater than $a_1 + a_2$, remove the possible outliers $X_{(nm)}, X_{(1)}, \ldots$, until the overall range is less than or equal to $a_1 + a_2$, and then calculate $\mu_{*U}$ and $\mu_{*L}$.
2. Transform data using $Y = \ln(X)$.
3. Construct an S chart and estimate $\sigma^2$ by $\bar{S}^2$ when the process variability is in control.
4. Compute $\mu_{*U}$, $\mu_{*L}$, $\mu_0$, $\sigma_{*U}^2$ and $\sigma_{*L}^2$.
5. Construct a $\bar{Y}$ chart.
6. For a sample point that plots outside one of the control limits, calculate $\hat{\mu}_{*i}$ and $\hat{\sigma}_{*i}^2$ using $\bar{Y}_i$ as the estimate of $\mu$ and $S_i^2$ as the estimate of $\sigma^2$. Plot 'm+' or 'm–' against sample number if only $\hat{\mu}_{*i} > \mu_{*U}$ or $\hat{\mu}_{*i} < \mu_{*L}$; plot 'v+' or 'v–' against sample number if only $\hat{\sigma}_{*i} > \sigma_{*U}$ or $\hat{\sigma}_{*i} < \sigma_{*L}$; plot 'm+v+', 'm+v–', 'm–v+' or 'm–v–' against sample number according to the sources and the directions of an out-of-control signal.
7. Examine the assignable cause(s).

An example is given to illustrate how to apply the new control scheme to lognormally distributed data. The data, consisting of 34 samples of size 5, are given in Table 1. The first 30 samples are taken from Morrison [2], where it was stated that they were collected from a process in the valves industry. The last 4 samples are added to simulate an out-of-control process. For the measurement of individual values, the lower and upper specification limits are 1 and 10. A probability plot of the real data in Figure 1 suggests that the observations do not behave as though arising from a normal distribution. To adjust for non-normality, a logarithmic transformation is applied to the original data. A probability plot of the transformed data in Figure 2 shows that a lognormal distribution can be fitted quite well, suggesting that a lognormal quality control scheme should be employed in this case.

Suppose that the first 20 samples in Table 1 are used as preliminary samples. After applying the logarithmic transformation, an S chart is set up with $\alpha_S = 0.0027$ and it is shown in Figure 3. When the 20 sample standard deviations are plotted on this chart, there is no indication of an out-of-control condition. Then $\sigma^2$ is estimated by $\bar{S}^2 = 0.1263$. Since $z_{\alpha_1} = -3.0449$ and $z_{\alpha_2} = 3.0449$, a $\bar{Y}$ chart can be set up with $\alpha_{\bar{Y}} = 0.0023$ and it is shown in Figure 4. When the 20 sample means are plotted on this chart, there is also no indication of an out-of-control condition. Since the S and $\bar{Y}$ charts constructed using the first 20 samples indicate that both the process variability and the process mean are in control, the control limits obtained can be used in on-line statistical process control.

Assuming that there is a $2\sigma_0$ shift in the process mean and a 3 times change in the process standard deviation, the probability of detecting the mean shift on the first subsequent sample is 0.9236, and the probability of detecting the variability change on the first subsequent sample is 0.7398. Hence, the expected number of samples taken before the shift is detected is 1.0827, and the expected number of samples taken before the change is detected is 1.3517. When the last 14 sample means are plotted on the $\bar{Y}$ chart shown in Figure 6 and the last 14 sample standard deviations are plotted on the S chart shown in Figure 5, it is seen that the last 4 points are above at least one of the UCL's. This indicates that the lognormal process is out of control with an increase in $\mu_*$ and $\sigma_*^2$. To identify the sources of these out-of-control signals, $\hat{\mu}_{*i}$ and $\hat{\sigma}_{*i}^2$ are calculated. It is found that $\hat{\mu}_{*31}$ and $\hat{\mu}_{*33}$ are greater than $\mu_{*U}$, and $\hat{\sigma}_{*32}^2$, $\hat{\sigma}_{*33}^2$ and $\hat{\sigma}_{*34}^2$ exceed $\hat{\sigma}_{*U}^2$, which is equal to 13.4651. This diagnosis is supported by reference back to the individual measurements of the last 4 samples, since there are individuals exceeding $\mu_{*U}$ in samples 31 and 33 and greater variability within samples 32, 33, and 34. It should be noted that, for the last sample, only the lognormal process variability is out of control although both the corresponding normal process mean and variability are out of control.

Figure 1. The probability plot for the valve data
Figure 2. The probability plot for the logarithm of the valve data
Figure 3. The first S chart for the valve data
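Step 6 of the charting procedure, the labelling of an out-of-control sample by source and direction, can be sketched as follows. The function name, its return format and the input values are hypothetical; the estimates use (9) and (10) with the sample mean and variance of the log data plugged in, and the comparison is made on the variance scale, which is equivalent to comparing standard deviations because both sides are positive.

```python
import math

def diagnose(y_bar_i, s2_i, mu_star_l, mu_star_u, var_star_l, var_star_u):
    """Return the step-6 plotting label, e.g. 'm+', 'v-', 'm+v+', or '' if none."""
    # Plug-in estimates of the lognormal mean and variance, from (9) and (10).
    mu_hat = math.exp(y_bar_i + 0.5 * s2_i)
    var_hat = math.exp(2.0 * y_bar_i + s2_i) * (math.exp(s2_i) - 1.0)

    label = ""
    if mu_hat > mu_star_u:
        label += "m+"
    elif mu_hat < mu_star_l:
        label += "m-"
    if var_hat > var_star_u:
        label += "v+"
    elif var_hat < var_star_l:
        label += "v-"
    return label, mu_hat, var_hat

# Hypothetical interval endpoints and sample statistics, for illustration only.
limits = dict(mu_star_l=1.0, mu_star_u=10.0, var_star_l=0.1346, var_star_u=13.4651)
print(diagnose(y_bar_i=2.5, s2_i=0.1, **limits))
print(diagnose(y_bar_i=1.0, s2_i=0.1, **limits))
```

An empty label means that neither estimate leaves its interval, so the signal on the normal-scale chart does not translate into a flagged lognormal parameter for that sample.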
Table 1. Valve data
Figure 4. The first Y-bar chart for the valve data
Figure 5. The second S chart for the valve data
Figure 6. The second Y-bar chart for the valve data

6. Conclusions
Our discussion shows that direct data transformation methods may be inappropriate for the control of a lognormal process if no constraints are applied to the lognormal process. This clarifies some confusion in lognormal quality control. When a specific interval for the lognormal mean is given, our new method enables the control of a lognormal process through that of its normal counterpart, avoiding the complexity of the lognormal distribution. A detailed analysis of the two new control charts is presented. In general, the new control charts are applicable to a lognormal process whenever a reasonable interval for the lognormal mean can be found based on the nature of the lognormal data.

Acknowledgment
The research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada. The first author would also like to express his gratitude to the Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada for providing facilities while he was on administrative leave there.

References
[1] Ferrell, E. D., “Control Charts for Lognormal Universe,” Industrial Quality Control, V. 15, pp. 4-6 (1958).
[2] Morrison, J., “The Lognormal Distribution in Quality Control,” Applied Statistics, V. 7, pp. 160-172 (1958).
[3] Xie, H., Contributions to Qualimetry, Ph.D. dissertation, University of Manitoba, Winnipeg, Canada (1999).