Estimation of Population Generalized Variance: Application in Service Industry
Revathi Sagadavan1, Maman A. Djauhari2, and Ismail Mohamad1
1 Department of Mathematical Science, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
2 Center for Research, Consultation and Training in Statistical Analysis, Faculty of Graduate Studies, Universitas Pasundan, Bandung, Indonesia
[email protected], [email protected], [email protected]
Abstract - In statistical process control, monitoring process variability is as important as monitoring the process target. In the multivariate setting, variability monitoring is still under development because multivariate variability measures are complex, which is why monitoring of the target is the more popular of the two. The most widely used measure of multivariate variability is the so-called generalized variance (GV). To monitor GV, we need to estimate the population generalized variance and its square. In the literature, those estimates are based on a single sample. Only recently have they been developed for the case of m independent samples of equal size. This motivates us to extend the estimation, in this paper, to the case of m independent samples of unequal sizes, which is usually encountered in service industry. An example of GV control charting for unequal sample sizes is presented to illustrate the advantages of this method of estimation in monitoring the quality of service.
Keywords - Improved generalized variance, control chart, asymptotic distribution, pooled variance, parameter estimation
978-1-4799-6410-9/14/$31.00 ©2014 IEEE
I. INTRODUCTION
A multivariate variability measure such as GV [1][13] is a non-negative real-valued function of the covariance matrix such that the more scattered the population, the larger the value of that function [6][7][8]. In practice, this measure is the sole ingredient needed to construct a control chart that monitors the stability of the process variability. Another example of such a function is the vector variance (VV) [7]. To monitor process variability in the multivariate setting, we need to take into account the sampling design, which can be classified into three types. In the first, there is n = 1 observation in each sub-group, as presented in [2][13]. In the second, n > p in each sub-group, where p is the number of quality characteristics or variables. In the third, 1 < n < p. In this paper, the focus is on the second design with GV as the multivariate variability measure. Examples of GV charts based on sub-group observations can be found in specialized publications such as [14]; that chart has been greatly improved by [4], which removes the bias of the control limits. We call the improved version of the GV-chart the improved generalized variance (IGV) chart. It is important to note that the IGV-chart is developed for the case where the sub-group size is constant, say n. This condition is rather difficult to fulfil in service industry, where the sub-group size cannot be easily controlled.
In this paper, we are interested in extending the sampling design to the case where the m independent samples are not necessarily of the same size. In this new design, the problem is to derive the control limits of the IGV-chart. For that purpose, we first discuss the estimation of $|\Sigma|$ and $|\Sigma|^2$, which is the backbone of that chart. This paper is organized into five parts. The first section is the introduction. The second is an overview of the IGV-chart. The third derives the control limits of the IGV-chart for the case of m independent sub-groups which might be of different sizes. In the fourth section, an example is presented to illustrate the mechanism of the new design of the IGV-chart. Lastly, concluding remarks in the fifth section close this presentation.
Proceedings of the 2014 IEEE IEEM
II. OVERVIEW OF IMPROVED GENERALIZED VARIANCE CHART
Let $X_1, X_2, \ldots, X_n$ be a random sample from the p-variate normal distribution $N_p(\mu, \Sigma)$. If $A = (n-1)S$, where $S$ is the sample covariance matrix, then $A$ can be written in the form
$$A \sim \sum_{\alpha=1}^{n-1} Z_\alpha Z_\alpha'$$
where $Z_1, Z_2, \ldots, Z_{n-1}$ are independent, each with the distribution $N_p(0, \Sigma)$. See [2] for the details. Consequently, $A = (n-1)S \sim W_p(\Sigma, n-1)$ and $|A| = (n-1)^p |S|$. From this equality, [1] shows that
$$|A| \sim |\Sigma| \times \chi^2_{n-1} \times \chi^2_{n-2} \times \cdots \times \chi^2_{n-p}$$
where the chi-square random variables on the right-hand side are independent. Now, we have the exact distribution of GV,
$$|S| \sim \frac{|\Sigma|}{(n-1)^p} \times \chi^2_{n-1} \times \chi^2_{n-2} \times \cdots \times \chi^2_{n-p}$$
or
$$|S| \sim \frac{|\Sigma|}{(n-1)^p} \prod_{k=1}^{p} Z_k \quad \text{with } Z_k \sim \chi^2_{n-k}.$$
This distribution has the following first two moments. The first is
$$E(|S|) = \frac{|\Sigma|}{(n-1)^p} \prod_{k=1}^{p} E(Z_k) = \frac{|\Sigma|}{(n-1)^p} \prod_{k=1}^{p} (n-k). \quad (1)$$
The second moment is
$$E(|S|^2) = \left(\frac{2}{n-1}\right)^{2p} |\Sigma|^2 \prod_{k=1}^{p} \frac{\Gamma\left(\frac{1}{2}(n-k) + 2\right)}{\Gamma\left(\frac{1}{2}(n-k)\right)} = \frac{|\Sigma|^2}{(n-1)^{2p}} \prod_{k=1}^{p} (n-k)(n-k+2).$$
Then, since $\mathrm{Var}(|S|) = E(|S|^2) - [E(|S|)]^2$,
$$\mathrm{Var}(|S|) = \frac{|\Sigma|^2}{(n-1)^{2p}} \prod_{k=1}^{p} (n-k) \left( \prod_{j=1}^{p} (n-j+2) - \prod_{i=1}^{p} (n-i) \right). \quad (2)$$
The mean (1) and variance (2) of GV lead us to the classical GV-chart. Let us write
$$b_1 = \frac{1}{(n-1)^p} \prod_{k=1}^{p} (n-k)$$
and
$$b_2 = \frac{1}{(n-1)^{2p}} \prod_{k=1}^{p} (n-k) \left( \prod_{j=1}^{p} (n-j+2) - \prod_{i=1}^{p} (n-i) \right).$$
Then $|S|/b_1$ is an unbiased estimate of $|\Sigma|$ and $|S|^2/(b_2 + b_1^2)$ is an unbiased estimate of $|\Sigma|^2$. Based on these estimators, see [14], many authors use the following control limits of the GV-chart:
$$UCL = \frac{|S|}{b_1}\left(b_1 + 3\sqrt{b_2}\right)$$
$$CL = |S|$$
$$LCL = \frac{|S|}{b_1}\left(b_1 - 3\sqrt{b_2}\right)$$
Djauhari [4] criticizes these control limits and shows that they are biased. He notes that they are valid only for a single sample, and he eliminates the bias by using all available sub-groups. His derivation is as follows.
Let $S_i$ be the sample covariance matrix of sub-group i, i = 1, 2, ..., m. The average of the $S_i$ is
$$\bar{S} = \frac{1}{m} \sum_{i=1}^{m} S_i.$$
Thus,
$$m(n-1)\bar{S} = \sum_{i=1}^{m} (n-1) S_i.$$
All $(n-1)S_i$ for i = 1, 2, ..., m are independent of each other. Consequently,
$$m(n-1)\bar{S} \sim W_p(\Sigma, m(n-1)).$$
From this point, by an argument similar to the single-sample case, we have
$$|\bar{S}| \sim \frac{|\Sigma|}{[m(n-1)]^p} \prod_{k=1}^{p} Z_k \quad \text{with } Z_k \sim \chi^2_{m(n-1)-k+1}.$$
Thus, the first moment is
$$E(|\bar{S}|) = \frac{|\Sigma|}{[m(n-1)]^p} \prod_{k=1}^{p} [m(n-1)-k+1] \quad (3)$$
and the second moment is
$$E(|\bar{S}|^2) = E\left[ \left( \frac{|\Sigma|}{[m(n-1)]^p} \right)^2 \prod_{k=1}^{p} Z_k^2 \right] = \left(\frac{2}{m(n-1)}\right)^{2p} |\Sigma|^2 \prod_{k=1}^{p} \frac{\Gamma\left(\frac{m(n-1)-k+1}{2} + 2\right)}{\Gamma\left(\frac{m(n-1)-k+1}{2}\right)}.$$
Then,
$$E(|\bar{S}|^2) = \frac{|\Sigma|^2}{[m(n-1)]^{2p}} \prod_{k=1}^{p} [m(n-1)-k+3][m(n-1)-k+1].$$
Therefore, the variance is
$$\mathrm{Var}(|\bar{S}|) = E(|\bar{S}|^2) - [E(|\bar{S}|)]^2 = \frac{|\Sigma|^2}{[m(n-1)]^{2p}} \prod_{k=1}^{p} [m(n-1)-k+3][m(n-1)-k+1] - \frac{|\Sigma|^2}{[m(n-1)]^{2p}} \prod_{k=1}^{p} [m(n-1)-k+1]^2.$$
Consequently, the variance is
$$\mathrm{Var}(|\bar{S}|) = \frac{|\Sigma|^2}{[m(n-1)]^{2p}} \prod_{k=1}^{p} [m(n-1)-k+1] \times \left[ \prod_{j=1}^{p} [m(n-1)-j+3] - \prod_{i=1}^{p} [m(n-1)-i+1] \right]. \quad (4)$$
The mean (3) and variance (4) lead to the IGV-chart. Let us write
$$b_3 = \frac{1}{[m(n-1)]^p} \prod_{k=1}^{p} [m(n-1)-k+1]$$
and
$$b_4 = \frac{1}{[m(n-1)]^{2p}} \prod_{k=1}^{p} [m(n-1)-k+1] \times \left[ \prod_{j=1}^{p} [m(n-1)-j+3] - \prod_{i=1}^{p} [m(n-1)-i+1] \right].$$
Then $|\bar{S}|/b_3$ is an unbiased estimate of $|\Sigma|$ and $|\bar{S}|^2/(b_4 + b_3^2)$ is an unbiased estimate of $|\Sigma|^2$. Therefore, if we define the control limits of the GV-chart as
$$UCL = |\bar{S}| \left( \frac{b_1}{b_3} + 3\sqrt{\frac{b_2}{b_3^2 + b_4}} \right)$$
$$CL = |\bar{S}| \, \frac{b_1}{b_3}$$
$$LCL = |\bar{S}| \left( \frac{b_1}{b_3} - 3\sqrt{\frac{b_2}{b_3^2 + b_4}} \right)$$
then these control limits are unbiased with smallest variance [4].
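The constants in this section can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names `gv_constants` and `igv_limits` are hypothetical, and the sketch relies on the observation that $b_1, b_2$ and $b_3, b_4$ share one formula evaluated at different degrees of freedom ($n-1$ and $m(n-1)$, respectively).

```python
from math import prod, sqrt


def gv_constants(df, p):
    """Coefficients of the mean and variance of |W / df| for W ~ W_p(Sigma, df).

    Returns (c_mean, c_var) with E|W/df| = c_mean * |Sigma| and
    Var|W/df| = c_var * |Sigma|^2. With df = n - 1 this gives (b1, b2);
    with df = m * (n - 1) it gives (b3, b4).
    """
    lead = prod(df - k + 1 for k in range(1, p + 1))
    shifted = prod(df - j + 3 for j in range(1, p + 1))
    c_mean = lead / df**p
    c_var = lead * (shifted - lead) / df ** (2 * p)
    return c_mean, c_var


def igv_limits(det_sbar, n, m, p):
    """IGV-chart limits (LCL, CL, UCL) for m sub-groups of equal size n.

    det_sbar is the determinant of the averaged sample covariance matrix.
    The LCL is returned as computed; it is not truncated at zero.
    """
    b1, b2 = gv_constants(n - 1, p)
    b3, b4 = gv_constants(m * (n - 1), p)
    center = det_sbar * b1 / b3
    half_width = 3 * det_sbar * sqrt(b2 / (b3**2 + b4))
    return center - half_width, center, center + half_width


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the paper).
    print(igv_limits(det_sbar=1.0, n=10, m=25, p=2))
```

With m = 1 the two pairs of constants coincide, so the centre line reduces to the determinant itself, which is a quick sanity check on the implementation.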
III. PROPOSED CONTROL CHART
Let us consider the following sampling design. The control chart for process variability is constructed from sub-group observations where the sample size is not constant across sub-groups. As in the previous section, let $S_i$ be the covariance matrix of sample i, of size $n_i$; i = 1, 2, ..., m. In this case, the pooled covariance matrix is
$$S_p = \frac{\sum_{i=1}^{m} (n_i - 1) S_i}{\sum_{i=1}^{m} (n_i - 1)}.$$
Since
$$\sum_{i=1}^{m} (n_i - 1) S_p = \sum_{i=1}^{m} (n_i - 1) S_i$$
and all $(n_i - 1) S_i$ for i = 1, 2, ..., m are independent of each other, we have
$$\sum_{i=1}^{m} (n_i - 1) S_p \sim W_p\left(\Sigma, \sum_{i=1}^{m} (n_i - 1)\right).$$
Again, by an argument similar to the previous one, we obtain the following distribution of the determinant of $S_p$:
$$|S_p| \sim \frac{|\Sigma|}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^p} \prod_{k=1}^{p} Z_k \quad \text{with } Z_k \sim \chi^2_{\sum_{i=1}^{m} (n_i - 1) - k + 1}.$$
The mean of this distribution is
$$E(|S_p|) = \left( \frac{2}{\sum_{i=1}^{m} (n_i - 1)} \right)^{p} |\Sigma| \prod_{k=1}^{p} \frac{\Gamma\left( \frac{\sum_{i=1}^{m} (n_i - 1) - k + 1}{2} + 1 \right)}{\Gamma\left( \frac{\sum_{i=1}^{m} (n_i - 1) - k + 1}{2} \right)} \quad (5)$$
or equivalently,
$$E(|S_p|) = \frac{|\Sigma|}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^p} \prod_{k=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - k + 1 \right).$$
To derive its variance, we need the second moment,
$$E(|S_p|^2) = \left( \frac{2}{\sum_{i=1}^{m} (n_i - 1)} \right)^{2p} |\Sigma|^2 \prod_{k=1}^{p} \frac{\Gamma\left( \frac{\sum_{i=1}^{m} (n_i - 1) - k + 1}{2} + 2 \right)}{\Gamma\left( \frac{\sum_{i=1}^{m} (n_i - 1) - k + 1}{2} \right)}$$
or equivalently,
$$E(|S_p|^2) = \frac{|\Sigma|^2}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^{2p}} \prod_{k=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - k + 3 \right) \left( \sum_{i=1}^{m} (n_i - 1) - k + 1 \right).$$
This equation gives us the variance
$$\mathrm{Var}(|S_p|) = E(|S_p|^2) - [E(|S_p|)]^2$$
or equivalently,
$$\mathrm{Var}(|S_p|) = \frac{|\Sigma|^2}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^{2p}} \prod_{k=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - k + 1 \right) \times \left[ \prod_{j=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - j + 3 \right) - \prod_{l=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - l + 1 \right) \right]. \quad (8)$$
From the mean and the variance (8), if we write
$$b_7 = \frac{1}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^p} \prod_{k=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - k + 1 \right)$$
and
$$b_8 = \frac{1}{\left[\sum_{i=1}^{m} (n_i - 1)\right]^{2p}} \prod_{k=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - k + 1 \right) \times \left[ \prod_{j=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - j + 3 \right) - \prod_{l=1}^{p} \left( \sum_{i=1}^{m} (n_i - 1) - l + 1 \right) \right]$$
then $|S_p|/b_7$ is an unbiased estimate of $|\Sigma|$ and $|S_p|^2/(b_8 + b_7^2)$ is an unbiased estimate of $|\Sigma|^2$. From these results, we propose the following control limits of the IGV-chart for sub-group i, for i = 1, 2, ..., m:
$$UCL_i = |S_p| \, \frac{b_{1,i}}{b_7} + 3 |S_p| \sqrt{\frac{b_{2,i}}{b_8 + b_7^2}}$$
$$CL_i = |S_p| \, \frac{b_{1,i}}{b_7}$$
$$LCL_i = |S_p| \, \frac{b_{1,i}}{b_7} - 3 |S_p| \sqrt{\frac{b_{2,i}}{b_8 + b_7^2}}$$
where the constants $b_{1,i}$ and $b_{2,i}$ are defined by
$$b_{1,i} = \frac{1}{(n_i - 1)^p} \prod_{k=1}^{p} (n_i - k)$$
and
$$b_{2,i} = \frac{1}{(n_i - 1)^{2p}} \prod_{k=1}^{p} (n_i - k) \left( \prod_{j=1}^{p} (n_i - j + 2) - \prod_{l=1}^{p} (n_i - l) \right).$$
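To make the proposed limits concrete, here is a minimal sketch that evaluates $b_7$, $b_8$, $b_{1,i}$, $b_{2,i}$ and the per-sub-group limits. The function names are hypothetical. The sub-group sizes are those listed in Table I of Section IV with p = 3. The value 0.0911 used for $|S_p|$ is an assumption: the paper does not report $|S_p|$, so it is back-calculated here from the centre line in Table I purely for illustration.

```python
from math import prod, sqrt


def gv_constants(df, p):
    """Return (c_mean, c_var) for |W / df| with W ~ W_p(Sigma, df).

    E|W/df| = c_mean * |Sigma| and Var|W/df| = c_var * |Sigma|^2.
    """
    lead = prod(df - k + 1 for k in range(1, p + 1))
    shifted = prod(df - j + 3 for j in range(1, p + 1))
    return lead / df**p, lead * (shifted - lead) / df ** (2 * p)


def pooled_igv_limits(det_sp, sizes, p):
    """Proposed IGV-chart limits for sub-groups of unequal sizes.

    det_sp is the determinant of the pooled covariance matrix S_p.
    Returns (b7, b8, limits) where limits[i] = (LCL_i, CL_i, UCL_i).
    """
    total_df = sum(n - 1 for n in sizes)      # sum of (n_i - 1)
    b7, b8 = gv_constants(total_df, p)
    limits = []
    for n in sizes:
        b1_i, b2_i = gv_constants(n - 1, p)   # per-sub-group constants
        center = det_sp * b1_i / b7
        half_width = 3 * det_sp * sqrt(b2_i / (b8 + b7**2))
        limits.append((center - half_width, center, center + half_width))
    return b7, b8, limits


# Sub-group sizes n(i) as listed in Table I (m = 13, p = 3).
SIZES = [200, 205, 200, 203, 203, 200, 218, 204, 210, 215, 207, 210, 203]

if __name__ == "__main__":
    DET_SP = 0.0911  # assumption: back-calculated from CL in Table I
    b7, b8, limits = pooled_igv_limits(DET_SP, SIZES, p=3)
    print(f"b7 = {b7:.4f}, b8 = {b8:.4f}")
    for i, (lcl, cl, ucl) in enumerate(limits, start=1):
        print(f"{i:2d}  LCL = {lcl:.4f}  CL = {cl:.4f}  UCL = {ucl:.4f}")
```

With these sizes the sketch gives $b_7 \approx 0.9989$ and $b_8 \approx 0.0022$, in agreement with the values reported in Section IV. Because $b_{1,i}$ and $b_{2,i}$ change with $n_i$, the limits change from one sub-group to the next.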
IV. EXAMPLE OF APPLICATION
This example is based on university students' grades, in percentage, where the university is categorized as a service industry. Our case study uses first-year university students' grades in three principal subjects: native language (X1), English (X2), and mathematics (X3). For confidentiality, the name of the university is not disclosed. Over the 13 years during which the learning process was monitored, the number of students was not constant from year to year. The number of sub-groups is m = 13, one for each year of monitoring. From the data, we obtain $b_7 = 0.9989$ and $b_8 = 0.0022$, and the resulting control limits are summarized in Table I.

TABLE I
THE CONTROL LIMIT VALUES
 i   n(i)  GV      b1(i)   b2(i)   LCL     CL      UCL
 1   200   0.0508  0.9850  0.0297  0.0427  0.0898  0.1369
 2   205   0.1033  0.9853  0.0290  0.0433  0.0898  0.1363
 3   200   0.1183  0.9850  0.0297  0.0427  0.0898  0.1369
 4   203   0.0917  0.9852  0.0293  0.0431  0.0898  0.1365
 5   203   0.0734  0.9852  0.0293  0.0431  0.0898  0.1365
 6   200   0.0590  0.9850  0.0297  0.0427  0.0898  0.1369
 7   218   0.0640  0.9862  0.0273  0.0448  0.0899  0.1350
 8   204   0.0792  0.9853  0.0291  0.0432  0.0898  0.1364
 9   210   0.0851  0.9857  0.0283  0.0439  0.0899  0.1358
10   215   0.0789  0.9860  0.0276  0.0445  0.0899  0.1353
11   207   0.0946  0.9855  0.0287  0.0436  0.0898  0.1361
12   210   0.0990  0.9857  0.0283  0.0439  0.0899  0.1358
13   203   0.0965  0.9852  0.0293  0.0431  0.0898  0.1365

From Table I, the control chart is presented in Fig. 1.
Fig. 1: IGV-chart of learning process variability
This chart shows no out-of-control signal. In other words, in this case study, the learning process variability based on X1, X2, and X3 is stable.
V. CONCLUDING REMARKS
To the knowledge of the authors, statistical process control in the multivariate setting based on sub-group observations where the sample size is not constant across sub-groups is not well explored in the literature. However, this sampling design is customarily encountered in service industry. Therefore, a new control charting procedure is needed, and this paper proposes one. The main problem is to identify the distribution of the pooled covariance matrix. This is the necessary condition for estimating the population GV and its square, which leads to the construction of the proposed IGV-chart. The control limits for the sampling designs (i) individual observations and (ii) sub-group observations with equal sample size are straight lines. For the proposed design, however, they are not straight lines; they take the form of step functions. As a result, computing the proposed control limits is not as simple as in the case of sub-group observations with equal sample size.
ACKNOWLEDGMENT
This research is sponsored by the Ministry of Higher Education of Malaysia through Fundamental Research Grant Schemes vote number 4F260. The authors gratefully acknowledge the Government of Malaysia for the sponsorship and Universiti Teknologi Malaysia for the opportunity to do this research. The authors are also grateful to the Editor and anonymous referees for their comments and suggestions that led to the final presentation of this paper. Special thanks go to Universitas Pasundan for providing the research facilities and to Universiti Teknologi Malaysia, where the second author initiated this research.
REFERENCES
[1] B. Alt, N. D. Smith, "Multivariate Process Control," in Handbook of Statistics, P. R. Krishnaiah and C. R. Rao, Ed. Elsevier, 1988, vol. 7, pp. 333-351.
[2] T. W. Anderson, An Introduction to Multivariate Analysis, 2nd Edition. New York: John Wiley and Sons Inc, 1984.
[3] F. Aparisi, J. Jabaloyes, A. Carrion, "Statistical Properties of the |S| Multivariate Control Chart," Communication in Statistics, vol. 28, no. 11, pp. 2671-2686, 1999.
[4] M. A. Djauhari, "Improved Monitoring of Multivariate Process Variability," Journal of Quality Technology, vol. 37, no. 1, pp. 32-39, 2005.
[5] S. Bersimis, S. Psarakis, J. Panaretos, "Multivariate Statistical Process Control Charts: An Overview," Quality and Reliability Engineering International, vol. 23, no. 5, pp. 517-543, Aug. 2007.
[6] M. A. Djauhari, "A Measure of Multivariate Data Concentration," Journal of Applied Probability & Statistics, vol. 2, no. 2, pp. 139-155, 2007.
[7] M. A. Djauhari, M. Mahsuri, D. E. Herwindiati, "Multivariate Process Variability Monitoring," Communication in Statistics - Theory and Methods, vol. 37, pp. 1742-1754, 2008.
[8] M. A. Djauhari, "Asymptotic Distribution of Sample Covariance Determinant," MATEMATIKA, vol. 25, no. 1, pp. 79-85, 2009.
[9] M. A. Djauhari, "A Multivariate Process Variability Monitoring Based on Individual Observations," Modern Applied Science, vol. 4, no. 10, pp. 91-96, 2010.
[10] R. L. Mason, Y. M. Chou, J. C. Young, "Monitoring variation in a multivariate process when the dimension is large relative to the sample size," Communications in Statistics - Theory and Methods, vol. 38, no. 6, pp. 939-951, 2009.
[11] R. L. Mason, J. C. Young, Multivariate Statistical Process Control with Industrial Applications. ASA-SIAM Series on Statistics and Applied Probability, 2002.
[12] R. L. Mason, Y. M. Chou, J. C. Young, "Decomposition of scatter ratio used in monitoring multivariate process variability," Communications in Statistics - Theory and Methods, vol. 39, no. 12, pp. 2128-2145, 2010.
[13] R. L. Mason, Y. M. Chou, J. C. Young, "Detecting and interpretation of a multivariate signal using combined charts," Communications in Statistics - Theory and Methods, vol. 40, no. 5, pp. 942-957, 2011.
[14] D. C. Montgomery, Introduction to Statistical Quality Control. New York: John Wiley and Sons, 2005.
[15] A. M. Noor, M. A. Djauhari, "Monitoring the Variability of beltline Moulding Process Using Wilk's Statistics," Journal of Fundamental Sciences, vol. 6, no. 2, pp. 116-120, 2010.
[16] A. M. Noor, M. A. Djauhari, "Measuring the Performance of an Eigenvalue Control Chart for Monitoring Multivariate Process Variability," Journal of Fundamental Sciences, vol. 7, no. 2, pp. 120-125, 2011.
[17] A. B. Yeh, D. K. J. Lin, R. N. McGrath, "Multivariate Control Charts for Monitoring Covariance Matrix: Review," Quality Technology & Quantitative Management, vol. 3, no. 4, pp. 415-436, 2006.
[18] J. Zhang, Z. Li, Z. Wang, "A Multivariate Control Chart for Simultaneously Monitoring Process Mean and Variability," Computational Statistics and Data Analysis, vol. 54, no. 10, pp. 2244-2252, 2010.