This article was downloaded by: [Chongqing University] On: 24 March 2014, At: 16:16 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Applied Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20
Criteria for estimating the variance function used in the asymptotic quasi-likelihood approach Sifa Mvoi & Yan-Xia Lin Published online: 02 Aug 2010.
To cite this article: Sifa Mvoi & Yan-Xia Lin (2000) Criteria for estimating the variance function used in the asymptotic quasi-likelihood approach, Journal of Applied Statistics, 27:3, 347-362, DOI: 10.1080/02664760021655 To link to this article: http://dx.doi.org/10.1080/02664760021655
Journal of Applied Statistics, Vol. 27, No. 3, 2000, 347-362

Criteria for estimating the variance function used in the asymptotic quasi-likelihood approach

SIFA MVOI & YAN-XIA LIN, School of Mathematics and Applied Statistics, University of Wollongong, Australia

Correspondence: S. Mvoi, School of Mathematics and Applied Statistics, University of Wollongong, Northfields Avenue, Wollongong, NSW 2522, Australia. ISSN 0266-4763 print; ISSN 1360-0532 online/00/030347-16 © 2000 Taylor & Francis Ltd

ABSTRACT The estimation of the variance function of a linear regression model used in the asymptotic quasi-likelihood approach is considered. It is shown that the variance function used in the determination of the asymptotic quasi-likelihood estimates encompasses the variance functions commonly found in the literature. Criteria for selecting the most appropriate estimate of the variance function for given data are established. These criteria are based on a graphical technique and a chi-squared test.

1 Introduction

One of the most useful models in statistical applications is the linear regression model. If the random errors are independently distributed and have the same variance, then classical theory shows that the least-squares (LS) estimator of the unknown regression parameters is the best linear unbiased estimator. In many practical problems, however, the error variances are heteroscedastic, so the optimal properties of the LS method are lost and an alternative estimation method is required. If the variances are known, then the best linear unbiased estimator is obtained through the weighted LS (WLS) method, in which the reciprocals of the variances are used as weights. However, the variances are usually unknown. A natural and frequently used approach is to estimate the error variances from replicated responses at each design point. When the number of replicates is small, this approach is very inefficient in practice, since sample variances based on few degrees of freedom are wildly unstable and so introduce unnecessary variability into the problem. However, taking a large number of replicates may restrict the number of design points. It is not necessary to take many replicates if it can be assumed that the variation of the error variances has some underlying smoothness, as shown by Carroll (1982) and Carroll and Ruppert (1988). In the remainder of the paper, the function that describes how the error variance changes across observations (for example, with the predictors or with the mean) is referred to as the variance function. Many models have been proposed for the variance function; the objective is to have simple models relating predictors to the variances. Examples of variance functions can be found in Box and Hill (1974), Harvey (1976), Just and Pope (1978) and Carroll and Ruppert (1982). Consider the model

$$y_i = f(x_i, \theta) + e_i, \qquad i = 1, \ldots, n$$
where $y_i$ is a response variable, $x_i$ is a predictor variable, $\theta$ is the parameter of interest, $f(x_i, \theta)$ is the function that gives the deterministic relationship between $y_i$ and $x_i$, and $e_i$ is the random error. Müller and Zhao (1995) give the following variance function models, which generalize other models previously considered in the literature:

(1) the power of the mean model,
$$E(e_i^2) = \sigma_0^2\, [f(x_i, \theta)]^{2\beta}$$
where $\sigma_0 > 0$ and $\beta > 0$, which has been referred to by Carroll and Ruppert (1988) and McCullagh and Nelder (1989);

(2) the exponential variance model,
$$E(e_i^2) = \sigma_0^2 \exp[2\beta f(x_i, \theta)]$$
where $\sigma_0 > 0$ and $\beta > 0$;

(3) the polynomial variance model,
$$E(e_i^2) = \beta_0 + \beta_1 [f(x_i, \theta)]^{\alpha_1} + \cdots + \beta_{p-1} [f(x_i, \theta)]^{\alpha_{p-1}}$$
with known powers $\alpha_j > 0$ ($j = 1, \ldots, p-1$), where $\beta_0, \ldots, \beta_{p-1}$ are some real numbers.
The estimation of the variance function naturally relies on the specification of the variance-mean (i.e. $f(x_i, \theta)$) relationship, and any misspecification creates problems for most variance function estimation techniques. The variance function model adopted in the asymptotic quasi-likelihood (AQL) procedure presented below generalizes the variance functions listed above, and so avoids the problem of misspecifying one of these forms. In this paper, the estimation of the variance function of a linear regression model used in the AQL approach is introduced, and this approach is shown to generalize the most widely used variance functions. Diagnostics are developed to select the proper estimate of the variance function for the AQL method. Simulated data for various scenarios, as well as real-life data, are analyzed through this method, and its versatility is demonstrated.
2 The AQL approach

The AQL method is very similar to the quasi-likelihood (QL) method, in that both are inference methods that use the estimating functions approach. The optimality of the QL estimate in a general context, as discussed by Godambe and Heyde (1987) and Heyde (1988), requires that a specific criterion be satisfied exactly, whereas, in the AQL method, the criterion is satisfied in a certain asymptotic sense (Heyde & Gay, 1989). Lin (1995, 1999) introduced the concept of a fixed sample space in her definition of the AQL. This concept enables the estimation of the regression parameters of a linear regression model without presupposing the distributional form of the error term. Mvoi et al. (1998) have shown the consistency of the AQL estimate in linear models. Further, Mvoi and Lin (1999) have shown that the asymptotic quasi-score function is asymptotically normal. Therefore, asymptotic hypothesis tests and confidence intervals based on this estimate can be constructed. The QL and AQL methods are usually defined in a stochastic setting, and this convention is followed in this paper. The response variable is expressed as
$$y_t = f_t(\theta) + M_t, \qquad t = 1, \ldots, T \tag{1}$$
where $M_t$ is a martingale difference with respect to the natural $\sigma$-field $\mathcal{F}_t$ generated by $\{y_s, s \le t\}$, with $y_0 = 0$ and $\mathcal{F}_0$ the trivial $\sigma$-field, and $f_t(\theta) = f_t(x_t, \theta)$ is a predictable process, linear in $\theta$, with $x_t$ the predictor variable. The QL estimate of $\theta$ is the root of the quasi-score function

$$G_T = \sum_{t=1}^{T} \frac{\dot f_t\, M_t}{E(M_t^2 \mid \mathcal{F}_{t-1})}$$

where $\dot f_t$ is the vector of derivatives of $f_t(\theta)$ with respect to the elements of $\theta$. The quasi-score function cannot be determined if $E(M_t^2 \mid \mathcal{F}_{t-1})$ is unknown. In the AQL method introduced by Lin (1995, 1999), the conditional variance $E(M_t^2 \mid \mathcal{F}_{t-1})$ is estimated by the function $g_t - f_t^2(\theta)$, so that the AQL estimate of $\theta$ is the root of the asymptotic quasi-score function

$$G_T^{*} = \sum_{t=1}^{T} \frac{\dot f_t\, M_t}{g_t - f_t^2(\theta)}$$
The function $g_t$ is determined from the squares of the observations $\{y_t\}$, either by an LS fit or by the autoregressive method, as explained in the remainder of this section. Squaring equation (1) gives

$$y_t^2 = f_t^2(\theta) + 2 f_t(\theta) M_t + M_t^2 = g_t + e_t$$

where $e_t$ is some error. This expression relates $y_t^2$ to the function $g_t$. Since $f_t(\theta)$ is predictable and $E(M_t \mid \mathcal{F}_{t-1}) = 0$, the cross term $2 f_t(\theta) M_t$ has zero conditional expectation, so taking the conditional expectation of this equation gives

$$E(y_t^2 \mid \mathcal{F}_{t-1}) = f_t^2(\theta) + E(M_t^2 \mid \mathcal{F}_{t-1}) = g_t + E(e_t \mid \mathcal{F}_{t-1}) \approx g_t$$

Thus, the function $g_t$ estimates the conditional expectation $E(y_t^2 \mid \mathcal{F}_{t-1})$ if $E(e_t \mid \mathcal{F}_{t-1})$ is negligible. In an ideal situation, it is required that

$$g_t = f_t^2(\theta) + E(M_t^2 \mid \mathcal{F}_{t-1}) \tag{2}$$
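As a quick numerical illustration of the identity behind equation (2), the following Python sketch (assuming NumPy, with normal errors chosen purely for convenience) fixes a predictable value $f_t(\theta)$ and a conditional variance $v$, draws many responses, and compares the sample mean of $y_t^2$ with $f_t^2(\theta) + v$. The function name `mean_square_check` is hypothetical.

```python
import numpy as np


def mean_square_check(f, v, n=200_000, seed=0):
    """Compare the sample mean of y**2 with f**2 + v for y = f + M, M ~ N(0, v).

    f plays the role of the predictable value f_t(theta) and v the conditional
    variance E(M_t^2 | F_{t-1}); the normal distribution is only a convenience.
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(0.0, np.sqrt(v), size=n)  # draws of M_t with mean zero
    y = f + m
    empirical = float(np.mean(y ** 2))       # estimates E(y_t^2 | F_{t-1})
    theoretical = f ** 2 + v                 # right-hand side of equation (2)
    return empirical, theoretical


empirical, theoretical = mean_square_check(f=3.0, v=4.0)
print(f"sample mean of y^2: {empirical:.3f}, f^2 + v: {theoretical:.3f}")
```

The cross term $2 f_t(\theta) M_t$ averages out, which is why the two numbers agree up to Monte Carlo error.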
The chances of obtaining an ideal $g_t$ as given in equation (2) are extremely low in practice. Nevertheless, it is readily possible to select a $g_t$ that closely approximates the right-hand side of equation (2) under a given set of circumstances (Lin, 1995, 1999; Mvoi et al., 1998; Mvoi & Lin, 1999). The following discussion provides the groundwork for determining the most appropriate $g_t$, in order to obtain the best inference via the AQL method.
For the multiple regression setting, the function $f_t(\theta)$ is given by

$$f_t(\theta) = \theta_0 + \theta_1 x_{1t} + \cdots + \theta_p x_{pt} \tag{3}$$

where $x_{it}$ and $\theta_i$, $i = 1, \ldots, p$, are, respectively, the predictor variables and the parameters of interest. Given the nature of the variance function models (1), (2) and (3) of Section 1, it is reasonable to assume that $g_t$ is a function of the powers of the predictor variables $x_{it}$, $i = 1, \ldots, p$. The function $g_t$ may also include variables that correspond to the time or spatial ordering of the data. The conditional expectation $E(M_t^2 \mid \mathcal{F}_{t-1})$ is estimated by

$$g_t - f_t^2(\theta)$$

which is ordinarily expected to be positive. However, the determination of $g_t$ may result in negative values of $g_t - f_t^2(\theta)$ for some $t \in \{1, 2, \ldots, T\}$ (see Mvoi et al., 1998). In that event, the absolute value $|g_t - f_t^2(\theta)|$ is used instead. Mvoi et al. (1998) and Mvoi and Lin (1999) have shown that one of the conditions for the AQL estimate to be consistent, and for the asymptotic quasi-score function to be asymptotically normal, is that the ratio

$$\frac{E(M_t^2 \mid \mathcal{F}_{t-1})}{|g_t - f_t^2(\theta)|}, \qquad t = 1, \ldots, T$$

be bounded above and below by two positive numbers. This condition is very weak and is easily enforced. The only situation in which it is not satisfied is when $|g_t - f_t^2(\theta)|$ is zero for some $t \in \{1, \ldots, T\}$. When this occurs, an AQL estimate can still be obtained by adding a small constant $c$ to $|g_t - f_t^2(\theta)|$; i.e. $E(M_t^2 \mid \mathcal{F}_{t-1})$ is estimated by

$$|g_t - f_t^2(\theta)| + c \tag{4}$$
The constant $c$ also eliminates the problem of a lack of convergence when the AQL estimate is obtained iteratively. The initial value of $c$ is one hundredth of the smallest unit used in the scale of the data. If convergence still fails with this value of $c$, then $c$ is doubled; if convergence fails again, $c$ is doubled again, and so on until convergence occurs. When $c$ is so large as to dominate $|g_t - f_t^2(\theta)|$ for $t = 1, \ldots, T$, the weights become nearly equal, so the resulting AQL estimate is very close to the LS estimate.
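The iterative computation and the doubling rule for $c$ can be sketched as follows for a linear $f_t(\theta) = x_t'\theta$, with $X$ containing an intercept column. This is a minimal sketch assuming NumPy; the function names, the convergence tolerance, the cap on the number of doublings, and the choice to try $c = 0$ before starting the doubling rule (the examples below report $c = 0$ for some functions) are all our assumptions. Each iteration solves the AQL estimating equation with the weights $1/(|g_t - f_t^2(\theta)| + c)$ held at their current values, which for linear $f_t$ is a weighted LS step.

```python
import numpy as np


def aql_fit(X, y, g, c=0.0, max_iter=200, tol=1e-8):
    """Solve the AQL estimating equation for linear f_t(theta) = X @ theta by iteration.

    Each pass is a weighted LS step with weights 1 / (|g_t - f_t(theta)^2| + c),
    evaluated at the current theta. Returns theta, or None if the iteration fails.
    """
    theta = np.linalg.lstsq(X, y, rcond=None)[0]        # start from the LS estimate
    for _ in range(max_iter):
        v = np.abs(g - (X @ theta) ** 2) + c             # estimate of E(M_t^2 | F_{t-1}), eq. (4)
        if not np.all(np.isfinite(v)) or np.any(v <= 0):
            return None                                  # a zero variance estimate: c is needed
        w = 1.0 / v
        XtW = X.T * w                                    # X' diag(w) without forming the matrix
        new = np.linalg.solve(XtW @ X, XtW @ y)
        if not np.all(np.isfinite(new)):
            return None
        if np.max(np.abs(new - theta)) < tol * (1.0 + np.max(np.abs(theta))):
            return new
        theta = new
    return None                                          # no convergence within max_iter


def aql_with_c_doubling(X, y, g, unit, max_doublings=60):
    """Try c = 0 first (our assumption), then c = unit / 100, doubling c after each failure."""
    theta = aql_fit(X, y, g, c=0.0)
    if theta is not None:
        return theta, 0.0
    c = unit / 100.0                                     # a hundredth of the smallest unit
    for _ in range(max_doublings):
        theta = aql_fit(X, y, g, c=c)
        if theta is not None:
            return theta, c
        c *= 2.0
    raise RuntimeError("AQL iteration did not converge for any c tried")
```

Because a large $c$ makes the weights nearly equal, each doubling moves the computation towards the LS solution, in line with the remark above that a dominating $c$ gives an estimate close to the LS estimate.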
3 Determination of the function $g_t$

Given the nature of the variance functions proposed in the literature, the function $g_t$ is expressed as a linear function of the powers of the predictor variables $x_{it}$, $i = 1, \ldots, p$, in equation (3). Suppose $p = 1$, so that

$$f_t(\theta) = \theta_0 + \theta_1 x_t, \qquad t = 1, \ldots, T \tag{5}$$

The function $g_t$ would then take the form

$$g_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \cdots + \beta_q x_t^q \tag{6}$$

where $q$ is a positive integer and $\beta_0, \ldots, \beta_q$ are some constants. It is easy to see that this approach incorporates the power of the mean variance function model and the polynomial variance function model. The exponential variance function model is also incorporated in this approach, which is a result of the expansion
$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

For the case where $p = 1$, $g_t$ is determined by the LS fit of the model

$$y_t^2 = \beta_0 + \beta_1 x_t + \cdots + \beta_q x_t^q + e_t$$
where $y_t^2$ is the square of the response variable. If $x_t$ is an autoregressive term of $y_t$, say $y_{t-1}$, then the autoregressive method is used to determine $g_t$. In general, the lowest-degree polynomial that adequately describes the data $\{y_t^2\}$ is fitted. In the case where $p > 1$, all the cross-product terms of the powers of the predictor variables also have to be taken into account. A key step in obtaining good estimates via the AQL method is the determination of an appropriate $g_t$. In this paper, criteria for selecting an appropriate $g_t$, based on a graphical method and a chi-squared statistic, are established. The plot of 'Studentized residuals' against fitted values obtained through the AQL method using the appropriate $g_t$ should exhibit constant variance. The chi-squared statistic is used to detect non-constant variance in these Studentized residuals. Details of these procedures are given in the following subsections.
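A minimal sketch of the LS determination of $g_t$, assuming NumPy: `fit_g_polynomial` handles the case $p = 1$ (equation (6)), and the hypothetical helpers `monomial_design` and `fit_g_multi` cover $p > 1$ by building all powers and cross-products of the predictors up to total degree $q$. The choice of $q$ is left to the analyst; the criteria of the following subsections are meant to guide it.

```python
from itertools import combinations_with_replacement

import numpy as np


def fit_g_polynomial(x, y, q):
    """p = 1: LS fit of y_t^2 on 1, x_t, ..., x_t^q. Returns (beta, fitted g_t)."""
    V = np.vander(np.asarray(x, dtype=float), N=q + 1, increasing=True)  # 1, x, ..., x^q
    beta, *_ = np.linalg.lstsq(V, np.asarray(y, dtype=float) ** 2, rcond=None)
    return beta, V @ beta


def monomial_design(Xp, q):
    """p > 1: intercept plus every product of predictors of total degree 1..q,
    so cross-products such as x_1t * x_2t are included."""
    Xp = np.asarray(Xp, dtype=float)
    T, p = Xp.shape
    cols = [np.ones(T)]
    for degree in range(1, q + 1):
        for idx in combinations_with_replacement(range(p), degree):
            cols.append(np.prod(Xp[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_g_multi(Xp, y, q):
    """LS fit of y_t^2 on the monomial design; returns (coefficients, fitted g_t)."""
    D = monomial_design(Xp, q)
    coef, *_ = np.linalg.lstsq(D, np.asarray(y, dtype=float) ** 2, rcond=None)
    return coef, D @ coef
```

For two predictors and $q = 2$, the design has the six columns $1, x_{1t}, x_{2t}, x_{1t}^2, x_{1t}x_{2t}, x_{2t}^2$.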
3.1 The graphical method

The graphical method of determining the appropriate $g_t$ for a given model is based on a plotting technique derived from the Studentized residuals given in Cook and Weisberg (1982). This involves rewriting the linear model (1) in the vector form

$$Y = X\theta + M \tag{7}$$
where $Y$ is the $T \times 1$ vector of observations, $X$ is the $T \times p$ design matrix and $M$ is the $T \times 1$ vector of martingale differences. Let $X_1$ be the $T \times p$ matrix whose $t$th row is the $t$th row of $X$ divided by $|g_t - f_t^2(\theta)|$, $t = 1, \ldots, T$. Then, the AQL estimate can be represented by

$$\theta_T = (X_1' X)^{-1} X_1' Y$$

The predicted value vector is $\hat Y = X\theta_T = HY$, where

$$H = X (X_1' X)^{-1} X_1'$$

is the $T \times T$ matrix with elements $h_{ij}$, $i, j = 1, 2, \ldots, T$. This matrix plays the role of the hat matrix in the LS estimation of the parameters of linear models. The individual residuals are given by $\hat M_t = y_t - f_t(\theta_T)$, $t = 1, \ldots, T$. Since $\hat M = (I - H) M$ and the martingale differences are uncorrelated, the residuals have mean zero and variances estimated by

$$\widehat{\mathrm{Var}}(\hat M_t) = (1 - h_{tt})^2\, |g_t - f_t^2(\theta_T)| + \sum_{k \ne t} h_{tk}^2\, |g_k - f_k^2(\theta_T)| \tag{8}$$
Criterion 1. The appropriate $g_t$ is the one for which the plot of the fitted values $f_t(\theta_T)$ against the residuals standardized by their estimated standard deviations,

$$\tilde M_t = \frac{\hat M_t}{[\widehat{\mathrm{Var}}(\hat M_t)]^{1/2}} \tag{9}$$

has an approximately rectangular shape.
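The quantities in equations (8) and (9) can be computed directly. The sketch below assumes NumPy and that the variance estimates $v_t = |g_t - f_t^2(\theta_T)| + c$ have already been evaluated at the converged estimate and are held fixed; `aql_studentized` is a hypothetical name. The diagonal of $(I - H)\,\mathrm{diag}(v)\,(I - H)'$ is exactly the right-hand side of equation (8).

```python
import numpy as np


def aql_studentized(X, y, v):
    """Return theta_T, fitted values and Studentized residuals (equation (9)).

    X : T x p design matrix, y : responses,
    v : positive variance estimates |g_t - f_t^2(theta_T)| + c, held fixed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    X1 = X / v[:, None]                    # t-th row of X divided by v_t
    A = np.linalg.inv(X1.T @ X)            # (X_1' X)^{-1}
    theta = A @ (X1.T @ y)                 # AQL estimate theta_T
    H = X @ A @ X1.T                       # T x T matrix with H y = fitted values
    fitted = X @ theta
    resid = y - fitted                     # hat M_t
    I_minus_H = np.eye(len(y)) - H
    # (1 - h_tt)^2 v_t + sum_{k != t} h_tk^2 v_k, i.e. equation (8)
    var_resid = (I_minus_H ** 2) @ v
    return theta, fitted, resid / np.sqrt(var_resid)
```

Plotting `fitted` against the returned residuals gives the diagnostic plot of Criterion 1. With a constant `v`, the estimate reduces to LS and the residuals to the usual internally Studentized LS residuals.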
TABLE 1. The hormone assay data

Reference  Test     Reference  Test     Reference  Test
method     method   method     method   method     method
 1.0        1.8      1.4        1.0      1.6        1.5
 1.8        1.3      1.9        1.7      1.9        1.6
 1.9        1.1      1.9        1.6      2.0        1.6
 2.0        3.4      2.0        1.4      2.1        1.6
 2.1        1.6      2.2        1.7      2.4        2.1
 2.4        1.6      2.4        2.2      2.5        1.1
 2.6        3.0      2.8        3.1      2.9        3.4
 3.0        1.8      3.2        3.2      3.3        3.3
 3.4        1.8      3.4        3.4      3.6        3.5
 3.6        2.2      3.6        2.8      3.9        2.2
 4.4        4.4      4.6        4.8      4.6        3.3
 5.4        4.6      5.6        4.3      5.7        9.0
 5.8        3.7      5.9        2.9      6.0        9.0
 6.1        5.9      6.1        6.1      6.2        5.2
 6.2       10.6      6.4        6.0      6.4        4.1
 6.7        7.8      6.7        6.9      6.8        4.6
 7.0        7.5      7.9        8.0      8.0        7.8
 8.1        5.4      8.2        6.4      8.6        6.8
 8.6        9.2      8.8        8.4      9.9        9.3
10.5       11.7     10.6        8.2     10.8        9.4
12.4       12.2     13.0       12.2     13.0        8.9
13.6       11.1     13.8       15.2     15.4       18.8
16.8       15.8     19.3       17.6     20.0       15.8
20.5       28.8     21.8       18.1     22.0       25.5
23.5       16.4     24.9       16.4     24.9       20.4
26.0       32.4     30.0       33.4     31.0       41.9
33.5       32.9     34.5       27.0     36.5       37.4
37.0       32.9     38.0       48.6     38.0       40.7
67.0       50.9
Although equation (9) differs from the Studentized residuals given in Cook and Weisberg (1982), it follows the same concept. The residuals in equation (9) will hereafter be referred to as the Studentized residuals.

Example 1. The data in Table 1 are the results of two assay methods given in Carroll and Ruppert (1988). This example has also been used by Mvoi and Lin (1999). The scale of the data as presented is not particularly meaningful; the original source of the data refused permission to divulge further information. The overall goal was to see whether the test method measurements can be reproduced by the reference method measurements. The expected value of the test method is given by the model

$$E(y_i) = \beta_0 + \beta_1 x_i, \qquad i = 1, \ldots, n$$
where $x_i$ represents the reference method. The hormone assay data are heteroscedastic. According to Carroll and Ruppert (1988), the constant coefficient of variation model, in which

$$\text{standard deviation of } y_i = \sigma(\beta_0 + \beta_1 x_i), \qquad \sigma > 0,$$

is a reasonable model for the variability. WLS results for this data set obtained by using this variance function serve as a benchmark for the results obtained through the AQL method and the LS method. The last data point is an outlier, since its reference method value is almost twice as large as any other in the data (see Fig. 1). For this reason, this point is not included in the analysis.

[Fig. 1. Hormone assay: scatter plot of test method against reference method.]

Three competing $g$ functions, with the appropriate constant $c$ (as given in equation (4)), for the estimation of the regression parameters using the AQL method are

$$g_{1i} = 21.59 - 1.10 x_i^2, \qquad c = 10.24$$
$$g_{2i} = -7.18 x_i + 1.31 x_i^2, \qquad c = 2.56$$
$$g_{3i} = 12.70 + 1.116 x_i^2, \qquad c = 0$$
Table 2 gives the AQL estimates obtained by using these functions. There is substantial variation among these estimates, emphasizing the need to select the appropriate function $g$. The plots of the Studentized residuals against the fitted values for these functions are given in Figs 2-4. Figure 2 indicates that the variance of $\tilde M_i$ increases sharply from the fitted value 0 to the fitted value 7, and then remains roughly constant. Figure 3 also indicates that the variance of $\tilde M_i$ increases sharply from the fitted value 0 to the fitted value 6, and then remains roughly constant. Figure 4 indicates that the variance of $\tilde M_i$ is roughly constant. Based on these plots, it is evident that $g_{3i}$ is the most appropriate for these data. A further confirmation of this is that the AQL estimates obtained by using $g_{3i}$ are closer to the WLS estimates of the constant coefficient of variation variance function model than are the AQL estimates obtained by using $g_{1i}$ and $g_{2i}$ (see Table 2). It is instructive to note that the LS method is not appropriate for these data, as reflected by the fact that, of all the results in Table 2, the LS estimates are the furthest from the WLS estimates.

TABLE 2. Parameter estimates for the different functions g

Method      β_0      β_1
AQL g_1i   -0.41     1.00
AQL g_2i   -0.33     0.99
AQL g_3i   -0.18     0.98
LS         -0.64     1.04
WLS        -0.08     0.96

[Fig. 2. Studentized residuals against fitted values for g_1i.]

[Fig. 3. Studentized residuals against fitted values for g_2i.]

3.2 A diagnostic for non-constant variance

In this section, the diagnostic for non-constant variance is introduced. This diagnostic is based on a chi-squared statistic. The test is carried out under the assumption that the Studentized residuals in equation (9) are normally and independently distributed.
[Fig. 4. Studentized residuals against fitted values for g_3i.]
When $E(M_t^2 \mid \mathcal{F}_{t-1})$ is not constant in $t$, it is likely to depend on the values of one or more predictor variables in the function $f_t(\theta)$, or on additional relevant quantities such as time or spatial ordering. The determination of a proper $g_t$ for a given data set therefore relies on the identification of these quantities. The test of the appropriateness of a given $g_t$ for a particular data set is based on the diagnostic for non-constant variance given by Cook and Weisberg (1983). The basic idea is to convert the constant variance assumption into a testable parametric hypothesis. This requires the specification of the form that the variance will take when it is not constant. Suppose that $\mathrm{Var}(\tilde M_t)$ depends on an unknown vector parameter $\lambda$ and a known vector $z_t$ that may be different for each $t$. The quantity $z_t$ may be a vector of higher powers of a subset of the predictor variables in the model that have not been used in the current function $g_t$. It may also include variables carrying additional relevant quantities, such as time or spatial ordering, that have not been included in the current $g_t$. Given $z_t$, it is assumed that

$$\mathrm{Var}(\tilde M_t) = \sigma^2 \exp(\lambda' z_t), \qquad \sigma^2 > 0 \tag{10}$$
This expression indicates that $\tilde M_t$ satisfies the following conditions:

· $\mathrm{Var}(\tilde M_t) > 0$;
· the variance of $\tilde M_t$ depends on $z_t$ and $\lambda$, but only through $\lambda' z_t$;
· $\mathrm{Var}(\tilde M_t)$ is monotonic, either increasing or decreasing, in each component of $z_t$;
· if $\lambda = 0$, then $\mathrm{Var}(\tilde M_t) = \sigma^2$ for all $t$.
The results of Chen (1983) suggest that the tests described here are not very sensitive to the exact functional form used in equation (10), as long as $\tilde M_t$ satisfies the conditions just listed. Let $W$ be a $T \times T$ diagonal matrix with elements $w_t = \exp(\lambda' z_t)$, $t = 1, \ldots, T$. Assuming that the $\tilde M_t$ terms are normally distributed with zero mean and variance $\sigma^2 w_t$, the test for non-constant variance is equivalent to testing the hypothesis $\lambda = 0$. The following steps for such a test are derived from those of a similar test for detecting heteroscedasticity in regression models given by Cook and Weisberg (1983).

(1) Compute the regression of $Y$ on all the $X$ terms in the model by the AQL method using the current $g_t$, and save the Studentized residuals $\tilde M_t$.
(2) Let $U$ be a vector of dimension $T$ with elements $u_t = \tilde M_t^2 / \tilde\sigma^2$, where $\tilde\sigma^2 = \sum_t \tilde M_t^2 / T$ is the maximum-likelihood estimate of $\sigma^2$.
(3) Suppose $z_t$ has $m$ components. Let $D$ be the $T \times m$ matrix with $t$th row $d_t$, where the $j$th element of $d_t$ is $\partial w(z_t, \lambda)/\partial \lambda_j$ ($j = 1, \ldots, m$) evaluated at $\lambda = 0$.
(4) Compute the $T \times m$ matrix
$$\bar D = D - \mathbf{1}\mathbf{1}' D / T,$$
which is obtained from $D$ by subtracting the column averages; here $\mathbf{1}\mathbf{1}'$ is the $T \times T$ matrix of ones.
(5) Compute the statistic
$$S = \tfrac{1}{2}\, U' \bar D (\bar D' \bar D)^{-1} \bar D' U,$$
assuming that $\bar D$ is of full rank. This test cannot be used if $\bar D$ is not of full rank.

Computationally, $S$ is half of the regression sum of squares from the regression of $u_t$ on $z_t$ with an intercept included. Under the hypothesis $\lambda = 0$, the asymptotic distribution of $S$ is central chi-squared with $m$ degrees of freedom.

Criterion 2. Let $z_t$ be a vector whose elements are drawn from the set of all variables that may be responsible for the heteroscedasticity of the data under consideration. This set includes powers of the predictor variables as well as variables carrying relevant quantities, such as time or spatial ordering. The appropriate $g_t$ is the one for which the hypothesis $\lambda = 0$ (from equation (10)) is not rejected at a reasonable level of significance, say 5%, for all possible choices of $z_t$.

For the model

$$y_t = f_t(\theta) + M_t = \theta_0 + \theta_1 x_{1t} + \cdots + \theta_p x_{pt} + M_t$$
where the $x_{it}$ are the predictor variables and the $\theta_i$ are the parameters of interest ($t = 1, \ldots, T$; $i = 1, \ldots, p$), the AQL estimate coincides with the LS estimate (and with the maximum-likelihood estimate under normal errors) if $E(M_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$, a constant, and $g_t = f_t^2(\theta) + \sigma^2$. A starting point in identifying heteroscedasticity is to assume that $E(M_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$. In this case, the AQL estimate of $\theta$ is given by

$$\theta_T = (X'X)^{-1} X' Y$$
from the vector form, equation (7). The Studentized residuals are then given by

$$\tilde M_t = \frac{\hat M_t}{\hat\sigma\, (1 - h_{tt})^{1/2}} \tag{11}$$

where $h_{tt}$ is the $t$th diagonal element of the matrix $H = X(X'X)^{-1}X'$ and

$$\hat\sigma^2 = \sum_{t=1}^{T} [y_t - f_t(\theta_T)]^2 / (T - p).$$
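A minimal sketch of the diagnostic, assuming NumPy and the exponential form (10), for which $\partial w/\partial\lambda$ at $\lambda = 0$ is simply $z_t$, so that $D$ has rows $z_t$. The hypothetical function `ls_studentized` implements equation (11) for the starting LS fit, and `score_statistic` follows steps (2)-(5) on whatever residuals it is given. The p-value is left to the caller, for example via a chi-squared survival function with $m$ degrees of freedom such as `scipy.stats.chi2.sf(S, m)`.

```python
import numpy as np


def ls_studentized(X, y):
    """Studentized LS residuals of equation (11) and the LS fitted values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape                                   # k columns (the text writes T - p)
    H = X @ np.linalg.solve(X.T @ X, X.T)            # H = X (X'X)^{-1} X'
    fitted = H @ y
    resid = y - fitted
    sigma_hat = np.sqrt(resid @ resid / (T - k))
    return resid / (sigma_hat * np.sqrt(1.0 - np.diag(H))), fitted


def score_statistic(resid, Z):
    """Statistic S of steps (2)-(5) for Var = sigma^2 exp(lambda' z_t).

    resid : residuals to be tested (length T); Z : T x m array with rows z_t.
    Returns (S, m); under lambda = 0, S is asymptotically chi-squared with m df.
    """
    r = np.asarray(resid, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(r), -1)
    T, m = Z.shape
    sigma2_ml = np.sum(r ** 2) / T                   # step (2): ML estimate of sigma^2
    U = r ** 2 / sigma2_ml                           # u_t
    D = Z                                            # step (3): d exp(lambda'z)/d lambda at 0 is z_t
    Dbar = D - D.mean(axis=0)                        # step (4): subtract column averages
    DtU = Dbar.T @ U
    # step (5); np.linalg.solve raises LinAlgError if Dbar'Dbar is exactly singular
    S = 0.5 * DtU @ np.linalg.solve(Dbar.T @ Dbar, DtU)
    return float(S), m
```

The explicit formula is used here to mirror step (5); it gives the same value as half the regression sum of squares of $u_t$ on $(1, z_t)$, which is the computational shortcut stated above.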
Based on the preceding discussion, a stepwise method for determining an appropriate $g_t$ can be summarized as follows:
(1) Let the initial $g_t$ be $f_t^2(\theta) + \sigma^2$, by assuming that $E(M_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$, a constant. This implies that the AQL estimate coincides with the LS estimate $\theta_{LS}$. Obtain the LS estimate of $\theta$ and plot the Studentized residuals of equation (11) against the fitted values $f_t(\theta_{LS})$. If this plot is reasonably rectangular in shape, then the variance is homoscedastic and the LS estimate is a good estimator of $\theta$; if not, continue to the next step.
(2) Let $z_t$, $t = 1, \ldots, T$, be a vector of the predictor variables $x_{1t}, \ldots, x_{pt}$; it may also include variables that relate to time and spatial ordering, depending on the nature of the data and the suspected cause of heteroscedasticity. Use the non-constant variance diagnostic described earlier to test whether or not $\hat M_t = y_t - f_t(\theta_{LS})$ has constant variance. This involves carrying out the steps of the non-constant variance diagnostic on $\hat M_t$, rather than on $\tilde M_t$. If $\hat M_t$ is found not to have constant variance, then move to the next step.
(3) Let the new $g_t$ be the most significant LS fit of $y_t^2$ on the predictor variables in $f_t^2(\theta)$, i.e. on the variables
$$x_{1t}^2, x_{2t}^2, \ldots, x_{pt}^2,\; x_{1t}x_{2t}, x_{1t}x_{3t}, \ldots, x_{(p-1)t}\,x_{pt},$$
as well as all the variables in the vector $z_t$ from the previous step. This is a reasonable way of determining $g_t$, since $g_t \approx f_t^2(\theta) + E(M_t^2 \mid \mathcal{F}_{t-1})$, as shown in equation (2).
(4) Obtain the AQL estimate of $\theta$ using this $g_t$ and plot the Studentized residuals of equation (9) against the fitted values $f_t(\theta_T)$. If this plot is reasonably rectangular in shape, then this $g_t$ is appropriate for these data; if not, continue to the next step.
(5) Obtain a vector of variables $z_t$ ($t = 1, \ldots, T$) that are the most likely cause of the heterogeneity and that have not been included in the current $g_t$. The variables in $z_t$ may be the predictor variables raised to higher powers than those in the current $g_t$. Use the non-constant variance diagnostic described earlier to test whether or not $\tilde M_t$ has constant variance. If $\tilde M_t$ is found to have non-constant variance, then move back to step (3).

Example 2. The data in this example are given in Cook and Jacobsen (1978). Prior to the experiment that produced these data, aerial survey methods were used to estimate the number of snow geese in their summer range areas east of Hudson Bay in Canada. To obtain the estimates, a small aircraft would be flown over the range and an experienced person would estimate the number of birds in each flock of geese spotted. The following experiment was carried out to investigate the reliability of this method of counting. An airplane carrying two observers flew over $n = 45$ flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of each flock was taken, so that an exact count of the birds in the flock could be made. For the purposes of this example, only the results of one observer are considered. The data are given in Table 3. This experiment is also given as exercise 4.6 in Weisberg (1985).
The relationship between $y$ = photo count and $x$ = observer count appears to be linear, but suffers from some heteroscedasticity, as seen in the scatter plot in Fig. 5. Thus, it is reasonable to assume that

$$y_i = \theta_0 + \theta_1 x_i + M_i, \qquad i = 1, \ldots, n$$
TABLE 3. Snow geese data

Photo   Observer   Photo   Observer   Photo   Observer
count   count      count   count      count   count
 56      50         38      25         25      30
 48      35         38      25         22      20
 22      12         42      34         34      20
 14      10         30      25          9      10
 18      15         25      20         62      40
 26      30         88      75         56      35
 11       9         66      55         42      30
 30      25         90      40        119      75
165     100        152     150        205     120
409     250        342     500        200     200
 73      50        123      75        150     150
 70      50         90      50        110      75
 95     150         57      40         43      25
 55     100        325     200        114      60
 83      40         91      35         56      20
[Fig. 5. Snow data: scatter plot of photo count against observer count.]
where $M_i$ is the error term. The search for the appropriate function $g$ for this data set involved going through the following functions $g_i$ in the given order:

$$g_{1i} = (\theta_0 + \theta_1 x_i)^2 + \sigma^2, \qquad c = 0$$
$$g_{2i} = 12\,450.61 + 450.00 x_i - 0.32 x_i^2, \qquad c = 5242.88$$
$$g_{3i} = 11\,948.62 - 520.36 x_i + 6.36 x_i^2 - 0.01 x_i^3, \qquad c = 2621.44$$
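The function $g_{1i}$ is based on the assumption that $E(M_i^2 \mid \mathcal{F}_{i-1}) = \sigma^2$, and the resulting estimator is the same as the LS estimate. This can be seen by noting that $E(M_i^2 \mid \mathcal{F}_{i-1})$ is estimated by

$$g_{1i} - f_i^2(\theta) = (\theta_0 + \theta_1 x_i)^2 + \sigma^2 - (\theta_0 + \theta_1 x_i)^2 = \sigma^2$$

so that every observation receives the same weight and the asymptotic quasi-score equation reduces to the LS normal equations.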
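A quick numerical check of this identity, under illustrative assumptions that do not come from the snow geese study (NumPy, simulated data, an arbitrary $\sigma^2 = 3$ and an arbitrary parameter value at which the weights are evaluated): the weights implied by $g_{1i}$ are the same for every observation, so one AQL (weighted LS) step returns the LS estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 50)

sigma2 = 3.0                                        # any positive constant


def g1(theta):
    """g_1i = (theta_0 + theta_1 x_i)^2 + sigma^2, as a function of theta."""
    return (X @ theta) ** 2 + sigma2


theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
theta_any = np.array([-5.0, 4.0])                   # an arbitrary parameter value
v = np.abs(g1(theta_any) - (X @ theta_any) ** 2)    # equals sigma^2 for every i
w = 1.0 / v
theta_aql = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)  # one AQL (weighted LS) step
print(np.allclose(v, sigma2), np.allclose(theta_aql, theta_ls))
```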
TABLE 4. Snow geese data: determining the appropriate g_i

Function g   Variables in z_i   Statistic S   p-value
g_1i         x_i                81.41         0.0000
g_2i         x_i^3               5.29         0.0214
g_3i         x_i^4               0.98         0.3222
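Since each $z_i$ in Table 4 is a single variable ($m = 1$), the p-values can be checked against the chi-squared distribution with one degree of freedom, whose upper tail is $P(\chi_1^2 > s) = \operatorname{erfc}(\sqrt{s/2})$. A small standard-library sketch (the function name is ours):

```python
import math


def chi2_sf_1df(s):
    """Upper-tail probability P(chi^2_1 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(s / 2.0))


for label, s in [("g_1i", 81.41), ("g_2i", 5.29), ("g_3i", 0.98)]:
    print(f"{label}: S = {s:6.2f}, p = {chi2_sf_1df(s):.4f}")
```

Rounded to four decimals, this reproduces the p-values 0.0000, 0.0214 and 0.3222 listed in Table 4.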
[Fig. 6. Snow data: Studentized residuals against fitted values for function g_3i.]
The first step was to establish whether or not $\hat{\varepsilon}_i = y_i - f_i(\hat{\theta}_{LS})$ has constant variance. With $z_i = x_i$, the statistic S was found to be highly significant. The next function, $g_{2i}$, was therefore an LS fit of $y_i^2$ on the variables $x_i$ and $x_i^2$. With $z_i = x_i^3$, the hypothesis of constant variance for the residuals obtained from the model by using the AQL estimates of $g_{2i}$ was found to be untenable at the 5% level of significance. The third function, $g_{3i}$, was therefore an LS fit of $y_i^2$ on $x_i$, $x_i^2$ and $x_i^3$. With $z_i = x_i^4$, the hypothesis of constant variance for the residuals obtained from the model by using the AQL estimates of $g_{3i}$ was found to hold at the 5% level of significance. A summary of these steps is given in Table 4. The graph of the Studentized residuals against fitted values for the function $g_{3i}$ (Fig. 6), although ambiguous, is the best that can be found for this data set among the candidate functions $g_i$.

Weisberg (1985) suggested that one way to stabilize the variance of these data is to assume $\mathrm{Var}(\varepsilon_i) = x_i\sigma^2$. For comparison, the WLS estimates of the regression parameters were computed under this assumption. The parameter estimates for the LS, WLS and AQL methods are given in Table 5.

TABLE 5. Snow geese data: parameter estimates

Method    $\hat{\theta}_0$    $\hat{\theta}_1$
LS        26.65               0.88
WLS       9.22                1.13
AQL       9.58                1.20

The AQL estimates are quite close to the WLS estimates, whereas the LS estimates are quite different. This indicates that the criteria used to select the appropriate function g are effective for this data set.
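Under Weisberg's assumption $\mathrm{Var}(\varepsilon_i) = x_i\sigma^2$, WLS uses weights $1/x_i$; equivalently, it is OLS after dividing the model through by $\sqrt{x_i}$. The sketch below computes the WLS line both ways as a consistency check. It requires $x_i > 0$, the helper names are hypothetical, and the data are made up (the snow geese data are not reproduced here), so the output is unrelated to Table 5.

```python
import math


def wls_line(x, y, w):
    """Weighted least-squares fit of y = theta0 + theta1 * x; returns (theta0, theta1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    theta1 = sxy / sxx
    return ybar - theta1 * xbar, theta1


def transformed_ols(x, y):
    """OLS (no intercept) of y/sqrt(x) on 1/sqrt(x) and sqrt(x); returns (theta0, theta1)."""
    a = [1.0 / math.sqrt(xi) for xi in x]   # regressor carrying theta0
    b = [math.sqrt(xi) for xi in x]         # regressor carrying theta1
    t = [yi / math.sqrt(xi) for xi, yi in zip(x, y)]
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    sat = sum(ai * ti for ai, ti in zip(a, t))
    sbt = sum(bi * ti for bi, ti in zip(b, t))
    det = saa * sbb - sab * sab              # 2x2 normal equations, solved by Cramer's rule
    return (sbb * sat - sab * sbt) / det, (saa * sbt - sab * sat) / det


# Made-up data with positive x; NOT the snow geese data.
x = [10.0, 40.0, 90.0, 160.0, 250.0]
y = [15.0, 45.0, 100.0, 150.0, 260.0]

weights = [1.0 / xi for xi in x]   # Var(eps_i) = x_i * sigma^2  =>  weight 1 / x_i
print("WLS, weights 1/x:", wls_line(x, y, weights))
print("OLS on transformed model:", transformed_ols(x, y))
```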
4 Examples of simulated data

The purpose of the following examples is to illustrate that $|g_t - f_t^2(\theta)|$ adequately estimates the variance functions commonly encountered in the literature, and that the resulting AQL estimates are good estimators of the parameters of linear models. The methods for determining the best $g_t$ described in the previous sections are employed in these examples. Data are simulated from the model

$$y_t = \theta_0 + \theta_1 x_t + \varepsilon_t, \qquad t = 1, \ldots, T,$$

where $E(\varepsilon_t^2 \mid Z_{t-1})$ takes the form of the variance function models listed in Section 1.
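A sketch of the variance estimate $|g_t - f_t^2(\theta)|$ follows. It assumes, as in the snow geese example, that $g_t$ is the fitted value from an LS regression of $y_t^2$ on an intercept and $x_t, \ldots, x_t^k$. The helper names and the Gaussian-elimination solver are illustrative only; a careful implementation would use a numerically stabler least-squares routine (for example a QR decomposition) instead of the normal equations.

```python
def solve(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def fit_g(x, y, degree):
    """Coefficients of the LS regression of y_t^2 on 1, x_t, ..., x_t**degree."""
    p = degree + 1
    rows = [[xi ** j for j in range(p)] for xi in x]
    y2 = [yi * yi for yi in y]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y2)) for i in range(p)]
    return solve(xtx, xty)


def variance_estimate(x, y, theta, degree):
    """|g_t - f_t(theta)^2| for each t, with g_t the fitted value from fit_g."""
    coef = fit_g(x, y, degree)
    estimates = []
    for xi in x:
        g = sum(c * xi ** j for j, c in enumerate(coef))
        f = theta[0] + theta[1] * xi
        estimates.append(abs(g - f * f))
    return estimates


# Constructed check: y_t^2 = f_t(theta)^2 + 0.5 exactly, so a quadratic g_t reproduces
# y_t^2 and every variance estimate should equal 0.5 (the constant-variance case).
theta = (0.2, 0.8)
x = [float(t) for t in range(1, 11)]
y = [((theta[0] + theta[1] * xi) ** 2 + 0.5) ** 0.5 for xi in x]
print(variance_estimate(x, y, theta, degree=2))
```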
Example 3. In this example, the variance of $\varepsilon_t$ is modelled as a power of the mean, i.e. $\varepsilon_t$ is normally distributed with mean 0 and variance $f_t^2(\theta)$, where $f_t(\theta) = \theta_0 + \theta_1 x_t$. The parameter values are $\theta_0 = 0.2$ and $\theta_1 = 0.8$. Thirty samples of size 200 each were simulated for these values. Parameter estimates were obtained from the LS, AQL and WLS methods. The weights used in the WLS method were modelled according to the power-of-the-mean variance function used to generate this data set. The WLS estimates are obtained by iterating from the initial unweighted LS values. The results of this example are given in Table 6.

Example 4. In this example, the variance of $\varepsilon_t$ is modelled as an exponential of the mean, i.e. $\varepsilon_t$ is normally distributed with mean 0 and variance $\exp[0.2 f_t(\theta)]$, where $f_t(\theta) = \theta_0 + \theta_1 x_t$. The parameter values are $\theta_0 = 0.2$ and $\theta_1 = 2$. Thirty samples of size 200 each were simulated for these values. Parameter estimates were obtained from the LS, AQL and WLS methods. The weights used in the WLS method were modelled according to the exponential variance function used to generate this data set. The results of this example are given in Table 7.

TABLE 6. Power of the mean variance function model

Method    $\hat{\theta}_0$    $\hat{\theta}_1$
LS        2.0178 (1.3640)     0.7851 (0.0208)
WLS       0.0432 (0.1448)     0.8117 (0.0104)
AQL       0.1461 (0.1948)     0.8074 (0.0108)

Note: Standard errors in parentheses.

TABLE 7. Exponential variance function model

Method    $\hat{\theta}_0$    $\hat{\theta}_1$
LS        0.7162 (0.3281)     1.9361 (0.0533)
WLS       0.1746 (0.0504)     2.0133 (0.0150)
AQL       0.0267 (0.0747)     2.0259 (0.0212)

Note: Standard errors in parentheses.
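The iterated WLS fit described for Example 3 can be sketched as follows, assuming the weights are the reciprocals of the power-of-the-mean variance $f_t^2(\hat\theta)$ re-evaluated at the current estimate, starting from the unweighted LS fit. The stopping rule, the tolerance, the helper names and the data are assumptions, not details given in the paper. Passing a different `variance_fn` gives the analogous fits for the variance functions of Examples 4 and 5.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = theta0 + theta1 * x; returns (theta0, theta1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    theta1 = sxy / sxx
    return ybar - theta1 * xbar, theta1


def power_of_mean(theta, xi):
    """Example 3 variance function f_t(theta)^2, with f_t(theta) = theta0 + theta1 * x_t."""
    f = theta[0] + theta[1] * xi
    return f * f


def iterated_wls(x, y, variance_fn, max_iter=100, tol=1e-10):
    """WLS re-weighted by 1 / variance_fn(theta, x_t) at the current estimate, started from LS."""
    theta = wls_line(x, y, [1.0] * len(x))      # initial unweighted LS values
    for _ in range(max_iter):
        weights = [1.0 / variance_fn(theta, xi) for xi in x]
        new_theta = wls_line(x, y, weights)
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta   # not converged within max_iter; returns the last iterate


# Hypothetical data: the line 0.2 + 0.8 x with a small alternating perturbation.
x = [float(t) for t in range(1, 21)]
y = [0.2 + 0.8 * xi + (0.01 if t % 2 == 0 else -0.01) * (0.2 + 0.8 * xi)
     for t, xi in enumerate(x)]
print(iterated_wls(x, y, power_of_mean))
```

The sketch assumes the fitted mean stays away from zero; a power-of-the-mean weight blows up if $f_t(\hat\theta)$ approaches 0.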
TABLE 8. Polynomial variance function model

Method    $\hat{\theta}_0$    $\hat{\theta}_1$
LS        1.4909 (0.9691)     0.7894 (0.0148)
WLS       0.0232 (0.1398)     0.8100 (0.0080)
AQL       0.1001 (0.1597)     0.8064 (0.0079)

Note: Standard errors in parentheses.
Example 5. In this example, the variance of $\varepsilon_t$ is modelled as a polynomial of the mean, i.e. $\varepsilon_t$ is normally distributed with mean 0 and variance $0.01 + 0.5 f_t(\theta) + 0.5 f_t^2(\theta)$, where $f_t(\theta) = \theta_0 + \theta_1 x_t$. The parameter values are $\theta_0 = 0.2$ and $\theta_1 = 0.8$. Thirty samples of size 200 each were simulated for these values. Parameter estimates were obtained from the LS, AQL and WLS methods. The weights used in the WLS method were modelled according to the polynomial variance function used to generate this data set. The results of this example are given in Table 8.
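The simulation design of Examples 3-5 can be sketched as below. The paper does not say how $x_t$ was generated, so the Uniform(1, 10) design is a hypothetical choice, as are the seed and the helper names. For simplicity the WLS weights use the true variance function evaluated at the true $\theta$ rather than the iterated fit, the AQL fit is not reproduced, and the reported spreads are standard deviations across replications, so the printed numbers are not expected to match Tables 6-8.

```python
import math
import random
import statistics

# (theta0, theta1) for Examples 3, 4 and 5.
THETA = {"power": (0.2, 0.8), "exponential": (0.2, 2.0), "polynomial": (0.2, 0.8)}

# Variance of eps_t as a function of the mean f = theta0 + theta1 * x_t.
VARIANCE = {
    "power": lambda f: f * f,                               # Example 3: f_t(theta)^2
    "exponential": lambda f: math.exp(0.2 * f),             # Example 4: exp[0.2 f_t(theta)]
    "polynomial": lambda f: 0.01 + 0.5 * f + 0.5 * f * f,   # Example 5
}


def wls_line(x, y, w):
    """Weighted least-squares fit of y = theta0 + theta1 * x; returns (theta0, theta1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    theta1 = sxy / sxx
    return ybar - theta1 * xbar, theta1


def simulate(model, n, rng):
    """One sample of size n; x_t ~ Uniform(1, 10) is a hypothetical design choice."""
    theta0, theta1 = THETA[model]
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    y = []
    for xi in x:
        f = theta0 + theta1 * xi
        y.append(f + rng.gauss(0.0, math.sqrt(VARIANCE[model](f))))
    return x, y


def monte_carlo(model, n_samples=30, n=200, seed=2000):
    """Mean and SD of LS and true-weight WLS estimates over n_samples simulated samples."""
    rng = random.Random(seed)
    theta0, theta1 = THETA[model]
    fits = {"LS": [], "WLS": []}
    for _ in range(n_samples):
        x, y = simulate(model, n, rng)
        fits["LS"].append(wls_line(x, y, [1.0] * n))
        weights = [1.0 / VARIANCE[model](theta0 + theta1 * xi) for xi in x]
        fits["WLS"].append(wls_line(x, y, weights))
    summary = {}
    for method, estimates in fits.items():
        intercepts = [e[0] for e in estimates]
        slopes = [e[1] for e in estimates]
        summary[method] = (statistics.mean(intercepts), statistics.stdev(intercepts),
                           statistics.mean(slopes), statistics.stdev(slopes))
    return summary


for model in THETA:
    for method, (m0, s0, m1, s1) in monte_carlo(model).items():
        print(f"{model:12s} {method:4s} theta0 {m0:7.4f} ({s0:.4f})  theta1 {m1:7.4f} ({s1:.4f})")
```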
5 Discussion

In Examples 3-5 of the simulated data, the LS method is not appropriate, because the errors of the models are heteroscedastic. This is reflected in the fact that the LS estimates, particularly those of $\theta_0$, are far from the true parameter values. The LS estimates also have the largest standard errors. The weights used in the WLS method were modelled according to the true variance functions used to simulate the data; under these circumstances, the WLS estimates are the best that can be obtained for these data. The AQL method gives reasonable estimates in all three scenarios, and their standard errors are only marginally higher than those of the WLS estimates. The examples involving real-life data and those involving simulated data indicate that, when the variance function is estimated by the method introduced in this paper, the AQL method provides quite accurate results in every case considered. The WLS results for the simulated data were generally slightly superior to the AQL results, with lower standard errors in most cases and, in Example 4, values closer to the true parameters. This is because the WLS results were computed from the exact variance function used to simulate the data. The marginal difference between the WLS and AQL results indicates that the approach introduced in this paper estimates the variance function of a given data set quite accurately. Thus, the AQL method is a legitimate alternative method of estimating the parameters of heteroscedastic data. In practical situations, the exact form of the variance function may not be discernible, and it is quite possible to misspecify it completely. Misspecifying the variance function would undoubtedly result in poor WLS estimates.
The formulation of the variance function presented in this paper generalizes the commonly used variance functions, and so removes the danger of misspecification. In this respect, the AQL method is superior to the WLS method. The application of the AQL method is not restricted to linear regression models. The application of this procedure to analysis of variance models will be investigated in a subsequent paper.
REFERENCES

BOX, G. E. P. & HILL, W. J. (1974) Correcting inhomogeneity of variance with power transformation weighting, Technometrics, 16, pp. 385-389.
CARROLL, R. J. (1982) Adapting to heteroscedasticity in linear models, The Annals of Statistics, 10, pp. 1224-1233.
CARROLL, R. J. & RUPPERT, D. (1982) Robust estimation in heteroscedastic linear models, The Annals of Statistics, 10, pp. 429-441.
CARROLL, R. J. & RUPPERT, D. (1988) Transformation and Weighting in Regression (New York, Chapman and Hall).
CHEN, C. F. (1983) Score tests for regression models, Journal of the American Statistical Association, 78, pp. 158-161.
COOK, R. D. & JACOBSEN, J. O. (1978) Analysis of 1977 West Hudson Bay snow goose surveys, Unpublished report, Canadian Wildlife.
COOK, R. D. & WEISBERG, S. (1982) Residuals and Influence in Regression (New York, Chapman and Hall).
COOK, R. D. & WEISBERG, S. (1983) Diagnostics for heteroscedasticity in regression, Biometrika, 70, pp. 1-10.
GODAMBE, V. P. & HEYDE, C. C. (1987) Quasi-likelihood and optimal estimation, International Statistical Review, 55, pp. 231-244.
HARVEY, A. C. (1976) Estimating regression models with multiplicative heteroscedasticity, Econometrica, 44, pp. 461-465.
HEYDE, C. C. (1988) Fixed sample and asymptotic optimality for classes of estimating functions, Contemporary Mathematics, 80, pp. 241-247.
HEYDE, C. C. & GAY, R. (1989) On asymptotic quasi-likelihood estimation, Stochastic Processes and Their Applications, 31, pp. 223-236.
JUST, R. E. & POPE, R. D. (1978) Stochastic specification of production functions and economic implications, Journal of Econometrics, 7, pp. 67-86.
LIN, Y.-X. (1995) On asymptotic quasi-score estimating functions and its applications, Scientific Research Report SRR 024-95, Australian National University.
LIN, Y.-X. (2000) A new approach of asymptotic quasi-score estimating functions, Scandinavian Journal of Statistics, 27, in press.
McCULLAGH, P. & NELDER, J. A. (1989) Generalized Linear Models (New York, Chapman and Hall).
MÜLLER, H.-G. & ZHAO, P.-L. (1995) On a semiparametric variance function model and a test for heteroscedasticity, The Annals of Statistics, 23, pp. 946-967.
MVOI, S. & LIN, Y.-X. (2000) Convergence to normality of the asymptotic quasi-score function on linear models, Biometrical Journal, in press.
MVOI, S., LIN, Y.-X. & BIONDINI, R. (1998) Consistency of the asymptotic quasi-likelihood estimate on linear models, Biometrical Journal, 40, pp. 57-78.
WEISBERG, S. (1985) Applied Linear Regression (New York, Wiley).