A Short Course on Likelihood Asymptotics
Part II: S-PLUS Tutorials

R. Bellio, A.R. Brazzale, L. Pace, A. Salvan

April 2000

Contents

1 The S-PLUS library HOA

2 Logistic and loglinear models
  2.1 Background results
  2.2 Example 1
  2.3 Example 2

3 Regression-scale models
  3.1 Background results
  3.2 Example 3

4 Nonlinear heteroscedastic models
  4.1 Background results
  4.2 Example 4
  4.3 Example 5

5 References

A Main S-PLUS functions

B Copyright

C Contact addresses


1 The S-PLUS library HOA

HOA (short for Higher-Order Asymptotics) is an S-PLUS library that implements approximate conditional inference for logistic and loglinear models, linear nonnormal regression models, and nonlinear heteroscedastic regression models. The library can be obtained from the web page

http://statwww.epfl.ch/people/brazzale/lib.html

The software is free and distributed under the GNU General Public License. See Appendix B and the enclosed READ.ME files for the license conditions. The library HOA is written in S-PLUS 3.4 Release 1 for Unix, and has been developed on a Silicon Graphics Iris, IRIX Release 6.5. As it is entirely written in S-PLUS, it can be used on other Unix platforms, as well as under the Windows versions of S-PLUS. The code makes use of the following S-PLUS libraries, which must be loaded:

1. Matrix (Venables and Ripley, 1997, pp. 58-60);
2. MASS (Venables and Ripley, 1997, Appendix A);
3. boot (Davison and Hinkley, 1997, Chapter 11).

See the documentation files enclosed with the library for further details on how to install it on your platform. The library is split into four sections:

1. cond: approximate conditional inference for logistic and loglinear models;
2. marg: approximate conditional inference for linear nonnormal models;
3. nlreg: approximate conditional inference for nonlinear heteroscedastic regression models; and
4. sample: conditional sampling routines for linear nonnormal regression models.

In the following we shall illustrate the first three sections. The cond, marg and sample sections of the S-PLUS library HOA have been developed by A.R. Brazzale. The nlreg section is due to R. Bellio and A.R. Brazzale.
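Assuming the three supporting libraries and the HOA sections have been installed under the names used in this tutorial, a minimal session might begin as in the following sketch (installation paths may differ on your platform; see the enclosed documentation):

    # Attach the supporting libraries and the HOA sections used below.
    library(Matrix)
    library(MASS)
    library(boot)
    library(cond)     # Section 2: logistic and loglinear models
    library(marg)     # Section 3: regression-scale models
    library(nlreg)    # Section 4: nonlinear heteroscedastic models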


2 Approximate conditional inference in logistic and loglinear models

2.1 Background results

Logistic and loglinear models are a particular instance of a linear exponential family with p-dimensional natural parameter (\psi, \lambda), minimal sufficient statistic (T, S), and log-likelihood function

    \ell(\psi, \lambda) = \psi^T t + \lambda^T s - K(\psi, \lambda),    (1)

where K is the cumulant function, \psi the p_0-dimensional parameter of interest and \lambda a nuisance parameter of dimension p - p_0. For models belonging to the class (1), the modified profile log-likelihood for \psi is given by

    \ell_{MP}(\psi) = \ell_P(\psi) + (1/2) \log |K_{\lambda\lambda}(\psi, \hat\lambda_\psi)|,

where \ell_P is the profile log-likelihood. It represents an approximation to the conditional log-likelihood function \ell_C for \psi, based on the conditional distribution of T given S = s:

    \ell_{MP}(\psi) = \ell_C(\psi) + O(n^{-1}).

When the parameter of interest \psi is scalar, very accurate approximations for conditional tail probabilities can be obtained from the modified directed deviance r^* = r + NP + INF, that is,

    Pr(T \ge t | S = s; \psi) \doteq \Phi(r^*),    (2)

where the relative error in (2) is of order O(n^{-3/2}). Here the NP and INF adjustments are given by

    NP = r^{-1} \log \rho,    INF = r^{-1} \log(v/r),    (3)

where v is the signed Wald statistic for \psi and

    \rho = \{ |K_{\lambda\lambda}(\hat\psi, \hat\lambda)| / |K_{\lambda\lambda}(\psi, \hat\lambda_\psi)| \}^{1/2}.

In general, higher-order results are not valid for discrete models. The jumps in the distribution may interfere with the continuous approximations, and it is very difficult to quantify to which order this happens. For lattice models, however, Pierce and Peters (1992, Section 4) discuss suitable continuity corrections. For instance, for computing Pr(T \ge t | S = s; \psi), v can be multiplied by the factor

    h(\hat\psi - \psi) = \{ \exp(\hat\psi - \psi) - 1 \} / (\hat\psi - \psi).

The above approximations are available for logistic and loglinear models through the cond section of library HOA.
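To fix ideas, here is how the pieces of (2) and (3) combine; this is an illustrative sketch under the notation above, not a routine of the HOA library:

    # Combine the directed deviance r, the signed Wald statistic v and the
    # ratio rho of (3) into the modified directed deviance r* and the
    # normal tail approximation (2).
    rstar.approx <- function(r, v, rho)
    {
        NP <- log(rho) / r      # nuisance parameter adjustment
        INF <- log(v / r) / r   # information adjustment
        rs <- r + NP + INF
        list(rstar = rs, tail.prob = pnorm(rs), NP = NP, INF = INF)
    }

The NP and INF components returned here correspond to the INF/NP diagnostics printed by the cond routines illustrated below.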

2.2 Example 1

Cox (1970, page 61) gives a data set in the form of matched pairs of binary observations concerning the crying of babies. The babies were observed on 18 days and on each day one child was lulled. Interest focuses on whether lulling has any effect on crying. A binary logistic model is formulated, where the probability that on day j a treated individual does not cry (success) is given by \exp(\psi + \lambda_j)/\{1 + \exp(\psi + \lambda_j)\}, and the corresponding probability for an untreated individual by \exp(\lambda_j)/\{1 + \exp(\lambda_j)\}. The parameter \psi represents the treatment effect, and \lambda_1, ..., \lambda_18 are 18 nuisance parameters. The data set is given in the data frame babies.

> babies
   r1 r2 lull day
1   3  5   no   1
2   1  0  yes   1
3   2  4   no   2
4   1  0  yes   2
5   1  4   no   3
6   1  0  yes   3
7   1  5   no   4
8   0  1  yes   4
9   4  1   no   5
10  1  0  yes   5
....

The variables r2 and r1 represent the number of babies that respectively do and do not cry, lull is an index vector for treatment, and day an 18-level factor with one level for each day.
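In the notation above, the two success probabilities of the model can be written down directly; the following is a minimal illustrative sketch (the helper names are hypothetical, not part of HOA):

    # Success probabilities of the matched-pairs logistic model: psi is
    # the treatment effect, lambda the vector of day effects.
    expit <- function(eta) exp(eta) / (1 + exp(eta))
    prob.treated   <- function(psi, lambda) expit(psi + lambda)
    prob.untreated <- function(lambda) expit(lambda)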


We start the analysis by fitting a logistic model by means of the function glm.

> babies.glm <- glm( cbind(r1, r2) ~ day + lull - 1, family = binomial, data = babies )
> babies.sum <- summary( babies.glm )
> print( coef(babies.sum), digits=4 )
          Value Std. Error t value
day1   -0.37540     0.6895 -0.5444
day2   -0.48934     0.7921 -0.6178
day3   -0.96027     0.9097 -1.0556
day4   -2.09210     1.1282 -1.8543
day5    1.45371     1.1066  1.3137
day6   -0.12757     0.6463 -0.1974
day7    0.57454     0.7175  0.8007
day8    0.08964     0.6849  0.1309
day9    0.51106     0.8856  0.5771
day10   1.29275     0.7963  1.6235
day11   1.66236     1.0879  1.5280
day12   2.11144     1.0575  1.9966
day13   0.57454     0.7175  0.8007
day14   1.45371     1.1066  1.3137
day15   0.76886     0.8500  0.9045
day16   1.98279     1.0650  1.8617
day17   0.11693     0.7836  0.1492
day18   0.57454     0.7175  0.8007
lull    1.43237     0.7338  1.9521

Inference on lull is difficult because of the 18 nuisance parameters \lambda_1, ..., \lambda_18, which are estimated using only 2 binomial observations. The procedure glm.cond makes several higher-order results available that eliminate the nuisance parameters.

> babies.cond <- glm.cond( babies.glm, offset = lull )
> plot( babies.cond )
Make a plot selection (or 0 to exit)
1: plot: All
2: plot: Profile and modified profile log-likelihoods
3: plot: Profile and modified profile likelihood ratio tests
4: plot: Directed and modified directed deviances
5: plot: Modified and continuity corrected directed deviances
6: plot: Lugannani-Rice tail approximations
7: plot: Confidence intervals
8: plot: Diagnostics based on INF/NP decomposition
Selection:

Figure 1 shows some examples of graphical output. As shown in the leftmost panel, there is a clear difference between the profile log-likelihood and the approximate conditional log-likelihood for the parameter of interest \psi, and consequently between the corresponding estimates. Note that, in the same figure, the approximate conditional and exact conditional log-likelihood functions agree to within drawing accuracy (see Davison, 1988, Example 6.1). Figure 1(b) shows the directed and modified directed deviance statistics, r and r^*. The dotted horizontal lines correspond to the 2.5% and 97.5% quantiles of the standard normal distribution and are used to read off the 95% confidence interval. The dashed straight lines represent the normal approximation to the Wald statistics based on either the unconditional or conditional maximum likelihood estimates. The numerical values of several 95% confidence intervals are provided by summary.

> print( summary(babies.cond), digits=4 )
COEFFICIENTS
          Value  Std. Error
uncond.   1.432      0.7338
cond.     1.277      0.6952

CONFIDENCE INTERVALS  level = 95 %
                                            two-sided
                                            lower      upper
MLE normal approximation                  -0.005767   2.871
CMLE normal approximation                 -0.08543    2.640
Directed deviance                          0.1228     3.086
Modified directed deviance                 0.006970   2.756
Modified directed deviance (cont. corr.)  -0.1551     3.099

DIAGNOSTICS:
   INF     NP
0.0761  0.286
Approximation based on 20 points

Figure 1: Crying babies data: examples of graphical output obtained with the plot.cond routine of the cond library section. (a) Comparison of log-likelihoods: solid line, profile log-likelihood; bold solid line, exact and approximate conditional log-likelihoods. (b) Comparison of test statistics: solid line, directed deviance; bold solid line, modified directed deviance; dashed line, unconditional and conditional maximum likelihood estimate normal approximations. (c) Information and nuisance parameter adjustments.

The first three lines of the confidence interval output refer to large-sample statistics such as r or the Wald statistic, whereas the last two lines give higher-order confidence intervals, including the continuity-corrected version of r^*. The different statistics do not agree on whether to include the value zero, so the results are inconclusive as to the effectiveness of lulling. The information and nuisance parameter adjustments are shown in Figure 1(c). The absolute value of the NP correction term exceeds the limiting value of 0.2 given in Pierce and Peters (1992), suggesting a need for small-sample results. Note that 20 points have been used for the computation of the several statistics over a range of values of \psi which is automatically determined by glm.cond. The number of points and the width of the range can be changed by the user. See the corresponding S-PLUS help file.

2.3 Example 2

Davison and Hinkley (1997, Example 7.8) consider a set of binary data on the presence of calcium oxalate crystals in 79 samples of urine. Explanatory variables are specific gravity, i.e. the density of urine relative to water, pH, osmolarity (mOsm), conductivity (mMho, milliMho), urea concentration (millimoles per litre), and calcium concentration (millimoles per litre). Two incomplete cases are dropped. The remaining 77 observations form the data frame urine of library cond.

A question of theoretical and practical interest in asymptotic inference, and one which is still unresolved, is when the sample size, or more generally the amount of information contained in the data, is sufficiently large that recourse to higher-order solutions is no longer justified. For data sets similar to the crying babies data, approximate conditional inference is imperative because of the small amount of information on the parameter of interest provided by each stratum. The urine data consist of 77 observations for 7 parameters, which makes 11 observations per parameter. Compared to the 36 observations for 19 parameters of Example 1, one might guess that the gap between large-sample and small-sample solutions will be much less pronounced. We may clarify this doubt by using the routines provided by the cond section of library HOA. We start the analysis by fitting a logistic model.

> urine.glm <- glm( r ~ gravity + ph + osmo + cond + urea + calc, family = binomial, data = urine )
> print( coef(urine.glm), digits=4 )
 (Intercept)  gravity       ph     osmo     cond      urea    calc
      -355.2    355.8  -0.4956  0.01681  -0.4327  -0.03200  0.7834

At first, we consider as parameter of interest the coefficient associated with the urea concentration. The corresponding 95% confidence intervals based upon different first-order and higher-order statistics can be obtained by means of the glm.cond routine.

> urine.cond <- glm.cond( urine.glm, offset = urea )
> print( coef(urine.cond), digits=4 )
            Value  Std. Error
uncond.  -0.03200     0.01601
cond.    -0.02772     0.01476

> print( summary(urine.cond, coef=F), digits=4 )
CONFIDENCE INTERVALS  level = 95 %
                                            two-sided
                                            lower        upper
MLE normal approximation                  -0.06338   -0.0006223
CMLE normal approximation                 -0.05664    0.001205
Directed deviance                         -0.06677   -0.002457
Modified directed deviance                -0.05822    0.0002479
Modified directed deviance (cont. corr.)  -0.05838    0.0003479

DIAGNOSTICS:
   INF     NP
0.0536  0.409

Further, we consider as parameter of interest the coefficient associated with the calcium concentration.


> urine.cond <- glm.cond( urine.glm, offset = calc )
> print( coef(urine.cond), digits=4 )
           Value  Std. Error
uncond.   0.7834      0.2380
cond.     0.7119      0.2289

> print( summary(urine.cond, coef=F), digits=4 )
CONFIDENCE INTERVALS  level = 95 %
                                            two-sided
                                            lower    upper
MLE normal approximation                    0.3169   1.250
CMLE normal approximation                   0.2631   1.160
Directed deviance                           0.3815   1.342
Modified directed deviance                  0.3224   1.208
Modified directed deviance (cont. corr.)    0.3073   1.249

DIAGNOSTICS:
   INF     NP
0.0899  0.338

Figure 2 plots the profile log-likelihoods and the approximate conditional log-likelihoods for the two conditional fits of the data, obtained with the plot method.

> plot( urine.cond, which=2 )

Contrary to what we might have expected, the NP correction term is large in both cases, indicating the need for higher-order solutions. The main lesson learnt from this example is that binary data seem to provide less information on the parameter of interest than continuous data. This could be explained by the loss of information entailed by dichotomizing the response. A continuous variable giving, for instance, the concentration of calcium oxalate crystals in the 77 samples of urine would no doubt provide more information than does the binary variable recording their presence or absence.

Figure 2: Urine data. Comparison of log-likelihoods: profile log-likelihood (solid line), approximate conditional log-likelihood (bold line). The variables of interest are urea (left panel) and calcium concentration (right panel). The graphical output is obtained with the plot.cond routine of the cond library section.

3 Approximate conditional inference for regression-scale models

3.1 Background results

Regression-scale models belong to the wider class of transformation models (see Pace and Salvan, 1997, Chapter 7). A regression-scale model has the form

    y = X\beta + \sigma\varepsilon,

where X is a fixed n x p matrix with unknown regression coefficient \beta \in R^p, \sigma > 0 is a scale parameter, and \varepsilon represents an n-dimensional vector of errors. We suppose that the errors \varepsilon_1, ..., \varepsilon_n are independent and identically distributed according to a known, though not necessarily normal, density p_0(.) on R. For the i-th response we write y_i = x_i^T \beta + \sigma \varepsilon_i, where x_i^T is the i-th row of X. Given a sample y = (y_1, ..., y_n), the log-likelihood for \beta and \sigma is

    \ell(\beta, \sigma) = -n \log \sigma - \sum_{i=1}^n g_0\{ (y_i - x_i^T \beta)/\sigma \},

where g_0(.) = -\log p_0(.).


If the maximum likelihood estimates (\hat\beta, \hat\sigma) exist and are finite, it is possible to make a one-to-one change of variable from y = (y_1, ..., y_n) to (\hat\beta, \hat\sigma, a), where a_i = (y_i - x_i^T \hat\beta)/\hat\sigma, i = 1, ..., n, represent the standardized residuals of the model. Note that only n - p - 1 of the a_i are functionally independent. It is possible to show that:

i) the configuration statistic a is maximal distribution constant, and is an ancillary for \beta and \sigma;

ii) the joint density of (\hat\beta, \hat\sigma, a_1, ..., a_{n-p-1}) is

    p(\hat\beta, \hat\sigma, a; \beta, \sigma) = c(a) (\hat\sigma^{n-p-1}/\sigma^n) \prod_{i=1}^n p_0[ \{ x_i^T(\hat\beta - \beta) + \hat\sigma a_i \}/\sigma ] |X^T X|^{1/2},

where a_{n-p}, ..., a_n are expressed as a function of (a_1, ..., a_{n-p-1}), and the function c(a) only depends on n and on (a_1, ..., a_{n-p-1});

iii) given a, the pivot

    (Q_1, Q_2) = ( (\hat\beta - \beta)/\hat\sigma, \hat\sigma/\sigma )

has density

    p_{Q_1,Q_2|A=a}(q_1, q_2 | a) = c(a) q_2^{n-1} \prod_{i=1}^n p_0\{ (x_i^T q_1 + a_i) q_2 \} |X^T X|^{1/2},    (4)

where c(a) is a normalizing constant.

In other words, the pair (\hat\beta, \hat\sigma) forms a transformation variable, whereas the sample configuration a = (a_1, ..., a_n) is ancillary with respect to \beta and \sigma. From general theory on transformation models it follows that inference on the parameters (\beta, \sigma) should be made conditionally on the observed value A = a. The two parameters \beta and \sigma play quite different roles and it may be worthwhile considering them separately. As shown by Fraser (1979, Section 6.1.5), the functionally unique separation of \beta and \sigma is obtained from the pivots Q_1 and Q_2, whose joint distribution is given by (4). Conditional confidence intervals for any single parameter, say \beta_l, l = 1, ..., p, or \sigma, are based on the marginal density of the related pivot, obtained by integrating out the remaining components in (4). Such a task, however, can be very hard to do in practice.

Higher-order likelihood asymptotics apply rather naturally in this context and allow us to by-pass high-dimensional numerical integration. Note that higher-order theory can be applied in two different ways. The results proposed by DiCiccio, Field and Fraser (1990), DiCiccio and Field (1991) and Fraser, Lee and Reid (1990) can be used to approximate the marginal distribution of a component of Q_1 or Q_2. Instead, the general theory presented in the first part of this short course focuses on the maximum likelihood estimator and its conditional distribution given the configuration ancillary a. This approach is motivated in the setting of regression-scale models by the fact that the p^* formula is exact for transformation models. The section marg of the library HOA contains routines for both approaches.
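To make the pivot density (4) concrete, the following sketch evaluates it, up to the constant factor c(a)|X^T X|^{1/2}, for Student-t errors; this is an illustration under the notation above, not a routine of the marg section:

    # Unnormalized conditional density (4) of (Q1, Q2) given the
    # configuration a, with p0 taken as a Student's t density; X is the
    # n x p design matrix, q1 a p-vector and q2 a positive scalar.
    pivot.dens <- function(q1, q2, X, a, df = 5)
    {
        n <- nrow(X)
        q2^(n - 1) * prod(dt((X %*% q1 + a) * q2, df))
    }

Marginalizing such a density over all but one component of (q_1, q_2) is exactly the high-dimensional integration that the higher-order approximations avoid.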

3.2 Example 3

Sen and Srivastava (1990, page 32) consider a data set on house prices. Among the variables examined are the selling price in thousands of dollars (price), the number of bedrooms (bdroom), the floor space in square feet (floor), the total number of rooms (rooms) and the front footage of the lot in feet (front). Let the data frame houses contain all variables. Figure 3 shows plots of the response variable against each of the covariates.

Figure 3: Scatter plots of house price vs each covariate.

The model is

    price_i = \beta_0 + \beta_1 bdroom_i + \beta_2 floor_i + \beta_3 rooms_i + \beta_4 front_i + \sigma \varepsilon_i,

where \varepsilon_i is taken to be standard Student's t with 5 degrees of freedom, to allow for longer tails and for extreme values. The model can be fitted by means of the rsm fitting routine, provided by the marg section of library HOA.

> houses.rsm <- rsm( price ~ bdroom + floor + rooms + front, family = student(5), data = houses )
> print( summary(houses.rsm, corr=F), digits=4 )
Coefficients:
              Value  Std. Error  t value  p-value
(Intercept)  15.435       7.860     1.96    0.050
bdroom       -7.232       2.755    -2.62    0.009
floor         0.017       0.004     4.21    0.000
rooms         5.537       2.191     2.53    0.011
front         0.270       0.178     1.52    0.129

Scale parameter for student family taken to be: 6.095 ( 1.043 )

Degrees of Freedom: 26 Total; 20 Residual
-2*Log-Likelihood 126.323

Graphical inspection of the data using the rsm.diag.plots plotting routine shows that the model fits the data reasonably well. Figure 4(a) plots the deviance residuals against the fitted values. Figure 4(b) gives the normal QQ-plot of the r*-type residuals. More statistics, such as influence measures and leverages, are provided by rsm.diag and rsm.diag.plots.

Figure 4: House prices data: examples of graphical output obtained with the rsm.diag.plots routine of the marg library section. (a) Deviance residuals against fitted values. (b) Normal QQ-plot of r*-type residuals.

Let us first focus on the coefficient \beta_4 associated with the variable of interest front. The corresponding p-value for testing whether \beta_4 = 0, provided by the summary of the baseline fit, suggests that the coefficient is not significantly different from zero at the 10% level. However, this result is based on the large-sample normal approximation to the distribution of the Wald statistic and might not be very accurate. The marg section of library HOA allows one to calculate, at the same time, several large- and small-sample test statistics and the corresponding tail probabilities, thus providing a means to better assess the significance of the above result in the context of approximate conditional inference. Let us save in houses.marg the fit obtained by applying rsm.marg to our linear regression model with offset variable equal to front. In summary we use the argument test to indicate that we wish to test a hypothesis.

> houses.marg <- rsm.marg( houses.rsm, offset = front )
> print( summary(houses.marg, test=0), digits=4 )
COEFFICIENTS
          Value  Std. Error
uncond.  0.2700      0.1777
cond.    0.2483      0.2048
marg.    0.2503      0.1936

HYPOTHESIS TESTING
hypothesis : coef( front ) = 0
                             statistic  tail prob.
MLE normal approx.                1.52       0.064
Cond. MLE normal approx.          1.21       0.113
Marg. MLE normal approx.          1.29       0.098
Directed deviance                 1.51       0.066
Modified directed deviance        1.27       0.102
Marginal directed deviance        1.27       0.102

DIAGNOSTICS:
   INF     NP
0.0728  0.445
Approximation based on 20 points

> plot( houses.marg )

Note that, though the null hypothesis still cannot be rejected, the p-values that we obtain from these tail probabilities differ considerably. Figure 5(a) compares the profile log-likelihood (solid line) with the modified profile log-likelihood (bold line) and the approximate marginal log-likelihood (bold dashed line). There is a striking closeness between the results obtained by the "modified" approach (based on the likelihood combinants) and the "marginal" approach (based on the pivot Q_1). Let us now repeat the same analysis for the scale parameter, for which we want to compute confidence intervals. As before, houses.marg contains the approximate marginal fit.

> houses.marg <- rsm.marg( houses.rsm, offset = scale )
> print( summary(houses.marg), digits=4 )
COEFFICIENTS
          Value  Std. Error
uncond.   6.095       1.043
marg.     6.794       1.245

CONFIDENCE INTERVALS  level = 95 %
                              two-sided
                             lower   upper
MLE normal approx.           4.358   8.523
Marg. MLE normal approx.     4.743   9.730
Directed deviance            4.410   8.668
Marginal directed deviance   4.837  10.022

DIAGNOSTICS:
   INF     NP
0.0594  0.713
Approximation based on 20 points

> plot( houses.marg )

Figure 5: Examples of graphical output obtained with the HOA library section marg for the house prices data. (a) Comparison of log-likelihoods for coefficient \beta_4: solid line, profile log-likelihood; bold solid line, modified profile log-likelihood; bold dashed line, approximate marginal log-likelihood. (b) Comparison of log-likelihoods for the scale parameter: solid line, profile log-likelihood; bold solid line, modified/approximate marginal log-likelihood. (c) Comparison of test statistics for the scale parameter: solid line, directed deviance; bold solid line, modified directed deviance; dashed line, joint and marginal maximum likelihood estimate normal approximations.

All results are calculated on the logarithmic scale. If the parameter of interest is the scale parameter, the statistics obtained by approximating the marginal distribution of the pivot Q_2 agree with those obtained from the likelihood combinants. The results, however, are displayed on the original scale. The approximate marginal log-likelihood is given in Figure 5(b), whereas Figure 5(c) compares the directed deviance (solid line) and the marginal directed deviance statistics (bold line). There is a clear difference between profile and marginal results. The INF/NP decomposition of r^* suggests a need for small-sample results, though no reference values are available as is the case for linear exponential families.
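As an illustrative check of the scale convention (the numbers are taken from the summary output above), the 95% MLE normal-approximation interval for the scale parameter corresponds on the log scale to

    log( c(4.358, 8.523) )   # 1.472 2.143

and exponentiating the endpoints of an interval computed for log(scale) returns it to the original scale, which is how the displayed results arise.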

4 Higher-order inference in nonlinear regression models

4.1 Background results

Nonlinear models are widely used in applied statistics. Here we consider the general form

    y_{ij} = \mu(x_i; \beta) + \varepsilon_{ij},   i = 1, ..., q,  j = 1, ..., m_i,    (5)

where q is the number of design points, m_i the number of replicates at design point x_i, y_{ij} represents the response of the j-th experimental unit in the i-th group, and the errors \varepsilon_{ij} are independent N(0, \sigma_i^2) variates. The mean response is given by the nonlinear function \mu(x; \beta), called the mean function, which depends on an unknown regression coefficient \beta. The definition of the model is completed by assuming \sigma_i = \sigma g(x_i; \beta, \gamma), where g^2(.) is the variance function, and \sigma^2 and \gamma are variance parameters. If g^2(.) is constant, we obtain as a special case the homoscedastic nonlinear regression model. The log-likelihood function corresponding to model (5) is

    \ell(\beta, \sigma^2, \gamma) = - \sum_{i=1}^q (m_i/2) \log \sigma^2 - \sum_{i=1}^q (m_i/2) \log g^2(x_i; \beta, \gamma) - \sum_{i=1}^q \sum_{j=1}^{m_i} \{ y_{ij} - \mu(x_i; \beta) \}^2 / \{ 2\sigma^2 g^2(x_i; \beta, \gamma) \}.

As can easily be seen, the maximum likelihood estimate of \sigma^2 is available in closed form and equals the average residual sum of squares. Maximisation with respect to \beta and \gamma is usually accomplished via an iterative two-step procedure, where one maximises alternately with respect to \beta and to \gamma (a schematic sketch of this scheme is given at the end of this section). Inference on the regression coefficients and the variance parameters is commonly based on large-sample results and linearisation techniques (Seber and Wild, 1989, Chapter 5). Among the most useful tools are graphical summaries of the fit, such as profile plots, profile traces and contour plots (Bates and Watts, 1988, Section 6.1).

Nonlinearity of the mean function and variance heterogeneity can lead to substantial inaccuracies if the sample size is small or moderate. The estimators of \beta and of \sigma^2 and \gamma are asymptotically correlated, unless the variance function is of the form g^2(x; \gamma), that is, free of the regression coefficients. The problem is mostly present when the parameter of interest is one of, or part of, the variance parameters. Inference must then account for the estimation of \beta, otherwise a substantial bias can occur. A means to overcome these drawbacks is to resort to generally applicable higher-order methods, such as the versions proposed by Skovgaard (1996, 1999) and Severini (1998, 1999). In particular, the modified directed likelihood r^* can be used for computing confidence intervals for all the parameters. Improved estimates of variance parameters can be obtained by maximisation of the modified profile likelihood, which can be seen as the natural extension of the REML likelihood to the nonlinear setting.

Regression diagnostics are very important to evaluate the fitted model. They include inspection of the residuals, calculation of exact and approximate leverages, and Cook's-distance-type influence measures (Cook and Weisberg, 1982). Outlier detection can be based on the mean shift outlier model. If the j-th case is suspected of being an outlier, we consider

    y_i = \mu(x_i; \beta) + \delta I_j + \sigma g(x_i; \beta, \gamma) \varepsilon_i,   i = 1, ..., n,

where I_j is the dummy variable selecting the j-th case. We can define the deletion residual r_j as the signed likelihood ratio test for \delta = 0, while the signed score test defines a Pearson-type residual r_{Pj}. Higher-order asymptotics can be used to improve the normality of r_j by defining an r*-type deletion residual, that is, the r^* statistic to test \delta = 0. The above methods are available through the nlreg section of the S-PLUS library HOA.
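The iterative two-step maximisation mentioned above can be sketched generically as follows. This is an illustration of the scheme only, not code from the nlreg section: negll.beta and negll.gamma are hypothetical user-written functions returning the negative log-likelihood as a function of their first argument, with the other parameter held fixed, and nlminb is assumed to forward the extra named argument to the objective function.

    # Alternate maximisation for model (5): update beta for fixed gamma,
    # then gamma for fixed beta, until the estimates stabilise; sigma^2
    # can be profiled out in closed form inside the objectives.
    beta <- beta0                       # starting values (user supplied)
    gamma <- gamma0
    for(k in 1:10) {
        beta  <- nlminb(beta,  negll.beta,  gamma = gamma)$parameters
        gamma <- nlminb(gamma, negll.gamma, beta = beta)$parameters
    }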

4.2 Example 4

A simple example of a nonlinear regression model is discussed in Davison and Hinkley (1997, Example 7.7). The data concern the calcium uptake of cells, y, as a function of time, x, after being suspended in a solution of radioactive calcium. The data are available in the calcium data frame of library boot (Davison and Hinkley, 1997, Chapter 11). The mean function is

    \mu(x_i; \beta) = \beta_0 \{ 1 - \exp(-\beta_1 x_i) \},

where \beta_0 and \beta_1 are unknown regression coefficients, and the error term \varepsilon_i ~ N(0, \sigma^2) in (5) follows a centered normal distribution with unknown variance \sigma^2. The data are plotted in the leftmost panel of Figure 6 together with the fitted response curve. To fit the model by maximum likelihood we can use the nlreg fitting routine of the nlreg section of library HOA. This function emulates the built-in S-PLUS function nls for nonlinear least squares estimation.

> calcium.nl <- nlreg( cal ~ b0 * (1 - exp(-b1 * time)), start = c(b0 = 4, b1 = 0.2), data = calcium )
> print( summary(calcium.nl, corr=F), digits=4 )
Coefficients:
     Value  Std. Error  t value  p-value
b0   4.309       0.290    14.86      0.0
b1   0.208       0.038     5.55      0.0

Variance parameters:
        Value  Std. Error
logs   -1.286       0.272

Total number of observations: 27
Number of parameters: 3
-2*Log-Likelihood -7.713

              \beta_0           \beta_1           \sigma^2
Wald      (3.741, 4.878)   (0.135, 0.282)   (0.129, 0.424)
r         (3.806, 5.066)   (0.140, 0.296)   (0.169, 0.496)
r*        (3.773, 5.108)   (0.137, 0.302)   (0.183, 0.568)
student.  (3.781, 5.084)   (0.134, 0.296)   (0.190, 0.555)
BCa       (3.827, 5.040)   (0.141, 0.292)   (0.171, 0.346)

Table 1: Calcium uptake data. 95% confidence intervals obtained from the Wald statistic, the directed deviance and the modified directed deviance statistics. The last two lines correspond to calibrated parametric bootstrap confidence intervals (999 x 249 replicates).

The variability of the maximum likelihood estimates can be assessed by plotting the profiles of the Wald, the r and the r* statistics. This task is performed by the profile.nlreg routine for a single parameter of interest, and by all.profiles for all parameters simultaneously.

> calcium.prof <- all.profiles( calcium.nl )
> plot( calcium.prof, nframe=c(1,3) )

As we can deduce from Figure 6(b), the gap between first-order and higher-order methods is most significant for the variance parameter \sigma^2. This can be explained by the fact that the modified directed deviance corrects for most of the bias present in estimating \sigma^2. Table 1 gives the 95% confidence intervals obtained from the Wald statistic, the directed deviance and the modified directed deviance statistics. They correspond to the output obtained from the summary.all.profiles method.

> print( summary(calcium.prof), digits=4 )
Two-sided confidence intervals for b0
              lower   upper
r*   (0.95)   3.773   5.108
r    (0.95)   3.806   5.066
Wald (0.95)   3.741   4.878

b1
              lower   upper
r*   (0.95)   0.137   0.302
r    (0.95)   0.140   0.296
Wald (0.95)   0.135   0.282

logs
              lower    upper
r*   (0.95)  -1.696   -0.567
r    (0.95)  -1.776   -0.700
Wald (0.95)  -1.819   -0.752

Figure 6: Calcium uptake data: fitted curve and enhanced profile plots obtained with, respectively, the nlreg fitting routine and the plot.all.profiles method of the nlreg library section. (a) Calcium uptake data (calcium uptake, nmoles/mg, against time, minutes) and fitted curve. (b) Enhanced profile plots for the parameters \beta_0, \beta_1 and \log \sigma^2: dotted line, directed deviance; bold solid line, modified directed deviance; solid line, maximum likelihood estimate normal approximation.

Computer-intensive methods such as the bootstrap are often preferred to higher-order solutions on the grounds that they are easier to implement. The last two lines of Table 1 give the parametric bootstrap confidence intervals for the calibrated studentized and the BCa methods (999 x 249 replicates). In this example both methods are competitive, from both the numerical and the computational viewpoint. Most of the execution time in profile.nlreg is taken by the computation of the first and second derivatives of the mean function, whereas in the bootstrap case we have to perform 999 x 249 maximizations. However, if the model is more complex, higher-order solutions outplay the bootstrap.

The profile method can also be used to compute confidence intervals for a scalar function of the parameters of interest. Davison and Hinkley (1997, page 356) study the "proportion of maximum" \pi = 1 - \exp(-\beta_1 x_0) at specific time points x_0. All we have to do is to reformulate the model by changing the parameterization to (\pi, \beta_0). Confidence intervals for x_0 = 5 are computed as follows; the reparameterized mean function below follows from solving \pi = 1 - \exp(-\beta_1 x_0) for \beta_1.

> x0 <- 5
> calcium.x0 <- nlreg( cal ~ b0 * (1 - (1-p)^(time/x0)), start = c(b0 = 4, p = 0.5), data = calcium )
> print( summary( profile(calcium.x0, offset=p) ), digits=4 )
Two-sided confidence intervals for p
               lower    upper
r*   (0.95)   0.4955   0.7790
r    (0.95)   0.5037   0.7728
Wald (0.95)   0.5175   0.7773

13 points calculated exactly
50 points used in spline interpolation
INF: 0.0216  NP: 0.133
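As an illustrative cross-check (using the estimate b1 = 0.208 from the original fit), the implied point estimate of the proportion of maximum at x0 = 5 falls inside the intervals above:

    1 - exp( -0.208 * 5 )   # approximately 0.646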

Note that, in order to avoid numerical problems, it is preferable to work on the logit scale for larger values of x_0, that is, to use the parameterization \phi = \log\{ \pi/(1 - \pi) \}. The nlreg section of library HOA implements a large set of regression diagnostics. They are accessed through the nlreg.diag.plots plotting routine.

> nlreg.diag.plots( calcium.nl )
Make a plot selection (or 0 to exit)
1: plot: Summary
2: plot: Studentized residuals against fitted values
3: plot: r* residuals against fitted values
4: plot: Normal QQplot of studentized residuals
5: plot: Normal QQplot of r* residuals
6: plot: Cook statistic against h/(1-h)
7: plot: Global influence against h/(1-h)
8: plot: Cook statistic against observation number
9: plot: Influence measures against observation number
Selection:

Figure 7 shows the output of option 1. There is a moderate indication of increasing variability at higher time points. For illustrative purposes, we fit a model with the power-of-X variance function g^2(x; \gamma, \sigma^2) = \sigma^2 (1 + x^\gamma). As we have seen before, if the variance is constant, we may either use the nlreg routine of library HOA or the built-in S-PLUS routine nls to fit the model: both functions yield the same estimates. However, nls no longer applies if the errors are heteroscedastic and the heteroscedasticity is modelled through a parametric variance function. The nlreg routine has specifically been designed to handle this case. It includes a weights argument that defines the variance function of the nonlinear model through an S-PLUS formula. Note that the parameter \sigma^2 has to be omitted from the formula, as it is included by default in the weights argument.

> calcium.nl.g <- nlreg( cal ~ b0 * (1 - exp(-b1 * time)), weights = ~ (1 + time^g), start = c(b0 = 4, b1 = 0.2, g = 1), data = calcium )