Model Selection, Transformations and Variance Estimation in Nonlinear Regression

Olaf Bunke 1, Bernd Droge 1 and Jörg Polzehl 2

1 Institut für Mathematik, Humboldt-Universität zu Berlin, PSF 1297, D-10099 Berlin, Germany
2 Konrad-Zuse-Zentrum für Informationstechnik, Heilbronner Str. 10, D-10711 Berlin, Germany

Abstract. The results of analyzing experimental data using a parametric model may heavily depend on the chosen model. In this paper we propose procedures for the adequate selection of nonlinear regression models if the intended use of the model is among the following: 1. prediction of future values of the response variable, 2. estimation of the unknown regression function, 3. calibration, or 4. estimation of some parameter with a certain meaning in the corresponding field of application. Moreover, we propose procedures for variance modelling and for selecting an appropriate nonlinear transformation of the observations which may lead to an improved accuracy. We show how to assess the accuracy of the parameter estimators by a "moment oriented bootstrap procedure". This procedure may also be used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated by examples.

Key words: Nonlinear regression, model selection, bootstrap, cross-validation, variable transformation, variance modelling, calibration, mean squared error for prediction, computing in nonlinear regression.

AMS 1991 subject classifications: 62J99, 62J02, 62P10.
1 Selection of regression models

1.1 Preliminary discussion
In many papers and books it is discussed how to analyse experimental data by estimating the parameters in a linear or nonlinear regression model, see e.g. Bunke and Bunke [2], [3], Seber and Wild [15] or Huet, Bouvier, Gruet and Jolivet [11]. The usual situation is that it is not known whether a certain regression model describes the unknown true regression function sufficiently well. The results of the statistical analysis may depend heavily on the chosen model. Therefore there should be a careful selection of the model, based on the scientific and practical experience in the corresponding field of application and on statistical procedures. Moreover, a nonlinear transformation of the observations or an appropriate model of the variance structure can lead to an improved accuracy.

In this paper we propose procedures for the adequate selection of regression models. This includes choosing suitable variance models and transformations. Additionally we present methods to assess the accuracy of estimates, calibrations and predictions based on the selected model.

The general framework of this paper will be described now. We consider possibly replicated observations of a response variable Y at fixed values of explanatory variables (or nonrandom design points) which follow the model

    Y_{ij} = f(x_i) + e_{ij},   i = 1, ..., k,   j = 1, ..., n_i,   \sum_{i=1}^{k} n_i = n.   (1.1)

In (1.1) the regression function f is unknown and real-valued, and the e_{ij} are assumed to be uncorrelated random errors with zero mean and positive variances \sigma_i^2 = \sigma^2(x_i).
The usual assumption of a homogeneous error variance \sigma_i^2 = \sigma^2 is unrealistic in many applications, and one is often confronted with heteroscedasticity problems.

The analysis of the data requires in general to estimate the regression function which describes the dependence of the response variable on the explanatory variables. This is usually done by assuming that this dependence may be described by a parametric model

    { f(\cdot, \vartheta) | \vartheta \in \Theta },   (1.2)

where the function f(\cdot, \vartheta) is known up to a p-dimensional parameter \vartheta \in \Theta \subseteq R^p, so that the problem reduces to estimating this parameter from the data. As an estimate of the parameter we will use the ordinary least squares estimator (OLSE) \hat{\vartheta}, which is the minimizer of the sum of squares

    S(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \vartheta))^2   (1.3)

with respect to \vartheta \in \Theta. In practice this is the most popular approach to estimating \vartheta. Weighted LSE will be discussed later in Section 2. To simplify the presentation, in what follows we will preliminarily restrict ourselves to the case of real-valued design points, that is, we will assume to have only one explanatory variable. Note that the model (1.2) is called linear if it depends linearly on \vartheta; otherwise it is called nonlinear. Our focus will be on the latter case.

A starting point in an approach to model selection is the idea that, even if a certain regression model is believed to be convenient for given experimental data, either because of theoretical reasoning or based on past experience with similar data, there is seldom sure evidence for the validity of a model of the form (1.2). Therefore a possible modification of the model could lead to a better fit or to more accurate estimates of the interesting parameters. A parameter with a certain physical or biological meaning may often be represented in different regression models as a function of the corresponding parameters. This will be the case if the parameter \gamma of interest is a function
    \gamma = \gamma[f(x_1), ..., f(x_k)]   (1.4)

of the values of the regression function at the values x_1, ..., x_k of its argument. Examples are one of these values, e.g. \gamma = f(x_1), the linear slope (or growth rate)

    \gamma = \sum_{i=1}^{k} f(x_i)(x_i - \bar{x}) / \sum_{i=1}^{k} (x_i - \bar{x})^2   with   \bar{x} = \frac{1}{k} \sum_{i=1}^{k} x_i,   (1.5)

or the approximated area under the curve

    \gamma = \frac{1}{2} \sum_{i=1}^{k-1} (f(x_{i+1}) + f(x_i))(x_{i+1} - x_i),   (1.6)

which is used to characterize rate and extent of drug absorption in pharmacokinetic studies. Alternatively, theoretical reasoning as well as experience may lead to several models which preliminarily seem to be equally adequate, so that the ultimate choice of a specific model among them is left open for the analysis of the data.
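As a concrete sketch of this setup, the following code fits a parametric model by the ordinary least squares criterion (1.3) and then evaluates the two functionals (1.5) and (1.6) at the fitted values. The Gompertz-type model, the simulated data and all starting values are invented for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares

# Simulated observations from a Gompertz-type sigmoid (hypothetical example).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
def f(x, t):
    return t[0] * np.exp(-np.exp(t[1] - t[2] * x))
y = f(x, [70.0, 2.0, 0.5]) + rng.normal(0.0, 0.5, x.size)

# OLSE: minimize the sum of squares (1.3).
fit = least_squares(lambda t: y - f(x, t), x0=[65.0, 1.8, 0.45])
theta = fit.x
S = np.sum(fit.fun ** 2)                 # residual sum of squares at the OLSE

# Derived parameters (1.4) as functionals of the fitted values f(x_i, theta):
fx = f(x, theta)
xbar = x.mean()
slope = np.sum(fx * (x - xbar)) / np.sum((x - xbar) ** 2)   # linear slope (1.5)
auc = 0.5 * np.sum((fx[1:] + fx[:-1]) * np.diff(x))         # trapezoidal area (1.6)
```

Both functionals are evaluated at the fitted curve rather than at the raw observations, which is exactly the model-based route discussed above.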
1.2 Examples
The following examples illustrate the above discussion by describing situations with different objectives in analyzing the data.

Example 1 (Pasture regrowth). Ratkowsky [12] considered data (see Table 1) describing the dependence of the pasture regrowth yield on the time since the last grazing. The objective of the analysis is to model this dependence. A careful investigation of a graph of the data (see Figure 1) suggested that the process produces a sigmoidally shaped curve, as is provided
Table 1: Data of yield of pasture regrowth versus time.

    Time after pasture:  9     14     21     28     42     57     63     70     79
    Yield:               8.93  10.80  18.59  22.33  39.35  56.11  61.73  64.62  67.08

Table 2: Sum of squared errors for the alternative sigmoidal models in Example 1. [The numerical entries of this table are not legible in the source.]
by the Weibull model

    f(x, \vartheta) = \vartheta_1 - \vartheta_2 \exp(-\vartheta_3 x^{\vartheta_4}).   (1.7)

But there are other models of sigmoidal form which could fit data from growth experiments as well, e.g.

    f(x, \vartheta) = \vartheta_1 \exp[\vartheta_2 / (x + \vartheta_3)]   (1.8)

or

    f(x, \vartheta) = \vartheta_1 \exp[-\exp(\vartheta_2 - \vartheta_3 x)]   (1.9)

(the three-parameter Gompertz model) or

    f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))   (1.10)

(the four-parameter logistic model). A listing and a description of even more alternative sigmoidal nonlinear models is given in Ratkowsky [13] (see also Seber and Wild [15] or Ross [14]). The regression curves corresponding to the alternative models (1.8), (1.9) and (1.10) fit the observations differently well, as may be seen from the corresponding values of the sum of squared errors (1.3) in Table 2 or from the plot of the fitted models in Figure 1. Notice that the fitted curves of models (1.9) and (1.10) are hardly distinguishable in Figure 1.

Figure 1: Pasture regrowth: observed yield and fitted regression curves (yield versus time after pasture).

Example 2 (Radioimmunological assay of cortisol: Calibration).

Table 3: Data for the calibration example (dose in ng/ml, response in counts per minute). [The numerical entries of this table are not legible in the source.]

In calibration
experiments one first takes a training (calibration) sample and fits a model to the data, providing a calibration curve. However, in contrast to Example 1, the real interest lies here in estimating an unknown value of the explanatory variable corresponding to a value of the response variable which is independent of the training sample and may easily be measured. This is usually done by inverting the calibration curve. We will illustrate the procedure by considering the radioimmunological assay (RIA) of cortisol. The corresponding data from the laboratoire de physiologie de la lactation (INRA) are reported in Huet, Bouvier, Gruet and Jolivet [11] and are reproduced in Table 3. The response variable is the quantity of a link complex of antibody and radioactive hormone, while the explanatory variable is the dose of some hormone. In practice the Richards (generalized logistic) model

    f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}   (1.11)

is used for describing the dependence between the response and the logarithm of the dose. Since the aim of the experiment is to estimate an unknown dose of hormone for a value y_0 of the response, we have to invert the fitted calibration curve in y_0, providing

    \hat{x}(y_0) = ( \hat{\vartheta}_3 - \ln[ (\hat{\vartheta}_2 / (y_0 - \hat{\vartheta}_1))^{1/\hat{\vartheta}_5} - 1 ] ) / \hat{\vartheta}_4.   (1.12)

In (1.12), \hat{\vartheta}_1, ..., \hat{\vartheta}_5 denote the least squares estimates of the parameters of model (1.11), and \hat{x}(y_0) is the estimated log-dose associated with the response y_0. Of course, formula (1.12) can only be applied if y_0 belongs to the range of model (1.11), that is, if y_0 \in (\hat{\vartheta}_1, \hat{\vartheta}_1 + \hat{\vartheta}_2). If y_0 lies outside of this range, then we have to modify the procedure, e.g. by taking the minimum or maximum value of the observed dose depending on whether y_0 is larger or smaller than all possible values of the model function f(x, \vartheta) (in case f(\cdot, \vartheta) is monotonically decreasing).
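The inversion step (1.12), including the boundary modification just described, can be sketched as follows. All parameter values are invented for illustration; they are not the estimates obtained for the cortisol data.

```python
import numpy as np

# Hypothetical estimates for f(x, t) = t1 + t2 / (1 + exp(t3 - t4*x))**t5.
t1, t2, t3, t4, t5 = 130.0, 2600.0, 3.1, -3.2, 0.6

def f(x):
    return t1 + t2 / (1.0 + np.exp(t3 - t4 * x)) ** t5

def x_hat(y0, lo=-3.0, hi=2.0):
    """Estimated log-dose (1.12) for a response y0, modified at the boundaries.

    Since t4 < 0 the fitted curve is decreasing, so responses above the model
    range (t1, t1 + t2) are mapped to the smallest observed log-dose and
    responses below it to the largest one.
    """
    if y0 >= t1 + t2:
        return lo
    if y0 <= t1:
        return hi
    inner = (t2 / (y0 - t1)) ** (1.0 / t5) - 1.0
    return float(np.clip((t3 - np.log(inner)) / t4, lo, hi))
```

A quick sanity check is the round trip x_hat(f(x)) = x for any x inside the observed log-dose range.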
Example 3 (Length versus age for dugongs: Estimation of growth rate). In many agricultural but also biological applications, growth curves are studied which for large values of the explanatory variable approach an asymptote (similarly to the sigmoidal curves), but lack an inflection point. We have reinvestigated a corresponding data set describing the length versus the age of dugongs (see Ratkowsky [12], p. 101, and Table 4), which had been examined by Ratkowsky [12] using the so-called asymptotic regression model (with various parameterizations)

    f(x, \vartheta) = \vartheta_1 - \vartheta_2 \vartheta_3^x.   (1.13)

Table 4: Length versus age of dugongs. [The numerical entries of this table are not legible in the source.]

In contrast to Ratkowsky we will suppose, however, that the objective of the analysis is to determine the growth rate \gamma (given by (1.5)) of the dugongs. An unbiased estimate of \gamma may be calculated without a model: \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) = 0.02896, where \bar{Y}_i := \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}, i = 1, ..., k. The use of a parametric model could lead to a more accurate estimate of \gamma, if the model gives a good approximation to the unknown true regression function.
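The model-free route just mentioned applies the slope functional (1.5) directly to the replicate means \bar{Y}_i. A minimal sketch follows; the ages and mean lengths are invented for illustration and are not the dugong data.

```python
import numpy as np

# Slope functional (1.5) applied to replicate means: an unbiased estimate of
# the growth rate gamma, obtained without fitting any regression model.
age = np.array([2.0, 5.0, 10.0, 20.0, 30.0])     # hypothetical x_i
ybar = np.array([1.8, 2.0, 2.3, 2.5, 2.6])       # hypothetical means Ybar_i
dev = age - age.mean()
gamma_hat = np.sum(ybar * dev) / np.sum(dev ** 2)
```

Because gamma_hat is linear in the means \bar{Y}_i, its unbiasedness requires no assumption on the form of f.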
1.3 A criterion for the model performance
The model fit may be judged, as in the example, 1) visually, from a graphical representation of the estimated regression curve together with the observations, or 2) from the numerical value of the sum of squares S(\hat{\vartheta}). The fit may be improved (even up to a vanishing S(\hat{\vartheta})!) by taking models with a large number of parameters. But it is intuitively clear that such "overparametrized" models f(x, \vartheta) will lead to large errors in estimating their parameters and consequently also to large errors f(x, \hat{\vartheta}) - f(x) in estimating the regression function f.
If the objective of the analysis is primarily the estimation of the regression function or curve, that is, of the values of the regression function itself over a region X of interest (and only secondarily the analysis of its properties and the estimation of some of its parameters), then the weighted cross-validation criterion

    C = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \hat{\vartheta}^{(ij)}))^2   (1.14)

is a convenient criterion characterizing the performance of the model f(x, \vartheta). Here \hat{\vartheta}^{(ij)} denotes the LSE calculated from the n - 1 observations left after deleting the observation (x_i, Y_{ij}). Its numerical calculation will be easy for well parameterized models using \hat{\vartheta} as a starting value. Further, we assume in (1.14) that the values x_r, ..., x_s of the independent variable are in X, while the other values are those (if any!) not contained in X. If the independent variable is univariate and its values x_i are ordered according to their magnitude, and if

    X = [a, b] is an interval with a < x_r < ... < x_s < b,   (1.15)

then we use weights

    c_i = d_i / (n_i (b - a)),   (1.16)

where d_i is the length of the part of [a, b] lying closer to x_i than to any other design point, i.e.

    d_i = (x_{i+1} - x_{i-1}) / 2   for r < i < s,   (1.17)

    d_r = (x_r + x_{r+1})/2 - a,   d_s = b - (x_{s-1} + x_s)/2.   (1.18)

If the user does not want to specify the interval [a, b], then a = x_1 and b = x_k should be the standard values. In the definition of the cross-validation criterion (1.14) we have introduced weights in order to take into account the distances between the different design points as well as the numbers of replications.
A more detailed reasoning for choosing the weights just as in (1.16) is given in Section 1.4 and in Bunke, Droge and Polzehl [6], where C is characterized as an estimate of the mean squared error in estimating the values of the regression function. If the values x_i are equidistant and all contained in the interval [a, b], and if there are no replications (n_i = 1), then the weights are identical: c_i = 1/k for i = 1, ..., k.

The criterion (1.14) will also be convenient if the estimated regression function is used to predict by f(x, \hat{\vartheta}) the future values Y^*(x) of the dependent variable for given values x in [a, b] of the explanatory variable, assuming that their distribution is represented to a certain extent by the design points x_1, ..., x_n. For some models the estimates
f(x_i, \hat{\vartheta}^{(ij)}) may not be defined for some i: e.g. in the exponential model (1.8) with \vartheta_1 > 0 and \vartheta_2 < 0, the value f(x, \vartheta) tends to 0 for x \downarrow x_0 = -\vartheta_3 (convergence from the right), while it tends to \infty for x \uparrow x_0 (convergence from the left). Such cases are not disturbing if in place of C we always use the following modified cross-validation criterion (full cross-validation, see Bunke, Droge and Polzehl [6] and Droge [10]):

    \tilde{C} = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \hat{\vartheta}^{[ij]}))^2,   (1.19)

where

    \tilde{Y}_i = f(x_i, \hat{\vartheta})   (1.20)

and where \hat{\vartheta}^{[ij]} is the OLSE calculated under the substitution of just the observation Y_{ij} by \tilde{Y}_i.
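The two criteria can be computed with a simple refitting loop. The sketch below is illustrative throughout: the two-parameter model and the data are invented, simple spacing-proportional weights stand in for the exact weights (1.16)-(1.18), and the full-data OLSE serves as warm start for every refit, as suggested in the text.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0.5, 5.0, 12)
def f(x, t):
    return t[0] * x / (t[1] + x)                 # invented two-parameter model
y = f(x, [10.0, 2.0]) + rng.normal(0.0, 0.2, x.size)

theta = least_squares(lambda t: y - f(x, t), x0=np.array([8.0, 1.0])).x
w = np.gradient(x) / (x[-1] - x[0])              # crude spacing-based weights

C = 0.0          # leave-one-out cross-validation, cf. (1.14)
Cfull = 0.0      # full cross-validation, cf. (1.19)-(1.20)
for i in range(x.size):
    keep = np.arange(x.size) != i
    # delete observation i, refit from the full-data OLSE:
    t_del = least_squares(lambda t: y[keep] - f(x[keep], t), theta).x
    C += w[i] * (y[i] - f(x[i], t_del)) ** 2
    # substitute observation i by its fitted value (1.20), refit on all n points:
    y_sub = y.copy()
    y_sub[i] = f(x[i], theta)
    t_sub = least_squares(lambda t: y_sub - f(x, t), theta).x
    Cfull += w[i] * (y[i] - f(x[i], t_sub)) ** 2
```

The full-cross-validation variant never evaluates the model at a design point that was absent from the fit, which is what makes it robust in the degenerate cases described above.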
1.4 Model selection procedure

The model selection could be done in three steps.
Step 1. List alternative models f_1(x, \vartheta_1), ..., f_M(x, \vartheta_M) of similar qualitative behaviour corresponding to theoretical or practical experience in the field of application, e.g. models with sigmoidal form as in our Example 1. The books of Ratkowsky [13] (but also of Seber and Wild [15] and Ross [14]) offer a rich selection of alternative models with one to five parameters for each possible type of qualitative behaviour: convex or concave curves that are continuously ascending or descending (without maxima, minima or inflection points); sigmoidally shaped curves, i.e. curves possessing inflection points but without maxima or minima (and possibly having asymptotes); curves with maxima and minima (and possibly one or more inflection points). Sometimes the same models are given with different parameterizations, that is, there are models which may be obtained from one another by substituting the parameters by functions of new parameters. Often it is indicated which parameterizations may be favorable with respect to having comparatively small parameter-effects curvature in the sense of Bates and Watts [1]. Such parameterizations could lead to numerical as well as inferential advantages and should therefore be used.

Step 2. Select among these models f_m (m = 1, ..., M) a model f_{\hat{m}} with smallest value of the corresponding cross-validation criterion C_m given by (1.14) or (1.19):

    C_{\hat{m}} = \min_{m = 1, ..., M} C_m.   (1.21)

Alternatively, one could select a model f_{\tilde{m}} which subjectively has an especially appealing form (e.g. a model with interpretable parameters, or a model with simple structure and/or few parameters) but with an otherwise small value of the cross-validation criterion. For this we may use the rule of thumb

    C_{\tilde{m}} \le (1 + 2/\sqrt{n}) C_{\hat{m}}   (1.22)
(see Examples 1, 2 and 3 treated in Subsection 1.7).

Step 3. The data analysis is then done with the estimated regression function

    f(x, \hat{\vartheta}_{\hat{m}}),   (1.23)

\hat{\vartheta}_{\hat{m}} being the OLSE of the parameter under the model f_{\hat{m}}. A parameter like (1.4) would be estimated by

    \hat{\gamma} = \gamma[f(x_1, \hat{\vartheta}_{\hat{m}}), ..., f(x_k, \hat{\vartheta}_{\hat{m}})].   (1.24)
As an alternative to the OLSE, in view of possibly (or most likely) heteroscedastic variances, the estimate in (1.23) or (1.24) could be chosen as a weighted least squares estimate (WLSE). It is defined to be the minimizer of

    S_w(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \vartheta))^2 / \hat{\sigma}_i^2   (1.25)

over \vartheta \in \Theta, where \hat{\sigma}_i^2 denotes a convenient estimate of the variance \sigma_i^2 (see Section 2).
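A minimal sketch of the WLSE (1.25) follows. The design, replication structure and variance pattern are invented for illustration; the intra-sample variance estimates (2.1) of Section 2 serve as the \hat{\sigma}_i^2.

```python
import numpy as np
from scipy.optimize import least_squares

# Weighted least squares (1.25) with replicate-based variance estimates.
rng = np.random.default_rng(2)
xu = np.arange(1.0, 6.0)                 # k = 5 design points
ni = 4                                   # n_i = 4 replicates each
x = np.repeat(xu, ni)
def f(x, t):
    return t[0] * (1.0 - np.exp(-t[1] * x))
y = f(x, [5.0, 0.7]) + rng.normal(0.0, np.repeat(0.1 * xu, ni))  # heteroscedastic

s2 = y.reshape(-1, ni).var(axis=1, ddof=1)    # intra-sample estimates (2.1)
w = np.repeat(1.0 / np.sqrt(s2), ni)          # residual weights 1/sigma-hat_i

wlse = least_squares(lambda t: w * (y - f(x, t)), x0=np.array([4.0, 0.5])).x
```

Scaling the residuals by 1/\hat{\sigma}_i makes the squared objective exactly the weighted sum (1.25).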
This may in some cases increase the precision, provided that the estimates \hat{\sigma}_i^2 are sufficiently accurate. But often the WLSE will be less reliable, especially when the differences in the variances \sigma_i^2 are not large. Because the regression functions and the variances are unknown, it is actually not known whether the OLSE or the WLSE is better. A choice between the OLSE and a WLSE may be performed with a criterion aiming at a maximal accuracy of estimation (or calibration) and is discussed in Section 2.

1.5 The performance of the selected model
The use of a criterion like (1.14) (or (1.19)) is justified by the fact that it estimates the weighted sum r_Y of actual squared prediction errors,

    r_Y = \sum_{i=r}^{s} \tilde{c}_i E_Y (Y^*(x_i) - f(x_i, \hat{\vartheta}))^2.   (1.26)

Here E_Y is the conditional expectation over future values Y^*(x_i) of the dependent variable (under the condition of fixed observations Y_{ij}), and \tilde{c}_i = n_i c_i. The weighted sum r_Y may be seen as an approximation to the integrated actual squared prediction error over the interval X = [a, b]:

    r_Y \approx \frac{1}{b-a} \int_a^b E_Y (Y^*(x) - f(x, \hat{\vartheta}))^2 \, dx.   (1.27)

The weighted sum r_Y is (up to a model-independent term \bar{\sigma}^2) equivalent to the mean squared error in estimating the regression function:

    \sum_{i=r}^{s} \tilde{c}_i (f(x_i) - f(x_i, \hat{\vartheta}))^2 \approx \frac{1}{b-a} \int_a^b (f(x) - f(x, \hat{\vartheta}))^2 \, dx.   (1.28)
If an estimate of the overall mean squared error

    \rho_Y = \sum_{i=r}^{s} \tilde{c}_i E_Y (Y^*(x_i) - f_{\hat{m}}(x_i, \hat{\vartheta}_{\hat{m}}))^2   (1.29)

connected with the selected model f_{\hat{m}} and the corresponding (possibly weighted) LSE is wanted, an estimate like double cross-validation has to be used. It takes into account the data dependence of the model choice:

    D = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f_{\hat{m}(ij)}(x_i, \hat{\vartheta}^{(ij)}_{\hat{m}(ij)}))^2.   (1.30)

Here the model \hat{m}(ij) and the (weighted) LSE \hat{\vartheta}^{(ij)}_{\hat{m}(ij)} of the parameter in this model are calculated, following the procedure described in Subsection 1.4, from the n - 1 observations left after deleting the observation (x_i, Y_{ij}). The calculation of (1.30) will be computationally expensive, because for each of the n = \sum_{i=r}^{s} n_i observations in (1.30) there must be calculated: (i) cross-validation values for all admitted models, that is, nM LSEs, and possibly (ii) estimates for the variances \sigma_i^2, which are needed for the calculation of the weighted LSEs. A less computationally intensive estimate of \rho_Y may be calculated by selecting randomly N (< n) observations Y_{i_r j_r} (r = 1, ..., N) among the n observations Y_{ij} and using the Monte Carlo approximation of (1.30),

    \tilde{D} = \frac{n}{N} \sum_{r=1}^{N} c_{i_r} (Y_{i_r j_r} - f_{\hat{m}(i_r j_r)}(x_{i_r}, \hat{\vartheta}^{(i_r j_r)}_{\hat{m}(i_r j_r)}))^2.   (1.31)
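The Monte Carlo variant can be sketched as follows: for each of N randomly chosen observations, the model choice itself is repeated on the remaining data before the deleted point is predicted. Polynomial candidates fitted by np.polyfit stand in for the nonlinear candidates, an unweighted criterion stands in for (1.30)-(1.31), and the data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

def cv(xs, ys, deg):
    """Unweighted leave-one-out criterion for a polynomial of degree deg."""
    err = 0.0
    for i in range(xs.size):
        keep = np.arange(xs.size) != i
        p = np.polyfit(xs[keep], ys[keep], deg)
        err += (ys[i] - np.polyval(p, xs[i])) ** 2
    return err / xs.size

N = 10
idx = rng.choice(x.size, size=N, replace=False)
D = 0.0
for i in idx:
    keep = np.arange(x.size) != i
    # re-run the model selection (step 2) on the data without observation i:
    best = min((1, 2), key=lambda d: cv(x[keep], y[keep], d))
    p = np.polyfit(x[keep], y[keep], best)
    D += (y[i] - np.polyval(p, x[i])) ** 2
D /= N
```

The inner re-selection is what distinguishes double cross-validation from (1.14): the prediction error charged to each deleted point includes the instability of the model choice itself.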
1.6 Alternative criteria

Cross-validation is an adequate criterion if prediction or regression function estimation is a primary objective of the data analysis. In different situations other criteria should be used. In a calibration problem as in Example 2, a modified cross-validation criterion could be adequate, assuming a calibration is demanded for observations Y of the dependent variable lying in an interval (\tilde{a}, \tilde{b}):

    CC = \frac{1}{N} \sum_{(i,j) \in J} (x_i - \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}))^2.   (1.32)

Here J contains all index pairs (i, j) with \tilde{a} < Y_{ij} < \tilde{b}, N is the number of observations Y_{ij} in the calibration interval (\tilde{a}, \tilde{b}), and the calibration function \hat{x}(y, \vartheta) is given by the inverse of f(x, \vartheta) as a function of x (see e.g. (1.12)). In case of a nonlinear model it may occur that the value of \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}) is not defined. In such cases, or if \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}) is outside of the interval [a, b] on which the regression function f(x, \vartheta) is considered, we use in the definition of the criterion the values a or b instead of \hat{x}, depending on the monotonicity and the value of f(x, \vartheta) (see Example 2 in Section 2).
If the objective is to estimate a parameter of the form (1.4), the corresponding criterion would be

    CG = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} | \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) - \gamma[f(x_1, \hat{\vartheta}^{(ij)}), ..., f(x_k, \hat{\vartheta}^{(ij)})] |,   (1.33)

where

    \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}.   (1.34)

(1.33) may be interpreted as a jackknife approximation to the mean absolute error for the estimate \hat{\gamma} (see Bunke, Droge and Polzehl [6]). The criterion (1.33) is only sensible if all replication sizes n_i are large, or otherwise if the parameter (1.4) involves weighted sums of many values f(x_i). This is the case for the linear slope (1.5) and for the area (1.6) if the number k of design points is not small. Modifications \tilde{CC} and \tilde{CG} in the sense of full cross-validation (see Subsection 1.3) may be defined using \hat{\vartheta}^{[ij]} in place of \hat{\vartheta}^{(ij)} in (1.32) and (1.33), respectively.
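The jackknife criterion (1.33) can be sketched with the area functional (1.6) as the parameter of interest. A straight line fitted by np.polyfit stands in for the nonlinear candidate model, and the replicated data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
xu = np.linspace(0.0, 4.0, 9)            # k = 9 design points
ni = 3                                   # n_i = 3 replicates each
x = np.repeat(xu, ni)
y = np.repeat(1.0 + 2.0 * xu, ni) + rng.normal(0.0, 0.1, x.size)

def auc(fx):
    """Trapezoidal area (1.6) over the design points."""
    return 0.5 * np.sum((fx[1:] + fx[:-1]) * np.diff(xu))

ybar = y.reshape(-1, ni).mean(axis=1)
gamma_free = auc(ybar)                   # model-free estimate from the means

CG = 0.0
for j in range(x.size):
    keep = np.arange(x.size) != j
    p = np.polyfit(x[keep], y[keep], 1)  # candidate model without observation j
    CG += abs(gamma_free - auc(np.polyval(p, xu)))
CG /= x.size
```

Each term compares the model-free estimate with the model-based one computed from a leave-one-out refit, which is the jackknife reading of (1.33).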
"Double resampling" criteria analogous to (1.30) or its Monte Carlo approximation (1.31) may be formulated corresponding to the "calibration selection criterion" (1.32) and the "estimation selection criterion" (1.33).

1.7 Applications of the model selection procedure
In this section we present the results of applying the model selection procedure to the problems introduced in Subsection 1.2, as well as to an additional one.

Example 1 (Pasture regrowth, continued). We reconsider the example of pasture regrowth yield, where its dependence on time is to be modelled. In Subsection 1.2 it was already found that the model candidates should be sigmoidally shaped, such as (1.7)-(1.11). However, the class of competing models could be enlarged by various other sigmoidal models given, for example, in Ratkowsky [13]. In addition to
the above mentioned ones we take the following, among them the three-parameter logistic model

    f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x)),   (1.36)

the Morgan-Mercer-Flodin model

    f(x, \vartheta) = (\vartheta_1 x^{\vartheta_2} + \vartheta_3 \vartheta_4) / (\vartheta_4 + x^{\vartheta_2}),   (1.41)

and the four-parameter Richards model

    f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x))^{1/\vartheta_4}.   (1.42)

[The display formulas of the further candidate models (1.35), (1.37)-(1.40), (1.43) and (1.44) are not legible in the source.]

Table 5: Cross-validation criteria and ranking of models for Example 1 (criteria (1.14) and (1.19), and the sum of squares (1.3)). [The numerical entries of this table are not legible in the source.]
For this example, the values of the cross-validation criterion (1.14) and the corresponding ranking of the different models are presented in Table 5. Values of the modified cross-validation criterion (1.19) and of the sum of squares (1.3) are given for comparison. Consequently, an application of the cross-validation approach to the pasture regrowth data would lead to the choice of the four-parameter Richards model (1.42). Furthermore, there is no other model fulfilling the rule of thumb (1.22). However, models (1.11) (which is an extension of (1.42)), (1.10) and (1.36) violate the condition (1.22) only slightly, so that one could also select one of these three models, e.g. the simple logistic model (1.36), having only three parameters, that is, one less than the models (1.42), (1.11) or (1.10). We remark that the models (1.35) and (1.40) are not flexible enough to provide reasonable fits to these data, but for other data sets the situation could, of course, be quite different. Figure 2 illustrates the behaviour of the best model (1.42) by plotting the observations and the fitted curve as well as the residuals.

Figure 2: Pasture regrowth: resulting plots for model (1.42), f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x))^{1/\vartheta_4}, with \hat{\vartheta}_1 = 69.623, \hat{\vartheta}_2 = 4.255, \hat{\vartheta}_3 = 0.089, \hat{\vartheta}_4 = 1.724, RSS = 0.6721, C = 1.476 (yield versus time after pasture).

Example 2 (Radioimmunological assay of cortisol, continued). We reconsider the example of the radioimmunological assay of cortisol.

Table 6: Cross-validation criteria and ranking of models for Example 2 (criteria (1.32), (1.14) and (1.33)). [The numerical entries of this table are not legible in the source.]

A plot of the response versus the
logarithm of the dose (with log(0) replaced by -3 and log(\infty) by 2) suggests again a sigmoidally shaped dependence, but now the curves should be monotonically decreasing (see Figure 3). Therefore we can use those models of our catalogue of competing models in Example 1 which are well defined for the given design, i.e. (1.9), (1.10), (1.11), (1.36), (1.42) and (1.43). We have excluded model (1.8), although in principle applicable, since it is not flexible enough to fit this data set well. The points \pm\infty have been excluded in the cross-validation criteria by choosing a = -2.99 and b = 1.99. The region of interest in the calibration criterion CC (see (1.32)) has been fixed by \tilde{a} = 200 and \tilde{b} = 2000. The results are summarized in Table 6, indicating that the Richards model (1.11) should be the first choice. Naturally, one could formulate a rule of thumb for the criterion (1.32) analogously to (1.22), say by considering all models f_{\tilde{m}} yielding values of (1.32) with CC_{\tilde{m}} \le (1 + 2/\sqrt{N}) CC_{\hat{m}}, where the model f_{\hat{m}} is that with minimal value of (1.32). Then the logistic model (1.10) would be the only alternative candidate for analyzing the data for calibration purposes, having a simpler structure and one parameter less than the Richards model (1.11). Note that an application of the cross-validation criterion (1.14) would lead to the same ranking of the best three models.

In pharmacokinetics, the area under the curve obtained by analyzing radioimmunological assays is used to characterize rate and extent of drug absorption. The integral with respect to the dose (recall that x_i is the logarithm of the dose) can be approximated
by a parameter as in (1.6),

    \gamma = \frac{1}{2} \sum_{i=1}^{k-1} (f(x_{i+1}) + f(x_i))(x_{i+1} - x_i).   (1.46)

Figure 3: RIA of cortisol: resulting plots for model (1.11), f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}, with \hat{\vartheta}_1 = 133.601, \hat{\vartheta}_2 = 2628.593, \hat{\vartheta}_3 = 3.129, \hat{\vartheta}_4 = -3.215, \hat{\vartheta}_5 = 0.622, RSS = 2723, C = 1565, calibration criterion (CC): 0.0307 (response versus log-dose, with residual plot).
If this parameter is of interest, for instance as a measure to compare different experiments, the cross-validation criterion (1.33) would again suggest to use model (1.11).

Figure 3 shows the fit of the best model (1.11) to the data and presents the corresponding residual plot. This indicates that the error variances are probably heterogeneous. Therefore, one could try to estimate the error variances, for example on the basis of the replicated observations, and to fit the model to the data by the weighted least squares criterion (1.25). However, in calibration problems it seems to be important to approximate the unknown regression function with high accuracy in particular in regions where it is flat, that is, where it has a small derivative. This would suggest the use of a weighted least squares criterion different from (1.25). To avoid such a discussion here, we have used the ordinary nonlinear least squares approach, while a comparison of different weighted least squares estimates in this example is left to Section 2. In order to estimate the approximated area under the curve (1.46) it would be important to approximate f more accurately where the spacing x_{i+1} - x_i is large. This will be reflected in the analyses of Example 2 in Section 3.

Example 3 (Estimation of growth rate, continued). In addition to the model (1.13) we fitted almost all (i.e. more than 20) concave models of chapter 4 in Ratkowsky [13] with one to four parameters to the data. We obtained without numerical difficulties the LSEs and the values of the model choice criteria for many of these models. [The display formulas of these candidate models (1.47)-(1.61) are not legible in the source.]
Table 7 contains the results. The model with minimal value of the criterion (1.33) (for \gamma given by (1.5)) is (1.13). This model ranks only eighth for prediction purposes, i.e. by the cross-validation criterion (1.14), but still fulfils the rule of thumb (1.22). On the other hand, model (1.47), minimizing criterion (1.14), ranks only eleventh with respect to criterion (1.33) and exceeds the minimal value of that criterion even by more than 80 per cent. Notice that the models providing a value of \hat{\gamma} closest to the unbiased estimate \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) = 0.02896 rank best for criterion (1.33).

Table 7: Cross-validation criteria and ranking of models for Example 3 (criteria (1.14) and (1.33)). [The numerical entries of this table are not legible in the source.]

Figure 4 displays the results for the best model (1.13) according to criterion (1.33).

Figure 4: Estimation of growth rate: results for model (1.13) and \gamma given by (1.5); f(x, \vartheta) = \vartheta_1 - \vartheta_2 \vartheta_3^x, with \hat{\vartheta}_1 = 2.67, \hat{\vartheta}_2 = 0.973, \hat{\vartheta}_3 = 0.873, RSS = 0.006917, \hat{\gamma} = 0.0258, CG = 0.0004637, C = 0.009434 (length versus age).
Example 4 (Bean root cells - simulated data). This example aims at showing the importance of taking into account the intended use of analyzing the data when an appropriate model is to be selected. The growth of bean root cells is a microscopic vegetative process where the dependence of the water content on the distance from the growing tip is of interest. A data set of size 15 has been used by Ratkowsky [12] as an illustrative example, and as in Example 1 it can be seen that the process produces a sigmoidally shaped growth curve. Ratkowsky [12] considered five competing models (all of them are among our candidates), but without arriving at a convenient model selection. Applying the cross-validation criterion to the data would suggest the use of model (1.10). We generated a data set (see Table 8) of size 50 which mimics the original data as follows: After transforming the x-data to the interval (0,1), the model (1.11), which is an extension of the best model (1.10), was fitted to the data. The resulting homogeneous variance estimate \hat{\sigma}^2 = 0.868 was used to simulate 50 pseudo-random normally distributed variables with mean 0 and this variance. For 50 equidistant x-values on (0,1), the corresponding y-values were obtained by adding the simulated "errors" to the fitted curve.
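This simulation scheme can be sketched as follows. The sigmoid below is an invented stand-in for the fitted model (1.11); only the variance 0.868 and the design of 50 equidistant points on (0, 1) are taken from the text.

```python
import numpy as np

# Example 4's scheme: homoscedastic normal errors added to a fitted curve.
rng = np.random.default_rng(5)

def fitted(x):
    # hypothetical fitted sigmoid, NOT the actual estimates of the paper
    return 1.0 + 21.0 / (1.0 + np.exp(4.0 - 9.0 * x))

x_sim = (np.arange(50) + 0.5) / 50.0          # 50 equidistant points in (0, 1)
y_sim = fitted(x_sim) + rng.normal(0.0, np.sqrt(0.868), 50)
```

Generating from a fitted curve plus estimated error variance makes the "true" model known in the simulation, so the behaviour of the selection criteria can be checked against it.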
Table 8: Simulated data of Example 4. [The numerical entries of this table are not legible in the source.]

Table 9: Cross-validation criteria and ranking of models for Example 4 (criteria (1.19), (1.32) and (1.33)). [The numerical entries of this table are not legible in the source.]
We have used the same catalogue of models as in Example 1. Some of the models turned out to be not flexible enough, leading to numerical problems. The results for the different model selection criteria are contained in Table 9, showing the values of the full cross-validation criterion (1.19) as well as the corresponding ranking of the models. We have checked that the values of the cross-validation criterion (1.14) (and also the ranking of the models) differ only slightly from the values of the full cross-validation criterion (1.19). Cross-validation favors the Morgan-Mercer-Flodin model (1.41) with four parameters, whereas the five-parameter model (1.11), assumed to be the true regression model in the simulated experiment, ranks second, but with nearly the same value of the criterion. Figure 5 shows how model (1.41) fits the data and presents a plot of the resulting residuals.

Figure 5: Simulated data of Example 4: results for model (1.41), f(x, \vartheta) = (\vartheta_1 x^{\vartheta_2} + \vartheta_3 \vartheta_4) / (\vartheta_4 + x^{\vartheta_2}), with \hat{\vartheta}_1 = 22.035, \hat{\vartheta}_2 = 4.479, \hat{\vartheta}_3 = 1.569, \hat{\vartheta}_4 = 0.024, RSS = 0.7639, C = 0.901.
Although in the present example there is no interest in calibration, for the sake of comparison between the criteria we report the values of the criteria (1.32) and (1.33) and the corresponding rankings of the models in Table 9, too. In case of (1.33) the parameter of interest was the growth rate (1.5). The ranking of the models for calibration purposes is quite different from that in the case where prediction or regression function estimation was the primary objective of the data analysis. However, the Morgan-Mercer-Flodin model (1.41) is also the best for calibration purposes.

If the estimation of the growth rate (1.5) were the objective of the data analysis, then the ranking of the models would be completely different from those in the other cases. The first choice would be the logistic model (1.10), which behaves slightly better than the Weibull model (1.7). But obviously three of the models are not appropriate in this case. Even the best model for prediction and calibration purposes slightly violates a rule of thumb accepting models f_{\tilde{m}} with

    CG_{\tilde{m}} \le (1 + 2/\sqrt{n}) CG_{\hat{m}},   (1.45)

which is defined in analogy to (1.22).
2 Selection of variance models and variance estimation

2.1 Selection and fitting of variance models
The observation variances $\sigma_i^2$ may be estimated by the intra-sample estimates
$$
s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_{i\cdot} \right)^2
\qquad (2.1)
$$
if there are enough replications ($n_i$ relatively large!). In such an exceptional situation these estimates may be used for calculating weighted LSE. An improved variance estimation may be possible using alternative variance models, taking into consideration that it is possibly not sure that a certain variance model is adequate and, moreover, not even sure that a certain structure of the regression function is adequate. Our procedure (see Bunke, Droge and Polzehl [6]) is based on a least squares fitting of alternative variance models to conveniently defined "observations" $z_1, \ldots, z_k$. Here the knowledge of adequate regression and variance models (and normality of the observations) is not necessary. The "observations" $z_i$ are defined in such a way that they have (roughly) the variances $\sigma_i^2$ as their expectation, as is exactly the case for the estimates $s_i^2$ given by (2.1). Assuming ordered univariate values $x_1 < x_2 < \cdots < x_k$ of the independent variable, we use the "observations"
$$
z_i := \begin{cases}
s_i^2 & \text{if } n_i \ge 2 \text{ (replications)},\\
-\hat e_i \left( \tfrac12 (\hat e_{i-1} + \hat e_{i+1}) - \hat e_i \right) & \text{if } n_i = 1 \text{ (no replications)}.
\end{cases}
\qquad (2.2)
$$

Here we use the residuals
$$
\hat e_i := y_i - f_{\hat m}(x_i, \hat\vartheta_{\hat m})
\qquad (2.3)
$$
in employing the (best fitting) model $f_{\hat m}$. This model is chosen among the admitted models $f_m$ as that with the smallest sum $\hat R_m$ of squared errors (see (1.3)). In the case of $i = 1$ and $i = k$ we use
$$
z_1 = -\hat e_1 (\hat e_2 - \hat e_1)
\quad \text{and} \quad
z_k = -\hat e_k (\hat e_{k-1} - \hat e_k).
\qquad (2.4)
$$
If the independent variable $x$ is multivariate, then in the case of $n_i = 1$ we may use
$$
z_i = -\hat e_i \left( \hat e_{j(i)} - \hat e_i \right),
\qquad (2.5)
$$
where the residual $\hat e_{j(i)}$ corresponds to the value $x_{j(i)}$ nearest to $x_i$:
$$
\| x_{j(i)} - x_i \| = \min_{j \neq i} \| x_j - x_i \|.
\qquad (2.6)
$$
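The construction of such "observations" can be sketched in Python. This is a minimal illustration, assuming the intra-sample variance $s_i^2$ at replicated design points and a neighbouring-residual product at unreplicated points; it is one possible construction in the spirit of the definitions above, not the authors' Splus code.

```python
import numpy as np

def pseudo_observations(groups, residuals):
    """Variance-model "observations" z_i: the intra-sample variance s_i^2
    at design points with replications, and a neighbouring-residual
    product at design points without replications (illustrative form)."""
    k = len(groups)
    z = np.empty(k)
    for i, y_rep in enumerate(groups):
        if len(y_rep) >= 2:
            z[i] = np.var(y_rep, ddof=1)  # s_i^2 as in (2.1)
        else:
            # unreplicated point: combine its residual with a neighbour's
            j = i + 1 if i + 1 < k else i - 1
            z[i] = -residuals[i] * (residuals[j] - residuals[i])
    return z

# toy example: three ordered design points, the middle one unreplicated
groups = [[1.0, 1.2], [0.9], [1.1, 0.8, 1.0]]
residuals = np.array([0.1, -0.2, 0.05])  # residuals from a fitted model
z = pseudo_observations(groups, residuals)
```

The replicated points contribute their ordinary sample variances, while the unreplicated point contributes a product of residuals whose expectation is roughly the local variance.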
(If there are several such values, we take the value $x_{j(i)}$ with smallest index $j(i)$.) We fit alternative variance models of the form $g(x, \alpha)$ to the above "observations" by minimization of the sum of squares
$$
Q(\alpha) = \sum_{i=1}^{k} n_i \left\{ z_i - g(x_i, \alpha) \right\}^2.
\qquad (2.7)
$$
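The weighted least-squares fit of a variance model to the "observations" can be sketched as follows. This is an illustrative Python version (the paper's programs are in Splus); the function names, the placeholder regression function and the power-type variance model below are our own choices, standing in for the catalogue of models that follows.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_variance_model(g, alpha0, x, z, n):
    """Minimize Q(alpha) = sum_i n_i * (z_i - g(x_i, alpha))**2
    by weighted least squares (weights sqrt(n_i))."""
    def weighted_resid(alpha):
        return np.sqrt(n) * (z - g(x, alpha))
    res = least_squares(weighted_resid, alpha0)
    return res.x, float(np.sum(weighted_resid(res.x) ** 2))  # alpha_hat, Q

# example: power-of-the-mean variance model g(x, alpha) = a0 * |f(x)|**a1
f = lambda x: 1.0 + x               # placeholder fitted regression function
g = lambda x, a: a[0] * np.abs(f(x)) ** a[1]
x = np.linspace(0.0, 1.0, 20)
n = np.ones(20)                     # one observation per design point
z = 0.5 * (1.0 + x) ** 2            # toy "observations" with exact power structure
alpha_hat, Q = fit_variance_model(g, np.array([1.0, 1.0]), x, z, n)
```

Since the toy "observations" follow the assumed power structure exactly, the minimizer recovers the generating parameters and $Q$ vanishes up to numerical precision.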
We propose six alternative variance models, which are especially useful and have been proposed in the literature, see Carroll and Ruppert [8]:

(1) The first is the exponential model
$$
g(x, \alpha) = \alpha_1 \left[ f_{m_0}(x, \hat\vartheta_{m_0}) + a \right]^{\alpha_2},
\qquad (2.8)
$$
where we use the model $f_{m_0}$ ($m_0 = \hat m$ or $\bar m$) chosen by the procedure described in Subsection 1.4. The constant $a$ is chosen as $a := 0.1$.

(2) As an alternative model in place of (2.8) one may fit the model
$$
g(x, \alpha) = \alpha_1 \exp\left[ \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}) \right].
\qquad (2.9)
$$
(3) Past experience (or the residuals after fitting by ordinary least squares) may suggest that the variance $\sigma_i^2$ does not vary monotonously with the mean $f(x_i)$, but behaves possibly approximately like a unimodal function of $f(x_i)$, or like the reverse of such a behaviour. Then we could alternatively fit a quadratic variance model
$$
g(x, \alpha) = \alpha_1 + \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}) + \alpha_3 \left[ f_{m_0}(x, \hat\vartheta_{m_0}) \right]^2.
\qquad (2.10)
$$

(4) A bell-shaped variance model is
$$
g(x, \alpha) = \alpha_1 \exp\left[ -\alpha_2 \left( \tilde f_{m_0}(x) - \alpha_3 \right)^2 \right],
\qquad (2.11)
$$
where
$$
\tilde f_{m_0}(x) := \frac{ f_{m_0}(x, \hat\vartheta_{m_0}) - f_{\min} }{ f_{\max} - f_{\min} },
\qquad
f_{\min} = \min_i f_{m_0}(x_i, \hat\vartheta_{m_0}),
\quad
f_{\max} = \max_i f_{m_0}(x_i, \hat\vartheta_{m_0}).
$$

(5) The model
$$
g(x, \alpha) = \alpha_1 f_{m_0}(x, \hat\vartheta_{m_0})
\qquad (2.12)
$$
has fewer parameters and may be useful, e.g., for count data. Sometimes also a simple linear model may be useful:
$$
g(x, \alpha) = \alpha_1 + \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}).
\qquad (2.13)
$$
(6) In many cases a homogeneous variance estimate $\hat\sigma^2$ would be more accurate than heteroscedastic estimates $\hat\sigma_i^2$ determined by a variance model, especially when the differences between the variances $\sigma_i^2$ are moderate or small. Thus we would fit a constant model $g(x, \alpha) = \alpha$ to our observations and obtain
$$
\hat\sigma^2 = \frac{ \sum_{i=1}^{k} n_i z_i }{ \sum_{i=1}^{k} n_i - q },
\qquad (2.14)
$$
where $q$ is the number of points $x_i$ with $n_i \ge 2$. Unfortunately, some of the six models have the disadvantage of possibly leading to negative estimates $\hat\sigma_i^2 := g(x_i, \hat\alpha)$ for some design points $x_i$. We replace the negative (and also very small) estimates by some fixed small positive value, say by
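The replacement of negative (and very small) fitted variances by a fixed small positive value can be sketched as follows; the floor value 1e-6 used here is an arbitrary illustration, not the authors' choice.

```python
import numpy as np

def truncate_variances(sigma2, floor=1e-6):
    """Replace negative (and very small) fitted variance estimates
    sigma_i^2 = g(x_i, alpha_hat) by a fixed small positive value."""
    sigma2 = np.asarray(sigma2, dtype=float)
    return np.where(sigma2 < floor, floor, sigma2)
```

This keeps the fitted variance function usable as a set of weights in a subsequent weighted least-squares step.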