Model Selection, Transformations and Variance Estimation in Nonlinear Regression

Olaf Bunke 1, Bernd Droge 1 and Jörg Polzehl 2

1 Institut für Mathematik, Humboldt-Universität zu Berlin, PSF 1297, D-10099 Berlin, Germany
2 Konrad-Zuse-Zentrum für Informationstechnik, Heilbronner Str. 10, D-10711 Berlin, Germany

Abstract. The results of analyzing experimental data using a parametric model may heavily depend on the chosen model. In this paper we propose procedures for the adequate selection of nonlinear regression models if the intended use of the model is among the following: 1. prediction of future values of the response variable, 2. estimation of the unknown regression function, 3. calibration, or 4. estimation of some parameter with a certain meaning in the corresponding field of application. Moreover, we propose procedures for variance modelling and for selecting an appropriate nonlinear transformation of the observations which may lead to an improved accuracy. We show how to assess the accuracy of the parameter estimators by a "moment oriented bootstrap procedure". This procedure may also be used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated by examples.

Key words: Nonlinear regression, model selection, bootstrap, cross-validation, variable transformation, variance modelling, calibration, mean squared error for prediction, computing in nonlinear regression.

AMS 1991 subject classifications: 62J99, 62J02, 62P10.
1 Selection of regression models

1.1 Preliminary discussion
In many papers and books it is discussed how to analyse experimental data by estimating the parameters in a linear or nonlinear regression model, see e.g. Bunke and Bunke [2], [3], Seber and Wild [15] or Huet, Bouvier, Gruet and Jolivet [11]. The usual situation is that it is not known whether a certain regression model describes the unknown true regression function sufficiently well. The results of the statistical analysis may depend heavily on the chosen model. Therefore there should be a careful selection of the model, based on the scientific and practical experience in the corresponding field of application and on statistical procedures. Moreover, a nonlinear transformation of the observations or an appropriate model of the variance structure can lead to an improved accuracy.

In this paper we propose procedures for the adequate selection of regression models. This includes choosing suitable variance models and transformations. Additionally we present methods to assess the accuracy of estimates, calibrations and predictions based on the selected model.

The general framework of this paper will be described now. We consider possibly replicated observations of a response variable Y at fixed values of explanatory variables (or nonrandom design points) which follow the model

    Y_{ij} = f(x_i) + e_{ij},   i = 1, ..., k,   j = 1, ..., n_i,   \sum_{i=1}^{k} n_i = n.   (1.1)

In (1.1) the regression function f is unknown and real-valued, and the e_{ij} are assumed to be uncorrelated random errors with zero mean and positive variances \sigma_i^2 = \sigma^2(x_i).
The usual assumption of a homogeneous error variance \sigma_i^2 = \sigma^2 is unrealistic in many applications, and one is often confronted with heteroscedasticity problems.

The analysis of the data requires in general to estimate the regression function which describes the dependence of the response variable on the explanatory variables. This is usually done by assuming that this dependence may be described by a parametric model

    { f(\cdot, \vartheta) | \vartheta \in \Theta },   (1.2)

where the function f(\cdot, \vartheta) is known up to a p-dimensional parameter \vartheta \in \Theta \subseteq R^p, so that the problem reduces to estimating this parameter from the data. As an estimate of the parameter we will use the ordinary least squares estimator (OLSE) \hat{\vartheta}, which is the minimizer of the sum of squares

    S(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \vartheta))^2   (1.3)

with respect to \vartheta \in \Theta. In practice this is the most popular approach to estimating \vartheta. Weighted LSE will be discussed later in Section 2. To simplify the presentation, in what follows we will preliminarily restrict ourselves to the case of real-valued design points, that is, we will assume to have only one explanatory variable. Note that the model (1.2) is called linear if it depends linearly on \vartheta; otherwise it is called nonlinear. Our focus will be on the latter case.

A starting point in an approach to model selection is the idea that, even if a certain regression model is believed to be convenient for given experimental data, either because of theoretical reasoning or based on past experience with similar data, there is seldom sure evidence for the validity of a model of the form (1.2). Therefore a possible modification of the model could lead to a better fit or to more accurate estimates of the interesting parameters. A parameter with a certain physical or biological meaning may often be represented in different regression models as a function of the corresponding parameters. This will be the case if the parameter \gamma of interest is a function
    \gamma = \gamma[f(x_1), ..., f(x_k)]   (1.4)

of the values of the regression function at the values x_1, ..., x_k of its argument. Examples are one of these values, e.g. \gamma = f(x_1), the linear slope (or growth rate)

    \gamma = \sum_{i=1}^{k} f(x_i)(x_i - \bar{x}) / \sum_{i=1}^{k} (x_i - \bar{x})^2   with   \bar{x} = \frac{1}{k} \sum_{i=1}^{k} x_i,   (1.5)

or the approximated area under the curve

    \gamma = \frac{1}{2} \sum_{i=1}^{k-1} (f(x_{i+1}) + f(x_i))(x_{i+1} - x_i),   (1.6)

which is used to characterize rate and extent of drug absorption in pharmacokinetic studies. Alternatively, theoretical reasoning as well as experience may lead to several models which preliminarily seem to be equally adequate, so that the ultimate choice of a specific model among them is left open for the analysis of the data.
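As a concrete sketch of this setup, the following code fits a parametric model by the ordinary least squares criterion (1.3) and then evaluates the two functionals (1.5) and (1.6) at the fitted values. The Gompertz-type model, the simulated data and all starting values are invented for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares

# Simulated observations from a Gompertz-type sigmoid (hypothetical example).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
def f(x, t):
    return t[0] * np.exp(-np.exp(t[1] - t[2] * x))
y = f(x, [70.0, 2.0, 0.5]) + rng.normal(0.0, 0.5, x.size)

# OLSE: minimize the sum of squares (1.3).
fit = least_squares(lambda t: y - f(x, t), x0=[65.0, 1.8, 0.45])
theta = fit.x
S = np.sum(fit.fun ** 2)                 # residual sum of squares at the OLSE

# Derived parameters (1.4) as functionals of the fitted values f(x_i, theta):
fx = f(x, theta)
xbar = x.mean()
slope = np.sum(fx * (x - xbar)) / np.sum((x - xbar) ** 2)   # linear slope (1.5)
auc = 0.5 * np.sum((fx[1:] + fx[:-1]) * np.diff(x))         # trapezoidal area (1.6)
```

Both functionals are evaluated at the fitted curve rather than at the raw observations, which is exactly the model-based route discussed above.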
1.2 Examples
The following examples illustrate the above discussion by describing situations with different objectives in analyzing the data.

Example 1 (Pasture regrowth). Ratkowsky [12] considered data (see Table 1) describing the dependence of the pasture regrowth yield on the time since the last grazing. The objective of the analysis is to model this dependence. A careful investigation of a graph of the data (see Figure 1) suggested that the process produces a sigmoidally shaped curve, as is provided
Table 1: Data of yield of pasture regrowth versus time.

    Time after pasture:  9     14     21     28     42     57     63     70     79
    Yield:               8.93  10.80  18.59  22.33  39.35  56.11  61.73  64.62  67.08

Table 2: Sum of squared errors for the alternative sigmoidal models in Example 1. [The numerical entries of this table are not legible in the source.]
by the Weibull model

    f(x, \vartheta) = \vartheta_1 - \vartheta_2 \exp(-\vartheta_3 x^{\vartheta_4}).   (1.7)

But there are other models of sigmoidal form which could fit data from growth experiments as well, e.g.

    f(x, \vartheta) = \vartheta_1 \exp[\vartheta_2 / (x + \vartheta_3)]   (1.8)

or

    f(x, \vartheta) = \vartheta_1 \exp[-\exp(\vartheta_2 - \vartheta_3 x)]   (1.9)

(the three-parameter Gompertz model) or

    f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))   (1.10)

(the four-parameter logistic model). A listing and a description of even more alternative sigmoidal nonlinear models is given in Ratkowsky [13] (see also Seber and Wild [15] or Ross [14]). The regression curves corresponding to the alternative models (1.8), (1.9) and (1.10) fit the observations differently well, as may be seen from the corresponding values of the sum of squared errors (1.3) in Table 2 or from the plot of the fitted models in Figure 1. Notice that the fitted curves of models (1.9) and (1.10) are hardly distinguishable in Figure 1.

Figure 1: Pasture regrowth: observed yield and fitted regression curves (yield versus time after pasture).

Example 2 (Radioimmunological assay of cortisol: Calibration).

Table 3: Data for the calibration example (dose in ng/ml, response in counts per minute). [The numerical entries of this table are not legible in the source.]

In calibration
experiments one first takes a training (calibration) sample and fits a model to the data, providing a calibration curve. However, in contrast to Example 1, the real interest lies here in estimating an unknown value of the explanatory variable corresponding to a value of the response variable which is independent of the training sample and may easily be measured. This is usually done by inverting the calibration curve. We will illustrate the procedure by considering the radioimmunological assay (RIA) of cortisol. The corresponding data from the laboratoire de physiologie de la lactation (INRA) are reported in Huet, Bouvier, Gruet and Jolivet [11] and are reproduced in Table 3. The response variable is the quantity of a link complex of antibody and radioactive hormone, while the explanatory variable is the dose of some hormone. In practice the Richards (generalized logistic) model

    f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}   (1.11)

is used for describing the dependence between the response and the logarithm of the dose. Since the aim of the experiment is to estimate an unknown dose of hormone for a value y_0 of the response, we have to invert the fitted calibration curve in y_0, providing

    \hat{x}(y_0) = ( \hat{\vartheta}_3 - \ln[ (\hat{\vartheta}_2 / (y_0 - \hat{\vartheta}_1))^{1/\hat{\vartheta}_5} - 1 ] ) / \hat{\vartheta}_4.   (1.12)

In (1.12), \hat{\vartheta}_1, ..., \hat{\vartheta}_5 denote the least squares estimates of the parameters of model (1.11), and \hat{x}(y_0) is the estimated log-dose associated with the response y_0. Of course, formula (1.12) can only be applied if y_0 belongs to the range of model (1.11), that is, if y_0 \in (\hat{\vartheta}_1, \hat{\vartheta}_1 + \hat{\vartheta}_2). If y_0 lies outside of this range, then we have to modify the procedure, e.g. by taking the minimum or maximum value of the observed dose depending on whether y_0 is larger or smaller than all possible values of the model function f(x, \vartheta) (in case f(\cdot, \vartheta) is monotonically decreasing).
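The inversion step (1.12), including the boundary modification just described, can be sketched as follows. All parameter values are invented for illustration; they are not the estimates obtained for the cortisol data.

```python
import numpy as np

# Hypothetical estimates for f(x, t) = t1 + t2 / (1 + exp(t3 - t4*x))**t5.
t1, t2, t3, t4, t5 = 130.0, 2600.0, 3.1, -3.2, 0.6

def f(x):
    return t1 + t2 / (1.0 + np.exp(t3 - t4 * x)) ** t5

def x_hat(y0, lo=-3.0, hi=2.0):
    """Estimated log-dose (1.12) for a response y0, modified at the boundaries.

    Since t4 < 0 the fitted curve is decreasing, so responses above the model
    range (t1, t1 + t2) are mapped to the smallest observed log-dose and
    responses below it to the largest one.
    """
    if y0 >= t1 + t2:
        return lo
    if y0 <= t1:
        return hi
    inner = (t2 / (y0 - t1)) ** (1.0 / t5) - 1.0
    return float(np.clip((t3 - np.log(inner)) / t4, lo, hi))
```

A quick sanity check is the round trip x_hat(f(x)) = x for any x inside the observed log-dose range.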
Example 3 (Length versus age for dugongs: Estimation of growth rate). In many agricultural but also biological applications, growth curves are studied which for large values of the explanatory variable approach an asymptote (similarly to the sigmoidal curves), but lack an inflection point. We have reinvestigated a corresponding data set describing the length versus the age of dugongs (see Ratkowsky [12], p. 101, and Table 4), which had been examined by Ratkowsky [12] using the so-called asymptotic regression model (with various parameterizations)

    f(x, \vartheta) = \vartheta_1 - \vartheta_2 \vartheta_3^x.   (1.13)

Table 4: Length versus age of dugongs. [The numerical entries of this table are not legible in the source.]

In contrast to Ratkowsky we will suppose, however, that the objective of the analysis is to determine the growth rate \gamma (given by (1.5)) of the dugongs. An unbiased estimate of \gamma may be calculated without a model: \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) = 0.02896, where \bar{Y}_i := \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}, i = 1, ..., k. The use of a parametric model could lead to a more accurate estimate of \gamma, if the model gives a good approximation to the unknown true regression function.
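The model-free route just mentioned applies the slope functional (1.5) directly to the replicate means \bar{Y}_i. A minimal sketch follows; the ages and mean lengths are invented for illustration and are not the dugong data.

```python
import numpy as np

# Slope functional (1.5) applied to replicate means: an unbiased estimate of
# the growth rate gamma, obtained without fitting any regression model.
age = np.array([2.0, 5.0, 10.0, 20.0, 30.0])     # hypothetical x_i
ybar = np.array([1.8, 2.0, 2.3, 2.5, 2.6])       # hypothetical means Ybar_i
dev = age - age.mean()
gamma_hat = np.sum(ybar * dev) / np.sum(dev ** 2)
```

Because gamma_hat is linear in the means \bar{Y}_i, its unbiasedness requires no assumption on the form of f.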
1.3 A criterion for the model performance
The model fit may be judged, as in the example, 1) visually, from a graphical representation of the estimated regression curve together with the observations, or 2) from the numerical value of the sum of squares S(\hat{\vartheta}). The fit may be improved (even up to a vanishing S(\hat{\vartheta})!) by taking models with a large number of parameters. But it is intuitively clear that such "overparametrized" models f(x, \vartheta) will lead to large errors in estimating their parameters and consequently also to large errors f(x, \hat{\vartheta}) - f(x) in estimating the regression function f.
If the objective of the analysis is primarily the estimation of the regression function or curve, that is, of the values of the regression function itself over a region X of interest (and only secondarily the analysis of its properties and the estimation of some of its parameters), then the weighted cross-validation criterion

    C = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \hat{\vartheta}^{(ij)}))^2   (1.14)

is a convenient criterion characterizing the performance of the model f(x, \vartheta). Here \hat{\vartheta}^{(ij)} denotes the LSE calculated from the n - 1 observations left after deleting the observation (x_i, Y_{ij}). Its numerical calculation will be easy for well parameterized models using \hat{\vartheta} as a starting value. Further, we assume in (1.14) that the values x_r, ..., x_s of the independent variable are in X, while the other values are those (if any!) not contained in X. If the independent variable is univariate and its values x_i are ordered according to their magnitude, and if

    X = [a, b] is an interval with a < x_r < ... < x_s < b,   (1.15)

then we use weights

    c_i = d_i / (n_i (b - a)),   (1.16)

where d_i is the length of the part of [a, b] lying closer to x_i than to any other design point, i.e.

    d_i = (x_{i+1} - x_{i-1}) / 2   for r < i < s,   (1.17)

    d_r = (x_r + x_{r+1})/2 - a,   d_s = b - (x_{s-1} + x_s)/2.   (1.18)

If the user does not want to specify the interval [a, b], then a = x_1 and b = x_k should be the standard values. In the definition of the cross-validation criterion (1.14) we have introduced weights in order to take into account the distances between the different design points as well as the numbers of replications.
A more detailed reasoning for choosing the weights just as in (1.16) is given in Section 1.4 and in Bunke, Droge and Polzehl [6], where C is characterized as an estimate of the mean squared error in estimating the values of the regression function. If the values x_i are equidistant and all contained in the interval [a, b], and if there are no replications (n_i = 1), then the weights are identical: c_i = 1/k for i = 1, ..., k.

The criterion (1.14) will also be convenient if the estimated regression function is used to predict by f(x, \hat{\vartheta}) the future values Y^*(x) of the dependent variable for given values x in [a, b] of the explanatory variable, assuming that their distribution is represented to a certain extent by the design points x_1, ..., x_n. For some models the estimates
f(x_i, \hat{\vartheta}^{(ij)}) may not be defined for some i: e.g. in the exponential model (1.8) with \vartheta_1 > 0 and \vartheta_2 < 0, the value f(x, \vartheta) tends to 0 for x \downarrow x_0 = -\vartheta_3 (convergence from the right), while it tends to \infty for x \uparrow x_0 (convergence from the left). Such cases are not disturbing if in place of C we always use the following modified cross-validation criterion (full cross-validation, see Bunke, Droge and Polzehl [6] and Droge [10]):

    \tilde{C} = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \hat{\vartheta}^{[ij]}))^2,   (1.19)

where

    \tilde{Y}_i = f(x_i, \hat{\vartheta})   (1.20)

and where \hat{\vartheta}^{[ij]} is the OLSE calculated under the substitution of just the observation Y_{ij} by \tilde{Y}_i.
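The two criteria can be computed with a simple refitting loop. The sketch below is illustrative throughout: the two-parameter model and the data are invented, simple spacing-proportional weights stand in for the exact weights (1.16)-(1.18), and the full-data OLSE serves as warm start for every refit, as suggested in the text.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0.5, 5.0, 12)
def f(x, t):
    return t[0] * x / (t[1] + x)                 # invented two-parameter model
y = f(x, [10.0, 2.0]) + rng.normal(0.0, 0.2, x.size)

theta = least_squares(lambda t: y - f(x, t), x0=np.array([8.0, 1.0])).x
w = np.gradient(x) / (x[-1] - x[0])              # crude spacing-based weights

C = 0.0          # leave-one-out cross-validation, cf. (1.14)
Cfull = 0.0      # full cross-validation, cf. (1.19)-(1.20)
for i in range(x.size):
    keep = np.arange(x.size) != i
    # delete observation i, refit from the full-data OLSE:
    t_del = least_squares(lambda t: y[keep] - f(x[keep], t), theta).x
    C += w[i] * (y[i] - f(x[i], t_del)) ** 2
    # substitute observation i by its fitted value (1.20), refit on all n points:
    y_sub = y.copy()
    y_sub[i] = f(x[i], theta)
    t_sub = least_squares(lambda t: y_sub - f(x, t), theta).x
    Cfull += w[i] * (y[i] - f(x[i], t_sub)) ** 2
```

The full-cross-validation variant never evaluates the model at a design point that was absent from the fit, which is what makes it robust in the degenerate cases described above.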
1.4 Model selection procedure

The model selection could be done in three steps.
Step 1. List alternative models f_1(x, \vartheta_1), ..., f_M(x, \vartheta_M) of similar qualitative behaviour corresponding to theoretical or practical experience in the field of application, e.g. models with sigmoidal form as in our Example 1. The books of Ratkowsky [13] (but also of Seber and Wild [15] and Ross [14]) offer a rich selection of alternative models with one to five parameters for each possible type of qualitative behaviour: convex or concave curves that are continuously ascending or descending (without maxima, minima or inflection points); sigmoidally shaped curves, i.e. curves possessing inflection points but without maxima or minima (and possibly having asymptotes); curves with maxima and minima (and possibly one or more inflection points). Sometimes the same models are given with different parameterizations, that is, there are models which may be obtained from one another by substituting the parameters by functions of new parameters. Often it is indicated which parameterizations may be favorable with respect to having comparatively small parameter-effects curvature in the sense of Bates and Watts [1]. Such parameterizations could lead to numerical as well as inferential advantages and should therefore be used.

Step 2. Select among these models f_m (m = 1, ..., M) a model f_{\hat{m}} with smallest value of the corresponding cross-validation criterion C_m given by (1.14) or (1.19):

    C_{\hat{m}} = \min_{m = 1, ..., M} C_m.   (1.21)

Alternatively, one could select a model f_{\tilde{m}} which subjectively has an especially appealing form (e.g. a model with interpretable parameters, or a model with simple structure and/or few parameters) but with an otherwise small value of the cross-validation criterion. For this we may use the rule of thumb

    C_{\tilde{m}} \le (1 + 2/\sqrt{n}) C_{\hat{m}}   (1.22)
(see Examples 1, 2 and 3 treated in Subsection 1.7).

Step 3. The data analysis is then done with the estimated regression function

    f(x, \hat{\vartheta}_{\hat{m}}),   (1.23)

\hat{\vartheta}_{\hat{m}} being the OLSE of the parameter under the model f_{\hat{m}}. A parameter like (1.4) would be estimated by

    \hat{\gamma} = \gamma[f(x_1, \hat{\vartheta}_{\hat{m}}), ..., f(x_k, \hat{\vartheta}_{\hat{m}})].   (1.24)
As an alternative to the OLSE, in view of possibly (or most likely) heteroscedastic variances, the estimate in (1.23) or (1.24) could be chosen as a weighted least squares estimate (WLSE). It is defined to be the minimizer of

    S_w(\vartheta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - f(x_i, \vartheta))^2 / \hat{\sigma}_i^2   (1.25)

over \vartheta \in \Theta, where \hat{\sigma}_i^2 denotes a convenient estimate of the variance \sigma_i^2 (see Section 2).
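A minimal sketch of the WLSE (1.25) follows. The design, replication structure and variance pattern are invented for illustration; the intra-sample variance estimates (2.1) of Section 2 serve as the \hat{\sigma}_i^2.

```python
import numpy as np
from scipy.optimize import least_squares

# Weighted least squares (1.25) with replicate-based variance estimates.
rng = np.random.default_rng(2)
xu = np.arange(1.0, 6.0)                 # k = 5 design points
ni = 4                                   # n_i = 4 replicates each
x = np.repeat(xu, ni)
def f(x, t):
    return t[0] * (1.0 - np.exp(-t[1] * x))
y = f(x, [5.0, 0.7]) + rng.normal(0.0, np.repeat(0.1 * xu, ni))  # heteroscedastic

s2 = y.reshape(-1, ni).var(axis=1, ddof=1)    # intra-sample estimates (2.1)
w = np.repeat(1.0 / np.sqrt(s2), ni)          # residual weights 1/sigma-hat_i

wlse = least_squares(lambda t: w * (y - f(x, t)), x0=np.array([4.0, 0.5])).x
```

Scaling the residuals by 1/\hat{\sigma}_i makes the squared objective exactly the weighted sum (1.25).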
This may in some cases increase the precision, provided that the estimates \hat{\sigma}_i^2 are sufficiently accurate. But often the WLSE will be less reliable, especially when the differences in the variances \sigma_i^2 are not large. Because the regression functions and the variances are unknown, it is actually not known whether the OLSE or the WLSE is better. A choice between the OLSE and a WLSE may be performed with a criterion aiming at a maximal accuracy of estimation (or calibration) and is discussed in Section 2.

1.5 The performance of the selected model
The use of a criterion like (1.14) (or (1.19)) is justified by the fact that it estimates the weighted sum r_Y of actual squared prediction errors,

    r_Y = \sum_{i=r}^{s} \tilde{c}_i E_Y (Y^*(x_i) - f(x_i, \hat{\vartheta}))^2.   (1.26)

Here E_Y is the conditional expectation over future values Y^*(x_i) of the dependent variable (under the condition of fixed observations Y_{ij}), and \tilde{c}_i = n_i c_i. The weighted sum r_Y may be seen as an approximation to the integrated actual squared prediction error over the interval X = [a, b]:

    r_Y \approx \frac{1}{b-a} \int_a^b E_Y (Y^*(x) - f(x, \hat{\vartheta}))^2 \, dx.   (1.27)

The weighted sum r_Y is (up to a model-independent term \bar{\sigma}^2) equivalent to the mean squared error in estimating the regression function:

    \sum_{i=r}^{s} \tilde{c}_i (f(x_i) - f(x_i, \hat{\vartheta}))^2 \approx \frac{1}{b-a} \int_a^b (f(x) - f(x, \hat{\vartheta}))^2 \, dx.   (1.28)
If an estimate of the overall mean squared error

    \rho_Y = \sum_{i=r}^{s} \tilde{c}_i E_Y (Y^*(x_i) - f_{\hat{m}}(x_i, \hat{\vartheta}_{\hat{m}}))^2   (1.29)

connected with the selected model f_{\hat{m}} and the corresponding (possibly weighted) LSE is wanted, an estimate like double cross-validation has to be used. It takes into account the data dependence of the model choice:

    D = \sum_{i=r}^{s} c_i \sum_{j=1}^{n_i} (Y_{ij} - f_{\hat{m}(ij)}(x_i, \hat{\vartheta}^{(ij)}_{\hat{m}(ij)}))^2.   (1.30)

Here the model \hat{m}(ij) and the (weighted) LSE \hat{\vartheta}^{(ij)}_{\hat{m}(ij)} of the parameter in this model are calculated, following the procedure described in Subsection 1.4, from the n - 1 observations left after deleting the observation (x_i, Y_{ij}). The calculation of (1.30) will be computationally expensive, because for each of the n = \sum_{i=r}^{s} n_i observations in (1.30) there must be calculated: (i) cross-validation values for all admitted models, that is, nM LSEs, and possibly (ii) estimates for the variances \sigma_i^2, which are needed for the calculation of the weighted LSEs. A less computationally intensive estimate of \rho_Y may be calculated by selecting randomly N (< n) observations Y_{i_r j_r} (r = 1, ..., N) among the n observations Y_{ij} and using the Monte Carlo approximation of (1.30),

    \tilde{D} = \frac{n}{N} \sum_{r=1}^{N} c_{i_r} (Y_{i_r j_r} - f_{\hat{m}(i_r j_r)}(x_{i_r}, \hat{\vartheta}^{(i_r j_r)}_{\hat{m}(i_r j_r)}))^2.   (1.31)
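The Monte Carlo variant can be sketched as follows: for each of N randomly chosen observations, the model choice itself is repeated on the remaining data before the deleted point is predicted. Polynomial candidates fitted by np.polyfit stand in for the nonlinear candidates, an unweighted criterion stands in for (1.30)-(1.31), and the data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

def cv(xs, ys, deg):
    """Unweighted leave-one-out criterion for a polynomial of degree deg."""
    err = 0.0
    for i in range(xs.size):
        keep = np.arange(xs.size) != i
        p = np.polyfit(xs[keep], ys[keep], deg)
        err += (ys[i] - np.polyval(p, xs[i])) ** 2
    return err / xs.size

N = 10
idx = rng.choice(x.size, size=N, replace=False)
D = 0.0
for i in idx:
    keep = np.arange(x.size) != i
    # re-run the model selection (step 2) on the data without observation i:
    best = min((1, 2), key=lambda d: cv(x[keep], y[keep], d))
    p = np.polyfit(x[keep], y[keep], best)
    D += (y[i] - np.polyval(p, x[i])) ** 2
D /= N
```

The inner re-selection is what distinguishes double cross-validation from (1.14): the prediction error charged to each deleted point includes the instability of the model choice itself.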
1.6 Alternative criteria

Cross-validation is an adequate criterion if prediction or regression function estimation is a primary objective of the data analysis. In different situations other criteria should be used. In a calibration problem as in Example 2, a modified cross-validation criterion could be adequate, assuming a calibration is demanded for observations Y of the dependent variable lying in an interval (\tilde{a}, \tilde{b}):

    CC = \frac{1}{N} \sum_{(i,j) \in J} (x_i - \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}))^2.   (1.32)

Here J contains all index pairs (i, j) with \tilde{a} < Y_{ij} < \tilde{b}, N is the number of observations Y_{ij} in the calibration interval (\tilde{a}, \tilde{b}), and the calibration function \hat{x}(y, \vartheta) is given by the inverse of f(x, \vartheta) as a function of x (see e.g. (1.12)). In case of a nonlinear model it may occur that the value of \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}) is not defined. In such cases, or if \hat{x}(Y_{ij}, \hat{\vartheta}^{(ij)}) is outside of the interval [a, b] on which the regression function f(x, \vartheta) is considered, we use in the definition of the criterion the values a or b instead of \hat{x}, depending on the monotonicity and the value of f(x, \vartheta) (see Example 2 in Section 2).
If the objective is to estimate a parameter of the form (1.4), the corresponding criterion would be

    CG = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} | \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) - \gamma[f(x_1, \hat{\vartheta}^{(ij)}), ..., f(x_k, \hat{\vartheta}^{(ij)})] |,   (1.33)

where

    \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}.   (1.34)

(1.33) may be interpreted as a jackknife approximation to the mean absolute error for the estimate \hat{\gamma} (see Bunke, Droge and Polzehl [6]). The criterion (1.33) is only sensible if all replication sizes n_i are large, or otherwise if the parameter (1.4) involves weighted sums of many values f(x_i). This is the case for the linear slope (1.5) and for the area (1.6) if the number k of design points is not small. Modifications \tilde{CC} and \tilde{CG} in the sense of full cross-validation (see Subsection 1.3) may be defined using \hat{\vartheta}^{[ij]} in place of \hat{\vartheta}^{(ij)} in (1.32) and (1.33), respectively.
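The jackknife criterion (1.33) can be sketched with the area functional (1.6) as the parameter of interest. A straight line fitted by np.polyfit stands in for the nonlinear candidate model, and the replicated data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
xu = np.linspace(0.0, 4.0, 9)            # k = 9 design points
ni = 3                                   # n_i = 3 replicates each
x = np.repeat(xu, ni)
y = np.repeat(1.0 + 2.0 * xu, ni) + rng.normal(0.0, 0.1, x.size)

def auc(fx):
    """Trapezoidal area (1.6) over the design points."""
    return 0.5 * np.sum((fx[1:] + fx[:-1]) * np.diff(xu))

ybar = y.reshape(-1, ni).mean(axis=1)
gamma_free = auc(ybar)                   # model-free estimate from the means

CG = 0.0
for j in range(x.size):
    keep = np.arange(x.size) != j
    p = np.polyfit(x[keep], y[keep], 1)  # candidate model without observation j
    CG += abs(gamma_free - auc(np.polyval(p, xu)))
CG /= x.size
```

Each term compares the model-free estimate with the model-based one computed from a leave-one-out refit, which is the jackknife reading of (1.33).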
"Double resampling" criteria analogous to (1.30) or its Monte Carlo approximation (1.31) may be formulated corresponding to the "calibration selection criterion" (1.32) and the "estimation selection criterion" (1.33).

1.7 Applications of the model selection procedure
In this section we present the results of applying the model selection procedure to the problems introduced in Subsection 1.2, as well as to an additional one.

Example 1 (Pasture regrowth, continued). We reconsider the example of pasture regrowth yield, where its dependence on time is to be modelled. In Subsection 1.2 it was already found that the model candidates should be sigmoidally shaped, such as (1.7)-(1.11). However, the class of competing models could be enlarged by various other sigmoidal models given, for example, in Ratkowsky [13]. In addition to
the above mentioned ones we take the following, among them the three-parameter logistic model

    f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x)),   (1.36)

the Morgan-Mercer-Flodin model

    f(x, \vartheta) = (\vartheta_1 x^{\vartheta_2} + \vartheta_3 \vartheta_4) / (\vartheta_4 + x^{\vartheta_2}),   (1.41)

and the four-parameter Richards model

    f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x))^{1/\vartheta_4}.   (1.42)

[The display formulas of the further candidate models (1.35), (1.37)-(1.40), (1.43) and (1.44) are not legible in the source.]

Table 5: Cross-validation criteria and ranking of models for Example 1 (criteria (1.14) and (1.19), and the sum of squares (1.3)). [The numerical entries of this table are not legible in the source.]
For this example, the values of the cross-validation criterion (1.14) and the corresponding ranking of the different models are presented in Table 5. Values of the modified cross-validation criterion (1.19) and of the sum of squares (1.3) are given for comparison. Consequently, an application of the cross-validation approach to the pasture regrowth data would lead to the choice of the four-parameter Richards model (1.42). Furthermore, there is no other model fulfilling the rule of thumb (1.22). However, models (1.11) (which is an extension of (1.42)), (1.10) and (1.36) violate the condition (1.22) only slightly, so that one could also select one of these three models, e.g. the simple logistic model (1.36), having only three parameters, that is, one less than the models (1.42), (1.11) or (1.10). We remark that the models (1.35) and (1.40) are not flexible enough to provide reasonable fits to these data, but for other data sets the situation could, of course, be quite different. Figure 2 illustrates the behaviour of the best model (1.42) by plotting the observations and the fitted curve as well as the residuals.

Figure 2: Pasture regrowth: resulting plots for model (1.42), f(x, \vartheta) = \vartheta_1 / (1 + \exp(\vartheta_2 - \vartheta_3 x))^{1/\vartheta_4}, with \hat{\vartheta}_1 = 69.623, \hat{\vartheta}_2 = 4.255, \hat{\vartheta}_3 = 0.089, \hat{\vartheta}_4 = 1.724, RSS = 0.6721, C = 1.476 (yield versus time after pasture).

Example 2 (Radioimmunological assay of cortisol, continued). We reconsider the example of the radioimmunological assay of cortisol.

Table 6: Cross-validation criteria and ranking of models for Example 2 (criteria (1.32), (1.14) and (1.33)). [The numerical entries of this table are not legible in the source.]

A plot of the response versus the
logarithm of the dose (with log(0) replaced by -3 and log(\infty) by 2) suggests again a sigmoidally shaped dependence, but now the curves should be monotonically decreasing (see Figure 3). Therefore we can use those models of our catalogue of competing models in Example 1 which are well defined for the given design, i.e. (1.9), (1.10), (1.11), (1.36), (1.42) and (1.43). We have excluded model (1.8), although in principle applicable, since it is not flexible enough to fit this data set well. The points \pm\infty have been excluded in the cross-validation criteria by choosing a = -2.99 and b = 1.99. The region of interest in the calibration criterion CC (see (1.32)) has been fixed by \tilde{a} = 200 and \tilde{b} = 2000. The results are summarized in Table 6, indicating that the Richards model (1.11) should be the first choice. Naturally, one could formulate a rule of thumb for the criterion (1.32) analogously to (1.22), say by considering all models f_{\tilde{m}} yielding values of (1.32) with CC_{\tilde{m}} \le (1 + 2/\sqrt{N}) CC_{\hat{m}}, where the model f_{\hat{m}} is that with minimal value of (1.32). Then the logistic model (1.10) would be the only alternative candidate for analyzing the data for calibration purposes, having a simpler structure and one parameter less than the Richards model (1.11). Note that an application of the cross-validation criterion (1.14) would lead to the same ranking of the best three models.

In pharmacokinetics, the area under the curve obtained by analyzing radioimmunological assays is used to characterize rate and extent of drug absorption. The integral with respect to the dose (recall that x_i is the logarithm of the dose) can be approximated
by a parameter as in (1.6),

    \gamma = \frac{1}{2} \sum_{i=1}^{k-1} (f(x_{i+1}) + f(x_i))(x_{i+1} - x_i).   (1.46)

Figure 3: RIA of cortisol: resulting plots for model (1.11), f(x, \vartheta) = \vartheta_1 + \vartheta_2 / (1 + \exp(\vartheta_3 - \vartheta_4 x))^{\vartheta_5}, with \hat{\vartheta}_1 = 133.601, \hat{\vartheta}_2 = 2628.593, \hat{\vartheta}_3 = 3.129, \hat{\vartheta}_4 = -3.215, \hat{\vartheta}_5 = 0.622, RSS = 2723, C = 1565, calibration criterion (CC): 0.0307 (response versus log-dose, with residual plot).
If this parameter is of interest, for instance as a measure to compare different experiments, the cross-validation criterion (1.33) would again suggest to use model (1.11).

Figure 3 shows the fit of the best model (1.11) to the data and presents the corresponding residual plot. This indicates that the error variances are probably heterogeneous. Therefore, one could try to estimate the error variances, for example on the basis of the replicated observations, and to fit the model to the data by the weighted least squares criterion (1.25). However, in calibration problems it seems to be important to approximate the unknown regression function with high accuracy in particular in regions where it is flat, that is, where it has a small derivative. This would suggest the use of a weighted least squares criterion different from (1.25). To avoid such a discussion here, we have used the ordinary nonlinear least squares approach, while a comparison of different weighted least squares estimates in this example is left to Section 2. In order to estimate the approximated area under the curve (1.46) it would be important to approximate f more accurately where the spacing x_{i+1} - x_i is large. This will be reflected in the analyses of Example 2 in Section 3.

Example 3 (Estimation of growth rate, continued). In addition to the model (1.13) we fitted almost all (i.e. more than 20) concave models of chapter 4 in Ratkowsky [13] with one to four parameters to the data. We obtained without numerical difficulties the LSEs and the values of the model choice criteria for many of these models. [The display formulas of these candidate models (1.47)-(1.61) are not legible in the source.]
Table 7 contains the results. The model with minimal value of the criterion (1.33) (for \gamma given by (1.5)) is (1.13). This model ranks only eighth for prediction purposes, i.e. by the cross-validation criterion (1.14), but still fulfils the rule of thumb (1.22). On the other hand, model (1.47), minimizing criterion (1.14), ranks only eleventh with respect to criterion (1.33) and exceeds the minimal value of that criterion even by more than 80 per cent. Notice that the models providing a value of \hat{\gamma} closest to the unbiased estimate \hat{\gamma}(\bar{Y}_1, ..., \bar{Y}_k) = 0.02896 rank best for criterion (1.33).

Table 7: Cross-validation criteria and ranking of models for Example 3 (criteria (1.14) and (1.33)). [The numerical entries of this table are not legible in the source.]

Figure 4 displays the results for the best model (1.13) according to criterion (1.33).

Figure 4: Estimation of growth rate: results for model (1.13) and \gamma given by (1.5); f(x, \vartheta) = \vartheta_1 - \vartheta_2 \vartheta_3^x, with \hat{\vartheta}_1 = 2.67, \hat{\vartheta}_2 = 0.973, \hat{\vartheta}_3 = 0.873, RSS = 0.006917, \hat{\gamma} = 0.0258, CG = 0.0004637, C = 0.009434 (length versus age).
Example 4 (Bean root cells - simulated data). This example aims at showing the importance of taking into account the intended use of analyzing the data when an appropriate model is to be selected. The growth of bean root cells is a microscopic vegetative process where the dependence of the water content on the distance from the growing tip is of interest. A data set of size 15 has been used by Ratkowsky [12] as an illustrative example, and as in Example 1 it can be seen that the process produces a sigmoidally shaped growth curve. Ratkowsky [12] considered five competing models (all of them are among our candidates), but without arriving at a convenient model selection. Applying the cross-validation criterion to the data would suggest the use of model (1.10). We generated a data set (see Table 8) of size 50 which mimics the original data as follows: After transforming the x-data to the interval (0,1), the model (1.11), which is an extension of the best model (1.10), was fitted to the data. The resulting homogeneous variance estimate \hat{\sigma}^2 = 0.868 was used to simulate 50 pseudo-random normally distributed variables with mean 0 and this variance. For 50 equidistant x-values on (0,1), the corresponding y-values were obtained by adding the simulated "errors" to the fitted curve.
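This simulation scheme can be sketched as follows. The sigmoid below is an invented stand-in for the fitted model (1.11); only the variance 0.868 and the design of 50 equidistant points on (0, 1) are taken from the text.

```python
import numpy as np

# Example 4's scheme: homoscedastic normal errors added to a fitted curve.
rng = np.random.default_rng(5)

def fitted(x):
    # hypothetical fitted sigmoid, NOT the actual estimates of the paper
    return 1.0 + 21.0 / (1.0 + np.exp(4.0 - 9.0 * x))

x_sim = (np.arange(50) + 0.5) / 50.0          # 50 equidistant points in (0, 1)
y_sim = fitted(x_sim) + rng.normal(0.0, np.sqrt(0.868), 50)
```

Generating from a fitted curve plus estimated error variance makes the "true" model known in the simulation, so the behaviour of the selection criteria can be checked against it.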
Table 8: Simulated data of Example 4. [The numerical entries of this table are not legible in the source.]

Table 9: Cross-validation criteria and ranking of models for Example 4 (criteria (1.19), (1.32) and (1.33)). [The numerical entries of this table are not legible in the source.]
We have used the same catalogue of models as in Example 1. Some of the models turned out to be not flexible enough, leading to numerical problems. The results for the different model selection criteria are contained in Table 9, showing the values of the full cross-validation criterion (1.19) as well as the corresponding ranking of the models. We have checked that the values of the cross-validation criterion (1.14) (and also the ranking of the models) differ only slightly from the values of the full cross-validation criterion (1.19). Cross-validation favors the Morgan-Mercer-Flodin model (1.41) with four parameters, whereas the five-parameter model (1.11), assumed to be the true regression model in the simulated experiment, ranks second, but with nearly the same value of the criterion. Figure 5 shows how model (1.41) fits the data and presents a plot of the resulting residuals.

Figure 5: Simulated data of Example 4: results for model (1.41), f(x, \vartheta) = (\vartheta_1 x^{\vartheta_2} + \vartheta_3 \vartheta_4) / (\vartheta_4 + x^{\vartheta_2}), with \hat{\vartheta}_1 = 22.035, \hat{\vartheta}_2 = 4.479, \hat{\vartheta}_3 = 1.569, \hat{\vartheta}_4 = 0.024, RSS = 0.7639, C = 0.901.
Although in the present example there is no interest in calibration, for the sake of comparison between the criteria we report the values of the criteria (1.32) and (1.33) and the corresponding rankings of the models in Table 9, too. In case of (1.33) the parameter of interest was the growth rate (1.5). The ranking of the models for calibration purposes is quite different from that in the case where prediction or regression function estimation was the primary objective of the data analysis. However, the Morgan-Mercer-Flodin model (1.41) is also the best for calibration purposes.

If the estimation of the growth rate (1.5) were the objective of the data analysis, then the ranking of the models would be completely different from those in the other cases. The first choice would be the logistic model (1.10), which behaves slightly better than the Weibull model (1.7). But obviously three of the models are not appropriate in this case. Even the best model for prediction and calibration purposes slightly violates a rule of thumb accepting models f_{\tilde{m}} with

    CG_{\tilde{m}} \le (1 + 2/\sqrt{n}) CG_{\hat{m}},   (1.45)

which is defined in analogy to (1.22).
2 Selection of variance models and variance estimation

2.1 Selection and fitting of variance models
The observation variances $\sigma_i^2$ may be estimated by the intra-sample estimates
$$
s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_{i\cdot} \right)^2
\qquad (2.1)
$$
if there are enough replications ($n_i$ relatively large!). In such an exceptional situation these estimates may be used for calculating weighted LSE. An improved variance estimation may be possible using alternative variance models, taking into consideration that it is possibly not sure that a certain variance model is adequate and, moreover, not even sure that a certain structure of the regression function is adequate. Our procedure (see Bunke, Droge and Polzehl [6]) is based on a least squares fitting of alternative variance models to conveniently defined "observations" $z_1, \ldots, z_k$. Here the knowledge of adequate regression and variance models (and normality of the observations) is not necessary. The "observations" $z_i$ are defined in such a way that they have (roughly) the variances $\sigma_i^2$ as their expectation, as is exactly the case for the estimates $s_i^2$ given by (2.1). Assuming ordered univariate values $x_1 < x_2 < \cdots < x_k$ of the independent variable, we use the "observations"
$$
z_i := \begin{cases}
s_i^2 & \text{if } n_i \ge 2 \text{ (replications)},\\
-\hat e_i \left( \tfrac12 (\hat e_{i-1} + \hat e_{i+1}) - \hat e_i \right) & \text{if } n_i = 1 \text{ (no replications)}.
\end{cases}
\qquad (2.2)
$$

Here we use the residuals
$$
\hat e_i := y_i - f_{\hat m}(x_i, \hat\vartheta_{\hat m})
\qquad (2.3)
$$
in employing the (best fitting) model $f_{\hat m}$. This model is chosen among the admitted models $f_m$ as that with the smallest sum $\hat R_m$ of squared errors (see (1.3)). In the case of $i = 1$ and $i = k$ we use
$$
z_1 = -\hat e_1 (\hat e_2 - \hat e_1)
\quad \text{and} \quad
z_k = -\hat e_k (\hat e_{k-1} - \hat e_k).
\qquad (2.4)
$$
If the independent variable $x$ is multivariate, then in the case of $n_i = 1$ we may use
$$
z_i = -\hat e_i \left( \hat e_{j(i)} - \hat e_i \right),
\qquad (2.5)
$$
where the residual $\hat e_{j(i)}$ corresponds to the value $x_{j(i)}$ nearest to $x_i$:
$$
\| x_{j(i)} - x_i \| = \min_{j \neq i} \| x_j - x_i \|.
\qquad (2.6)
$$
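The construction of such "observations" can be sketched in Python. This is a minimal illustration, assuming the intra-sample variance $s_i^2$ at replicated design points and a neighbouring-residual product at unreplicated points; it is one possible construction in the spirit of the definitions above, not the authors' Splus code.

```python
import numpy as np

def pseudo_observations(groups, residuals):
    """Variance-model "observations" z_i: the intra-sample variance s_i^2
    at design points with replications, and a neighbouring-residual
    product at design points without replications (illustrative form)."""
    k = len(groups)
    z = np.empty(k)
    for i, y_rep in enumerate(groups):
        if len(y_rep) >= 2:
            z[i] = np.var(y_rep, ddof=1)  # s_i^2 as in (2.1)
        else:
            # unreplicated point: combine its residual with a neighbour's
            j = i + 1 if i + 1 < k else i - 1
            z[i] = -residuals[i] * (residuals[j] - residuals[i])
    return z

# toy example: three ordered design points, the middle one unreplicated
groups = [[1.0, 1.2], [0.9], [1.1, 0.8, 1.0]]
residuals = np.array([0.1, -0.2, 0.05])  # residuals from a fitted model
z = pseudo_observations(groups, residuals)
```

The replicated points contribute their ordinary sample variances, while the unreplicated point contributes a product of residuals whose expectation is roughly the local variance.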
(If there are several such values, we take the value $x_{j(i)}$ with smallest index $j(i)$.) We fit alternative variance models of the form $g(x, \alpha)$ to the above "observations" by minimization of the sum of squares
$$
Q(\alpha) = \sum_{i=1}^{k} n_i \left\{ z_i - g(x_i, \alpha) \right\}^2.
\qquad (2.7)
$$
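The weighted least-squares fit of a variance model to the "observations" can be sketched as follows. This is an illustrative Python version (the paper's programs are in Splus); the function names, the placeholder regression function and the power-type variance model below are our own choices, standing in for the catalogue of models that follows.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_variance_model(g, alpha0, x, z, n):
    """Minimize Q(alpha) = sum_i n_i * (z_i - g(x_i, alpha))**2
    by weighted least squares (weights sqrt(n_i))."""
    def weighted_resid(alpha):
        return np.sqrt(n) * (z - g(x, alpha))
    res = least_squares(weighted_resid, alpha0)
    return res.x, float(np.sum(weighted_resid(res.x) ** 2))  # alpha_hat, Q

# example: power-of-the-mean variance model g(x, alpha) = a0 * |f(x)|**a1
f = lambda x: 1.0 + x               # placeholder fitted regression function
g = lambda x, a: a[0] * np.abs(f(x)) ** a[1]
x = np.linspace(0.0, 1.0, 20)
n = np.ones(20)                     # one observation per design point
z = 0.5 * (1.0 + x) ** 2            # toy "observations" with exact power structure
alpha_hat, Q = fit_variance_model(g, np.array([1.0, 1.0]), x, z, n)
```

Since the toy "observations" follow the assumed power structure exactly, the minimizer recovers the generating parameters and $Q$ vanishes up to numerical precision.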
We propose six alternative variance models, which are especially useful and have been proposed in the literature, see Carroll and Ruppert [8]:

(1) The first is the exponential model
$$
g(x, \alpha) = \alpha_1 \left[ f_{m_0}(x, \hat\vartheta_{m_0}) + a \right]^{\alpha_2},
\qquad (2.8)
$$
where we use the model $f_{m_0}$ ($m_0 = \hat m$ or $\bar m$) chosen by the procedure described in Subsection 1.4. The constant $a$ is chosen as $a := 0.1$.

(2) As an alternative model in place of (2.8) one may fit the model
$$
g(x, \alpha) = \alpha_1 \exp\left[ \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}) \right].
\qquad (2.9)
$$
(3) Past experience (or the residuals after fitting by ordinary least squares) may suggest that the variance $\sigma_i^2$ does not vary monotonously with the mean $f(x_i)$, but behaves possibly approximately like a unimodal function of $f(x_i)$, or like the reverse of such a behaviour. Then we could alternatively fit a quadratic variance model
$$
g(x, \alpha) = \alpha_1 + \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}) + \alpha_3 \left[ f_{m_0}(x, \hat\vartheta_{m_0}) \right]^2.
\qquad (2.10)
$$

(4) A bell-shaped variance model is
$$
g(x, \alpha) = \alpha_1 \exp\left[ -\alpha_2 \left( \tilde f_{m_0}(x) - \alpha_3 \right)^2 \right],
\qquad (2.11)
$$
where
$$
\tilde f_{m_0}(x) := \frac{ f_{m_0}(x, \hat\vartheta_{m_0}) - f_{\min} }{ f_{\max} - f_{\min} },
\qquad
f_{\min} = \min_i f_{m_0}(x_i, \hat\vartheta_{m_0}),
\quad
f_{\max} = \max_i f_{m_0}(x_i, \hat\vartheta_{m_0}).
$$

(5) The model
$$
g(x, \alpha) = \alpha_1 f_{m_0}(x, \hat\vartheta_{m_0})
\qquad (2.12)
$$
has fewer parameters and may be useful, e.g., for count data. Sometimes also a simple linear model may be useful:
$$
g(x, \alpha) = \alpha_1 + \alpha_2 f_{m_0}(x, \hat\vartheta_{m_0}).
\qquad (2.13)
$$
(6) In many cases a homogeneous variance estimate $\hat\sigma^2$ would be more accurate than heteroscedastic estimates $\hat\sigma_i^2$ determined by a variance model, especially when the differences between the variances $\sigma_i^2$ are moderate or small. Thus we would fit a constant model $g(x, \alpha) = \alpha$ to our observations and obtain
$$
\hat\sigma^2 = \frac{ \sum_{i=1}^{k} n_i z_i }{ \sum_{i=1}^{k} n_i - q },
\qquad (2.14)
$$
where $q$ is the number of points $x_i$ with $n_i \ge 2$. Unfortunately, some of the six models have the disadvantage of possibly leading to negative estimates $\hat\sigma_i^2 := g(x_i, \hat\alpha)$ for some design points $x_i$. We replace the negative (and also very small) estimates by some fixed small positive value, say by
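The replacement of negative (and very small) fitted variances by a fixed small positive value can be sketched as follows; the floor value 1e-6 used here is an arbitrary illustration, not the authors' choice.

```python
import numpy as np

def truncate_variances(sigma2, floor=1e-6):
    """Replace negative (and very small) fitted variance estimates
    sigma_i^2 = g(x_i, alpha_hat) by a fixed small positive value."""
    sigma2 = np.asarray(sigma2, dtype=float)
    return np.where(sigma2 < floor, floor, sigma2)
```

This keeps the fitted variance function usable as a set of weights in a subsequent weighted least-squares step.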