Aust. N. Z. J. Statist. 42(2), 2000, 193–204
FITTING ROC CURVES USING NON-LINEAR BINOMIAL REGRESSION

CHRIS J. LLOYD¹

Australian Graduate School of Management

Summary

The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Empirical data on a test's performance often come in the form of observed true positive and false positive relative frequencies, under varying conditions. This paper describes a family of models for analysing such data. The underlying ROC curves are specified by a shift parameter, a shape parameter and a link function. Both the position along the ROC curve and the shift parameter are modelled linearly. The shape parameter enters the model non-linearly but in a very simple manner. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro & Littenberg (1993). A second application to so-called vigilance data is given, where ROC curves differ across subjects, and modelling of the position along the ROC curve is of primary interest.

Key words: meta-analysis; non-linear regression; over-dispersion; receiver operating characteristic; vigilance data.
1. Introduction

Suppose a diagnostic test T is used to detect the presence of a disease D, and in each case 1 indicates disease and 0 non-disease. The receiver operating characteristic (ROC) curve displays the trade-off that is possible between what statisticians call power and size. Define

$$p_1 = \Pr(T = 1 \mid D = 1), \qquad p_0 = \Pr(T = 1 \mid D = 0),$$
so that $p_1$ is the probability of a true positive and $p_0$ is the probability of a false positive. It is often at the discretion of the tester or experimenter to be more or less liberal with positive diagnoses, in which case $p_1$ and $p_0$ both increase or decrease. We suppose, however, that as $p_1$ changes so does $p_0$ and that these probabilities trace out a curve in the unit square, called the receiver operating characteristic curve. When the test is applied to a different population then $(p_0, p_1)$ may remain unchanged, or move along the ROC curve, or move to an entirely different curve. In medical applications the term sensitivity is used for $p_1$ and specificity for $1 - p_0$. There is a considerable literature on the use and estimation of ROC curves; see Green & Swets (1966) for general background.

Received September 1998; revised May 1999; accepted July 1999.
¹ Australian Graduate School of Management, University of New South Wales, Sydney, NSW 2052, Australia. e-mail: [email protected]
© Australian Statistical Publishing Association Inc. 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden MA 02148, USA

Data used for inference on ROC curves come in four 'flavours'. The simplest is binary data, where experimental units of known status are tested. The probabilities $p_1$ and $p_0$ are then directly estimable. The second type of data is called two-alternatives forced choice, where a positive and negative experimental unit are presented as a pair and the positive unit
is to be identified. So-called rating data arise when the tester is induced to rate the likelihood that each unit is positive on an ordinal scale. The last type of data arises when the tester or test produces a numerical value which forms the basis of the diagnosis. The linear predictor from a two-group discriminant analysis is one example of direct data with which statisticians are familiar.

The general aim of such experiments is to understand test performance. To this end, we are interested in factors which may result in a different ROC curve and also factors which will move $(p_0, p_1)$ along a given curve. Methods for comparing two curves are given by Hanley & McNeil (1983) and Wieand et al. (1989). Metz & Kronman (1980) give a method for comparing several independent ROC curves by estimating the areas beneath them parametrically and then using standard analysis of variance. Moise et al. (1985) compare empirically estimated ROC curves using the bootstrap.

The approach presented here is based on direct parametric modelling of $p_1$ and $p_0$ using binary data. Some of the parameters of the model describe the ROC curve and some describe the position along the curve. The elements of the model are as follows:

(1) a triple $(H, \Delta, \theta)$ which determines the shape of the ROC curve(s). The 'link function' $H$ is chosen rather than estimated. The shift parameter $\Delta$ measures the quality of the curve, large positive values being good. The parameter $\theta$ measures asymmetry of the curve.
(2) a linear model $X\beta$ for the position of $(p_0, p_1)$ along the ROC curve;
(3) a linear model $W\delta$ for the shift parameter $\Delta$, to allow for factors which may change the ROC curve.

The model is a logistic regression model for fixed $\theta$. The parameter $\theta$ is easily estimated by grid search of the likelihood or by constructing an added variable.

The plan of the paper is as follows. Section 2 describes the data, a family of models, the interpretation of parameters and a method of fitting.
In Section 3 the model is applied to meta-analysis of several independent studies of the same diagnostic test. Section 4 gives two extensions of the model. In Section 5, the generalized model is applied to a vigilance experiment in which both the ROC curve and the position along it are affected by covariates.

2. Data and model

2.1. Data

The data comprise independent measurements $(y_i, d_i, x_i^T)$ on $n$ experimental units, where (i) $y_i$ is the predicted disease status (0 or 1), (ii) $d_i$ is the true disease status and is assumed to be measured without error, (iii) $x_i^T$ is a set of $m$ covariates. The binary distribution of $Y_i$ is described through $p_d(x) = \Pr(Y = 1 \mid D = d, X = x)$. Let $Y$ be the binary response vector and $X$ the matrix with $i$th row $x_i^T$. It is convenient to order both $Y$ and $X$ by true disease status, i.e. to arrange that the first $N_1$ individuals have the disease and the next $N_0$ do not, and to partition $Y$ into sub-vectors $Y_1$ and $Y_0$ and similarly $X$ into the matrices $X_1$ and $X_0$ of dimension $N_1 \times m$ and $N_0 \times m$ respectively. It is implicit in our formulation that $X$ does not contain a column of 1s in its column space.

2.2. Latent distributions

To build a statistical model for $p_0$ and $p_1$ we imagine an unobserved continuous latent variable $Z$ and threshold parameter $\gamma$ such that when the observed value $z > \gamma$ the disease
is diagnosed. If we let $F_d(z) = \Pr(Z \le z \mid D = d)$, $d = 0, 1$, be the distributions of the latent variable on the non-disease and disease groups, then the ROC curve is a graph of $p_1 = 1 - F_1(\gamma)$ against $p_0 = 1 - F_0(\gamma)$ and so an explicit form for the ROC curve is

$$p_1 = 1 - F_1\{F_0^{-1}(1 - p_0)\}.$$

It is not necessary that a latent variable should actually exist, though in many cases it may. A family of parametric curves for $F_0$ and $F_1$ generates a parametric family for the ROC curve and it has commonly been assumed that $F_0$ and $F_1$ are normal with different means and variances. For rating data, Dorfman & Alf (1968) give an algorithm for estimating the normal means and variances and hence the ROC curve, which has been used extensively. Note that if $Z$ is transformed by any monotonic transform then the ROC curve is unaffected, so the normal assumption is perhaps less restrictive than might first be thought. Other suggested families are the one-parameter exponential family (England, 1988), the two-parameter logistic family (Moses, Shapiro & Littenberg, 1993) and the three-parameter Lomax family (Campbell & Ratnaparkhi, 1993).

For binary data, the latent distributions are not directly estimable. However, $p_0$ and $p_1$ are directly estimable and the supposed parametric models for $F_0$, $F_1$ generate a family of parametric models for the ROC curve.

2.3. Location-scale models for the latent variable

The distribution of the latent variable is likely to depend not only on true disease status but also on other covariates, such as the age and sex of individuals as well as the conditions under which the diagnostic test is performed (for instance, different doctors). Let $x$ be a vector of $m$ such covariates and $F_d(z \mid x)$ be the conditional distribution given true status $d$ and covariate $x$. Our model supposes that

$$F_d(z \mid x) = H\left(\frac{z - \mu_d - \beta^T x}{\sigma_d}\right) \qquad (d = 0, 1) \tag{1}$$
for some continuous distribution $H$ with median zero and standard spread. In words, we are assuming that (at least after an arbitrary monotonic transformation) the latent distributions are within a location-scale family with possibly differing scales and location of the form $\mu_d + \beta^T x$ where $\beta$ is a vector of $m$ unknown parameters.

2.4. Implied ROC curves

The ROC curve is a graph of $p_1$ against $p_0$ as $\gamma$ varies, where $p_d = \Pr(Z > \gamma \mid d) = 1 - F_d(\gamma \mid x)$. Letting $G(z) = 1 - H(-z)$ be the 'reflected' distribution function, (1) may be re-expressed as

$$G^{-1}(p_d) = \frac{\mu_d + \beta^T x - \gamma}{\sigma_d}. \tag{2}$$

The unobserved threshold $\gamma$ used in the diagnostic test may or may not depend on the covariates $x$ but we suppose that this dependence may be well approximated by a linear function $c^T x$ of these covariates. Failure of this assumption would show up as lack of fit of the model, and further terms, such as extra powers of some components of $x$, may always be included.
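Equation (2) can be checked numerically. The sketch below assumes a standard logistic $H$ (which is symmetric, so $G = H$); the parameter values and the names `positive_prob`, `beta_x`, `lhs` and `rhs` are hypothetical and chosen only for illustration.

```python
import math

def logistic_cdf(z):
    """Standard logistic distribution function H; symmetric, so G = H."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the logistic distribution function."""
    return math.log(p / (1.0 - p))

def positive_prob(gamma, mu_d, sigma_d, beta_x):
    """p_d = Pr(Z > gamma | d, x) = 1 - H((gamma - mu_d - beta'x) / sigma_d)."""
    return 1.0 - logistic_cdf((gamma - mu_d - beta_x) / sigma_d)

# Hypothetical latent parameters: (mu_d, sigma_d) for d = 0 and d = 1.
params = {0: (0.0, 1.0), 1: (2.0, 1.5)}
gamma = 1.0    # decision threshold
beta_x = 0.3   # value of the covariate term beta'x for one individual

lhs, rhs = {}, {}
for d, (mu_d, sigma_d) in params.items():
    p_d = positive_prob(gamma, mu_d, sigma_d, beta_x)
    lhs[d] = logit(p_d)                           # G^{-1}(p_d)
    rhs[d] = (mu_d + beta_x - gamma) / sigma_d    # right-hand side of (2)
```

Under these assumptions the two sides agree to rounding error for both groups.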
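We make the following reparameterizations: $\beta^* = (\mu_0, \beta^T - c^T)^T/\sigma_0$, $x^* = (1, x^T)^T$. Then (2) becomes

$$G^{-1}(p_1) = \frac{\mu_1 - \mu_0}{\sigma_1} + \frac{\sigma_0}{\sigma_1}\,\beta^{*T} x^*, \qquad G^{-1}(p_0) = \beta^{*T} x^*, \tag{3}$$

and isolating $p_1$ the ROC curves have the form

$$p_1 = G\{\Delta + \theta\, G^{-1}(p_0)\}, \qquad \text{where } \Delta = \frac{\mu_1 - \mu_0}{\sigma_1} \text{ and } \theta = \frac{\sigma_0}{\sigma_1}. \tag{4}$$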
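As an illustration of (4), the following sketch sweeps the threshold $\gamma$ in a logistic latent-variable model and confirms that every resulting pair $(p_0, p_1)$ lies on the curve defined by $(\Delta, \theta)$. The logistic choice of $G$, the numerical values and the function name `roc_curve` are assumptions made for the example.

```python
import math

def expit(z):
    """Logistic distribution function G."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds transformation G^{-1}."""
    return math.log(p / (1.0 - p))

def roc_curve(p0, delta, theta):
    """Equation (4) with logistic G: p1 = G(delta + theta * G^{-1}(p0))."""
    return expit(delta + theta * logit(p0))

# Hypothetical latent location and scale parameters.
mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 2.0, 1.5
delta = (mu1 - mu0) / sigma1   # shift parameter
theta = sigma0 / sigma1        # shape parameter

# Moving the threshold changes (p0, p1) but not the curve they lie on.
max_gap = 0.0
for gamma in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    p0 = expit((mu0 - gamma) / sigma0)
    p1 = expit((mu1 - gamma) / sigma1)
    max_gap = max(max_gap, abs(p1 - roc_curve(p0, delta, theta)))
```

Here `max_gap` is the largest discrepancy between the latent-model $p_1$ and the value predicted by (4); it should be zero up to rounding.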
2.5. Parameter interpretation

A common measure of test quality is the area beneath the ROC curve, which actually equals $\Pr(Z_1 > Z_0)$ where $Z_i$ are independent with distribution $F_i$ (see Bamber, 1975). For the present model, this area equals $A = \Pr(\varepsilon_1 < \Delta + \theta\varepsilon_0)$, where $(\varepsilon_0, \varepsilon_1)$ are independent variables with distribution $G$. So long as the distribution $G$ is symmetric, simple geometric considerations imply that $A$ is an increasing function of the perpendicular distance of the line $\psi_1 = \Delta + \theta\psi_0$ from the origin $(\psi_0, \psi_1) = (0, 0)$. Up to the constant factor $\sqrt{2}$, this perpendicular distance is given by

$$\Delta^* = \frac{\sqrt{2}\,\Delta}{\sqrt{1 + \theta^2}} = \frac{\mu_1 - \mu_0}{\sqrt{(\sigma_0^2 + \sigma_1^2)/2}}.$$
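As a small worked example, the standardized shift $\Delta^* = \sqrt{2}\,\Delta/\sqrt{1+\theta^2}$ can be recomputed from the $(\Delta, \theta)$ pairs reported in the 'All data' columns of Table 1. The function name `delta_star` is hypothetical.

```python
import math

def delta_star(delta, theta):
    """Standardized shift: sqrt(2) * delta / sqrt(1 + theta^2)."""
    return math.sqrt(2.0) * delta / math.sqrt(1.0 + theta ** 2)

# (Delta, theta) pairs as reported in Table 1 for all studies.
fits = {"BR": (1.91, 0.394), "BR1": (2.86, 1.000), "LS": (2.54, 1.197)}
summary = {name: round(delta_star(d, t), 2) for name, (d, t) in fits.items()}
```

The rounded values in `summary` are 2.51, 2.86 and 2.30, matching the $\Delta^*$ column of Table 1; when $\theta = 1$ the two measures coincide.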
Note also that if $\psi_d = G^{-1}(p_d)$ then the ROC curve (4) has precisely the form $\psi_1 = \Delta + \theta\psi_0$ and so $\Delta^*$ directly measures the closeness of the ROC curve to the origin, when viewed in $G^{-1}$-space. The parameter $\theta$ is a shape parameter allowing for asymmetry and there is usually no reason to assume $\theta = 1$. Its inclusion allows extra flexibility in the ROC curve and accounts for the very likely different scales of the distributions $F_0$ and $F_1$. Note that $\beta^T x$ does not appear in (4). This term does not model the curve itself but rather the position along the curve.

2.6. Partially linear binomial model

The model (3) may be written in the form

$$\psi_d(x) = \Delta d + \{1 + (\theta - 1)d\}\, x^{*T}\beta^* \qquad (d = 0, 1). \tag{5}$$
For fixed $\theta$ the model is linear in the $(m + 1)$ parameters $(\Delta, \beta^T)$. Recall that we organized the binary response vector $Y$ and matrix $X$ by true status $d$. Then for fixed $\theta$ the design matrix of this binomial regression model is

$$\begin{pmatrix} \mathbf{d} & \begin{matrix} \theta X_1^* \\ X_0^* \end{matrix} \end{pmatrix},$$

where $\mathbf{d}$ is a vector of $N_1$ 1s followed by $N_0$ 0s, and $X_0^*$, $X_1^*$, $X^*$ are just $X_0$, $X_1$, $X$ with an extra column of 1s. The likelihood must be further maximized with respect to the non-linear parameter $\theta$, either by grid-search or by constructing an appropriate added variable as
described later. Model (5) is essentially equivalent to the model of Rutter & Gatsonis (1995) in the context of meta-analysis. Further comparisons are made in the next section.

3. Application to meta-analysis

The standard empirical description of a test's performance is a 2 × 2 table listing the number of true positive diagnoses $Y_1$ out of $N_1$ diseased individuals, and the number of false positive diagnoses $Y_0$ out of $N_0$ non-diseased individuals. Use subscript $j$ for the $j$th of $k$ such independent studies of the same test. For each study the empirical true and false positive rates estimate the true and false positive probabilities $p_{1j}$ and $p_{0j}$ for that study. The pairs $(p_{0j}, p_{1j})$, $j = 1, \ldots, k$, collectively summarize the test accuracy. Meta-analysis aims to provide a simpler summary than these $k$ points, by assuming that they lie on different positions along a common ROC curve.

To see how the theory of Section 2 applies, let us assume that

$$\psi_{1j} = \Delta + \theta\psi_{0j}, \tag{6}$$

where $\psi_{dj} = G^{-1}(p_{dj})$. Henceforth, we assume that $G^{-1}$ is the log-odds transformation, which corresponds to assuming the existence of latent variables $Z_0$, $Z_1$ with logistic distributions. An equivalent expression for the model (6) is

$$\psi_{1j} = \Delta + \theta\phi_j, \qquad \psi_{0j} = \phi_j. \tag{7}$$
The parameters $\phi_j$ represent differing baseline positive diagnosis rates across the studies and $(\Delta, \theta)$ describe the assumed common underlying ROC curve. This common curve comes from the family (4), which here takes the logistic form

$$p_1 = \frac{e^{\Delta} p_0^{\theta}}{(1 - p_0)^{\theta} + e^{\Delta} p_0^{\theta}}.$$
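For completeness, the logistic form follows from model (6) in a few lines. The derivation below is a sketch that assumes only that $G^{-1}$ is the log-odds transformation and drops the study subscript $j$:

$$\begin{aligned}
\log\frac{p_1}{1 - p_1} &= \Delta + \theta\log\frac{p_0}{1 - p_0}, \\
\frac{p_1}{1 - p_1} &= e^{\Delta}\left(\frac{p_0}{1 - p_0}\right)^{\theta}, \\
p_1 &= \frac{e^{\Delta} p_0^{\theta}}{(1 - p_0)^{\theta} + e^{\Delta} p_0^{\theta}}.
\end{aligned}$$

When $\theta = 1$ the second line reduces to a constant odds ratio $e^{\Delta}$ between the true and false positive odds.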
In this context, model (6) has already been proposed by Karduan & Karduan (1990), Moses et al. (1993) and Rutter & Gatsonis (1995). These papers differ mainly in the suggested fitting procedure.

Karduan & Karduan (1990) suggest fitting based on empirical logits. The logit $\psi_{dj}$ is estimated by $\log Y_{dj} - \log(N_{dj} - Y_{dj})$, the variances are estimated in the usual manner and (6) is treated as a simple linear regression and fitted by inverse variance weighted least squares. The main advantage of this approach is that $\theta$ becomes a linear parameter.

Moses et al. (1993) instead fit the model by regressing $D_j = \psi_{1j} - \psi_{0j}$ on $S_j = \psi_{1j} + \psi_{0j}$ using either ordinary least squares or a robust method such as median regression. On their 'S-D' scale the intercept of the line is $2\Delta/(\theta + 1)$ and the slope is $(\theta - 1)/(\theta + 1)$. This has two virtues: first, disease and non-disease are treated symmetrically; second, the slope is zero when $\theta = 1$, which makes graphs in S-D space easier to interpret. Such graphs can be used without accepting the fitting method they propose; we use S-D graphs in a quite different example in Section 5.

One weakness of the fitting method of Moses et al. (1993) is that it is less efficient than maximum likelihood (ML) assuming that model (7) is true. At the very least one would expect to give more weight to larger than to smaller studies. Of course, if the model is wrong, for instance if $Y_{dj}$ do not have binomial distributions, then ML implicitly uses incorrect weights. The fitting method of Karduan & Karduan (1990) is equivalent to ML provided that the expected values of the $Y_{dj}$ and $N_{dj} - Y_{dj}$ are large (see Lloyd, 1999 Section 4.5). Both methods
[Figure 1: two panels, each plotting true positives (vertical axis) against false positives (horizontal axis). Curves in the left panel are labelled BR, BR1 and LS; curves in the right panel are labelled OD and BR.]

Figure 1. Meta-analysis of Moses et al. (1993) data. Left: empirical true and false positive fractions, with three model estimates of ROC curve. Study marked 'X' was omitted from the analysis. Right: comparison of binomial and over-dispersed binomial model estimates of ROC curve. Plotting symbols for empirical fractions are proportional to the over-dispersion weights.
face difficulties if it is desired to include very small-scale studies, or to break the studies down by further factors, which the common practice of adding ½ to all counts only partially solves. Rutter & Gatsonis (1995) fit model (7) by ML, but specifically using a more general ordinal regression program PLUM written by McCullagh (1979).

Our alternative is to use binomial regression directly. The response vector comprises the vector $Y_1$ of true positive counts followed by the vector $Y_0$ of false positive counts. In the notation of Section 2, we take $\beta = \phi = (\phi_1, \ldots, \phi_k)^T$, and the design matrices $X_1^*$ and $X_0^*$ are both the identity matrix $I_k$ of dimension $k$. The full design matrix is then

$$X(\theta) = \begin{pmatrix} \mathbf{1} & \theta I_k \\ \mathbf{0} & I_k \end{pmatrix},$$

where $\mathbf{0}$, $\mathbf{1}$ are $k$-vectors of 0s and 1s. Rather than searching for the value of $\theta$ that maximizes the likelihood, one may fit a constructed variable as defined in the appendix, which for this model is $z^T = (\hat\phi^T, \mathbf{0}^T)$. A fitting algorithm is as follows.

Step 1. Choose a starting value $\theta$. One possibility is the estimate obtained from the method(s) of Moses et al. (1993).
Step 2. Fit the logistic regression model with design matrix $X(\theta)$ to obtain the estimates $\hat\Delta$ and $\hat\phi$. Compute fitted values $\hat\psi$ using the formula $\hat\psi_{1j} = \hat\Delta + \theta\hat\phi_j$, $\hat\psi_{0j} = \hat\phi_j$.
Step 3. Fit the logistic model with the single covariate $(\hat\phi, \mathbf{0})$ and offset $\hat\psi$.
Step 4. Replace $\theta$ by $\theta + \hat\delta$ where $\hat\delta$ is the slope parameter estimated in Step 3. Go to Step 2 and repeat until convergence.

We illustrate these methods on the data of Moses et al. (1993 Table V) who describe the background. There was one obviously aberrant study where $p_1 < 0.5 < p_0$ which Moses et al. (1993) omitted from their calculations. The symbol 'X' marks this study in the left graph of Figure 1, which plots empirical true positive against false positive fractions for 14 studies. Note that both coordinates of each point are random.
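Steps 1 to 4 of the fitting algorithm can be sketched in plain Python as follows. This is an illustrative sketch, not the author's implementation: the helper names (`solve`, `fit_logit`, `fit_summary_roc`), the starting value $\theta = 1$, the tolerances, and the noise-free synthetic counts (expected counts under hypothetical true parameters, so non-integer) are all assumptions. It does not use the Moses et al. data.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(X, y, n, offset=None, tol=1e-10, max_iter=100):
    """Newton-Raphson ML fit of a binomial logit model with predictor offset + X b."""
    rows, m = len(y), len(X[0])
    off = offset if offset is not None else [0.0] * rows
    b = [0.0] * m
    for _ in range(max_iter):
        mu = [expit(off[i] + sum(X[i][j] * b[j] for j in range(m))) for i in range(rows)]
        score = [sum(X[i][j] * (y[i] - n[i] * mu[i]) for i in range(rows)) for j in range(m)]
        info = [[sum(X[i][j] * X[i][l] * n[i] * mu[i] * (1.0 - mu[i]) for i in range(rows))
                 for l in range(m)] for j in range(m)]
        step = solve(info, score)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b

def fit_summary_roc(y1, n1, y0, n0, theta=1.0, tol=1e-8, max_iter=2000):
    """Alternate Steps 2-4 until the update to theta is negligible."""
    k = len(y1)
    y, n = list(y1) + list(y0), list(n1) + list(n0)
    eye = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(max_iter):
        # Step 2: logistic regression with design matrix X(theta).
        X = [[1.0] + [theta * e for e in row] for row in eye] + [[0.0] + row for row in eye]
        coef = fit_logit(X, y, n)
        delta, phi = coef[0], coef[1:]
        psi = [delta + theta * p for p in phi] + phi
        # Step 3: single constructed covariate (phi, 0) with offset psi.
        z = [[p] for p in phi] + [[0.0] for _ in range(k)]
        step = fit_logit(z, y, n, offset=psi)[0]
        # Step 4: update theta by the fitted slope.
        theta += step
        if abs(step) < tol:
            break
    return delta, theta, phi

# Synthetic, noise-free 'counts' from hypothetical true parameters.
true_delta, true_theta = 2.0, 1.5
true_phi = [-2.0, -1.5, -1.0, -0.5, 0.0]
sizes = [200.0] * 5
y0 = [200.0 * expit(p) for p in true_phi]
y1 = [200.0 * expit(true_delta + true_theta * p) for p in true_phi]
delta_hat, theta_hat, phi_hat = fit_summary_roc(y1, sizes, y0, sizes)
```

Because the synthetic counts satisfy the model exactly, the estimates should reproduce the hypothetical true values closely. With real data one would also want standard errors and a check on convergence, neither of which this sketch provides.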
TABLE 1
Fits to meta-analysis data of Moses et al. (1993)

                         All data                         Study 79 omitted
Method               Δ      θ      Δ*     G²/df        Δ      θ      Δ*     G²/df
BR                  1.91   0.394  2.51   41.25/12     2.76   0.812  3.03   26.00/12
BR1 (θ = 1)         2.86   1.000  2.86   46.45/13     3.07   1.000  3.07   26.16/13
LS (Moses et al.)   2.54   1.197  2.30   55.67/12     2.80   1.049  2.73   28.77/12
Table 1 lists estimates of $\Delta$, $\theta$ and $\Delta^*$ as well as the likelihood ratio goodness-of-fit statistic $G^2$ under the binomial model using three methods: (i) both $\Delta$ and $\theta$ estimated from binomial regression likelihood (BR), (ii) only $\Delta$ estimated from binomial regression with $\theta = 1$ assumed (BR1), (iii) both $\theta$ and $\Delta$ estimated by the least squares method (LS) of Moses et al. With the aberrant point included there is considerable disagreement between the BR and LS fits. With this point omitted there is less disagreement and the implied ROC curves are displayed in the left graph of Figure 1. Rutter and Gatsonis' fitting method is ultimately identical to BR1. The method of Karduan and Karduan gives, in this example, a curve which is very close to BR1 and is not displayed.

It is worth pointing out that there would be other ways of applying the model (1) if study level covariates were available. For instance, rather than fitting a different baseline $\phi_j$ to each study, one could model the dependence of this baseline on, say, age-group.

3.1. Over-dispersion model

The large deviance of 26 on 12 degrees of freedom makes it virtually impossible that the fitted binomial model is correct. The correctness of the least squares approaches of Moses et al. (1993) and Karduan & Karduan (1990), which rely on assumptions about the variances of the empirical logits, cannot be tested from the data. There are many ways in which we could refine the binomial regression for this example. For instance, it seems unlikely that 14 applications of a diagnostic system by different doctors to different patient groups could or should be summarized by a single ROC curve. If there were any basis for grouping the 14 studies then one could easily fit an extended model as described in the following section. The sub-matrix $W$ in (9) below would contain dummy variables defining this grouping. However, if we insist on a single curve, the over-dispersion of the data should also be modelled.
The problem with not doing so is two-fold. First, the standard fitting procedure gives extra weight to studies with more data. Large between-study variability renders this weighting inappropriate and the estimated ROC curve is inefficient. Second, tests of significance are too liberal. For instance, standard errors are misleadingly small if only within-study variability is modelled.

The method of Williams (1982) supposes that $p_1$ and $p_0$ are random variables with $\mathrm{var}(p_d) \propto \tau E(p_d)\{1 - E(p_d)\}$ where $\tau$ is the over-dispersion parameter. This model implies that the usual binomial weights are divided by $1 + \tau(n_i - 1)$. Even if this model is not exactly correct, it does directly address the two problems of over-dispersion mentioned above; an extremely simple fitting algorithm based on the method of moments is described in Lloyd (1999 Section 4.4).

Rutter & Gatsonis (1995) have suggested a different model for over-dispersion. Their random effects model assumes that the unobservable decision thresholds $\gamma_i$ and accuracies $\Delta_i$ of the $k$ studies are independent normal variables with means $\bar\gamma$, $\bar\Delta$ and unknown variances, and they use Bayesian methods after assuming diffuse priors on all unknown parameters.
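The effect of the Williams (1982) weighting can be seen in a short sketch. The study sizes below are hypothetical; $\tau = 0.0586$ is the estimate reported for the Moses et al. data later in this section. The function name `williams_weights` is an assumption.

```python
def williams_weights(n, tau):
    """Factor multiplying the usual binomial weights: 1 / (1 + tau * (n_i - 1))."""
    return [1.0 / (1.0 + tau * (ni - 1)) for ni in n]

# Hypothetical study sizes, from small to very large.
sizes = [20, 50, 200, 1000]
tau = 0.0586
weights = williams_weights(sizes, tau)

# 'Effective' sample size n_i / (1 + tau * (n_i - 1)); bounded above by 1 / tau.
effective = [w * ni for w, ni in zip(weights, sizes)]
```

Under this weighting the effective sample size of a study can never exceed $1/\tau$ (about 17 here), so very large studies no longer dominate the fit as they do under the plain binomial model.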
TABLE 2
Vigilance experiment data, from Altham (1973)

Segment      Stimulus        A    B    C    D    E    F
First 250    Signal (50)    44   47   40   38   45   41
             Noise (200)    29   19   31   36   24   48
Second 250   Signal (50)    39   45   34   37   32   34
             Noise (200)    22   13   24   30   18   34
The Williams over-dispersion model is fitted by cycling between fitting the binomial regression model with weights $1/\{1 + \tau(n_i - 1)\}$ and estimating $\tau$ from the binomial regression model with matrix $X(\theta)$. After three iterations the estimated parameters converge to $\hat\theta = 1.798$ and $\hat\Delta = 4.163$ with over-dispersion parameter $\hat\tau = 0.0586$. The implied ROC curve is plotted as a solid line in the right graph of Figure 1. To add some insight, the points have been graphed with characters proportional to the final weighting $1/\{1 + 0.0586(n_i - 1)\}$, averaged over the two coordinates.

4. Extensions of the model

The ROC curves (4) do not depend on $x$, so our model (5) implies that diagnostic performance, as measured by the ROC curve, does not depend on $x$. There are two simple generalizations of our model which allow dependence on $x$.

When $\theta = 1$, model (3) is a parallel lines model: it assumes that the regression parameters $\beta_1$ of $\psi_1$ on $x$ and $\beta_0$ of $\psi_0$ on $x$ have the common value $\beta$. When $\theta \ne 1$ these slope parameters are not equal but are proportional. Our first generalization is to impose no restrictions at all on $\beta_1$ and $\beta_0$. In this case, taking the parameter vector to be $(\Delta, \beta_1^T, \beta_0^T)$ the design matrix becomes

$$\begin{pmatrix} \mathbf{d} & \begin{matrix} X_1^* & 0 \\ 0 & X_0^* \end{matrix} \end{pmatrix}. \tag{8}$$

We can check that this model implies ROC curves of the form (4) but with slope $\theta = 1$ and separation $\Delta + (\beta_1^T - \beta_0^T)x$ depending on the covariate $x$.

Another approach to generalizing (5) is to directly model the parameter $\Delta$ in terms of a set of covariates, either the same covariates $x$ used to model position, or a subset of these, or further covariates. Because $\Delta$ is multiplied by $d$ in (5) we need only model it for the disease group. Thus suppose that

$$\Delta_i = w_i^T\delta \qquad (i = 1, \ldots, N_1),$$

where $\delta$ is a vector of $k$ unknown parameters. Arrange the $w_i^T$ row-wise into an $N_1 \times k$ matrix $W$. For the parameterization $(\delta^T, \beta^T)$, the design matrix of the implied binomial regression model is

$$\begin{pmatrix} W & \theta X_1^* \\ 0 & X_0^* \end{pmatrix}, \tag{9}$$

where $0$ is an $N_0 \times k$ matrix of 0s. The ROC curves generated have free asymmetry parameter $\theta$ and shift parameter $\Delta = w^T\delta$. The design matrices $X_0^*$, $X_1^*$ model the position of individuals along their ROC curve defined by $(\Delta_i, \theta)$.
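The block structure of design matrix (9) is easy to assemble for a fixed $\theta$. The sketch below uses a hypothetical layout of six subjects observed in two segments, with $W$ holding subject indicators and $X^*$ holding a constant and a second-segment indicator; the function name `extended_design` and this layout are assumptions made for illustration.

```python
def extended_design(W, X1_star, X0_star, theta):
    """Stack [[W, theta * X1*], [0, X0*]] row by row, as in design matrix (9)."""
    q = len(W[0])
    top = [list(w) + [theta * v for v in x] for w, x in zip(W, X1_star)]
    bottom = [[0.0] * q + list(x) for x in X0_star]
    return top + bottom

subjects, segments = 6, 2
# One disease-group row and one non-disease row per subject-by-segment cell.
cells = [(s, g) for s in range(subjects) for g in range(segments)]
W = [[1.0 if s == j else 0.0 for j in range(subjects)] for s, _ in cells]
X_star = [[1.0, float(g)] for _, g in cells]  # constant, second-segment indicator

X = extended_design(W, X_star, X_star, theta=2.0)
```

The result has 24 rows and 8 columns: six columns for $\delta$ and two for $\beta$. Together with $\theta$ this gives nine parameters in total.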
[Figure 2: four panels. Top left: logit difference D (vertical axis) against logit sum S (horizontal axis), points labelled by subject A–F. Top right and bottom left: true positive against false positive. Bottom right: Delta* compared to subject B, by subject.]

Figure 2. Analysis of vigilance data of Altham (1973). Top left: S-D graph, joined by lines according to subject. In each case, the subject label is plotted at the data point for segment 1. The other end of the line is for segment 2. Top right: comparison of empirical (solid) and fitted (dotted) true and false positive fractions, joined by lines according to subject. Bottom left: binomial modelled ROC curves for subjects A–F and movement along them from segment 1 to segment 2. Bottom right: effect of estimating $\theta$ on estimated subject accuracies $\hat\Delta_i^*$, relative to subject B.
We may test the hypothesis that the ROC curve is independent of $x$ or $w$ by formally testing models (5) and (8) against model (9).

5. Application to vigilance data

Table 2 lists data published by Altham (1973) after correction of a typographical error. Six subjects labelled A–F were each presented with 500 cases and asked to classify them as either signal or noise. The table lists the number of signal classifications. The term 'vigilance' describes the ability of human subjects to detect auditory signals embedded in noise as well as the tendency to hear signals which are non-existent. Unknown to the subjects, there were exactly 50 signals among the first 250 cases and 50 signals among the second 250 cases. One issue of interest is whether or not subjects tend to diagnose fewer signals as the experiment progresses, i.e. in the second segment, and the uniformity of this effect across subjects.

The top left graph of Figure 2 gives the S-D graph of Moses et al. (1993). The logit probabilities of true and false positives, $\psi_1$, $\psi_0$, are estimated by substituting empirical frequencies
(adding ½ to each) and $\hat D_j = \hat\psi_{1j} - \hat\psi_{0j}$ are plotted against $\hat S_j = \hat\psi_{1j} + \hat\psi_{0j}$ for the 12 subject-by-segment combinations. The abscissa $S$ is a natural measure of vigilance. The ordinate $D$ is the ordinary log-odds ratio for that subject or occasion. The graph suggests (i) that the six subjects have different ROC curves, (ii) that the slopes are positive, suggesting that $\theta > 1$, (iii) that subjects move along their ROC curve to the left as the experiment progresses from segment 1 to 2. Further important features of the data may become apparent as the analysis progresses.

We now employ the extended model (9) to account for the features (i)–(iii). The model matrix $W$ allows different values $\delta = (\Delta_1, \ldots, \Delta_6)$ describing each subject; $W$ is the identity matrix $I_6$. The model matrix $X^*$, which describes the position of individuals along their curve, depends on segment. Specifically, we let $X^*$ be the 12 × 2 matrix with first column all 1s and second column indicating the second segment. The associated parameters are $\beta^T = (\alpha, s_2)$ with $\alpha$ representing the initial position on the curve and $s_2$ the change from segment 1 to 2 (which we expect will be negative).

Upon fitting this model, the estimated parameters are $\hat\theta = 2.22$, $\hat\alpha = -1.68$, $\hat s_2 = -0.326$ and $\hat\delta = (5.74, 6.61, 5.19, 5.24, 5.35, 5.24)$. However, the likelihood ratio goodness-of-fit statistic is 38.34 on 24 − 9 = 15 degrees of freedom, so that P = 0.001, indicating a poor fit. Note that the poor fit results in a rank ordering of subjects BAEDFC that seems to conflict with the top left graph of Figure 2 (where BEACDF is strongly suggested), though A and E are close, as are C, D and F.

Looking further at the top left graph of Figure 2, it seems clear that subjects C and D are significantly less vigilant (to the left) than the other subjects at the beginning of the experiment. We thus extend the model to allow the initial position $\alpha$ along the ROC curve to depend on subject (in addition to segment).
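The S-D coordinates described at the start of this section can be recomputed from the counts in Table 2. The sketch below uses the counts as read from that table (signal classifications out of 50 signals and 200 noise cases per segment); the dictionary layout and the function name `emp_logit` are assumptions.

```python
import math

# Signal classifications per subject: (segment 1, segment 2).
signal = {"A": (44, 39), "B": (47, 45), "C": (40, 34),
          "D": (38, 37), "E": (45, 32), "F": (41, 34)}   # out of 50 signals
noise = {"A": (29, 22), "B": (19, 13), "C": (31, 24),
         "D": (36, 30), "E": (24, 18), "F": (48, 34)}    # out of 200 noise cases

def emp_logit(y, n):
    """Empirical logit with 1/2 added to each count."""
    return math.log((y + 0.5) / (n - y + 0.5))

sd = {}
for subj in signal:
    for seg in (0, 1):
        psi1 = emp_logit(signal[subj][seg], 50)
        psi0 = emp_logit(noise[subj][seg], 200)
        # Store (S, D) = (logit sum, logit difference) for this subject and segment.
        sd[(subj, seg + 1)] = (psi1 + psi0, psi1 - psi0)
```

With these counts the logit sum $S$ is smaller in segment 2 than in segment 1 for every subject, which is the leftward movement noted in feature (iii).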
To do this we replace the first column of $X^*$, which previously was a column of 1s, with six columns indicating subjects A through F and associated parameters $\alpha_1, \ldots, \alpha_6$. The matrix $X^*$ is now a standard design matrix for the two-way classification on segment and subject. The estimated parameters are $\hat\theta = 2.18$, $\hat s_2 = -0.332$, $\hat\delta = (5.84, 7.84, 5.1, 4.67, 5.93, 4.1)$, $\hat\alpha = (-1.77, -2.29, -1.68, -1.46, -1.99, -1.20)$. The deviance is 5.15 on 24 − 14 = 10 degrees of freedom, which is a good fit (P = 0.881).

One could test several hypotheses from this model.

(1) Are the $\alpha_i$ really different? The previous deviance of 38.34 on 15 degrees of freedom is 33.19 larger, for a difference of 5 degrees of freedom, so evidence against a constant $\alpha$ is strong.
(2) Could $s_2$, measuring change in vigilance from segment 1 to segment 2, be different across subjects? One could fit this model by replacing $X^*$ by the 24 × 11 matrix from a two-way interaction model in segment and subject. However, the drop in deviance could be no greater than 5.15 and the number of extra parameters is 5, so there is no evidence that $s_2$ differs across subjects.
(3) Do the $\Delta_i$ differ? We simply replace $W$ by a single column of 1s. The resulting deviance is 49.25 on 15 degrees of freedom, which is strong evidence against a common value of $\Delta$.
(4) Is $\theta = 1$? Fitting the model with $\theta = 1$, the deviance rises to 7.79 with associated P-value 0.104. In other words, there is moderate statistical evidence that $\theta \ne 1$, or that the slopes in the top left graph of Figure 2 are not zero.

The fit of the model is summarized in the top right graph of Figure 2, which plots raw and model-based estimates of $p_1$ against $p_0$, using solid and dotted lines, respectively. Since the deviance of the model is so low, it is no surprise that there is good agreement between the two. The bottom left graph of Figure 2 shows the modelled ROC curves for each subject with
the path from segments 1 to 2 traced out as a darker line. Apparently the curves for subjects A and E are very close, but the position of the subjects on the curve may be different. This could be tested by comparing the estimated $\alpha$s for these two subjects, or by refitting the model with these two parameters set equal.

Some comments on the fitted model follow. The estimates $\hat\alpha_i$ are all negative because they are measured on the logit scale, in contrast to the abscissae in the top left graph of Figure 2 which are on the 'logit-sum' scale. The estimates $\hat\Delta_i$ now give a rank ordering of subjects in agreement with the top right graph of Figure 2. Our modelling indicates that subjects differ both in their ROC curves and in their starting positions on this curve, but one can assume that the reduction in vigilance from the first to the second segment is equal for all subjects (on the log-odds scale).

The bottom right graph of Figure 2 plots the estimated subject parameters $\hat\Delta_1^*, \ldots, \hat\Delta_6^*$, standardized to equal zero for subject B, first under our fitted model with $\theta = 2.18$ and second under the model with $\theta = 1$ assumed. The model with $\theta$ estimated suggests somewhat larger differences between the accuracy of subjects than does the restricted model. The estimate of $s_2 \approx -0.33$ is largely unchanged over all these fitted models, even the very poorly fitting models mentioned earlier, and so we could be most confident in its value.

Appendix: Fitting non-linear logistic models

Let $Y$ be an $n \times 1$ vector of binomial variables and let the linear predictor be of the form $\psi = X(\gamma)\beta$, where $X(\gamma)$ is a family of $n \times m$ matrices depending on the non-linear parameters $(\gamma_1, \ldots, \gamma_q) = \gamma$ and $\beta$ is a vector of $m$ linear parameters. For simplicity we suppose that the link is canonical or logistic; however, the algorithm given extends to any link function. Denote by $X_k$ the $n \times m$ matrix with entries $\partial X_{ij}/\partial\gamma_k$ and let $z_k(\beta) = X_k\beta$, being a constructed covariate for the parameter $\gamma_k$.
The components of the likelihood score function are

$$x_j^T(y - \mu) = 0 \quad (j = 1, \ldots, m), \qquad z_k^T(\beta)(y - \mu) = 0 \quad (k = 1, \ldots, q),$$
where $x_j$ is the $j$th column of $X$. Apart from the dependence of the constructed variables $z_k(\beta)$ on $\beta$ and the design matrix $X(\gamma)$ on $\gamma$, these equations are identical to those from a logistic regression with design matrix $(X, Z)$. The following algorithm exploits this partial linearity.

Step 1. Choose $\gamma_0$.
Step 2. Fit the model $X(\gamma_0)$ to obtain $\hat\beta_0$.
Step 3. Compute the fitted values $\hat\psi_0 = X(\gamma_0)\hat\beta_0$ and the constructed covariates matrix $\hat Z_0$ with columns $z_k(\hat\beta_0)$, $k = 1, \ldots, q$.
Step 4. Fit the model $\hat\psi_0 + \hat Z_0\delta$ where $\delta$ is a parameter of dimension $q$. The known vector $\hat\psi_0$ is commonly called an offset.
Step 5. Let $\gamma_{j+1} = \gamma_j + \hat\delta_j$ and repeat from Step 2 until convergence.
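A brief sketch of why Steps 3 to 5 are reasonable: holding $\beta$ at $\hat\beta_0$ and expanding the linear predictor to first order in a perturbation $\delta$ of the non-linear parameters gives

$$X(\gamma_0 + \delta)\hat\beta_0 \;\approx\; X(\gamma_0)\hat\beta_0 + \sum_{k=1}^{q}\delta_k X_k\hat\beta_0 \;=\; \hat\psi_0 + \hat Z_0\delta,$$

so Step 4 fits the linearized model in $\delta$ with $\hat\psi_0$ as offset. For the meta-analysis design matrix $X(\theta)$ of Section 3, $X$ is linear in $\theta$, so the expansion is exact and the constructed covariate is

$$z = \frac{\partial X(\theta)}{\partial\theta}\begin{pmatrix}\hat\Delta \\ \hat\phi\end{pmatrix} = \begin{pmatrix} \mathbf{0} & I_k \\ \mathbf{0} & 0 \end{pmatrix}\begin{pmatrix}\hat\Delta \\ \hat\phi\end{pmatrix} = \begin{pmatrix}\hat\phi \\ \mathbf{0}\end{pmatrix},$$

in agreement with $z^T = (\hat\phi^T, \mathbf{0}^T)$ as used there.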
References

ALTHAM, P.M.E. (1973). A non-parametric measure of signal discriminability. Brit. J. Math. Statist. Psychol. 26, 1–12.
CAMPBELL, G. & RATNAPARKHI, M.V. (1993). An application of Lomax distributions in receiver operating characteristic (ROC) curve analysis. Comm. Statist. 22, 1681–1697.
DORFMAN, D.D. & ALF, E. Jr (1968). Maximum likelihood estimation of parameters of signal detection theory: a direct solution. Psychometrika 33, 117–124.
ENGLAND, W.L. (1988). An exponential model used for optimal threshold selection in ROC curves. Med. Dec. Making 8, 120–131.
GREEN, D.M. & SWETS, J.A. (1966). Signal Detection Theory and Psychophysics. New York: Wiley.
HANLEY, J.A. & McNEIL, B.J. (1983). A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839–843.
KARDUAN, J.W.P.F. & KARDUAN, O.J.W.F. (1990). Comparative diagnostic performance of three radiological procedures for the detection of lumbar disk herniation. Methods of Information in Medicine 29, 12–22.
LLOYD, C.J. (1999). Statistical Analysis of Categorical Data. New York: Wiley.
McCULLAGH, P. (1979). PLUM: an Interactive Package for Analysing Ordinal Data. University of Chicago, Department of Statistics.
METZ, C.E. & KRONMAN, H.B. (1980). Statistical significance tests for binormal ROC curves. J. Math. Psychol. 22, 218–243.
MOISE, A., CLEMENT, B., DUCIMETIERE, P. & BOURASSA, M.G. (1985). Comparison of receiver operating curves derived from the same population: a bootstrapping approach. Comp. Biom. Res. 18, 218–243.
MOSES, L.E., SHAPIRO, D. & LITTENBERG, B. (1993). Combining independent studies of diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat. Med. 12, 1293–1316.
RUTTER, C.M. & GATSONIS, C.A. (1995). Regression methods for meta analysis of diagnostic test data. Academic Radiology 2, S48–S56.
WIEAND, S., GAIL, M.H., JAMES, B.R. & JAMES, K.L. (1989). A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76, 585–592.
WILLIAMS, D.A. (1982). Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144–148.