Ideal-Observer Performance Under Signal and Background Uncertainty

S. Park (1), M. A. Kupinski (2,3), E. Clarkson (1-3), and H. H. Barrett (1-3)

(1) Program in Applied Mathematics, The University of Arizona at Tucson, [email protected]
(2) Department of Radiology, The University of Arizona at Tucson
(3) Optical Sciences Center, The University of Arizona at Tucson
Abstract. We use the performance of the Bayesian ideal observer as a figure of merit for hardware optimization because this observer makes optimal use of signal-detection information. Due to the high dimensionality of certain integrals that need to be evaluated, it is difficult to compute the ideal observer test statistic, the likelihood ratio, when background variability is taken into account. Methods have been developed in our laboratory for performing this computation for fixed signals in random backgrounds. In this work, we extend these computational methods to compute the likelihood ratio in the case where both the backgrounds and the signals are random with known statistical properties. We are able to write the likelihood ratio as an integral over possible backgrounds and signals, and we have developed Markov-chain Monte Carlo (MCMC) techniques to estimate these high-dimensional integrals. We can use these results to quantify the degradation of the ideal-observer performance when signal uncertainties are present in addition to the randomness of the backgrounds. For background uncertainty, we use lumpy backgrounds. We present the performance of the ideal observer under various signal-uncertainty paradigms with different parameters of simulated parallel-hole collimator imaging systems. We are interested in any change in the rankings between different imaging systems under signal and background uncertainty compared to the background-uncertainty case. We also compare psychophysical studies to the performance of the ideal observer.
1 Introduction
We take as fundamental the idea that image quality must be defined by the performance of an observer on a specified task. The tasks of interest in medical imaging can be categorized as detection tasks or estimation tasks. This work focuses on signal-detection tasks and the Bayesian ideal observer. The Bayesian ideal observer has all the statistical information about the data and makes optimal use of this information; it sets an absolute upper bound on task performance as measured by many common figures of merit derived from the receiver operating characteristic (ROC) curve. One such figure of merit is the area under the curve (AUC). We use the AUC of the ideal observer as a measure of the quality of the image data produced by imaging hardware.

Because the ideal observer requires full knowledge of the statistics of the image data, its test statistic, the likelihood ratio, is difficult to compute. Much previous work has therefore studied the performance of non-optimal observers for different kinds of backgrounds and signals. Alternatively, past work has made the ideal-observer computation tractable through unrealistic assumptions such as fixed backgrounds. Clarkson and Barrett [4] have developed mathematical methods to approximate the AUC of the ideal observer without having to assume such backgrounds. For random backgrounds and fixed signals, Kupinski et al. [7] have developed computational methods to estimate the likelihood ratio using Markov-chain Monte Carlo (MCMC) techniques.

In this work, we extend these computational methods [7] to estimate the likelihood ratio in the case where both backgrounds and signals are random with known statistical properties, again using MCMC techniques. We present the results of simulation studies in which we compare the ideal-observer performance under various signal-uncertainty paradigms for different parameters of simulated parallel-hole collimator imaging systems. We quantify the degradation of the ideal-observer performance and examine how the ranking of three different imaging systems changes under signal and background uncertainty compared to the fixed-signal cases. We also present psychophysical studies that compare the ideal observer and human observers under signal and background uncertainty.
2 Background
The imaging process can be represented mathematically by

$$\mathbf{g} = \mathcal{H}\mathbf{f} + \mathbf{n}, \tag{1}$$

where $\mathcal{H}$ is a continuous-to-discrete imaging operator that maps an object $\mathbf{f}$, a function of continuous variables, to an $M \times 1$ vector of image data $\mathbf{g}$, and $\mathbf{n}$ is an $M \times 1$ vector of measurement noise. A linear continuous-to-discrete imaging operator $\mathcal{H}$ can be represented mathematically by

$$g_m = \int_S d\mathbf{r}\, h_m(\mathbf{r})\, f(\mathbf{r}) + n_m, \tag{2}$$

where $\mathbf{r}$ is a 2D spatial coordinate, $S$ is the field of view (FOV), $h_m$ is the $m$th sensitivity function of $\mathcal{H}$, and $g_m$ and $n_m$ are elements of $\mathbf{g}$ and $\mathbf{n}$.

2.1 Problem Setting
The goal of a signal-detection task is to determine whether or not a signal, such as a tumor, is present. We would like to know how imaging systems perform on
signal-detection tasks, i.e., how well observers such as human observers, model observers, or the Bayesian ideal observer can determine the presence of a tumor in images generated by imaging hardware. We consider the tumor to be a signal $\mathbf{f}_s$ in a random background $\mathbf{f}_b$, so the imaging process under the two hypotheses can be represented mathematically by

$$H_0:\ \mathbf{g} = \mathcal{H}\mathbf{f}_b + \mathbf{n}, \tag{3}$$

$$H_1:\ \mathbf{g} = \mathcal{H}(\mathbf{f}_b + \mathbf{f}_s) + \mathbf{n}. \tag{4}$$

For notational convenience, we define the background and signal images to be

$$\mathbf{b} = \mathcal{H}\mathbf{f}_b, \tag{5}$$

$$\mathbf{s} = \mathcal{H}\mathbf{f}_s. \tag{6}$$

In this work, we consider signal-known-statistically (SKS) tasks, where both the signal $\mathbf{s}$ and the background $\mathbf{b}$ are random, in contrast to signal-known-exactly (SKE) tasks, where $\mathbf{s}$ is fixed and only the background $\mathbf{b}$ is random. Gaussian blur functions are used for $h_m(\mathbf{r})$ to simulate our simplified parallel-hole collimator imaging systems.

2.2 Ideal Observer
The ideal observer computes a test statistic, the likelihood ratio, and compares it to a threshold to make a decision between the two hypotheses. The likelihood ratio is defined as

$$\Lambda(\mathbf{g}) = \frac{\mathrm{pr}(\mathbf{g}|H_1)}{\mathrm{pr}(\mathbf{g}|H_0)}, \tag{7}$$

where $\mathrm{pr}(\mathbf{g}|H_j)$ is the probability density of the image data $\mathbf{g}$ under the hypothesis $H_j$. By varying the threshold and plotting the true-positive fraction (TPF) versus the false-positive fraction (FPF), an ROC curve can be generated for the task. The AUC is a common scalar figure of merit that is maximized by the ideal observer. In this sense, the ideal observer measures the amount of detectable information provided by an imaging system. Therefore we employ the ideal observer as our observer and the AUC as our figure of merit for detection tasks.

We assume that $\mathbf{s}$ is statistically independent of $\mathbf{b}$. We can rewrite (7) as a ratio of integrals over random backgrounds and signals [1], [7], [11],

$$\Lambda(\mathbf{g}) = \frac{\int d\mathbf{b} \int d\mathbf{s}\ \mathrm{pr}(\mathbf{b})\, \mathrm{pr}(\mathbf{g}|\mathbf{b},\mathbf{s},H_1)\, \mathrm{pr}(\mathbf{s})}{\int d\mathbf{b}'\ \mathrm{pr}(\mathbf{b}')\, \mathrm{pr}(\mathbf{g}|\mathbf{b}',H_0)}, \tag{8}$$

where $\mathbf{b}$ and $\mathbf{s}$ are random backgrounds and signals, respectively. For an imaging system with Poisson noise $\mathbf{n}$,

$$\mathrm{pr}(\mathbf{g}|\mathbf{b},\mathbf{s},H_1) = \prod_{m=1}^{M} e^{-(b_m+s_m)}\, \frac{(b_m+s_m)^{g_m}}{g_m!}, \tag{9}$$

and

$$\mathrm{pr}(\mathbf{g}|\mathbf{b},H_0) = \prod_{m=1}^{M} e^{-b_m}\, \frac{b_m^{\,g_m}}{g_m!}. \tag{10}$$
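To make the computation of the likelihood ratio feasible, we define

$$\Lambda_{\mathrm{BSKE}}(\mathbf{g}|\mathbf{b},\mathbf{s}) = \frac{\mathrm{pr}(\mathbf{g}|\mathbf{b},\mathbf{s},H_1)}{\mathrm{pr}(\mathbf{g}|\mathbf{b},H_0)}, \tag{11}$$

where $\Lambda_{\mathrm{BSKE}}$ is the background-and-signal-known-exactly (BSKE) likelihood ratio, i.e., the likelihood ratio in the case where the background and signal are fixed and not random. Substituting (9) and (10) into (11), we get

$$\Lambda_{\mathrm{BSKE}}(\mathbf{g}|\mathbf{b},\mathbf{s}) = \prod_{m=1}^{M} \left(1 + \frac{s_m}{b_m}\right)^{g_m} e^{-s_m}. \tag{12}$$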
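Equation (12) is straightforward to evaluate once a background image and a signal image are given. The following Python sketch is illustrative only: the function names are hypothetical, and working with the logarithm of the BSKE likelihood ratio is a choice made here to avoid overflow in the product, not something stated in the text. The last lines check Eq. (12) against the direct ratio of Eqs. (9) and (10) on made-up numbers.

```python
import math


def log_lambda_bske(g, b, s):
    """Log of Eq. (12): sum over pixels of g_m * log(1 + s_m / b_m) - s_m.

    g: observed counts, b: background image (all b_m > 0), s: signal image.
    """
    return sum(gm * math.log1p(sm / bm) - sm for gm, bm, sm in zip(g, b, s))


def log_poisson_pmf(k, mean):
    """Log of the Poisson probability exp(-mean) * mean**k / k!."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


# Made-up three-pixel example (values are not from the paper).
g = [3, 0, 5]
b = [2.0, 1.0, 4.0]
s = [1.0, 0.5, 0.0]

# Direct log ratio of Eq. (9) to Eq. (10); the g_m! terms cancel.
direct = sum(log_poisson_pmf(gm, bm + sm) - log_poisson_pmf(gm, bm)
             for gm, bm, sm in zip(g, b, s))
print(log_lambda_bske(g, b, s), direct)
```

Pixels where the signal is zero contribute nothing to the sum, so only the support of the signal matters for the BSKE likelihood ratio.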
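We are now able to rewrite the likelihood ratio for random backgrounds and signals in terms of an integral of the BSKE likelihood ratio, i.e., the posterior mean of the BSKE likelihood ratio [7],

$$\Lambda(\mathbf{g}) = \int d\mathbf{s} \int d\mathbf{b}\ \Lambda_{\mathrm{BSKE}}(\mathbf{g}|\mathbf{b},\mathbf{s})\, \mathrm{pr}(\mathbf{b}|\mathbf{g},H_0)\, \mathrm{pr}(\mathbf{s}), \tag{13}$$

where

$$\mathrm{pr}(\mathbf{b}|\mathbf{g},H_0) = \frac{\mathrm{pr}(\mathbf{g}|\mathbf{b},H_0)\, \mathrm{pr}(\mathbf{b})}{\int d\mathbf{b}'\ \mathrm{pr}(\mathbf{g}|\mathbf{b}',H_0)\, \mathrm{pr}(\mathbf{b}')}. \tag{14}$$

2.3 Object Models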
To describe background uncertainty (BU), we use lumpy backgrounds. These backgrounds were proposed by Rolland and Barrett [10] in an attempt to estimate human and model-observer performance with more realistic-looking backgrounds than flat or Gaussian backgrounds. Lumpy backgrounds are mathematically represented by

$$\mathbf{f}_b = f_b(\mathbf{r}) = \sum_{n=1}^{N} L(\mathbf{r} - \mathbf{c}_n \,|\, a, s), \tag{15}$$

where $\mathbf{r}$ is a 2D or 3D spatial coordinate, $N$ is the random number of lumps in the object (Poisson distributed with mean $\bar{N}$), $L(\cdot)$ is the lump function, $\mathbf{c}_n$, the center of the $n$th lump, is drawn from a uniform distribution, and $a$ and $s$ are the fixed magnitude and width of the lump function. For our work, we use 2D lumpy backgrounds with circularly symmetric Gaussian profiles. From (2), (5), and (15), the $m$th element of the background vector $\mathbf{b}$ can be written as

$$b_m = \sum_{n=1}^{N} \int_S d\mathbf{r}\ h_m(\mathbf{r})\, L(\mathbf{r} - \mathbf{c}_n \,|\, a, s). \tag{16}$$
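A lumpy object of the form in Eq. (15) can be sketched in a few lines of Python with NumPy. Several details here are assumptions rather than statements from the text: the object is sampled on a unit pixel grid, the lump profile is taken to be a * exp(-|r - c|^2 / (2 s^2)) (the text specifies only a circularly symmetric Gaussian), and the function name and the `pad` argument are hypothetical. The padding mirrors the boundary handling described in Section 4, where a larger image is generated and its central region is kept. The sketch produces the object only; it does not apply the imaging operator of Eq. (16).

```python
import numpy as np


def lumpy_background(size=64, mean_lumps=25.0, a=1.0, s=7.0, pad=21, rng=None):
    """Sample a 2D lumpy object on a pixel grid and return its central region."""
    rng = np.random.default_rng() if rng is None else rng
    full = size + 2 * pad
    # Keep the lump density fixed when the area grows from size^2 to full^2.
    n_lumps = rng.poisson(mean_lumps * (full / size) ** 2)
    centers = rng.uniform(0.0, full, size=(n_lumps, 2))  # (x, y) lump centers
    y, x = np.mgrid[0:full, 0:full]
    image = np.zeros((full, full))
    for cx, cy in centers:
        image += a * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s ** 2))
    # Crop so that lumps centered outside the field of view still contribute.
    return image[pad:pad + size, pad:pad + size]


background = lumpy_background(rng=np.random.default_rng(0))
print(background.shape, float(background.mean()))
```

With the defaults, the padded image is 106 x 106 pixels and the Poisson mean scales from 25 to about 69, matching the numbers quoted in Section 4.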
We can take advantage of our knowledge of the background statistics to learn how the variability in the backgrounds affects the performance of the ideal observer with different parameters of simulated parallel-hole collimator imaging systems.

2.4 Signal Models
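To model signal uncertainty, we use circularly symmetric Gaussian signals with random locations for location uncertainty (LU), elliptical Gaussian signals for shape uncertainty (SU), and both for shape and location uncertainty (SLU). Signals are mathematically represented by

$$\mathbf{f}_s = f_s(\mathbf{r}) = a_s \exp\left\{ -[\mathbf{R}^\dagger(\mathbf{r} - \mathbf{c})]^\dagger\, \mathbf{D}^{-1}\, [\mathbf{R}^\dagger(\mathbf{r} - \mathbf{c})] \right\}, \tag{17}$$

where $\mathbf{r}$ is a 2D spatial coordinate, $a_s$ is the magnitude of the signal function, $\mathbf{c}$ is the center of the signal, $\mathbf{R}$ is a rotation matrix, and $\mathbf{D}$ is a diagonal matrix whose diagonal elements are $2\sigma_1^2$ and $2\sigma_2^2$:

$$\mathbf{R} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \tag{18}$$

and

$$\mathbf{D} = \begin{pmatrix} 2\sigma_1^2 & 0 \\ 0 & 2\sigma_2^2 \end{pmatrix}. \tag{19}$$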
For the LU cases, we fix shapes of signals as circularly symmetric Gaussians, i.e., θ and σ1 (= σ2 ) are fixed, and we choose the locations of the signals from a uniform distribution. For the SU cases, we fix signals at the center of backgrounds and choose the shapes of signals by uniform distributions, i.e., c is fixed, and θ, σ1 and σ2 are random. For the SLU cases, the locations and shapes of signals are randomly chosen. In all cases, there is either 0 or 1 signal in an image. Using the signal models as described above, we will find out how the randomness in signals changes the performance of the ideal observer in the case where backgrounds are random. We are also interested in any change in rankings of the imaging systems.
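The signal model of Eqs. (17)-(19) can be illustrated with the Python/NumPy sketch below. It samples the signal on a unit pixel grid with the center given as (x, y); the grid convention and the function name are assumptions made for this example. The final lines draw one SLU-style parameter set with widths in (3, 5), purely as a usage illustration.

```python
import numpy as np


def gaussian_signal(size, center, theta, sigma1, sigma2, a_s):
    """Sample Eq. (17) on a size x size pixel grid; center is (x, y)."""
    y, x = np.mgrid[0:size, 0:size]
    d = np.stack([x - center[0], y - center[1]], axis=-1)  # r - c at every pixel
    rot = np.array([[np.cos(theta), np.sin(theta)],
                    [-np.sin(theta), np.cos(theta)]])      # R of Eq. (18)
    u = d @ rot                                            # rows are [R^T (r - c)]^T
    # Quadratic form with D^{-1} = diag(1 / (2 sigma1^2), 1 / (2 sigma2^2)).
    quad = u[..., 0] ** 2 / (2.0 * sigma1 ** 2) + u[..., 1] ** 2 / (2.0 * sigma2 ** 2)
    return a_s * np.exp(-quad)


rng = np.random.default_rng(0)
# One illustrative draw: random location, orientation, and widths in (3, 5).
c = rng.uniform(0, 64, size=2)
angle = rng.uniform(0, 2 * np.pi)
s1, s2 = rng.uniform(3, 5, size=2)
signal = gaussian_signal(64, c, angle, s1, s2, a_s=0.1)
print(signal.shape, float(signal.max()))
```

Setting `sigma1 == sigma2` gives the circularly symmetric signal used for LU, in which case the rotation angle has no effect.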
3 Methods
To explain our computational methods for estimating the likelihood ratio, let us start by considering (13). Because of the high dimensionality of this integral, it is not computationally practical to estimate it as it stands. We can reduce the dimension of the integral by using our knowledge of the background models $\mathbf{f}_b$ and the signal models $\mathbf{f}_s$ [7]. We define the parameter vectors $\boldsymbol{\theta} = \{N, \mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\}$ and $\boldsymbol{\alpha} = \{\mathbf{c}, \theta, \sigma_1, \sigma_2\}$, so that the lumpy backgrounds $\mathbf{f}_b$ and the signals $\mathbf{f}_s$ are completely characterized by $\boldsymbol{\theta}$ and $\boldsymbol{\alpha}$, respectively. Recall that $\mathbf{f}_b$ and $\mathbf{f}_s$ are assumed to be statistically independent of each other.
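We write (13) as an integral over $\boldsymbol{\theta}$ and $\boldsymbol{\alpha}$:

$$\Lambda(\mathbf{g}) = \int d\boldsymbol{\alpha} \int d\boldsymbol{\theta}\ \Lambda_{\mathrm{BSKE}}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),\mathbf{s}(\boldsymbol{\alpha}))\, \mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)\, \mathrm{pr}(\boldsymbol{\alpha}), \tag{20}$$

where

$$\Lambda_{\mathrm{BSKE}}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),\mathbf{s}(\boldsymbol{\alpha})) = \frac{\mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),\mathbf{s}(\boldsymbol{\alpha}),H_1)}{\mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),H_0)} \tag{21}$$

and

$$\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0) = \frac{\mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),H_0)\, \mathrm{pr}(\boldsymbol{\theta})}{\int d\boldsymbol{\theta}'\ \mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}'),H_0)\, \mathrm{pr}(\boldsymbol{\theta}')}. \tag{22}$$

3.1 Markov Chain Monte Carlo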
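Ideally we would estimate the integral in (20) using Monte Carlo integration [6],

$$\hat{\Lambda}(\mathbf{g}) = \frac{1}{J} \sum_{j=1}^{J} \Lambda_{\mathrm{BSKE}}\big[\, \mathbf{g} \,\big|\, \mathbf{b}(\boldsymbol{\theta}^{(j)}),\, \mathbf{s}(\boldsymbol{\alpha}^{(j)}) \,\big]. \tag{23}$$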
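When Eq. (23) is evaluated numerically, the individual BSKE likelihood ratios are products over many pixels and can overflow or underflow. One convenient workaround, shown here as a sketch and not described in the text, is to store the log of each BSKE likelihood ratio and average with a log-mean-exp. The function names are hypothetical.

```python
import math


def log_mean_exp(log_values):
    """Return log((1/J) * sum_j exp(log_values[j])) without overflow."""
    peak = max(log_values)
    scaled = sum(math.exp(v - peak) for v in log_values)
    return peak + math.log(scaled / len(log_values))


def estimate_lambda(log_lambda_bske_samples):
    """Eq. (23): average of the BSKE likelihood ratios over the J samples."""
    return math.exp(log_mean_exp(log_lambda_bske_samples))


# Two made-up samples with BSKE likelihood ratios 1 and 3.
print(estimate_lambda([math.log(1.0), math.log(3.0)]))
```

Subtracting the largest log value before exponentiating keeps every term in the sum between 0 and 1, so the average stays finite even when the raw ratios would not.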
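However, it is difficult to draw samples directly from $\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)\,\mathrm{pr}(\boldsymbol{\alpha})$ because this density is not usually known. To overcome this difficulty, we use MCMC techniques, in particular the Metropolis-Hastings algorithm with appropriate proposal densities for $\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)\,\mathrm{pr}(\boldsymbol{\alpha})$ [6], [7]. We construct a Markov chain with $\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)\,\mathrm{pr}(\boldsymbol{\alpha})$ as the stationary density for the chain. Because $\mathbf{s}$ and $\mathbf{b}$ are statistically independent, we can choose proposal densities independently for $\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)$ and $\mathrm{pr}(\boldsymbol{\alpha})$. We choose a proposal density $q_b(\boldsymbol{\theta}|\boldsymbol{\theta}^{(i)})\, q_s(\boldsymbol{\alpha}|\boldsymbol{\alpha}^{(i)})$ for our Markov chain, where $q_b(\boldsymbol{\theta}|\boldsymbol{\theta}^{(i)})$ and $q_s(\boldsymbol{\alpha}|\boldsymbol{\alpha}^{(i)})$ are the proposal densities for $\mathrm{pr}(\boldsymbol{\theta}|\mathbf{g},H_0)$ and $\mathrm{pr}(\boldsymbol{\alpha})$, respectively, and choose an initial parameter vector $(\boldsymbol{\theta}^{(0)}, \boldsymbol{\alpha}^{(0)})$. Given $(\boldsymbol{\theta}^{(i)}, \boldsymbol{\alpha}^{(i)})$, we draw a sample vector $(\tilde{\boldsymbol{\theta}}, \tilde{\boldsymbol{\alpha}})$ from the proposal densities and accept or reject it with acceptance probability

$$\alpha\big((\boldsymbol{\theta}^{(i)}, \boldsymbol{\alpha}^{(i)}), (\tilde{\boldsymbol{\theta}}, \tilde{\boldsymbol{\alpha}})\big) = \min\left\{ 1,\ \frac{\big[\mathrm{pr}(\tilde{\boldsymbol{\theta}}|\mathbf{g},H_0)\, \mathrm{pr}(\tilde{\boldsymbol{\alpha}})\big] \big[q_b(\boldsymbol{\theta}^{(i)}|\tilde{\boldsymbol{\theta}})\, q_s(\boldsymbol{\alpha}^{(i)}|\tilde{\boldsymbol{\alpha}})\big]}{\big[\mathrm{pr}(\boldsymbol{\theta}^{(i)}|\mathbf{g},H_0)\, \mathrm{pr}(\boldsymbol{\alpha}^{(i)})\big] \big[q_b(\tilde{\boldsymbol{\theta}}|\boldsymbol{\theta}^{(i)})\, q_s(\tilde{\boldsymbol{\alpha}}|\boldsymbol{\alpha}^{(i)})\big]} \right\}. \tag{24}$$

For $q_s(\boldsymbol{\alpha}|\boldsymbol{\alpha}^{(i)})$, we use uniform distributions: the rotation angle $\theta$ is chosen from 0 to $2\pi$, the signal widths $\sigma_1$ and $\sigma_2$ from $a$ to $b$, and the location $\mathbf{c}$ within the image $\mathbf{g}$, i.e., $q_s(\boldsymbol{\alpha}|\boldsymbol{\alpha}^{(i)}) = \frac{1}{2\pi (b-a)^2 M}$ $(= \mathrm{pr}(\boldsymbol{\alpha}))$. To make $q_b(\boldsymbol{\theta}|\boldsymbol{\theta}^{(i)})$ symmetric [7], $\boldsymbol{\Phi}$ is defined as a matrix composed of a binary column vector $\boldsymbol{\beta}$ of dimension $N'$ $(= 100 \times \bar{N})$ followed by a list of centers of dimension $N' \times 2$, i.e.,

$$\boldsymbol{\Phi} = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_{N'} \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_{N'} \end{bmatrix}^\dagger, \tag{25}$$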
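where the centers $\mathbf{c}_n$ are row vectors. In generating the backgrounds to construct a Markov chain, lumps are removed or added by flipping the $\beta_n$ (i.e., a 1 to a 0 or a 0 to a 1) with probability $\eta$. We define a mapping function $\boldsymbol{\theta}(\boldsymbol{\Phi})$ as $\boldsymbol{\theta}(\boldsymbol{\Phi}) = \{\mathbf{c}_n : \beta_n = 1\}$, which consists of the centers of all the lumps in the background. The proposal density is given by

$$q_b(\boldsymbol{\theta}(\tilde{\boldsymbol{\Phi}})|\boldsymbol{\theta}(\boldsymbol{\Phi})) = \eta^{f} (1-\eta)^{N'-f}\, \frac{1}{C} \sum_{j:\, \tilde{\beta}_j = \beta_j = 1} \delta(\mathbf{c}_{-j} - \tilde{\mathbf{c}}_{-j})\, G(\tilde{\mathbf{c}}_j - \mathbf{c}_j), \tag{26}$$

where $f$ is the number of $\beta_n$ that are flipped from $\boldsymbol{\Phi}$ to $\tilde{\boldsymbol{\Phi}}$, given by $f = \sum_{i=1}^{N'} |\beta_i - \tilde{\beta}_i|$, $C$ is the number of terms in the sum in (26), given by $C = \boldsymbol{\beta}^\dagger \tilde{\boldsymbol{\beta}}$, and $G(\cdot)$ is a symmetric Gaussian. For 2D lumpy backgrounds, $\delta(\mathbf{c}_{-j} - \tilde{\mathbf{c}}_{-j})$ is a $(2C-2)$-dimensional Dirac delta function, where $\mathbf{c}_{-j}$ is a concatenation of all the center vectors $\mathbf{c}_i$ satisfying $\tilde{\beta}_i = \beta_i = 1$, except $\mathbf{c}_j$ itself. The symmetry of $q_b(\boldsymbol{\theta}(\tilde{\boldsymbol{\Phi}})|\boldsymbol{\theta}(\boldsymbol{\Phi}))$ follows from the symmetry of $f$, $C$, $\delta(\mathbf{c}_{-j} - \tilde{\mathbf{c}}_{-j})$, and $G(\tilde{\mathbf{c}}_j - \mathbf{c}_j)$ regarded as functions of $\boldsymbol{\Phi}$ and $\tilde{\boldsymbol{\Phi}}$. Finally, we can rewrite the ratio in (24) as a computationally simpler one by canceling all $q_b(\boldsymbol{\theta}|\boldsymbol{\theta}^{(i)})$, $q_s(\boldsymbol{\alpha}|\boldsymbol{\alpha}^{(i)})$, and $\mathrm{pr}(\boldsymbol{\alpha})$ terms, i.e.,

$$\frac{\mathrm{pr}(\mathbf{g}|\mathbf{b}(\tilde{\boldsymbol{\theta}}),H_0)\, \mathrm{pr}(\tilde{\boldsymbol{\theta}})}{\mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}^{(i)}),H_0)\, \mathrm{pr}(\boldsymbol{\theta}^{(i)})}, \tag{27}$$

where $\mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),H_0)\, \mathrm{pr}(\boldsymbol{\theta}) = \mathrm{pr}(\mathbf{g}|\mathbf{b}(\boldsymbol{\theta}),H_0)\, \mathrm{pr}(N)\, \mathrm{pr}(\{\mathbf{c}_n\})$. We construct Markov chains corresponding to a given ensemble of images, estimate the corresponding likelihood ratios $\hat{\Lambda}(\mathbf{g})$, and compute an estimate of the AUC.

3.2 Consistency Check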
A common way to check whether our MCMC technique generates consistent simulation results is, for a given ensemble of images, to repeat the experiment with a number of different random-number seeds, so that the Markov chains progress differently. Each run yields an ensemble of likelihood-ratio estimates and an AUC estimate, and from these we compute the sample variance of the AUCs. The sample variance measures the variability in the AUC; if it is small enough, we know that the MCMC technique gives consistent AUC values for the ideal observer. Resampling methods such as the bootstrap can also be used to refine the variance estimates.

Another way is to check whether the estimated AUCs satisfy the known bounds on the ideal-observer AUC [5]. First, the moment-generating function $M_0(\beta)$ for $\Lambda$ under the hypothesis $H_0$ and the likelihood-generating function $G(\beta)$ are defined by [2]

$$M_0(\beta) = \int_0^\infty \Lambda^\beta\, \mathrm{pr}(\Lambda|H_0)\, d\Lambda = \langle \Lambda^\beta \rangle_0 \tag{28}$$

and

$$G(\beta) = \frac{\ln\left[ M_0(\beta + \tfrac{1}{2}) \right]}{\beta^2 - \tfrac{1}{4}}. \tag{29}$$
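Given a set of likelihood ratios computed on signal-absent images, Eqs. (28) and (29) can be estimated by replacing the expectation with a sample average. The Python sketch below assumes that estimator; the function names and the two-value sample are made up for illustration. Because the expectation of the likelihood ratio under H0 equals 1, a sample estimate of M0(1) that is far from 1 indicates a problem with the likelihood-ratio estimates.

```python
import math


def m0_hat(lambdas_h0, beta):
    """Sample estimate of Eq. (28) from likelihood ratios of signal-absent images."""
    return sum(lam ** beta for lam in lambdas_h0) / len(lambdas_h0)


def g_hat(lambdas_h0, beta):
    """Sample estimate of Eq. (29); undefined at beta = +1/2 and beta = -1/2."""
    return math.log(m0_hat(lambdas_h0, beta + 0.5)) / (beta ** 2 - 0.25)


# Made-up likelihood ratios with sample mean exactly 1.
lambdas = [0.5, 1.5]
for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(beta, m0_hat(lambdas, beta))
print("G(0) estimate:", g_hat(lambdas, 0.0))
```

The likelihood ratios must be strictly positive for non-integer beta, which holds for the Poisson model of Eqs. (9) and (10) as long as the background is positive in every pixel.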
It follows from the bounds on the AUC of the ideal observer [5] that

$$1 - \frac{1}{2} \exp\left[ -\frac{1}{2} G(0) \right] \le \mathrm{AUC}_\Lambda \le 1 - \frac{1}{2} \exp\left[ -\frac{1}{2} G(0) - \sqrt{ G(0) - \frac{1}{3} G''(0) } \right]. \tag{30}$$

The plot of $M_0(\beta)$ must go through unity at $\beta = 0$ and $\beta = 1$, and it is convex upward [2]. If the simulated AUC values are consistent, we expect them to satisfy the conditions on $M_0(\beta)$ and the bounds in (30). Then we may at least say that the MCMC technique generates consistent results in terms of these bounds.

3.3 Psychophysical Studies
We have also completed some two-alternative forced-choice (2AFC) experiments [2] to compare the performance of the ideal observer and human observers on detection tasks in which both signal and background uncertainty are present. Five observers were each presented with 100 pairs of signal-absent and signal-present images after 100 training trials for each imaging system. The lights were turned off in the room where the experiments were performed, and a black background was used on the computer screen to avoid distracting the observers. For each detection task, three images were presented; the middle image showed the signal alone to indicate what the signal looks like. The other two images were random backgrounds, one with and one without a signal, and the signal-present image was randomly placed on the left or right. The observer used a mouse to select the image that contained the signal. The observers were allowed unlimited time to reach a decision.
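In a 2AFC experiment the proportion of correct choices is an estimate of the AUC. For a human observer this is simply the fraction of trials answered correctly; for an observer with a scalar test statistic, such as the likelihood ratio, the same quantity can be computed by comparing the statistic on the two images of each pair. The Python sketch below shows the latter case; the function name and the example scores are hypothetical, and counting ties as half correct is a convention chosen here.

```python
def two_afc_proportion_correct(absent_scores, present_scores):
    """Fraction of pairs in which the signal-present image has the larger score."""
    if len(absent_scores) != len(present_scores):
        raise ValueError("each trial needs one signal-absent and one signal-present score")
    correct = 0.0
    for absent, present in zip(absent_scores, present_scores):
        if present > absent:
            correct += 1.0
        elif present == absent:
            correct += 0.5  # a tie is treated as a coin flip
    return correct / len(absent_scores)


# Made-up test-statistic values for three image pairs.
print(two_afc_proportion_correct([1.0, 2.0, 3.0], [2.0, 1.0, 4.0]))
```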
4 Simulations
For comparison of imaging systems using the performance of the ideal observer, we use three different simplified parallel-hole collimator imaging systems for nuclear medicine. We model the imaging system response functions $h_m(\mathbf{r})$ as Gaussians centered on the $m$th pixel $\mathbf{p}_m$,

$$h_m(\mathbf{r}) = \frac{h}{2\pi w^2} \exp\left[ -\frac{(\mathbf{r} - \mathbf{p}_m)^\dagger (\mathbf{r} - \mathbf{p}_m)}{2w^2} \right], \tag{31}$$

where $\mathbf{r}$ is a 2D spatial coordinate. Each imaging system has a different resolution $w$ and relative sensitivity $h$ for the $h_m(\mathbf{r})$ functions, as given in Table 1. These resolutions and relative sensitivities correspond to different collimator parameters and exposure times.

For each imaging system, we generated 100 pairs of signal-absent and signal-present images. The mean number of lumps in a 64 × 64 image was $\bar{N} = 25$, but we initially generated 106 × 106 images with $\bar{N} = 69$ and took the central 64 × 64 region to avoid boundary problems. For SU, we generated 64 × 64 images with $\bar{N} = 25$. For the comparison of the 2AFC psychophysical studies and the ideal observer, we used $\bar{N} = 176$ on 170 × 170 images and took the central 128 × 128 region. For all the backgrounds, we used amplitude $a = 1$ and width $s = 7$, and $a_s$ = 0.1, 0.6, and 1.2 for the signal model. For the BU(a) cases, a signal of width a is centered in each lumpy image. For LU(a), the locations of signals of width a are chosen from a uniform distribution. For SU(a, b) and SLU(a, b), uniform distributions are used to choose locations, widths in an interval (a, b), and orientations.

Table 1. Characteristics of the three imaging systems.

Imaging System    Resolution w    Relative Sensitivity h
A                 0.5             40
B                 2.5             100
C                 5               200
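The Gaussian response functions of Eq. (31) and the parameters in Table 1 can be combined into a small simulator. The Python/NumPy sketch below makes several assumptions that are not in the text: the object is sampled on the same unit pixel grid as the image, the integral in Eq. (2) is replaced by a sum over that grid, and the noise is drawn as Poisson with the blurred object as its mean, in line with Eqs. (9) and (10). The names `SYSTEMS`, `blur_matrix`, and `image_object` are hypothetical.

```python
import numpy as np

# (resolution w, relative sensitivity h) for each system, from Table 1.
SYSTEMS = {"A": (0.5, 40.0), "B": (2.5, 100.0), "C": (5.0, 200.0)}


def blur_matrix(n, w):
    """n x n matrix whose rows are 1D Gaussians of width w centered on each pixel."""
    p = np.arange(n, dtype=float)
    return np.exp(-(p[:, None] - p[None, :]) ** 2 / (2.0 * w ** 2)) / np.sqrt(2.0 * np.pi * w ** 2)


def image_object(f, system, rng):
    """Return (noisy image g, noise-free mean) for a square object array f."""
    w, h = SYSTEMS[system]
    k = blur_matrix(f.shape[0], w)
    # The 2D Gaussian of Eq. (31) is separable, so blur rows and columns in turn.
    mean = h * (k @ f @ k.T)
    return rng.poisson(mean), mean


rng = np.random.default_rng(0)
flat_object = np.ones((64, 64))
g_b, mean_b = image_object(flat_object, "B", rng)
print(g_b.shape, float(mean_b[32, 32]))
```

For system A the width w = 0.5 is smaller than the pixel spacing, so the grid sum is only a coarse approximation to the integral in Eq. (2); a finer object grid would be needed for quantitative work.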
For each image, we generated 150,000 iterations of the Markov chain. For each calculation, the first 500 iterations are ignored for burn-in, and the ΛBSKE (·) of the remaining 149,500 iterations are used to compute the likelihood ratios. The likelihood ratios were computed with 5 different random-number seeds to estimate each AUC’s sample variance. For consistency checks, we used 5 ensembles of 1000 signal-absent images and computed 5 ensembles of estimates of likelihood ratios to get their sample variance.
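The chain construction, burn-in, and averaging described above can be illustrated on a deliberately small toy problem. In the Python sketch below, the lumpy background is replaced by a flat background of unknown level theta with a uniform prior, and the signal parameters are replaced by a single random amplitude alpha multiplying a fixed template; both substitutions, all names, and all numbers are assumptions made for illustration. The structure follows Section 3.1: a symmetric random-walk proposal for theta, a proposal for alpha drawn from its prior, the simplified acceptance ratio of Eq. (27), and the sample average of Eq. (23) after burn-in. The default of 20,000 iterations is chosen for speed and is much smaller than the 150,000 used in the simulations.

```python
import math
import random


def log_poisson_like(g, mean):
    """log pr(g | mean) for independent Poisson pixels, omitting the g! terms."""
    return sum(gm * math.log(mu) - mu for gm, mu in zip(g, mean))


def log_lambda_bske(g, b, s):
    """Log of Eq. (12)."""
    return sum(gm * math.log1p(sm / bm) - sm for gm, bm, sm in zip(g, b, s))


def mh_likelihood_ratio(g, template, theta_range=(5.0, 15.0), alpha_range=(0.5, 1.5),
                        n_iter=20000, burn_in=500, step=1.5, seed=0):
    """Estimate the likelihood ratio by Metropolis-Hastings for the toy model."""
    rng = random.Random(seed)
    lo, hi = theta_range
    m = len(g)
    theta = 0.5 * (lo + hi)              # initial background level
    alpha = rng.uniform(*alpha_range)    # initial signal amplitude
    log_post = log_poisson_like(g, [theta] * m)
    total, kept = 0.0, 0
    for i in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)   # symmetric proposal for theta
        alpha_new = rng.uniform(*alpha_range)      # proposal for alpha drawn from its prior
        if lo <= theta_new <= hi:                  # the prior is zero outside the range
            log_post_new = log_poisson_like(g, [theta_new] * m)
            # Accept with probability min(1, ratio), with the ratio as in Eq. (27).
            if math.log(1.0 - rng.random()) < log_post_new - log_post:
                theta, alpha, log_post = theta_new, alpha_new, log_post_new
        if i >= burn_in:
            s = [alpha * t for t in template]
            total += math.exp(log_lambda_bske(g, [theta] * m, s))
            kept += 1
    return total / kept


# Made-up four-pixel data; the signal template covers the first two pixels.
g = [12, 9, 11, 10]
template = [1.0, 1.0, 0.0, 0.0]
print(mh_likelihood_ratio(g, template))
```

Because the toy model has only two parameters, the estimate can be checked against brute-force numerical integration, which is not possible for the high-dimensional lumpy-background integral that motivates the MCMC approach.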
5 Simulation Results
We evaluated the ideal observer on 100 pairs of signal-absent and signal-present images for each imaging system under various paradigms of background and signal uncertainty. ROC analysis with the LABROC4 [8] and PROPROC [9] software was used to generate the AUCs that rank the three imaging systems by ideal-observer performance. All the AUCs in the figures are shown with their sample standard deviations. The ideal-observer performance on LU(3) and LU(9) is compared to that on BU(3) and BU(9) in Fig. 1(a) and (b), respectively. With $a_s$ = 0.1, the ideal-observer performance degrades so severely under LU that the ranking of the imaging systems cannot be discerned. We therefore increased the signal magnitude ($a_s$ = 0.6 and $a_s$ = 1.2) to see the rankings of the three imaging systems on BU(3) and LU(3), as shown in Fig. 2(a) and (b). While Fig. 1(a) and (b) show that the ideal-observer performance degrades on LU(3) compared to BU(3), Fig. 2(a) and (b) show that the rankings remain the same. The 2AFC method [2] was used to compute the ideal-observer AUCs in Fig. 2(a) and (b).
[Figure 1: panels (a) and (b). Horizontal axis: imaging systems A, B, and C; vertical axis from 0.5 to 1.]
Fig. 1. (a) AUCs for A,B and C on BU(3) and LU(3) with as = 0.1. (b) AUCs for A,B and C on BU(9) and LU(9) with as = 0.1. The solid lines correspond to BU and the dashed lines correspond to LU.
[Figure 2: panels (a) and (b). Horizontal axis: imaging systems A, B, and C; vertical axis from 0 to 1.]
Fig. 2. (a) AUCs for A,B and C on BU(3) with as = 0.6. (b) AUCs for A,B and C on LU(3) with as = 1.2. The solid and dashed lines correspond to the ideal observer and the human observer, respectively.
The 2AFC psychophysical studies on BU(3) and LU(3) are also shown in Fig. 2(a) and (b) to compare the ideal-observer and human-observer performance. The ideal observer greatly outperforms the human observer in these cases. The efficiency is defined as the square of the ratio $d_a(\mathrm{human})/d_a(\mathrm{ideal})$ and is taken as a measure of the perceptual efficiency of the human observer [3]. From our psychophysical studies, the efficiency for imaging system C is 0.0059 ± 0.0055 on BU(3).

[Figure 3: panels (a) and (b). Horizontal axis: imaging systems A, B, and C; vertical axis from 0.5 to 1.]

Fig. 3. (a) AUCs for A, B, and C on SU(3, 5) and SU(6, 8) with $a_s$ = 0.1. (b) AUCs for A, B, and C on SLU(3, 5) and SLU(6, 8) with $a_s$ = 0.1. The solid lines correspond to SU(3,5) and SLU(3,5) and the dashed lines correspond to SU(6,8) and SLU(6,8).

[Figure 4: single plot. Horizontal axis: β from 0 to 1; vertical axis: M0(β), from 0.6 to 1.1.]

Fig. 4. An $M_0(\beta)$ curve for a consistency check for A on SU(3, 5).

Figure 3(a) shows that the ideal-observer performance degrades on SU(6,8), where the size of the signal in each background is close to the size of the lumps, compared to SU(3,5). As shown in Fig. 3(b), the ideal-observer performance degrades significantly on SLU(3,5) and SLU(6,8) compared to SU(3,5) and SU(6,8), respectively, in Fig. 3(a). A plot of $M_0(\beta)$ for a consistency check on SU(3,5) for imaging system A is shown in Fig. 4. The mean $M_0(1)$ is 1.0449 ± 0.1508. The mean AUC on SU(3,5) for A is 0.8401 ± 0.0161, as in Fig. 3(a), and lies between 0.7346 ± 0.0141 and 0.9092 ± 0.0051, the lower and upper bounds computed by the consistency check.
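The perceptual efficiency used in the comparison with human observers can be computed from a pair of AUC values. The Python sketch below assumes the standard relation between the detectability index and the AUC, AUC = Phi(d_a / sqrt(2)), where Phi is the standard normal cumulative distribution function; the text does not state how d_a was obtained, so this is an illustration rather than the procedure used for the reported numbers. The function names and the example AUC values are hypothetical.

```python
import math
from statistics import NormalDist


def d_a(auc):
    """Detectability index from AUC via AUC = Phi(d_a / sqrt(2)); needs 0 < auc < 1."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


def efficiency(auc_human, auc_ideal):
    """Square of the ratio d_a(human) / d_a(ideal)."""
    return (d_a(auc_human) / d_a(auc_ideal)) ** 2


# Made-up AUC values, not results from the paper.
print(efficiency(0.60, 0.95))
```

An AUC of exactly 0.5 for the ideal observer gives d_a = 0 and an undefined efficiency, so the ratio is only meaningful when the ideal observer performs above chance.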
6 Discussion and Conclusion
We have shown that the ideal-observer performance degrades under combined background and signal uncertainty, but the rankings of the three imaging systems appear to be the same under background and signal-location uncertainty as under background uncertainty alone. We have also shown quantitatively that the ideal observer performs better than the human observer on detection tasks under background and signal-location uncertainty. We regard this work as another step toward more realistic signal-detection tasks. This work can be used for hardware optimization.
References

1. H. H. Barrett and C. K. Abbey: Bayesian Detection of Random Signals on Random Backgrounds. 4th International Conference on Information Processing in Medical Imaging, 1997.
2. H. H. Barrett, C. K. Abbey, and E. Clarkson: Objective assessment of image quality III: ROC metrics, ideal observers, and likelihood-generating functions. J. Opt. Soc. Am. A 15, 1520-1535 (1998).
3. H. H. Barrett, C. K. Abbey, and E. Clarkson: Some unlikely properties of the likelihood ratio and its logarithm. Proc. SPIE 3340 (1998).
4. E. Clarkson and H. H. Barrett: Approximation to ideal-observer performance on signal-detection tasks. Applied Optics 39, 1783-1794 (2000).
5. E. Clarkson: Bounds on the area under the receiver operating characteristic curve for the ideal observer. J. Opt. Soc. Am. A 19, 1963-1968 (2001).
6. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds): Markov Chain Monte Carlo in Practice. Chapman and Hall, Boca Raton (1996).
7. M. A. Kupinski, J. W. Hoppin, E. Clarkson, and H. H. Barrett: Ideal Observer Computation Using Markov-Chain Monte Carlo. J. Opt. Soc. Am. A (Accepted) (2002).
8. C. E. Metz, B. A. Herman, and J. H. Shen: Maximum Likelihood Estimation of Receiver Operating Characteristic (ROC) Curves from Continuously-Distributed Data. Statistics in Medicine 17, 1033-1053 (1998).
9. C. E. Metz and X. Pan: Proper Binormal ROC Curves: Theory and Maximum-Likelihood Estimation. Journal of Mathematical Psychology 43, 1-33 (1999).
10. J. P. Rolland and H. H. Barrett: Effect of random background inhomogeneity on observer detection performance. J. Opt. Soc. Am. A 9, 649-658 (1992).
11. H. Zhang: Signal Detection in Medical Imaging. Ph.D. Dissertation, The University of Arizona (2001).