Ultrasound Obstet Gynecol 2007; 29: 485–488 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/uog.3995
Statistical Opinion
The Bayesian approach: a natural framework for statistical modeling
In this issue of the Journal, Bayesian models for predicting malignancy of ovarian tumors are presented¹. A Bayesian approach to statistical analysis is based on Bayes’ theorem and differs fundamentally from the traditional approach. Bayes’ theorem, which resulted in a paradigm shift in statistics, is rooted in an essay by Thomas Bayes published posthumously².

Why do we believe that Bayes’ theorem is relevant to clinical decision making? Consider a diagnostic test. Suppose people with disease G are identified by a positive test result T in 95% of cases. Let us formally denote this by saying that P(T|G) = 0.95. However, this does not mean that a positive test result is associated with a 95% chance of having disease G: P(T|G) ≠ P(G|T). This inequality is explained by Bayes’ theorem:

$$P(G \mid T) = \frac{P(T \mid G)\,P(G)}{P(T)}.$$
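To make the inequality concrete, the theorem can be evaluated for one set of numbers. The sketch below is only an illustration: P(T|G) = 0.95 is taken from the text, whereas the prevalence of 1% and the false-positive rate of 5% are assumed values, and the function name `posterior_probability` is hypothetical.

```python
def posterior_probability(sensitivity: float, prevalence: float,
                          false_positive_rate: float) -> float:
    """Return P(G|T) from P(T|G), P(G) and P(T|not G) using Bayes' theorem."""
    # P(T): total probability of a positive test, i.e. the normalization factor.
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1.0 - prevalence))
    return sensitivity * prevalence / p_positive


# P(T|G) = 0.95 is from the text; prevalence and false-positive rate are assumed.
p_disease_given_positive = posterior_probability(
    sensitivity=0.95, prevalence=0.01, false_positive_rate=0.05
)
print(f"P(G|T) = {p_disease_given_positive:.3f}")
```

Under these assumed inputs the posterior probability is about 0.16, far below 0.95, because the disease is rare and most positive results then come from people without it.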
Thus, one needs to estimate P(G), the prior probability of having G (this can be based on the estimated prevalence of G), and combine this probability with the test result P(T|G) to arrive at P(G|T), the posterior probability of having disease G. P(T) can be calculated simply as a normalization factor.

Bayesian statisticians use the theorem as the core of their procedures and extend its use to statistical analysis itself while traditional (or orthodox) statisticians confine its use to decision-making problems like the one here. The central difference between traditional and Bayesian statisticians is their view of probability. Traditional statistics uses a ‘frequentist’ approach to probability; that is, the probability of an event is interpreted as the fraction of times the event occurred after an infinitely repeated set of identical trials. While this seems reasonable enough, this interpretation can give rise to problems in procedures such as significance tests. For example, a 95% confidence interval means that, if we were to repeat a study many times, the confidence interval should contain the true value in 95% of the replica studies. This does not mean that there is a 95% chance that the true value is inside the interval, even though such a conclusion is often (incorrectly) drawn. Owing to its view on probability, traditional statistics often does not answer the questions that we really want to ask.

Bayesian statisticians define probability as a degree of belief. The Bayesian view of probability, therefore, is a more general one in which probabilistic reasoning about statistical models themselves is natural.

Suppose that, as an example, we want to compare the mean age of conception in primigravida whose parents
Copyright 2007 ISUOG. Published by John Wiley & Sons, Ltd.
are less than 20 years older (Group 1) with primigravida whose parents are at least 20 years older (Group 2). In general terms, the Bayesian approach³,⁴ works in the following way:

1. A model for data analysis is defined that is thought to be suitable for the problem. In our example, we could postulate that, in both groups, the age of first pregnancy has a Gaussian distribution with population mean µ1 (Group 1) and µ2 (Group 2), and with equal but unknown standard deviation σ. This model has three parameters: µ1, µ2 and σ.
2. A prior probability distribution on the model parameters is specified, reflecting our knowledge or beliefs about likely values of these parameters. In our example, we would define a prior probability distribution p(µ) for µ1 and µ2, centered at the ages an expert believes are the most likely values for these parameters. The width of the priors represents the uncertainty with respect to the expert’s belief (Figure 1a, bold line).
3. Next, we look at the data (D) we have collected. We can compute how likely our data are for different assumed values for the parameters: if, in our example, the true mean age in Group 1 is 22 years, what is the probability of obtaining the data we have collected for this group? This yields the likelihood function p(D|µ). The computation of this function makes use of the specified model and its assumptions.
4. The combination (by multiplication) of the prior distribution (representing the information outside the collected data) and the likelihood function (representing the information inside the collected data) yields the posterior probability distribution for the parameter; it reflects how likely different parameter values are to be true, after taking into account one’s prior beliefs and the information in the collected data. To obtain a proper probability distribution, this product needs to be normalized to have an area of 1.

Based on this distribution, we can infer probabilistic conclusions.
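The four steps can be sketched numerically on a grid of candidate values for µ1. This is a simplified illustration, not the analysis reported in this article: it uses the Group 1 ages listed in the worked example, fixes σ at an assumed known value of 3 years (the article treats σ as unknown, which leads to Student t-distributions), and uses an assumed normal prior with mean 25 and SD 0.67. All variable and function names are hypothetical.

```python
import math

# Group 1 ages as listed in the worked example of this article.
group1 = [19, 26, 25, 20, 21, 21, 29, 24, 23, 23,
          26, 22, 22, 18, 30, 24, 22, 24, 23, 26]

SIGMA = 3.0        # assumed known SD (simplification; the article keeps it unknown)
PRIOR_MEAN = 25.0  # prior centre, as stated in the article
PRIOR_SD = 0.67    # assumed prior width, chosen only for illustration


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Steps 1 and 2: candidate values for mu1 and the prior density on that grid.
step = 0.01
grid = [20.0 + i * step for i in range(1001)]  # 20.00 to 30.00 years
prior = [normal_pdf(mu, PRIOR_MEAN, PRIOR_SD) for mu in grid]

# Step 3: likelihood of the data for each candidate mean.
# Working on the log scale and rescaling avoids very small numbers.
log_lik = [sum(math.log(normal_pdf(x, mu, SIGMA)) for x in group1) for mu in grid]
max_log_lik = max(log_lik)
likelihood = [math.exp(value - max_log_lik) for value in log_lik]

# Step 4: multiply prior and likelihood, then normalize to an area of 1.
unnormalized = [p * l for p, l in zip(prior, likelihood)]
area = sum(unnormalized) * step
posterior = [u / area for u in unnormalized]

posterior_mean = sum(mu * p for mu, p in zip(grid, posterior)) * step


def quantile(q):
    """Smallest grid value whose cumulative posterior probability reaches q."""
    cumulative = 0.0
    for mu, p in zip(grid, posterior):
        cumulative += p * step
        if cumulative >= q:
            return mu
    return grid[-1]


lower, upper = quantile(0.025), quantile(0.975)
print(f"posterior mean {posterior_mean:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
```

A grid is used here because it mirrors the description directly (multiply, then normalize); for models with more than a few parameters, other numerical methods are normally needed.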
For example, a 95% ‘probability interval’ can be constructed that can be interpreted as the interval that contains the true parameter value with a probability of 0.95 (i.e. 95% chance), where probability interval refers to the Bayesian version of confidence interval⁵. Or, in our example, one can use the posterior distributions to estimate the probability that µ1 − µ2 is smaller than zero. In traditional statistics, one believes that the data should speak for themselves so that there is no need
[Figure 1, panels (a) to (d): probability density (vertical axis, 0 to 1.5) plotted against age in years (horizontal axis, 20 to 30).]
Figure 1 Examples of Bayesian inference for Example A, in which the mean age of conception in primigravida is compared for women with younger parents (less than 20 years older; Group 1) and older parents (at least 20 years older; Group 2). For each group data on 20 women were collected. Prior distributions are shown as bold solid lines, posterior distributions as thin solid lines, and likelihood functions as dashed lines. (a) The main analysis. (b) An analysis based on a different prior: the prior has less uncertainty and is therefore narrower. (c) The analysis using the prior of (a) but using three times as many data by cloning the original data. (d) What happens when a prior distribution without any information is used: the likelihood and the posterior distribution are now identical.
for a prior distribution. Rather, only the data are used to find estimates for the parameter values. This yields a single parameter estimate, as opposed to a probability distribution as obtained in a Bayesian approach. Therefore, one advantage of the Bayesian approach is that it incorporates uncertainty about the true parameter value. This approach is intuitively more valid since in general we have no exact knowledge about the true value of any parameter.

Traditional statisticians often use P-values to describe the results of their analyses. P-values represent the probability of obtaining data at least as extreme as the collected data given that a hypothesis (usually the hypothesis of no effect) is true. Actually this is a probability statement about data given a hypothesis, while Bayesian analysis, on the contrary, results in probability statements about a hypothesis given the data⁵. This is more consistent with natural reasoning and is more informative, for example, for decision making:
‘What is the probability that treatment A is better than treatment B?’

Let us work out our example. Suppose we have collected data on 20 women in Group 1 (ages in years: 19, 26, 25, 20, 21, 21, 29, 24, 23, 23, 26, 22, 22, 18, 30, 24, 22, 24, 23, 26) and 20 women in Group 2 (ages 22, 23, 31, 27, 20, 22, 25, 26, 26, 28, 26, 28, 27, 27, 23, 20, 29, 25, 25, 28). The sample mean age of conception in primigravida is 23.4 years in Group 1 (SD = 3.05) and 25.4 years in Group 2 (SD = 2.96). Our model is the one specified above, which has three parameters: µ1, µ2 and σ. We will only focus on results for µ1 and µ2. Traditional statisticians would estimate the population mean age to be 23.4 for women belonging to Group 1 and 25.4 for women belonging to Group 2. To test whether the population means differ, traditional statisticians could use a t-test (with the null hypothesis stating that both groups have the same mean age) resulting
in a P-value of 0.0485. These results suggest that the mean age of conception in primigravida may differ between the two groups, but beware of blind interpretation of P-values⁶. If the null hypothesis is true, we have less than 5% chance of collecting data that are at least as extreme as the collected data. However, this statement is not really of interest to clinicians, since what they want to know is the probability of a difference in age given the data that have been observed. Thus it makes sense to undertake a Bayesian analysis.

Figure 1a shows such an analysis. The bold line represents the prior distributions: the expert believes that the mean age is most likely 25 years in both groups (he or she does not believe both groups differ much) with a similar degree of uncertainty. Therefore the prior distribution for the mean age is identical in both groups (in fact, the prior distribution is a Student t-distribution because σ is unknown). Using the assumed model, the likelihood functions are computed and plotted (dashed lines). Notice that these (Student t-) distributions are centered around the sample mean ages. Using Bayes’ theorem we can derive the posterior distributions (plotted as the thin solid lines). Using these (Student t-) distributions we can derive probabilistic conclusions: we conclude that there is a 92.3% chance that the mean age for people belonging to Group 1 is lower than the mean age for people belonging to Group 2. For Group 1 women, the posterior distribution is centered around 24.2 years (90% probability interval: 23.38–25.02); for Group 2 women, the posterior is centered around 25.2 years (90% probability interval: 24.48–26.02). The 90% probability interval for the difference between both mean ages is the interval between −2.15 and 0.15: we conclude that there is a 90% chance that the mean age difference is inside this interval.

The use of prior distributions is often criticized as being subjective.
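Before turning to that criticism, the frequentist half of the worked example can be checked from the listed ages. The sketch below assumes an equal-variance (pooled) two-sample t-test, which matches the model with a common σ; the text does not say which variant was used, so the P-value it prints need not reproduce the reported figure to the last digit. The helper names are hypothetical, and the P-value is obtained by numerically integrating the Student t density with Simpson's rule rather than by a statistics library.

```python
import math
from statistics import mean, variance

group1 = [19, 26, 25, 20, 21, 21, 29, 24, 23, 23,
          26, 22, 22, 18, 30, 24, 22, 24, 23, 26]
group2 = [22, 23, 31, 27, 20, 22, 25, 26, 26, 28,
          26, 28, 27, 27, 23, 20, 29, 25, 25, 28]


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2


def t_density(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t, df, intervals=2000):
    """P(|T| >= |t|) under the null hypothesis; intervals must be even."""
    upper = abs(t)
    h = upper / intervals
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| < |t|
    return 1 - central_mass


t_stat, df = pooled_t_statistic(group1, group2)
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f} on {df} degrees of freedom, two-sided P = {p_value:.4f}")
```

The result is a statement about the data given the null hypothesis, which is exactly the kind of statement the text contrasts with a posterior probability.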
Several points can be raised here. First, the use of prior information is necessary to arrive at probabilistic conclusions, and in fact all analyses make use of prior information (e.g. the analysis of a positive test result mentioned above) but in Bayesian analysis this information is made explicit. Second, orthodox statistics are less objective than they may seem: they incorporate different possibilities for data analysis (the choice of estimator, test and model) which may sometimes yield different conclusions⁷. Bayesian inference, on the contrary, follows a consistent and unified procedure using Bayes’ theorem, even though the choice of model is just as important in Bayesian analysis. Third, making prior information explicit is an open procedure stimulating discussion and thought⁷. Fourth, sensitivity analyses can be performed to check the robustness of the results to changes in prior distributions. In Figure 1b another expert has put less uncertainty in his prior distribution (it is narrower and higher). The resulting posterior distributions can be compared with those of the initial analysis. There is now an 87.7% chance that the mean age in Group 1 is lower than that in Group 2, and the 90% probability interval for the difference of both means is now the interval between −1.62 and 0.28. Fifth,
convincing data and/or larger data sets make the posterior distribution more dependent on the information inside the data (i.e. the likelihood) than on the prior information used (Figure 1c, based on three times as many data by cloning the original data set). And it is natural to choose broader, less informative prior distributions (with more uncertainty) when there is less prior knowledge about the parameters, which again makes the likelihood term more important (see Figure 1d, which was derived using a flat prior that contains no information at all; the posterior and likelihood are now identical).

The quantification of prior beliefs into a prior probability distribution can be a difficult step. A prior distribution may be based on available literature and data, and on the experience of an expert⁷⁻¹¹. For example, one can collect the beliefs of several experts by asking for their best estimate of the parameter and what values they consider unlikely. This information can then be summarized into a prior probability distribution reflecting the experts’ ideas and uncertainty thereof. Another possibility is to use the results of a meta-analysis about the subject. If our prior information is vague, this uncertainty regarding the accuracy can be incorporated quantitatively by making the prior distribution broader. Remember that the prior often has a limited effect, which prevents overly optimistic or pessimistic prior estimates from considerably influencing the result.

We conclude that the use of Bayesian models opens new possibilities for clinical research. Downsides are the mathematical complexity and time-consuming nature of the analyses.
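The sensitivity checks described for Figure 1b to 1d (a narrower prior, three times as many data, and a flat prior) can be imitated with a short script. This is a rough sketch under stated assumptions, not the computation behind the figure: σ is fixed at an assumed known value of 3 years so that every posterior is normal and has a closed form, the prior SDs of 0.67 and 0.40 are invented for illustration, and the function names are hypothetical. The printed probabilities will therefore differ from the percentages reported in the text.

```python
import math
from statistics import NormalDist, mean

group1 = [19, 26, 25, 20, 21, 21, 29, 24, 23, 23,
          26, 22, 22, 18, 30, 24, 22, 24, 23, 26]
group2 = [22, 23, 31, 27, 20, 22, 25, 26, 26, 28,
          26, 28, 27, 27, 23, 20, 29, 25, 25, 28]

SIGMA = 3.0        # assumed known SD; the article treats sigma as unknown
PRIOR_MEAN = 25.0  # prior centre stated in the article


def posterior(data, prior_sd, copies=1):
    """Normal posterior for a group mean; prior_sd=None stands for a flat prior."""
    data_precision = len(data) * copies / SIGMA ** 2
    prior_precision = 0.0 if prior_sd is None else 1.0 / prior_sd ** 2
    precision = prior_precision + data_precision
    # Posterior centre: precision-weighted average of prior centre and sample mean.
    centre = (prior_precision * PRIOR_MEAN + data_precision * mean(data)) / precision
    return NormalDist(centre, math.sqrt(1.0 / precision))


def prob_group1_lower(prior_sd, copies=1):
    """Posterior probability that mu1 < mu2, treating both posteriors as independent."""
    post1 = posterior(group1, prior_sd, copies)
    post2 = posterior(group2, prior_sd, copies)
    difference = NormalDist(post1.mean - post2.mean,
                            math.hypot(post1.stdev, post2.stdev))
    return difference.cdf(0.0)


scenarios = {
    "baseline prior (SD 0.67, assumed)": prob_group1_lower(0.67),
    "narrower prior (SD 0.40, assumed)": prob_group1_lower(0.40),
    "baseline prior, data cloned three times": prob_group1_lower(0.67, copies=3),
    "flat prior": prob_group1_lower(None),
}
for label, probability in scenarios.items():
    print(f"{label}: P(mu1 < mu2) = {probability:.3f}")
```

The qualitative pattern is what matters here: a narrower prior centered on the same age for both groups lowers the probability of a difference, whereas more data or a flat prior let the likelihood dominate.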
ACKNOWLEDGMENTS

This research was supported by Research Council KUL: GOA-AMBioRICS, CoE EF/05/006 Optimization in Engineering, several PhD/postdoc & fellow grants, Flemish Government: FWO (PhD/postdoc grants, projects, G.0360.05 (EEG, Epileptic), G.0519.06 (Noninvasive brain oxygenation), FWO-G.0321.06 (Tensors/Spectral Analysis), research communities (ICCoS, ANMMM)); IWT (PhD Grants), Belgian Federal Science Policy Office IUAP P5/22 (‘Dynamical Systems and Control: Computation, Identification and Modelling’), EU: BIOPATTERN (FP6-2002-IST 508803), ETUMOUR (FP6-2002-LIFESCIHEALTH 503094), Healthagents (IST–2004–27214).

B. Van Calster*†, I. Nabney‡, D. Timmerman§ and S. Van Huffel†
†Dept. of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven and §Department of Obstetrics and Gynecology, University Hospitals K.U. Leuven, Leuven, Belgium and ‡Neural Computing Research Group, Aston University, Birmingham, UK
*Correspondence. (e-mail: [email protected])
REFERENCES

1. Van Calster B, Timmerman D, Lu C, Suykens JAK, Valentin L, Van Holsbeke C, Amant F, Vergote I, Van Huffel S. Preoperative diagnosis of ovarian tumors using Bayesian kernel-based methods. Ultrasound Obstet Gynecol 2007; 29: 496–504.
2. Bayes T. An essay towards solving a problem in the doctrine of chances. Phil Trans 1763; 53: 370–418.
3. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Chapman & Hall: Boca Raton, 2004.
4. Berry DA. Statistics: a Bayesian perspective. Duxbury Press: Belmont, 1996.
5. Lewis RJ, Wears RL. An introduction to the Bayesian analysis of clinical trials. Ann Emerg Med 1993; 22: 1328–1336.
6. Sterne JAC, Davey Smith G. Sifting the evidence – what’s wrong with significance tests? BMJ 2001; 322: 226–231.
7. Kadane JB. Prime time for Bayes. Control Clin Trials 1995; 16: 313–318.
8. Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. An introduction to Bayesian methods in health technology assessment. BMJ 1999; 319: 508–512.
9. Lilford RJ, Braunholtz D. For debate: the statistical basis of public policy: a paradigm shift is overdue. BMJ 1996; 313: 603–607.
10. Chaloner K, Church T, Louis TA, Matts JP. Graphical elicitation of a prior distribution for a clinical trial. Statistician 1993; 42: 341–353.
11. Parmar MKB, Spiegelhalter DJ, Freedman LS. The CHART trials: Bayesian design and monitoring in practice. Stat Med 1994; 13: 1297–1312.