Introduction to Biostatistics for Clinicians


Geert Verbeke
I-BioStat: Interuniversity Institute for Biostatistics and statistical Bioinformatics
K.U.Leuven & Hasselt University, Belgium
http://perswww.kuleuven.be/geert verbeke

Eli Lilly: Introduction to Biostatistics for Clinicians, February 2009

Contents

I February 3, 2009
1 What is statistics ?
2 Hypothesis testing
3 Some frequently used tests

II February 10, 2009
4 Errors in statistics: Basic concepts
5 Errors in statistics: Practical implications

III February 17, 2009
6 Diagnostic tests
7 Survival analysis

Bibliography

Part I February 3, 2009


Chapter 1 What is statistics ?

⊳ Example
⊳ Population – sample
⊳ Random variability

1.1 Example: Captopril data

• 15 patients with hypertension
• The response of interest is the supine blood pressure, before and after treatment with Captopril
• Research question: How does treatment affect BP ?

• Dataset ‘Captopril’:

            Before         After
Patient   SBP   DBP     SBP   DBP
   1      210   130     201   125
   2      169   122     165   121
   3      187   124     166   121
   4      160   104     157   106
   5      167   112     147   101
   6      176   101     145    85
   7      185   121     168    98
   8      206   124     180   105
   9      173   115     147   103
  10      146   102     136    98
  11      174    98     151    90
  12      201   119     168    98
  13      198   106     179   110
  14      148   107     129   103
  15      154   100     131    82

Average (mm Hg)
Diastolic before:   112.3
Diastolic after:    103.1
Systolic before:    176.9
Systolic after:     158.0
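The four averages can be reproduced directly from the dataset. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and do not come from the original slides.

```python
from statistics import mean

# Captopril dataset (mm Hg), one entry per patient, in patient order 1..15
sbp_before = [210, 169, 187, 160, 167, 176, 185, 206, 173, 146, 174, 201, 198, 148, 154]
dbp_before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
sbp_after = [201, 165, 166, 157, 147, 145, 168, 180, 147, 136, 151, 168, 179, 129, 131]
dbp_after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

# Sample averages, in the same order as on the slide
averages = {
    "Diastolic before": mean(dbp_before),
    "Diastolic after": mean(dbp_after),
    "Systolic before": mean(sbp_before),
    "Systolic after": mean(sbp_after),
}

for label, value in averages.items():
    print(f"{label}: {value:.1f}")
```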

• It would be of interest to know how likely the observed changes in BP are to occur by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases after treatment with Captopril. Otherwise, the above data do not provide evidence for efficacy of Captopril.

• Obviously, we are not interested in drawing conclusions about the 15 observed patients only.
• Instead, we would like to draw conclusions about the effect of Captopril on the total population of all hypertensive patients.
• Conclusion: Statistics aims at drawing conclusions about some population, based on what has been observed in a random sample

[Figure: a RANDOM SAMPLE is drawn from the POPULATION; STATISTICS leads from the effect of Captopril in 15 patients back to the effect of Captopril in the population]

1.2 Population versus random sample

• Population: Hypothetical group of current and future subjects, with a specific condition, about which conclusions are to be drawn
• Sample: Subgroup from the population on which observations will be taken
• In order for effects observed in the sample to be generalizable to the total population, the sample should be taken at random

1.3 Random variability

• Descriptive statistics of the observed differences in diastolic BP, after treatment with Captopril, in 15 subjects:

Patient   Before DBP   After DBP   Change
   1         130          125         5
   2         122          121         1
   3         124          121         3
   4         104          106        −2
   5         112          101        11
   6         101           85        16
   7         121           98        23
   8         124          105        19
   9         115          103        12
  10         102           98         4
  11          98           90         8
  12         119           98        21
  13         106          110        −4
  14         107          103         4
  15         100           82        18

• Note that not all subjects experience the same benefit from the treatment
• An average decrease of 9.27 mm Hg is observed in our sample
• A new, similar, experiment would lead to another sample, hence to another observed change in BP:
⊳ More reduction (11.57 mm Hg) ?
⊳ Less reduction (4.78 mm Hg) ?
⊳ No change (0.00 mm Hg) ?
⊳ Increase (−5.23 mm Hg) ?
• This shows that the observed decrease of 9.27 mm Hg should not be overinterpreted
• This also shows that one should not expect 9.27 mm Hg to be exactly the reduction in BP one would observe if the total population were treated with Captopril.
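The sample-to-sample variation described above can be made tangible with a short sketch. The slides imagine repeating the experiment; since no new patients are available, the code below instead resamples the 15 observed changes with replacement (a bootstrap, which is a stand-in chosen here and not a method used in the slides). The seed and the number of resamples are arbitrary assumptions.

```python
import random
from statistics import mean

dbp_before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
dbp_after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

# Change = before - after, so a positive value is a reduction in diastolic BP
changes = [before - after for before, after in zip(dbp_before, dbp_after)]
observed = mean(changes)
print(f"Observed average reduction: {observed:.2f} mm Hg")

# Hypothetical "new experiments": resample the observed changes with replacement
rng = random.Random(2009)  # fixed, arbitrary seed so the run is repeatable
resampled_means = [mean(rng.choices(changes, k=len(changes))) for _ in range(5)]
for m in resampled_means:
    print(f"Resampled average reduction: {m:.2f} mm Hg")
```

Each resample gives a different average, which is the point of the slide: 9.27 mm Hg is one draw from a range of values that similar experiments could have produced.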

• Let µ be the average change in BP one would observe if the total population were treated
• 9.27 mm Hg can then be interpreted as an estimate for µ, based on our sample
• Question: Is our observed change of 9.27 mm Hg sufficient evidence to conclude that the treatment really affects the BP ?
• Answer: Hypothesis testing

[Figure: a RANDOM SAMPLE is drawn from the POPULATION; STATISTICS leads from the observed effect of 9.27 mm Hg in 15 randomly selected patients back to the question for the population: Is µ different from 0 ?]

Chapter 2 Hypothesis testing

⊳ Example
⊳ Null and alternative hypothesis
⊳ The p-value and level of significance
⊳ Possible errors in decision making

2.1 Example

• As before, µ is the average change in diastolic BP one would observe if the total population of hypertensive patients were treated with Captopril.
• Note that µ will never be known, but we can use our sample to learn about µ.
• If the treatment had no effect, the average µ would be zero.
• So, if one can show that there is (strong) evidence that µ ≠ 0, then this can be considered as evidence for a treatment effect.
• Based on our sample of 15 observations, we estimated µ by µ̂ = 9.27 mm Hg.
• Obviously, this estimate is relatively far away from 0, suggesting that the treatment might affect BP

• On the other hand, the observed effect µ̂ = 9.27 could have occurred by pure chance, even if there were no treatment effect at all.
• Question: How likely would that be ?
• Only if this is very unlikely to happen will the observed data be considered sufficient evidence for some effect of the treatment

2.2 Null and alternative hypothesis

• The procedure to decide whether there is sufficient evidence to believe the treatment did affect BP is called a test of hypothesis
• In practice, the research question is formulated in terms of a null hypothesis H0 and an alternative hypothesis HA:

H0 : µ = 0 versus HA : µ ≠ 0

• Based on our observed data, we will investigate whether H0 can be rejected in favour of HA
• If not, the null hypothesis H0 is accepted and one decides that the treatment was not effective

2.3 The p-value and level of significance

• Intuitively, it is obvious that H0 : µ = 0 will be rejected if the observed sample average µ̂ is too far away from 0
• Question: How far is too far ?
• Answers:
⊳ If this result is very unlikely to happen by pure chance
⊳ If this result is not at all what you expect to see if µ were 0

• One can calculate that, if Captopril had no effect at all, there would be only a 0.1% chance of observing a sample with an average change in BP at least as big as 9.27 mm Hg.
• Hence, if Captopril had no effect (i.e., if µ = 0), it would be very unlikely to observe a sample with an average as extreme as 9.27. This would happen only once in every 1000 similar experiments.
• We therefore consider the data observed in our experiment sufficient evidence to reject the null hypothesis and we conclude that the treatment effect is significantly different from 0, or equivalently, that there is a significant treatment effect
• The probability 0.1%, which expresses how extreme our observations would be if the null hypothesis were true, is denoted by p, and is called the p-value.
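The slides do not show how the 0.1% was calculated. As a sketch, and assuming it comes from the paired t-test introduced in Chapter 3, the calculation can be reproduced from the 15 observed changes in diastolic BP. The helper names `t_density` and `two_sided_p` are illustrative; the tail area is obtained by numerical integration of the Student t density rather than from a statistics library.

```python
import math
from statistics import mean, stdev

# Observed changes in diastolic BP (before - after), patients 1..15
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density on [0, |t|]).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * total * h / 3

n = len(changes)
# Paired t statistic: mean change divided by its standard error
t_stat = mean(changes) / (stdev(changes) / math.sqrt(n))
p_value = two_sided_p(t_stat, df=n - 1)
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom, p = {p_value:.4f}")
```

Under this assumption the p-value comes out at roughly one in a thousand, in line with the 0.1% quoted on the slide.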

• A small p-value indicates that the observed results would be extreme if H0 were true. One then rejects the null hypothesis
• A large p-value indicates that the observed results are perfectly in line with what can be expected if H0 is true. One then does not reject the null hypothesis, which is equivalent to accepting the null hypothesis
• In practice, one has to decide how small p should get before the null hypothesis is rejected.
• One therefore specifies the so-called level of significance α:

p < α ⇒ reject H0
p ≥ α ⇒ accept H0

• α is typically a small value, such as 0.01, 0.05, 0.10
• In biomedical sciences α = 0.05 = 5% is standard.
• One then rejects the null hypothesis as soon as the observed result would happen fewer than 5 times in 100 experiments, assuming that the null hypothesis is correct
• Strictly speaking, one should always mention what level of significance has been used, and the conclusion would have to be formulated as “the treatment effect is significantly different from 0 at the 5% level of significance,” or equivalently, that “there is a significant treatment effect at the 5% level of significance.”

• Note that specification of α is only required if a formal decision is preferred (‘accept’ or ‘reject’).
• It is therefore not meaningful to report ‘borderline significance’ in examples where p is only slightly larger than α (e.g., p = 0.06 > α = 0.05)

2.4 Possible errors in decision making

• In our example about the Captopril treatment, we obtained p = 0.001 leading to the rejection of the null hypothesis of no treatment effect.
• This should not be considered as formal proof that there is a treatment effect
• Even if the treatment has no effect at all, a sample like ours would occur once every 1000 times.
• Maybe, our sample was indeed the extreme one that happens once every thousand experiments.
• Alternatively, suppose we had obtained p = 0.9812. We then would not have rejected the null hypothesis, and would have concluded that there is no evidence for any treatment effect.

• This should not have been considered as formal proof that any treatment effect is absent.
• Maybe, the treatment effect µ is not 0, but very close to 0. The data one would then observe would look very similar to data that would be observed if µ = 0, so that the data do not allow one to detect that µ ≠ 0
• Conclusion: “Statistics can prove everything”
• Intuitively: Absolute certainty about population characteristics cannot be attained based on a finite sample of observations

Chapter 3 Some frequently used tests

⊳ The unpaired t-test
⊳ The chi-squared test
⊳ The paired t-test
⊳ Assumptions

3.1 The unpaired t-test

• Consider data from a rat experiment to study weight gain under a high or a low protein diet
• Group-specific histograms:

• Group-specific summary statistics:

• On average, there is an observed difference of 19g between the rats on a high protein diet and those on a low protein diet.
• Is this observed difference sufficient evidence to conclude that there indeed is an effect of diet on the weight gain ?
• It would be of interest to know how likely such a difference of 19g is to occur if weight gain were completely unrelated to the protein level of the diet.

• Based on the unpaired t-test, it can be calculated that, if the diet did not affect the weight gain at all, one would have a p = 0.0757 = 7.57% chance of observing a difference of at least 19g in a similar experiment.
• So, even if there is no relation at all between the protein content of the diet and weight gain, one can still expect to observe a difference of at least 19g in 7.6% of future similar experiments.
• Since p = 0.0757 > 0.05 = α, we consider this insufficient evidence to conclude that the protein level indeed affects the weight gain

• Conclusion: There is no significant difference (p = 0.0757) in weight gain between rats on a high protein level diet, and rats on a low protein level diet


3.2 The chi-squared test

• We consider data on sickness absence, collected on 585 employees with a similar job:

            Sickness absence
Gender      No     Yes     Total
female      245    184      429
male         98     58      156
Total       343    242      585

• Research question: Is there a relation between absence and gender ?
• 184/429 = 42.9% of the females, and 58/156 = 37.2% of the males have been absent
• This suggests that females are absent more often than males
• However, even if absence due to sickness is equally frequent amongst males and females, the above results could have occurred by pure chance.
• It therefore would be of interest to calculate how likely it would be to observe such differences, by pure chance

• Based on the chi-squared test, it can be calculated that, even if males and females were equally frequently absent, there would be a p = 0.215 = 21.5% chance of observing, in a similar experiment, a difference between the groups at least equal to 0.429 − 0.372 = 0.057
• So, even if there is no relation at all between gender and absence, one can still expect to observe a difference of at least 5.7 percentage points in 21.5% of future similar experiments.
• Since p = 0.215 > 0.05 = α, we consider this insufficient evidence to conclude that the occurrence of sickness absence is related to gender
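The p-value of 0.215 can be reproduced from the 2 × 2 table of counts. The sketch below assumes the Pearson chi-squared statistic without continuity correction (the slides do not say which variant was used; this one matches the reported value). For one degree of freedom, the tail probability can be written with the complementary error function, so only the standard library is needed. Variable names are illustrative.

```python
import math

# Observed counts; rows: female, male; columns: absence no, absence yes
observed = [[245, 184],
            [98, 58]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-squared: sum over cells of (observed - expected)^2 / expected,
# where expected counts assume no relation between gender and absence
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

# For 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
```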

• Conclusion: There is no significant difference (p = 0.215) in prevalence of sickness absence between males and females


3.3 The paired t-test

• The Captopril example discussed before considered paired data: Each observation before treatment uniquely corresponds to one observation after treatment (from the same patient), and vice versa
• The paired t-test analyses paired observations:
⊳ Before and after treatment
⊳ Married couples: male and female
⊳ Twin studies
⊳ Ophthalmology: left and right eye
⊳ ...

3.4 Assumptions

• Most statistical procedures are based on assumptions about the distribution of the observations in the population
• For example, the unpaired t-test, used before to compare weight gains under two different diets, assumed weight gains to be normally distributed, with the same amount of variability in both groups:

[Figure: two normal densities with equal variability, low protein (mean µ2) and high protein (mean µ1)]

• If assumptions are not satisfied, wrong results can be obtained
• One will therefore always explore the observed data to check whether the assumptions are supported by the data.
• In large samples however, results are less sensitive to the underlying assumptions.

Part II February 10, 2009


Chapter 4 Errors in statistics: Basic concepts

⊳ Introduction
⊳ Two types of errors
⊳ Power
⊳ Sample size calculation
⊳ Example
⊳ Example from the biomedical literature

4.1 Introduction

• Re-consider the example on the weight gain in rats, where interest is in the comparison between rats fed on a high or low protein diet
• Group-specific histograms:

• Group-specific summary statistics:

• On average, there is an observed difference of 19g between the rats on a high protein diet and those on a low protein diet.
• Based on the unpaired t-test, we found before that this observed difference is not sufficient evidence to believe that the weight gain is really different for the two diets (p = 0.0757)

• Conclusion: There is no significant difference (p = 0.0757) in weight gain between rats on a high protein level diet, and rats on a low protein level diet

• As indicated before, the result of a statistical test should be interpreted as evidence in favour of or against the null hypothesis, and should not be interpreted as formal proof.
• In our example, the difference in weight gain between a population treated with one diet and a population treated with the other diet is too small to be detected based on 12 and 7 animals, respectively.

• Alternatively, if the t-test had led to p = 0.001, this would still not formally prove that there is a difference between both populations.
• After all, p = 0.001 would only indicate that the observed difference of 19g occurs once every 1000 times, even if there is no difference at all between both populations.
• Maybe, our sample was indeed the extreme one that happens once every thousand experiments.
• Hence, whenever statistical tests are used, one has to be aware that errors in the conclusions can occur.
• It is therefore important to quantify the errors, and to keep them under control

4.2 Two types of errors

                        Reality
Test result    H0 correct      H0 not correct
Accept H0      No error        Type II error
Reject H0      Type I error    No error

• Type I error: H0 is incorrectly rejected
• Type II error: H0 is incorrectly accepted

4.3 Type I error

• A type I error occurs if H0 is correct but the test leads to a significant result.
• Question: How likely is such an error to occur ?
• Suppose the test is performed at the α = 5% level of significance
• If H0 is correct, then one will observe a significant result in 5% of the cases
• Hence, in 5% of the cases, H0 would be incorrectly rejected

• The probability of making a type I error is therefore equal to the chosen level α of significance.
• In practice, the probability of making a type I error is kept under control by choosing α sufficiently small
• In biomedical sciences α = 5% is often used, thereby accepting a type I error in 5% of the cases.

                        Reality
Test result    H0 correct    H0 not correct
Accept H0      1 − α
Reject H0      α
Total          1

• If H0 is correct, then the probability of making a type I error is α, while the probability of correctly accepting H0 is 1 − α.

4.4 Type II error

• A type II error occurs if H0 is incorrect but the test has not detected this, i.e., a non-significant result is obtained
• Question: How likely is such an error to occur ?
• In contrast to the type I error, the probability of making a type II error is not easily controlled, and depends on various aspects of the sample(s) and population(s)

• In analogy to the type I error, the type II error rate is denoted by β

                        Reality
Test result    H0 correct    H0 not correct
Accept H0      1 − α         β
Reject H0      α             1 − β
Total          1             1

• The power of a statistical test is 1 − β, the probability of correctly rejecting H0

4.5 Power

• In general, a specific testing procedure is acceptable, only if:
⊳ the chance of making a type I error is sufficiently small
⊳ the power to detect deviations from H0 is sufficiently large
• The first condition can be met by specifying α sufficiently small.
• The second condition is more difficult to meet, as the power depends on various aspects of the sample(s) and population(s)
• This will be illustrated in the context of the comparison of two groups (such as the weight gain experiment)

• As before, let µ1 and µ2 represent the average weight gain in the total population, under high and low protein diets, respectively.
• The null and alternative hypotheses are given by

H0 : µ1 = µ2 versus HA : µ1 ≠ µ2

• The power is the probability of correctly rejecting H0.
• In that case, µ1 ≠ µ2, and we denote the true difference between both populations by ∆ = µ1 − µ2
• The unpaired t-test assumes the data to be normally distributed in both populations, with equal variability σ²

• Graphically:

[Figure: normal densities of weight gain under the low protein diet (mean µ2) and the high protein diet (mean µ1), each with variance σ², with the means a distance ∆ apart]

4.5.1 Power as a function of α

The smaller α, the smaller the power

• Intuitively: Type I errors are less likely if the null hypothesis is rejected less often. However, in cases where H0 is truly wrong, it will then also be rejected less often.
• An extreme case is obtained for α = 0:
⊳ α = 0 implies that the null hypothesis is always accepted
⊳ So, in case the null hypothesis is wrong, it is still accepted, leading to power 0

4.5.2 Power as a function of true difference ∆

The smaller ∆, the smaller the power

• Intuitively: Large deviations from the null hypothesis are easier to detect

[Figure: low protein (mean µ2) and high protein (mean µ1) densities, shown once with a large difference ∆ and once with a small difference ∆]

4.5.3 Power as a function of variability σ²

The smaller σ², the larger the power

• Intuitively: Homogeneous groups are more easily discriminated than heterogeneous groups

[Figure: low protein (mean µ2) and high protein (mean µ1) densities with the same difference ∆, shown for two different values of the variability σ²]

4.5.4 Power as a function of sample size(s)

The more observations, the larger the power

• Intuitively: More observations yield more information about the population(s), therefore implying more precision in the conclusions

4.5.5 Conclusion

• The power depends on various aspects:
⊳ Level of significance α
⊳ True difference ∆ between the populations
⊳ Within-group variance σ²
⊳ Sample size(s)
• Note that the sample size is the only aspect under control of the investigator.
• In practice, one can calculate the sample size needed to reach a sufficiently high power.

4.6 Sample size calculation

• As indicated before, a testing procedure is only acceptable if it has sufficient power, i.e., if the probability of making a type II error is sufficiently small.
• Since the sample size is the only aspect influencing the power that is under control of the investigator, it is important that experiments are sufficiently large in order for the power to be sufficiently large as well
• The level α of significance is chosen such that the probability of making a type I error is sufficiently small
• The within-group variance σ² is pre-specified based on earlier, similar experiments, relevant literature, or a pilot study

• To be on the safe side, usually an upper bound for σ² is used: In case the variability would be smaller, the power would be higher, hence still sufficiently high
• In practice, ∆ is not known. Instead, the smallest ∆ which would still be clinically relevant to detect, is specified.
• If sufficient power is attained for the smallest meaningful ∆, we have that:
⊳ Any larger difference will be detected with even larger power
⊳ We are not concerned about small powers for detecting smaller differences, as such differences are not relevant anyway.
• One can then calculate the number(s) of observations needed to reach a desired level of power.

4.7 Example: Weight gain data

• In the weight gain data, the observed difference of 19g was found not to be significant (p = 0.0757)
• We can calculate the power, i.e., the probability that a real difference of 19g would be found significant if a new experiment were to be conducted, again with 12 and 7 observations in the high and low protein diet groups, respectively.
• Group-specific summary statistics, from the current experiment:

• Power calculations will be based on σ = 21, and α = 0.05 • The power to detect a difference ∆ of 19g equals 43.45% • Hence, with 12 and 7 observations respectively, there is only 43.45% chance that a true difference of 19g would be detected. • If a difference of 19g is considered clinically relevant, then the weight gain experiment was clearly too small, since it is very likely that such a difference would remain undetected. • We can also calculate the power for other values of ∆

Eli Lilly: Introduction to Biostatistics for Clinicians, February 2009

59

• Summary:

  ∆       Power to detect a difference ∆
  0g      5.00% ∗
  10g     15.70%
  19g     43.45%
  30g     80.80%
  40g     96.49%

  ∗: equal to α

• For example, 12 and 7 observations would be sufficient to show a true difference of 40g with more than 96% chance.
• Alternatively, one can also calculate how large the samples should be to detect a difference of, e.g., 20g with sufficiently high power.

• If a power of 90% is required to detect true effects as small as ∆ = 20g, at least 25 observations are needed in each group.
• With 30 observations in each group, the probability of making a type II error, when the true effect is not smaller than 20g, is at most approximately 5%.
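The power values and the sample size quoted for the weight gain example can be approximated with a short script. This is only a sketch: it assumes a two-sided, pooled two-sample t-test with planning values σ = 21 and α = 0.05, and it evaluates the noncentral t distribution by numerical integration with the standard library. The slides do not say which software or method produced their numbers, and all function names below are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi2_pdf(v, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if v <= 0.0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(v) - v / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def t_tail(c, df, ncp, upper, steps=400):
    """P(T > c) if upper, else P(T < c), for T = (Z + ncp) / sqrt(V / df).

    Z is standard normal and V is chi-square(df); the probability is
    obtained by integrating over V with Simpson's rule.
    """
    v_max = df + 12.0 * math.sqrt(2.0 * df)  # covers practically all of V
    h = v_max / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        z = c * math.sqrt(v / df) - ncp
        prob = normal_cdf(-z) if upper else normal_cdf(z)
        total += weight * prob * chi2_pdf(v, df)
    return total * h / 3.0


def t_critical(df, alpha):
    """Two-sided critical value of the central t distribution (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if t_tail(mid, df, 0.0, upper=True) > alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_two_sample_t(delta, sigma, n1, n2, alpha=0.05):
    """Power of the two-sided pooled two-sample t-test for a true difference delta."""
    df = n1 + n2 - 2
    ncp = delta / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
    crit = t_critical(df, alpha)
    return (t_tail(crit, df, ncp, upper=True)
            + t_tail(-crit, df, ncp, upper=False))


def smallest_n_per_group(delta, sigma, target_power, alpha=0.05):
    """Smallest equal group size whose power reaches target_power."""
    n = 2
    while power_two_sample_t(delta, sigma, n, n, alpha) < target_power:
        n += 1
    return n


sigma, alpha = 21.0, 0.05
powers = {delta: power_two_sample_t(delta, sigma, 12, 7, alpha)
          for delta in (0, 10, 19, 30, 40)}
for delta, power in powers.items():
    print(f"difference {delta:>2}g: power {100 * power:.2f}%")

n_needed = smallest_n_per_group(20.0, sigma, 0.90, alpha)
print("observations per group for 90% power at 20g:", n_needed)
```

Under these assumptions the printed powers should land close to the values in the summary table, and the search should stop at the group size stated above. A larger planning value for σ or a smaller ∆ would push the required sample size up.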

4.8 Example from the biomedical literature

Wong et al. [1]

• Methodology section, p.658:

• Table 2 with results:

• Discussion, p.664:

• The difference on which the sample size calculation was based was much larger than what actually was observed in the experiment.
• Therefore, the power to reject equality of the groups was (much) lower than the expected 80%.
• The current study cannot tell the difference between a 9% increase and a 3% decrease.
• If such differences are considered clinically important, then the current study was under-powered, due to the fact that the difference was overestimated at the time of the sample size calculation.

Chapter 5 Errors in statistics: Practical implications

  ⊳ Multiple testing
  ⊳ Bonferroni correction
  ⊳ Tests for baseline differences
  ⊳ Equivalence tests
  ⊳ Significance versus relevance
  ⊳ Example from biomedical literature

5.1 Multiple testing

• Each time a test is performed, there is a probability α of making a type I error.
• For example, if α = 0.05 and the null hypothesis is true, we can expect to reject it incorrectly in about 5 out of 100 tests.
• Implication: “The more tests one performs, the higher the probability that something is detected by pure chance”
• This problem of multiple testing occurs very frequently in bio-medical sciences, in various settings.

5.1.1 Example: A classroom experiment

• On entry in the classroom, assign each student at random to be seated at the left or at the right side of the classroom.
• Compare both sides with respect to 100 aspects including weight, height, age, gender, color of hair, color of eyes, ...
• It is to be expected that for about 5 of these outcomes, a significant difference is obtained at the 5% level of significance, by pure chance.
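The arithmetic behind the classroom experiment can be sketched as follows. The expected number of false positives is simply the number of tests times α. The probability of at least one false positive is computed here under the additional assumption that the tests are independent, which the slides do not claim and which is unlikely to hold exactly for related characteristics such as weight and height. The function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of type I errors when all null hypotheses are true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more type I errors, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: expect {expected_false_positives(k):.2f} false positives, "
          f"P(at least one) = {prob_at_least_one_false_positive(k):.3f}")
```

With 100 tests the expected count is 5, and under independence it is almost certain that at least one comparison comes out significant by chance alone.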

5.1.2 Example: Testing many relations

• Amin et al. [2], Table 2:

  ⊳ 18 tests performed
  ⊳ only 2 significant results

5.1.3 Example: Subgroup analyses

• Kaplan et al. [3], Table 5:

  ⊳ Tests based on C.I.’s for odds ratios
  ⊳ C.I. containing 1 is equivalent to a non-significant test result
  ⊳ 21 × 3 = 63 tests performed
  ⊳ only 5 significant results

5.1.4 Example: Searching for the most significant results

• This ‘scientific finding’ was printed in the Belgian newspapers:

• It was even stated that those who wake up before 7.21am have a statistically significant higher stress level during the day than those who wake up after 7.21am.

5.1.5 Conclusion

• Significant results obtained by multiple testing are often overinterpreted.
• If the number of tests is reported, the reader knows that such results need to be interpreted with extreme care.
• The problem arises when only the significant results are reported, and one does not know how many tests were performed in total.
• This leads to reporting results which turn out not to be reproducible.
• For example, a new study would not find that students seated on the left are taller than those on the right. Instead, students seated on the left may weigh more than those seated on the right.
• For example, a new experiment might show no difference in stress levels between subjects waking up early and those waking up late. Or maybe a difference would be found only when waking up is later than 8.12am.

5.2 Bonferroni correction

• Suppose two tests are performed, both at the 5% level of significance.
• The probability that at least one type I error will be made can be shown not to exceed 2 × 0.05 = 0.10:
  P(at least 1 type I error) ≤ 2 × 5% = 10%
• In general, if k tests are performed, all at the 5% level of significance, the probability of making at least one type I error can only be shown not to exceed k × 5%.
• Hence, the overall type I error rate can be controlled at level α by performing each separate test at the α/k level of significance.

• For example, performing 2 tests at the 2.5% level of significance each implies that the probability of making at least one type I error will not exceed 5%.
• In general, when k tests are performed at the α/k level of significance, one is sure that the overall probability of making at least one type I error will not exceed α.
• This correction of the significance level is called the Bonferroni correction.
• When confidence intervals are used instead of p-values, the confidence levels can be corrected in a similar way.
• Some examples:

  Number of tests   Significance level α   Confidence level
  1                 0.05                   95%
  2                 0.025                  97.5%
  5                 0.01                   99%
  k                 0.05/k                 (1 − 0.05/k) × 100%

• For example, if CI1, CI2, ..., CI5 are 5 intervals with 99% confidence, for 5 unknown parameters θ1, θ2, ..., θ5, then there is at least 95% probability that all 5 C.I.’s simultaneously contain their respective parameters:
  P(CI1 contains θ1 and ... and CI5 contains θ5) ≥ 95%
• Note that, strictly speaking, the Bonferroni correction is an overcorrection, since the overall type I error rate can only be shown not to exceed 5%, and will usually be smaller than the required 5%.
• In some specific testing situations (e.g., ANOVA), more accurate corrections are available.
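The correction described above amounts to a division, as the following sketch shows. The function names and the three p-values in the usage example are hypothetical and only serve as illustration.

```python
def bonferroni_level(alpha, n_tests):
    """Per-test significance level that keeps the overall level at alpha."""
    return alpha / n_tests


def bonferroni_confidence_level(alpha, n_intervals):
    """Per-interval confidence level for joint coverage of at least 1 - alpha."""
    return 1.0 - alpha / n_intervals


def bonferroni_reject(p_values, alpha=0.05):
    """Flag which null hypotheses are rejected after Bonferroni correction."""
    level = bonferroni_level(alpha, len(p_values))
    return [p <= level for p in p_values]


for k in (1, 2, 5):
    print(k, bonferroni_level(0.05, k),
          f"{100 * bonferroni_confidence_level(0.05, k):.1f}%")

# Hypothetical p-values from three tests; each is compared with 0.05 / 3.
print(bonferroni_reject([0.004, 0.03, 0.20]))
```

Note that the second hypothetical p-value (0.03) would be significant at the uncorrected 5% level but not after the correction. This is also why exact p-values must be reported if readers are to apply the correction themselves.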

5.3 Examples from the biomedical literature

• Baba et al. [4], p.1202 and p.1203:

• Kellett et al. [5], Table 2 (for example):

In the discussion, R.Roy writes:

Note that the reader cannot perform the Bonferroni correction as the exact p-values have not been reported.

5.4 Tests for baseline differences

• In order to show causal effects, patients are often randomized into 2 or more groups.
• This ensures (at least in large studies) that all treatment groups are identical, except for the treatment the patients receive.
• In (relatively) small studies, imbalances can still occur by pure chance.
• Therefore, one often compares the various groups with respect to important factors which are believed to be strongly related to the outcome of interest.
• This is called testing for baseline differences, as one compares the characteristics of the patients at the start of the study.

• As an example, suppose interest is in comparing two oral treatments, A and B, for hypertension.
• Suppose the change in diastolic BP is the outcome of interest.
• Age is one of the factors believed to be strongly related to BP. Therefore, it is important that both treatment groups have the same age distribution.
• Therefore, one often tests for age differences between A and B, e.g., based on the two-sample t-test.
• The hypothesis tested is
  H0: µA = µB versus HA: µA ≠ µB
• Note that H0 and HA express properties of the populations, not the samples.

• In the populations (infinitely large), we know that, due to the randomization, µA and µB are identical.
• Conclusion: It makes no sense at all to perform baseline tests in randomized studies.
• No matter how small the resulting p-value would be (e.g., < 10⁻⁸), we know that the observed difference in age between groups A and B has occurred purely by chance.
• A meaningful alternative is to calculate a C.I. of the average age difference between both groups, to ensure that the observed difference is sufficiently small to conclude that it cannot (completely) explain the observed differences in the outcome of interest.
• In our example, suppose that a 95% confidence interval for the average difference in age (years) is given by [0.1; 0.3]. We then believe that this difference is too small to explain why patients in group A show a larger decrease in BP than patients in group B.
• Note also that testing for baseline differences cannot be used to check whether the randomization was done properly.

5.5 Example from the biomedical literature

Nissen et al. [6], abstract and table 1:

A two-arm randomized study

formal tests at baseline

5.6 Equivalence tests

• Suppose two groups A and B are to be compared, and a two-sample t-test is used to test
  H0: µA = µB versus HA: µA ≠ µB
• In case of a non-significant test result, one often concludes that both groups are identical or equivalent.
• An alternative interpretation is that the experiment did not have sufficient power to show an effect which is present.
• Conclusion: Non-significance should not be interpreted as equivalence

• This can also be seen from the fact that, if the two-sample t-test could be used to show equivalence, it would be best to collect data on (extremely) small samples, as this would increase the chance of obtaining a non-significant result, due to lack of power.
• Instead, one should reverse H0 and HA:
  H0: |µA − µB| > ∆ versus HA: |µA − µB| ≤ ∆
  where ∆ is a pre-specified constant, defining ‘equivalence’.
• Obviously, the result of the equivalence test entirely depends on the choice of ∆.
• Therefore, ∆ needs to be specified prior to the data collection.
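One common way to test the reversed hypotheses is the "two one-sided tests" procedure, which the slides do not name. The sketch below is a large-sample version that uses normal quantiles instead of t quantiles. The function name, its arguments and the numbers in the usage example are all hypothetical.

```python
from statistics import NormalDist


def equivalence_shown(diff, se, delta, alpha=0.05):
    """Large-sample 'two one-sided tests' check of H0: |muA - muB| > delta.

    diff  : observed difference in means
    se    : standard error of that difference
    delta : pre-specified equivalence margin
    Returns True when H0 is rejected, i.e. equivalence is concluded.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    z_lower = (diff + delta) / se  # evidence against muA - muB <= -delta
    z_upper = (diff - delta) / se  # evidence against muA - muB >= +delta
    return z_lower > z_crit and z_upper < -z_crit


# Same observed difference and margin, but different precision:
print(equivalence_shown(diff=0.2, se=0.5, delta=2.0))  # precise estimate
print(equivalence_shown(diff=0.2, se=2.0, delta=2.0))  # small, noisy study
```

With a large standard error, as in a small study, equivalence cannot be concluded. This reverses the perverse incentive described above: small samples no longer help the investigator who wants to claim equivalence.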

5.7 Example from the biomedical literature

Shatari et al. [7]:

  ⊳ Title:

  ⊳ Table 1:

No significant differences !

  ⊳ Results and conclusions (abstract):

5.8 Significance versus relevance

• We discussed before that the power to detect some effect ∆ increases with the sample size.
• This implies that any effect ∆, no matter how small, will, sooner or later, be detected, if the sample is sufficiently large.
• For example, consider the Captopril data, where the observed difference of 9.27 mmHg was found significantly different from zero (p < 0.001), based on data from 15 patients only:

• Suppose that the observed difference would have been 0.1 mmHg.
• A p-value as small as 0.001 would still be likely to be obtained, provided that the sample is sufficiently large.
• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a clinical point of view.
• Conclusion:

  Statistical significance ≠ Clinical relevance

• The p-value cannot distinguish between both situations.
• It is therefore important not to blindly overinterpret significant results without knowing the size of the effect.
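To see how the sample size alone drives the p-value, the sketch below computes a two-sided p-value for a fixed observed mean change of 0.1 mmHg at increasing sample sizes. It uses a normal approximation, and the standard deviation of 9 mmHg is a made-up illustrative value, not taken from the Captopril data.

```python
import math
from statistics import NormalDist


def two_sided_p_value(mean_change, sd, n):
    """Normal-approximation p-value for H0: true mean change = 0."""
    z = mean_change / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


HYPOTHETICAL_SD = 9.0  # assumed standard deviation of the changes (mmHg)
for n in (15, 1_000, 100_000, 1_000_000):
    p = two_sided_p_value(0.1, HYPOTHETICAL_SD, n)
    print(f"n = {n:>9}: p = {p:.4g}")
```

The effect size is identical in every row; only the amount of data changes. Reporting the estimated effect with a confidence interval alongside the p-value lets the reader judge clinical relevance.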

Part III February 17, 2009

Chapter 6 Diagnostic tests

  ⊳ Case study
  ⊳ Diagnostic tests
  ⊳ The quality of a diagnostic test
  ⊳ The ROC curve
  ⊳ The AUC

6.1 Case study

• Center for Nursing Research, K.U.Leuven. Elderly hip fracture patients.
• Research question 1: Can confusion (CAM) be predicted by MMSE one day after surgery ?
• Research question 2: What is the optimal dichotomization of MMSE to predict confusion ?

• Histogram and summary statistics:

• Research question is twofold:
  ⊳ Do confused and non-confused patients have different MMSE values ? Answer: Yes, p < 0.0001
  ⊳ If yes, how can MMSE be used to detect high risk patients ? Answer: ?

6.2 Diagnostic tests

• Since confused and non-confused patients have different MMSE scores (p < 0.0001), there is hope that both groups can be well discriminated on the basis of MMSE.
• However, are both groups sufficiently ‘separated’ to discriminate between them ?

[Figure: two sketches of the MMSE distributions of confused and not confused patients, with means µ2 and µ1 a distance ∆ apart; ∆ is large in the first sketch and small in the second.]

• The construction of a diagnostic test based on MMSE aims at finding an optimal cut-off value c for MMSE, used to classify future patients:
  MMSE < c ⇒ Confused
  MMSE ≥ c ⇒ Not confused

                 Confusion
  MMSE value     YES         NO
  MMSE < c       No error    False ⊕
  MMSE ≥ c       False ⊖     No error
  Total          13          46

• What is the optimal c ?

6.3 Measuring the quality of a diagnostic test

• The choice of an optimal c requires a measure for the quality of the test.
• Several measures are available, and the most appropriate one(s) depend on the context:
  ⊳ Sensitivity
  ⊳ Specificity
  ⊳ Positive predictive value
  ⊳ Negative predictive value
• As an example, let us consider the diagnostic test obtained for c = 12

                 Confusion
  MMSE value     YES     NO
  MMSE < 12      9       4
  MMSE ≥ 12      4       42
  Total          13      46

  ⊳ Sensitivity: Prob. of ⊕ test if confused: 9/13 = 69.23%
  ⊳ Specificity: Prob. of ⊖ test if not confused: 42/46 = 91.30%
  ⊳ Positive predictive value: Prob. of confused if ⊕ test: 9/13 = 69.23%
  ⊳ Negative predictive value: Prob. of not confused if ⊖ test: 42/46 = 91.30%

• In contrast to positive and negative predictive values, sensitivity and specificity do not depend on the actual proportion of cases and controls:

                 Confusion          Confusion
  MMSE           YES     NO         YES     NO
  < 12           9       4          9       40
  ≥ 12           4       42         4       420
  Total          13      46         13      460

  Sensitivity:   9/13               9/13
  Specificity:   42/46              420/460
  PPV:           9/13               9/49
  NPV:           42/46              420/424

• The choice of ‘optimal’ c is therefore usually based on sensitivity and specificity
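The four quality measures follow directly from the cells of the 2×2 table. The sketch below recomputes them for the two tables above (c = 12, once with 46 and once with 460 non-confused patients). The function and argument names are illustrative.

```python
def diagnostic_measures(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Test positive means MMSE < 12; "positive" cases are confused patients.
original = diagnostic_measures(true_pos=9, false_pos=4, false_neg=4, true_neg=42)
ten_times_controls = diagnostic_measures(true_pos=9, false_pos=40,
                                         false_neg=4, true_neg=420)

for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name:<12} {original[name]:.4f}   {ten_times_controls[name]:.4f}")
```

Sensitivity and specificity are the same in both columns, whereas the predictive values shift with the proportion of confused patients.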

6.4 The ROC curve

• We can repeat the calculations for all possible c-values.
• A good c should yield a high sensitivity.
• On the other hand, this should not be done at the expense of classifying all non-confused patients as confused.
• Hence, the specificity should be as large as possible as well.

• The ROC curve is a graphical tool to select a c with good sensitivity and specificity:

• We can gain a lot in sensitivity, at the expense of a small loss in specificity, by selecting a different cut-off c:

• Resulting test and operating characteristics:

                 Confusion
  MMSE value     YES     NO
  MMSE < 16      13      5
  MMSE ≥ 16      0       41
  Total          13      46

  ⊳ Sensitivity: Probability of ⊕ test if confused: 13/13 = 100%
  ⊳ Specificity: Probability of ⊖ test if not confused: 41/46 = 89.13%

6.5 The AUC

• A good diagnostic test should allow for a c-value implying high sensitivity as well as specificity.
• Therefore, the AUC (the area under the ROC curve) should be close to 1:

  AUC = 0.96

• A worthless test is one where classification is based on pure guessing:

                 Confusion                Confusion
  Guess          YES    NO     Total      YES       NO              Total
  YES            5      20     25         xA        (1 − x)A        A
  NO             15     60     75         xB        (1 − x)B        B
  Total          20     80     100        x(A+B)    (1 − x)(A+B)    A + B

$$\text{Sensitivity} + \text{Specificity} = \frac{xA}{x(A+B)} + \frac{(1-x)B}{(1-x)(A+B)} = 1 \;\Longrightarrow\; \text{AUC} = 0.5$$

• Some examples:

  AUC range      Quality
  0.90 − 1.00    excellent
  0.80 − 0.90    good
  0.70 − 0.80    fair
  0.60 − 0.70    poor
  0.50 − 0.60    fail
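The ROC construction of this chapter can be sketched as follows: sensitivity and specificity are computed for every candidate cut-off c, and the AUC is obtained as the proportion of (confused, not confused) pairs in which the confused patient has the lower score, with ties counted as one half. The individual MMSE scores of the case study are not given in these notes, so the scores below are hypothetical, and the function names are illustrative.

```python
# Hypothetical MMSE scores; low values are taken to indicate confusion.
confused = [8, 10, 11, 13, 15]
not_confused = [12, 16, 18, 20, 22, 25]


def roc_points(cases, controls):
    """(cut-off, sensitivity, specificity) for the rule 'MMSE < c => confused'."""
    cutoffs = sorted(set(cases) | set(controls))
    cutoffs.append(max(cutoffs) + 1)  # a cut-off that classifies everyone as confused
    points = []
    for c in cutoffs:
        sensitivity = sum(x < c for x in cases) / len(cases)
        specificity = sum(y >= c for y in controls) / len(controls)
        points.append((c, sensitivity, specificity))
    return points


def auc(cases, controls):
    """Share of case/control pairs ranked correctly (ties count one half)."""
    score = sum((x < y) + 0.5 * (x == y) for x in cases for y in controls)
    return score / (len(cases) * len(controls))


for c, sens, spec in roc_points(confused, not_confused):
    print(f"c = {c:>2}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"AUC = {auc(confused, not_confused):.3f}")
```

A score that does not separate the groups at all would give an AUC near 0.5, in line with the pure-guessing argument above.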

Chapter 7 Survival analysis

  ⊳ Example
  ⊳ The survival curve
  ⊳ Estimation of survival curve
  ⊳ The problem of censoring
  ⊳ Kaplan-Meier estimate of survival curve
  ⊳ Comparison of survival curves
  ⊳ Examples from biomedical literature

7.1 Example: Survival times of cancer patients

• Cameron and Pauling [8]; Hand et al. [9] p. 255
• Patients with advanced cancer of the stomach, bronchus, colon, ovary, or breast were treated (in addition to standard treatment) with ascorbate.
• The outcome of interest is the survival time (days).
• Research question(s):
  ⊳ What is the prognosis for a patient with a specific type of cancer ?
  ⊳ Do survival times differ with the organ affected ?

• Dataset ‘Cancer’:

  Stomach   Bronchus   Colon   Ovary   Breast
  124       81         248     1234    1235
  42        461        377     89      24
  25        20         189     201     1581
  45        450        1843    356     1166
  412       246        180     2970    40
  51        166        537     456     727
  1112      63         519             3808
  46        64         455             791
  103       155        406             1804
  876       859        365             3460
  146       151        942             719
  340       166        776
  396       37         372
            223        163
            138        101
            72         20
            245        283

              Average (days)   Median (days)
  Stomach:    286              124
  Bronchus:   211.6            155
  Colon:      457.4            372
  Ovary:      884.3            406
  Breast:     1395.9           1166

• Note the severe differences between averages and medians, due to the skewness of the distribution:

• Comparisons between groups are therefore based on parametric tests after appropriate transformation (e.g., logarithmic), or on non-parametric tests (e.g., Wilcoxon test).
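The averages and medians in the summary table can be recomputed from the ‘Cancer’ dataset with the standard library. The per-organ lists below were reassembled column by column from the dataset table; the variable names are illustrative.

```python
from statistics import mean, median

# Survival times in days, one list per organ.
survival_days = {
    "Stomach": [124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396],
    "Bronchus": [81, 461, 20, 450, 246, 166, 63, 64, 155, 859, 151, 166, 37,
                 223, 138, 72, 245],
    "Colon": [248, 377, 189, 1843, 180, 537, 519, 455, 406, 365, 942, 776,
              372, 163, 101, 20, 283],
    "Ovary": [1234, 89, 201, 356, 2970, 456],
    "Breast": [1235, 24, 1581, 1166, 40, 727, 3808, 791, 1804, 3460, 719],
}

summary = {organ: (round(mean(times), 1), median(times))
           for organ, times in survival_days.items()}
for organ, (average, med) in summary.items():
    print(f"{organ:<9} average {average:>7.1f}   median {med:>6}")
```

In every group the average exceeds the median, which is the pattern expected for right-skewed survival times.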

7.2 The survival curve

• Often it is of interest to make a prognosis for specific patients, i.e., it is of interest to estimate the probability of ‘surviving’ a specific amount of time.
• In other contexts, the response is not ‘survival’, but still a ‘time to event’:
  ⊳ Progression free ‘survival’
  ⊳ How long will a bulb ‘survive’
  ⊳ Time until first tooth is affected with caries
  ⊳ Time a rat needs to find the exit of a maze
  ⊳ ...
• Terminology: Survival and Failure

• In the cancer example, it may be of interest to estimate how likely it is that a patient with colon cancer, treated (in addition to standard treatment) with ascorbate, will survive 1 year, 2 years, ...
• Interest is then in the survival function / curve:

  S(t) = P(Outcome > t)
  “The probability of surviving time point t”

• Properties of S(t):
  ⊳ S(0) = 1: There is absolute certainty to ‘survive’ t = 0
  ⊳ S(+∞) = 0: There is absolute certainty to ‘fail’ eventually
  ⊳ S(t) is a non-increasing function

• Examples of survival curves:

7.3 Estimation of survival curve

• As S(t) can be interpreted as a proportion, it can easily be estimated by the observed proportion of subjects surviving time point t:

$$S(t) = P(\text{Outcome} > t) \;\longrightarrow\; \hat{S}(t) = \frac{\#\,\text{subjects surviving } t}{N}$$

• As an example, we estimate the survival curve for ovary cancer patients.
• The following 6 event times were recorded: 1234, 89, 201, 356, 2970, 456

• Calculations:

  Time (t)   # Surviving t   Ŝ(t)
  0          6               6/6 = 1.00
  30         6               6/6 = 1.00
  89         5               5/6 = 0.83
  100        5               5/6 = 0.83
  201        4               4/6 = 0.67
  356        3               3/6 = 0.50
  400        3               3/6 = 0.50
  556        2               2/6 = 0.33
  1234       1               1/6 = 0.17
  2970       0               0/6 = 0.00

• Graphically:

• Remarks:
  ⊳ S(t) is estimated using a step function
  ⊳ Steps only at the times where events were observed
  ⊳ Step size at time point t: $\frac{\#\,\text{subjects with event at } t}{N}$
  ⊳ The estimate is right-continuous:
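The estimate is simple enough to compute by hand, but a short sketch makes the step-function behaviour explicit. It applies only when every event time is fully observed, i.e., when there is no censoring. The function name is illustrative.

```python
def empirical_survival(event_times, t):
    """Proportion of subjects whose event time exceeds t (no censoring)."""
    return sum(time > t for time in event_times) / len(event_times)


ovary_event_times = [1234, 89, 201, 356, 2970, 456]

for t in (0, 30, 89, 100, 201, 356, 400, 556, 1234, 2970):
    print(f"t = {t:>4}: S_hat(t) = {empirical_survival(ovary_event_times, t):.2f}")
```

The estimate only changes at observed event times and stays constant in between. For example, it takes the same value at t = 89 and t = 100.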

7.4 The problem of censoring

Event time cannot always be measured !

⇓

Censored observations

Various types of censoring:
  ⊳ Right
  ⊳ Left
  ⊳ Interval
  ⊳ Mixture of the above

No censoring

[Figure: timelines for Subjects 1 to 8 along a Time/Age axis, with line styles marking the periods before and after the event; •: true event time, ◦: observations.]

Right censoring due to study end

[Figure: timelines for Subjects 1 to 8 along a Time/Age axis, with the end of study marked and line styles marking the periods before and after the event; •: true event time, ◦: observations.]

Right censoring due to dropout

[Figure: timelines for Subjects 1 to 8 along a Time/Age axis, with line styles marking the periods before and after the event; •: true event time, ◦: observations.]

Left censoring due to late study onset

[Figure: observation timelines for Subjects 1 to 8 along the Time/Age axis, with the begin of study marked; line styles separate the periods before and after the event. •: true event time, ◦: observations.]

Interval censoring due to discrete observation times

[Figure: observation timelines for Subjects 1 to 8 along the Time/Age axis, with the discrete observation times marked; line styles separate the periods before and after the event. •: true event time, ◦: observations.]

• Our focus will be on right censoring, i.e., either the true event time or a lower bound of it is observed
• Standard statistical tools for the analysis of censored observations assume random censoring: event time and censoring time are independent
• Counterexamples:
  . Patients entering the study later have a better prognosis due to increased experience of the surgeon ⇒ Negative association between censoring and event time
  . Patients leaving the study because they get worse ⇒ Positive association between censoring and event time
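As a minimal sketch of what "the true event time or a lower bound of it" means in practice, the snippet below records, for each subject, the smaller of the true event time and the censoring time together with a status flag (1: event observed, 0: censored). The helper `observe`, the `Observation` type and the three subjects are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    duration: float  # time actually recorded
    status: int      # 1: event observed, 0: right-censored


def observe(event_time: float, censor_time: float) -> Observation:
    """Return what is recorded for one subject under right censoring.

    If the event happens no later than the censoring time, the true event
    time is seen; otherwise only a lower bound (the censoring time) is seen.
    """
    if event_time <= censor_time:
        return Observation(event_time, 1)
    return Observation(censor_time, 0)


# Hypothetical subjects: (true event time, censoring time), arbitrary units.
subjects = [(5.0, 8.0), (12.0, 8.0), (3.0, 2.5)]
observations = [observe(t, c) for t, c in subjects]
for obs in observations:
    print(obs)
```

Random censoring then means that the two inputs of `observe` are independent of each other; in the counterexamples above, knowing the censoring time would carry information about the event time.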


7.5 Example: Myelomatosis

• Peto et al. [10]; Allison [11] p.26
• Data on 25 patients diagnosed with myelomatosis (Kahler’s disease), i.e., multiple malignant tumours in the bone marrow
• Patients randomly assigned to two drug treatments
• Event time is the time from the moment of randomization to death
• Some event times are censored due to study termination
• The data include both patients with normal and patients with impaired renal functioning at the moment of randomization

• Data:

  Treat  Duration  Status  Renal     Treat  Duration  Status  Renal
      1         8       1      1         2       180       1      0
      1       852       0      0         2       632       1      0
      1        52       1      1         2      2240       0      0
      1       220       1      0         2       195       1      0
      1        63       1      1         2        76       1      0
      1         8       1      0         2        70       1      0
      1      1976       0      0         2        13       1      1
      1      1296       0      0         2      1990       0      0
      1      1460       0      0         2        18       1      1
      1        63       1      1         2       700       1      0
      1      1328       0      0         2       210       1      0
      1       365       0      0         2      1296       1      0
                                         2        23       1      1

  Status: 0: Censored, 1: Death
  Renal:  0: Normal, 1: Impaired

• Interest is in estimating and comparing the survival curves for patients with different treatments and for patients with different renal functioning at baseline
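For readers who want to work with these data, the following sketch holds the 25 records as plain tuples and counts deaths and censored observations per treatment and per renal status. The variable and function names are illustrative choices, not part of the original material.

```python
from collections import Counter

# (treat, duration, status, renal), transcribed from the data table.
# status: 0 = censored, 1 = death; renal: 0 = normal, 1 = impaired.
MYELOMATOSIS = [
    (1, 8, 1, 1), (1, 852, 0, 0), (1, 52, 1, 1), (1, 220, 1, 0),
    (1, 63, 1, 1), (1, 8, 1, 0), (1, 1976, 0, 0), (1, 1296, 0, 0),
    (1, 1460, 0, 0), (1, 63, 1, 1), (1, 1328, 0, 0), (1, 365, 0, 0),
    (2, 180, 1, 0), (2, 632, 1, 0), (2, 2240, 0, 0), (2, 195, 1, 0),
    (2, 76, 1, 0), (2, 70, 1, 0), (2, 13, 1, 1), (2, 1990, 0, 0),
    (2, 18, 1, 1), (2, 700, 1, 0), (2, 210, 1, 0), (2, 1296, 1, 0),
    (2, 23, 1, 1),
]


def deaths_and_censored(records, column):
    """Return {level: (deaths, censored)} for the chosen column index."""
    counts = Counter()
    for record in records:
        level, status = record[column], record[2]
        counts[(level, status)] += 1
    levels = sorted({record[column] for record in records})
    return {lv: (counts[(lv, 1)], counts[(lv, 0)]) for lv in levels}


by_treatment = deaths_and_censored(MYELOMATOSIS, column=0)
by_renal = deaths_and_censored(MYELOMATOSIS, column=3)
print("treatment:", by_treatment)
print("renal:", by_renal)
```

The counts show that censoring is unevenly spread: all patients with impaired renal functioning died during follow-up, whereas the censored observations all come from patients with normal renal functioning.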


7.6 Kaplan-Meier estimate of survival curve

• Suppose interest is in estimating the survival curve for patients with treatment 1
• Observed data:

  Duration:     8   852    52   220    63     8  1976  1296  1460    63  1328   365
  Status:       1     0     1     1     1     1     0     0     0     1     0     0

• Simple ‘naive’ solutions:
  . Ignoring the censored observations: Over-optimistic
  . Treating censored observations as event times: Over-pessimistic

• The so-called Kaplan-Meier estimate Ŝ(t) correctly accounts for the censoring:
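The slides do not spell out the formula. In its standard product-limit form, Ŝ(t) is the product, over all observed event times t_j ≤ t, of (1 − d_j/n_j), where d_j is the number of events at t_j and n_j the number of subjects still at risk just before t_j. The sketch below applies this form to the treatment 1 data; the function name `kaplan_meier` is an illustrative choice, and the code is a minimal illustration rather than a replacement for a statistical package.

```python
def kaplan_meier(durations, statuses):
    """Product-limit estimate of the survival curve.

    Returns a list of (event time, estimated survival just after that time).
    statuses: 1 = event observed, 0 = right-censored.
    """
    event_times = sorted({t for t, s in zip(durations, statuses) if s == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before t, censored ones included.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, s in zip(durations, statuses) if d == t and s == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Treatment 1 data (duration, status) from the myelomatosis example.
duration = [8, 852, 52, 220, 63, 8, 1976, 1296, 1460, 63, 1328, 365]
status = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]

km_curve = kaplan_meier(duration, status)
for t, s in km_curve:
    print(f"t = {t:4d}  S(t) = {s:.3f}")
```

Censored subjects contribute to the number at risk up to their censoring time and then simply leave the risk set. For treatment 1 all censored durations exceed the last observed death (220), so the estimate steps down to 0.5 at that time and stays there.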


7.7 Comparison of survival curves

• Often, interest is in the comparison of survival curves of different groups
• For the Myelomatosis data, interest may be in comparing survival between the two treatment groups
• Also of interest is the comparison of survival for patients with impaired renal functioning with survival for patients with normal renal functioning
• We will focus on the comparison of two groups, but extensions are available for the comparison of multiple groups
• For each group separately, the Kaplan-Meier estimate for the survival curve can be calculated

• Kaplan-Meier estimates for both treatment groups:


• Kaplan-Meier estimates for patients with normal and impaired renal functioning, respectively:


• Due to the censoring, classical tests such as the t-test and the Wilcoxon test cannot be used for the comparison of the survival times
• Various tests have been designed for the comparison of survival curves when censoring is present
• The most popular ones are:
  . Logrank test
  . Wilcoxon (Gehan) test
• The Logrank test has more power than Wilcoxon for detecting late differences
• The Logrank test has less power than Wilcoxon for detecting early differences
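The slides do not give the test statistics themselves. The sketch below assumes the common formulation in which, at every event time, the observed number of deaths in group 1 is compared with the number expected if both groups shared the same survival curve. The logrank test weights all event times equally, while the Gehan version of the Wilcoxon test is assumed here to weight each event time by the total number at risk. The function name `weighted_logrank` and its `gehan` switch are illustrative choices.

```python
import math


def weighted_logrank(times1, events1, times2, events2, gehan=False):
    """Two-group test for right-censored data.

    gehan=False: logrank test (every event time weighted equally).
    gehan=True:  each event time weighted by the number at risk (Gehan).
    Returns (chi-square statistic with 1 df, p-value).
    """
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    event_times = sorted({t for t, e in pooled if e == 1})
    score = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)      # at risk in group 1
        n = n1 + sum(1 for x in times2 if x >= t)  # at risk overall
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e == 1)
        d = sum(1 for x, e in pooled if x == t and e == 1)
        w = n if gehan else 1
        score += w * (d1 - d * n1 / n)             # observed minus expected
        if n > 1:
            variance += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = score**2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))       # chi-square upper tail, 1 df
    return chi2, p_value


# Myelomatosis data: durations and status (1 = death, 0 = censored) by treatment.
treat1_t = [8, 852, 52, 220, 63, 8, 1976, 1296, 1460, 63, 1328, 365]
treat1_e = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
treat2_t = [180, 632, 2240, 195, 76, 70, 13, 1990, 18, 700, 210, 1296, 23]
treat2_e = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

lr_chi2, lr_p = weighted_logrank(treat1_t, treat1_e, treat2_t, treat2_e)
gw_chi2, gw_p = weighted_logrank(treat1_t, treat1_e, treat2_t, treat2_e, gehan=True)
print(f"Logrank: chi2 = {lr_chi2:.4f}, p = {lr_p:.4f}")
print(f"Gehan:   chi2 = {gw_chi2:.4f}, p = {gw_p:.4f}")
```

Because the number at risk is largest early on, the Gehan weights emphasise early event times, which is in line with the power statements above. For the treatment comparison the sketch gives chi-square statistics of about 1.31 (logrank) and 0.25 (Gehan), i.e. p-values near 0.25 and 0.62. These need not coincide exactly with p-values reported elsewhere for the same data, since software packages differ in details such as the handling of ties.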


• Test results:

                                 Logrank     Wilcoxon
  Effect of treatment            p=0.2468    p=0.6260
  Effect of renal functioning    p=0.0029    p=0.0005

7.8 Examples from biomedical literature

• Shatari et al. [7]:
  . Methods, p.439
  . Figure 1, p.440
• Blanchon et al. [12]:
  . Statistical Methods, p.831
  . Figure 2, p.834

Bibliography

[1] C.A. Wong, B.M. Scavone, A.M. Peaceman, et al. The risk of cesarean delivery with neuraxial analgesia given early versus late in labor. The New England Journal of Medicine, 352:655–665, 2005.
[2] A.I. Amin, O. Hallböök, A.J. Lee, R. Sexton, B.J. Moran, and R.J. Heald. A 5-cm colonic J pouch colo-anal reconstruction following anterior resection for low rectal cancer results in acceptable evacuation and continence in the long term. Colorectal Disease, 5:33–37, 2003.
[3] S. Kaplan, S. Etlin, I. Novikov, and B. Modan. Occupational risks for the development of brain tumours. American Journal of Industrial Medicine, 31:15–20, 1997.
[4] Y. Baba, J.D. Putzke, N.R. Whaley, Z.K. Wszolek, and R.J. Uitti. Gender and the Parkinson’s disease phenotype. Journal of Neurology, 252:1201–1205, 2005.
[5] K.M. Kellett, D.A. Kellett, and L.A. Nordholm. Effects of an exercise program on sick leave due to back pain. Physical Therapy, 71:283–293, 1991.
[6] S.E. Nissen, E.M. Tuzcu, P. Schoenhagen, et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. The New England Journal of Medicine, 352:29–38, 2005.
[7] T. Shatari, M.A. Clark, T. Yamamoto, A. Menon, C. Keh, J. Alexander-Williams, and M. Keighley. Long strictureplasty is as safe and effective as short strictureplasty in small-bowel Crohn’s disease. Colorectal Disease, 6:438–441, 2004.
[8] E. Cameron and L. Pauling. Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences U.S.A., 75:4538–4542, 1978.
[9] D.J. Hand, F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. A handbook of small datasets. Chapman & Hall, first edition, 1989.
[10] R. Peto, M.C. Pike, P. Armitage, N.E. Breslow, D.R. Cox, S.V. Howard, N. Mantel, K. McPherson, J. Peto, and P.G. Smith. Design and analysis of randomised clinical trials requiring prolonged observation of each patient. British Journal of Cancer, 35:1–35, 1977.
[11] P.D. Allison. Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute, 1995.
[12] F. Blanchon, M. Grivaux, B. Asselain, et al. 4-year mortality in patients with non-small-cell lung cancer: development and validation of a prognostic index. Lancet Oncology, 7:829–836, 2006.