Online appendix to ‘Statistical Evaluation Methodology for Surrogate Endpoints in Clinical Studies’

Wim Van der Elst


Contents

1 Evaluating surrogacy in the normal-normal setting
    1.1 Introduction
    1.2 The dataset: five clinical trials in schizophrenia
    1.3 The single-trial setting
        1.3.1 The expected causal association (ECA)
        1.3.2 Individual causal association (ICA)
        1.3.3 The plausibility of finding a good surrogate
    1.4 The multiple-trial setting
        1.4.1 Expected causal association (ECA)
        1.4.2 Meta-analytic individual causal association (MICA)
    1.5 Accounting for the sampling variability in the estimation of ρS0T0 and ρS1T1
2 Evaluating surrogacy in the binary-binary setting
    2.1 Introduction
    2.2 The dataset: a clinical trial in schizophrenia
    2.3 BPRS as a surrogate for PANSS
        2.3.1 Exploratory data analysis
        2.3.2 The individual causal association and surrogate predictive function
        2.3.3 Impact of monotonicity assumptions on the results
    2.4 BPRS as a surrogate for CGI
3 Evaluating predictors of treatment success
    3.1 Introduction
    3.2 The dataset: a clinical trial in opiate/heroin addiction
    3.3 The predictive causal association (PCA; R²ψ)
        3.3.1 The relation between ρT0T1 and the predictive causal association (PCA; R²ψ)
        3.3.2 Predicting ∆T based on a vector S in an individual patient j
    3.4 Regression-based approach
4 Estimating reliability
    4.1 Introduction
    4.2 The dataset: a simulated study
    4.3 Estimating reliability using linear mixed-effects models
        4.3.1 The mean structure of the model
        4.3.2 The covariance structure of the model

Chapter 1

Evaluating surrogacy in the normal-normal setting: case study analysis details

1.1 Introduction

This chapter details how the R package Surrogate can be used to analyse the case study discussed in Chapter 3 of Van der Elst (2016). In Section 1.2, the dataset of the case study is briefly introduced. In Sections 1.3 and 1.4, the data are analysed based on expected and individual causal effects in the single- and multiple-trial settings, respectively.

1.2 The dataset: five clinical trials in schizophrenia

The schizophrenia dataset was described in detail in Section 2.2.1 of Van der Elst (2016). Briefly, the dataset combines the data that were collected in five clinical trials in which the efficacy of risperidone versus other antipsychotic agents (such as haloperidol) was examined in a total of 2,128 schizophrenic patients. All patients were treated for four to eight weeks, and the Positive and Negative Syndrome Scale (PANSS; Singh and Kay, 1975), the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham, 1962), and the Clinical Global Impression (CGI; Guy, 1976) were administered to each patient at the start and at the end of the treatment to assess the change in the severity of their symptoms. Even though this is not a standard situation for surrogate evaluation because there is no clear gold standard, two primary measures (true endpoints) are considered: first, the CGI, which is the scale with the clearest clinical interpretation, and second, the PANSS, which is arguably the most complete and reliable instrument. The main idea is to evaluate whether a simpler scale that is easier to administer, such as the BPRS, can be used as a substitute for the CGI (a scale that requires medical expertise) and/or the PANSS (a scale that takes more time to administer). Thus, surrogacy analyses will be conducted to examine (i) whether the change in the BPRS score is a good surrogate for the change in the PANSS score, and (ii) whether the change in the BPRS score is a good surrogate for the change in the CGI score. In the analyses below, these questions are addressed in both the single- and multiple-trial settings using expected and individual causal associations. To simplify the exposition, the names of the endpoints (BPRS, PANSS and CGI) are used to refer to the change in score between the beginning and the end of the study for each scale.

After installing the Surrogate package in R (using the command install.packages('Surrogate')), the following code can be used to load the package and the schizophrenia dataset into memory for the subsequent analyses:

> library(Surrogate)  # load the Surrogate library
> data(Schizo)        # load the Schizophrenia dataset
> head(Schizo)        # have a look at the first rows of the dataset

# Generated output:
  Id InvestId Treat CGI PANSS BPRS PANSS_Bin BPRS_Bin CGI_Bin
1  1      180    -1   2   -80  -46         1        1       0
2  2       16     1   2   -42  -23         1        1       0
3  3       87    -1   4   -76  -52         1        1       1
4  4       16    -1   2   -49  -21         1        1       0
5  5      180    -1   4    -4   -4         0        0       1
6  6       16    -1   2   -76  -55         1        1       0

The following variables are included in the dataset:

• ‘Id’: the identification number of the patient.
• ‘InvestId’: the identification number of the investigator (psychiatrist) who treated the patient.
• ‘Treat’: the treatment indicator, coded as −1 = active control and 1 = risperidone.
• ‘CGI’: the patient’s change in the CGI score (= post-treatment score − baseline score).
• ‘PANSS’: the patient’s change in the PANSS score.
• ‘BPRS’: the patient’s change in the BPRS score.
• ‘PANSS_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the PANSS scale occurred, and 0 = otherwise.
• ‘BPRS_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the BPRS scale occurred, and 0 = otherwise.
• ‘CGI_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the CGI scale occurred, and 0 = otherwise.

1.3 The single-trial setting

1.3.1 The expected causal association (ECA)

In the single-trial setting (STS), Buyse and Molenberghs (1998) introduced two quantities to assess surrogacy: the relative effect RE = β/α, which is the ratio of the expected causal treatment effects on T (β) and on S (α), and the adjusted association γ = corr(Tj, Sj | Zj), which quantifies the accuracy with which T can be predicted from S after taking treatment into account. The function Single.Trial.RE.AA() of the Surrogate package provides estimates for α, β, RE and γ. The next two sections illustrate how this function can be used to analyse the data of the case study.

1.3.1.1 BPRS as a surrogate for PANSS

The following code can be used to estimate α, β, RE and γ when S = BPRS and T = PANSS:¹

> STS_ECA_BPRS_PANSS <- Single.Trial.RE.AA(Dataset = Schizo, Surr = BPRS, True = PANSS,
+                                          Treat = Treat, Pat.ID = Id, Seed = 1)
> summary(STS_ECA_BPRS_PANSS)

# Generated output:
Function call:
Single.Trial.RE.AA(Dataset = Schizo, Surr = BPRS, True = PANSS, Treat = Treat, Pat.ID = Id, Seed = 1)

# Data summary and descriptives
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total number of patients: 2123
Total number of patients in experimental treatment group: 1589
Total number of patients in control treatment group: 534

Mean surrogate and true endpoint values in each treatment group:
              Control.Treatment Experimental.treatment
Surrogate               -6.6461                -8.9383
True endpoint          -11.4719               -16.0151

Var surrogate and true endpoint values in each treatment group:
              Control.Treatment Experimental.treatment
Surrogate              180.6831               180.9433
True endpoint          544.3285               550.6597

Correlations between the true and surrogate endpoints in the control (r_T0S0)
and the experimental treatment groups (r_T1S1):
       Estimate Standard Error CI lower limit CI upper limit
r_T0S0   0.9597         0.0061         0.9562         0.9630
r_T1S1   0.9644         0.0057         0.9613         0.9673

# Expected causal effects
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alpha:
   Alpha Standard Error CI lower limit CI upper limit
 -1.1461         0.3364        -1.8058        -0.4865

Beta:
    Beta Standard Error CI lower limit CI upper limit
 -2.2716         0.5860        -3.4209        -1.1223

# Relative effect
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Delta method-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  1.9820         0.1637         1.6610         2.3029

Fieller theorem-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  1.9820         0.1637         1.7121         2.5522

Bootstrap-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  1.9820         0.2131         1.7009         2.5339

# Adjusted association (gamma)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fisher Z-based confidence interval:
 AA (gamma) Standard Error CI lower limit CI upper limit
     0.9632         0.0058         0.9600         0.9662

Bootstrap-based confidence interval:
 AA (gamma) Standard Error CI lower limit CI upper limit
     0.9632         0.0018         0.9594         0.9667

¹ Notice that the function Single.Trial.RE.AA() establishes confidence intervals for RE based on the Delta method, Fieller’s theorem, and a bootstrap procedure. In the output, a warning is given that there are two outliers in the bootstrapped RE sample (which contains 500 bootstrapped values by default). Cases 209 and 338 are marked as outliers because their studentized residuals exceeded the cut-off value of |3.9223| (the critical value is determined as t(1 − α/2n; n − 2), where n is the number of bootstrapped values and α = 0.05 by default; see Kutner, Nachtsheim, Neter and Li, 2005). In the presence of outliers, the bootstrap-based confidence interval for RE may not be trustworthy. This problem mainly occurs when the parameter estimate for α is relatively close to 0; in this situation, it is recommended to use the confidence intervals based on the Delta method or Fieller’s theorem. In the present analysis, the impact of the outliers on the established CI was minimal (see below).

The output of the summary() function provides descriptive information such as the number of patients in both treatment groups, the estimated means and variances of S and T, and the estimated correlations between S and T in both treatment groups. The estimated expected causal treatment effects, RE, and γ are given below the descriptives. As can be seen, α̂ = −1.146 (CI95% [−1.806; −0.487]) and β̂ = −2.272 (CI95% [−3.421; −1.122]). The estimated relative effect is RE = 1.982, with Delta method-based CI95% [1.661; 2.303], Fieller theorem-based CI95% [1.712; 2.552], and bootstrap-based CI95% [1.701; 2.534]. Notice that the bootstrap-based CI largely overlaps with the other CIs, indicating that the influence of the outliers on the bootstrapped RE values was small.
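As a check on the output above, the point estimates can be recomputed by hand. With Treat coded −1/1, the treatment coefficient of a linear model equals half the difference between the two group means, and the reported α̂ and β̂ agree with that reading of the printed group means. The base-R sketch below is not code from the Surrogate package; it recomputes α̂, β̂ and RE from the printed (rounded) means and rebuilds the Delta method-based interval as RE ± z0.975 × SE, assuming a normal quantile. The standard error 0.1637 is copied from the output rather than derived, because the covariance between α̂ and β̂ that the Delta method needs is not printed.

```r
# Group means copied from the summary() output (S = BPRS, T = PANSS)
mean_S <- c(control = -6.6461, experimental = -8.9383)
mean_T <- c(control = -11.4719, experimental = -16.0151)

# With Treat coded -1 (control) / 1 (experimental), the treatment coefficient
# of a linear model equals half the difference in group means.
alpha_hat <- unname(mean_S["experimental"] - mean_S["control"]) / 2
beta_hat  <- unname(mean_T["experimental"] - mean_T["control"]) / 2
re_hat    <- beta_hat / alpha_hat   # relative effect RE = beta / alpha

# Wald-type 95% interval on the RE scale, using the Delta-method SE from the output
se_re <- 0.1637
ci_re <- re_hat + c(-1, 1) * qnorm(0.975) * se_re

round(c(alpha = alpha_hat, beta = beta_hat, RE = re_hat), 4)
round(ci_re, 3)
```

Applying the same arithmetic to the CGI means reported in Section 1.3.1.2 gives α̂ = −1.1248 and β̂ = −0.1181, which also match that output.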

A graphical illustration of the results can be obtained by applying the plot() command to the fitted object STS_ECA_BPRS_PANSS using the Trial.Level=TRUE argument. The name of this argument refers to the fact that α and β are quantities that pertain to the level of the trial (rather than to the level of the individual patients):

> plot(STS_ECA_BPRS_PANSS, Indiv.Level = FALSE, Trial.Level = TRUE)
# Generated output:

[Figure: Relative Effect (RE). x-axis: Treatment effect on the surrogate endpoint (α); y-axis: Treatment effect on the true endpoint (β).]

In this plot, the small circle shows the estimated expected causal treatment effects (α̂, β̂). The solid line depicts the so-called ‘constant RE assumption’, i.e., the assumption that the ratio between β and α remains constant across clinical trials. This assumption cannot be verified in the STS, but it is often made so that the expected causal treatment effect on T in a future clinical trial can be predicted from the expected causal treatment effect on S in that trial.

The output of the summary() function furthermore shows that γ̂ = 0.963, with Fisher Z-based CI95% [0.960; 0.966] and bootstrap-based CI95% [0.959; 0.967]. The large γ̂ indicates that a patient’s PANSS score can be accurately predicted from his or her BPRS score and treatment status. A graphical illustration of γ̂ is obtained when the plot() command is applied to the fitted object STS_ECA_BPRS_PANSS, now using the Indiv.Level=TRUE argument. The argument name refers to the fact that the residuals εSj and εTj pertain to the level of the individual patients:

> plot(STS_ECA_BPRS_PANSS, Indiv.Level = TRUE, Trial.Level = FALSE)
# Generated output:

[Figure: Adjusted Association (ρZ). x-axis: Residuals for the surrogate endpoint (εSj); y-axis: Residuals for the true endpoint (εTj).]
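The adjusted association can also be illustrated on simulated data. In the sketch below, which uses base R only and a hypothetical single trial (the sample size, intercepts, treatment effects, error standard deviations and the error correlation of 0.96 are all assumed values, chosen to resemble the BPRS/PANSS output), S and T are regressed on the −1/1 treatment indicator and γ is estimated as the correlation between the two sets of residuals, i.e., the εSj and εTj on the axes of the plot above. It is a sketch of the definition γ = corr(Tj, Sj | Zj), not the internal implementation of Single.Trial.RE.AA().

```r
set.seed(2016)

n   <- 2000                                 # hypothetical number of patients
Z   <- sample(c(-1, 1), n, replace = TRUE)  # treatment: -1 = control, 1 = experimental
rho <- 0.96                                 # assumed correlation of the errors

# Correlated standard normal errors: corr(e_S, e_T) = rho by construction
e_S <- rnorm(n)
e_T <- rho * e_S + sqrt(1 - rho^2) * rnorm(n)

# Assumed intercepts, treatment effects (alpha = -1.1, beta = -2.3) and error SDs
surr    <- -7.8 - 1.1 * Z + 13.4 * e_S
true_ep <- -13.7 - 2.3 * Z + 23.4 * e_T

fit_S <- lm(surr ~ Z)
fit_T <- lm(true_ep ~ Z)

alpha_hat <- unname(coef(fit_S)["Z"])
beta_hat  <- unname(coef(fit_T)["Z"])
gamma_hat <- cor(residuals(fit_S), residuals(fit_T))  # adjusted association

round(c(alpha = alpha_hat, beta = beta_hat, RE = beta_hat / alpha_hat,
        gamma = gamma_hat), 3)
```

With these settings the standard error of α̂ is roughly 13.4/√n ≈ 0.3, so the simulated RE varies considerably from seed to seed, whereas γ̂ stays close to 0.96.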

1.3.1.2 BPRS as a surrogate for CGI

The following code can be used to obtain estimates for α, β, RE and γ using S = BPRS and T = CGI:

> STS_ECA_BPRS_CGI <- Single.Trial.RE.AA(Dataset = Schizo, Surr = BPRS, True = CGI,
+                                        Treat = Treat, Pat.ID = Id, Seed = 1)
> summary(STS_ECA_BPRS_CGI)

# Generated output:
Function call:
Single.Trial.RE.AA(Dataset = Schizo, Surr = BPRS, True = CGI, Treat = Treat, Pat.ID = Id, Seed = 1)

# Data summary and descriptives
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total number of patients: 2118
Total number of patients in experimental treatment group: 1588
Total number of patients in control treatment group: 530

Mean surrogate and true endpoint values in each treatment group:
              Control.Treatment Experimental.treatment
Surrogate               -6.6981                -8.9477
True endpoint            3.4396                 3.2034

Var surrogate and true endpoint values in each treatment group:
              Control.Treatment Experimental.treatment
Surrogate              181.0164               180.9166
True endpoint            2.3527                 2.1470

Correlations between the true and surrogate endpoints in the control (r_T0S0)
and the experimental treatment groups (r_T1S1):
       Estimate Standard Error CI lower limit CI upper limit
r_T0S0   0.7343         0.0148         0.7141         0.7534
r_T1S1   0.7394         0.0146         0.7195         0.7581

# Expected causal effects
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alpha:
   Alpha Standard Error CI lower limit CI upper limit
 -1.1248         0.3374        -1.7865        -0.4631

Beta:
    Beta Standard Error CI lower limit CI upper limit
 -0.1181         0.0372        -0.1910        -0.0452

# Relative effect
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Delta method-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  0.1050         0.0234         0.0591         0.1509

Fieller theorem-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  0.1050         0.0234         0.0594         0.1756

Bootstrap-based confidence interval:
      RE Standard Error CI lower limit CI upper limit
  0.1050         0.0318         0.0589         0.1663

# Adjusted association (gamma)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fisher Z-based confidence interval:
 AA (gamma) Standard Error CI lower limit CI upper limit
     0.7380         0.0147         0.7179         0.7568

Bootstrap-based confidence interval:
 AA (gamma) Standard Error CI lower limit CI upper limit
     0.7380         0.0110         0.7178         0.7601

As can be seen, α̂ = −1.125 (CI95% [−1.787; −0.463]) and β̂ = −0.118 (CI95% [−0.191; −0.045]). Notice that α̂ differs slightly from the estimate that was obtained when S = BPRS and T = PANSS (see Section 1.3.1.1). The reason is that the data of n = 5 patients were excluded from the present analysis because of missing CGI scores. The output furthermore shows that the estimated RE = 0.105, with Delta method-based CI95% [0.059; 0.151], Fieller theorem-based CI95% [0.059; 0.176], and bootstrap-based CI95% [0.059; 0.166]. Thus, the three CIs again largely overlap. Further, γ̂ = 0.738, with Fisher Z-based CI95% [0.718; 0.757] and bootstrap-based CI95% [0.718; 0.760]. This result indicates that a patient’s CGI score (T) can be predicted with relatively poor accuracy from the BPRS score (S) and treatment status (compared with γ̂ = 0.963 for T = PANSS). A graphical illustration of the results can be obtained by applying the plot() command to the fitted object STS_ECA_BPRS_CGI:

> plot(STS_ECA_BPRS_CGI, Indiv.Level = FALSE, Trial.Level = TRUE)
# Generated output:

[Figure: Relative Effect (RE). x-axis: Treatment effect on the surrogate endpoint (α); y-axis: Treatment effect on the true endpoint (β).]

> plot(STS_ECA_BPRS_CGI, Indiv.Level = TRUE, Trial.Level = FALSE)
# Generated output:

[Figure: Adjusted Association (ρZ). x-axis: Residuals for the surrogate endpoint (εSj); y-axis: Residuals for the true endpoint (εTj).]
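The Fisher Z-based intervals for γ in Sections 1.3.1.1 and 1.3.1.2 can be approximately reproduced with a few lines of base R. The sketch assumes the textbook large-sample standard error 1/√(n − 3) on the Fisher z scale; whether Single.Trial.RE.AA() uses exactly this formula is an assumption. With the printed γ̂ values and sample sizes (2,123 and 2,118 patients), the limits agree with the output to within about 0.0001, which is consistent with γ̂ having been rounded to four decimals.

```r
# Fisher Z-based confidence interval for a correlation coefficient
# (assumes the large-sample SE 1 / sqrt(n - 3) on the z scale)
fisher_z_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                      # Fisher's z-transformation
  se <- 1 / sqrt(n - 3)               # standard error of z
  q  <- qnorm(1 - (1 - level) / 2)    # normal quantile (about 1.96 for 95%)
  tanh(z + c(-1, 1) * q * se)         # back-transform the limits
}

ci_panss <- fisher_z_ci(r = 0.9632, n = 2123)  # S = BPRS, T = PANSS
ci_cgi   <- fisher_z_ci(r = 0.7380, n = 2118)  # S = BPRS, T = CGI
round(rbind(PANSS = ci_panss, CGI = ci_cgi), 4)
```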

1.3.2 Individual causal association (ICA)

As detailed in Section 3.2.1 of Van der Elst (2016), the individual causal association (ICA; ρ∆) quantifies the association between the individual causal effects on S and T:

\[
\rho_{\Delta} = \frac{\sqrt{\sigma_{T_0T_0}\sigma_{S_0S_0}}\,\rho_{T_0S_0} + \sqrt{\sigma_{T_1T_1}\sigma_{S_1S_1}}\,\rho_{T_1S_1} - \sqrt{\sigma_{T_1T_1}\sigma_{S_0S_0}}\,\rho_{T_1S_0} - \sqrt{\sigma_{T_0T_0}\sigma_{S_1S_1}}\,\rho_{T_0S_1}}{\sqrt{\left(\sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sqrt{\sigma_{T_0T_0}\sigma_{T_1T_1}}\,\rho_{T_0T_1}\right)\left(\sigma_{S_0S_0} + \sigma_{S_1S_1} - 2\sqrt{\sigma_{S_0S_0}\sigma_{S_1S_1}}\,\rho_{S_0S_1}\right)}}, \qquad (1.1)
\]

where ρXY denotes the correlation between X and Y, and σXX is the variance of X. The ρ∆ metric is conceptually appealing, but its practical use is challenging because the correlations between the counterfactuals ρS0T1, ρS1T0, ρT0T1 and ρS0S1 in expression (1.1) are not identifiable from the data. The simulation-based sensitivity analysis presented in Section 3.2.4 of Van der Elst (2016) is implemented in the function ICA.ContCont() of the Surrogate package. In the function call, the user specifies the estimable quantities σ̂S0S0, σ̂S1S1, σ̂T0T0, σ̂T1T1, ρ̂S0T0 and ρ̂S1T1, and the scalars or vectors that should be considered for the unidentifiable correlations (ρS0T1, ρS1T0, ρT0T1 and ρS0S1). In the next sections, the ICA.ContCont() function is used to analyse the data of the case study.

1.3.2.1 BPRS as a surrogate for PANSS

The estimates for the observable quantities in expression (1.1) can be obtained by applying the summary() function to the fitted object STS_ECA_BPRS_PANSS (see Section 1.3.1.1). Based on these estimates (σ̂S0S0 = 180.683, σ̂S1S1 = 180.943, σ̂T0T0 = 544.329, σ̂T1T1 = 550.660, ρ̂T0S0 = 0.960, and ρ̂T1S1 = 0.964) and the grid of values G = {−1, −0.90, ..., 1} for the correlations between the counterfactuals, the following code can be used to obtain the estimates of ρ∆:

> STS_ICA_BPRS_PANSS <- ICA.ContCont(T0S0 = 0.96, T1S1 = 0.964, T0T0 = 544.329,
+   T1T1 = 550.66, S0S0 = 180.683, S1S1 = 180.943,
+   T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1),
+   T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))
> summary(STS_ICA_BPRS_PANSS)

# Generated output:
Function call:
ICA.ContCont(T0S0 = 0.96, T1S1 = 0.964, T0T0 = 544.329, T1T1 = 550.66, S0S0 = 180.683, S1S1 = 180.943, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194481

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
343

# Causal-inference (ICA) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) ICA: 0.9567 (0.0366)  [min: 0.6200; max: 0.9935]
Mode ICA: 0.9736

Quantiles of the ICA distribution:

       5%       10%       20%       50%       80%       90%       95%
0.9010345 0.9278634 0.9449260 0.9679856 0.9764993 0.9789617 0.9816797

> plot(STS_ICA_BPRS_PANSS)
# Generated output:
[Figure: ICA. Histogram of the ρ∆ values; x-axis: ρ∆; y-axis: Percentage.]

The output of the summary() function shows that a total of 194,481 (= 21⁴) Σ matrices can be formed, of which 343 are positive definite. These matrices are used to compute the vector of ρ∆ estimates using expression (1.1). Most of the obtained ρ∆ values were large (mean ρ∆ = 0.957, SD = 0.037), with 95% of them ≥ 0.901 (see also the plot). This indicates that the individual causal effects on S = BPRS allow for an accurate prediction of the individual causal effects on T = PANSS. The relatively small SD and the concentration of the ρ∆ values in a narrow interval (90% of them lie between 0.901 and 0.982) further confirm that the results are not sensitive to the assumptions regarding the unidentifiable correlations. A useful argument of the plot() function is Labels=TRUE, which adds the percentages of ρ∆ values that are equal to or larger than the midpoint value of each bin in the histogram:

> plot(STS_ICA_BPRS_PANSS, Labels = TRUE)
# Generated output:

[Figure: ICA. Histogram of the ρ∆ values, with each bin labelled by the percentage of ρ∆ values ≥ its midpoint; x-axis: ρ∆; y-axis: Percentage.]

Alternatively, the option Type="CumPerc" can be used to show the cumulative distribution function of ρ∆:

> plot(STS_ICA_BPRS_PANSS, Type = "CumPerc")
# Generated output:
[Figure: ICA. Cumulative distribution of the ρ∆ values; x-axis: ρ∆; y-axis: Cumulative percentage.]
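Expression (1.1) is simple to evaluate once values are assumed for the four unidentifiable correlations. The base-R function below is a hypothetical helper written for illustration (it is not part of the Surrogate package) that evaluates (1.1) for one set of assumptions. Plugging in the identifiable quantities for S = BPRS and T = PANSS, together with the two sets of assumptions that the tail() and head() output further below singles out, reproduces the reported minimum (0.6200) and maximum (0.9935) of ρ∆.

```r
# Expression (1.1): ICA from the identifiable variances/correlations plus
# assumed values for the four unidentifiable correlations (hypothetical helper).
ica_from_correlations <- function(T0T0, T1T1, S0S0, S1S1, T0S0, T1S1,
                                  T0T1, T0S1, T1S0, S0S1) {
  num <- sqrt(T0T0 * S0S0) * T0S0 + sqrt(T1T1 * S1S1) * T1S1 -
         sqrt(T1T1 * S0S0) * T1S0 - sqrt(T0T0 * S1S1) * T0S1
  var_dT <- T0T0 + T1T1 - 2 * sqrt(T0T0 * T1T1) * T0T1  # Var(Delta T)
  var_dS <- S0S0 + S1S1 - 2 * sqrt(S0S0 * S1S1) * S0S1  # Var(Delta S)
  num / sqrt(var_dT * var_dS)
}

# Identifiable quantities for S = BPRS, T = PANSS (Section 1.3.1.1)
obs <- list(T0T0 = 544.329, T1T1 = 550.66, S0S0 = 180.683, S1S1 = 180.943,
            T0S0 = 0.96, T1S1 = 0.964)

# Assumptions giving the lowest and the highest ICA in the grid search
ica_low  <- do.call(ica_from_correlations,
                    c(obs, list(T0T1 = 0.9, T0S1 = 0.9, T1S0 = 0.9, S0S1 = 0.9)))
ica_high <- do.call(ica_from_correlations,
                    c(obs, list(T0T1 = 0.2, T0S1 = 0.4, T1S0 = 0.4, S0S1 = 0.6)))
round(c(lowest = ica_low, highest = ica_high), 4)
```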

Causal diagrams
The plots above show all ρ∆ values that are compatible with the observable data, but it can also be interesting to inspect a subgroup of the results more closely. For example, one may want to evaluate which assumptions for the unidentified correlations typically lead to a particular range of ρ∆ values. In this context, a useful function is CausalDiagramContCont(), which shows a causal diagram that depicts the median correlations between the potential outcomes for a specified range of ρ∆ values. For example, the following code provides such a diagram for ρ∆ ≥ 0.90:

> CausalDiagramContCont(STS_ICA_BPRS_PANSS, Min = 0.9, Max = 1)
# Generated output:
[Causal diagram with nodes S0, T0, S1 and T1: ρS0T0 = 0.96 and ρS1T1 = 0.96 (horizontal lines); the four unidentified median correlations are all 0.]

In this diagram, the two horizontal lines depict the observable correlations ρS0T0 = 0.96 and ρS1T1 = 0.96 (see Section 1.3.1.1). The other four lines pertain to the unidentified correlations; the values shown are the medians of the unidentified correlations that lead to the specified range of ρ∆ values. The thickness of the lines represents the strength of the correlations, with thicker lines used for correlations closer to 1 in absolute value. As can be seen in the causal diagram, ρ∆ ≥ 0.90 is typically associated with a scenario where the potential outcomes of S (S0 and S1) are uncorrelated, the potential outcomes of T (T0 and T1) are uncorrelated, and the cross-over correlations between S and T (ρS0T1 and ρS1T0) are zero as well. A similar causal diagram for ρ∆ ≤ 0.90 can be obtained using the following code:

> CausalDiagramContCont(STS_ICA_BPRS_PANSS, Min = -1, Max = 0.9)
# Generated output:

[Causal diagram with nodes S0, T0, S1 and T1: ρS0T0 = 0.96 and ρS1T1 = 0.96 (horizontal lines); the four unidentified median correlations are all 0.8.]

As can be seen, ρ∆ ≤ 0.90 is typically associated with a scenario where the potential outcomes of S and of T are strongly positively correlated (ρS0S1 = ρT0T1 = 0.80), and similarly for the cross-over correlations between S and T (ρS0T1 = ρS1T0 = 0.80). Which of these two scenarios for the unidentified correlations (leading to ρ∆ ≤ 0.90 or ρ∆ ≥ 0.90) is more plausible cannot be determined from the data alone. Nonetheless, subject-specific knowledge may be used to evaluate the plausibility of these two scenarios. For example, independence of the potential outcomes (ρT0T1 = ρS0S1 = 0) and zero cross-correlations (ρS0T1 = ρS1T0 = 0) seem less plausible based on substantive knowledge. Indeed, a patient’s responses on the BPRS and PANSS under the active control may be expected to be positively correlated with the patient’s responses on the BPRS and PANSS under the experimental treatment, because the BPRS and PANSS are closely related rating scales and the experimental and control treatments are similar (i.e., both are neuroleptic drugs).

Note also that when subject-specific knowledge is available, this information can be taken into account directly in the computation of ρ∆. For example, suppose that it can reasonably be assumed that ρS0S1 ≥ 0. This information can be used in the computation of ρ∆ by replacing the argument S0S1 = seq(-1, 1, by = 0.1) in the ICA.ContCont() function call (see above) by the argument S0S1 = seq(0, 1, by = 0.1).

It may also be useful to examine which specific assumptions regarding the unidentified correlations between the counterfactuals lead to a particular ρ∆ value. For example, the unidentified correlations that result in the highest and the lowest ρ∆ values can be obtained using the following code:

> tail(cbind(STS_ICA_BPRS_PANSS$Pos.Def, STS_ICA_BPRS_PANSS$ICA)[order(STS_ICA_BPRS_PANSS$ICA), ])

# Generated output:
    T0T1 T0S0 T0S1 T1S0  T1S1 S0S1 STS_ICA_BPRS_PANSS$ICA
184  0.4 0.96  0.2  0.2 0.964  0.0              0.9837381
259  0.0 0.96  0.2  0.2 0.964  0.4              0.9837435
209  0.5 0.96  0.3  0.3 0.964  0.1              0.9868505
283  0.1 0.96  0.3  0.3 0.964  0.5              0.9868577
234  0.6 0.96  0.4  0.4 0.964  0.2              0.9934823
303  0.2 0.96  0.4  0.4 0.964  0.6              0.9934925

> head(cbind(STS_ICA_BPRS_PANSS$Pos.Def, STS_ICA_BPRS_PANSS$ICA)[order(STS_ICA_BPRS_PANSS$ICA), ])
# Generated output:
    T0T1 T0S0 T0S1 T1S0  T1S1 S0S1 STS_ICA_BPRS_PANSS$ICA
343  0.9 0.96  0.9  0.9 0.964  0.9              0.6200344
339  0.9 0.96  0.8  0.9 0.964  0.8              0.7910606
342  0.8 0.96  0.8  0.9 0.964  0.9              0.7910932
337  0.9 0.96  0.9  0.8 0.964  0.8              0.7928505
341  0.8 0.96  0.9  0.8 0.964  0.9              0.7928831
338  0.8 0.96  0.7  0.9 0.964  0.8              0.8087451

As can be seen, the highest ρ∆ = 0.993 is obtained under the assumption that ρT0T1 = 0.20, ρT0S1 = 0.40, ρT1S0 = 0.40 and ρS0S1 = 0.60, whereas the lowest ρ∆ = 0.620 is obtained when ρT0T1 = ρT0S1 = ρT1S0 = ρS0S1 = 0.90. Again, subject-specific knowledge may be available to evaluate the plausibility of different sets of assumptions.

1.3.2.2 BPRS as a surrogate for CGI

The estimates for the observable quantities in expression (1.1) can be obtained by applying the summary() function to the fitted object STS_ECA_BPRS_CGI (see Section 1.3.1.2). Based on the obtained estimates (σ̂S0S0 = 181.016, σ̂S1S1 = 180.917, σ̂T0T0 = 2.353, σ̂T1T1 = 2.147, ρ̂T0S0 = 0.734, and ρ̂T1S1 = 0.739) and the grid of values G = {−1, −0.90, ..., 1} for the correlations between the potential outcomes, the following code can be used to obtain the ρ∆ estimates:

> STS_ICA_BPRS_CGI <- ICA.ContCont(T0S0 = 0.734, T1S1 = 0.739, T0T0 = 2.353,
+   T1T1 = 2.147, S0S0 = 181.016, S1S1 = 180.917,
+   T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1),
+   T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))
> summary(STS_ICA_BPRS_CGI)

# Generated output:
Function call:
ICA.ContCont(T0S0 = 0.734, T1S1 = 0.739, T0T0 = 2.353, T1T1 = 2.147, S0S0 = 181.016, S1S1 = 180.917, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194481

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13899

# Causal-inference (ICA) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) ICA: 0.7108 (0.2214)  [min: -0.9429; max: 0.9996]
Mode ICA: 0.7793

Quantiles of the ICA distribution:

       5%       10%       20%       50%       80%       90%       95%
0.2875001 0.4543061 0.6001749 0.7583680 0.8730139 0.9205580 0.9507228

> plot(STS_ICA_BPRS_CGI)
# Generated output:
[Figure: ICA. Histogram of the ρ∆ values; x-axis: ρ∆; y-axis: Percentage.]

The output of the summary() function shows that a total of 194,481 matrices can be formed, of which 13,899 are positive definite. Compared to the case where S = BPRS and T = PANSS, the mean of the ρ∆ estimates is now much lower and the variability of the distribution is much larger (mean = 0.711, SD = 0.221, range [−0.9429; 0.9996]). Clearly, the results are highly sensitive to the assumptions regarding the unidentifiable correlations. Consequently, whether ∆S (the individual causal treatment effect on the BPRS) is an appropriate surrogate for ∆T (the individual causal treatment effect on the CGI) is largely a matter of unverifiable assumptions. Therefore, it may be concluded that the validity of the BPRS score as a surrogate for the CGI score is not clearly established.

Causal diagrams
To explore the typical assumptions regarding the unidentified correlations that are associated with a range of ρ∆ values, the function CausalDiagramContCont() can again be used. For example, the following code provides causal diagrams that show the median correlations between the potential outcomes for ρ∆ ≥ 0.70 and ρ∆ ≤ 0.70:

> CausalDiagramContCont(STS_ICA_BPRS_CGI, Min = 0.7, Max = 1)
# Generated output:


[Causal diagram: ρS0T0 = 0.73, ρS1T1 = 0.74; median ρT0T1 = ρS0S1 = ρS0T1 = ρS1T0 = −0.2]

> CausalDiagramContCont(STS_ICA_BPRS_CGI, Min = -1, Max = 0.7)
# Generated output: [Causal diagram: ρS0T0 = 0.73, ρS1T1 = 0.74; median values of the four unidentified correlations between 0.3 and 0.4]

Notice that red lines in the plot represent negative correlations. As can be seen in the output, ρ∆ ≥ 0.70 is typically associated with a setting in which the four unidentified correlations between the potential outcomes are all assumed to be slightly negative, i.e., ρS0S1 = ρT0T1 = ρS0T1 = ρS1T0 = −0.20. In contrast, ρ∆ ≤ 0.70 is typically obtained when these unidentified correlations are small and positive. The biological plausibility of both scenarios can again be discussed with experts in the field.
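The medians displayed in these causal diagrams can be thought of as a simple filter-then-summarise step over the retained scenarios. The sketch below assumes a data frame with one row per positive definite scenario: columns for the four unidentified correlations, plus the resulting ρ∆ (named ICA here).

The helper name, the column names and the closed interval [Min, Max] are assumptions. They do not describe the CausalDiagramContCont() internals. The toy values are made up so that, by construction, the two calls show the qualitative pattern described above.

```r
# Hypothetical helper: median unidentified correlations among the scenarios
# whose ICA lies in the closed interval [Min, Max].
median_unidentified <- function(results, Min = -1, Max = 1) {
  keep <- results$ICA >= Min & results$ICA <= Max
  vapply(results[keep, c("T0T1", "T0S1", "T1S0", "S0S1")], median, numeric(1))
}

# Toy input (made-up values for illustration only, not package output)
toy <- data.frame(
  T0T1 = c(-0.2, -0.3, 0.4, 0.3, -0.1),
  T0S1 = c(-0.2, -0.1, 0.3, 0.4, -0.2),
  T1S0 = c(-0.1, -0.2, 0.3, 0.3, -0.3),
  S0S1 = c(-0.3, -0.2, 0.4, 0.5, -0.2),
  ICA  = c(0.85, 0.78, 0.45, 0.52, 0.91)
)
median_unidentified(toy, Min = 0.7, Max = 1)    # scenarios with ICA >= 0.70
median_unidentified(toy, Min = -1, Max = 0.7)   # scenarios with ICA <= 0.70
```

With this toy input, the first call returns −0.2 for all four correlations and the second returns small positive medians.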


1.3.3 The plausibility of finding a good surrogate

Considerable methodological work has been carried out to develop strategies for evaluating surrogate markers (for an overview, see Burzykowski, Molenberghs and Buyse, 2005), but the question of whether a valid surrogate endpoint exists has largely been left unanswered. Indeed, given a clinically relevant true endpoint and a treatment of interest, one may wonder whether a good surrogate marker exists in the first place. As detailed in Section 3.2.6 of Van der Elst (2016), the prediction mean squared error (PMSE; δ) and ρ²min can be used to assess the plausibility of finding a good surrogate for a particular T. In the Surrogate package, the function ICA.ContCont() can be used to compute δ, and MinSurrContCont() can be used to compute ρ²min for a desired prediction precision (i.e., a fixed value of δ). This is illustrated in the next sections using the data of the case study.

1.3.3.1 BPRS as a surrogate for PANSS

The function ICA.ContCont() can be used to compute the vector of values for δ (PMSE). Using the fitted object STS_ICA_BPRS_PANSS (see Section 1.3.2.1), a plot that depicts the distribution of δ across all plausible ‘realities’ can be obtained using the code:

> plot(STS_ICA_BPRS_PANSS, Good.Surr = TRUE, ICA = FALSE, breaks = 16)
# Generated output: [Figure: histogram of δ; x-axis: δ; y-axis: Percentage]

As can be seen, most δ values are at least 40. The 5th percentile of the vector of δ values and its maximum can be obtained using the following code:

> quantile(STS_ICA_BPRS_PANSS$GoodSurr$delta, c(0.05))
# Generated output:
     5% 
37.1181 

> max(STS_ICA_BPRS_PANSS$GoodSurr$delta)
# Generated output:
[1] 83.939899

Thus, 95% of the obtained δ values lie between 37.118 and 83.940. This means that the individual causal treatment effect on S allows the individual causal treatment effect on T to be predicted with a root PMSE (√δ, the average prediction error) between about 6 and 9 points.

The function MinSurrContCont() can be used to compute ρ²min for a fixed value of δ (the desired prediction precision). It takes the estimable variances σ̂T0T0 and σ̂T1T1 and a specified grid of values G for the unidentified parameter ρT0T1. For example, the following code can be used to obtain ρ²min for δ = 81 (corresponding to an average prediction error of 9 points), with σ̂T0T0 = 544.329 and σ̂T1T1 = 550.660 (see Section 1.3.2.1) and G = {−1, −0.90, ..., 1} for ρT0T1:

> Rho2_min_BPRS_PANSS <- MinSurrContCont(T0T0 = 544.329, T1T1 = 550.66, Delta = 81, T0T1 = seq(-1, 1, by = 0.1))
> summary(Rho2_min_BPRS_PANSS)
# Generated output:
Function call: MinSurrContCont(T0T0 = 544.329, T1T1 = 550.66, Delta = 81, T0T1 = seq(-1, 1, by = 0.1))

# Rho2.Min results summary (Inf values are excluded)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) Rho^2_min: 0.8669 (0.1653)  [min: 0.2604; max: 0.9630]

Quantiles of the Rho2.Min distribution:

       5%       10%       20%       50%       80%       90%       95% 
0.6116690 0.7411044 0.8446589 0.9293890 0.9543103 0.9591197 0.9611637
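The link between δ, ρ∆ and ρ²min can be written down directly. Under the normal model, the prediction mean squared error of ∆T given ∆S is δ = σ∆T(1 − ρ²∆), where σ∆T = σT0T0 + σT1T1 − 2√(σT0T0σT1T1)ρT0T1 is the variance of ∆T. Solving for ρ²∆ gives ρ²min = 1 − δ/σ∆T.

The sketch below evaluates this expression over the grid G used in the MinSurrContCont() call above. The helper names are hypothetical. Dropping values outside [−1, 1] is an assumption, chosen because it is consistent with the printed summary. Here it removes only the grid point ρT0T1 = 1, where σ∆T is nearly zero.

```r
# Hypothetical helpers (not Surrogate package functions)
var_delta_T <- function(T0T0, T1T1, T0T1) {
  T0T0 + T1T1 - 2 * sqrt(T0T0 * T1T1) * T0T1   # Var(Delta_T) = Var(T1 - T0)
}
pmse <- function(ica, T0T0, T1T1, T0T1) {
  var_delta_T(T0T0, T1T1, T0T1) * (1 - ica^2)  # delta = Var(Delta_T) * (1 - rho_Delta^2)
}
rho2_min <- function(delta, T0T0, T1T1, T0T1) {
  1 - delta / var_delta_T(T0T0, T1T1, T0T1)    # smallest rho_Delta^2 that reaches delta
}

G  <- seq(-1, 1, by = 0.1)
r2 <- rho2_min(delta = 81, T0T0 = 544.329, T1T1 = 550.66, T0T1 = G)
# Assumption: values outside [-1, 1] are dropped before summarising
r2_kept <- r2[is.finite(r2) & r2 >= -1 & r2 <= 1]
round(c(mean = mean(r2_kept), min = min(r2_kept), max = max(r2_kept)), 4)
round(quantile(r2_kept, 0.05), 4)

# Root PMSE (average prediction error, PANSS points) for the 5th percentile
# and the maximum of delta reported above
sqrt(c(37.118, 83.940))
```

Under this filtering, the mean, minimum, maximum and 5th percentile come out close to the values reported by summary() above (0.8669, 0.2604, 0.9630 and 0.6117). The last line converts the δ bounds into average prediction errors of roughly 6 and 9 PANSS points.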

> plot(Rho2_min_BPRS_PANSS, main = (delta ~ "=81"))
# Generated output: [Figure: histogram titled "δ =81"; x-axis: ρ²min; y-axis: Percentage]

The output of the summary() function shows that the mean ρ²min value was about 0.867, and ρ²min ≥ 0.612 in 95% of the cases. Thus, a candidate S should produce a ρ²∆ of at least about 0.612 to achieve the desired average prediction error of 9 points in the prediction of the individual causal treatment effects on T. Obviously, the plausibility of finding a good surrogate is related to the desired prediction precision, and the previous analysis may help to find a reasonable balance between these two components. For example, suppose that we are willing to take more risk and use δ = 225 (corresponding to an average prediction error of 15 points):

> Rho2_min_BPRS_PANSS_2 <- MinSurrContCont(T0T0 = 544.329, T1T1 = 550.66, Delta = 225, T0T1 = seq(-1, 1, by = 0.1))
> summary(Rho2_min_BPRS_PANSS_2)
# Generated output:
Function call: MinSurrContCont(T0T0 = 544.329, T1T1 = 550.66, Delta = 225, T0T1 = seq(-1, 1, by = 0.1))

# Rho2.Min results summary (Inf values are excluded)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) Rho^2_min: 0.7191 (0.2377)  [min: -0.0273; max: 0.8973]

Quantiles of the Rho2.Min distribution:

       5%       10%       20%       50%       80%       90%       95% 
0.2808456 0.4520649 0.6301383 0.8131983 0.8745950 0.8870444 0.8923917

> plot(Rho2_min_BPRS_PANSS_2)
# Generated output: [Figure: histogram titled "δ =225"; x-axis: ρ²min; y-axis: Percentage]

The output now shows that the mean ρ²min = 0.719, and ρ²min ≥ 0.281 in 95% of the settings. Thus, a candidate S should produce a ρ²∆ of at least about 0.281 to predict the individual causal treatment effects on T with an average prediction error of 15 points. As expected, this value is lower than the value ρ²min ≥ 0.612 that was obtained for an average prediction error of 9 points.

1.3.3.2 BPRS as a surrogate for CGI

Using the fitted object STS_ICA_BPRS_CGI (see Section 1.3.2.2), a plot that depicts the distribution of δ across all plausible ‘realities’ can be obtained using the command:

> plot(STS_ICA_BPRS_CGI, Good.Surr = TRUE, ICA = FALSE, breaks = 9)
# Generated output: [Figure: histogram of δ; x-axis: δ; y-axis: Percentage]

As can be seen, most δ values lie in the [0.5; 3.5] interval. Thus, the individual causal treatment effect on the CGI can be predicted from the individual causal treatment effect on the BPRS with an average prediction error (√δ) between about 0.7 and 1.9 points. The function MinSurrContCont() can be used to compute ρ²min for a fixed value of δ. For example, the following code can be used to obtain ρ²min for δ = 1 (corresponding to an average prediction error of 1 point on the 7-point CGI scale), with σ̂T0T0 = 2.353 and σ̂T1T1 = 2.147 (see Section 1.3.2.2) and G = {−1, −0.90, ..., 1} for ρT0T1:

> Rho2_min_BPRS_CGI <- MinSurrContCont(T0T0 = 2.353, T1T1 = 2.147, Delta = 1, T0T1 = seq(-1, 1, by = 0.1))
> summary(Rho2_min_BPRS_CGI)
# Generated output:
Function call: MinSurrContCont(T0T0 = 2.353, T1T1 = 2.147, Delta = 1, T0T1 = seq(-1, 1, by = 0.1))

# Rho2.Min results summary (Inf values are excluded)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) Rho^2_min: 0.6966 (0.2561)  [min: -0.1065; max: 0.8888]

Quantiles of the Rho2.Min distribution:

       5%       10%       20%       50%       80%       90%       95% 
0.2243130 0.4084667 0.6003414 0.7979605 0.8643237 0.8777851 0.8835676

> plot(Rho2_min_BPRS_CGI, main = (delta ~ "=1"))
# Generated output: [Figure: histogram titled "δ =1"; x-axis: ρ²min; y-axis: Percentage]

The output shows that the mean ρ²min value was 0.697, and ρ²min ≥ 0.224 in 95% of the cases. Thus, a candidate S should produce a ρ²∆ of at least about 0.224 to achieve the desired accuracy in the prediction of the individual causal treatment effects on T.
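The extremes of this distribution can be checked by hand with ρ²min = 1 − δ/σ∆T. Here σ∆T is the variance of the individual causal treatment effect on T. With δ = 1, σ̂T0T0 = 2.353 and σ̂T1T1 = 2.147, the grid endpoints that bound the printed summary give:

```latex
\begin{align*}
\sigma_{\Delta T} &= \sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sqrt{\sigma_{T_0T_0}\,\sigma_{T_1T_1}}\,\rho_{T_0T_1},
  \qquad \rho^2_{\min} = 1 - \frac{\delta}{\sigma_{\Delta T}},\\
\rho_{T_0T_1} = -1:\quad \sigma_{\Delta T} &= \left(\sqrt{2.353} + \sqrt{2.147}\right)^2 \approx 8.995,
  \qquad \rho^2_{\min} \approx 1 - \frac{1}{8.995} \approx 0.889,\\
\rho_{T_0T_1} = 0.8:\quad \sigma_{\Delta T} &= 4.500 - 1.6\sqrt{2.353 \times 2.147} \approx 0.904,
  \qquad \rho^2_{\min} \approx 1 - \frac{1}{0.904} \approx -0.106.
\end{align*}
```

These match the reported maximum (0.8888) and minimum (−0.1065). The grid points ρT0T1 = 0.9 and 1 give ρ²min ≈ −1.20 and a value far below −1, respectively, and do not appear in the printed summary.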

1.4 The multiple-trial setting

1.4.1 Expected causal association (ECA)

In the meta-analytic framework, two quantities are used to assess surrogacy. The first is the individual-level surrogacy coefficient, defined as the treatment- and trial-corrected correlation between S and T. The second is the trial-level surrogacy coefficient, which is the correlation between the expected causal treatment effects on S and T (ECA). The function BifixedContCont() implements the two-stage bivariate fixed-effect modelling approach (for details, see Tibaldi et al., 2003). The use of this function is illustrated in the next two sections.

1.4.1.1 BPRS as a surrogate for PANSS

The following code can be used to assess whether the BPRS is an appropriate surrogate for the PANSS based on expected causal effects. It uses the reduced bivariate fixed-effect modelling approach, with the treating psychiatrist (investigator) as the trial/clustering unit:

> MTS_ECA_BPRS_PANSS <- BifixedContCont(Dataset = Schizo, Surr = BPRS, True = PANSS, Treat = Treat, Trial.ID = InvestId, Pat.ID = Id, Model = "Reduced")
> summary(MTS_ECA_BPRS_PANSS)
# Generated output:
Function call: BifixedContCont(Dataset = Schizo, Surr = BPRS, True = PANSS, Treat = Treat, Trial.ID = InvestId, Pat.ID = Id, Model = "Reduced")

# Data summary and descriptives
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total number of trials: 150
Total number of patients: 2020
M(SD) patients per trial: 13.4667 (10.1165)  [min: 2; max: 52]
Total number of patients in experimental treatment group: 1497
Total number of patients in control treatment group: 523

Mean surrogate and true endpoint values in each treatment group:

              Control.Treatment Experimental.treatment
Surrogate               -6.6577                -8.9873
True endpoint          -11.5277               -16.1222

Var surrogate and true endpoint values in each treatment group:

              Control.Treatment Experimental.treatment
Surrogate              182.8884               181.2518
True endpoint          552.2727               549.4296

Correlations between the true and surrogate endpoints in the control (r_T0S0)
and the experimental treatment groups (r_T1S1):

       Estimate Standard Error CI lower limit CI upper limit
r_T0S0   0.9599         0.0062         0.9563         0.9632
r_T1S1   0.9638         0.0059         0.9606         0.9668


# Meta-analytic results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 R2 Trial Standard Error CI lower limit CI upper limit
   0.9199         0.0127         0.8951         0.9447

 R2 Indiv Standard Error CI lower limit CI upper limit
   0.9276         0.0031         0.9215         0.9337

  R Trial Standard Error CI lower limit CI upper limit
   0.9591         0.0233         0.9439         0.9702

  R Indiv Standard Error CI lower limit CI upper limit
   0.9631         0.0060         0.9494         0.9732
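The confidence limits printed for R Trial and R Indiv can be reproduced with a Fisher z-interval, tanh(atanh(r) ± 1.96/√(n − 3)), using n = 150, the number of clusters in this analysis. This correspondence was inferred by matching the printed numbers, not taken from the package documentation, so the sketch should be read as a plausibility check. The helper name fisher_ci() is hypothetical.

```r
# Hypothetical helper: Fisher z-interval for a correlation r based on n units
fisher_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                    # Fisher z-transform
  se <- 1 / sqrt(n - 3)             # approximate standard error on the z scale
  q  <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  tanh(c(lower = z - q * se, upper = z + q * se))   # back-transform to r scale
}

round(fisher_ci(0.9591, n = 150), 4)   # R Trial, 150 investigators
round(fisher_ci(0.9631, n = 150), 4)   # R Indiv, same n (assumption)
```

Both calls give limits that agree with the output above to within rounding of the input estimates.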

The summary() function provides descriptive information such as the number of trials/clusters, the total number of patients and the mean number of patients per trial/cluster. It also reports the means and variances of S and T, as well as their correlations in both treatment groups. The surrogacy estimates R̂trial (ECA) and R̂indiv are given at the end of the output. As can be seen, the R̂trial estimate equalled 0.959 (CI95% = [0.944; 0.970]). This indicates a strong association between the expected causal treatment effects on S = BPRS and T = PANSS at the level of the trials/clusters (investigators). Thus, the expected causal treatment effect on T can be predicted highly accurately from the expected causal treatment effect on S. A graphical illustration of the trial-level surrogacy can be obtained by applying the plot() function to the fitted object MTS_ECA_BPRS_PANSS using the Trial.Level=TRUE argument:

> plot(MTS_ECA_BPRS_PANSS, Indiv.Level = FALSE, Trial.Level = TRUE)
# Generated output: [Figure: "Trial−level surrogacy"; x-axis: Treatment effect on the Surrogate endpoint (αi); y-axis: Treatment effect on the True endpoint (βi)]

Each circle in the figure depicts the expected causal treatment effects on S and T in one cluster. The size of each circle is proportional to the number of patients in cluster i (to make all circles the same size, the argument Weighted=FALSE can be used in the plot() function call). The output of the summary() function furthermore shows that R̂indiv = 0.963 (CI95% = [0.949; 0.973]). This indicates a strong association between S = BPRS and T = PANSS at the level of the individual patient (after adjusting for trial and treatment effects). A graphical illustration of the individual-level surrogacy can be obtained by applying the plot() function to the fitted object MTS_ECA_BPRS_PANSS using the argument Indiv.Level=TRUE:

> plot(MTS_ECA_BPRS_PANSS, Indiv.Level = TRUE, Trial.Level = FALSE)
# Generated output: [Figure: "Individual−level surrogacy"; x-axis: Residuals for the Surrogate endpoint (εSij); y-axis: Residuals for the True endpoint (εTij)]

Overall, the analysis based on expected causal treatment effects in the MTS indicates that the BPRS score is an excellent surrogate for the PANSS score.

1.4.1.2 BPRS as a surrogate for CGI

The following code can be used to assess surrogacy based on expected causal effects (S = BPRS, T = CGI) in the MTS. It uses the reduced bivariate fixed-effect model, with the treating psychiatrist (investigator) as the clustering unit:

> MTS_ECA_BPRS_CGI <- BifixedContCont(Dataset = Schizo, Surr = BPRS, True = CGI, Treat = Treat, Trial.ID = InvestId, Pat.ID = Id, Model = "Reduced")
> summary(MTS_ECA_BPRS_CGI)
# Generated output:
Function call: BifixedContCont(Dataset = Schizo, Surr = BPRS, True = CGI, Treat = Treat, Trial.ID = InvestId, Pat.ID = Id, Model = "Reduced")

# Data summary and descriptives
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total number of trials: 149
Total number of patients: 2011
M(SD) patients per trial: 13.4966 (10.0879)  [min: 2; max: 52]
Total number of patients in experimental treatment group: 1493
Total number of patients in control treatment group: 518

Mean surrogate and true endpoint values in each treatment group:

              Control.Treatment Experimental.treatment
Surrogate               -6.6853                -8.9853
True endpoint            3.4440                 3.1936

Var surrogate and true endpoint values in each treatment group:

              Control.Treatment Experimental.treatment
Surrogate              183.3727               181.0789
True endpoint            2.3634                 2.1589

Correlations between the true and surrogate endpoints in the control (r_T0S0)
and the experimental treatment groups (r_T1S1):

       Estimate Standard Error CI lower limit CI upper limit
r_T0S0   0.7337         0.0152         0.7128         0.7532
r_T1S1   0.7346         0.0151         0.7138         0.7541


# Meta-analytic results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 R2 Trial Standard Error CI lower limit CI upper limit
   0.5159         0.0576         0.4031         0.6287

 R2 Indiv Standard Error CI lower limit CI upper limit
   0.5412         0.0151         0.5117         0.5708

  R Trial Standard Error CI lower limit CI upper limit
   0.7183         0.0574         0.6302         0.7880

  R Indiv Standard Error CI lower limit CI upper limit
   0.7357         0.0151         0.6520         0.8017

> plot(MTS_ECA_BPRS_CGI, Indiv.Level = FALSE, Trial.Level = TRUE)
# Generated output: [Figure: "Trial−level surrogacy"; x-axis: Treatment effect on the Surrogate endpoint (αi); y-axis: Treatment effect on the True endpoint (βi)]

> plot(MTS_ECA_BPRS_CGI, Indiv.Level = TRUE, Trial.Level = FALSE)
# Generated output: [Figure: "Individual−level surrogacy"; x-axis: Residuals for the Surrogate endpoint (εSij); y-axis: Residuals for the True endpoint (εTij)]

The output shows that R̂trial = 0.718 (CI95% = [0.630; 0.788]). This indicates that the expected causal treatment effect on T = CGI can be predicted from the expected causal treatment effect on S = BPRS with relatively low accuracy. Likewise, R̂indiv = 0.736 (CI95% = [0.652; 0.802]). This indicates that a patient's T can be predicted from S (after adjusting for cluster and treatment effects) with relatively low accuracy.
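To make the two coefficients concrete, the following is a simplified sketch of the two-stage logic in base R, run on simulated data. It makes several simplifying choices:

- Each trial has balanced arms.
- Stage 1 fits separate per-trial least-squares regressions of S and T on treatment, rather than the bivariate fixed-effect model of Tibaldi et al. (2003).
- Stage 2 regresses the estimated effects on T on those on S, weighted by trial size.
- The individual-level association is measured as the correlation of the stage-1 residuals.

All names and simulation settings are hypothetical. This is not the BifixedContCont() implementation, and the numbers it prints are unrelated to the schizophrenia data.

```r
# Simplified two-stage fixed-effect sketch on simulated data (hypothetical settings)
set.seed(123)
n_trials <- 30
n_per    <- 20
trial <- rep(seq_len(n_trials), each = n_per)
Z     <- rep(c(0, 1), length.out = n_trials * n_per)   # treatment indicator
alpha <- rnorm(n_trials, mean = -5, sd = 3)             # trial-specific effects on S
beta  <- 2 * alpha + rnorm(n_trials, sd = 0.5)          # trial-specific effects on T
eS <- rnorm(length(trial))
eT <- 2 * (0.8 * eS + sqrt(1 - 0.8^2) * rnorm(length(trial)))  # Cor(eS, eT) = 0.8
surr    <- rnorm(n_trials, 50, 5)[trial] + alpha[trial] * Z + eS
true_ep <- rnorm(n_trials, 80, 5)[trial] + beta[trial]  * Z + eT

# Stage 1: per-trial regressions of S and T on treatment
stage1 <- lapply(split(seq_along(trial), trial), function(idx) {
  fit_S <- lm(surr[idx] ~ Z[idx])
  fit_T <- lm(true_ep[idx] ~ Z[idx])
  list(a = unname(coef(fit_S)[2]), b = unname(coef(fit_T)[2]),
       n = length(idx), res_S = resid(fit_S), res_T = resid(fit_T))
})
a_hat <- vapply(stage1, function(x) x$a, numeric(1))
b_hat <- vapply(stage1, function(x) x$b, numeric(1))
n_i   <- vapply(stage1, function(x) x$n, numeric(1))

# Stage 2 (trial level): regress effects on T on effects on S, weighted by trial size
R2_trial <- summary(lm(b_hat ~ a_hat, weights = n_i))$r.squared

# Individual level: correlation between the within-trial residuals of S and T
R_indiv <- cor(unlist(lapply(stage1, `[[`, "res_S")),
               unlist(lapply(stage1, `[[`, "res_T")))

round(c(R2_trial = R2_trial, R_indiv = R_indiv), 3)
```

In this simulation, the trial-level effects are nearly collinear and the patient-level errors have correlation 0.8. The sketch therefore returns a high R²trial and an individual-level correlation near 0.8, which illustrates how the two levels can be assessed separately.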

1.4.2 Meta-analytic individual causal association (MICA)

The Meta-Analytic Individual Causal Association (MICA; ρM) is defined as the correlation between the individual causal treatment effects on S and T (for details, see Chapter 3 in Van der Elst, 2016):

$$
\rho_M = \frac{\sqrt{d_{bb}\, d_{aa}}\; R_{trial} + \sqrt{\sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sqrt{\sigma_{T_0T_0}\sigma_{T_1T_1}}\,\rho_{T_0T_1}}\;\sqrt{\sigma_{S_0S_0} + \sigma_{S_1S_1} - 2\sqrt{\sigma_{S_0S_0}\sigma_{S_1S_1}}\,\rho_{S_0S_1}}\;\rho_{\Delta}}{\sqrt{d_{bb} + \sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sqrt{\sigma_{T_0T_0}\sigma_{T_1T_1}}\,\rho_{T_0T_1}}\;\sqrt{d_{aa} + \sigma_{S_0S_0} + \sigma_{S_1S_1} - 2\sqrt{\sigma_{S_0S_0}\sigma_{S_1S_1}}\,\rho_{S_0S_1}}} \tag{1.2}
$$

In Equation (1.2):

- daa and dbb are the variances of the expected causal treatment effects on S and T.
- Rtrial is the correlation between the expected causal treatment effects on S and T (ECA).
- ρXY is the correlation between X and Y, and σXX is the variance of X.
- ρ∆ is the individual causal association (see Section 1.3.2).

As was also the case with the ICA, MICA is a conceptually appealing measure of surrogacy. Its practical use is challenging, however, because the correlations between the counterfactuals (i.e., ρS0T1, ρS1T0, ρT0T1 and ρS0S1) are not estimable from the data. To deal with this issue, the simulation-based approach detailed in Van der Elst (2016) was implemented in the function MICA.ContCont(). In the function call, the user specifies the estimable quantities (R̂trial, d̂aa, d̂bb, ρ̂S0T0, ρ̂S1T1, σ̂S0S0, σ̂S1S1, σ̂T0T0 and σ̂T1T1). The user also specifies the scalars or vectors that should be considered for the unidentifiable parameters (ρS0T1, ρS1T0, ρT0T1 and ρS0S1). The use of this function is illustrated below for the cases where S = BPRS and T = PANSS, and where S = BPRS and T = CGI.

1.4.2.1 BPRS as a surrogate for PANSS

The estimates for Rtrial, ρS0T0, ρS1T1, σS0S0, σS1S1, σT0T0 and σT1T1 can be obtained by applying the summary() function to the fitted object MTS_ECA_BPRS_PANSS (see Section 1.4.1.1). The estimated variances of the expected causal treatment effects on S and T (daa and dbb) can be obtained using the following code:

> MTS_ECA_BPRS_PANSS$D.Equiv
# Generated output:
            Treatment.S Treatment.T
Treatment.S    22.62958    38.75356
Treatment.T    38.75356    71.28230
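With d̂aa = 22.630 and d̂bb = 71.282 from this output, Equation (1.2) can be evaluated for any single assumed scenario. The sketch below does so for the scenario in which all four unidentified correlations equal 0, which is the median pattern shown later for ρM ≥ 0.90. In that case ρ∆ reduces to [√(σT0T0σS0S0)ρS0T0 + √(σT1T1σS1S1)ρS1T1] / √((σT0T0 + σT1T1)(σS0S0 + σS1S1)).

The helper names are hypothetical. Conceptually, MICA.ContCont() repeats this kind of evaluation over the whole grid G, keeping only positive definite matrices.

```r
# Hypothetical helpers implementing Equation (1.2); not Surrogate functions
var_delta <- function(v0, v1, r01) v0 + v1 - 2 * sqrt(v0 * v1) * r01

mica <- function(Trial.R, D.aa, D.bb, ica, T0T0, T1T1, S0S0, S1S1, T0T1, S0S1) {
  vT <- var_delta(T0T0, T1T1, T0T1)   # variance of Delta_T
  vS <- var_delta(S0S0, S1S1, S0S1)   # variance of Delta_S
  (sqrt(D.bb * D.aa) * Trial.R + sqrt(vT * vS) * ica) /
    sqrt((D.bb + vT) * (D.aa + vS))
}

# rho_Delta when all four unidentified correlations are 0
ica_zero <- function(T0S0, T1S1, T0T0, T1T1, S0S0, S1S1) {
  (sqrt(T0T0 * S0S0) * T0S0 + sqrt(T1T1 * S1S1) * T1S1) /
    sqrt((T0T0 + T1T1) * (S0S0 + S1S1))
}

rho_delta <- ica_zero(T0S0 = 0.96, T1S1 = 0.964, T0T0 = 552.273, T1T1 = 549.43,
                      S0S0 = 182.888, S1S1 = 181.252)
mica(Trial.R = 0.959, D.aa = 22.63, D.bb = 71.282, ica = rho_delta,
     T0T0 = 552.273, T1T1 = 549.43, S0S0 = 182.888, S1S1 = 181.252,
     T0T1 = 0, S0S1 = 0)
```

Setting D.aa = D.bb = 0 makes the function return ρ∆ unchanged, which is a quick sanity check on the implementation.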

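Using the obtained estimates and the grid of values G = {−1, −0.90, ..., 1} for the unidentified correlations, the ρM values can be obtained using the following code:

> MTS_MICA_BPRS_PANSS <- MICA.ContCont(Trial.R = 0.959, D.aa = 22.63, D.bb = 71.282, T0S0 = 0.96, T1S1 = 0.964, T0T0 = 552.273, T1T1 = 549.43, S0S0 = 182.888, S1S1 = 181.252, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))
> summary(MTS_MICA_BPRS_PANSS)
# Generated output:
Function call: MICA.ContCont(Trial.R = 0.959, D.aa = 22.63, D.bb = 71.282, T0S0 = 0.96, T1S1 = 0.964, T0T0 = 552.273, T1T1 = 549.43, S0S0 = 182.888, S1S1 = 181.252, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))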

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194481

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
343

# Causal-inference (MICA) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) MICA: 0.9577 (0.0285)  [min: 0.7515; max: 0.9852]
Mode MICA: 0.9731

Quantiles of the MICA distribution:

       5%       10%       20%       50%       80%       90%       95% 
0.9086699 0.9277713 0.9456289 0.9668705 0.9755264 0.9780773 0.9793051

The output shows that a total of 194,481 matrices can be formed based on the estimable correlations ρ̂S0T0 and ρ̂S1T1 and all possible combinations of the values in G for the unidentifiable correlations. Of these matrices, only 343 were positive definite. The mean ρM was 0.958, and 95% of the ρM values exceeded 0.909. The variability of ρM was small (SD = 0.029, range [0.752; 0.985]), indicating that the results were relatively insensitive to the assumptions regarding the unidentifiable correlations. Overall, the analyses based on individual causal effects in the MTS indicate that the BPRS is an appropriate surrogate for the PANSS (in line with the results based on expected causal effects). Notice that, in line with expectations, the SD and range of ρM are smaller than those of ρ∆ (see Section 1.3.2.1), because the trial-level information is taken into account in the computation of ρM. A histogram of the distribution of ρM can be obtained by applying the plot() command to the fitted object MTS_MICA_BPRS_PANSS (of class MICA.ContCont):

> plot(MTS_MICA_BPRS_PANSS, MICA = TRUE, ICA = FALSE)
# Generated output:

[Figure: histogram titled "MICA" of the ρM values; x-axis: ρM; y-axis: Percentage]

Causal diagrams
The function CausalDiagramContCont() provides a causal diagram that shows the median correlations between the potential outcomes for a specified range of values of ρM (as was the case for ρ∆, see above). For example, the following code provides such diagrams for ρM ≥ 0.90 and ρM ≤ 0.90:

> CausalDiagramContCont(MTS_MICA_BPRS_PANSS, Min = 0.9, Max = 1)
# Generated output: [Causal diagram: ρS0T0 = 0.96, ρS1T1 = 0.96; median ρT0T1 = ρS0S1 = ρS0T1 = ρS1T0 = 0]

> CausalDiagramContCont(MTS_MICA_BPRS_PANSS, Min = -1, Max = 0.9)
# Generated output: [Causal diagram: ρS0T0 = 0.96, ρS1T1 = 0.96; median ρT0T1 = ρS0S1 = ρS0T1 = ρS1T0 = 0.8]

As can be seen, the results are identical to those obtained for ρ∆ (described in Section 1.3.2.1).

1.4.2.2 BPRS as a surrogate for CGI

The estimates for Rtrial, ρS0T0, ρS1T1 and the variances of S and T that are needed as input for MICA.ContCont() were provided earlier (see Section 1.4.1.2). In addition, estimates of daa and dbb are needed as input for the function. These can be obtained using the code:

> MTS_ECA_BPRS_CGI$D.Equiv  # to obtain estimates for d_aa and d_bb
# Generated output:
            Treatment.S Treatment.T
Treatment.S   22.716329   2.0788811
Treatment.T    2.078881   0.3212402

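Using the obtained estimates and the grid of values G = {−1, −0.90, ..., 1} for the correlations between the potential outcomes, the ρM estimates can be obtained using the following command:

> MTS_MICA_BPRS_CGI <- MICA.ContCont(Trial.R = 0.718, D.aa = 22.716, D.bb = 0.321, T0S0 = 0.734, T1S1 = 0.735, T0T0 = 2.363, T1T1 = 2.159, S0S0 = 183.373, S1S1 = 181.079, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))
> summary(MTS_MICA_BPRS_CGI)
# Generated output:
Function call: MICA.ContCont(Trial.R = 0.718, D.aa = 22.716, D.bb = 0.321, T0S0 = 0.734, T1S1 = 0.735, T0T0 = 2.363, T1T1 = 2.159, S0S0 = 183.373, S1S1 = 181.079, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))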

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194481

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14099

# Causal-inference (MICA) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) MICA: 0.7128 (0.1898)  [min: -0.4787; max: 0.9839]
Mode MICA: 0.7736

Quantiles of the MICA distribution:

       5%       10%       20%       50%       80%       90%       95% 
0.3403669 0.4765764 0.6055082 0.7517271 0.8584296 0.9027517 0.9314159

> plot(MTS_MICA_BPRS_CGI)
# Generated output: [Figure: histogram titled "MICA" of the ρM values; x-axis: ρM; y-axis: Percentage]

The output shows that the mean ρM was moderate (ρM = 0.713) and the variability was large (SD = 0.190, range [−0.479; 0.984]). These results indicate that the assumptions regarding the unidentified correlations had a large impact on ρM. In some scenarios there is a strong association between the individual causal effects on S and T, whereas in other scenarios the association is very weak. Notice again that the SD and range of ρM are smaller than those of ρ∆ (see Section 1.3.2.2), because the trial-level information is taken into account in the computation of ρM.

Causal diagrams
To explore the typical assumptions regarding the unidentified correlations that are associated with a range of ρM values, the function CausalDiagramContCont() can again be used. For example, the following code provides causal diagrams with the median correlations between the counterfactuals for ρM ≥ 0.70 and ρM ≤ 0.70:

> CausalDiagramContCont(MTS_MICA_BPRS_CGI, Min = 0.70, Max = 1)
# Generated output:

[Causal diagram: ρS0T0 = 0.73, ρS1T1 = 0.74; median ρT0T1 = ρS0S1 = ρS0T1 = ρS1T0 = −0.2]

> CausalDiagramContCont(MTS_MICA_BPRS_CGI, Min = -1, Max = 0.70)
# Generated output: [Causal diagram: ρS0T0 = 0.73, ρS1T1 = 0.74; median values of the four unidentified correlations between 0.3 and 0.4]

As can be seen, the results are the same as those obtained for ρ∆ (see Section 1.3.2.2). Overall, the analyses based on individual causal effects in the MTS indicate that the appropriateness of the BPRS as a surrogate for the CGI score could not be clearly established (in line with the results based on expected causal effects).
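The observation that ρM varies less than ρ∆ can be read directly off Equation (1.2). Writing σ∆T and σ∆S for the variances of the individual causal treatment effects on T and S, the equation is a weighted combination of Rtrial and ρ∆:

```latex
\begin{align*}
\rho_M &= w_{trial}\, R_{trial} + w_{\Delta}\, \rho_{\Delta},\\
w_{trial} &= \frac{\sqrt{d_{bb}\, d_{aa}}}{\sqrt{(d_{bb} + \sigma_{\Delta T})(d_{aa} + \sigma_{\Delta S})}}, \qquad
w_{\Delta} = \frac{\sqrt{\sigma_{\Delta T}\, \sigma_{\Delta S}}}{\sqrt{(d_{bb} + \sigma_{\Delta T})(d_{aa} + \sigma_{\Delta S})}},\\
\sigma_{\Delta T} &= \sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sqrt{\sigma_{T_0T_0}\sigma_{T_1T_1}}\,\rho_{T_0T_1}, \qquad
\sigma_{\Delta S} = \sigma_{S_0S_0} + \sigma_{S_1S_1} - 2\sqrt{\sigma_{S_0S_0}\sigma_{S_1S_1}}\,\rho_{S_0S_1}.
\end{align*}
```

Because daa and dbb are positive, 0 ≤ w∆ < 1, and w_trial + w∆ ≤ 1 by the Cauchy-Schwarz inequality. For given σ∆T and σ∆S, a shift in the assumed ρ∆ therefore moves ρM by only a fraction w∆ of that shift. Since σ∆T and σ∆S themselves depend on the unidentified ρT0T1 and ρS0S1, this is a heuristic explanation of the smaller spread rather than a formal bound on the SD.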


1.5 Accounting for the sampling variability in the estimation of ρS0T0 and ρS1T1

In the results detailed above, the sampling variability in the estimation of ρS0T0 and ρS1T1 was not accounted for. For example, in the analyses where S = BPRS and T = PANSS, these quantities were fixed at ρ̂S0T0 = 0.960 and ρ̂S1T1 = 0.964. To take this sampling variability into account, ρS0T0 and ρS1T1 can be sampled from uniform distributions whose (min, max) values equal the lower and upper limits of the 95% CIs of these correlations. The following command can be used to conduct such an analysis for S = BPRS and T = PANSS in the single-trial setting, where CI95% = [0.956; 0.963] for ρS0T0 and CI95% = [0.961; 0.967] for ρS1T1 (and similarly for the other analyses):

> STS_ICA_BPRS_PANSS <- ICA.ContCont(T0S0 = runif(n = 2e+05, min = 0.956, max = 0.963), T1S1 = runif(n = 2e+05, min = 0.961, max = 0.967), T0T0 = 544.329, T1T1 = 550.66, S0S0 = 180.683, S1S1 = 180.943, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))
> summary(STS_ICA_BPRS_PANSS)
# Generated output:
Function call: ICA.ContCont(T0S0 = runif(n = 2e+05, min = 0.956, max = 0.963), T1S1 = runif(n = 2e+05, min = 0.961, max = 0.967), T0T0 = 544.329, T1T1 = 550.66, S0S0 = 180.683, S1S1 = 180.943, T0T1 = seq(-1, 1, by = 0.1), T0S1 = seq(-1, 1, by = 0.1), T1S0 = seq(-1, 1, by = 0.1), S0S1 = seq(-1, 1, by = 0.1))

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194481

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
346

# Causal-inference (ICA) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) ICA: 0.9560 (0.0375)  [min: 0.6021; max: 0.9930]
Mode ICA: 0.9736

Quantiles of the ICA distribution:

       5%       10%       20%       50%       80%       90%       95%
0.8977303 0.9261374 0.9448691 0.9678203 0.9761053 0.9787146 0.9803783

As shown in the output, ρ∆ now has mean = 0.956, SD = 0.038, and range [0.602; 0.993], whereas ρ∆ had mean = 0.957, SD = 0.037, and range [0.620; 0.994] when the uncertainty in the estimation of ρS0T0 and ρS1T1 was not accounted for (see Section 1.3.2.1). The impact of accounting for the sampling variability of ρS0T0 and ρS1T1 on the results was thus small. Similar results were obtained when sampling variability was taken into account in the analysis of S = BPRS and T = CGI (data not shown).
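The logic of this analysis can be sketched in base R. This is an illustration under stated assumptions, not the ICA.ContCont() implementation: the helper rho_delta() is hypothetical; ρ∆ is computed as corr(T1 − T0, S1 − S0) from the covariance matrix of Y = (T0, T1, S0, S1)' built from the variances in the function call above; ρS0T0 and ρS1T1 are drawn from uniform distributions over their 95% CIs; and, unlike the function call above, the four unidentified correlations are drawn uniformly on [−1, 1] instead of on a grid with step 0.1. Combinations that do not give a positive definite matrix are discarded.

```r
# Hypothetical helper (not the ICA.ContCont() code): rho_Delta for one fully
# specified set of correlations of Y = (T0, T1, S0, S1)'.
rho_delta <- function(T0S0, T1S1, T0T1, T0S1, T1S0, S0S1,
                      T0T0 = 544.329, T1T1 = 550.66,
                      S0S0 = 180.683, S1S1 = 180.943) {
  R <- matrix(c(1,    T0T1, T0S0, T0S1,
                T0T1, 1,    T1S0, T1S1,
                T0S0, T1S0, 1,    S0S1,
                T0S1, T1S1, S0S1, 1), nrow = 4, byrow = TRUE)
  D <- diag(sqrt(c(T0T0, T1T1, S0S0, S1S1)))
  Sigma <- D %*% R %*% D                      # covariance matrix of Y
  # Discard correlation combinations without a positive definite Sigma
  if (min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values) <= 0) {
    return(NA_real_)
  }
  a <- c(-1, 1, 0, 0)                         # contrast for Delta T = T1 - T0
  b <- c(0, 0, -1, 1)                         # contrast for Delta S = S1 - S0
  cov_ab <- drop(t(a) %*% Sigma %*% b)
  cov_ab / sqrt(drop(t(a) %*% Sigma %*% a) * drop(t(b) %*% Sigma %*% b))
}

set.seed(1)
n <- 20000
draws <- data.frame(
  T0S0 = runif(n, min = 0.956, max = 0.963),  # 95% CI of rho_S0T0
  T1S1 = runif(n, min = 0.961, max = 0.967),  # 95% CI of rho_S1T1
  T0T1 = runif(n, -1, 1), T0S1 = runif(n, -1, 1),
  T1S0 = runif(n, -1, 1), S0S1 = runif(n, -1, 1)
)
ica <- with(draws, mapply(rho_delta, T0S0, T1S1, T0T1, T0S1, T1S0, S0S1))
ica <- ica[!is.na(ica)]                       # keep the valid draws only
c(n_valid = length(ica), mean = mean(ica), sd = sd(ica))
```

Because the unidentified correlations are sampled rather than taken from a grid, the number of valid draws and the summary statistics will not match the ICA.ContCont() output above.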


Chapter 2

Evaluating surrogacy in the binary-binary setting: case study analysis details

2.1 Introduction

In this chapter, the use of the R package Surrogate for the analysis of the case study discussed in Chapter 4 of Van der Elst (2016) is detailed. In Section 2.2, the data are briefly described. In Section 2.3, the R²H and SPF results are discussed for the analysis where S = clinically relevant change on the BPRS and T = clinically relevant change on the PANSS. Section 2.4 focuses on the analysis where S = clinically relevant change on the BPRS and T = clinically relevant change on the CGI.

2.2 The dataset: a clinical trial in schizophrenia

In Section 2.1.1 of Van der Elst (2016), five clinical trials in schizophrenia were described. Here, a subset of this dataset is considered, i.e., the data of one of these clinical trials are used in the analysis (recall that R²H and the SPF are metrics that were developed for the single-trial setting). In this trial, a total of 454 patients were treated for eight weeks. The endpoints of interest were the presence or absence of clinically relevant changes in schizophrenic symptomatology as evaluated by the BPRS/PANSS/CGI scales (see Section 2.1.1 of Van der Elst, 2016). In the analyses below, it will be examined whether clinically meaningful change on the BPRS (a simpler and easier-to-administer scale to assess schizophrenic symptoms) is an appropriate surrogate for clinically meaningful change on the PANSS (a more complex scale that requires more time and more skilled personnel for its administration). R code is also provided for the analysis where it is examined whether clinically meaningful change on the BPRS is an appropriate surrogate for clinically meaningful change on the CGI. To simplify the exposition, in the following sections the names BPRS/PANSS/CGI will be used to refer to clinically meaningful change on these scales.

The dataset (Schizo_Bin) is included in the Surrogate package. After installation of the package in R, the following code can be used to load the package and the schizophrenia dataset into memory for the subsequent analyses:

> library(Surrogate)  # load the Surrogate library
> data(Schizo_Bin)    # load the data
> head(Schizo_Bin)    # have a look at the first observations

# Generated output:
  Id InvestId BPRS_Bin PANSS_Bin CGI_Bin Treat
1  1        1        1         1       0     1
2  2        2        1         1       0     1
3  3        2        0         0       1    -1
4  4        2        0         0       1     1
5  5        2        1         1       0    -1
6  6        3        1         1       0     1

The dataset contains six variables:
• ‘Id’: the identification number of the patient.
• ‘InvestId’: the identification number of the treating physician (investigator).
• ‘BPRS_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the BPRS scale occurred, and 0 = otherwise.
• ‘PANSS_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the PANSS scale occurred, and 0 = otherwise.
• ‘CGI_Bin’: a binary endpoint coded as 1 = clinically meaningful change on the CGI scale occurred, and 0 = otherwise.
• ‘Treat’: the treatment indicator, coded as −1 = control treatment (a dose of 10 mg of haloperidol) and 1 = experimental treatment (a dose of 8 mg of risperidone).


2.3 BPRS as a surrogate for PANSS

2.3.1 Exploratory data analysis

The function MarginalProbs() can be used to obtain some descriptive summary measures of the data:

> MarginalProbs(Dataset = Schizo_Bin, Surr = BPRS_Bin, True = PANSS_Bin, Treat = Treat)

# Generated output:
$Theta_T0S0
[1] 68.54167

$Theta_T1S1
[1] 141.6104

$Freq.Cont
    0   1
0 105  12
1  12  94

$Freq.Exp
    0   1
0  94   7
1  11 116

$pi1_1_
[1] 0.4215247

$pi0_1_
[1] 0.05381166

$pi1_0_
[1] 0.05381166

$pi0_0_
[1] 0.470852

$pi_1_1
[1] 0.5087719

$pi_1_0
[1] 0.03070175

$pi_0_1
[1] 0.04824561

$pi_0_0
[1] 0.4122807

attr(,"class")
[1] "MarginalProbs"

In the output, the Theta_T0S0 (θT0S0) and Theta_T1S1 (θT1S1) components contain the estimated odds ratios for S = BPRS and T = PANSS in the active control and experimental treatment groups, respectively. As can be seen, the association between S and T is stronger in the experimental treatment group (θ̂T1S1 = 141.6104) than in the control treatment group (θ̂T0S0 = 68.5417). Further, the Freq.Cont and Freq.Exp components in the output provide the frequencies for the cross-tabulation of S versus T in the control and experimental groups. For example, Freq.Cont shows that 12 patients had S = 1 and T = 0 in the control group. Towards the end of the output, estimates are provided for the identifiable marginal probabilities. For example, pi1_1_ provides an estimate for π1·1· = P(T = 1, S = 1 | Z = 0) = 94/223 = 0.4215, and the other marginal probabilities are obtained in a similar way.
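The quantities returned by MarginalProbs() can be recomputed by hand from the two cross-tabulations, which makes their definitions explicit. In the base-R sketch below (the helpers odds_ratio() and joint_probs() are hypothetical, not package functions), rows are assumed to index S and columns T; this is the orientation under which the reported pi_1_0 = 7/228 and pi_0_1 = 11/228 are reproduced.

```r
# Cross-tabulations of S (rows) by T (columns), copied from the output above;
# the row/column orientation is an assumption (see the lead-in).
freq_cont <- matrix(c(105, 12,
                       12, 94), nrow = 2, byrow = TRUE,
                    dimnames = list(S = c("0", "1"), T = c("0", "1")))
freq_exp <- matrix(c(94,   7,
                     11, 116), nrow = 2, byrow = TRUE,
                   dimnames = list(S = c("0", "1"), T = c("0", "1")))

# Hypothetical helper: odds ratio (n00 * n11) / (n01 * n10) of a 2 x 2 table
odds_ratio <- function(tab) {
  (tab["0", "0"] * tab["1", "1"]) / (tab["0", "1"] * tab["1", "0"])
}

# Hypothetical helper: joint probabilities P(T = t, S = s) within one group,
# named p<t><s> to mirror pi<t>_<s>_ (control) and pi_<t>_<s> (experimental)
joint_probs <- function(tab) {
  p <- tab / sum(tab)
  c(p11 = p["1", "1"], p01 = p["1", "0"], p10 = p["0", "1"], p00 = p["0", "0"])
}

odds_ratio(freq_cont)   # compare with Theta_T0S0
odds_ratio(freq_exp)    # compare with Theta_T1S1
joint_probs(freq_cont)  # compare with pi1_1_, pi0_1_, pi1_0_, pi0_0_
joint_probs(freq_exp)   # compare with pi_1_1, pi_0_1, pi_1_0, pi_0_0
```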

2.3.2 The individual causal association and surrogate predictive function

2.3.2.1 The vector of potential outcomes Y

R²H and the SPF are functions of the parameters π characterizing the distribution of the vector of potential outcomes Y. Therefore, the first step in the computation of R²H and the SPF is the implementation of a Monte Carlo algorithm to uniformly sample vectors π in the region of the parametric space that is compatible with the data (i.e., ΓD). In addition, one may also sample a sub-region of ΓD that has a special conceptual meaning, e.g., the sub-region of ΓD where monotonicity holds for both endpoints. In the Surrogate library, samples of π can be obtained using the functions ICA.BinBin(), ICA.BinBin.Grid.Full(), or ICA.BinBin.Grid.Sample()


(for details, see the Surrogate manual). Due to its better numerical performance, the function ICA.BinBin.Grid.Sample() is used here. The ICA.BinBin.Grid.Sample() function requires the user to specify the following main arguments:
• pi1_1_=, pi0_1_=, ..., pi_0_1=: the identifiable marginal probabilities that can be obtained using the MarginalProbs() function as shown earlier.
• Monotonicity=: the assumption that is made regarding monotonicity (for details, see the Surrogate manual). In the current analysis, it is assumed that monotonicity (a strong and often unverifiable assumption) does not hold. Such an analysis can be requested by using the Monotonicity=c("No") argument in the function call (the impact of monotonicity on the results is examined in Section 2.3.3).
• M=: the number of runs that are conducted.

The following commands can be used to request an analysis using M = 10,000 and to inspect the first sampled vector π:

> ICA <- ICA.BinBin.Grid.Sample(pi1_1_ = 0.4215, pi1_0_ = 0.0538, pi_1_1 = 0.5088, pi_1_0 = 0.0307, pi0_1_ = 0.0538, pi_0_1 = 0.0482, Monotonicity = c("No"), M = 10000, Seed = 1)
> ICA$Pi.Vectors[1:1, 1:16]

# Generated output:
    Pi_0000    Pi_0100     Pi_0010    Pi_0001   Pi_0101     Pi_1000
1 0.2834182 0.00165064 0.004011403 0.01415234 0.1716788 0.005595477
    Pi_1010     Pi_1001    Pi_1110    Pi_1101    Pi_1011   Pi_1111
1 0.1192749 0.007882472 0.01047195 0.02701474 0.01546092 0.2762922
      Pi_0110    Pi_0011    Pi_0111    Pi_1100
1 0.005270105 0.01070427 0.03381422 0.01330731

2.3.2.2 The individual causal association (R²H)

The computation of R²H using the Surrogate package is straightforward, as the aforementioned ICA.BinBin.Grid.Sample() function (see Section 2.3.2.1) also computes this metric, together with RH, θT, and θS. The summary() function can be applied to the fitted ICA object to obtain descriptive statistics for these metrics:

> summary(ICA)

# Generated output (restricted):
Function call: ICA.BinBin.Grid.Sample(pi1_1_ = 0.4215, pi1_0_ = 0.0538, pi_1_1 = 0.5088, pi_1_0 = 0.0307, pi0_1_ = 0.0538, pi_0_1 = 0.0482, Monotonicity = c("No"), M = 10000, Seed = 1)

# Total number of valid Pi vectors
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
86

# R2_H results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) R2_H: 0.5280 (0.0963)  [min: 0.2352; max: 0.6951]
Mode R2_H: 0.5654

Quantiles of the R2_H distribution:

    5%    10%    20%    50%    80%    90%    95%
0.3319 0.3910 0.4828 0.5475 0.5912 0.6291 0.6412

# R_H results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) R_H: 0.7231 (0.0720)  [min: 0.4850; max: 0.8337]
Mode R_H: 0.7519

Quantiles of the R_H distribution:

    5%    10%    20%    50%    80%    90%    95%
0.5761 0.6253 0.6948 0.7399 0.7689 0.7932 0.8007

# Theta_T results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) Theta_T: 5.6613 (12.0080)  [min: 0.0278; max: 86.5034]
Mode Theta_T: 0.6469

Quantiles of the Theta_T distribution:

      5%      10%      20%      50%      80%      90%      95%
 0.05224  0.09916  0.16837  1.20480  6.08761 14.14089 27.62107

# Theta_S results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) Theta_S: 6.5233 (16.1188)  [min: 0.0192; max: 117.6268]
Mode Theta_S: 0.5267

Quantiles of the Theta_S distribution:

      5%      10%      20%      50%      80%      90%      95%
 0.06388  0.09188  0.14532  1.21947  6.64230 14.02679 32.09758

The first part of the output shows the number of valid vectors π (vectors in ΓD) that are obtained in the analysis. As can be seen, the 10,000 runs of the algorithm led to 86 vectors π that are compatible with the data. These valid vectors are subsequently used to compute R²H, RH, θT, and θS. Here, the focus is on R²H. As can be seen, R̂²H mean = 0.5280, median = 0.5475, mode = 0.5654 (SD = 0.0963, range [0.2352; 0.6951]). These results indicate that the individual causal treatment effect on the PANSS can be predicted with relatively low accuracy based on the individual causal treatment effect on the BPRS, but the large range of R²H values indicates that the unverifiable assumptions regarding the unidentifiable probabilities π substantially affect R²H.

The density function for R²H can be obtained using the following command:

> plot(ICA, ylim=c(0, 8))

# Generated output:
[Figure: density of R²H across the valid vectors π (x-axis: R2H; y-axis: Density)]

Causal diagrams

The plot above shows the frequency density based on the vectors π that are compatible with the data at hand. Therefore, based on the data alone, one cannot discriminate between the scenarios in which relatively large values of R²H received more support and other scenarios in which smaller values of R²H received more support.

However, in some situations, expert knowledge may be available to evaluate the plausibility of the different scenarios. The function CausalDiagramBinBin() may play an important role in this context (see also the function CausalDiagramContCont() discussed in Chapter 1). This function shows a causal diagram that depicts the medians of the informational coefficients of association (r²h) or odds ratios, describing the association structure of the counterfactual vector Y = (T0, T1, S0, S1)'. The function can also be used to describe the association structure of Y in a specified subgroup defined by the values of R²H. The following arguments are needed when the function is called:
• x=: a fitted object of class ICA.BinBin.
• Values=: specifies whether the median informational coefficients of correlation (Values="Corrs") or the median odds ratios (Values="ORs") between the counterfactuals should be depicted. Default Values="Corrs".
• Min=, Max=: the minimum and maximum values for R̂²H that should be considered. Default Min=0, Max=1.

For example, the following commands can be used to obtain causal diagrams that are compatible with R²H ≥ 0.50 and R²H ≤ 0.50, respectively:

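> CausalDiagramBinBin(x=ICA, Min = 0.5, Max = 1)

# Generated output:
Note. The figure is based on 63 observations.
[Causal diagram: r²h(S0, T0) = 0.51 and r²h(S1, T1) = 0.60 (horizontal lines); median r²h(S0, S1) = r²h(T0, T1) = r²h(S0, T1) = 0.06 and r²h(S1, T0) = 0.08]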

> CausalDiagramBinBin(x=ICA, Min = 0, Max = .5)

# Generated output:
Note. The figure is based on 23 observations.
[Causal diagram: r²h(S0, T0) = 0.51 and r²h(S1, T1) = 0.60 (horizontal lines); median r²h(S0, S1) = r²h(T0, T1) = r²h(S0, T1) = 0.21 and r²h(S1, T0) = 0.24]

In these diagrams, the two horizontal lines depict the identifiable informational coefficients of association between S and T in the two treatment conditions, i.e., r̂²h(S0, T0) = 0.51 and r̂²h(S1, T1) = 0.60. Essentially, these coefficients quantify the strength of the association between S and T in both treatment groups. The other four non-horizontal lines depict the medians of the unidentified informational coefficients of association between the counterfactuals. In the first causal diagram (for R²H ≥ 0.50), the median informational associations between the potential outcomes for S and T are negligible, i.e., r̂²h(S0, S1) = r̂²h(T0, T1) = 0.06. This means that a patient's outcome on the BPRS/PANSS in the active control condition (S0/T0) essentially conveys no information on the patient's outcome on the BPRS/PANSS in the experimental treatment condition (S1/T1). Given that the treatments under study are similar and S0, S1 (and also T0, T1) are repeated measurements in the same patient, this independence is counter-intuitive. Further, the other median informational associations r̂²h(S0, T1) = 0.06 and r̂²h(S1, T0) = 0.08 are also low. Since the BPRS is a subscale of the more complex PANSS scale, one would also expect a certain level of association between these potential outcomes, so independence is again counter-intuitive.

In the second causal diagram (for R²H ≤ 0.50), the median informational associations between the potential outcomes are no longer close to zero. Indeed, the median r̂²h(S0, S1) = r̂²h(T0, T1) = r̂²h(S0, T1) = 0.21 and r̂²h(S1, T0) = 0.24. As was discussed in the previous paragraph, this pattern of association between the potential outcomes seems to be more compatible with our biological knowledge. Clearly, there will always be subjectivity involved in this type of qualitative analysis. Nonetheless, expert opinion may be useful to interpret the previous diagrams and evaluate their biological plausibility.
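To make concrete what is computed for each valid π, the base-R sketch below takes the first sampled vector π shown in Section 2.3.2.1, checks that it reproduces the identifiable marginal probabilities, and computes R²H for it. It rests on two assumptions that this appendix does not spell out: the four digits of each Pi_ label index (T0, T1, S0, S1), in line with π1·1· = P(T0 = 1, S0 = 1), and R²H = I(∆T, ∆S)/min{H(∆T), H(∆S)}, with RH = √R²H (consistent with the reported medians, √0.5475 ≈ 0.7399). It is an illustration, not the package's implementation.

```r
# First sampled pi vector (first row of ICA$Pi.Vectors, Section 2.3.2.1).
# Assumption: the digits of each label index (T0, T1, S0, S1).
pi_vec <- c("0000" = 0.2834182,   "0100" = 0.00165064,  "0010" = 0.004011403,
            "0001" = 0.01415234,  "0101" = 0.1716788,   "1000" = 0.005595477,
            "1010" = 0.1192749,   "1001" = 0.007882472, "1110" = 0.01047195,
            "1101" = 0.02701474,  "1011" = 0.01546092,  "1111" = 0.2762922,
            "0110" = 0.005270105, "0011" = 0.01070427,  "0111" = 0.03381422,
            "1100" = 0.01330731)

# Decode the potential outcomes from the labels (16 x 4 integer matrix)
Y <- do.call(rbind, lapply(strsplit(names(pi_vec), ""), as.integer))
colnames(Y) <- c("T0", "T1", "S0", "S1")

# Check: this pi vector reproduces the identifiable marginal probabilities
pi1_1_ <- sum(pi_vec[Y[, "T0"] == 1 & Y[, "S0"] == 1])  # P(T0 = 1, S0 = 1)
pi_1_1 <- sum(pi_vec[Y[, "T1"] == 1 & Y[, "S1"] == 1])  # P(T1 = 1, S1 = 1)

# Joint distribution of the individual causal effects (Delta T, Delta S)
delta_T <- Y[, "T1"] - Y[, "T0"]
delta_S <- Y[, "S1"] - Y[, "S0"]
joint <- tapply(pi_vec, list(delta_T = delta_T, delta_S = delta_S), sum)

# Shannon entropy (natural log; the ratio R2_H does not depend on the base)
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}
H_T <- entropy(rowSums(joint))
H_S <- entropy(colSums(joint))
mutual_info <- H_T + H_S - entropy(joint)
R2_H <- mutual_info / min(H_T, H_S)
c(pi1_1_ = pi1_1_, pi_1_1 = pi_1_1, R2_H = R2_H, R_H = sqrt(R2_H))
```

ICA.BinBin.Grid.Sample() summarizes such values over all 86 valid vectors π, so the single value obtained here is only one point of the distribution shown in the density plot.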


2.3.2.3 The surrogate predictive function (SPF)

The use of the SPF in conjunction with R²H can help to assess the appropriateness of a putative surrogate. Indeed, while R²H offers a general quantification of the surrogate predictive capacity, the SPF zooms in to offer a more detailed view of how ∆T and ∆S are related. The function SPF.BinBin() generates the SPF using the sensitivity-based approach. The function requires the user to specify a fitted object of class ICA.BinBin (which contains the π vectors that are needed to compute the SPF; see Section 2.3.2.1). The following commands can be used to compute the SPF and explore the results:

> SPF <- SPF.BinBin(x = ICA)
> summary(SPF)

# Generated output:
Function call: SPF.BinBin(x = ICA)

Total number of valid Pi vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
86

SPF Descriptives
~~~~~~~~~~~~~~~~
r_min1_min1
Mean: 0.7484; Median: 0.82509; Mode: 0.87114; SD: 0.21717; Min: 0.058457; Max: 0.97168; 95% CI = [0.13888; 0.95012]

r_0_min1
Mean: 0.2033; Median: 0.13568; Mode: 0.1034; SD: 0.18458; Min: 0.00467; Max: 0.7679; 95% CI = [0.028757; 0.72956]

r_1_min1
Mean: 0.048296; Median: 0.023475; Mode: 0.014946; SD: 0.083193; Min: 0.0001159; Max: 0.60665; 95% CI = [0.00056563; 0.19158]

r_min1_0
Mean: 0.078138; Median: 0.05819; Mode: 0.042112; SD: 0.056138; Min: 0.0059661; Max: 0.25208; 95% CI = [0.0092982; 0.20547]

r_0_0
Mean: 0.85965; Median: 0.89062; Mode: 0.91531; SD: 0.087427; Min: 0.54755; Max: 0.97051; 95% CI = [0.63161; 0.95874]

r_1_0
Mean: 0.062216; Median: 0.047302; Mode: 0.039454; SD: 0.048094; Min: 0.0037408; Max: 0.31073; 95% CI = [0.0078327; 0.16727]

r_min1_1
Mean: 0.044326; Median: 0.030975; Mode: 0.022397; SD: 0.045417; Min: 0.00068337; Max: 0.23243; 95% CI = [0.0012776; 0.17517]

r_0_1
Mean: 0.13028; Median: 0.11176; Mode: 0.079046; SD: 0.089613; Min: 0.0044017; Max: 0.42858; 95% CI = [0.020308; 0.35413]

r_1_1
Mean: 0.8254; Median: 0.84728; Mode: 0.88384; SD: 0.099755; Min: 0.45564; Max: 0.97397; 95% CI = [0.6032; 0.94632]

A histogram of the SPF can be obtained by applying the plot() function to the fitted object SPF:

> plot(SPF)

# Generated output:
[Figure: 3 × 3 grid of histograms (y-axis: Frequency) of r(−1, −1), r(0, −1), r(1, −1) (top row), r(−1, 0), r(0, 0), r(1, 0) (middle row), and r(−1, 1), r(0, 1), r(1, 1) (bottom row)]

The output of the summary() function shows descriptives such as the mean, median, mode, and range [min; max] for each of the r(i, j) = P(∆T = i | ∆S = j). The output of the plot() function shows histograms for each of the r(i, j). The bottom left figure in the plot suggests that the probability of a false positive result is rather small, with mean r(−1, 1) = 0.0443. Further, the level of uncertainty with respect to the true r(−1, 1) is also small, as is evidenced by the relatively narrow range of r(−1, 1) values [0.0007; 0.2324]. The top right figure suggests that the probability of a false negative result is small as well, with mean r(1, −1) = 0.0483, but now the range of r(1, −1) values is much wider [0.0001; 0.6067]. Thus, there is a substantial level of uncertainty with respect to the true r(1, −1). Further, the top left figure indicates that a negative individual causal treatment effect on the BPRS (i.e., ∆S = −1) typically leads to the prediction that the individual causal treatment effect on the PANSS is negative as well (i.e., ∆T = −1), with mean r(−1, −1) = 0.7484, but there is again a large level of uncertainty, with range [0.0585; 0.9717]. The results shown in the center figure offer some support for causal necessity (as defined by Frangakis and Rubin, 2002), with mean r(0, 0) = 0.8597 and a relatively wide range [0.5476; 0.9705].


Thus, a lack of an individual causal treatment effect on the BPRS (i.e., ∆S = 0) seems to give evidence of a lack of an individual causal treatment effect on the PANSS (i.e., ∆T = 0). There is still substantial uncertainty in this case, but it is smaller than the uncertainty in the prediction where the treatment has a negative impact on the BPRS. Similar results are obtained when the treatment has a beneficial effect on the BPRS (bottom row of the figure). Overall, r(i, i) is typically relatively high for all i (the main diagonal in the figure), with all means ≥ 0.7484, all medians ≥ 0.8251, and all modes ≥ 0.8711. However, as the relatively low R²H and the previous figure indicate, there is also a substantial level of uncertainty regarding the true values of the probabilities r(i, j).
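For a single vector π, r(i, j) = P(∆T = i | ∆S = j) is the joint probability of (∆T = i, ∆S = j) divided by P(∆S = j). The base-R sketch below computes this 3 × 3 table for the first sampled vector π of Section 2.3.2.1, under the assumption that the digits of each Pi_ label index (T0, T1, S0, S1). Repeating such a computation over all valid vectors π and summarizing the results is, as far as we understand it, what SPF.BinBin() does; the object names are illustrative.

```r
# First sampled pi vector (Section 2.3.2.1); label digits assumed to be (T0, T1, S0, S1)
pi_vec <- c("0000" = 0.2834182,   "0100" = 0.00165064,  "0010" = 0.004011403,
            "0001" = 0.01415234,  "0101" = 0.1716788,   "1000" = 0.005595477,
            "1010" = 0.1192749,   "1001" = 0.007882472, "1110" = 0.01047195,
            "1101" = 0.02701474,  "1011" = 0.01546092,  "1111" = 0.2762922,
            "0110" = 0.005270105, "0011" = 0.01070427,  "0111" = 0.03381422,
            "1100" = 0.01330731)
outcomes <- do.call(rbind, lapply(strsplit(names(pi_vec), ""), as.integer))
colnames(outcomes) <- c("T0", "T1", "S0", "S1")

# Joint distribution of (Delta T, Delta S): rows Delta T, columns Delta S
joint <- tapply(pi_vec,
                list(delta_T = outcomes[, "T1"] - outcomes[, "T0"],
                     delta_S = outcomes[, "S1"] - outcomes[, "S0"]),
                sum)

# r(i, j) = P(Delta T = i | Delta S = j): divide each column by P(Delta S = j)
spf <- sweep(joint, 2, colSums(joint), "/")
round(spf, 3)
spf["0", "0"]   # r(0, 0) for this single pi vector
```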

2.3.3 Impact of monotonicity assumptions on the results

2.3.3.1 The individual causal association R²H

In the analysis above, it was assumed that monotonicity for S and T did not hold. To evaluate the impact of different monotonicity assumptions on the results, the Monotonicity=c("General") argument can be used in the ICA.BinBin.Grid.Sample() function call:

> ICA.Mono <- ICA.BinBin.Grid.Sample(pi1_1_ = 0.4215, pi1_0_ = 0.0538, pi_1_1 = 0.5088, pi_1_0 = 0.0307, pi0_1_ = 0.0538, pi_0_1 = 0.0482, Monotonicity = c("General"), M = 10000, Seed = 1)
> summary(ICA.Mono)

# Generated output:
Function call: ICA.BinBin.Grid.Sample(pi1_1_ = 0.4215, pi1_0_ = 0.0538, pi_1_1 = 0.5088, pi_1_0 = 0.0307, pi0_1_ = 0.0538, pi_0_1 = 0.0482, Monotonicity = c("General"), M = 10000, Seed = 1)

# Number of valid Pi vectors
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total: 8024

In the different montonicity scenarios:
No True Surr SurrTrue
86   55   84     7799

# Summary of results obtained in different monotonicity scenarios
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# R2_H results summary
~~~~~~~~~~~~~~~~~~~~~
Mean:
    No   True   Surr SurrTrue
0.5280 0.2411 0.2695   0.1304

Median:
     No    True    Surr SurrTrue
0.54752 0.24385 0.26325  0.08583

Mode:
    No   True   Surr SurrTrue
0.5654 0.2585 0.2718  0.01309

SD:
     No    True    Surr SurrTrue
0.09635 0.13093 0.13832  0.13484

Min:
       No      True      Surr  SurrTrue
2.352e-01 3.396e-03 2.187e-02 3.086e-08

Max:
    No   True   Surr SurrTrue
0.6951 0.5599 0.6114   0.6322

# Theta_T results summary
#~~~~~~~~~~~~~~~~~~~~~~~~
Mean:
   No True   Surr SurrTrue
5.661  Inf 73.820      Inf

Median:
   No True   Surr SurrTrue
1.205  Inf 54.582      Inf

SD:
   No True  Surr SurrTrue
12.01  NaN 61.57      NaN

Min:
    No True    Surr SurrTrue
0.0278  Inf 18.5317      Inf

Max:
  No True  Surr SurrTrue
86.5  Inf 378.8      Inf

# Theta_S results summary
#~~~~~~~~~~~~~~~~~~~~~~~~
Mean:
   No   True Surr SurrTrue
6.523 64.972  Inf      Inf

Median:
   No   True Surr SurrTrue
1.219 56.829  Inf      Inf

SD:
   No  True Surr SurrTrue
16.12 44.08  NaN      NaN

Min:
     No     True Surr SurrTrue
0.01921 17.23491  Inf      Inf

Max:
   No  True Surr SurrTrue
117.6 232.8  Inf      Inf

> plot(ICA.Mono)

# Generated output:
[Figure: densities of R²H under the four monotonicity scenarios (legend: No monotonicity, Monotonicity S, Monotonicity T, Monotonicity S and T; x-axis: R2H; y-axis: Density)]

The first part of the output shows the number of valid vectors π (vectors in ΓD) that were obtained in the analysis. As can be seen, the 40,000 runs of the algorithm (i.e., 10,000 runs for each monotonicity scenario) led to 8,024 vectors π that were compatible with the data. These valid vectors are subsequently used to compute R²H and other metrics of interest. Here, the focus is on R²H. The means, medians, modes, SDs, and min and max values of R²H that were obtained under the different monotonicity scenarios are provided in the output of the summary() function. The No, True, Surr, and SurrTrue labels in the output depict the results that were obtained in the no monotonicity, monotonicity for T alone, monotonicity for S alone, and monotonicity for both S and T scenarios, respectively.

The largest estimates for the measures of central tendency of R²H were obtained when no monotonicity was assumed, with R̂²H mean = 0.5280, median = 0.5475, mode = 0.5654 (SD = 0.0964, range [0.2352; 0.6951]). The lowest estimates were obtained when monotonicity was assumed for both S and T, with R̂²H mean = 0.1304, median = 0.0858, mode = 0.0131 (SD = 0.1348, range [< 0.0001; 0.6322]). When monotonicity was assumed for S alone or for T alone, the estimates of the measures of central tendency lay between the previous ones.

The plot shows the density functions for R²H under the different monotonicity scenarios. As can be seen, small values of R²H are much more supported than large values when monotonicity is assumed for both S and T (the blue line in the figure), whereas large values received more support than small values when monotonicity is not assumed (the black line in the figure). When monotonicity is assumed for only one endpoint, the frequency densities lie between the ones obtained in the previous two cases and, here again, smaller values are more supported than large ones. As the previous analyses clearly show, the results are quite sensitive to the unverifiable monotonicity assumptions. Causal diagrams may again be useful to evaluate the biological plausibility of these different scenarios (see Section 2.3.2.2).

2.3.3.2 The surrogate predictive function (SPF)

In Section 2.3.3.1, it was observed that monotonicity had a substantial impact on R²H, i.e., R²H tended to be substantially higher in the no monotonicity scenario than in the scenarios where monotonicity was assumed for S alone, for T alone, or for both S and T. In Section 2.3.2, the SPF results in the no monotonicity scenario were detailed. Here, the SPF results in the monotonicity for S scenario are provided. The SPF results that were obtained under the monotonicity for T and monotonicity for both S and T scenarios are not detailed, because the results in these two scenarios are similar to those that were obtained under the monotonicity for S scenario. To obtain the SPF results under the assumption of monotonicity for S, the following commands can be used:

> ICA.MonoS <- ICA.BinBin.Grid.Sample(pi1_1_ = 0.4215, pi1_0_ = 0.0538, pi_1_1 = 0.5088, pi_1_0 = 0.0307, pi0_1_ = 0.0538, pi_0_1 = 0.0482, Monotonicity = c("Surr.Monotonicity"), M = 10000, Seed = 1)
> SPF.MonoS <- SPF.BinBin(x = ICA.MonoS)
> summary(SPF.MonoS)

# Generated output:
Function call: SPF.BinBin(x = ICA.MonoS)

Total number of valid Pi vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
84

SPF Descriptives
~~~~~~~~~~~~~~~~
r_min1_0
Mean: 0.028009; Median: 0.024862; Mode: 0.017337; SD: 0.016733; Min: 0.0028209; Max: 0.06698; 95% CI = [0.005466; 0.065813]

r_0_0
Mean: 0.90912; Median: 0.90736; Mode: 0.9045; SD: 0.024644; Min: 0.84563; Max: 0.9715; 95% CI = [0.85094; 0.94962]

r_1_0
Mean: 0.062867; Median: 0.061211; Mode: 0.05893; SD: 0.015146; Min: 0.020086; Max: 0.089875; 95% CI = [0.03232; 0.087476]

r_min1_1
Mean: 0.12134; Median: 0.11724; Mode: 0.051222; SD: 0.090701; Min: 0.00045333; Max: 0.44463; 95% CI = [0.0031818; 0.32532]

r_0_1
Mean: 0.36332; Median: 0.34078; Mode: 0.32548; SD: 0.17894; Min: 0.051993; Max: 0.76668; 95% CI = [0.073257; 0.72383]

r_1_1
Mean: 0.51534; Median: 0.50723; Mode: 0.47828; SD: 0.18378; Min: 0.1018; Max: 0.92469; 95% CI = [0.18044; 0.88129]

> plot(SPF.MonoS)

# Generated output:
[Figure: 2 × 3 grid of histograms (y-axis: Frequency) of r(−1, 0), r(0, 0), r(1, 0) (top row) and r(−1, 1), r(0, 1), r(1, 1) (bottom row)]

Notice that the previous output does not show estimates for r(i, −1): when monotonicity for S is assumed, ∆S = −1 has probability 0, so these conditional probabilities are not defined. As can be seen in the top center figure of the plot, the mean r̂(0, 0) = 0.9091 (range [0.8456; 0.9715]) and, therefore, causal necessity is largely supported in this scenario. However, when there is a positive individual causal treatment effect on S (i.e., ∆S = 1), there is substantial uncertainty with respect to the individual causal treatment effect on T (i.e., ∆T; see the bottom figures in the plot). Indeed, the mean r̂(1, 1) = 0.5153 (range [0.1018; 0.9247]), whilst the mean r̂(−1, 1) = 0.1213 (range [0.0005; 0.4446]) and the mean r̂(0, 1) = 0.3633 (range [0.0520; 0.7667]). The (mean) probability of correctly predicting ∆T when ∆S = 1 is thus roughly equal to the (mean) probability of making an erroneous prediction, i.e., r̂(1, 1) ≈ r̂(−1, 1) + r̂(0, 1) ≈ 0.50. Overall, the results in the monotonicity for S scenario indicate that the amount of information that ∆S conveys with respect to ∆T is quite low (relatively low R²H). The SPF results indicate that a lack of treatment effect on S strongly suggests a lack of treatment effect on T, but a positive impact of the treatment on S does not provide strong evidence that there will also be a positive impact on T.
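The way the monotonicity assumptions restrict the parameter space can be seen by enumerating the 16 possible configurations of Y = (T0, T1, S0, S1)'. The base-R sketch below assumes that monotonicity for S means S1 ≥ S0 (so that ∆S = −1 cannot occur, as stated above) and, analogously, T1 ≥ T0 for T; it counts how many components of π can be non-zero in each scenario, the remaining components being fixed at 0. It illustrates the restriction only and is not the package's sampling algorithm.

```r
# All 16 configurations of Y = (T0, T1, S0, S1)'
configs <- expand.grid(T0 = 0:1, T1 = 0:1, S0 = 0:1, S1 = 0:1)

# Assumed definitions: monotonicity for T means T1 >= T0, for S means S1 >= S0
mono_T <- configs$T1 >= configs$T0
mono_S <- configs$S1 >= configs$S0

# Number of components of pi that may be non-zero in each scenario
# (labels as in the summary() output: No, True, Surr, SurrTrue)
free <- c(No = nrow(configs), True = sum(mono_T),
          Surr = sum(mono_S), SurrTrue = sum(mono_T & mono_S))
free   # No = 16, True = 12, Surr = 12, SurrTrue = 9

# Under monotonicity for S, no admissible configuration has Delta S = -1
delta_S <- configs$S1 - configs$S0
any(delta_S[mono_S] == -1)   # FALSE
```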

2.4 BPRS as a surrogate for CGI

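The following R code can be used to analyse whether S = clinically relevant change on the BPRS is a good surrogate for T = clinically relevant change on the CGI (output not shown):

# Obtain marginal probs for S=BPRS, T=CGI
> MP <- MarginalProbs(Dataset = Schizo_Bin, Surr = BPRS_Bin, True = CGI_Bin, Treat = Treat)
> MP
# ICA, assume no monotonicity
> ICA <- ICA.BinBin.Grid.Sample(pi1_1_ = MP$pi1_1_, pi1_0_ = MP$pi1_0_, pi_1_1 = MP$pi_1_1, pi_1_0 = MP$pi_1_0, pi0_1_ = MP$pi0_1_, pi_0_1 = MP$pi_0_1, Monotonicity = c("No"), M = 10000, Seed = 1)
> summary(ICA)
> plot(ICA)
# SPF
> SPF <- SPF.BinBin(x = ICA)
> summary(SPF)
> plot(SPF)

# Explore impact of monotonicity assumptions
# ICA
> ICA_mono <- ICA.BinBin.Grid.Sample(pi1_1_ = MP$pi1_1_, pi1_0_ = MP$pi1_0_, pi_1_1 = MP$pi_1_1, pi_1_0 = MP$pi_1_0, pi0_1_ = MP$pi0_1_, pi_0_1 = MP$pi_0_1, Monotonicity = c("General"), M = 10000, Seed = 1)
> plot(ICA_mono)
> summary(ICA_mono)
# SPF
> SPF_mono <- SPF.BinBin(x = ICA_mono)
> summary(SPF_mono)
> plot(SPF_mono)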


Chapter 3

Evaluating predictors of treatment success: case study analysis details

3.1 Introduction

In personalized medicine, one wants to know, for a given patient and his or her outcomes on predictor variables, how likely it is that a given treatment will be more beneficial than an alternative one. The R package EffectTreat allows for quantifying the predictive causal association (PCA), i.e., the association between the vector of pretreatment predictors and the individual causal effect of the treatment. Here, it is shown how the case study results reported in Chapter 5 of Van der Elst (2016) can be replicated using the EffectTreat package.

3.2 The dataset: a clinical trial in opiate/heroin addiction

The data come from a randomized clinical trial in which the clinical utility of buprenorphine/naloxone (experimental treatment) was compared to that of clonidine (control treatment) for a short-term (13-day) opiate/heroin detoxification treatment. Before and after the treatment, patients were assessed for relapse, withdrawal symptoms, and treatment satisfaction (for details, see Section 2.1.2 in Van der Elst, 2016). The vector of potential pretreatment predictors (S) contains craving at screening (S1), the Clinical Opiate Withdrawal Scale (COWS) score at screening (S2), and heroin use in the 30 days prior to screening (S3). Craving at screening was measured by means of a visual analogue scale (score range [0; 100]). The COWS is an 11-item interviewer-administered questionnaire designed to provide a description of the signs and symptoms of opiate withdrawal, e.g., sweating and runny nose (score range [52; 200]). Higher craving and higher COWS scores are indicative of more severe heroin addiction.


Table 3.1: Correlations between S1 = craving at screening, S2 = COWS at screening, S3 = heroin use at screening and T = heroin use after treatment in the control (T0) and experimental treatment (T1) groups, and significance of the difference of the correlations.

      T0                    T1                    Difference r(Sx, T0) and r(Sx, T1)
S1    −0.029 (p = 0.768)    −0.087 (p = 0.190)    p = 0.624
S2    −0.334 (p = 0.001)    −0.326 (p = 0.001)    p = 0.936
S3     0.148 (p = 0.130)     0.185 (p = 0.005)    p = 0.749

The number of days that heroin was used in the 30 days prior to the second follow-up (which took place 3 months after the start of the treatment) was used as the true endpoint T. The data are not included in the EffectTreat library, as they are not in the public domain. Nonetheless, the data can be downloaded (after registration) from the National Institute on Drug Abuse website (https://www.drugabuse.gov; here, we analyze the combined data of studies NIDA-CTN-0001 and NIDA-CTN-0002).

Data descriptives

Data were available for 335 patients, of whom 106 received the active control clonidine and 229 received the experimental treatment buprenorphine/naloxone. Study drop-out was substantial: T was observed for 104 patients and missing for 231 patients. The missing values for T were multiply imputed using M = 1,000 imputations. The imputation model included treatment, the three pretreatment variables S1–S3, and T. The analyses below are based on the multiply imputed data. Table 3.1 shows the correlations between the different components of S and T in the active control (T0) and experimental treatment (T1) conditions. The correlations between S1 and T0/T1 were close to zero and not significant. The correlations between S2 and T0/T1 were significantly negative, indicating that patients who had higher COWS scores tended to use less heroin in the 30-day interval after the treatment in both treatment conditions. On the other hand, the correlations between S3 and T0/T1 were positive, which indicates that patients who used more heroin in the 30-day interval prior to screening also tended to use more heroin in the 30-day interval after the treatment. Notice that the correlation between S3 and T was not significant in the active control treatment group whilst it was significant in the experimental treatment group, although the difference between both correlation coefficients was not significant.
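The test behind the p-values for the difference between correlations in Table 3.1 is not described. They appear to be reproducible with Fisher's z-test for two correlations estimated in independent groups, using n0 = 106 (clonidine) and n1 = 229 (buprenorphine/naloxone); this is our reconstruction, not a method stated in the text, and the helper fisher_z_diff() is hypothetical.

```r
# Hypothetical helper: two-sided p-value of Fisher's z-test comparing two
# correlations estimated in independent groups of sizes n0 and n1
fisher_z_diff <- function(r0, r1, n0, n1) {
  z <- (atanh(r1) - atanh(r0)) / sqrt(1 / (n0 - 3) + 1 / (n1 - 3))
  2 * pnorm(-abs(z))
}

# Correlations from Table 3.1 (control group T0, experimental group T1)
r_T0 <- c(S1 = -0.029, S2 = -0.334, S3 = 0.148)
r_T1 <- c(S1 = -0.087, S2 = -0.326, S3 = 0.185)
p_diff <- fisher_z_diff(r_T0, r_T1, n0 = 106, n1 = 229)
round(p_diff, 3)
```

This gives approximately 0.624, 0.940 and 0.749, versus 0.624, 0.936 and 0.749 in Table 3.1; small discrepancies are to be expected because the correlations are rounded to three decimals and were obtained from multiply imputed data.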


CHAPTER 3. EVALUATING PREDICTORS OF TREATMENT SUCCESS

3.3 The predictive causal association (PCA; $R^2_\psi$)

After installing the EffectTreat package in R (using the command install.packages("EffectTreat")), the package can be loaded into memory using the command:

> library(EffectTreat)

The function Multivar.PCA.ContCont() implements the sensitivity-based approach to estimate $R^2_\psi$ across a set of plausible values for the unidentified correlation $\rho_{T_0T_1}$ (for details, see Chapter 5 of Van der Elst, 2016). This function requires the user to specify the following main arguments:

• Sigma_TT=: the variance-covariance matrix of the true endpoints:
$$\Sigma_{TT} = \begin{pmatrix} \sigma_{T_0T_0} & \sigma_{T_0T_1} \\ \sigma_{T_0T_1} & \sigma_{T_1T_1} \end{pmatrix}.$$
• Sigma_TS=: the matrix that contains the covariances $\sigma_{T_0S_r}$ and $\sigma_{T_1S_r}$. For example, when there are 2 pretreatment predictors:
$$\Sigma_{TS} = \begin{pmatrix} \sigma_{T_0S_1} & \sigma_{T_0S_2} \\ \sigma_{T_1S_1} & \sigma_{T_1S_2} \end{pmatrix}.$$
• Sigma_SS=: the variance-covariance matrix of the pretreatment predictors. For example, when there are 2 pretreatment predictors:
$$\Sigma_{SS} = \begin{pmatrix} \sigma_{S_1S_1} & \sigma_{S_1S_2} \\ \sigma_{S_1S_2} & \sigma_{S_2S_2} \end{pmatrix}.$$
• T0T1=: a vector (or scalar) that specifies the correlation(s) $\rho_{T_0T_1}$ between the (unidentifiable) counterfactuals $T_0$ and $T_1$. Default seq(-1, 1, by=.01), i.e., the values −1, −0.99, −0.98, ..., 1.

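In the opiate/heroin dataset, the estimated relevant covariance matrices are:

$$\Sigma_{TT} = \begin{pmatrix} 82.274 & \sigma_{T_0T_1} \\ \sigma_{T_0T_1} & 96.386 \end{pmatrix}, \qquad \Sigma_{TS} = \begin{pmatrix} -7.266 & -61.212 & 13.281 \\ -25.196 & -65.375 & 17.563 \end{pmatrix},$$

and

$$\Sigma_{SS} = \begin{pmatrix} 882.352 & 49.234 & 6.420 \\ 49.234 & 411.964 & -26.205 \\ 6.420 & -26.205 & 95.400 \end{pmatrix}.$$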

The following commands are used to conduct the analysis:

# First define the covariance matrices to be used in the
# Multivar.PCA.ContCont() function call
> Sigma_TT = matrix(c(82.274, NA, NA, 96.386), byrow=TRUE, nrow=2)
> Sigma_TS = matrix(data = c(-7.266, -61.212, 13.281, -25.196, -65.375, 17.563), byrow = TRUE, nrow = 2)
> Sigma_SS = matrix(data=c(882.352, 49.234, 6.420, 49.234, 411.964, -26.205, 6.420, -26.205, 95.400), byrow = TRUE, nrow = 3)

# Compute PCA
> Results <- Multivar.PCA.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS)
> summary(Results)

# Generated output:
Function call:
Multivar.PCA.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS)

# Total number of matrices that can be formed by the specified vectors
# and/or scalars of the correlations in the function call
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
201

# Total number of positive definite matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
174


# Predictive causal association (PCA; R^2_{psi}) results summary
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean (SD) PCA: 0.0099 (0.0240) [min: 0.0019; max: 0.2470]
Mode PCA: 0.0029

Quantiles of the PCA distribution:

         5%         10%         20%         50%         80%         90%         95%
0.001958460 0.002066354 0.002322228 0.003694800 0.009035708 0.017440915 0.032640483

The output shows that, out of the 201 matrices that can be formed (based on the specified $\Sigma_{TT}$, $\Sigma_{TS}$, $\Sigma_{SS}$ and the vector of plausible values for $\rho_{T_0T_1}$), 174 positive definite matrices were retained. The subsequent section of the output shows that the mean $R^2_\psi = 0.0099$, the mode $R^2_\psi = 0.0029$, and the median $R^2_\psi = 0.0037$. Further, 95% of the $R^2_\psi$ values were $\leq 0.0326$, and $R^2_\psi$ was at most 0.2470. These results show that in most 'realities' that are compatible with the observed data, $R^2_\psi$ is low. It can thus be concluded that the accuracy with which a patient's individual causal treatment effect on $T$ can be predicted based on $S$ is poor. This result exemplifies Lemma 4 in Chapter 5 of Van der Elst (2016), i.e., the presence of correlation (see Table 3.1 on page 72) does not guarantee the predictive validity of $S$ with respect to $\Delta T$.

The plot() function can be used to further explore the behaviour of the $R^2_\psi$ estimates. For example, plots of the relative frequencies (percentages) and cumulative frequencies can be requested by using the Type="Percent" and Type="CumPerc" arguments in the plot() call:

> plot(Results, Type="Percent") #histogram with percentages
# Generated output:

[Figure: histogram of the $R^2_\psi$ (PCA) values; x-axis: $R^2_\psi$, y-axis: Percentage]

> plot(Results, Type="CumPerc") #cumulative percentages

[Figure: cumulative percentages of the $R^2_\psi$ (PCA) values; x-axis: $R^2_\psi$, y-axis: Cumulative percentage]

These plots confirm the earlier claim that $R^2_\psi$ is low in the large majority of realities that are compatible with the observed data.

3.3.1 The relation between $\rho_{T_0T_1}$ and the predictive causal association (PCA; $R^2_\psi$)

A plot that is useful to examine the impact of the assumptions regarding the unidentified correlation $\rho_{T_0T_1}$ on $R^2_\psi$ can be obtained using the EffectT0T1=TRUE argument in the plot() call:

> plot(Results, EffectT0T1=TRUE)

# Generated output:

[Figure: PCA ($R^2_\psi$) plotted against $\rho_{T_0T_1}$]

As can be seen, $R^2_\psi$ increases as a function of $\rho_{T_0T_1}$. Nonetheless, even when $\rho_{T_0T_1}$ is close to 1, $R^2_\psi$ is only about 0.25. A table that contains the values of $\rho_{T_0T_1}$ that lead to valid results (positive definite $\Sigma$ matrices), together with their corresponding $R^2_\psi$, can be obtained using the following commands:

> TableResults <- …
> colnames(TableResults) <- c("T0T1", "PCA")
> head(TableResults) # Show lowest PCA values
# Generated output:
      T0T1         PCA
[1,] -0.74 0.001861259
[2,] -0.73 0.001871999
[3,] -0.72 0.001882863
[4,] -0.71 0.001893853
[5,] -0.70 0.001904973
[6,] -0.69 0.001916224

> tail(TableResults) # Show highest PCA values
# Generated output:
       T0T1        PCA
[169,] 0.94 0.05138932
[170,] 0.95 0.06106092
[171,] 0.96 0.07521695
[172,] 0.97 0.09791763
[173,] 0.98 0.14024349
[174,] 0.99 0.24702061

> TableResults[TableResults[,1]==0] # Show PCA for rho_T0T1=0
# Generated output:
[1] 0.000000000 0.003234288

The output shows that, e.g., the highest $R^2_\psi = 0.2470$ (obtained when $\rho_{T_0T_1} = 0.99$) is about 130 times higher than the lowest $R^2_\psi = 0.0019$ (obtained when $\rho_{T_0T_1} = -0.74$). The unverifiable assumptions regarding $\rho_{T_0T_1}$ thus have a profound impact on $R^2_\psi$. As another example, the output shows that $R^2_\psi = 0.0032$ when $\rho_{T_0T_1} = 0$ (independence).
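Under multivariate normality, $R^2_\psi$ is the squared multiple correlation between $\Delta T = T_1 - T_0$ and $S$. For a given $\rho_{T_0T_1}$ it can therefore be written as $R^2_\psi = \Sigma_{\Delta S}\Sigma_{SS}^{-1}\Sigma_{\Delta S}' / \sigma_{\Delta\Delta}$, with $\Sigma_{\Delta S} = \Sigma_{T_1S} - \Sigma_{T_0S}$ and $\sigma_{\Delta\Delta} = \sigma_{T_0T_0} + \sigma_{T_1T_1} - 2\sigma_{T_0T_1}$. Only the denominator depends on $\rho_{T_0T_1}$, which is why $R^2_\psi$ increases monotonically with $\rho_{T_0T_1}$. The base-R sketch below evaluates this expression over the $\rho_{T_0T_1}$ values listed in TableResults. It illustrates the formula (all names are our own) and is not the EffectTreat implementation.

```r
sigma_T0T0 <- 82.274
sigma_T1T1 <- 96.386
Sigma_TS <- matrix(c( -7.266, -61.212, 13.281,
                     -25.196, -65.375, 17.563), byrow = TRUE, nrow = 2)
Sigma_SS <- matrix(c(882.352,  49.234,   6.420,
                      49.234, 411.964, -26.205,
                       6.420, -26.205,  95.400), byrow = TRUE, nrow = 3)

# Cov(Delta_T, S) with Delta_T = T1 - T0: second row minus first row of Sigma_TS
Sigma_DeltaS <- Sigma_TS[2, ] - Sigma_TS[1, ]
# Sigma_DeltaS Sigma_SS^{-1} Sigma_DeltaS': the part of Var(Delta_T) explained
# by S, which does not involve rho_T0T1
explained <- sum(Sigma_DeltaS * solve(Sigma_SS, Sigma_DeltaS))

# R^2_psi = explained / Var(Delta_T); Var(Delta_T) depends on rho_T0T1
pca <- function(rho) {
  var_delta <- sigma_T0T0 + sigma_T1T1 - 2 * rho * sqrt(sigma_T0T0 * sigma_T1T1)
  explained / var_delta
}

T0T1 <- (-74:99) / 100   # rho_T0T1 values that gave positive definite matrices
PCA  <- pca(T0T1)

round(c(min = min(PCA), median = median(PCA), max = max(PCA)), 4)
round(pca(0), 4)             # R^2_psi when T0 and T1 are independent
round(max(PCA) / min(PCA))   # ratio of highest to lowest R^2_psi
```

The sketch gives the same minimum (0.0019), median (0.0037) and maximum (0.2470) as the summary() output, $R^2_\psi \approx 0.0032$ at $\rho_{T_0T_1} = 0$, and a maximum-to-minimum ratio of roughly 130.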

Given that $\rho_{T_0T_1}$ is not identifiable, one cannot distinguish between these scenarios based on the data alone. Nonetheless, in some applications subject-matter knowledge may be available, and this information can easily be incorporated into the analysis. For instance, suppose that, based on expert opinion, $\rho_{T_0T_1} < 0.2$ is considered biologically implausible. A plot of the $R^2_\psi$ values that are obtained under 'biologically plausible' conditions ($\rho_{T_0T_1} \geq 0.2$) can be obtained using the commands:

# Make matrix with subgroup of more biologically plausible results
> BiolPlaus <- TableResults[TableResults[,1]>=0.20,]
# Plot PCA based on biologically more plausible assumptions
> hist(BiolPlaus[,2], col="grey", xlab="PCA", main="Biologically more plausible results")
# Generated output:

[Figure: histogram titled "Biologically more plausible results"; x-axis: PCA, y-axis: Frequency]

3.3.2 Predicting $\Delta T$ based on a vector $S$ in an individual patient $j$

In practice, one is interested in predicting a patient's individual causal treatment effect ($\Delta T$) given the patient's observed $S$. The function Predict.Treat.Multivar.ContCont() is useful in this context. It requires the user to specify the following arguments:

• Sigma_TT=, Sigma_TS=, Sigma_SS=: the estimated variance-covariance matrices that are also used as arguments of the Multivar.PCA.ContCont() function (see above).
• Beta=: the treatment effect on $T$ in the validation sample. Under SUTVA, $\beta = E(T_1 - T_0)$ can be estimated as $\hat\beta = \hat E(T \mid Z = 1) - \hat E(T \mid Z = 0)$, i.e., the difference between the observed means of $T$ in the experimental and control treatment groups, respectively.
• S=: the vector of observed $S$ values of the patient.
• mu_S=: the vector of mean $S$ values in the validation sample.
• T0T1=: a vector (or scalar) that specifies the correlation(s) $\rho_{T_0T_1}$ between the (unidentifiable) counterfactuals $T_0$ and $T_1$. Default seq(-1, 1, by=.01), i.e., the values −1, −0.99, −0.98, ..., 1.

In the heroin/opiate detoxification dataset, $\hat\beta = -0.7935$. The negative $\hat\beta$ indicates that the average number of days that heroin was used post-treatment was slightly smaller in the experimental treatment group ($\hat\mu_E = 11.6303$) than in the control treatment group ($\hat\mu_C = 12.4238$), although the difference was not significant (p = 0.871). Further, the means of $S_1$–$S_3$ equalled 66.8149, 84.8393, and 25.1939, respectively. Suppose that we have two patients with an average COWS score and heroin use at screening, of whom one has strong craving behaviour with $S = (95, 85, 25)'$ and the other has low craving behaviour with $S = (5, 85, 25)'$. The following code can be used to predict $\Delta T$ for these patients:

# Specify the estimated treatment effect on T
> Beta <- -0.7935
# Specify the vector of mean S values in the validation sample
> mu_S = matrix(c(66.8149, 84.8393, 25.1939), nrow=3)
# Specify the vector of S values for patient with strong craving
> S_strong = matrix(c(95, 85, 25), nrow=3)
# Specify the vector of S values for patient with low craving
> S_low = matrix(c(5, 85, 25), nrow=3)
# Predict Delta_T based on S
# For patient with strong craving
> Pred_S_strong <- Predict.Treat.Multivar.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS, Beta = Beta, S = S_strong, mu_S = mu_S, T0T1 = seq(-1, 1, by = 0.01))
# For patient with low craving
> Pred_S_low <- Predict.Treat.Multivar.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS, Beta = Beta, S = S_low, mu_S = mu_S, T0T1 = seq(-1, 1, by = 0.01))
# Results for patient with strong craving
> summary(Pred_S_strong)
# Generated output:
Function call:
Predict.Treat.Multivar.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS, Beta = Beta, S = S_strong, mu_S = mu_S, T0T1 = seq(-1, 1, by = 0.01))


# Predicted (Mean) Delta_T_j | S_j
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-1.377375

# Variances and 95% support intervals of Delta_T_j | S_j for
# different values of rho_T0T1
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                rho_T0T1  Var Delta_T_j | S_j  95% supp. int. around Delta_T_j | S_j
(min. value)      -0.740              309.877  [-35.87928; 33.12453]
(max. value)       0.990                1.761  [-3.978588; 1.223839]
(median value)     0.125              155.819  [-25.84315; 23.0884]
(mean value)       0.125              155.819  [-25.84315; 23.0884]

# Proportion of 95% support intervals for Delta_T_j | S_j that include 0, are < 0, and are > 0
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0 included in support interval: 1 (obtained for rho_T0T1 values in range [-0.74; 0.99])
Entire support interval below 0: 0
Entire support interval above 0: 0

# Results for patient with low craving
> summary(Pred_S_low)
# Generated output:
Function call:
Predict.Treat.Multivar.ContCont(Sigma_TT = Sigma_TT, Sigma_TS = Sigma_TS, Sigma_SS = Sigma_SS, Beta = Beta, S = S_low, mu_S = mu_S, T0T1 = seq(-1, 1, by = 0.01))

# Predicted (Mean) Delta_T_j | S_j
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0.4567497

# Variances and 95% support intervals of Delta_T_j | S_j for
# different values of rho_T0T1
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                rho_T0T1  Var Delta_T_j | S_j  95% supp. int. around Delta_T_j | S_j
(min. value)      -0.740              309.877  [-34.04516; 34.95866]
(max. value)       0.990                1.761  [-2.144464; 3.057963]
(median value)     0.125              155.819  [-24.00902; 24.92252]
(mean value)       0.125              155.819  [-24.00902; 24.92252]

# Proportion of 95% support intervals for Delta_T_j | S_j that include 0, are < 0, and are > 0
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0 included in support interval: 1 (obtained for rho_T0T1 values in range [-0.74; 0.99])
Entire support interval below 0: 0
Entire support interval above 0: 0

As can be seen in the output, the expected $\Delta T \mid S$ for the patients with strong and low craving behaviour equals −1.3774 and 0.4567, respectively. Thus, the patient with strong craving behaviour is expected to have about 1.4 more heroin-free days in the post-treatment interval with the experimental treatment than with the control treatment, whereas the patient with low craving behaviour is expected to have about 0.5 more heroin-free days in the post-treatment interval with the control treatment than with the experimental treatment.


On average, the experimental treatment thus appears to be more effective for patients with strong craving behaviour, whereas the control treatment appears to be more effective for patients with low craving behaviour. In spite of these average predictions, the 95% support interval around $\Delta T \mid S$ for a patient with low craving is [−24.0090; 24.9225] when it is assumed that $\rho_{T_0T_1} = 0.125$ (the mean value of $\rho_{T_0T_1}$). Thus, the new treatment may still be the better option for some patients with low craving behaviour. Notice further that the 95% support interval for $\Delta T \mid S$ narrows as $\rho_{T_0T_1}$ increases. Nonetheless, even when a nearly perfect correlation between $T_1$ and $T_0$ is assumed, a substantial amount of uncertainty remains in the prediction of the individual causal treatment effect. For instance, assuming $\rho_{T_0T_1} = 0.990$, the 95% support interval around $\Delta T \mid S$ for the patient with low craving behaviour is [−2.1445; 3.0580]. This level of uncertainty is to be expected given the negligible value of $R^2_\psi$.
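Under the same multivariate normal assumptions, the numbers in the summary() output follow from two closed-form expressions. The first is $E(\Delta T \mid S) = \beta + \Sigma_{\Delta S}\Sigma_{SS}^{-1}(S - \mu_S)$, which does not depend on $\rho_{T_0T_1}$. The second is $\operatorname{Var}(\Delta T \mid S) = \sigma_{\Delta\Delta} - \Sigma_{\Delta S}\Sigma_{SS}^{-1}\Sigma_{\Delta S}'$, which depends on $\rho_{T_0T_1}$ but not on $S$. The base-R sketch below is an illustration under these assumptions, not the code of Predict.Treat.Multivar.ContCont(). It recomputes the expected $\Delta T \mid S$ for both patients, the variances, a 95% support interval based on the normal quantile qnorm(0.975), and the interval width across the positive definite range of $\rho_{T_0T_1}$.

```r
# Estimates from the opiate/heroin validation sample (see text)
sigma_T0T0 <- 82.274
sigma_T1T1 <- 96.386
Sigma_TS <- matrix(c( -7.266, -61.212, 13.281,
                     -25.196, -65.375, 17.563), byrow = TRUE, nrow = 2)
Sigma_SS <- matrix(c(882.352,  49.234,   6.420,
                      49.234, 411.964, -26.205,
                       6.420, -26.205,  95.400), byrow = TRUE, nrow = 3)
Beta <- -0.7935                        # estimated average treatment effect on T
mu_S <- c(66.8149, 84.8393, 25.1939)   # mean S1-S3 in the validation sample

Sigma_DeltaS <- Sigma_TS[2, ] - Sigma_TS[1, ]   # Cov(T1 - T0, S)
w <- solve(Sigma_SS, Sigma_DeltaS)              # regression weights of Delta_T on S

# E(Delta_T | S): does not depend on rho_T0T1
mean_delta <- function(S) Beta + sum(w * (S - mu_S))

# Var(Delta_T | S): depends on rho_T0T1 but not on S
var_delta <- function(rho) {
  sigma_delta <- sigma_T0T0 + sigma_T1T1 - 2 * rho * sqrt(sigma_T0T0 * sigma_T1T1)
  sigma_delta - sum(Sigma_DeltaS * w)
}

# Normal-theory support interval around E(Delta_T | S)
support_interval <- function(S, rho, level = 0.95) {
  half <- qnorm(1 - (1 - level) / 2) * sqrt(var_delta(rho))
  mean_delta(S) + c(lower = -half, upper = half)
}

S_strong <- c(95, 85, 25)
S_low    <- c(5, 85, 25)
round(c(strong = mean_delta(S_strong), low = mean_delta(S_low)), 4)
round(var_delta(c(-0.74, 0.125, 0.99)), 3)
round(support_interval(S_low, rho = 0.125), 4)

# Width of the 95% support interval over the positive definite rho range
T0T1  <- (-74:99) / 100
width <- 2 * qnorm(0.975) * sqrt(var_delta(T0T1))
min(T0T1[width < 20])   # smallest rho_T0T1 with a width below 20 days
```

The expected values, the variances and the support interval agree with the output above. Because the variance does not involve $S$, the interval width, and hence the smallest $\rho_{T_0T_1}$ that gives an interval narrower than 20 days (0.86 on this grid), is the same for every patient.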

The final part of the output provides an overview of the proportion of support intervals for $\Delta T \mid S$ that include 0 (inconclusive prediction), that lie entirely below 0 (the experimental treatment is more beneficial to the patient), and that lie entirely above 0 (the control treatment is more beneficial to the patient). As can be seen, the 95% support intervals of $\Delta T \mid S$ included 0 in all cases for both the patient with strong and the patient with low craving behaviour. The results can also be depicted graphically by applying the plot() function to the fitted objects Pred_S_strong and Pred_S_low. By default, the distribution of $\Delta T \mid S$ is shown for the median $\rho_{T_0T_1}$ value, i.e., for $\rho_{T_0T_1} = 0.125$. The following command can be used to obtain the plot for the patient with strong craving behaviour:

> plot(Pred_S_strong) # Plot for patient with strong craving behaviour
# Generated output:

[Figure: distribution of $\Delta T_j \mid S_j$ for the patient with strong craving behaviour, $\rho_{T_0T_1} = 0.125$]

The vertical black dashed line is the expected $\Delta T \mid S$ value, and the dashed green lines depict the 95% support interval. In line with the earlier results, the 95% support interval for $\Delta T \mid S$ of the patient with strong craving behaviour, assuming $\rho_{T_0T_1} = 0.125$, contains 0; thus, no significant difference between the two treatments on $T$ is expected for this patient. It is also possible to request the 95% support interval for a particular value of $\rho_{T_0T_1}$ by using the Specific.T0T1= argument in the plot() call. For example, the 95% support interval around $\Delta T \mid S$ assuming $\rho_{T_0T_1} = 0.7$ for the patient with strong craving behaviour can be requested using the following command:

> plot(Pred_S_strong, Specific.T0T1=.7) # Plot for patient with strong
# craving, rho_T0T1=.7
# Generated output:

[Figure: distribution of $\Delta T_j \mid S_j$ for the patient with strong craving behaviour, with 95% support intervals for $\rho_{T_0T_1} = 0.125$ and $\rho_{T_0T_1} = 0.7$]

As expected, the width of the 95% support interval decreases when $\rho_{T_0T_1}$ increases. A plot that shows the relation between $\rho_{T_0T_1}$ and the width of the 95% support interval for the patient with strong craving behaviour can be obtained with the following command:

> plot(x=Pred_S_strong$T0T1, y=(sqrt(Pred_S_strong$Var_Delta.T_S)*1.96)*2, type="l", xlab=expression(rho[T0T1]), ylab=expression(paste("Width 95% CI of ", Delta, T[j], "|", S[j])))

# Generated output:

[Figure: width of the 95% support interval of $\Delta T_j \mid S_j$ as a function of $\rho_{T_0T_1}$]

For example, this plot shows that if we want to be 95% confident that the true $\Delta T \mid S$ deviates at most 10 days (in the positive or negative direction) from the expected $\Delta T \mid S$ (i.e., a width of the 95% support interval below 20), the assumption that $\rho_{T_0T_1} \geq 0.86$ should be reasonable (at $\rho_{T_0T_1} = 0.86$, the width is about 19.6 days).

A user-friendly scoring sheet

The Predict.Treat.Multivar.ContCont() function allows for the computation of the expected $\Delta T \mid S$ and its 95% support interval, as illustrated above. However, not all practitioners may be familiar with R (e.g., a medical doctor who is deciding which treatment is best for his or her patient). To increase user-friendliness and the uptake of the methodology in practice, it may be useful to provide an Excel sheet (Excel being a well-known and widely available software tool) that conducts all the required computations automatically. By means of illustration, consider the screenshot shown in Figure 3.1.² Here, the user simply fills in the observed values of the pretreatment variables $S$ of the patient and the assumed correlation $\rho_{T_0T_1}$ in the cells marked in blue. For example, suppose that $S_1 = 5$, $S_2 = 85$, and $S_3 = 25$ (the patient with low craving behaviour in the example above) and that $\rho_{T_0T_1} = 0.125$ is assumed. After these values are filled in, the sheet shows the expected $\Delta T \mid S = 0.4567$ and the 95% support interval [−24.0090; 24.9225]. Further, towards the end of the sheet, the conclusion of the analysis is explicitly stated, i.e., the expected $\Delta T$ for this patient is positive, and thus the control treatment is expected to be more beneficial than the experimental treatment, though the difference between the treatments is not significant (the 95% support interval contains zero).
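The computations behind such a sheet are simple enough to sketch in a few lines of R. The function score_sheet() below is hypothetical: it is not part of EffectTreat, and we have not inspected the formulas in the Excel file. It assumes the normal-theory expressions for $E(\Delta T \mid S)$ and $\operatorname{Var}(\Delta T \mid S)$. For one or more patients and an assumed $\rho_{T_0T_1}$, it returns the expected $\Delta T \mid S$, the 95% support interval, and a verbal conclusion in the spirit of the sheet.

```r
# Hypothetical scoring function mirroring the sheet's inputs and outputs
score_sheet <- function(S1, S2, S3, rho_T0T1) {
  # Estimates from the opiate/heroin validation sample (see text)
  sigma_T0T0 <- 82.274
  sigma_T1T1 <- 96.386
  Sigma_TS <- matrix(c( -7.266, -61.212, 13.281,
                       -25.196, -65.375, 17.563), byrow = TRUE, nrow = 2)
  Sigma_SS <- matrix(c(882.352,  49.234,   6.420,
                        49.234, 411.964, -26.205,
                         6.420, -26.205,  95.400), byrow = TRUE, nrow = 3)
  Beta <- -0.7935
  mu_S <- c(66.8149, 84.8393, 25.1939)

  Sigma_DeltaS <- Sigma_TS[2, ] - Sigma_TS[1, ]
  w <- solve(Sigma_SS, Sigma_DeltaS)

  # Expected Delta_T | S for every patient (one row of S per patient)
  S <- cbind(S1, S2, S3)
  expected <- Beta + drop(sweep(S, 2, mu_S) %*% w)

  # Var(Delta_T | S) under the assumed rho_T0T1 (the same for every patient);
  # rho_T0T1 is assumed to lie in the positive definite range
  sigma_delta <- sigma_T0T0 + sigma_T1T1 -
    2 * rho_T0T1 * sqrt(sigma_T0T0 * sigma_T1T1)
  half <- qnorm(0.975) * sqrt(sigma_delta - sum(Sigma_DeltaS * w))
  lower <- expected - half
  upper <- expected + half

  # Negative Delta_T = fewer heroin-use days under the experimental treatment
  conclusion <- ifelse(upper < 0, "experimental treatment more beneficial",
                ifelse(lower > 0, "control treatment more beneficial",
                ifelse(expected < 0,
                       "experimental favoured, but interval contains 0",
                       "control favoured, but interval contains 0")))
  data.frame(S1, S2, S3, expected, lower, upper, conclusion)
}

# Patients with strong and low craving behaviour, assuming rho_T0T1 = 0.125
sheet <- score_sheet(S1 = c(95, 5), S2 = c(85, 85), S3 = c(25, 25),
                     rho_T0T1 = 0.125)
print(sheet, digits = 6)
```

For the two example patients with $\rho_{T_0T_1} = 0.125$, both intervals contain zero, so the conclusion only states the direction of the expected effect. The sketch does not check whether the supplied $\rho_{T_0T_1}$ lies in the positive definite range.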

3.4 Regression-based approach

The methodology used in Section 3.3 of this online Appendix focussed on individual causal treatment effects. Here, the focus will be on average causal treatment effects (for details, see Gelman and Hill, 2006). By virtue of the randomisation procedure that was used in the opiate/heroin addiction study and by assuming SUTVA (i.e., the treatment assignment for one individual does not affect the outcomes of other patients in the study), the average causal treatment effect can be estimated by fitting the following multiple linear regression model:

$$T = \beta_0 + \beta_1 Z + \sum_{k=1}^{p} \alpha_k S_k + \sum_{k=1}^{p} \gamma_k S_k Z + \varepsilon, \tag{3.1}$$

where $T$ is the true endpoint (number of days that a patient used heroin post-treatment), $Z$ is the treatment indicator, and $S_k$ are the pretreatment variables ($S_1$ = craving at screening, $S_2$ = COWS at screening, and $S_3$ = heroin use at screening). Table 3.2 shows the results when model (3.1) is fitted to the data. As can be seen, the estimated average causal treatment effect is $\mathrm{ECE}(S) = -1.4429 - 0.006\,S_1 - 0.002\,S_2 + 0.045\,S_3$. A modified likelihood-ratio test based on the multiply imputed datasets (Meng and Rubin, 1992) was conducted to evaluate whether the $S$ by treatment interaction was significant. This was not the case ($D_m = 0.1833$, p = 0.908), indicating that $S$ is a poor predictor of the average causal treatment effect.

² The Excel sheet can be downloaded at https://dl.dropboxusercontent.com/u/8416806/PredictDaysHeroin.xlsx.

Figure 3.1: Excel sheet for user-friendly prediction of $\Delta T \mid S$ and its 95% support interval.

Table 3.2: Parameter estimates for regression model (3.1).

                          β      s.e.      p
Intercept            22.519     6.385  0.001
Z                   −1.4429     7.134  0.840
S1: craving          −0.008     0.040  0.838
S2: COWS             −0.143     0.060  0.015
S3: heroin            0.010     0.113  0.378
Z by S1 interaction  −0.006     0.046  0.899
Z by S2 interaction  −0.002     0.063  0.970
Z by S3 interaction   0.045     0.131  0.729
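Because $Z$ enters model (3.1) both as a main effect and through the $S$ by $Z$ interactions, the difference $E(T \mid Z = 1, S) - E(T \mid Z = 0, S)$ equals $\beta_1 + \sum_k \gamma_k S_k$. A minimal R sketch that evaluates this expression with the rounded estimates of Table 3.2 (the helper name ece() is our own) is:

```r
# Rounded estimates from Table 3.2
beta_Z <- -1.4429                                   # main effect of treatment Z
gamma  <- c(S1 = -0.006, S2 = -0.002, S3 = 0.045)   # Z by S interaction effects

# Under model (3.1): E(T | Z = 1, S) - E(T | Z = 0, S) = beta_1 + sum_k gamma_k S_k
ece <- function(S) beta_Z + sum(gamma * S)

ece(c(95, 85, 25))   # strong craving profile from Section 3.3.2
ece(c(5, 85, 25))    # low craving profile from Section 3.3.2
```

For the two craving profiles this gives −1.0579 and −0.5179 heroin-use days. These are average effects for patients who share a profile, not individual causal effects. Given the non-significant interaction test ($D_m = 0.1833$, p = 0.908), they should not be read as evidence of effect modification.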

As was discussed in Section 5.6 of Van der Elst (2016), $R^2_{\psi,\max}$ and $R^2_{\psi,\min}$ can be computed based on $\gamma$ (with $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_p)'$) and the variance components $\Sigma_{SS}$, $\sigma_{T_0T_0}$ and $\sigma_{T_1T_1}$. The function Min.Max.Multivar.PCA() can be used to compute $R^2_{\psi,\max}$ and $R^2_{\psi,\min}$. This function requires the user to specify the following arguments:

• gamma=: the vector of the $S$ by treatment interaction coefficients (see model (3.1)).
• Sigma_SS=: the variance-covariance matrix of the pretreatment predictors. For example, when there are 2 pretreatment predictors, $\Sigma_{SS} = \begin{pmatrix} \sigma_{S_1S_1} & \sigma_{S_1S_2} \\ \sigma_{S_1S_2} & \sigma_{S_2S_2} \end{pmatrix}$.
• Sigma_T0T0=, Sigma_T1T1=: the variances of $T$ in the control and treatment conditions, respectively.

Applied to the heroin case study, the following commands can be used to obtain the results:

# Specify vector of S by treatment interaction coefficients
> gamma <- …
> Sigma_SS = matrix(data=c(882.352, 49.234, 6.420, 49.234, 411.964, -26.205, 6.420, -26.205, 95.400), byrow = TRUE, nrow = 3)
> Sigma_T0T0 <- …

[…]

> library(CorrMixed) # load the package
> data(Example.Data) # load the data
> Example.Data[1:5,] # have a look at the first five data rows
# Generated output:
  Id Cycle Condition Time Outcome
1  1     1         1    1  117.54
2  1     1         2    2  113.74
3  1     1         3    3  108.16
4  1     1         4    4   91.96
5  1     1         5    5   67.60

Exploratory data analysis

A plot of the individual profiles for all animals and the mean evolution over time can be obtained using the Spaghetti.Plot() function of the CorrMixed package. This function requires the following arguments:


CHAPTER 4. ESTIMATING RELIABILITY

• Dataset=: the name of the dataset.
• Outcome=, Id=, Time=: the names of the outcome, subject indicator (Id) and time variables.

By default, the individual profiles and the overall mean are provided. The following command can be used to obtain a spaghetti plot for the case study data:

> Spaghetti.Plot(Dataset=Example.Data, Outcome="Outcome", Id="Id", Time="Time")

# Generated output:

[Figure: spaghetti plot of Outcome against Time with the individual profiles and the mean evolution]

Other options are possible. For example, a plot that shows the median (rather than the mean) and no individual profiles can be requested using the command:

> Spaghetti.Plot(Dataset=Example.Data, Outcome="Outcome", Id="Id", Time="Time", Add.Profiles=FALSE, Add.Mean=FALSE, Add.Median=TRUE)
# Generated output:

[Figure: median evolution of Outcome over Time, without individual profiles]
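For readers without CorrMixed, the kind of display that Spaghetti.Plot() produces can be approximated in base R. The sketch below uses a small hypothetical dataset with the same column names as Example.Data; the values are invented for illustration. It also assumes that the overlaid summaries are the mean and median of the outcome at each time point.

```r
# Hypothetical long-format data using the same column names as Example.Data
toy <- data.frame(
  Id      = rep(1:3, each = 4),
  Time    = rep(1:4, times = 3),
  Outcome = c(117, 113, 108,  92,
              100,  98,  90,  85,
              130, 125, 110,  99)
)

# Mean and median outcome at each time point
mean_profile   <- tapply(toy$Outcome, toy$Time, mean)
median_profile <- tapply(toy$Outcome, toy$Time, median)
times <- as.numeric(names(median_profile))

# Individual profiles in grey, mean (solid) and median (dashed) in black
plot(toy$Time, toy$Outcome, type = "n", xlab = "Time", ylab = "Outcome")
for (id in unique(toy$Id)) {
  profile <- toy[toy$Id == id, ]
  lines(profile$Time, profile$Outcome, col = "grey")
}
lines(times, mean_profile, lwd = 2)
lines(times, median_profile, lwd = 2, lty = 2)
```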

4.3 Estimating reliability using linear mixed-effects models

4.3.1 The mean structure of the model

As was also the case for the ZSV outcome (see Chapter 7 in Van der Elst, 2016), the plots indicate that the relation between time and the outcome is quite complex and cannot be modelled in a straightforward way using, e.g., linear or quadratic polynomials. Therefore, fractional polynomials will be considered. The function Fract.Poly() of the CorrMixed package is useful in this respect. It requires the following arguments:

• Covariate=, Outcome=, Dataset=: the names of the covariate, outcome, and dataset.
• S=: the set that specifies the powers that will be considered in the different models. By default, S = {−2, −1, −0.5, 0, 0.5, 1, 2, 3}.
• Max.M=: the maximum order to be considered for the fractional polynomials.

Here, we request an analysis using the standard set S and m = 3, i.e., fractional polynomials of order 1, 2, and 3 will be considered:

> FP <- …
> summary(FP)
# Generated output:

Best fitting model for m=1:
 power1    AIC
   -0.5 2715.0

Best fitting model for m=2:
 power1 power2  AIC
     -2     -2 2714

Best fitting model for m=3:
 power1 power2 power3  AIC
      3      3      2 2702
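The conversion from selected powers to model terms follows the usual fractional polynomial conventions. A power of 0 denotes $\log(t)$, and each repeat of a power multiplies the previous term by a further $\log(t)$. A minimal sketch that builds these terms for a positive covariate is shown below; fp_terms() is a hypothetical helper, not a CorrMixed function.

```r
# Build fractional polynomial terms for covariate t (t > 0) and a vector of powers
fp_terms <- function(t, powers) {
  stopifnot(all(t > 0))
  powers <- sort(powers)
  out <- matrix(NA_real_, nrow = length(t), ncol = length(powers))
  reps <- 0
  for (j in seq_along(powers)) {
    p <- powers[j]
    # Number of times this power has already appeared
    reps <- if (j > 1 && p == powers[j - 1]) reps + 1 else 0
    base <- if (p == 0) log(t) else t^p
    out[, j] <- base * log(t)^reps
  }
  colnames(out) <- paste0("term", seq_along(powers))
  out
}

# Powers of the best-fitting m = 3 model: t^2, t^3 and t^3 * log(t)
fp_terms(t = c(1, 2, 10), powers = c(3, 3, 2))
```

For the powers (3, 3, 2) selected above, the three columns are $t^2$, $t^3$ and $t^3\log(t)$.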

As can be seen, the model of order m = 3 is preferred based on the AIC (a lower AIC is indicative of a better model fit). This model includes the powers $p_1 = 3$, $p_2 = 3$, and $p_3 = 2$. Recall that, when $p_1 = p_2$, the two corresponding terms are $\beta_1 t^{p_1} + \beta_2 t^{p_1}\log(t)$. Thus, the mean relation between time and the outcome will be modelled as $\beta_1 t^2 + \beta_2 t^3 + \beta_3 t^3 \log(t)$ (with $t$ = time) when the LMMs are fitted to estimate reliability. In addition, cycle and condition are added as covariates (coded as dummies) in all models. To examine the fit of the model graphically, the following commands can be used:

# Code the predictors for fractional poly (powers: 3, 3, 2)
> term1 <- …

[…]

> Model3$R[1:5,1:5] # Estimated reliabilities t_1 - t_5
# Generated output:
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] 0.9278 0.9044 0.8471 0.7795 0.7222
[2,] 0.9044 0.9243 0.8998 0.8398 0.7688
[3,] 0.8471 0.8998 0.9207 0.8950 0.8321
[4,] 0.7795 0.8398 0.8950 0.9169 0.8900
[5,] 0.7222 0.7688 0.8321 0.8900 0.9129

> Model3$CI.Upper[1:5,1:5] # Upper bound CIs
# Generated output:
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] 0.9557 0.9419 0.9122 0.8775 0.8417
[2,] 0.9419 0.9538 0.9383 0.9069 0.8701
[3,] 0.9122 0.9383 0.9525 0.9344 0.9013
[4,] 0.8775 0.9069 0.9344 0.9518 0.9313
[5,] 0.8417 0.8701 0.9013 0.9313 0.9511

> Model3$CI.Lower[1:5,1:5] # Lower bound CIs
# Generated output:
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] 0.8257 0.7618 0.5832 0.3736 0.2003
[2,] 0.7618 0.8327 0.7618 0.5832 0.3736
[3,] 0.5832 0.7618 0.8327 0.7634 0.5832
[4,] 0.3736 0.5832 0.7634 0.8327 0.7754
[5,] 0.2003 0.3736 0.5832 0.7754 0.8293

The results can be graphically shown by applying the plot() function to the fitted object Model3:

> plot(Model3)
# Generated output:

[Figure: "Model 3", estimated reliability against measurement moment]

As can be seen, the $\hat{R}(t_k, t_l)$ are high when the time lag $u$ is small and tend to flatten out for longer time lags. Further, depending on the particular pair of measurement moments $(t_k, t_l)$ that is considered, the slope and amount of decline in $\hat{R}(t_k, t_l)$ as a function of time lag differ. For example, when considering $\hat{R}(t_1, t_k)$, it can be seen that the estimated reliabilities decline particularly strongly for the first few subsequent measurements (say, until about $t_5$) and continue to decline for all $t_k$ afterwards. In contrast, for $\hat{R}(t_{30}, t_k)$ there is only a substantial decline in the estimated reliabilities for the first few subsequent measurements (say, until about $t_{35}$), after which the estimated reliabilities remain essentially constant.

To avoid cluttered figures, the plot does not show CIs around the $\hat{R}(t_k, t_l)$. Such plots can easily be obtained from the fitted components in the Model3 object. For example, when interest is in obtaining a plot with the CIs for $\hat{R}(t_1, t_k)$ and $\hat{R}(t_{20}, t_k)$, the following commands can be used:

> pred <- …
> up <- …
> low <- …
> plot(pred, col=0, ylim=c(-1, 1), xlab="Time", ylab="Reliability", main=expression(paste("Estimated R(", t[1], ", ", t[k], ")", sep="")))
> lines(pred) # add predicted R(time_1, time_j)
> lines(up, lty=2) # add upper bound CI
> lines(low, lty=2) # add lower bound CI
# Generated output:

[Figure: Estimated R(t1, tk) with 95% CIs; x-axis: Time, y-axis: Reliability]

> pred2 <- …
> up2 <- …
> low2 <- …
> plot(pred, col=0, ylim=c(-1, 1), xlab="Time", ylab="Reliability", main=expression(paste("Estimated R(", t[20], ", ", t[k], ")", sep="")))
> lines(pred2, x=c(20:47)) # add predicted R(time_20, time_j)
> lines(up2, x=c(20:47), lty=2) # add upper bound CI
> lines(low2, x=c(20:47), lty=2) # add lower bound CI
# Generated output:

[Figure: Estimated R(t20, tk) with 95% CIs; x-axis: Time, y-axis: Reliability]
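Model3$R, Model3$CI.Upper and Model3$CI.Lower are matrices indexed by measurement moment, so one row gives the estimated reliabilities, and their confidence bounds, between a fixed moment $t_k$ and all other moments. The self-contained sketch below illustrates this indexing. It uses only the 5 × 5 excerpts printed above in place of the full matrices, so it covers $t_1$ to $t_5$ only; with the fitted object, one would index the Model3 components directly.

```r
# 5 x 5 excerpts of Model3$R, Model3$CI.Upper and Model3$CI.Lower (printed above)
R_hat <- matrix(c(0.9278, 0.9044, 0.8471, 0.7795, 0.7222,
                  0.9044, 0.9243, 0.8998, 0.8398, 0.7688,
                  0.8471, 0.8998, 0.9207, 0.8950, 0.8321,
                  0.7795, 0.8398, 0.8950, 0.9169, 0.8900,
                  0.7222, 0.7688, 0.8321, 0.8900, 0.9129),
                nrow = 5, byrow = TRUE)
CI_upper <- matrix(c(0.9557, 0.9419, 0.9122, 0.8775, 0.8417,
                     0.9419, 0.9538, 0.9383, 0.9069, 0.8701,
                     0.9122, 0.9383, 0.9525, 0.9344, 0.9013,
                     0.8775, 0.9069, 0.9344, 0.9518, 0.9313,
                     0.8417, 0.8701, 0.9013, 0.9313, 0.9511),
                   nrow = 5, byrow = TRUE)
CI_lower <- matrix(c(0.8257, 0.7618, 0.5832, 0.3736, 0.2003,
                     0.7618, 0.8327, 0.7618, 0.5832, 0.3736,
                     0.5832, 0.7618, 0.8327, 0.7634, 0.5832,
                     0.3736, 0.5832, 0.7634, 0.8327, 0.7754,
                     0.2003, 0.3736, 0.5832, 0.7754, 0.8293),
                   nrow = 5, byrow = TRUE)

k <- 1                  # reference measurement moment t_k
pred <- R_hat[k, ]      # estimated R(t_k, t_l), l = 1, ..., 5
up   <- CI_upper[k, ]
low  <- CI_lower[k, ]

plot(seq_along(pred), pred, type = "l", ylim = c(-1, 1),
     xlab = "Measurement moment", ylab = "Reliability",
     main = expression(paste("Estimated R(", t[1], ", ", t[l], ")")))
lines(seq_along(up), up, lty = 2)    # upper bound of the CI
lines(seq_along(low), low, lty = 2)  # lower bound of the CI
```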

4.3.2.4 Selecting the most appropriate model

The function Model.Fit() can be used to compare Models 1–3 using a likelihood ratio test (a mixture of $\chi^2$ distributions is used to compute the correct p-value when necessary; for details, see Verbeke & Molenberghs, 2000):

# compare models 1 and 2
> Model.Fit(Model.1 = Model1, Model.2 = Model2)
# Generated output:
Compare Models:
=================
G2: 253.3003
p-value: 9.920398e-56

# compare models 2 and 3
> Model.Fit(Model.1 = Model2, Model.2 = Model3)
# Generated output:
Compare Models:
=================
G2: 14.76563
p-value: 0.0003717904

As shown, Model 2 fits the data significantly better than Model 1, and Model 3 fits the data significantly better than Model 2. Model 3 would thus be preferred if we relied solely on statistical arguments, but considerations of practical usefulness should also be taken into account. For example, in the paper that accompanies this Web Appendix, Model 3 also had the best fit to the data, but Model 2 was nonetheless preferred because it provides a more parsimonious result.
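The p-values printed by Model.Fit() can be checked by hand. Assume the first comparison is referred to a $\chi^2_2$ distribution, and the second to a 50:50 mixture of $\chi^2_1$ and $\chi^2_2$. The second is the mixture that arises when one extra variance component is tested on the boundary of the parameter space (Verbeke & Molenberghs, 2000). Under these assumptions, the sketch below reproduces both reported values. The distributional choices were inferred from the reported numbers; they are not taken from the CorrMixed documentation.

```r
G2_12 <- 253.3003   # Model 1 vs Model 2
G2_23 <- 14.76563   # Model 2 vs Model 3

# Upper-tail probability of a 50:50 mixture of chi-square(df) and chi-square(df + 1)
p_mixture <- function(G2, df) {
  0.5 * pchisq(G2, df, lower.tail = FALSE) +
    0.5 * pchisq(G2, df + 1, lower.tail = FALSE)
}

p_12 <- pchisq(G2_12, df = 2, lower.tail = FALSE)   # plain chi-square(2) reference
p_23 <- p_mixture(G2_23, df = 1)                    # chi-square(1):chi-square(2) mixture
signif(c(p_12 = p_12, p_23 = p_23), 4)
```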

Bibliography

Burzykowski, T., Molenberghs, G., and Buyse, M. (2005). The Evaluation of Surrogate Endpoints. New York: Springer-Verlag.
Buyse, M., and Molenberghs, G. (1998). The validation of surrogate endpoints in randomized experiments. Biometrics, 54, 1014–1029.
Frangakis, C. E., and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.
Gelman, A., and Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Guy, W. (1976). ECDEU Assessment Manual for Psychopharmacology, Revised. Rockville, MD: U.S. Department of Health, Education, and Welfare.
Kane, J., Honigfeld, G., Singer, J., and Meltzer, H. (1988). Clozapine for the treatment-resistant schizophrenic. A double-blind comparison with chlorpromazine. Archives of General Psychiatry, 45, 789–796.
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied linear statistical models (5th ed.). New York: McGraw Hill.
Leucht, S., Kane, J. M., Kissling, W., Hamann, J., Etschel, E., and Engel, R. (2005). Clinical implications of the Brief Psychiatric Rating Scale Scores. British Journal of Psychiatry, 187, 366–371.
Meng, X., and Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79, 103–111.
Overall, J., and Gorham, D. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799–812.
Pikkemaat, R., Lundin, S., Stenqvist, O., Hilgers, R. D., and Leonhardt, S. (2014). Recent advances in and limitations of cardiac output monitoring by means of electrical impedance tomography. Anesthesia & Analgesia, 119(1), 76–83.
Singh, M., and Kay, S. (1975). A comparative study of haloperidol and chlorpromazine in terms of clinical effects and therapeutic reversal with benztropine in schizophrenia. Theoretical implications for potency differences among neuroleptics. Psychopharmacologia, 43, 103–113.
Tibaldi, F. S., Cortiñas Abrahantes, J., Molenberghs, G., Renard, D., Burzykowski, T., Buyse, M., Parmar, M., Stijnen, T., and Wolfinger, R. (2003). Simplified hierarchical linear models for the evaluation of surrogate endpoints. Journal of Statistical Computation and Simulation, 73, 643–658.
Van der Elst, W. (2016). Statistical Evaluation Methodology for Surrogate Endpoints in Clinical Studies and Related Topics. Unpublished doctoral dissertation.
Verbeke, G., and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag.