Mendelian Randomisation and Instrumental Variables: What Can and What Can’t be Done Vanessa Didelez∗ (Department of Statistical Science, University College London)
& Nuala Sheehan † (Departments of Health Sciences and Genetics, University of Leicester)
December 2005
University of Leicester Department of Health Sciences
Technical Report 05-02 Correspondence to: ∗
Dr Vanessa Didelez Department of Statistical Science University College London Gower Street London WC1E 6BT e-mail:
[email protected] or † Dr Nuala Sheehan Department of Health Sciences University of Leicester 22–28 Princess Road West Leicester LE1 6TP e-mail:
[email protected]
Abstract In epidemiological research, the effect of a potentially modifiable phenotype or exposure on a particular outcome or disease is often of public health interest. Inferences on this effect can be distorted in the presence of confounders affecting both phenotype and disease. Randomised controlled trials are not always a viable option so reliable inferential methods for observational data are important. Issues of confounding require causal rather than associational arguments. Mendelian randomisation is a method for deriving unconfounded estimates of such causal relationships and basically exploits the fact that a gene known to affect the phenotype can often be reasonably assumed not to be itself associated with any confounding factors and thus has an indirect effect on the disease. It is well known in the economics and causal literature that these properties define an instrumental variable but they are minimal conditions in the sense that they only permit unique consistent identification of the causal effect of the phenotype on the disease status in the presence of additional and fairly strong assumptions. This does not appear to have been well understood. The additional assumptions typically relate to the distributions of the variables such as multivariate normality, for example, and to the nature of the dependencies between them, such as linearity, for instance. We discuss Mendelian randomisation as an instrumental variable approach and we explore the relevant assumptions in the context of standard epidemiological applications, both with regard to testing for and estimation of a causal effect. We clarify where these assumptions are and are not satisfied and we make some suggestions as to when they may, or may not, be relaxed. The ideas are illustrated using directed acyclic graphs with interventions.
1 Introduction
Inferring causation from observed associations or correlations is often a problem with epidemiological data as it is not always clear which of two correlated variables is the cause, which the effect, or whether both are common effects of a third unobserved variable. In the case of experimental data, causal inference is facilitated by using either randomisation or experimental control. By randomly allocating levels of “treatment” to “experimental units”, the randomised experiment of Fisher (1926) renders two of the three elemental causal explanations for an observed association between a treatment and a response—reverse causation and confounding—highly unlikely. In a controlled experiment, causality can be inferred by the experimental setting of all other variables to constant values, although Fisher (1970) argued that this is inferior to randomisation as it is logically impossible to know that “all other variables” have been accounted for. In many biological settings, it is not always possible to randomly assign values of a hypothesised “cause” to experimental units of interest, independently of any attributes of these units, for ethical, financial or practical reasons (Shipley 2000). In epidemiological applications, for example, randomised controlled trials (RCTs) to evaluate the effects of smoking, alcohol consumption, physical activity and complex nutritional regimes are unlikely to be carried out. In any case, many studies are concerned with the relationships between such exposures. However, even when randomisation is possible, there is considerable concern about the number of spurious causal associations reported from conventional observational epidemiological studies which have failed to be replicated in large-scale follow-up RCTs.
For example, early observational findings which suggested that the risk of smoking-related cancers might be reduced by increased dietary intake of the anti-oxidant vitamin beta-carotene (Peto, Doll, Buckley & Sporn 1981) were negated by subsequent RCT findings (Alpha-Tocopherol, Beta Carotene Cancer Prevention Study Group 1994). One of the main reasons for such spurious findings is confounding, whereby one factor that is not itself causally related to the disease of interest is associated with a range of other factors which do affect disease risk. Indeed, contradictory findings between observational studies and RCTs are often to do with associations between exposures and diseases related to socioeconomic position and behavioural factors, where individuals in a particular exposure category tend to differ in a wide range of other characteristics that are themselves related to the risk of disease (Davey Smith & Ebrahim 2003). Controlling for this kind of confounding is not always possible due to the limited number of confounders actually measured in many studies and the difficulty in obtaining accurate measurements of these factors. Mendelian randomisation has been proposed as a method to test for, or estimate, the causal effect of an intermediate phenotype on a disease in situations where confounding between the phenotype and the disease status is believed to be likely and is not fully understood (Davey Smith & Ebrahim 2003, Katan 2004, Thomas & Conti 2004). It has been suggested (Tobin, Minelli, Burton & Thompson 2004) that the method would be more appropriately named
“Mendelian deconfounding” to reflect its primary aim, but we will continue to use the term as it is commonly interpreted. Essentially, Mendelian randomisation exploits the idea that a genotype affecting the phenotype of interest, and thus indirectly affecting the disease status, is assigned randomly at meiosis and independently of any possible confounding factors. It is well known in the econometrics and causal literatures (Bowden & Turkington 1984, Angrist, Imbens & Rubin 1996, Pearl 2000, Greenland 2000) that these properties define an instrumental variable but they are minimal conditions in the sense that unique identification of the causal effect of the phenotype on the disease status is only possible in the presence of additional fairly strong assumptions. This has often been overlooked. These additional assumptions can take the form of linearity and additivity assumptions for all dependencies, as are typically assumed in econometrics applications. They could also be assumptions about the compliance behaviour of subjects under study, as are often made in the context of randomised trials with incomplete compliance (Angrist et al. 1996). Without such distributional assumptions it is possible to compute bounds on the causal effect (Robins 1989, Manski 1990, Balke & Pearl 1994, Lauritzen 2000, Dawid 2003) when all relevant variables are binary. This becomes more expensive computationally when some variables have more than two categories and is intractable for continuous variables. In this paper, we explore the relevance of the instrumental variable approach to Mendelian randomisation for causal inference in typical examples from genetic epidemiology. We will focus in particular on the strength of the required assumptions. The outline of the paper is as follows. We will begin with a description of Mendelian randomisation as it is commonly understood and present some examples where it has actually been used.
We will then outline what is currently known about instrumental variables and discuss this approach throughout the paper with regard to how it can be formally exploited in Mendelian randomisation applications.
2 Mendelian randomisation
A fundamental aim of observational epidemiology is to identify environmentally modifiable risk factors for disease which can then inform public health policies for improving health in the form of population-wide intervention measures. Unlike genetic epidemiology, the aim is not to identify groups of individuals at risk on the basis of their genotype. The emphasis of a Mendelian randomisation analysis is hence on the former, rather than the latter: the genotype is studied because it mimics the effect of some exposure of interest and is not generally susceptible to the usual confounding and reverse causation interpretations. An alternative use of the method is to provide information about alternative biological pathways to a disease (Davey Smith, Ebrahim, Lewis, Hansell, Palmer & Burton 2005). The term was first used in a different context, however, by Gray & Wheatley (1991) who exploited the notion that the random assortment of genes at
conception could provide an unconfounded study design for estimating treatment effects for leukaemia. Here, the question of interest was whether survival among acute myeloid leukaemia patients was better for those receiving bone marrow transplants as opposed to those on conventional chemotherapy treatment. This question is awkward to address due to the difficulty in identifying a conventional chemotherapy control group. Confounding factors include disease stage at presentation, general fitness and selection effects by treating physicians, and a randomised controlled trial would be unlikely to happen for ethical reasons. The proposal was to compare results in patients who have HLA-compatible siblings (and hence have donors) with those who do not—independently of whether the patient with the donor actually receives a transplant or not—in the form of an intent-to-treat analysis. Which of the two groups a patient belongs to is determined solely by the random assortment of genes to his siblings and should not be related to any confounding factors. Mendelian randomisation is now used in a different sense and derives from an idea put forth by Katan (1986). This is the interpretation that we will use in this paper and we will describe it in more detail below.
2.1 The Idea of Mendelian Randomisation
In the mid-1980s, there was much debate over the direction of an association between low serum cholesterol levels and cancer which was noted both in observational studies and in the early trials on lowering cholesterol. The hypothesis at the centre of the debate was that low serum cholesterol increases risk of cancer but it was also possible that either the presence of hidden tumours induces a lowering of cholesterol in future cancer patients or other factors such as diet and smoking affect both cholesterol levels and cancer risk. Moreover, there was a vested interest in the debate with producers of dairy products, meat and eggs on one side and margarine and oil producers on the other (Katan 2004). In a letter to The Lancet, Katan (1986) noted that people with the rare genetic disease abetalipoproteinaemia do not tend to get premature cancer, despite the fact that their serum cholesterol levels are practically zero, and hence that identification of a larger group of individuals genetically predisposed to having low cholesterol levels might help to resolve this issue. The apolipoprotein E (APOE) gene was known at this time to be associated with serum cholesterol levels. Synthesis of APOE is controlled by three alleles: E2, E3 and E4, with population frequencies of about 8%, 77% and 15%, respectively. Different levels of cholesterol are associated with different APOE genotypes, with the E2 allele, in particular, associated with lower levels than either E3 or E4. Thus, E2 carriers should have relatively low levels of serum cholesterol and, crucially, should be similar on average to E3 and E4 carriers in socioeconomic position, lifestyle and all other respects. The idea is simple. According to Mendel’s Second Law (the law of independent assortment), APOE genes are assigned randomly during meiosis independently of other factors. We know that this is not true in general, but for our purposes, all that matters
is that the assignment of genes can be held to be independent of all confounding factors. The APOE sequence is not disrupted by disease so lower cholesterol levels for those assigned the E2 allele are present from birth. Katan reasoned that a prospective study was hence unnecessary and a simple comparison of APOE genotypes in cancer patients and controls should suffice to resolve the causal dilemma. If low serum cholesterol level is really a risk factor for cancer, then patients should have more E2 alleles and controls should have more E3 and E4 alleles. Otherwise, APOE alleles should be equally distributed across both groups. In summary, the hypothesis that an association between low serum cholesterol levels and cancer is causal can be tested by studying the relationship between cancer and a genetic determinant of serum cholesterol. The former association is subject to confounding; the latter is not, since alleles are assigned at random, and causality can be inferred because we are more or less back in the world of Fisher’s randomised experiment (Davey Smith & Ebrahim 2003). This idea of using gene-phenotype and gene-disease associations to make unconfounded inferences about the causal association between a phenotype and disease has become known in the epidemiological literature as Mendelian randomisation. It is important to note that Katan’s original idea was centred around hypothesis testing to confirm or disprove causality. However, the method is also recommended to estimate the size of the effect of the phenotype on the disease together with a measure of its uncertainty (Tobin et al. 2004) and, indeed, to compare this estimate with that obtained from observational studies in order to assess the extent to which the observational studies have controlled for confounding. Katan’s idea was never actually implemented for the low serum cholesterol level and cancer risk association.
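Katan's proposed comparison amounts to a standard contingency-table test. The sketch below uses hypothetical allele counts, purely to illustrate the mechanics: a Pearson chi-square statistic on the cases-versus-controls table (2 degrees of freedom for three alleles) tests whether the allele distributions differ between the two groups.

```python
# A minimal sketch of Katan's proposed test: compare APOE allele counts in
# cancer cases versus controls.  All counts are hypothetical illustrations.

def chi_square(table):
    """Pearson chi-square statistic for a list-of-rows contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical allele counts:       E2   E3   E4
cases = [90, 770, 140]            # cancer patients
controls = [80, 775, 145]         # disease-free controls

stat = chi_square([cases, controls])
# Under no causal effect of cholesterol the statistic should be small
# (compare with a chi-square distribution on 2 degrees of freedom).
```

A large statistic would indicate an excess of E2 alleles among cases (or controls), which under the Mendelian randomisation argument could not be explained by confounding.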
In fact, the large statin trials which came later, and were primarily concerned with the effects of high cholesterol levels on CHD risk, disproved this association (Scandinavian Simvastatin Survival Study (4S) 1994, Heart Protection Study Collaborative Group 2002). The idea has been used several times since, however, and we will outline some examples below.
2.2 Applications of Mendelian Randomisation
Homocysteine, Folate, Coronary Heart Disease (CHD) (Davey Smith & Ebrahim 2003) Observational studies have consistently demonstrated an association between higher plasma levels of the amino acid, homocysteine, and coronary heart disease (CHD) (Ford, Smith, Stroup, Steinberg, Mueller & Thacker 2002). Results from RCTs, on the other hand, imply that a moderate increase in folate consumption can substantially reduce homocysteine levels (Homocysteine Lowering Trialists’ Collaboration 1998). Thus, if the observational association is truly causal, a simple intervention of increasing folate intake could reduce the risk of CHD. However, homocysteine–CHD associations could be confounded by smoking and socioeconomic background, for example, or could be due to reverse causality with existing atherosclerosis increasing levels of homocysteine. In the absence of a definitive folate trial, the T variant of the gene coding for the
enzyme methylene tetrahydrofolate reductase (MTHFR) can be used to study the observational association. This variant causes reduced enzyme activity so carriers tend to have higher levels of homocysteine. In particular, T homozygotes tend to have homocysteine levels 2.6 µmol/l higher on average than homozygotes for the more common C allele and so should be at a higher risk of CHD. The observed relative risk of CHD for TT over CC homozygotes was 1.16 whereas the value predicted from the phenotype-disease and genotype-phenotype associations was 1.13. Davey Smith & Ebrahim (2003) note that the two estimates are similar and that the latter is lower because of measurement error due to laboratory level or one-off reading errors. One should expect the genotype to be related to the “usual” levels rather than the “one-off” level obtained from a single reading. To check this, studies were carried out in which repeated measures of homocysteine levels were used and the corresponding relative risk from these was 1.17—much closer to the estimate from the MTHFR studies. In conclusion, there was evidence that the observed homocysteine-CHD link was indeed causal and hence support for the protective effect of folate. However, the observed association of MTHFR genotype and CHD was not strong enough to indicate that genetic testing would be a useful or efficient way to identify a group at high risk of developing CHD. C-Reactive Protein (CRP), Blood Pressure and Hypertension (Davey Smith, Lawlor, Harbord, Rumley, Lowe, Day & Ebrahim 2005) It has been postulated that C-Reactive Protein (an indicator of systemic inflammation) increases the risk of development of hypertension and several studies have reported an association between CRP and blood pressure (BP). Other factors known to increase CRP levels such as obesity, smoking, adverse socioeconomic circumstances and other diseases, could also influence BP levels and reverse causation is possible.
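The "predicted" relative risk in such analyses is obtained by combining the genotype–phenotype and phenotype–disease associations. The sketch below illustrates the arithmetic only; the per-unit log relative risk is a hypothetical value back-solved from the figures quoted above, not an estimate taken from the underlying studies.

```python
import math

# Sketch of the triangulation arithmetic behind a "predicted" relative risk:
# combine the genotype-phenotype difference with the phenotype-disease
# association, assuming a log-linear model and an unconfounded genotype.

def predicted_rr(phenotype_diff, log_rr_per_unit):
    """Genotype-disease relative risk implied by a phenotype difference
    (e.g. TT minus CC homocysteine) and the per-unit log relative risk."""
    return math.exp(log_rr_per_unit * phenotype_diff)

homocysteine_diff = 2.6                        # umol/l, TT minus CC homozygotes
log_rr_per_unit = math.log(1.13) / 2.6         # hypothetical, back-solved for illustration

print(round(predicted_rr(homocysteine_diff, log_rr_per_unit), 2))  # 1.13
```

Comparing such a predicted value with the directly observed genotype–disease relative risk (1.16 here) is the consistency check used in these applications.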
While it is possible, in principle, to adjust for confounding factors, they are in practice difficult to measure and are not always available. Besides, some plausible factors may be on the “causal pathway” between CRP and blood pressure outcomes so to adjust for them as confounders would be erroneous. Mendelian randomisation is hence proposed as an additional check on whether the CRP-BP association is truly causal using the 1059G/C polymorphism within the CRP gene which is strongly associated with CRP concentrations. The study involved more than 3,500 women aged between 60 and 79 years and there were three outcomes of interest: BP, taken as the average of two readings one minute apart (continuous); pulse pressure (PP) which is the difference between the two measurements of systolic (SBP) and diastolic (DBP) blood pressure; and hypertension, a binary variable assuming the value 1 either if SBP ≥ 160 mm Hg or DBP ≥ 100 mm Hg or the patient was on antihypertensive treatment not known to have been prescribed for any other condition. Adjusting the various regressions for a large number of confounding factors acting over the life course eliminated most of the observed associations between CRP levels and the three outcomes. There was no discernible association between any of these outcomes and the CRP gene. (The authors’ conclusions about the
predicted associations using a “Mendelian randomisation analytic approach” did not seem to reflect the reported regression parameters and odds ratios.) Since there is no biological reason to suppose that if something is unlikely to be causal in females, as suggested here, it could be causal in males, the overall conclusion is that the observed CRP-BP/hypertension associations can be explained by confounding and/or reverse causality. Development of pharmaceutical agents to lower CRP levels is hence not recommended as a useful strategy to lower blood pressure. MTHFR, Homocysteine and Stroke (Casas, Bautista, Smeeth, Sharma & Hingorani 2005) As in the first example, MTHFR TT homozygotes have higher homocysteine levels on average than CC homozygotes and should thus be at a higher risk of stroke if the observed homocysteine-stroke associations are causal. The same confounding and reverse causation explanations mentioned earlier apply to this association. Again, the authors check for causality by investigating the consistency between the expected odds ratio for stroke among TT homozygotes extrapolated from phenotype-disease and genotype-phenotype studies with the odds ratio observed in genotype-disease studies. As there was no statistical difference between these two quantities, it is concluded that increased homocysteine levels are indeed causal for stroke. However, as the effect would appear to be modest by comparison with classic cardiovascular risk factors, there is no need to screen for the MTHFR variant or to measure homocysteine levels in isolation. Since dietary folate could be a useful intervention, they recommend that more detailed and larger folate trials be conducted to establish the magnitude of the folate effect and to assess any possible adverse side effects.
Lipid-Related Genes and Myocardial Infarction (Keavney, Palmer, Parish, Clark, Youngman, Danesh, McKenzie, Delphine, Lathrop, Peto & Collins 2004) The emphasis here would appear to be slightly different from previous examples in that the authors seem to believe that plasma lipid levels are causally related to the risk of CHD. Their interest lies more in investigating the associations between lipid-related genes and CHD risk and in determining whether any differences in CHD risk associated with these genes are consistent with their effect on plasma lipid concentration levels. These genes all have a moderate effect and this is the first study that is large enough for such an investigation. The two main plasma apolipoproteins, B, carried on low-density lipoprotein particles, and A1, carried on high-density particles, have opposite effects on the incidence of CHD, the former being atherogenic and the latter cardioprotective. The ratio of the two levels, apoB/apoA1, is a “very strong predictor” of CHD risk and is used as the intermediate phenotype in this example. 3460 controls were used to get the genotype-phenotype associations for each of 6 lipid-related genes and these controls, plus 4685 individuals with confirmed myocardial infarction (MI), were used to derive the genotype-disease associations. The association between the apoB/apoA1 ratio and MI was also estimated from these data. Of the 3 common APOE genotypes, it was noted that the direction of effects
on MI risk was consistent with the effect direction on the apoB/apoA1 ratio but the magnitude of these effects was “smaller than expected”. Consistency of effect direction was not evident for all 6 genes considered. The authors argued that adjusting the genotype-disease associations for the apoB/apoA1 ratio should give a relative risk of 1 unless the gene influences the risk via some other mechanism. This adjustment actually reversed the trend for some genes. The authors do not believe that their phenotype-disease association is all due to confounding and suggest that these genes should be studied in more detail. The consequences for genetic risk prediction are that at given plasma lipid concentrations, genotypes that adversely affect lipoprotein levels were associated with lower, rather than higher, risk of MI and hence do not identify high-risk individuals requiring more aggressive management. This is in contrast with an earlier reported study (Collins 1999). Plasma Fibrinogen and CHD (Youngman, Keavney, Palmer, Parish, Clark, Danesh, Delphine, Lathrop, Peto & Collins 2000) There is much controversy over the role of plasma fibrinogen as a cardiovascular risk factor. Observational studies consistently find an association. However, it is well known that existing atherosclerosis increases fibrinogen levels (reverse causation) and that substantial confounding with higher fibrinogen levels is also evident in subgroups known to have increased CHD risk (e.g. smokers, non-drinkers, less fit and less advantaged individuals). RCTs of drugs that reduce blood clotting have demonstrated a reduction in CHD risk but these drugs do not work simply by reducing fibrinogen levels. Those drugs that explicitly do this—the fibrates—have not been associated with reduced cardiovascular disease and CHD risk. The question of interest, then, is whether fibrinogen is truly causal or whether it is just a marker of both disease status and other causal factors (Davey Smith & Ebrahim 2003).
Mendelian randomisation is used to derive an unconfounded test of this association via the β-fibrinogen gene. The observed genotype-disease odds ratio was significantly different from the odds ratio predicted from the genotype-phenotype and phenotype-disease associations under the hypothesis that there is no confounding. The authors conclude that plasma fibrinogen levels are not causal for CHD.
3 Causal Concepts and Terminology
The applications discussed in Section 2 featured frequent use of causal vocabulary to express something that is more than association between genotypes, intermediate phenotypes and disease. While this is common practice in the medical literature where underlying knowledge about the biology of the problem may indeed allow one to deduce the direction of an observed association and where “causal pathways” for disease are common concepts in epidemiology (see Stanley, Blair & Alberman (2000), for example), it is important for our purposes that we make a formal distinction between association and causation. Even the term “causal effect” is used loosely in practice and can mean different things in
different settings. We begin with a clarification of exactly what it is we want to identify and calculate by exploiting the instrumental variable technique through Mendelian randomisation. Let X be the cause under investigation and Y the response. In the type of application we have in mind here, X would be the intermediate phenotype, such as cholesterol or homocysteine level, and Y would be the disease status, such as cancer or coronary heart disease.
3.1 Interventions
The concept of interventions is crucial to the notion of causality that we will use. As in Lauritzen (2000), Dawid (2002), Dawid (2003) and Pearl (1995) we regard causal inference to be predicting the effect of interventions in a given system. For the applications we are considering, this would typically be the motivation for investigating a causal effect and would constitute a public health intervention such as adding folate to flour, vitamin E to milk or giving advice on diet etc. There are many other notions of causality including, for instance, the use of the term “causality” in a courtroom for retrospective assignment of guilt, but we will not consider any other interpretations here. We focus on the question of whether intervening on X has an effect on Y . By intervening on X, we mean that we can set X (or more generally its distribution) to any value we choose without affecting the distributions of the other variables in the system, other than through the resulting changes in X. This is clearly an idealistic situation and not always easily justified for the examples of public health interventions given above. For example, increasing dietary folate will not determine a specific homocysteine level which is why we need results from a controlled randomised trial on the effect of adding folate to the diet to inform the intervention. However, a causal analysis exploiting Mendelian randomisation can be used to generate hypotheses that can afterwards be investigated by controlled randomised trials where applicable. Also, if a phenotype is found to be causal in the above sense, then different ways of intervening on this phenotype can be explored.
3.2 Causal effect
The causal effect is a function of the distributions of Y under different interventions in X. It is well known that this is not necessarily equal to the usual conditional distribution P (Y |X = x) (see Lauritzen (2000), for example). The latter is just describing a statistical dependence. How the distribution of Y should be modified given information that X = x is not necessarily how it should be modified when X = x is forced. (See the Appendix for an example when the two are not the same.) We will follow Pearl (2000) and use the notation P (Y |do(X = x)) to make it clear that we mean conditioning on intervention in X. These different notations reflect the common phrase “correlation is not causation”. We define the (average) causal effect (ACE) as the difference in expectations
under different settings of X:

    ACE(x1, x2) = E(Y | do(X = x1)) − E(Y | do(X = x2)).    (1)
In particular, X is regarded as causal for Y if the average causal effect as defined in (1) is non-zero for some values x1, x2 with x1 ≠ x2. If X is binary, the unique average causal effect is given by E(Y | do(X = 1)) − E(Y | do(X = 0)). If Y is continuous, a popular assumption is that the causal dependence of Y on X is linear (possibly after suitable transformations), i.e. E(Y | do(X = x)) = α + βx. In this case, the average causal effect is β(x1 − x2) and can simply be summarised by β, which is now interpreted as the average effect of increasing X by one unit through some intervention. (See the Appendix for more technical details on the linear case.) In the more general cases of more than two categories and/or nonlinear dependency, the average causal effect is not necessarily summarised by a single parameter. Most of the literature on instrumental variables deals with the causal effect as given in (1). Other ways of measuring the changes in Y when we intervene in X could be based on the ratio E(Y | do(X = x1))/E(Y | do(X = x2)) or the odds ratio if Y is binary. However, in general much stronger assumptions have to be made if these measures are used (Greenland (2000), for example). Some of these issues are discussed further in Section 6.3.
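The gap between E(Y | X = x) and E(Y | do(X = x)) can be made concrete with a small simulation. The linear model and all coefficients below are invented for the sketch: with a confounder U affecting both X and Y, the least-squares slope of Y on X overestimates the causal coefficient β, while contrasting the simulated means under do(X = 1) and do(X = 0) recovers β.

```python
import random

random.seed(0)
beta = 2.0          # true causal effect of X on Y (assumed for this sketch)
n = 200_000

def draw(do_x=None):
    """One draw from a toy linear model with confounder U.
    If do_x is given, X is *set* to that value (an intervention),
    leaving the distribution of U untouched."""
    u = random.gauss(0, 1)
    x = u + random.gauss(0, 1) if do_x is None else do_x
    y = beta * x + u + random.gauss(0, 1)
    return x, y

# Observational association: least-squares slope of Y on X.
data = [draw() for _ in range(n)]
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
slope = (sum((x - mx) * (y - my) for x, y in data)
         / sum((x - mx) ** 2 for x, _ in data))   # approx 2.5, biased by U

# Interventional contrast: ACE(1, 0) = E(Y|do(X=1)) - E(Y|do(X=0)).
ace = (sum(draw(do_x=1.0)[1] for _ in range(n)) / n
       - sum(draw(do_x=0.0)[1] for _ in range(n)) / n)   # approx beta
```

Here the observational slope converges to 2.5 rather than β = 2 because U contributes to the covariance of X and Y, illustrating why parameters of P(Y | do(X = x)) cannot in general be read off the conditional distribution.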
3.3 Identifiability
A causal parameter is identifiable if we can show that it can be consistently estimated from the data under the conditions of how those data were obtained (e.g. randomised trial, case-control study, cohort study etc.). Mathematically, this amounts to being able to express the parameter in terms that do not involve the intervention (i.e. the “do” operation) by using ‘observational’ terms only. These, being observational, can then be estimated from data. As noted above, the distribution under intervention P (Y |do(X = x)) is not necessarily the same as the observational distribution P (Y |X = x), e.g. due to confounding. Hence we cannot directly estimate parameters of P (Y |do(X = x)) from observations that represent P (Y |X = x). In the rare case of known confounders, it can be shown mathematically that the intervention distribution can be re-expressed in observational terms and can hence be estimated from the observed data by adjusting for these confounders (Pearl 2000, Lauritzen 2000, Dawid 2002). The instrumental variable technique based on Mendelian randomisation allows a different way of identifying causal parameters when the confounders are unobservable.
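For intuition, the adjustment step in the known-confounder case can be written out exactly for binary variables. The probabilities below are invented for the sketch; the point is only that Σ_u E(Y | x, u) p(u) (the intervention distribution, weighting U by its marginal) differs from Σ_u E(Y | x, u) p(u | x) (the observational one).

```python
# Sketch of adjustment for a *known, observed* binary confounder U:
#   E(Y | do(X=x)) = sum_u E(Y | X=x, U=u) p(u),
# contrasted with the observational E(Y | X=x).  All numbers invented.

p_u = {0: 0.5, 1: 0.5}                  # p(U=u)
p_x_given_u = {0: 0.2, 1: 0.8}          # p(X=1 | U=u)
e_y = {(x, u): 0.1 + 0.3 * x + 0.4 * u  # E(Y | X=x, U=u)
       for x in (0, 1) for u in (0, 1)}

def e_y_do(x):
    """Interventional mean via the adjustment formula."""
    return sum(e_y[x, u] * p_u[u] for u in (0, 1))

def e_y_obs(x):
    """Observational mean, weighting U by p(u | X=x) via Bayes' rule."""
    px = {u: p_x_given_u[u] if x == 1 else 1 - p_x_given_u[u] for u in (0, 1)}
    denom = sum(px[u] * p_u[u] for u in (0, 1))
    return sum(e_y[x, u] * px[u] * p_u[u] / denom for u in (0, 1))

causal_effect = e_y_do(1) - e_y_do(0)    # 0.3: the true effect of X
naive_effect = e_y_obs(1) - e_y_obs(0)   # 0.54: inflated by confounding
```

With U unobserved, this adjustment is unavailable, which is exactly the situation the instrumental variable technique is designed for.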
4 Instrumental Variables
We now define the minimal properties that characterise an instrumental variable. These will be expressed in terms of conditional independence statements where A ⊥⊥ B | C means “A is conditionally independent of B given C”. These conditions have been given in many different forms. Our terminology and notation closely follow Greenland, Robins & Pearl (1999), Pearl (2000) and Dawid (2002). Other authors use counterfactual variables (Angrist et al. 1996, Robins 1997) or linear structural equations (Goldberger 1972, Pearl 2000). The conditions we give below are common to most instrumental variable methods but on their own they do not necessarily allow for identification of the ACE as we will discuss more fully in the following sections. For now, we will focus on these core assumptions and illustrate their meaning. In addition, we present a way of representing and checking the relevant conditional independencies graphically.
4.1 Core Conditions
Let X and Y be defined as above with the causal effect of X on Y being of primary interest. Furthermore, let G be the variable that we want to use as the instrument (the genotype in our case) and let U be an unobservable variable that will represent the confounding between X and Y. The ‘core conditions’ that G has to satisfy are the following:

1. G ⊥⊥ U, i.e. G must be (marginally) independent of the confounding between X and Y;

2. G ⊥⊥/ X, i.e. G must not be (marginally) independent of X; and

3. Y ⊥⊥ G | (X, U), i.e. conditionally on X and the confounder U, the instrument and the response are independent.

It is easy to misunderstand and misinterpret these assumptions. One of the reasons for this is that they cannot be formally tested and have to be justified on the basis of subject matter or other background knowledge. This is because U, by definition, is not observable: if it were, we could adjust for it and would not need any instrument to identify the effect of X on Y. Furthermore, the above assumptions do not imply any testable conditional independencies regarding the instrument G. In particular, they do not imply that G is independent of Y, either marginally or conditionally on X alone, although this often seems to be overlooked (see Thomas & Conti (2004), for example). We also need to condition on U in order to obtain property 3, which is not possible from the observable data. Of course, U can be empty, indicating that there is no unobserved confounding between X and Y. In that case, if X and Y can simultaneously be observed, there is no need for an instrumental variable in the first place.
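A small simulation makes the last point concrete: the core conditions do not make G independent of Y. The data-generating model and its coefficients are assumptions made purely for this sketch; G satisfies conditions 1–3 by construction (and we can check condition 1 only because the simulation lets us see U), yet G is clearly correlated with Y through X.

```python
import random

random.seed(1)
n = 100_000

# Toy data from the IV structure G -> X, U -> X, U -> Y, X -> Y.
# Model and coefficients are invented for this sketch.
g = [random.choice((0, 1)) for _ in range(n)]   # instrument (genotype)
u = [random.gauss(0, 1) for _ in range(n)]      # unobserved confounder
x = [gi + ui + random.gauss(0, 1) for gi, ui in zip(g, u)]
y = [2 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va = sum((ai - ma) ** 2 for ai in a) / n
    vb = sum((bi - mb) ** 2 for bi in b) / n
    return cov / (va * vb) ** 0.5

corr_gu = corr(g, u)   # near 0: condition 1 holds by construction
corr_gx = corr(g, x)   # clearly nonzero: condition 2
corr_gy = corr(g, y)   # also nonzero: conditions 1-3 do not imply G independent of Y
```

The nonzero G–Y association is not a violation of the conditions; it is precisely the induced association that instrumental variable methods exploit.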
4.2
Graphical Representation
Graphical models based on directed acyclic graphs (DAGs) can be used to represent conditional independencies among a set of joint variables in the following way. Every node of the graph represents a variable, and nodes can be linked by directed edges, which we represent by arrows (−→). If a −→ b we say that a is a parent of b and b is a child of a. If a −→ · · · −→ b then a is an ancestor of b and b is a descendant of a. A cycle occurs when a node a is its own ancestor or descendant, meaning that there exists an unbroken sequence of directed edges leading from a back to itself. DAGs have no such cycles. Consequently, all the conditional independencies represented in the graph can be derived from the Markov properties of the graph, by which every node is independent of all its non-descendants given its parents (Pearl 1988, Cowell, Dawid, Lauritzen & Spiegelhalter 1999).
[Figure 1 about here: a DAG with nodes G, X, Y and U; edges G −→ X, X −→ Y, U −→ X and U −→ Y.]
Figure 1: The directed acyclic graph (DAG) representing the required conditional independencies for G to be an instrument in our example.

Figure 1 shows the unique DAG involving G, X, Y and U that satisfies assumptions 1–3. From the graph, we have that G ⊥⊥ U because G and U are non-descendants of each other (and their parent sets are empty). Likewise, G ⊥⊥/ X because X is a descendant of G, and Y ⊥⊥ G | (X, U) because (X, U) are the parents of Y and G is a non-descendant of Y. The conditional independence restrictions imposed by the graph in Figure 1 are equivalent to a factorisation of the joint density in the following way:

p(y, x, u, g) = p(y|u, x) p(x|u, g) p(u) p(g).   (2)
From this it can be seen (by integrating out y and conditioning on x) that G ⊥⊥/ U | X, for instance. Similarly, by integrating out x and conditioning on y, we have that G ⊥⊥/ U | Y, or formally

p(g, u | y) = p(u) p(g) Σ_x p(y|u, x) p(x|u, g) / p(y) = p(u) p(g) p(y|u, g) / p(y) ≠ p(g|y) p(u|y),

where p(y) = Σ_{u,g} p(u) p(g) Σ_x p(y|u, x) p(x|u, g),
despite p(g, u) = p(g) p(u). This is the so-called selection effect whereby two variables such as G and U, which are marginally independent, may become dependent once we condition on a common descendant. The selection effect is particularly relevant in the situation of case–control data when everything is conditional on the outcome Y. In graphical terms, a moral edge is induced between two variables that have a common child when conditioning on this child or a descendant thereof (Cowell et al. (1999), for example). Here, as G and U have a common child X, and the variable we condition on, Y, is a descendant of X, such a moral edge has to be introduced to represent the case–control situation (Figure 2). Particular considerations will therefore be given to the suitability of
[Figure 2 about here: the DAG of Figure 1 with an additional (moral) edge between G and U, induced by conditioning on Y.]
Figure 2: Conditioning on the outcome Y possibly induces an association between G and U although they are marginally independent. Thus G ⊥⊥/ U | Y = y for case–control data in general.

Mendelian randomisation for case–control data. As will be shown in Section 5, testing for a causal effect is still possible but estimation of the causal effect (ACE) will be problematic.

Notice that directed acyclic graphs (DAGs) only represent conditional dependencies and independencies: they are not causal in themselves despite the arrows suggesting a 'direction' of dependence. We say that the DAG has a causal interpretation with respect to the relationship between X and Y, or, more specifically, the DAG is causal with respect to intervention in X, if we believe that an intervention in X does not change any of the other factors in the joint distribution (2). This means that

p(y, u, g | do(X = x0)) = p(y|u, x0) p(u) p(g),   (3)

assuming that p(y|u, x0) = p(y|u, do(X = x0)), which will typically depend on the choice of U. Graphically, the intervention corresponds to removing all the arrows leading into X from the graph in Figure 1. Hence, if X has no graph parents, p(y|do(X = x0)) = p(y|X = x0). Note that the validity of the assumption about the intervention that allows (3) depends on the variables included in the graph and the actual kind of intervention being contemplated. After all, why should the conditional distributions of the remaining variables remain unchanged in general if a potentially very different situation is created by intervening? For the graph as a whole to be considered as causal, we need to be able to intervene in every variable and make additional assumptions, in particular with regard to what variables are included as vertices in the graph. Loosely speaking, one can say that 'all common causes' have to be in the graph (Spirtes, Glymour & Scheines 2000). Also, in order to graphically check our core conditions 1–3, it will typically be helpful to include more variables in the graph than just Y, X, G and U to represent what is thought to be the data generating process based on subject matter knowledge. An example is given in the next subsection; see also Section 7.
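The selection effect is easy to see in a small simulation. The sketch below generates data from an all-binary version of the DAG in Figure 1 with arbitrary illustrative parameters (not taken from any real study): G and U are marginally uncorrelated, but become correlated once we condition on the outcome Y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# All-binary version of the DAG in Figure 1; parameter values are arbitrary.
g = rng.binomial(1, 0.5, size=n)              # instrument G
u = rng.binomial(1, 0.5, size=n)              # confounder U, independent of G
x = rng.binomial(1, 0.2 + 0.3 * g + 0.3 * u)  # phenotype X depends on G and U
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * u)  # outcome Y depends on X and U

# Marginally, G and U are (close to) uncorrelated, as in core condition 1
print(np.corrcoef(g, u)[0, 1])                # near 0

# Conditioning on Y = 1, as in case-control sampling, opens the path G -> X <- U
cases = y == 1
print(np.corrcoef(g[cases], u[cases])[0, 1])  # clearly negative
```

With these particular parameters the induced correlation is negative: among cases, a phenotype-raising genotype makes a high-risk value of U a less necessary explanation of the disease ('explaining away').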
4.3
G–X Association

[Figure 3 about here: a DAG with nodes G1, G2, U, X, Y; edges G2 −→ G1, G2 −→ X, U −→ X, U −→ Y and X −→ Y.]
Figure 3: An alternative formulation of the instrumental variable problem depicted in Figure 1.

It is important to note that all that is required with regard to the relationship between the instrument and X is an association, as stated in core condition 2. The instrument (genotype in our case) does not need to be causal for X, i.e. the arrow G → X in Figure 1 does not represent a causal direction. The association could instead be due to mediation or to another unobserved variable that affects both G and X. This is illustrated in the graph of Figure 3 where we consider two variables (e.g. two genotypes in our application) G1 and G2. The conditional independencies encoded by this graph are

(G1, G2) ⊥⊥ U,   X ⊥⊥ G1 | (G2, U),   Y ⊥⊥ (G1, G2) | (X, U),

where the first and second conditional independencies together imply X ⊥⊥ G1 | G2. Again, we emphasise that Y is neither marginally independent of (G1, G2) nor conditionally given X alone. The corresponding factorisation of the density is given by

p(y, x, u, g1, g2) = p(y|x, u) p(x|u, g2) p(g1|g2) p(g2) p(u).

Furthermore, if we believe that we can intervene in X without this intervention affecting anything else, we have that

p(y, u, g1, g2 | do(X = x0)) = p(y|x0, u) p(g1|g2) p(g2) p(u).   (4)
Now assume that only G1 is observed, not G2. In this case the joint distribution of the remaining variables is

p(y, x, u, g1) = p(y|x, u) p(u) Σ_{g2} p(x|u, g2) p(g1|g2) p(g2)
             = p(y|x, u) p(u) p(g1) Σ_{g2} p(x|u, g2) p(g2|g1)
             = p(y|x, u) p(x|u, g1) p(g1) p(u).

This is the same as factorisation (2) and equivalent to assuming our core conditions 1–3 with G1 as instrumental variable. It also yields the same intervention distribution as before,

p(y, u, g1 | do(X = x0)) = p(y|x0, u) p(g1) p(u),
which is alternatively obtained by integrating out g2 in (4). For our purposes, therefore, we do not have to find the "right" gene, as it does not matter how the association between X and the genotype comes about. Hence, without loss of generality, we will assume the situation depicted in Figure 1, described algebraically in equation (2) and, for the intervention case, in equation (3). However, it is plausible that the stronger the G–X association the better the instrument, and the association will be stronger the closer G is to the causal genotype for X.
5
Testing for Zero Causal Effect
Assuming that the core conditions 1–3 of Section 4.1 are satisfied, let us first consider the situation where we just want to know whether there is a causal link from a phenotype X to a disease Y without quantifying it. Obviously we cannot simply test for association between X and Y because any observed association might be due to confounding or reverse causality. In this section we investigate whether a test for dependence between X and Y can be replaced by a test of dependence between the instrument G and Y. Assuming the core conditions 1–3, it turns out that the latter is 'practically' equivalent to testing for a causal effect, where by 'practically' we mean that specific numerical configurations of the parameters would have to arise for the equivalence to be violated, and these seem very unlikely in practice. Due to the selection effect in case–control situations, mentioned earlier, we consider prospective and retrospective views separately.
5.1
Prospective view
From (3) we obtain the (marginal) distribution of Y under intervention as

p(y | do(X = x0)) = Σ_{u,g} p(y|u, x0) p(u) p(g) = Σ_u p(y|u, x0) p(u).   (5)
This can be recognised as the usual adjustment formula for the case where U is observable, i.e. we partition the population according to U, assess the effect of X on Y within each subgroup and then average over the subgroups (cf. Pearl (2000), p. 78). The ACE as defined by (1) in Section 3 is then

ACE(x1, x2) = Σ_u (E(Y | U = u, X = x1) − E(Y | U = u, X = x2)) p(u).
If E(Y | U = u, X = x) = E(Y | U = u), or more strongly if p(y|u, x) = p(y|u) (i.e. Y ⊥⊥ X | U), then the causal effect is obviously zero. Note that Y ⊥⊥ X | U has a graphical counterpart, shown in Figure 4, which is obtained by deleting the arrow from X to Y in Figure 1. However, the reverse is not necessarily true. If the ACE is zero, or formally if p(y | do(X = x)) does not depend on x, then we cannot conclude that p(y|u, x) does not depend on x because, as implied in (5), there could be an interaction between X and U in their effect on Y which together with the weights p(u) cancels out the overall effect.

[Figure 4 about here: the DAG of Figure 1 with the arrow from X to Y deleted.]

Figure 4: The directed acyclic graph representing the case where Y ⊥⊥ X | U and there is no causal effect between X and Y.

Only for models without interactions, more specifically where for any u ≠ u′ we have

E(Y | U = u, X = x1) − E(Y | U = u, X = x2) = E(Y | U = u′, X = x1) − E(Y | U = u′, X = x2),

does a zero causal effect lead to the conclusion that E(Y | U, X) does not depend on X. Such interactions, of course, can never be completely ruled out as U is unobservable. However, one has to presume that, even when allowing for possible interactions between X and U, it would be rare in practice to obtain a zero causal effect without at least E(Y | U = u, X = x) = E(Y | U = u) being true, because the cancellation discussed above requires a very specific numerical configuration.

It would be convenient if (under conditions 1–3) Y ⊥⊥ X | U if and only if Y ⊥⊥ G. This would imply that if we believe that the conditional distribution of Y under intervention in X is the same as when X is just observed (i.e. the DAG has a causal interpretation so that p(y|x, u) = p(y | do(X = x), u)), and if we disregard the particular numerical cancellations discussed above, we could test for a causal effect by checking for association between G and Y. Using (2) and integrating out x and u gives the marginal joint distribution of (Y, G):

p(y, g) = p(g) Σ_u p(u) Σ_x p(y|u, x) p(x|u, g).

We can see that if Y ⊥⊥ X | U, i.e. if p(y|u, x) = p(y|u), then

p(y, g) = p(g) Σ_u p(y|u) p(u) Σ_x p(x|u, g) = p(g) p(y).

So Y ⊥⊥ X | U ⇒ Y ⊥⊥ G. We note that the joint distribution also factorises if p(x|u, g) = p(x|u), i.e. if X and the instrumental variable are not associated — this is why we need core condition 2. The reverse argument Y ⊥⊥ G ⇒ Y ⊥⊥ X | U does not hold, however, even if we know that p(x|u, g) ≠ p(x|u). Again, this is due to the possibility of very specific numerical cancellations that might induce a factorisation of p(y, g) without the desired independence.
Summarising, we have

G ⊥⊥ Y ⇐ Y ⊥⊥ X | U ⇒ p(y | do(x)) = p(y),

whereas the reverse implications can be violated if specific numerical patterns of the parameters of the involved distributions occur. This problem is also known as the 'faithfulness' problem (Spirtes et al. 2000). As the latter would appear unlikely in practice, we will consider it reasonably safe to regard the above three statements as "equivalent for practical purposes." Hence we can say that if a test for dependence between G and Y results in rejecting independence, we can conclude that Y is not independent of X given U, which for practical purposes will imply that X is causal for Y. However, if we find independence between G and Y, then it will mean for practical purposes that Y ⊥⊥ X | U, which implies that X is not causal for Y; if interactions between X and U on Y can be firmly ruled out, one might be more confident in this assertion. However, we can never test for such interactions.
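A quick simulation (with arbitrary illustrative coefficients, not from any real study) shows both halves of this argument: when Y depends on U but not on X, the confounded X–Y correlation is clearly non-zero, while the G–Y correlation is indistinguishable from zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Zero causal effect of X on Y: Y depends only on U (coefficients are arbitrary)
u = rng.normal(size=n)                     # unobserved confounder
g = rng.binomial(1, 0.5, size=n)           # binary instrument
x = 0.8 * g + u + rng.normal(size=n)       # phenotype: depends on G and U
y = 1.2 * u + rng.normal(size=n)           # outcome: depends on U only

# The X-Y association is purely due to confounding by U
print(np.corrcoef(x, y)[0, 1])             # clearly non-zero

# Under the core conditions and no causal effect, G and Y are independent
print(np.corrcoef(g, y)[0, 1])             # close to zero
```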
5.2
Retrospective view
In the case–control study situation where we condition on the outcome Y, the data only allow us to identify properties of the distribution p(x, u, g | y), which from equation (2) is seen to be

p(x, u, g | y) = p(y|u, x) p(x|u, g) p(u) p(g) / Σ_{x,u} p(y|u, x) p(x|u) p(u).

This will typically not factorise in any way: there are no independencies among X, G, U conditional on Y due to the moral link induced in the graph by this conditioning, as discussed in Section 4.2. However, assuming, as in the prospective case, that for practical purposes 'no causal effect' is equivalent to Y ⊥⊥ X | U, then the above conditional distribution becomes

p(x, u, g | y) = p(y|u) p(x|u, g) p(u) p(g) / Σ_u p(y|u) p(u)

if there is no causal effect. By summing out x and u we then find that p(g|y) = p(g). Hence, for a case–control study, we can also expect that if there is no causal effect there should be no association between Y and G, or equivalently, if we find an association between Y and G then there is a causal effect.
5.3
Conclusions
Our overall conclusion in this section is that exploiting the idea of instrumental variables via Mendelian randomisation to test for the presence of a causal effect of an intermediate phenotype on a disease is justifiable, whatever the nature of the variables and the pattern of dependencies between them. We note that any
test for dependence between G and Y that seems appropriate can be used here, such as a t-test if Y is normal and G is binary, for example, or the odds ratio when Y is binary in a case–control situation. It should be pointed out that, in principle, it can happen that E(Y | do(X = x)) = E(Y), implying a zero ACE, without P(Y | do(X = x)) = P(Y); our reasoning here applies to the latter. Testing for dependence is in this sense stronger than testing for a zero ACE. Note that the initial idea of Katan (1986) to exploit Mendelian randomisation refers to testing for a causal effect, as discussed in the present section, and not to estimating it. However, if the conclusion from the test is that there is a dependence and thus a causal link between X and Y, then we would typically want to quantify this and ideally calculate the causal effect (1). This, however, is trickier, as we will explain below.
6
Identification of Causal Effect
Identifiability of the average causal effect, by which we mean the possibility of obtaining a unique consistent estimate from the observed data, requires more assumptions than 1–3. We first consider the additional assumption that all conditional expectations of the variables in Figure 1 are linear, without interactions, in the graph parents. However, we note that linearity is usually not an appropriate assumption for binary or categorical variables. For such variables, or when linearity cannot be assumed for other reasons, it is still possible to derive upper and lower bounds for the average causal effect (1). This will be discussed in more detail in Section 6.2 below. Section 6.3 addresses the question of why the non-linear case cannot be treated in a similar manner to the additive case.
6.1
The Easy Case: Linearity Without Interactions
In the following we assume general linear models for the dependencies among the variables Y, X, G and U. In this case, conditions 1–3 can be reformulated replacing "independent" by "uncorrelated", as can often be found in the literature on structural equation models. Furthermore, we assume that all dependencies only affect the mean. In other words, we assume that

E(Y | X = x, U = u) = α + β1 x + β2 u   and   E(X | G = g, U = u) = γ + δ1 g + δ2 u,   (6)

both with constant (possibly different) conditional variances. In addition we assume that the first expectation is the same if we intervene in X, i.e. the link is causal in the sense that

E(Y | do(X = x0), U = u) = α + β1 x0 + β2 u.

This is slightly different from (and not as strong as) the usual structural equations assumption where all conditional expectations are assumed to be invariant to intervention (Goldberger 1972, Angrist et al. 1996, Pearl 2000). The latter is not actually necessary for the present purpose, as shown in the Appendix.
As discussed in Section 3.2, in this framework β1 is the causal parameter that we are interested in, as ACE(x1, x2) = β1(x1 − x2). It cannot be estimated from a linear least squares regression of Y on X and U as U is unobserved, nor is it estimable from a linear least squares regression of Y on X alone as (X, U) are correlated. A linear least squares regression of X on G, however, will yield an estimate rX|G of δ1, the coefficient of G, because G and U are uncorrelated. Since the regression coefficient of G in a linear least squares regression of Y on G alone can be shown to be β1 δ1, the required causal parameter β1 can be consistently estimated from the ratio

β̂1 = rY|G / rX|G,

where rY|G is the estimated coefficient of G in the latter regression. Obviously we need rX|G to be non-zero but this is provided by condition 2: G ⊥⊥/ X (see Appendix for technical details). Notice that these results rely on using linear least squares and models without interactions. In particular, there can be no interactions with the unobserved confounder U. Given that U is unknown, this is obviously an untestable assumption. Straightforward generalisation to the case where G is binary is possible since the conditional expectation E(X | G = g, U = u) can be linear for binary G if a dummy indicator variable is used. In this case, the parameter δ1 would be the mean difference in X for the two different values of G. In fact, G can have more than two values as long as the dependence of X on G is linear. The implication of this for Mendelian randomisation applications, when G could assume three values, one for each genotype in the simplest diallelic case, is that the expected change in X between genotypes 0 and 1 must be the same as the expected change in X between genotypes 1 and 2. In other words, the genetic model must be additive. There is no sensible genetic model that is consistent with this requirement for a polymorphic locus with more than three genotypes.
The case where X is binary is well known in the econometrics literature and is called the dummy endogenous variable model (Maddala 1983, Bowden & Turkington 1984). This is modelled using a threshold approach which assumes an underlying unobservable continuous variable Xc with linear conditional expectation E(Xc |G, U ) as given in (6) above. The observable quantity is X where X = 1 if Xc > 0 and X = 0 otherwise. It can be shown that β1 can still be recovered as before (see Appendix for discussion).
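The ratio estimator can be checked by simulation. The sketch below uses arbitrary illustrative parameter values with β1 = 0.5: the least squares slope of Y on X is badly biased by the confounder, while the ratio of the two instrumental regressions recovers β1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

beta1 = 0.5                                # target causal effect (chosen for illustration)
u = rng.normal(size=n)                     # unobserved confounder U
g = rng.binomial(2, 0.3, size=n)           # instrument G, e.g. an allele count 0/1/2
x = 1.0 + 0.8 * g + 1.5 * u + rng.normal(size=n)    # E(X|G,U) linear, no interactions
y = 2.0 + beta1 * x + 2.0 * u + rng.normal(size=n)  # E(Y|X,U) linear, no interactions

# Naive least squares slope of Y on X: biased because (X, U) are correlated
naive = np.polyfit(x, y, 1)[0]

# IV ratio estimate: coefficient of G in Y~G divided by coefficient of G in X~G
r_yg = np.polyfit(g, y, 1)[0]
r_xg = np.polyfit(g, x, 1)[0]
beta1_hat = r_yg / r_xg

print(naive)      # substantially larger than 0.5
print(beta1_hat)  # close to 0.5
```

Note that the instrument here takes three values and enters E(X|G, U) additively, matching the requirement discussed above for a diallelic locus.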
6.2
Bounds on the Causal Effect
When only the core conditions 1–3 are assumed without additional assumptions such as linearity and no interactions, the current literature, to our knowledge, only offers the option of calculating bounds on the ACE. This method, however,
requires all observable variables to be binary or categorical, which could be achieved by suitable categorisation of continuous variables. The more categories there are, the more difficult the computations and the less informative the bounds. For this reason, the bounds are usually given for the case when G, X and Y are all binary. In our applications, Y is usually binary and G can often be considered as binary with one genotype having an effect on the trait and the others pooled into one that has no effect. To calculate bounds, we would dichotomise the usually continuous phenotype X as being above or below a given threshold and compare the bounds for different discretisations of X. These bounds have been derived in several places in the literature (Balke & Pearl 1994, Lauritzen 2000, Pearl 2000, Dawid 2003), so we will simply present them below.

Bounds on the intervention probabilities

Define qij = P(Y = i | do(X = j)) for i, j = 0, 1, so q10 is the probability of having the disease given we intervened to "set" the phenotype level below the given threshold. Similarly, let pij.k = P(Y = i, X = j | G = k) represent the usual conditional probability, which can be estimated directly from the data using the corresponding relative frequencies. Then, for example, it can be shown that

max{ p10.1,
     p10.0,
     p10.0 + p11.0 − p00.1 − p11.1,
     p01.0 + p10.0 − p00.1 − p01.1 }
  ≤  q10  ≤
min{ 1 − p00.1,
     1 − p00.0,
     p01.0 + p10.0 + p10.1 + p11.1,
     p10.0 + p11.0 + p01.1 + p10.1 },

so the lower bound on q10 is the maximum of the estimated quantities in the first set and the upper bound is the minimum of the estimated quantities in the second.

Bounds on the average causal effect (ACE)

As noted in Section 3, there is only one average causal effect for X on Y in the binary case. In the above notation, the bounds for this ACE(1, 0) = P(Y = 1 | do(X = 1)) − P(Y = 1 | do(X = 0)) are

max{ p11.1 + p00.0 − 1,
     p11.0 + p00.1 − 1,
     p11.0 − p11.1 − p10.1 − p01.0 − p10.0,
     p11.1 − p11.0 − p10.0 − p01.1 − p10.1,
     −p01.1 − p10.1,
     −p01.0 − p10.0,
     p00.1 − p01.1 − p10.1 − p01.0 − p00.0,
     p00.0 − p01.0 − p10.0 − p01.1 − p00.1 }
  ≤  ACE(1, 0)  ≤
min{ 1 − p01.1 − p10.0,
     1 − p01.0 − p10.1,
     −p01.0 + p01.1 + p00.1 + p11.0 + p00.0,
     −p01.1 + p11.1 + p00.1 + p01.0 + p00.0,
     p11.1 + p00.1,
     p11.0 + p00.0,
     −p10.1 + p11.1 + p00.1 + p11.0 + p10.0,
     −p10.0 + p11.0 + p00.0 + p11.1 + p10.1 }.
Given that the core conditions 1–3 are satisfied, these bounds are sharp (Pearl 2000) in the sense that they cannot be improved upon without making additional assumptions. How informative they are will depend on whether they have the same sign (e.g. both positive, implying that there is a positive causal effect of X on Y) and how narrow they are. Under certain conditions, these bounds may collapse to a point estimate and thus permit consistent estimation of the average causal effect of X on Y. These conditions are usually phrased in terms of assumptions about compliance behaviour in a clinical trial when the instrument is assignment to treatment (Balke & Pearl 1994). Note that the above bounds cannot be used in the case–control data situation. This is because they rely on estimation of the quantities P(Y = i, X = j | G = k), which cannot be done when the data have been selected on Y. We are not aware of any method in the literature that generalises the idea of such bounds to the case–control situation.
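The bounds are simple linear expressions in the estimated pij.k, so they are straightforward to compute. The sketch below hard-codes the eight lower and eight upper expressions for ACE(1, 0) and evaluates them on made-up illustrative frequencies (not from any real study).

```python
def ace_bounds(p):
    """Bounds on ACE(1, 0) from p[(i, j, k)] = P(Y=i, X=j | G=k), all binary."""
    def q(i, j, k):
        return p[(i, j, k)]

    lower = max(
        q(1, 1, 1) + q(0, 0, 0) - 1,
        q(1, 1, 0) + q(0, 0, 1) - 1,
        q(1, 1, 0) - q(1, 1, 1) - q(1, 0, 1) - q(0, 1, 0) - q(1, 0, 0),
        q(1, 1, 1) - q(1, 1, 0) - q(1, 0, 0) - q(0, 1, 1) - q(1, 0, 1),
        -q(0, 1, 1) - q(1, 0, 1),
        -q(0, 1, 0) - q(1, 0, 0),
        q(0, 0, 1) - q(0, 1, 1) - q(1, 0, 1) - q(0, 1, 0) - q(0, 0, 0),
        q(0, 0, 0) - q(0, 1, 0) - q(1, 0, 0) - q(0, 1, 1) - q(0, 0, 1),
    )
    upper = min(
        1 - q(0, 1, 1) - q(1, 0, 0),
        1 - q(0, 1, 0) - q(1, 0, 1),
        -q(0, 1, 0) + q(0, 1, 1) + q(0, 0, 1) + q(1, 1, 0) + q(0, 0, 0),
        -q(0, 1, 1) + q(1, 1, 1) + q(0, 0, 1) + q(0, 1, 0) + q(0, 0, 0),
        q(1, 1, 1) + q(0, 0, 1),
        q(1, 1, 0) + q(0, 0, 0),
        -q(1, 0, 1) + q(1, 1, 1) + q(0, 0, 1) + q(1, 1, 0) + q(1, 0, 0),
        -q(1, 0, 0) + q(1, 1, 0) + q(0, 0, 0) + q(1, 1, 1) + q(1, 0, 1),
    )
    return lower, upper

# Made-up relative frequencies: for each value k of G, the four cells sum to 1
p = {
    (0, 0, 0): 0.45, (0, 1, 0): 0.15, (1, 0, 0): 0.25, (1, 1, 0): 0.15,
    (0, 0, 1): 0.20, (0, 1, 1): 0.40, (1, 0, 1): 0.10, (1, 1, 1): 0.30,
}
low, high = ace_bounds(p)
print(low, high)  # here the bounds straddle zero: the sign of the ACE is not identified
```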
6.3
The Difficult Case
From Section 6.1 and the corresponding derivations in the Appendix we can deduce that, in order to use the instrumental variable technique, we need to specify what our causal parameter is and how it relates to the regression parameters of X on G, obtained by marginalising over U (i.e. of E(X | G = g), or more generally P(X | G = g)), and the regression parameters of Y on G, obtained by marginalising over X and U (i.e. of E(Y | G = g), or more generally P(Y | G = g)). We note that specification of P(Y | G = g) requires specification of how X depends on U, because the manner in which Y depends probabilistically on G is not mediated by X alone but also by U. Recall that our core conditions 1–3 in Section 4.1 do not imply that Y ⊥⊥ G | X.

Causal parameter for non-linear models

Assume we have a binary response variable Y and that we want to use a logistic regression, for example, to model the dependence of Y on the other variables. In addition, we assume that this model is invariant with respect to intervention on X, by which we mean

E(Y | X = x, U = u) = E(Y | do(X = x), U = u) = exp{α + β1 x + β2 u} / (1 + exp{α + β1 x + β2 u}).

To determine the ACE we need to work out E(Y | do(X = x)) = E_U E(Y | do(X = x), U), which (by similar reasoning to that in the Appendix) is given by

E(Y | do(X = x)) = ∫ [exp{α + β1 x + β2 u} / (1 + exp{α + β1 x + β2 u})] p(u) du,
where p(u) is the unknown density of U. Unlike the linear case without interactions, the above expectation can usually not be written as a logistic model where only the constant needs to be changed, i.e.

E(Y | do(X = x)) ≠ exp{α* + β1 x} / (1 + exp{α* + β1 x}).   (7)

The exact shape of E(Y | do(X = x)) will depend on the distribution of U and it is not trivial to work out which distributions would retain the logistic shape.
Greenland et al. (1999) discuss this problem, referring to it as non-collapsibility of the logistic regression model. One implication of the above is that if we use ACE(x1, x2) as a measure of causal effect, it will actually depend on the values of x1 and x2. In other words, it cannot be summarised by a single parameter, due to the non-linearity of E(Y | do(X = x)) in x. Instead we could consider a different causal parameter as a function of E(Y | do(X = x)) for different values of x. In the context of logistic regression, it would be natural to define the causal odds ratio (COR) as

COR(x1, x2) = [P(Y = 1 | do(X = x1)) / P(Y = 0 | do(X = x1))] · [P(Y = 0 | do(X = x2)) / P(Y = 1 | do(X = x2))],

where P(Y = 1 | do(X = x1)) = E(Y | do(X = x1)). However, from the inequality in (7), it will not usually be the case that β1 is the causal parameter of interest, in the sense that COR(x1, x2) ≠ exp{β1(x1 − x2)}. Alternatively, consider the causal relative risk (CRR), which we define as

CRR(x1, x2) = P(Y = 1 | do(X = x1)) / P(Y = 1 | do(X = x2)).

Corresponding to this parameter we consider a (causal) log-linear model

E(Y | X = x, U = u) = E(Y | do(X = x), U = u) = exp{α + β1 x + β2 u},

which does imply that

E(Y | do(X = x)) = exp{α + β1 x} ∫ exp{β2 u} p(u) du = exp{α* + β1 x}

(where typically α* ≠ α) after marginalising over U, independently of the distribution of U. Thus, the log-linear shape is retained, so here we obtain CRR(x1, x2) = exp{β1(x1 − x2)}, which does allow us to interpret β1 as the causal parameter describing the causal relative risk.
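The contrast between the two link functions is easy to verify numerically. The sketch below (with an arbitrary choice of α, β1, β2 and U ~ N(0, 1), all for illustration only) marginalises both models over U by Monte Carlo: the log-odds difference under the logistic model is attenuated away from β1, whereas the log difference under the log-linear model equals β1 exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=1_000_000)   # draws of U; U ~ N(0, 1) is an arbitrary choice
a, b1, b2 = -2.0, 0.5, 1.0       # illustrative coefficients

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# E(Y | do(X = x)) = E_U[ P(Y = 1 | x, U) ] by Monte Carlo, for each link
def p_do_logistic(x):
    return expit(a + b1 * x + b2 * u).mean()

def p_do_loglinear(x):
    return np.exp(a + b1 * x + b2 * u).mean()

logit = lambda p: np.log(p / (1 - p))

# Logistic link: the marginal log-odds difference per unit x is NOT b1
slope_logistic = logit(p_do_logistic(1.0)) - logit(p_do_logistic(0.0))

# Log-linear link: the marginal log difference per unit x is exactly b1
slope_loglinear = np.log(p_do_loglinear(1.0)) - np.log(p_do_loglinear(0.0))

print(slope_logistic)   # attenuated towards 0, smaller than b1 = 0.5
print(slope_loglinear)  # equals b1 = 0.5
```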
Relationship between regression parameters

The next step is to investigate how the regression parameters estimated by a regression of Y on G and a regression of X on G are related to the causal parameter of interest. From the Appendix we see that this relies on working out E(Y | G = g) = E_U E_{X|U,G=g} E(Y | X, U). The derivation in the Appendix crucially depends on the different expectations being additive in the conditioning variables, so that taking expectations is straightforward. But if we again assume a logistic regression for a binary response variable Y, then E(Y | X, U) = P(Y = 1 | X, U) is not additive in X and U, and so the expectation E_{X|U,G=g} E(Y | X, U) is not straightforward to compute, as it involves an integral with no analytic solution. Thompson, Tobin & Minelli (2003) suggest an approximation. However, this approximation for one thing ignores U and assumes that Y ⊥⊥ G | X which, if true, would mean that an instrumental variable approach is unnecessary, as there would be no confounding and the effect of X on Y could be estimated directly from the data. Hence, this does not provide a way of identifying the causal odds ratio in the situation where there is confounding. The approximation could, however, be valid if there is no confounding and could serve instead as a rough check on whether there is any confounding. Although this approach does not seem to be detailed clearly anywhere, it often appears in the epidemiological literature (Davey Smith & Ebrahim 2003, Casas et al. 2005, Youngman et al. 2000) and features in some of the examples outlined in Section 2.2. It is reasonable provided the G–X link is strong and the approximation of the relevant integral (Thompson et al. 2003) is good. The latter essentially replaces the expectation of a ratio by the ratio of the relevant expectations and, although the authors give some guidelines as to when this is acceptable, it is well known to be unreliable. The approximate test for confounding assumes binary Y and G (or an additive model for G) and continuous X, and links the odds ratio OR_{Y|G} for Y given G, on the one hand, to the odds ratio OR_{Y|X} for Y given X and the mean difference in X for different G, μ1 − μ0, on the other hand, as follows:

OR_{Y|G} ≈ (OR_{Y|X})^{μ1 − μ0}.   (8)

If there is no confounding between X and Y, i.e. if U is empty, then OR_{Y|X} = COR(x, x + 1). The idea is that the estimate on the LHS is compared to its predicted value from the RHS of (8), and a (significant) difference would indicate that there is confounding. If the conclusion is that there is no confounding, then the estimate of OR_{Y|X} can be interpreted as the causal parameter COR(x, x + 1). Recall from Section 5 that the direct estimate of OR_{Y|G} can also be used to test whether any effect of X on Y is causal. In practice, applying both approaches can lead to contradictory results. An example of this arises in a discussion in Davey Smith & Ebrahim (2003) (page 14). A small study (Van der Bom et al. 1998) on the causal effect of fibrinogen levels on coronary heart disease (CHD) reported an odds ratio of 1.45 for the risk of CHD corresponding to a rise in fibrinogen level of 1 g/l. The odds ratio corresponding to the combined GA and AA genotypes versus the GG genotype of the polymorphism of interest was reported as 1.08. Since this value was not significantly different from 1, the authors concluded that their study did not support a causal explanation for the fibrinogen–CHD association. Davey Smith & Ebrahim (2003) point out that, as the genotype–fibrinogen association in this study was reported as an increase of 0.17 g/l for GA and AA versus GG genotypes, the predicted odds ratio was 1.07 (i.e. 1.45^0.17 = 1.0652) and was not significantly different from the observed odds ratio: "Thus the authors' claim that their study suggests that fibrinogen is not causally related to the risk of CHD is not supported by evidence from their own study". Since the approximation in (8) can never hold exactly as long as the relationships between the observed variables are not deterministic, we would recommend that more emphasis be placed on the test of association between G and Y using OR_{Y|G} in order to detect whether the X–Y association is causal in the first place. Careful attention should then be given to the conditions under which (8) is "good" before checking for confounding.

If we model P(Y = 1 | X, U) log-linearly, corresponding to the causal relative risk CRR, then E(Y | X, U) = P(Y = 1 | X, U) is multiplicative in X and U and is again not additive. Hence, just as for the logistic relationship, solving E(Y | G = g) = E_U E_{X|U,G=g} E(Y | X, U) requires an approximation. Thomas & Conti (2004) suggest an approximation for the expectation in this case but they also ignore U and assume that Y ⊥⊥ G | X. The same general problem arises with the probit link that these authors suggest. The probit model can, however, be used when it is assumed that all binary variables are generated by unobserved underlying continuous variables that have a joint multivariate normal distribution (Stanghellini & Wermuth 2005).
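The fibrinogen numbers quoted above can be reproduced directly from (8), using only the figures given in the text:

```python
# Predicted OR_{Y|G} from (8): OR_{Y|X} = 1.45 per 1 g/l, genotype difference 0.17 g/l
predicted = 1.45 ** 0.17
print(round(predicted, 4))  # 1.0652, i.e. about 1.07 as quoted
```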
6.4
Conclusions
In conclusion, this section shows that consistent unique estimation of the causal effect requires additional parametric assumptions that involve the unobserved confounder. They are therefore untestable and always prone to misspecification. If we are not willing to make such assumptions, we can compute bounds for the causal effect. However, these bounds will not be valid in a case–control data situation. Furthermore, Section 6.3 gives a taste of how much more complicated the case of a binary outcome is, even with the standard parametric assumptions: not only are the conditional expectations less straightforward, but the choice of causal parameter is not obvious. We would therefore recommend (a) always computing the bounds in order to assess how informative the data are without making further assumptions; and (b) carrying out simulations and numerical studies to investigate "how wrong" the parametric assumptions and approximations suggested so far can be.
7 Complications for Mendelian Randomisation
Everything we have discussed so far, including testing for the presence of a causal effect, assumes that the simple graph of Figure 1 (or, equivalently, of Figure 3) in Section 4, provides a reasonable representation of the underlying biology. However, the common complex diseases that are of most interest from a public health perspective are generally multifactorial in nature and the definition of disease outcome itself is often ambiguous. In reality, our biological model will be
more complicated than that of Figure 1. However, even when we are prepared to accept the simple model, poor inferences about the phenotype–disease association may arise due to possible confounding of the genotype–phenotype and genotype–disease associations (Davey Smith & Ebrahim 2003, Thomas & Conti 2004, Davey Smith, Lawlor, Harbord, Rumley, Lowe, Day & Ebrahim 2005). Moreover, these problems can sometimes violate one or more of the core assumptions, so that Figure 1 no longer applies and inferences may actually be wrong. It is important to understand what the added complexity implies with regard to meeting the core conditions required for a genotype to be an instrument. Before we discuss the sorts of complication that can arise, we review the core assumptions 1–3 and interpret their specific implications for the simplest applications of Mendelian randomisation.

1. G ⊥⊥ U states that the chosen genotype should be independent of the unobserved factors U that confound the phenotype X and the disease Y. The plausibility of this assumption can reasonably be assessed from background knowledge for most of our applications. If we think of U as some behavioural pattern or lifestyle, this independence condition can be justified as long as we are reasonably certain that any possible genetic factors influencing the behavioural pattern are unrelated to this particular gene.

2. G ⊥/⊥ X states that the genotype and intermediate phenotype should be dependent. Obviously, the stronger the dependence, the better G is as an instrument. This means that we have to select a suitable genotype which is known to be strongly associated with the phenotype in question. (See Section 4.3 and Pearl (2000), p. 248.)

3. Y ⊥⊥ G | (X, U) states that there is no association between the genotype and the disease status given the intermediate phenotype and the lifestyle. Thus, for example, there should be no route from G to Y other than through X or U (Figure 5).
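Conditions 1 and 2 have simple empirical signatures under a linear version of the simple model, which the following sketch illustrates on simulated data (all coefficients invented for illustration). Note that in practice U is unobserved, so condition 1 can only be argued from background knowledge and condition 3 is not directly testable at all; the simulation can check them only because it generates U itself:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulate from the simple instrument graph G -> X -> Y with U -> X and U -> Y
# (illustrative coefficients, not taken from the paper).
g = rng.binomial(2, 0.3, n)          # genotype coded 0/1/2
u = rng.normal(size=n)               # unobserved confounder
x = 0.5 * g + 1.0 * u + rng.normal(size=n)
y = 0.8 * x + 1.0 * u + rng.normal(size=n)

corr = np.corrcoef(np.vstack([g, u, x, y]))
r_gu, r_gx = corr[0, 1], corr[0, 2]

# Condition 1 (G independent of U): sample correlation near zero.
# Condition 2 (G dependent on X): correlation clearly non-zero.
print(f"corr(G,U) = {r_gu:.3f}, corr(G,X) = {r_gx:.3f}")
```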
Figure 5: Condition 3 is satisfied but condition 1 is violated, implying that G is not a suitable instrument in this case.

The full implications of the core conditions will become clearer in our discussions of reasons for possible violations.
Figure 6: Linkage disequilibrium where the chosen instrument G1 is associated with another genotype G2 which directly influences the outcome Y (a) or influences Y indirectly via the confounder U, as in (b). Note that the “direction” of the G1–G2 link is not important.

Linkage Disequilibrium. Linkage disequilibrium, sometimes called gametic phase disequilibrium, refers to the association between alleles at different loci across the population. It can occur because the loci are linked, in that they are physically close on the chromosome and thus tend to be inherited together. However, the association may be purely statistical, due to other reasons such as natural selection, assortative mating and migration, for example (Lynch & Walsh 1998). When our chosen gene G1 is in linkage disequilibrium with another gene G2 which has a direct influence on the disease Y, or an indirect link with Y via a route other than through X, condition 3, Y ⊥⊥ G1 | (X, U), might be violated, as shown in Figure 6(a), or else condition 1, G1 ⊥⊥ U, might be violated, as shown in Figure 6(b). In any case, even if neither of the above situations applies, the genotype–phenotype association may be attenuated if there is linkage disequilibrium.
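The consequence of the Figure 6(a) scenario for estimation can be illustrated by simulation. In this sketch (all coefficients invented), G2 is correlated with the chosen instrument G1 and affects Y directly, so the usual ratio estimate of Section 6.1 no longer recovers the causal coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# G1 and G2 in linkage disequilibrium (correlated); G2 affects Y directly,
# violating condition 3 for the instrument G1. Coefficients are illustrative.
g1 = rng.binomial(1, 0.5, n)
g2 = np.where(rng.random(n) < 0.8, g1, rng.binomial(1, 0.5, n))  # LD: g2 tracks g1
u = rng.normal(size=n)
x = 1.0 * g1 + 1.0 * u + rng.normal(size=n)
beta1 = 0.5                                   # true causal effect of X on Y
y = beta1 * x + 0.7 * g2 + 1.0 * u + rng.normal(size=n)

def slope(a, b):
    """Least-squares slope of a regression of a on b."""
    return np.cov(a, b)[0, 1] / np.var(b, ddof=1)

iv_estimate = slope(y, g1) / slope(x, g1)     # usual instrumental-variable ratio
print(f"true beta1 = {beta1}, IV ratio estimate = {iv_estimate:.3f}")
```

Here the direct G2 to Y path inflates the numerator of the ratio, and the estimate is badly biased away from the true value even with a very large sample.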
Figure 7: An example of pleiotropy where the instrument G is associated with both X1 and X2, where (a) both have a direct effect on the outcome of interest Y, and (b) X1 has a direct effect but X2 has an indirect effect via the confounder U.

Pleiotropy. Pleiotropy refers to the phenomenon whereby a single gene may influence several traits. If the chosen instrument G is associated with more than one intermediate phenotype which also has an effect on the disease Y, as depicted in Figure 7(a), condition 3, Y ⊥⊥ G | (X1, U), is again violated if we ignore X2.
We must also condition on the other intermediate phenotype(s) to obtain the conditional independence required for G to be an instrument. Moreover, the other intermediate phenotypes do not need to have a direct association with Y for the core conditions to be violated. For example, a genetic polymorphism under study might have pleiotropic effects that influence consumption of tobacco or alcohol, which in turn have an effect on the disease (Davey Smith & Ebrahim 2003). This is represented in Figure 7(b) and violates condition 1.
Figure 8: Genetic heterogeneity showing three genes which are all associated with the intermediate phenotype X but none of which has an effect on the disease Y other than through X.

Genetic Heterogeneity. When more than one gene affects the phenotype, all conditions may still hold for the chosen instrument G1 (Figure 8), as long as none of the other genes is both associated with G1 and able to influence Y other than via its effect on X; otherwise we would be back in the situation depicted in Figure 6(b) for the linkage disequilibrium scenario, for example. However, genetic heterogeneity could of course weaken the G1–X association, in which case G1 might not turn out to be the best choice of instrument.
Figure 9: Two examples of population stratification where (a) one of the conditions for G to be an instrument is violated and (b) all conditions are satisfied.

Population Stratification. Population stratification can be a cause of confounding of the genotype–disease association, whereby the co-existence of different disease rates and allele frequencies within subgroups of individuals could lead to an association between the two at the population level. In Figure 9(a), we see that condition 3, Y ⊥⊥ G | (X, U), is again violated: we need to condition on the population subgroup as well. We note that this is a general problem and is as relevant to the consideration of unrelated individuals as it is to case-control studies, despite comments to the contrary in Thomas & Conti (2004). However, if population stratification causes an association between allele frequencies and phenotype levels, as in Figure 9(b), all conditions for G to be an instrument are still satisfied, and, in this situation, the G–X association may in fact be strengthened as a result.

Interactions. For the same reasons as given above, gene–gene interactions, whereby the effect of a genotype at one locus may depend on the particular genotype at another locus, should not be a problem in terms of violating the core conditions as long as both genes can only affect Y through the intermediate phenotype X. The same is true for gene–environment interactions, unless the environmental factor influences the disease via a route other than through X.
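The Figure 9(a) scenario can likewise be illustrated by simulation. In the sketch below (invented frequencies and risks), a subgroup indicator P shifts both allele frequency and baseline disease risk; X has no causal effect on Y, yet a marginal G–Y association appears, and conditioning on the subgroup removes it:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Two subpopulations with different allele frequencies AND different baseline
# disease risks, as in Figure 9(a); all numbers are invented for illustration.
p = rng.binomial(1, 0.5, n)                     # population subgroup
freq = np.where(p == 1, 0.6, 0.2)               # allele frequency differs by subgroup
g = rng.binomial(1, freq)
u = rng.normal(size=n)
x = 1.0 * g + 1.0 * u + rng.normal(size=n)      # phenotype; plays no role in y below
# No causal effect of X on Y; the subgroup shifts disease risk directly.
y = rng.binomial(1, np.where(p == 1, 0.3, 0.1))

# A marginal G-Y association appears even though X has no effect on Y ...
diff_marginal = y[g == 1].mean() - y[g == 0].mean()
# ... but vanishes within subgroups (conditioning on P restores condition 3).
diff_within = np.mean([y[(g == 1) & (p == s)].mean() -
                       y[(g == 0) & (p == s)].mean() for s in (0, 1)])
print(f"marginal risk difference: {diff_marginal:.3f}, "
      f"within-subgroup: {diff_within:.3f}")
```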
Figure 10: A more complicated example for causal inference from epidemiological data.
Canalisation. Canalisation is a compensation that can occur during foetal development in response to potentially disruptive influences, either genetic or environmental, whereby development is buffered against these adverse effects. This can be achieved either through genetic redundancy, whereby more than one gene can essentially have the same function, or through the development of alternative metabolic pathways to the same phenotype. In particular, a phenotype can be rendered insensitive to changes in the underlying genetic or environmental factors. Results from knockout studies, in which a functioning gene is removed from an organism, have demonstrated canalisation in that they always result in lower phenotypic effects than would have been expected from background knowledge of the gene function. Findings from such experiments are always difficult to interpret, however. Consequently, it is not clear to what extent canalisation could affect Mendelian randomisation applications as
the process usually relates to dramatic developmental disruptions (Davey Smith & Ebrahim 2003), but it should always be considered. In reality, the true biological situation could involve several genes affecting several phenotypes whose joint influence on the disease of interest is subject to confounding (Figure 10). In other words, there may be several alternative routes from the “instrument” G to the outcome of interest, Y.
8 Discussion
Inferences about the causal effects of a phenotype or exposure on a disease must sometimes be obtained from observational data. The effects of alcohol consumption on cardiovascular disease, and of exposure to organophosphates in sheep dip on ill health in farmers, are two examples for which reliable evidence is difficult to obtain and randomised controlled trials are unlikely to be carried out (Davey Smith & Ebrahim 2003). Because of the confounding issues that arise with such data, the exploitation of instrumental variable methods via Mendelian randomisation, by which genes related to alcohol metabolism or genes associated with detoxification could be used to investigate these associations, is very appealing. It is not clear from the current literature, however, that the limitations of this approach have been fully understood. It is important to define clearly what is meant by “causal”. As we have indicated in Section 6, the specification of the causal parameter is crucial. There is a potential problem with case-control data, whereby subjects have been selected into the study according to their disease status, as the selection effect causes one of the core conditions for an instrumental variable to be violated. However, an appropriate test of the genotype–disease association, as recommended by Katan (1986), provides a test for causality of the phenotype–disease association (Section 5) in all cases. It should be noted that the approximate test that is often performed in practice is a test for confounding, which can be a test for causality if the conclusion is that there is no confounding. Estimating the causal effect is more problematic, especially when the disease is binary, as it often is in Mendelian randomisation applications. “There is, in fact, no agreed upon generalization of instrumental variables to non-linear systems” (see Pearl (2000), p. 248).
In particular, the ratio point estimate for the relevant regression parameter in the additive linear case (Section 6.1) is not necessarily meaningful for binary outcomes. Extra assumptions that involve the unobserved confounding variables are required for unique consistent estimation of the causal effect. It is possible to calculate upper and lower bounds on this effect without making such assumptions. These are most straightforward when the phenotype is dichotomised. They can also be used (Pearl 2000) as a rough test for ruling out poor instruments: by requiring that each of the upper bounds given in Section 6.2 be greater than or equal to each of the lower bounds, testable constraints arise which, if violated, imply that at least one of the modelling assumptions is not satisfied. These bounds cannot be calculated from case-control data.
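When G, X and Y are all binary and cohort data supply P(X, Y | G), the bounds can be computed numerically, without writing out their closed-form expressions, as a linear program over the sixteen canonical response types of Balke & Pearl (1994). The sketch below uses an invented input distribution for illustration, and `ace_bounds` is our own helper name, not a library function. It also exhibits the testable constraints: if the observed distribution violates them, the program is infeasible:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def ace_bounds(p_xy_given_g):
    """Bounds on ACE = P(Y=1|do(X=1)) - P(Y=1|do(X=0)) for binary G, X, Y.

    p_xy_given_g[g][x][y] = P(X=x, Y=y | G=g), e.g. from cohort data.
    Linear program over the 16 canonical response types (Balke & Pearl 1994).
    """
    # types t = (x when g=0, x when g=1, y when x=0, y when x=1)
    types = list(product([0, 1], repeat=4))
    A_eq, b_eq = [], []
    for g, x, y in product([0, 1], repeat=3):
        row = [1.0 if (t[g] == x and t[2 + x] == y) else 0.0 for t in types]
        A_eq.append(row)
        b_eq.append(p_xy_given_g[g][x][y])
    c = np.array([t[3] - t[2] for t in types], dtype=float)   # ACE objective
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    assert lo.success and hi.success, "IV constraints violated (LP infeasible)"
    return lo.fun, -hi.fun

# Invented illustrative distribution P(X, Y | G):
p = [[[0.4, 0.1], [0.2, 0.3]],   # g = 0: rows x = 0/1, columns y = 0/1
     [[0.1, 0.1], [0.2, 0.6]]]   # g = 1
lower, upper = ace_bounds(p)
print(f"ACE bounds: [{lower:.3f}, {upper:.3f}]")
```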
Imperfect compliance in randomised controlled trials usually means that some of the study subjects decide not to take the treatment that they have been assigned to. When instrumental variable methods are used in this context, the average causal effect of “treatment on the treated”, or the causal effect on the “compliers”, is often considered and is called a local causal effect because it does not refer to the whole population. In these randomised trials, the instrumental variable G is the treatment assignment, X is the actual treatment received or taken by the patient, which may be different from the assigned treatment, and Y is the response. There are two reasons why one might be interested in the causal effect of the actual treatment on the treated. The first is purely technical: it can be estimated under fewer restrictions than the average causal effect in (1). The second reason has to do with interpretational differences. The average causal effect (1) has also been called the “forced” causal effect and is the average response obtained if the whole population for which the study is meant to be representative is forced to have the treatment. In some contexts this might not be possible and all that can be done is to encourage people to take a treatment. Hence, it is only of interest whether there will be an effect for those who follow the encouragement. An argument against the latter viewpoint is that encouragement in a randomised trial is usually not representative of encouragement in “real life”, so the compliance behaviour might not be generalisable. For our applications, the genotype as the instrument does not seem altogether compatible with the notions of “treatment assignment” or “encouragement”.
In particular, the “effect of treatment on the compliers” would translate to the “effect of phenotype on those whose phenotype would always be consistent with whatever genotype they receive.” Essentially, this would mean that the lifestyles of such individuals would never override the effects of their genotypes, i.e. there would be no confounding effects in this group. These individuals, of course, cannot be determined from the data. For this reason, we have not considered the local causal effect in this paper. Most genetic association studies are statistically underpowered to detect the moderate effects of common gene variants that underlie common complex diseases. Failure to establish reliable genotype–phenotype or genotype–disease associations could lead to poor inference. This is not just a problem for Mendelian randomisation but is due to the general non-replicability of findings from genetic association studies. To begin with, estimates of the relevant quantities should always be considered alongside their confidence intervals. However, if large studies are not available, accurate estimates of these relationships may only be possible using meta-analytical approaches which introduce other biases. In particular, Minelli, Thompson, Tobin & Abrams (2004) advise a multivariate approach to deal with the correlations induced when some studies provide evidence on both G–X and G–Y associations. Publication bias can also be a serious issue. In reality, the underlying biology will not be appropriately summarised by the simple graph in Figure 1. In the example outlined in Section 2.2 (Keavney, Youngman, Palmer, Parish, Clark, Delphine, Lathrop, Peto & Collins 2000, Keavney et al. 2004), six lipid-related genes and two plasma apolipoproteins
having opposite effects on coronary heart disease (CHD) were considered. In order to use the Mendelian randomisation approach, each gene was considered as a separate instrument and the ratio of the plasma lipoproteins taken as the intermediate phenotype. There are many reasons why the reported genotype–CHD associations were not consistent with the genotype–phenotype associations, whereby genes that adversely affected lipoprotein levels were not necessarily associated with increased risk of CHD (Davey Smith & Ebrahim 2003). It was noted that several of these genes were known to be in tight linkage disequilibrium, for instance. Another possibility is that the underlying graph is more like an extension of Figure 10 and that interactions between genes and between phenotypes need to be taken into account. To date, there is no extension of the instrumental variable approach that is applicable to this situation. Indeed, we have already seen that care has to be taken with the simple graph. Ideally, however, we would wish the biology to dictate the graph and then determine if that graph can be queried for causal inference. Should this prove to be impossible, how should the graph be modified in order to facilitate such inference, and does this still address reasonable biological questions?
Appendix

Background on the linear & no interactions case

Let us first show why β1 is the causal parameter of interest. With the assumptions from Section 6.1 we have that

E(Y | do(X = x)) = E_{U|do(X=x)} E(Y | do(X = x), U)
                 = E_U E(Y | do(X = x), U)
                 = α + β1 x + β2 µ_U
                 = α* + β1 x,
where µ_U = E(U), using obvious notation for iterated conditional expectation. The second equality holds because do(X = x) is not informative for U, as there cannot be any dependence between an intervention and an unknown variable. (Graphically, intervening on X removes the arrows leading into X in Figure 1.) From the above we obtain ACE(x1, x2) = β1(x1 − x2), so we are interested in estimating β1. Using the usual conditioning without intervention, a regression of Y on X alone corresponds to

E(Y | X = x) = E_{U|X=x} E(Y | X = x, U) = α + β1 x + β2 µ_{U|X=x},

where µ_{U|X=x} = E(U | X = x) is typically not constant in x, and in particular not equal to µ_U, due to the dependence between X and U in the non–interventional case,
i.e. in the observational regime. Hence β1 cannot be identified from a regression of Y on X alone. Instead consider a regression of Y on G alone. This corresponds to

E(Y | G = g) = E_{(X,U)|G=g} E(Y | X, U, G = g)
             = E_{U|G=g} E_{X|U,G=g} E(Y | X, U)    since Y ⊥⊥ G | (X, U)
             = E_U E_{X|U,G=g} E(Y | X, U)          since U ⊥⊥ G
             = E_U (α + β1(γ + δ1 g + δ2 U) + β2 U)
             = α + β1 γ + β1 δ1 g + (β1 δ2 + β2) µ_U
             = α* + β1 δ1 g,
where we note that uncorrelatedness rather than independence is sufficient to make these relations hold. Hence, the coefficient of G in a regression of Y on G is β1 δ1. Furthermore, a regression of X on G alone corresponds to

E(X | G = g) = E_{U|G=g} E(X | G = g, U) = E_U E(X | G = g, U) = γ + δ1 g + δ2 µ_U,

so the coefficient of G in this regression is δ1. Thus, as given in Section 6.1, the causal parameter of interest, β1, can be estimated from the ratio of these two regression coefficients.
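This ratio estimator is easy to check by simulation. The sketch below (invented parameter values) generates data from the linear structural equations above and compares the confounded regression of Y on X with the ratio of the two G-regression coefficients:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Linear structural equations of Section 6.1 and the appendix, with
# illustrative parameter values (not taken from the paper).
alpha, beta1, beta2 = 1.0, 2.0, 1.5    # Y = alpha + beta1*X + beta2*U + noise
gamma, delta1, delta2 = 0.5, 0.8, 1.0  # X = gamma + delta1*G + delta2*U + noise

g = rng.binomial(2, 0.4, n)            # genotype coded 0/1/2
u = rng.normal(size=n)                 # unobserved confounder
x = gamma + delta1 * g + delta2 * u + rng.normal(size=n)
y = alpha + beta1 * x + beta2 * u + rng.normal(size=n)

def slope(a, b):
    """Least-squares slope of a regression of a on b."""
    return np.cov(a, b)[0, 1] / np.var(b, ddof=1)

naive = slope(y, x)                 # confounded regression of Y on X
iv = slope(y, g) / slope(x, g)      # ratio (beta1*delta1) / delta1
print(f"beta1 = {beta1}, naive = {naive:.3f}, IV ratio = {iv:.3f}")
```

The naive regression is biased upwards by the confounder, while the ratio of the two G-regressions recovers β1.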
Threshold Models

The above can be generalised to the case of binary G and X in the following way. Let E(Y | X = x, U = u) = α + β1 x + β2 u, as before. Consider an underlying unobserved variable Xc with E(Xc | G = g, U = u) = γ + δ1 g + δ2 u and define X = 1 if Xc > 0, and X = 0 otherwise.
The dependence structure can be represented as in Figure 11, where the relationship between Xc and X is deterministic. The conditional independencies with respect to all variables are Y ⊥⊥ (G, Xc) | (U, X), X ⊥⊥ (G, U) | Xc and G ⊥⊥ U. But as Xc is not observed, we have Y ⊥⊥ G | (U, X), X ⊥/⊥ G and still G ⊥⊥ U for the remaining variables. Hence conditions 1–3 apply to (G, U, X, Y), ignoring Xc, and β1 is still the parameter we are interested in, as it describes the required causal effect.
Figure 11: Mendelian randomisation for a threshold model.

By an argument similar to that presented above, we have (assuming δ2 > 0)

E(Y | G = g) = E_U E_{X|U,G=g} E(Y | X, U)
             = E_U (α + β1 I(γ + δ1 g + δ2 U > 0) + β2 U)
             = α + β1 P_U(γ + δ1 g + δ2 U > 0) + β2 µ_U
             = α + β1 P_U(U > (−γ − δ1 g)/δ2) + β2 µ_U.
Recall that G is binary and assumes the value 0 or 1. If we let ξ0 = −γ/δ2 and ξ1 = (−γ − δ1)/δ2, the above model can be written as

E(Y | G = g) = α + β1 P_U(U > ξ0) + β1 (P_U(U > ξ1) − P_U(U > ξ0)) g + β2 µ_U.

This is linear in G and the corresponding coefficient

r_{Y|G} = β1 (P_U(U > ξ1) − P_U(U > ξ0))

can be estimated by linear least squares. The relationship between the observable X and G works out to be

E(X | G = g) = E_U E(X | G = g, U) = P_U(γ + δ1 g + δ2 U > 0) = P_U(U > ξ0) + (P_U(U > ξ1) − P_U(U > ξ0)) g,

which is also linear in G. The relevant coefficient

r_{X|G} = P_U(U > ξ1) − P_U(U > ξ0)

can again be estimated by linear least squares. Notice that the modelling assumptions made above are very specific, in particular about the role of U, and cannot necessarily be tested empirically.
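The threshold model can also be checked by simulation. In the sketch below (invented parameter values, with the variation in Xc supplied entirely by U for simplicity), the ratio of the two least-squares coefficients recovers β1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# Threshold model of the appendix with invented parameters; here Xc is
# exactly gamma + delta1*G + delta2*U, so U supplies all the variation.
gamma, delta1, delta2 = -0.5, 1.0, 1.0
alpha, beta1, beta2 = 0.5, 2.0, 1.0

g = rng.binomial(1, 0.5, n)
u = rng.normal(size=n)
xc = gamma + delta1 * g + delta2 * u        # latent continuous phenotype
x = (xc > 0).astype(float)                  # observed dichotomised phenotype
y = alpha + beta1 * x + beta2 * u + rng.normal(size=n)

r_y_g = y[g == 1].mean() - y[g == 0].mean() # coefficient of binary G on Y
r_x_g = x[g == 1].mean() - x[g == 0].mean() # coefficient of binary G on X
ratio = r_y_g / r_x_g
print(f"beta1 = {beta1}, ratio estimate = {ratio:.3f}")
```

With ξ0 = 0.5 and ξ1 = −0.5 here, the common factor P_U(U > ξ1) − P_U(U > ξ0) ≈ 0.383 cancels in the ratio, leaving β1.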
Acknowledgments The authors acknowledge research support from Leverhulme Research Interchange Grant F/07134/K which instigated our collaboration.
References

Alpha-Tocopherol, Beta Carotene Cancer Prevention Study Group (1994), The effect of vitamin E and beta carotene on the incidence of lung cancer and other cancers in male smokers, New England Journal of Medicine 330, 1029–1035.
Angrist, J., Imbens, G. & Rubin, D. (1996), Identification of causal effects using instrumental variables, Journal of the American Statistical Association 91(434), 444–455. Balke, A. A. & Pearl, J. (1994), Counterfactual probabilities: Computational methods, bounds and applications, in R. Mantaras & D. Poole, eds, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pp. 46–54. Bowden, R. & Turkington, D. (1984), Instrumental Variables, Cambridge University Press, Cambridge, U.K. Casas, J., Bautista, L., Smeeth, L., Sharma, P. & Hingorani, A. (2005), Homocysteine and stroke: evidence on a causal link from mendelian randomisation, Lancet 365, 224–232. Collins, F. (1999), Shattuck lecture—medical and societal consequences of the human genome project, New England Journal of Medicine 341, 28–37. Cowell, R. G., Dawid, A. P., Lauritzen, S. L. & Spiegelhalter, D. J. (1999), Probabilistic Networks and Expert Systems, Statistics for Engineering and Information Science, Springer-Verlag, New York, Inc. Davey Smith, G. & Ebrahim, S. (2003), Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease?, International Journal of Epidemiology 32, 1–22. Davey Smith, G., Ebrahim, S., Lewis, S., Hansell, A., Palmer, L. & Burton, P. (2005), Genetic epidemiology and public health: hope, hype, and future prospects, Lancet 366, 1484–1498. Davey Smith, G., Lawlor, D., Harbord, R., Rumley, A., Lowe, G., Day, I. & Ebrahim, S. (2005), Association of C-reactive protein with blood pressure and hypertension. life course confounding and Mendelian randomisation tests of causality, Arteriosclerosis, Thrombosis and Vascular Biology 25, 1051–1056. Dawid, A. P. (2002), Influence diagrams for causal modelling and inference, International Statistical Review 70, 161–189. Dawid, A. P. (2003), Causal inference using influence diagrams: the problem of partial compliance, in P. J. Green, N. L. Hjort & S.
Richardson, eds, Highly Structured Stochastic Systems, Oxford University Press, Oxford, UK, pp. 45–81. Fisher, R. (1926), The Design of Experiments, 1st edn, Oliver & Boyd, Edinburgh. Fisher, R. (1970), The Design of Experiments, 8th edn, Hafner, New York. Ford, E., Smith, S., Stroup, D., Steinberg, K., Mueller, P. & Thacker, S. (2002), Homocysteine and cardiovascular disease: a systematic review of the evidence with special emphasis on case-control studies and nested case-control studies, International Journal of Epidemiology 31, 59–70.
Goldberger, A. (1972), Structural equation methods in the social sciences, Econometrica 40, 979–1001. Gray, R. & Wheatley, K. (1991), How to avoid bias when comparing bone marrow transplantation with chemotherapy, Bone Marrow Transplantation 7 (Suppl 3), 9–12. Greenland, S. (2000), An introduction to instrumental variables for epidemiologists, International Journal of Epidemiology 29, 722–729. Greenland, S., Robins, J. M. & Pearl, J. (1999), Confounding and collapsibility in causal inference, Statistical Science 14, 29–46. Heart Protection Study Collaborative Group (2002), MRC/BHF heart protection study of cholesterol lowering with simvastatin in 20,536 high-risk individuals, Lancet 360, 7–22. Homocysteine Lowering Trialists’ Collaboration (1998), Lowering blood homocysteine with folic acid based supplements: meta-analysis of randomized controlled trials, BMJ 316, 894–898. Katan, M. (2004), Commentary: Mendelian randomization, 18 years on, International Journal of Epidemiology 33, 10–11. Katan, M. B. (1986), Apolipoprotein E isoforms, serum cholesterol, and cancer, Lancet i, 507–508. Keavney, B. D., Palmer, A., Parish, S., Clark, S., Youngman, L., Danesh, J., McKenzie, C., Delphine, M., Lathrop, M., Peto, R. & Collins, R. (2004), Lipid-related genes and myocardial infarction in 4685 cases and 3460 controls: discrepancies between genotype, blood lipid concentrations, and coronary disease risk, International Journal of Epidemiology 33, 1002–1013. Keavney, B. D., Youngman, L., Palmer, A., Parish, S., Clark, S., Delphine, M., Lathrop, M., Peto, R. & Collins, R. (2000), Large-scale test of hypothesised associations between polymorphisms of lipid-related genes and myocardial infarction in about 5000 cases and 6000 controls, Circulation 102 (Suppl 2), 852. Lauritzen, S. L. (2000), Causal inference from graphical models, in O. E. Barndorff-Nielsen, D. R. Cox & C. Kluppelberg, eds, Complex Stochastic Systems, Chapman & Hall, chapter 2, pp. 63–107. Lynch, M. & Walsh, B.
(1998), Genetics and Analysis of Quantitative Traits, Sinauer Associates Inc., USA. Maddala, G. (1983), Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, Cambridge, U.K. Manski, C. F. (1990), Nonparametric bounds on treatment effects, American Economic Review, Papers and Proceedings 80, 319–323.
Minelli, C., Thompson, J., Tobin, M. & Abrams, K. (2004), An integrated approach to the Meta-Analysis of genetic association studies using Mendelian randomization, American Journal of Epidemiology 160, 445–452. Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, CA. Pearl, J. (1995), Causal diagrams for empirical research, Biometrika 82, 669–710. Pearl, J. (2000), Causality, Cambridge University Press. Peto, R., Doll, R., Buckley, J. & Sporn, M. (1981), Can dietary beta-carotene materially reduce human cancer rates?, Nature 290, 201–208. Robins, J. (1989), The analysis of randomized and nonrandomized aids treatment trials using a new approach to causal inference in longitudinal studies, in L. Sechrest, H. Freeman & A. Mulley, eds, Health Service Research Methodology. A Focus on AIDS, U.S. Public Health Service, Washington, D.C., pp. 113–159. Robins, J. (1997), Causal inference from complex longitudinal data, in M. Berkane, ed., Latent Variable Modeling with Applications to Causality, Springer-Verlag, New York, pp. 69–117. Scandinavian Simvastatin Survival Study (4S) (1994), Randomised trial of cholesterol lowering in 4444 patients with coronary heart disease, Lancet 344, 1383–1389. Shipley, B. (2000), Cause and Correlation in Biology, Cambridge University Press. Spirtes, P., Glymour, C. & Scheines, R. (2000), Causation, Prediction and Search, 2nd edn, MIT Press, Massachusetts. Stanghellini, E. & Wermuth, N. (2005), On identification of path analysis models with one hidden variable, Biometrika 92, 337–350. Stanley, F., Blair, E. & Alberman, E. (2000), Cerebral Palsies: Epidemiology and Causal Pathways, Mac Keith Press, London, U.K. Thomas, D. & Conti, D. (2004), Commentary: The concept of “Mendelian randomization”, International Journal of Epidemiology 33, 21–25. Thompson, J., Tobin, M. & Minelli, C.
(2003), On the accuracy of the effect of phenotype on disease derived from Mendelian randomisation studies, Genetic Epidemiology Technical Report 2003/GE1, Centre for Biostatistics, Department of Health Sciences, University of Leicester, (http://www.prw.le.ac.uk/research/HCG/getechrep.html). Tobin, M., Minelli, C., Burton, P. & Thompson, J. (2004), Commentary: Development of Mendelian randomization: from hypothesis test to “Mendelian deconfounding”, International Journal of Epidemiology 33, 26–29.
Van der Bom, J. G., Maat, M. D., Bots, M. L. et al. (1998), Elevated plasma fibrinogen: cause or consequence of cardiovascular disease?, Arteriosclerosis, Thrombosis and Vascular Biology 18, 621–625. Youngman, L., Keavney, B. D., Palmer, A., Parish, S., Clark, S., Danesh, J., Delphine, M., Lathrop, M., Peto, R. & Collins, R. (2000), Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: test of causality by Mendelian randomisation, Circulation 102 (Suppl 2), 31–32.