© International Epidemiological Association 2002

Printed in Great Britain

International Journal of Epidemiology 2002;31:817–822

A cross-validation of risk-scores for coronary heart disease mortality based on data from the Glostrup Population Studies and Framingham Heart Study

Troels F Thomsen (a), Dan McGee (b), Michael Davidsen (a) and Torben Jørgensen (a)

Background

Due to marked regional differences in the incidence of coronary heart disease (CHD) in Europe, the recommendation by the European Society of Cardiology to use the Coronary Risk Chart, which is based on data from the Framingham Heart Study, could be questioned.

Methods

Data from two population studies (the Glostrup Population Studies, n = 4757; the Framingham Heart Study, n = 2562) were used to examine three levels of cross-validation. The first level examined whether a risk-score developed from one sample adequately ordered the risk of participants in the other sample, using the Area Under a Receiver Operating Characteristic (AUROC) curve. The second level compared the magnitude of coefficients in logistic models in the two studies. The third level tested whether the level of risk of CHD death in one sample could be estimated from a risk function derived from the other sample.

Results

Coronary heart disease mortality was 515 per 100 000 person-years in Framingham and 311 per 100 000 person-years in Glostrup. The AUROC was between 75% and 77% regardless of which risk-score was used. Logistic coefficients did not differ significantly between studies. The Framingham risk-score significantly overestimated the risk in the Glostrup sample and the Glostrup risk-score underestimated the risk in the Framingham sample.

Conclusion

Using this Framingham risk-score on a Danish population will lead to a significant overestimation of coronary risk. The validity of risk-scores developed from populations with a different incidence of the disease should preferably be tested prior to their application.

Keywords

Risk-score, validation, coronary heart disease mortality

Accepted

1 February 2002

a Copenhagen County Centre for Preventive Medicine, Medical Department M, Glostrup University Hospital, DK-2600 Glostrup, Denmark.

b Medical University of South Carolina, Biometry and Epidemiology, Charleston, USA.

Correspondence: Troels F Thomsen, Centre for Preventive Medicine, Medical Department M, Glostrup University Hospital, Building 8, 7th floor, DK-2600 Glostrup, Denmark. E-mail: [email protected]

The main purpose of coronary risk-scores is to assist the clinician in identifying those patients at highest level of coronary risk, reserving preventive measures for those individuals above a specified coronary risk. The guidelines from The European Society of Cardiology on Primary Prevention of Coronary Heart Disease (CHD) recommend that the Coronary Risk Chart based on a Framingham risk-score is used in Europe for the estimation of individual level of coronary risk.1 If the estimated risk exceeds 20% over a 10-year period, risk reducing treatment should be initiated. However, the applicability of this Framingham risk function to a low-risk population was recently questioned. It was thus shown that the Framingham risk function markedly overestimated the level of coronary risk in an Italian population where the incidence of coronary events is one-third of the incidence in Framingham (220 versus 627 per 100 000 person-years).2,3 Although the guidelines assume that ‘the Framingham function predicts the level of risk reasonably well in high-risk populations’,1 there is almost no evidence to support the use of this function in northern European populations. The mortality rates from ischaemic heart disease in Denmark are approximately twice the rates of Italy (for men: 423 versus 224, for women: 145 versus 65 per 100 000 person-years) and it may therefore be questioned whether applying a Framingham risk-score to a Danish population also will lead to an overestimation of individual coronary risk.2

Our purpose was not to validate one particular risk function from Framingham, but rather to ask whether the risk functions derived from one sample would be adequate for use in the other sample, when similar methodologies were applied to samples from Framingham and Glostrup. To do this, we derived risk functions from primary data from the two studies (Framingham and Glostrup) that were available to us at the time the analysis was conducted.

Material

The Framingham Heart Study

The Framingham data for the present analysis stem entirely from the original cohort examined during the lipoprotein phenotyping project that corresponded approximately to the eleventh examination cycle (1971). At this examination 2788 individuals participated and it was the first of the Framingham cohorts that included lipid determinations, other than total cholesterol, for the participants. For some participants without records of smoking status at this examination, status at the next earliest examination was used; and for a small proportion of this group, the examination coincided with the 10th or 12th biennial examination of the cohort. Since a complete-case analysis was conducted, 107 participants with unknown values for at least one of the characteristics under consideration were excluded prior to analyses. We also excluded from the analysis all participants who previously had experienced a myocardial infarction (n = 119), leaving 2562 participants in our analysis. The analytical sample utilized is half of the sample used for the derivation of the most recently published Framingham risk function. That is, the recently published risk functions from Framingham are derived from a sample that included a pool of the data from the original Framingham cohort used here as well as data from the Framingham Offspring Study.4,5

The Glostrup Population Studies

The Glostrup sample is a pool of five observational cohorts from The Glostrup Population Studies: (1) a cohort of individuals born in 1914 (n = 804, examined in 1984), (2) a cohort of individuals born in 1922, 1932, 1942, and 1952 (DAN-MONICA 1, n = 3785, examined in 1983), (3) a cohort of individuals born in 1926, 1936, 1946, and 1956 (DAN-MONICA 2, n = 1416, examined in 1983), (4) a cohort of individuals born in 1921, 1931, 1941, 1951 and 1961 (DAN-MONICA 3, n = 2026, examined in 1992), and finally (5) a cohort of individuals born in 1918, 1928, 1938, and 1948 (n = 928, examined in 1978). The Glostrup Population Studies have been described previously.6 As every person in the Danish population is identified by a unique registration number, linkage of individual information over time as well as linkage with national health registers is highly accurate. The pooled cohort covers a wide age range (30–70 years), but since we wished to compare Glostrup and Framingham over a similar age range, anyone younger than 49 years was not included in these analyses. Eighty-three participants with at least one unknown value were furthermore excluded from the analysis, and all patients who previously had experienced a myocardial infarction were also excluded (n = 178), leaving 4757 individuals from the Glostrup cohort in this study.

Endpoint

The endpoint in this validation was CHD mortality (ICD-8 codes 410–414). Mortality rather than morbidity data were used as the endpoint since the former were assumed to be more comparable between countries than the latter. Cause of death in the Framingham cohort was determined by a panel review of death certificates and other documentation available to study investigators, while in the Glostrup cohort the national death certificate for underlying cause of death was used. The follow-up period was fixed at 10 years, thus only those participants dying of CHD within 10 years were considered to be events.

Risk factors

The following variables were included in the analyses: sex, age, serum total cholesterol and high density lipoprotein (HDL) (mg/dl), smoking (self-reported: non versus current), systolic blood pressure (mmHg), and diabetes. All measurement techniques were comparable, except that diabetes was established in Glostrup by the question ‘Has a doctor ever told you that you had diabetes?’, whereas in Framingham it was defined as a random glucose >9 mmol/l and/or the use of diabetic treatment.7

Methods

The basic principle in the analysis was to fit a logistic regression model based on one cohort and then apply this model to the second cohort to obtain predicted probabilities for each member of the second cohort. Then, these predicted probabilities of CHD death were compared to whether the person actually died from CHD. All analyses were repeated, reversing the roles of the two cohorts, using the second cohort to fit the model and examining its validity when applied to the first cohort. All analyses were also repeated including people with existing CHD. The length of follow-up was fixed to 10 years. The cross-validation procedure progressed over three levels.

Level 1

Our first examination used the Area Under the Receiver Operating Characteristics (AUROC) curve. The AUROC curve measures the proportion of case/non-case pairs that are correctly ordered.8 The method does not, however, take account of the actual prevalence of the disease being tested for.
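The first level of the cross-validation rests on the pairwise reading of the AUROC: the proportion of case/non-case pairs in which the case is given the higher predicted risk. The following is a minimal sketch of that definition, not the authors' code. The function name `auroc`, the half-credit for tied pairs and the example risks are all assumptions made for illustration.

```python
from itertools import product

def auroc(scores_cases, scores_noncases):
    """Proportion of case/non-case pairs in which the case has the higher
    predicted risk; tied pairs count as one half (a common convention,
    assumed here)."""
    pairs = 0
    concordant = 0.0
    for case_score, noncase_score in product(scores_cases, scores_noncases):
        pairs += 1
        if case_score > noncase_score:
            concordant += 1.0
        elif case_score == noncase_score:
            concordant += 0.5
    return concordant / pairs

# Hypothetical predicted 10-year risks, not study data.
cases = [0.30, 0.12, 0.08]
noncases = [0.02, 0.05, 0.10, 0.04]
print(auroc(cases, noncases))

# Doubling every predicted risk changes the level of risk but not the
# ordering, so the AUROC stays the same.
print(auroc([2 * p for p in cases], [2 * p for p in noncases]))
```

Because only the ordering of the predictions enters the calculation, a model can rank individuals well and still predict too many or too few deaths, which is why the later levels of the validation are needed.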

Level 2

The second examination compared the magnitude of the coefficients in a logistic model predicting CHD death in each of the cohorts, using a Wald statistic to test whether each coefficient differed between the two studies.
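The exact form of the Wald statistic is not given in the text. One plausible version, sketched below, treats the two cohorts as independent and compares the log odds ratios directly. The standard errors are back-calculated from the published 95% CI in Table 3, which assumes those intervals are Wald-type; the function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover a log odds ratio and its standard error from a Wald-type 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se

def wald_difference(est_a, est_b):
    """Two-sided z-test that two independent coefficients are equal."""
    (beta_a, se_a), (beta_b, se_b) = est_a, est_b
    z = (beta_b - beta_a) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Smoking row of Table 3: Framingham 1.25 (0.82-1.91), Glostrup 1.59 (1.12-2.25).
framingham = log_or_and_se(1.25, 0.82, 1.91)
glostrup = log_or_and_se(1.59, 1.12, 2.25)
z, p = wald_difference(framingham, glostrup)
print(round(z, 2), round(p, 2))
```

Under these assumptions the smoking comparison gives a P-value close to the 0.39 reported in Table 3, which suggests the published test was of this general form, although the paper does not confirm it.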

Level 3

This was the core analysis in which we examined (1) the observed and predicted number of cases and (2) the refinement (spread) and the calibration (accuracy). To calculate the predicted number of deaths, each participant of the Glostrup cohort was assigned a probability of CHD death based on the Framingham logistic model. The sum of these estimated probabilities is the predicted number of deaths in Glostrup based on the Framingham risk-score.

The refinement and the calibration were analysed using the methods described by Miller et al.9 in a two-step procedure. Initially, a logistic regression analysis predicting CHD death in one study was used to estimate the probability of CHD death in the other cohort. These predicted probabilities were then transformed to logits, and these estimated logits were used as the only covariate in a logistic regression model predicting CHD death in the second study. In the first step we tested whether the coefficient (β) associated with the predicted logit was one. If the coefficient (β) could be assumed to be one, we inferred that the spread of the risk estimates was correct and a second logistic analysis was conducted in which the coefficient of the predicted logit was fixed at 1. In this model we then tested whether the constant term (α) was zero, i.e. a test of α = 0 given β = 1. If the constant term was not zero, we inferred that the level of the estimated probabilities was not correct (a negative value indicated over-prediction and a positive value indicated under-prediction).
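The two quantities used at this level can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: it covers only the predicted number of deaths and the second step (estimating α with β fixed at 1), it fits α by Newton-Raphson in plain Python, and the cohort of 40 people is invented.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def expected_events(pred_probs):
    """Predicted number of deaths: the sum of individual predicted probabilities."""
    return sum(pred_probs)

def fit_constant_given_slope_one(pred_probs, died, iterations=25):
    """Maximum-likelihood alpha in logit(P(death)) = alpha + 1 * logit(p_pred),
    found by Newton-Raphson; returns (alpha, standard error)."""
    offsets = [logit(p) for p in pred_probs]
    alpha = 0.0
    for _ in range(iterations):
        fitted = [expit(alpha + o) for o in offsets]
        score = sum(y - f for y, f in zip(died, fitted))
        information = sum(f * (1.0 - f) for f in fitted)
        alpha += score / information
    fitted = [expit(alpha + o) for o in offsets]
    information = sum(f * (1.0 - f) for f in fitted)
    return alpha, 1.0 / math.sqrt(information)

# Hypothetical cohort, not study data: 10 people at each predicted risk level.
pred = [0.05] * 10 + [0.10] * 10 + [0.20] * 10 + [0.40] * 10
died = [0] * 10 + [1] + [0] * 9 + [1] + [0] * 9 + [1] * 3 + [0] * 7

alpha, se = fit_constant_given_slope_one(pred, died)
print(expected_events(pred), sum(died))  # predicted versus observed deaths
print(alpha, se)                         # negative alpha indicates over-prediction
```

In this invented cohort the external model predicts more deaths than occur, so the fitted constant is negative, matching the sign convention described above.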

Results

Table 1 presents baseline characteristics of the Framingham and Glostrup cohorts. The Framingham sample is slightly older and has a higher average systolic blood pressure and a higher prevalence of diabetes than the Glostrup sample. On the other hand, the Glostrup sample contains a higher proportion of males and smokers, and has, on average, higher serum cholesterol and higher HDL than the Framingham sample. All these differences are statistically significant.

Table 2 shows the AUROC for the models developed from Framingham and Glostrup including the risk factors from Table 1. There is practically no difference between the two models. The AUROC is 75–77% regardless of which risk function is used to calculate it.

Table 3 presents the relative odds with 95% CI from the logistic model relating the specified characteristic to CHD mortality (Level 2). No differences in the effect of the included risk factors between the two studies are noted (P-values between 0.39 and 0.98).

Tables 4 and 5 present the results of the final examination of validity when using each study’s model to predict CHD death in the other study. Each table presents three analyses, with Table 4 testing the Framingham model against the Glostrup population and Table 5 testing the Glostrup model against the Framingham population. Table 4 demonstrates that the Framingham model predicts a larger number of cases for the Glostrup cohort (162) than were observed (148), and Table 5 shows that the Glostrup model predicts fewer cases (109) for the Framingham cohort than actually occurred (132). With 2562 individuals followed for 10 years with 132 events, the CHD mortality rate may be calculated as 515 per 100 000 person-years in Framingham. With the same calculation the rate was 311 per 100 000 person-years in Glostrup.

Tables 4 and 5 also demonstrate that the coefficient associated with the logit is not significantly different from one (Table 4: 0.95 and Table 5: 1.03). Fixing this coefficient at one yields a constant term (α) significantly different from 0. This implies that the Framingham risk-score significantly overestimates the risk in the Glostrup sample (α = –0.28) and that the Glostrup risk-score significantly underestimates the risk in the Framingham population (α = 0.35). Inclusion of individuals with existing heart disease made no difference in this third level (α = –0.26 and α = 0.30, respectively).
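The mortality rates quoted above follow from simple arithmetic on the event counts and cohort sizes. The short check below reproduces them, assuming as the text does that every participant contributes the full 10 years of follow-up; the helper name `rate_per_100k` is hypothetical.

```python
def rate_per_100k(events, participants, years=10):
    """Crude rate per 100 000 person-years, assuming everyone contributes the
    full follow-up (the simplification used in the text)."""
    return events / (participants * years) * 100_000

framingham = rate_per_100k(132, 2562)
glostrup = rate_per_100k(148, 4757)
print(round(framingham), round(glostrup))  # 515 311
print(round(framingham / glostrup, 2))     # 1.66
```

The ratio of about 1.66 corresponds to the roughly 60% higher CHD mortality in Framingham that the paper gives as the likely reason for the miscalibration.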

Table 1 Distribution of risk factors in Framingham and Glostrup

                                  Framingham (n = 2562)     Glostrup (n = 4757)
                                  Mean        SD            Mean        SD
Males                             41.8%                     50.1%
Age (years)                       61.9        8.0           58.9        7.7
Systolic blood pressure (mmHg)    140.7       22.1          133.8       19.7
Cholesterol (mg/dl)               232.3       42.5          250.8       46.4
HDL* (mg/dl)                      52.3        15.9          58.3        16.9
Smoker                            30.8%                     51.1%
Diabetes                          11.7%                     3.3%

* High density lipoprotein-cholesterol.
All of the noted differences are significant, P < 0.001.

Table 2 Ability to rank individuals depending on model and sample in Glostrup and Framingham

                         Model applied to:
                         Framingham              Glostrup
Model sampled from:      AUROC*      SE          AUROC       SE
Framingham               0.77        0.02        0.75        0.02
Glostrup                 0.77        0.02        0.76        0.02

* Area under a receiver operating characteristic curve.

Table 3 Comparison of relative odds from Framingham and Glostrup

                                       Framingham                    Glostrup
                           Unit        Relative odds  95% CI         Relative odds  95% CI         P-value
Age                        1 year      1.07           (1.04–1.09)    1.06           (1.04–1.09)    0.94
Systolic blood pressure    10 mmHg     1.19           (1.10–1.29)    1.17           (1.08–1.27)    0.75
Cholesterol                20 mg/dl    1.12           (1.03–1.22)    1.15           (1.08–1.23)    0.65
HDL*                       5 mg/dl     0.91           (0.84–0.97)    0.91           (0.85–0.96)    0.98
Smoking                    yes         1.25           (0.82–1.91)    1.59           (1.12–2.25)    0.39
Diabetes                   yes         2.36           (1.55–3.61)    2.62           (1.45–4.74)    0.78
Male                       yes         2.64           (1.74–4.00)    2.12           (1.43–3.14)    0.45

* High density lipoprotein-cholesterol.

Table 4 Framingham prediction of Glostrup

Expected CHD* deaths in Glostrup                       162
Observed CHD deaths in Glostrup                        148

                                                       Beta       SE(Beta)
Fit of Framingham to Glostrup
  Logit                                                0.95       0.09
  Constant                                             –0.42
Test of constant given coefficient for logit is 1
  Constant                                             –0.28      0.09

* Coronary heart disease.

Table 5 Glostrup prediction of Framingham

Expected CHD* deaths in Framingham                     109
Observed CHD deaths in Framingham                      132

                                                       Beta       SE(Beta)
Fit of Glostrup to Framingham
  Logit                                                1.03       0.10
  Constant                                             0.44
Test of constant given coefficient for logit is 1
  Constant                                             0.35       0.09

* Coronary heart disease.

Discussion

While there are significant differences in the general risk profile between Glostrup and Framingham, the difference in relative risk between the two studies remains insignificant. This consistency of the relative risk estimates has been reported before.10,11 Nonetheless, the magnitude of some of the coefficients differs somewhat. For instance, the risks associated with having diabetes tended to be lower in Framingham than in Glostrup. Although this finding may be accidental, it was unexpected since the Framingham diabetes cases are more likely to be ‘true’ cases given that they are based on a medical review rather than self-report. However, the coefficient for smoking is larger in Glostrup although smoking status was based on self-report in both samples.

Using cardiovascular death as the endpoint may raise some methodological questions. In Framingham, cardiovascular death was determined by a panel review of death certificates and other documentation available to the study investigators, while in Denmark determination was solely by death certificate. However, the bias introduced by using death certificates only should be an overestimation of the true incidence of cardiovascular death in Glostrup, since the diagnosis of ischaemic heart disease is likely to be given too frequently as the cause of death.12 The observed difference between the two studies may therefore be even larger. The problem itself may also be of minor importance since the majority of cardiovascular deaths in Denmark occur within hospital and thus are more likely to be reviewed by one or more doctors. Finally, the mortality rates in Glostrup have been found in general to be similar to the rates in Denmark at large.13

The lack of difference in the ranking of individuals, using either the Framingham or the Glostrup model, implies that those at highest risk would be identified independently of the model that has been chosen. The proportion of individuals in Glostrup that would be estimated to be above the cut-point of 20% risk within 10 years will, however, be larger if a Framingham model is used than if a Glostrup model is used. The Framingham risk-score produced in this study, in general, overestimates the number of cases in the Glostrup sample and the Glostrup risk-score underestimates the number of cases in the Framingham sample. This probably reflects the higher incidence in Framingham, with a CHD mortality rate approximately 60% higher than in the Glostrup population. Using the Coronary Risk Chart in Denmark may thus lead to an overestimation of risk and thereby a possible over-treatment with e.g. statins. This might have a significant impact on national healthcare expenditures.

The problem of the validity of risk-scores developed from other population samples than the one they are applied to has been examined in many different settings. An early paper by Keys et al.14 compared risk factors reported by several studies with data from four of the Pooling Project Studies of the American Heart Association15 (Pool 4) together with samples from the US Railroad Workers Study and the International Cooperative Study on Cardiovascular Epidemiology.16 The investigators concluded that the ordering of participants was similar for the two functions. The magnitudes of the total number of expected cases, however, differed significantly, with the international coefficients under-predicting the American cohorts.

The question of how much relative risk differs between populations has been investigated several times. Gordon et al.17 compared CHD rates for Framingham, Honolulu, and Puerto Rican study populations. The researchers concluded that the relative odds for Framingham, at the average values for risk factors, was about twice that of the other studies. Menotti et al.18 have compared coefficients from several cohorts (Seven Countries Study, the Italian RIFLE project and MRFIT), concluding that, with some exceptions, the coefficients of the participating studies were similar.

The application of one risk score model to another population has also been examined to some extent. Kozarevic et al. compared samples from urban and rural areas participating in the Yugoslavia Cardiovascular Diseases Study to a contemporaneous cohort from Framingham. Multivariate logistic function coefficients were estimated and comparisons conducted. The researchers found a threefold increase in risk for Framingham over both urban and rural Yugoslav populations. Brand et al.19 compared earlier findings from the Framingham study to those of the Western Collaborative Group Study (WCGS), using published Framingham logistic coefficients to calculate probabilities of an event in WCGS. The investigators concluded that the risk calculated according to Framingham correlated well with actual risk. McGee et al.20 analysed data from five of the American Heart Association Pooling Project Studies15 to determine whether Framingham results could be used to predict CHD and death among these five studies. Goodness-of-fit statistics were calculated using the Framingham and specific-study models, and the fits for all six models were similar. Finally, Leaverton et al.21 used the first 10-year follow-up of the NHANES I cohort to examine whether the Framingham risk model for CHD mortality could be applied to other studies. They observed an increase in level of risk according to decile of risk, regardless of the study from which the model was derived.

In Denmark the Framingham coefficients have previously been compared with coefficients from one cohort in the Glostrup Population Studies (the cohort of individuals born in 1914).22 It showed that the Framingham coefficients generally tended to be higher than the coefficients from Glostrup. This is in line with this pooled analysis. The Copenhagen City Heart Study tested the Framingham Stroke risk-score and found that, in spite of similar stroke probabilities based on a point system from the two studies, a prognostic index could not be recommended for individual prediction because of large statistical uncertainty.23

None of the above-mentioned comparisons involved a systematic framework for judging the validity of the risk functions. Since original data were used in this study, it was possible to control for e.g. differences in age and risk factor distribution, which made it possible to examine the different dimensions of validity such as the ordering of the risk estimates, calibration and refinement.
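The Level 3 estimates in Tables 4 and 5 can be read with a little arithmetic. The sketch below assumes simple Wald z-tests (the paper does not state which test was used) and applies the fitted constant to a predicted 10-year risk of 20% to show what the miscalibration means at the treatment threshold. The function names are hypothetical.

```python
import math

def wald_p(estimate, se, null_value=0.0):
    """Return (z, two-sided P) for H0: parameter equals null_value."""
    z = (estimate - null_value) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def recalibrate(pred_risk, alpha):
    """Apply logit(p_new) = alpha + logit(p_pred), i.e. slope fixed at 1."""
    log_odds = math.log(pred_risk / (1.0 - pred_risk)) + alpha
    return 1.0 / (1.0 + math.exp(-log_odds))

# Estimates and standard errors from Tables 4 and 5.
print(wald_p(0.95, 0.09, null_value=1.0))  # slope, Framingham model in Glostrup
print(wald_p(-0.28, 0.09))                 # constant given slope = 1
print(wald_p(1.03, 0.10, null_value=1.0))  # slope, Glostrup model in Framingham
print(wald_p(0.35, 0.09))                  # constant given slope = 1

print(round(recalibrate(0.20, -0.28), 3))  # 0.159
print(round(recalibrate(0.20, 0.35), 3))   # 0.262
```

On this reading, a Glostrup participant given a 20% risk by the Framingham model would have a recalibrated risk of about 16%, while a Framingham participant given 20% by the Glostrup model would have a recalibrated risk of about 26%.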


Conclusion

The results of this analysis showed that a risk-score developed from a population with high risk ordered the individuals correctly when applied to a population with medium risk and vice versa. The relative risks in the two models did not differ significantly from each other. However, probably due to the differences in the incidence of the disease, the risk-score based on the Framingham Heart Study predicted a significantly higher level of risk when applied to the Danish population, and the Glostrup risk-score underestimated risk in Framingham. This suggests that using the Coronary Risk Chart on a Danish population, with a definition of high risk as above 20%, may reserve treatment for those at highest level of risk, but the risk among those treated will not necessarily be 20% or greater.

Acknowledgements

This work was partially funded by grants from NIH (HL 61769) and The Danish Heart Foundation (2-F-22518). Data from the Framingham Heart Study were obtained from the National Heart, Lung, and Blood Institute. The views expressed in this paper are those of the authors and do not necessarily reflect the views of this agency.

KEY MESSAGES

• The risk-score from the Framingham Heart Study is recommended for individual risk assessment throughout Europe.

• It has been shown that this risk-score overpredicts on low-risk populations.

• This study compares a Framingham risk-score using CHD mortality with a medium-risk European population (the Glostrup Population Studies).

• The Framingham risk-score in this study overpredicts on a Glostrup population and the Glostrup risk-score underpredicts on the Framingham population.

References

1 Wood D, De Backer G, Faergeman O, Graham I, Mancia G. Prevention of coronary heart disease in clinical practice. Atherosclerosis 1998;140:199–270.

2 Menotti A, Puddu PE, Lanti M. Comparison of the Framingham risk function-based coronary chart with risk function from an Italian population study. Eur Heart J 2000;21:365–70.

3 Sans S, Kesteloot H, Kromhout D. The burden of cardiovascular diseases mortality in Europe. Task Force of the European Society of Cardiology on Cardiovascular Mortality and Morbidity Statistics in Europe. Eur Heart J 1997;18:1231–48.

4 Anderson KM, Wilson PWF, Odell PM, Kannel WB. An updated coronary risk profile. A statement for health professionals. Circulation 1991;83:356–62.

5 Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation 1998;97:1837–47.

6 Schroll M, Jorgensen T, Ingerslev J. The Glostrup Population Studies, 1964–1992. Dan Med Bull 1992;39:204–07.

7 Kannel WB, McGee DL. Diabetes and glucose tolerance as risk factors for cardiovascular disease: the Framingham study. Diabetes Care 1979;2:120–26.

8 Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.

9 Miller ME, Hui SL, Tierney WM. Validation techniques for logistic regression models. Stat Med 1991;10:1213–26.

10 Tunstall-Pedoe H, Kuulasmaa K, Mahonen M, Tolonen H, Ruokokoski E, Amouyel P. Contribution of trends in survival and coronary-event rates to changes in coronary heart disease mortality: 10-year results from 37 WHO MONICA project populations. Monitoring trends and determinants in cardiovascular disease. Lancet 1999;353:1547–57.

11 van den Hoogen PC, Feskens EJ, Nagelkerke NJ, Menotti A, Nissinen A, Kromhout D. The relation between blood pressure and mortality due to coronary heart disease among men in different parts of the world. Seven Countries Study Research Group. N Engl J Med 2000;342:1–8.

12 Juel K, Sjol A. Decline in mortality from heart disease in Denmark: some methodological problems. J Clin Epidemiol 1995;48:467–72.

13 Andersen LB, Vestbo J, Juel K et al. A comparison of mortality rates in three prospective studies from Copenhagen with mortality rates in the central part of the city, and the entire country. Copenhagen Center for Prospective Population Studies. Eur J Epidemiol 1998;14:579–85.

14 Keys A, Aravanis C, Blackburn H et al. Probability of middle-aged men developing coronary heart disease in five years. Circulation 1972;45:815–28.

15 The Pooling Project. The Final Report of The Pooling Project. J Chronic Dis 1977;31:201–306.

16 Keys A, Aravanis C, Blackburn HW et al. Epidemiological studies related to coronary heart disease: characteristics of men aged 40–59 in seven countries. Acta Med Scand Suppl 1966;460:1–392.

17 Gordon T, Garcia-Palmieri MR, Kagan A, Kannel WB, Schiffman J. Differences in coronary heart disease in Framingham, Honolulu and Puerto Rico. J Chronic Dis 1974;27:329–44.

18 Menotti A, Keys A, Blackburn H et al. Comparison of multivariate predictive power of major risk factors for coronary heart diseases in different countries: results from eight nations of the Seven Countries Study, 25-year follow-up. J Cardiovasc Risk 1996;3:69–75.

19 Brand RJ, Rosenman RH, Sholtz RI, Friedman M. Multivariate prediction of coronary heart disease in the Western Collaborative Group Study compared to the findings of the Framingham study. Circulation 1976;53:348–55.

20 McGee D, Gordon T. The results of the Framingham Study applied to four other US based studies of cardiovascular disease. In: Kannel WB, Gordon T (eds). The Framingham Study. An Epidemiological Investigation of Cardiovascular Disease. DHEW Publication No. (NIH) 76–1083, 1976.

21 Leaverton PE, Sorlie PD, Kleinman JC et al. Representativeness of the Framingham risk model for coronary heart disease mortality: a comparison with a national cohort study. J Chronic Dis 1987;40:775–84.

22 Schroll M, Larsen S. A ten-year prospective study, 1964–1974, of cardiovascular risk factors in men and women from the Glostrup population born in 1914. Multivariate analyses. Dan Med Bull 1981;28:236–51.

23 Truelsen T, Lindenstrøm E, Boysen G. Comparison of probability of stroke between the Copenhagen City Heart Study and the Framingham Study. Stroke 1994;25:802–07.

© International Epidemiological Association 2001

International Journal of Epidemiology 2001;31:822–824

Commentary: The prediction of coronary heart disease risk in individuals— an imprecise science Peter Brindle and Margaret May

The paper by Thomsen and colleagues1 tells us that we should be cautious about applying a clinical decision rule developed in one population to another without first assessing its accuracy in that population. Unfortunately in the case of coronary risk prediction that is what has already happened. Coronary risk prediction using multiple risk factors has evolved as a method of helping clinicians prioritize prevention measures in patients who do not yet have overt cardiovascular disease. In the case of statins, restricting treatment to those individuals above an arbitrary level of risk, say 3% annual coronary heart disease (CHD) risk,2 has also provided a mechanism of limiting the costs of treating all those who might benefit (annual CHD risk ù0.6%) according to trial evidence.3 As treatment decisions are based on estimated levels of absolute risk, it is important that these estimates be as accurate as possible. The Framingham risk equations4 and the clinical decision rules (CDR) that have been derived from them are proposed as the mainstay of primary prevention of CHD in Europe.5 There are three steps in the development of a CDR: deriving the rule, testing the rule, and assessing the impact of the rule on clinical behaviour.6 Until the publication of Thomsen’s paper, which is Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK.

concerned with the second step of this CDR development process, and despite there being evidence that the Framingham risk equations overestimate risk in southern European countries,7 there has been no study that has adequately investigated the validity of any Framingham-based risk equation in a northern European country. When testing the performance of a clinical prediction model in different populations, the accuracy of the predicted probability has two components (calibration and discrimination) which both need to be assessed.8 A well-calibrated model has predictions that are neither too high nor too low i.e. the baseline risk is correctly assessed. A model that discriminates well ranks individual risk in the correct order, i.e. it has high sensitivity and specificity. Using both validation criteria, the three studies that are usually cited as evidence of Framingham’s validity in northern European populations do not provide sufficiently robust evidence upon which the CHD primary prevention programme of Europe should be based. In a theoretical modelling exercise, Haq et al. concluded that there was moderate agreement between Framingham and other northern European functions in the prediction of risk for individuals but they did not test whether any of the functions predicted observed events accurately.9 In an evaluation by the West of Scotland Coronary Prevention Study Group it was

GLOSTRUP AND FRAMINGHAM RISK-SCORES

simply stated that the observed incidence of CHD events in the placebo arm of a statin trial ‘was close to that predicted by the Framingham regression function’.10 However, no numerical comparisons were supplied and no test of discrimination was performed. In contrast to the other two studies, Ramachandran et al. did use observed events in a non-trial population.11 They concluded that although there was no significant difference between the observed and predicted event rate in the higher risk population (.1.5% per year), Framingham underestimated the event rate in those at lower risk. Direct comparisons with Framingham were difficult as smoking was classified differently and for a rule designed to stratify individuals to different levels of risk, absence of any assessment of discriminatory ability left this attempt at validation lacking. Thomsen and colleagues have been more complete in their assessment of the generalizability of a Framingham model. They have shown by using observed events, a Framingham model performs adequately compared with the Glostup model in the ranking of individuals and in the comparison of the relative risks of the well-known risk factors. However, the Framingham model consistently over-predicted risk in this Danish population because of the different baseline survival rates of the two populations. The question that primary care practitioners in northern Europe want to know is ‘do our Framingham-based decision aids predict accurately in our practice population those individuals above and below a risk threshold for treatment?’ Thomsen and colleagues do not claim to answer this question, and they cannot do so for three main reasons. First, the Framingham data they used are different from those used in the derivation of the current risk scoring methods. Second, Thomsen defined ‘CHD death’ as the outcome and not ‘CHD events’ that is the endpoint used by the clinical prediction tools. 
Third, different statistical techniques were used in the original Framingham derivation and in the models developed for this validation exercise.

Since the baseline risk of CHD varies between countries, there can be no single risk assessment equation that is valid throughout Europe. Even within a country important variation in CHD rates can occur.12 Furthermore, the incidence of CHD in parts of the developed world has declined considerably since the 1970s, so any prediction method based on ‘old’ data will usually over-predict risk in an individual today. There is also additional uncertainty with patients who, unlike the Framingham study population, are not white, are not between the ages of 30 and 74 or have a family history of CHD. Moreover, clinicians and patients are unaware of the precision of the risk estimates as no confidence intervals are given around the predicted risks.

The accuracy of the Framingham-based decision aids is further reduced because they are difficult to use in practice. McManus et al. found that only a fifth of patient records had all the information required to assess CHD risk and, even when the information was complete, risk calculations made by general practitioners and practice nurses were only moderately accurate when compared to the ‘gold standard’, which the authors assumed to be the Framingham risk function.13 The authors concluded that adequate training is required to use these risk calculation rules, but even experts make mistakes. For example, authors of the British recommendations state that an absolute
risk of non-fatal myocardial infarction or coronary death of ≥30% over 10 years should be identified and treated with statins. However, they appeared to be unaware that the Framingham equations used in CDR predict a much wider outcome (i.e. coronary death, clinical non-fatal myocardial infarction, electrocardiographic myocardial infarction, physician assessed angina, and coronary insufficiency) which includes at least 50% more events, thereby increasing the absolute rate by the same amount.14 A correction has now been issued.15 Also, recent guidance issued to all general practitioners in the UK on how to use the coronary risk prediction charts accurately16 stated that smokers who had quit in the last 5 years should be regarded as current smokers in the risk calculation. The definition of current smokers in the Framingham study, upon which these coronary risk prediction charts are based, includes those who have quit in the last year but not any earlier.17

Basing the primary prevention of CHD in individuals on Framingham-derived risk estimates may not be as accurate or as easy as it is often thought to be. Despite the growing safety record and anticipated falling prices of statins, some appropriate measure of absolute risk will still remain useful to calculate the numbers needed to treat and for the evaluation of the cost-effectiveness of interventions. However, it is becoming clearer that a single risk function derived in one place at a particular time may not be applicable elsewhere. There is a need for more subtle and, perhaps, locally derived prediction rules which can be adjusted for geographical and temporal factors and are easy to apply in practice.
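
The two validation criteria discussed above, and the idea of adjusting a risk function for a different baseline risk, can be illustrated with a minimal sketch in Python. It is not the method used by Thomsen and colleagues or by any of the studies cited: the ten predicted risks and outcomes are invented for illustration, and the function names are hypothetical. Discrimination is summarized here by the AUROC and calibration by the ratio of observed to expected events; the single shift on the logit scale ('recalibration-in-the-large') is one simple adjustment among several possible ones.

```python
import math

def auroc(risks, events):
    """Discrimination: chance that a randomly chosen case has a higher
    predicted risk than a randomly chosen non-case (ties count as 0.5)."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def observed_expected_ratio(risks, events):
    """Calibration: observed events divided by the sum of predicted risks.
    A ratio below 1 means the model over-predicts in this population."""
    return sum(events) / sum(risks)

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate_in_the_large(risks, events, tol=1e-10):
    """Find one shift on the logit scale so that the mean recalibrated risk
    equals the observed event proportion (solved by bisection).
    Only the baseline risk is changed; the ranking is untouched."""
    target = sum(events) / len(events)
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean = sum(inv_logit(logit(r) + mid) for r in risks) / len(risks)
        if mean > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Hypothetical data: predicted risks from an 'imported' risk function and
# the observed outcomes (1 = event) in the local population.
risks = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
events = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]

shift = recalibrate_in_the_large(risks, events)
recalibrated = [inv_logit(logit(r) + shift) for r in risks]

print("AUROC before:", auroc(risks, events))
print("O/E before:  ", observed_expected_ratio(risks, events))
print("logit shift: ", shift)
print("AUROC after: ", auroc(recalibrated, events))
print("O/E after:   ", observed_expected_ratio(recalibrated, events))
```

In this invented example the imported function ranks individuals reasonably well but predicts more events than occur, the pattern reported for the Framingham model in the Danish sample. Shifting the baseline risk restores agreement between observed and expected events while leaving the ranking, and therefore the AUROC, unchanged.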

References

1 Thomsen TF, McGee D, Davidsen M, Jørgensen T. A cross-validation of risk-scores for CHD mortality based on data from the Glostrup Population Studies and Framingham Heart Study. Int J Epidemiol 2002;31:817–22.

2 Department of Health. National Service Framework for Coronary Heart Disease. London: Department of Health, 2000.

3 Downs JR, Clearfield M, Weis S et al. Primary prevention of acute coronary events with lovastatin in men and women with average cholesterol levels: results of AFCAPS/TexCAPS. Air Force/Texas Coronary Atherosclerosis Prevention Study. JAMA 1998;279:1615–22.

4 Anderson KM, Odell PM, Wilson PW, Kannel WB. Cardiovascular disease risk profiles. Am Heart J 1991;121:293–98.

5 Wood D, De Backer G, Faergeman O, Graham I, Mancia G, Pyorala K. Prevention of coronary heart disease in clinical practice: recommendations of the Second Joint Task Force of European and other Societies on Coronary Prevention. Atherosclerosis 1998;140:199–270.

6 McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG, Richardson WS for the Evidence-Based Medicine Working Group. User’s Guides to the Medical Literature. XXII: How to use articles about clinical decision rules. JAMA 2000;284:79–84.

7 Menotti A, Puddu PE, Lanti M. Comparison of the Framingham risk function-based coronary chart with risk function from an Italian population study. Eur Heart J 2000;21:365–70.

8 Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130:515–24.

9 Haq IU, Ramsay LE, Yeo WW, Jackson PR, Wallis EJ. Is the Framingham risk function valid for northern European populations? A comparison of methods for estimating absolute coronary risk in high risk men. Heart 1999;81:40–46.

10 The West of Scotland Coronary Prevention Study Group. Baseline risk factors and their association with outcome in the West of Scotland Coronary Prevention Study. Am J Cardiol 1997;79:756–62.

11 Ramachandran S, French JM, Vanderpump MP, Croft P, Neary RH. Using the Framingham model to predict heart disease in the United Kingdom: retrospective study. BMJ 2000;320:676–77.

12 Morris R, Whincup PH, Lampe F, Walker M, Wannamethee G, Shaper AG. Geographical variation of incidence of coronary heart disease in Britain: the contribution of established risk factors. J Epidemiol Community Health 2000;54:787–88.

13 McManus RJ, Mant J, Meulendijks CFM et al. Comparison of estimates and calculations of risk of coronary heart disease by doctors and nurses using different calculation tools in general practice: cross sectional study. BMJ 2002;324:459–64.

14 Lampe FC, Walker M, Shaper AG, Brindle PM, Whincup PH, Ebrahim S. Endpoints for predicting coronary risk must be clarified. BMJ 2001;323:396.

15 Correction. Joint British recommendations on prevention of coronary heart disease in clinical practice: summary. BMJ 2001;323:780.

16 British Heart Foundation. How to use the coronary risk prediction charts for primary prevention. Factfile 01/2002.

17 Anderson KM, Wilson PWF, Odell PM, Kannel WB. An updated coronary risk profile—a statement for health professionals. Circulation 1991;83:356–62.