British Journal of Anaesthesia 114 (1): 32–43 (2015) Advance Access publication 17 September 2014 . doi:10.1093/bja/aeu294
Predicting perioperative mortality after oesophagectomy: a systematic review of performance and methods of multivariate models
I. Warnell1*, M. Chincholkar2 and M. Eccles3
1 Department of Anaesthesia, Royal Victoria Infirmary, Newcastle upon Tyne NHS Foundation Trust, Queen Victoria Road, Newcastle upon Tyne NE1 4LP, UK
2 Department of Anaesthesia, Salford Royal NHS Foundation Trust, Stott Lane, Salford M6 8HD, UK
3 Institute of Health and Society, Newcastle University, The Baddiley-Clark Building, Richardson Road, Newcastle upon Tyne NE2 4AX, UK
* Corresponding author. E-mail:
[email protected]
Editor’s key points
† The authors systematically reviewed the prediction of mortality risk after oesophagectomy for cancer.
† They found generally unsatisfactory performance in commonly used models, and recommend further work in developing and validating new prediction models via large data sets.
Summary. Predicting risk of perioperative mortality after oesophagectomy for cancer may assist patients to make treatment choices and allow balanced comparison of providers. The aim of this systematic review of multivariate prediction models is to report their performance in new patients, and compare study methods against current recommendations. We used PRISMA guidelines and searched Medline, Embase, and standard texts from 1990 to 2012. Inclusion criteria were English language articles reporting development and validation of prediction models of perioperative mortality after open oesophagectomy. Two reviewers screened articles and extracted data for methods, results, and potential biases. We identified 11 development, 10 external validation, and two clinical impact studies. Overestimation of predicted mortality was common (5–200% error), discrimination was poor to moderate (area under the receiver operating characteristic curve ranged from 0.58 to 0.78), and reporting of potential bias was poor. There were potentially important case mix differences between modelling and validation samples, and sample sizes were considerably smaller than is currently recommended. Steyerberg and colleagues’ model used the most ‘transportable’ predictors and was validated in the largest sample. Most models have not been adequately validated and reported performance has been unsatisfactory. There is a need to clarify definition, effect size, and selection of currently available candidate predictors for inclusion in prediction models, and to identify new ones strongly associated with outcome. Adoption of prediction models into practice requires further development and validation in well-designed large sample prospective studies.
Keywords: oesophagectomy; postoperative complications, mortality; risk assessment
The UK government has put the provision of information to facilitate patient choice of treatment and provider at the centre of its vision for the NHS.1 2 For oesophagectomy, perioperative morbidity and mortality rates are likely to feature in this information, as reported in-hospital mortality is around 5%,3 4 major complication rates are up to 60%, and there is a possibility of reduced quality of life in the postoperative period.5 Unadjusted mortality rates for individual surgeons who carry out oesophagectomy are also now publicly available.6 Risk prediction models may allow a risk-stratified and more suitable comparison of service providers and may also assist individual choice of treatment. However, these models can only be considered for general use if they have been shown to be reliable, can contribute clinical benefit to patient care, and are ‘transportable’ to new settings.7 8 Currently available prediction models of perioperative mortality for oesophagectomy are not widely used because it is not clear that they fulfil the above criteria.
Clinicians assess a range of potential comorbidities when providing prognostic information, and successful prediction models should probably also reflect the multifactorial nature of outcome prediction.9 In this review, we therefore focus on the multivariate models that have been used for this purpose. In a descriptive review of some models, Shende and colleagues10 reported poor validation and performance, and Dutta and colleagues reported overestimation of mortality in a quantitative data synthesis of POSSUM (Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity)11 models in a mixed gastric and oesophageal cancer cohort.12 To our knowledge, there are no current systematic reviews of methodology and performance of available prediction models of perioperative mortality after oesophagectomy. The methods for studying and reporting multivariate prediction models have been well described,7 13–15 as have
& The Author 2014. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email:
[email protected]
BJA
Predicting mortality after oesophagectomy
causes of poor performance.15 In this systematic review, we aim to report the performance of currently available clinical multivariate prediction models and to report recognized sources of methodological bias, which could contribute to impaired performance.
Methods
This systematic review was carried out in accordance with guidelines published in the PRISMA statement.16
Inclusion criteria for primary studies
Studies of development, validation in new patient groups, or clinical impact of multivariate prediction models of perioperative mortality were included. The study population included adult patients who underwent elective open surgical resection of oesophageal cancer. Studies of laparoscopic, thoracoscopic, minimally invasive, and endoscopic techniques were excluded. Perioperative mortality was defined as ‘all cause’ mortality associated with the hospital admission for oesophagectomy (‘in-hospital’ mortality), or 30 day ‘all cause’ mortality.
Selection filters
Reported perioperative mortality from oesophagectomy was 72% in 1941,17 and has since decreased to 2.9%.18 This trend has been observed across European, American, and Far Eastern centres.19–24 This review was intended for contemporary practice; therefore, we only included studies that were published after 1990. Improved outcome has also been associated with ‘higher volume’ centres;19 22 25–28 therefore, we included only studies from ‘high volume single centres’ or results from large databases. ‘High volume’ was defined as 10 or more cases annually, based on approximating Killeen and colleagues’25 definition of eight or nine cases required annually to reduce mortality by one case per year. Annual volume was estimated by dividing the reported total operating load by the duration of the study period. Studies were confined to English language reports.
Search strategy
Medline and Embase were searched from 1990 to 2012, and hand searches were made of reference lists from primary research studies, review articles,10 29 and standard texts.30 The search strategy used the ‘AND’ logical operator to combine population definition (e.g. oesophagectomy), study type (e.g. cohort study), and a combination of outcome (e.g. mortality) ‘OR’ prognostic testing (e.g. prediction). The full search strategy is available in Supplementary material.
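The ‘high volume’ selection filter described under Selection filters can be illustrated with a short sketch. The threshold of 10 cases per year and the division of total operating load by study duration come from the text; the function names, the inclusive counting of calendar years, and the two example centres are assumptions made for illustration and are not taken from any included study.

```python
# Sketch of the 'high volume' selection filter.
HIGH_VOLUME_THRESHOLD = 10  # cases per year, as defined in this review


def annual_volume(total_cases, first_year, last_year):
    """Estimate annual volume as total operating load / study duration.

    Assumption: the study period is counted in whole calendar years,
    inclusive of both the first and the last year.
    """
    duration_years = last_year - first_year + 1
    if duration_years <= 0:
        raise ValueError("last_year must not precede first_year")
    return total_cases / duration_years


def is_high_volume(total_cases, first_year, last_year):
    """True if the estimated annual volume meets the review's threshold."""
    return annual_volume(total_cases, first_year, last_year) >= HIGH_VOLUME_THRESHOLD


# Hypothetical centres, for illustration only.
print(is_high_volume(240, 2000, 2009))  # 24 cases per year
print(is_high_volume(45, 2000, 2009))   # 4.5 cases per year
```

How partial years at the start or end of a study period are counted changes the estimate, so a centre close to the threshold could be classified either way.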
Study selection and data extraction
Two reviewers (I.W. and M.C.) screened titles and abstracts from potentially relevant studies and examined full-text versions of selected articles for inclusion criteria. The selection process is summarized in Figure 1. Data items were extracted into an Excel spreadsheet by one reviewer (I.W.) and validated by the second (M.C.); ‘potential for bias’ items were extracted and compared independently by both reviewers. Disagreements were resolved by consensus. The following study characteristics were extracted: study period, geographical location, data source (e.g. population database, clinical centre), modelling and validation methods, sample size, case mix descriptors (e.g. surgical procedure, tumour histology), perioperative mortality definition, and individual predictor descriptions. We also extracted performance items which measured how accurately outcome was predicted (calibration) and how well models could discriminate between survivors and non-survivors (discrimination).

Fig 1 Flow chart of selection process for included studies.
17 939 studies from Medline and Embase
After deduplication: 13 744 studies from Medline and Embase
526 abstracts from database searches and reference lists from other sources
137 full-text articles retrieved for full examination (studies of multivariate prediction models included; studies of effects of individual candidate predictors excluded)
117 studies did not fulfil inclusion criteria
20 studies fulfilled inclusion criteria: 11 clinical prediction models, 10 validation studies, 2 clinical impact studies
Note: 23 separate studies were reported in 20 articles
Potential sources of bias in primary studies
Items used to assess potential for bias in clinical prediction models were adapted from Hayden and colleagues’14 study of the reporting of potential risk of bias in systematic reviews of prognostic studies. Items relating to confounding variables were not included as the selection of candidate predictors for inclusion into models was not the subject of this review (Table 1).
Quantitative data synthesis
We considered attempting to synthesize summary statistics of discrimination and calibration, which could be applied generally to other populations. However, the variety of study designs, case mixes, and reported summary statistics would have resulted in very few relevant data points, for which a summary statistic may have been inappropriate and misleading.
Results
Included studies
Twenty studies met the inclusion criteria (Table 2). Eleven studies developed clinical prediction models,4 31–40 ten31 41–49
Table 1 Extracted items for main areas of potential bias. Methodology adapted from Hayden and colleagues.14 Scoring criteria: M, fully met; P, partially met; N, not met; U, unclear or unknown; NA, not applicable
Main category of potential bias | Scoring items to assess potential for bias | Scoring method
The sample adequately represents the population of interest | Reported exclusions from surgery in eligible patients (e.g. unfit for surgery) | Excluded cases described and quantified, M; reported but not quantified or reasons not given, P; not reported, N; unclear, U
The sample adequately represents the population of interest | Sample data include all patients undergoing oesophagectomy during the reporting period | All oesophagectomies included, M; reasons for exclusion from sample reported, P; otherwise U
The sample adequately represents the population of interest | Sample characteristics are described adequately to apply them to the population of interest, e.g. age, gender, tumour histology and stage, surgical procedure, surgical operative volume, geographical location, period of study, overall study mortality rate | All characteristics described, M; partially described, P; not described, N; unclear, U
The data represent the sample | Follow-up rate is reported | Number of survivors and deaths separately reported, M; follow-up rate deducible from article, P; unreported, or unknown, U
The data represent the sample | Prospective (e.g. database) or retrospective (e.g. clinical record review) data collection | Prospective, P; retrospective, R; unclear or unknown, U
The data represent the sample | Evidence of data validation | Data audit or double entry described, M; partial validation, e.g. data cross-checked with more than one database, P; not stated or not done, N; unclear, U
The data represent the sample | Missing values reported | Missing values reported, M; deducible from article or partially stated, P; no report or unclear, U
The data represent the sample | Description of missing value handling | No missing values or acceptable missing value procedure reported, M; some information given, P; no report or unclear, U
Transportable predictors to new patient group (clearly defined and easily and reliably predictor) | Adequate description of predictor | Predictor defined, M; some predictors described and or ‘transportable’, P; predictors not defined, N; unclear, U
Outcome adequately measured | Outcome defined | Period of follow-up to perioperative mortality clearly defined, M; deducible from text, P; not stated, N; unclear, U
Appropriate data analysis | Description of appropriate statistical model | Selection of statistical model and variables is appropriate, M; inappropriate model, N; unclear, U. For validation models: discrimination and calibration reported, M; some elements of above, P; unclear, unavailable, U
Appropriate data analysis | Continuous variables handled appropriately | Continuous variables used, M; predefined cut points with rational basis, P; ‘data-driven’ cut points, N; unclear, U
Appropriate data analysis | Sufficient information given to assess adequacy of analysis | Adequate model description and presentation of results, M; model described but incomplete details, P; inadequate information or unclear, U
Appropriate data analysis | Adequate sample size | At least 10 outcome events for each predictor in regression models, M; sample too small, N; unclear, U
Table 2 Characteristics of included studies. SEER, Surveillance, Epidemiology and End Results Medicare database; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program
Author, study period | Study design | Total sample size | Geographical location | Source of data
Steyerberg31 1991–2002 | Prediction model and validation | n=3592 | USA/The Netherlands | Population databases and clinical centre. Modelling sample: USA SEER (1991–6) database. Validation sample: USA SEER (97–99), Eindhoven registry (1993–2001), Rotterdam hospital (1980–2002)
Ra32 1997–2003 | Prediction model and internal validation | n=1162 | USA | Population database, SEER Medicare database
Tekkis33 1994–2000 | Prediction model and internal validation | n=1042 (538 oesophagectomies) | UK | Regional and national clinical databases (36 centres) of gastrectomy and oesophagectomy
Bartels34 1982–1996 | Prediction model, internal validation, and clinical application | n=805 | Germany | Single centre
McCulloch4 1999–2002 | Prediction model and internal validation | n=995 | UK | Subset of ASCOT National database (multiple centres reporting gastric and oesophageal surgery)
Bailey35 1991–2000 | Prediction model and internal validation | n=1777 | USA | Population database. Data submitted from 109 Veterans Affairs medical centres, USA
Law36 1982–1992 | Prediction model | n=523 | Hong Kong | Single centre
Liu37 1994–7 | Prediction model | n=32 | Australia | Single centre
Sanz38 1987–1999 | Prediction model | n=114 | Spain | Single centre
Dhungel39 2005–8 | Prediction model | n=1032 | USA | ACS-NSQIP database
Zhang40 1986–9 | Prediction model, validation, and clinical application | n=162 | Japan | Single centre
Schroder41 1997–2002 | Predictor effect study, external validation | n=126 | Germany | Single centre
Lai42 2001–5 | External validation | n=545 | Hong Kong | Administrative database (data submitted from 14 centres)
Nagabhushan43 1990–2002 | External validation | n=313 | UK | Single centre
Lagarde44 1993–2005 | External validation | n=663 | The Netherlands | Single centre
Zafirellis45 1990–9 | External validation | n=204 | UK | Single centre
Zingg46 1990–2007 | External validation | n=346 | Australia, The Netherlands, Switzerland | Two centre
Bosch47 1991–2007 | External validation | n=280 | The Netherlands | Single centre
Ball48 2 yr period | External validation | n=53 | UK | Two centre
Dutta49 2005–9 | External validation | n=121 | UK | Single centre
evaluated prediction models on new data sets (external validation), and two34 40 reported clinical impact studies.
Excluded studies
The search strategy retrieved many studies of individual candidate predictor effects (e.g. age), but these were excluded from this review. Studies of thoracoscopic procedures were excluded,50 as were studies of mixed surgical caseloads if summary statistics and results for oesophagectomies were unavailable.51–53 Studies with predictors measured after operation54 or with unclearly reported perioperative mortality were also excluded.55–59
Clinical prediction models
Four models were developed on data from the USA,31 32 35 39 two from the UK,4 33 and one each from Germany,34 Spain,38 Hong Kong,36 Australia,37 and Japan (Table 3).40 Six models were developed on data from medium to large databases.4 31–33 35 39 Bailey and colleagues,35 Ra and colleagues,32 and Steyerberg and colleagues31 used data from US population databases. Bailey and colleagues35 used 1777 records of the Veterans Affairs National Surgical Improvement Program. Ra and colleagues32 and Steyerberg and colleagues31 used 1172 and 1327 records, respectively, from the Surveillance, Epidemiology and End Results (SEER) Medicare database.
Table 3 Methods, performance, and predictors used in development of clinical prediction models of perioperative mortality after oesophagectomy. H–L, Hosmer–Lemeshow; O:E, observed to expected ratio; CCI, Charlson comorbidity index;60 BUN, blood urea nitrogen
Study | Modelling method | Sample size (n), number of deaths | Validation results (2 decimal places) | Predictors included in model
Steyerberg and colleagues31 | Logistic regression, bootstrap internal validation. Generation of simple risk score | n=1327, 147 deaths | Discrimination: area under ROC in modelling cohort 0.66; 0.65 on internal validation | Age categories (50–65, 66–80, >80), comorbidities (cardiorespiratory, diabetes, hepatic, renal), neoadjuvant therapy, hospital surgical volume
Ra and colleagues32 | Multivariate logistic regression. Generation of six-point risk score | n=1172, 160 deaths | Predicted and observed mortality reported for modelling sample. Predicted high-risk group 29.8% vs observed 22% (sparse data) | Age over 80, modified Charlson score,60 hospital surgical volume
Tekkis and colleagues33 | Multiple logistic regression on 70% of sample. Individual centres accounted for in multilevel model (m). Validation on 30% and comparison with P-POSSUM | n=1042, 125 deaths. Combined sample of oesophagectomy (538) and gastrectomy | Discrimination: C-index: P-POSSUM 0.74; O-POSSUM 0.75; multilevel O-POSSUM (m) 0.80. Calibration: H–L, P value: P-POSSUM, P<0.01; O-POSSUM, P=0.23; O-POSSUM (m), P=0.25. O:E ratio: P-POSSUM 1.21; O-POSSUM 1.03; O-POSSUM (m) 1.04 | Physiological POSSUM, age, urgency, surgical procedure, POSSUM tumour stage
Bartels and colleagues34 | Modelling (1982–1991), validation (1992–3), clinical application (1994–6). Predictors stratified and modelled (discriminant analysis) against outcome (‘normal’, ‘prolonged’, ‘severe’, ‘fatal’) | Modelling: n=432, 43 deaths. Validation: n=121, 9 deaths. Application: n=252, 4 deaths | Model: low, moderate, and high-risk groups for 30 day mortality (3.6%, 8.7%, and 28%). Validation: predicted risk groups of 2%, 5%, and 25%. Clinical practice: reduction in mortality from 9.4% to 1.6% after application | Karnofsky index,61 spirometry, arterial PO2, aminopyrine breath test, cirrhosis, cardiac function (cardiologist impression)
McCulloch and colleagues4 | Logistic regression. Validation on mixed oesophagogastric sample from final year of study | Modelling: n=773, about 97 deaths. Validation: n=222, about 16 deaths | Discrimination: C-index 0.79 in modelling sample and 0.68 in validation. Calibration: O:E ratio 1.04 (H–L, P=0.5) in modelling; 0.82 (H–L, P=0.49) in validation | Physiological POSSUM, surgeon’s assessment of fitness for surgery (‘fit’, ‘significant comorbidity’, ‘comorbidity serious risk to postoperative survival’), tumour stage, operation
Bailey and colleagues35 | Multivariate logistic regression | n=1777, 174 deaths | Discrimination: C-index 0.69 in modelling sample. Calibration: H–L (P=0.93) in modelling sample | Age, diabetes, ‘functional’ status, neoadjuvant, BUN, alcohol intake, ascites, alkaline phosphatase
Law and colleagues36 | Discriminant analysis to select risk predictors. Three level risk (7%, 30%, and 38% mortality) score | n=523, 81 deaths | Sensitivity 0.72, specificity 0.74, overall accuracy 0.74 in modelling sample | Age, mid-arm circumference, operative blood loss, spirometry, abnormal chest X-ray, curative vs palliative procedure
Liu and colleagues37 | Multiple regression; stratified three levels of risk (mortality 50%, 27%, and 8%). (Sample of 32 selected from total 70) | n=32, 8 deaths | No validation | Hypertension, smoking, spirometry
Sanz and colleagues38 | Discriminant analysis to generate three level mortality risk: ‘low’ (6.8%), ‘intermediate’ (12.5%), ‘high’ (50%) | n=114, 14 deaths | No validation | Previous cancer, cirrhosis, abnormal spirometry, cholesterol, albumin
Dhungel and colleagues39 | Multivariate logistic regression | n=1032, 30 deaths | Not reported | Diabetes, dyspnoea, age
Zhang and colleagues40 | Logistic regression to develop risk score (1986–1990). Validation sample from same centre 1990–1 | Modelling: n=100, 13 deaths. Validation: n=62, 2 deaths | Modelling: sensitivity 0.75, specificity 0.99. Validation: sensitivity 0.33, specificity 0.98 | Oral glucose tolerance test, tumour stage, age, abnormal ECG, creatinine clearance, surgical procedure
Tekkis and colleagues33 developed the O-POSSUM from 1042 records (538 oesophagectomies) and McCulloch and colleagues4 used 995 from the UK ASCOT database and the Risk Scoring Collaborative. Bartels and colleagues34 used 432 records, Law and colleagues36 used 523, Sanz and colleagues38 used 114, and Liu and colleagues37 used 32 in single-centre studies.
The outcome event was ‘in hospital’ mortality in four studies,4 33 37 38 30 day mortality in two,31 35 both of these in two,32 36 30 and 90 day mortality in Bartels and colleagues’ model,34 and 45 day mortality in Zhang and colleagues’ model.40 Mortality was not ‘time defined’ by Dhungel and colleagues.39
There was considerable variation in candidate predictor representation. For example, age was coded as a continuous variable,33 ordered age group categories,31 and an octogenarian subgroup.32 Nutritional state was represented by weight loss,34 serum albumin,34 38 and skin fold thickness.36 Some comorbidity was also included within composite general health or comorbidity scores such as the Karnofsky34 61 or Charlson32 60 scores. Some scores were entirely subjective classifications, for example, ‘physician assessment of cardiac risk’,34 surgeon classification of fitness for surgery,4 and some items from the POSSUM scoring systems (e.g. the radiological and respiratory comorbidity scores).
Models were developed using logistic regression by all studies except for three, which used discriminant analysis.34 36 38 All studies except Steyerberg and colleagues’31 used some ‘data driven’ statistical methods to select predictors for the final model. These included univariate selection of statistically significant predictors,32 33 36 39 40 ‘stepwise’ elimination in logistic regression,4 35 and ‘data driven’ cut-offs to define predictors.34 36 38 Steyerberg and colleagues,31 Ra and colleagues,32 and Bartels and colleagues34 used logistic regression equations to create simplified scoring systems.
The larger database studies reported about 10 or more deaths for each predictor screened.4 31 – 33 The exception was Bailey and colleagues,35 who used stepwise regression to select from 122 candidate predictors in a model with about 170 fatalities. Smaller studies from single centres had much lower event to predictor ratios,34 37 38 40 and therefore were more susceptible to overfitting. Investigators used a variety of methods to test robustness of models on the development data sets. Steyerberg and colleagues31 used bootstrap methods and Ra and colleagues,32
Bailey and colleagues,35 and Law and colleagues36 examined model fit on the development data (‘apparent’ validation).62 Tekkis and colleagues33 and McCulloch and colleagues4 used split samples for development and validation. Bartels and colleagues34 and Zhang and colleagues40 prospectively validated their models on patients from subsequent periods. Liu and colleagues37 and Sanz and colleagues38 did not formally examine model performance. In development studies, discrimination was moderate, with area under the receiver operating characteristic (ROC) curve ranging from 0.65 (Steyerberg and colleagues31) to 0.797 (Tekkis and colleagues33). Steyerberg and colleagues31 reported good calibration, but generally predictions were reported to overestimate mortality.4 32 33
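Discrimination, as summarized by the area under the ROC curve (C-index), can be read as the probability that a randomly chosen non-survivor was assigned a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison. The function name and the eight-patient data set are invented for illustration and do not come from any study in this review.

```python
from itertools import product


def c_index(predicted_risk, died):
    """Area under the ROC curve computed as a concordance probability.

    For every (non-survivor, survivor) pair, score 1 if the non-survivor
    was given the higher predicted risk, 0.5 for a tie, and 0 otherwise.
    """
    deaths = [p for p, d in zip(predicted_risk, died) if d]
    survivors = [p for p, d in zip(predicted_risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p_death, p_surv in product(deaths, survivors):
        if p_death > p_surv:
            score += 1.0
        elif p_death == p_surv:
            score += 0.5
    return score / (len(deaths) * len(survivors))


# Invented predicted risks and outcomes for eight patients (1 = died).
risk = [0.02, 0.05, 0.05, 0.10, 0.12, 0.20, 0.30, 0.40]
died = [0, 0, 1, 0, 0, 1, 0, 1]
print(round(c_index(risk, died), 2))  # 0.7 for this toy data
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation. With only three deaths, as here, the estimate rests on 15 pairs, which illustrates why small validation samples give imprecise discrimination estimates.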
Studies of model performance in new populations (external validation)
Ten authors validated prediction models in new patient samples (Table 4). POSSUM-based models were the most extensively studied, but the models of Ra and colleagues,32 Bartels and colleagues,34 and Steyerberg and colleagues31 have also been validated. A variety of performance measurements were reported including overall observed to expected mortality ratio,42 44 48 standardized mortality ratios,43 45 47 49 tabulated or graphical calibration of predicted risk levels,41–45 47 and goodness of fit statistics.42–45 47 Discrimination was assessed most frequently using ROC curves.42–45 47 49 Zingg and colleagues46 reported summary statistics from a logistic regression of the predicted mortality from original models against observed mortality for various outcomes in four different models, but did not report standard values for calibration or discrimination. Validation study sample sizes were small, containing between five (Dutta and colleagues49) and 32 (Nagabhushan and colleagues43) fatalities. Only Steyerberg and colleagues31 validated a model on a large data set (291 deaths). Seven studies evaluated ‘POSSUM’ models.42–45 47–49 Overestimation was common in all POSSUM models, but the P-POSSUM generally performed best with prediction errors ranging from 5% underestimate to 40% overestimate.33 42 43 47 49 The O-POSSUM overestimation ranged up to 200%.42–45 47 49 Discrimination was moderate and ranged from 0.6 (Lagarde and colleagues44) to 0.776 (Lai and colleagues42). Schroder and colleagues41 evaluated Bartels and colleagues’ model34 on 126 patients. Discrimination and calibration were not formally studied, but Schroder and colleagues’ predicted ‘high’ risk group observed 16.7% mortality, lower than
Table 4 External validation studies: methods and results (rounded two decimal places). SMR, standardized mortality ratio; H–L, Hosmer–Lemeshow goodness of fit; O:E, observed to expected ratio; CCI, Charlson comorbidity index; ACCI, age adjusted Charlson comorbidity index63
Study | Study design | Sample size (n) and number of deaths | Calibration | Discrimination
Lai and colleagues42 | Comparison of O-, P-, and original POSSUM | n=545, 30 deaths (5.5%) | Overall predicted mortality and χ2 lack of fit (P-value): POSSUM 15% (<0.01); O-POSSUM 10.9% (<0.01); P-POSSUM 4.7% (0.81). Note: All overpredicted over whole risk range but P-POSSUM most accurate | Area under the ROC curve: POSSUM 0.78; P-POSSUM 0.78; O-POSSUM 0.68
Nagabhushan and colleagues43 | Comparison of O- & P-POSSUM | n=313, 32 deaths | SMR (P-value for H–L goodness of fit): P-POSSUM 0.89 (P=0.02); O-POSSUM 0.65 (P=0.01). Note: Calibration over 6 predicted levels: <5 deaths in three highest risk groups. All failed to predict accurately | Area under the ROC curve: P-POSSUM 0.68; O-POSSUM 0.61
Lagarde and colleagues44 | External validation of O-POSSUM | n=663, 24 deaths | O:E ratio 0.29; H–L goodness of fit, P<0.01. Note: Highest two risk strata had <5 deaths | Area under the ROC curve 0.6
Zafirellis and colleagues45 | External validation of POSSUM | n=204, 26 deaths | SMR 0.66; H–L goodness of fit, P<0.01. Note: Highest three risk groups had <5 deaths | Area under the ROC curve 0.62
Zingg and colleagues46 | Comparison of Ra and colleagues, Steyerberg and colleagues, Bartels and colleagues models and ASA on samples from Australia and Switzerland | n=346, 14 deaths Australia, 8 deaths Switzerland | Non-standard calibration or discrimination. Concluded none practically useful | Not available
Bosch and colleagues47 | Comparison of P-, O-, and original POSSUM, CCI, ACCI, ASA | n=280, 15 deaths | Overall SMR: P-POSSUM 1.05; O-POSSUM 0.67. H–L (P-value): P-POSSUM P=0.04; O-POSSUM P=0.53; CCI (P=0.66); ACCI (P=0.27); ASA (P=0.21). O-POSSUM overpredicted compared with P-POSSUM | Area under the ROC curve: P-POSSUM 0.77; O-POSSUM 0.76; CCI score 0.57; ACCI score 0.68; ASA score 0.64
Ball and colleagues48 | External validation of P-POSSUM | n=53, 6 deaths | Expected 2 deaths, observed 6 deaths | Not available
Dutta and colleagues49 | Comparison of P-, O-, and original POSSUM | n=121, 5 deaths (4.1%) | Predicted overall mortality rate and SMR: POSSUM 16.5%, SMR 0.25; P-POSSUM 5.8%, SMR 0.71; O-POSSUM 9.9%, SMR 0.42 | Area under the ROC curve: POSSUM 0.76; P-POSSUM 0.81; O-POSSUM 0.72
Schroder and colleagues41 | External validation of Bartels and colleagues’ model | n=126, 7 deaths | Used Bartels model to predict low, moderate, and high risk. O:E mortality (%): low risk 2.9:3.6; moderate 3.0:8.7; high 16.7:28. Fewer than 5 deaths in each risk group | Not available
Steyerberg and colleagues31 | External validation in SEER database (USA), 1997–9; Eindhoven Cancer Registry, 1993–2001; Rotterdam University Hospital, 1980–2002 | SEER, n=714, 74 deaths; Eindhoven, n=349, 25 deaths; Rotterdam, n=1202, 45 deaths; grand total, n=3592, 291 deaths | Comparison of O:E risk. Reported good calibration for pooled sample but ‘problematic’ for Netherlands samples | Area under the ROC curve: 0.56–0.7
Fig 2 Percentage of primary studies meeting reporting criteria for risk of bias. [Bar chart. Horizontal axis: % studies meeting criteria items (0–100%). Bars: sufficient data to assess analysis; reported appropriate model; continuous data handled appropriately; prognostic predictors defined; reported acceptable follow up rate; missing values handled appropriately; missing values stated or deducible; data validation; data collection; sample characteristics described; consecutive cases; surgical exclusions described. Legend: met; partially met; not met; unclear, unknown; for data collection, prospective or retrospective.]
the 25% in Bartels and colleagues’ study, suggesting overestimation by the original model. Steyerberg and colleagues31 evaluated the original Rotterdam model in a later SEER cohort, and in cohorts from a Netherlands population database and Rotterdam clinical centre. Discrimination was reported as poor (area under the ROC curve 0.56–0.7); calibration was described as excellent for SEER patients and pooled data, but ‘problematic’ for the Netherlands cohorts. Zingg and colleagues46 evaluated Steyerberg and colleagues’,31 Bartels and colleagues’,34 and Ra and colleagues’32 models on cohorts from Switzerland and Australia. Standard discrimination and calibration methods were not reported, but the authors concluded that no models were applicable in practice.
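The overall calibration measures reported in Table 4 (observed to expected ratio, standardized mortality ratio) are the ratio of observed deaths to the number of deaths the model predicted. The sketch below recomputes the SMRs for Dutta and colleagues’49 sample from the sample size, death count, and predicted mortality rates given in Table 4. The function names are hypothetical, and treating expected deaths as predicted rate multiplied by sample size is an assumption about how the published ratios were derived.

```python
def expected_deaths(n_patients, predicted_mortality_rate):
    """Deaths the model expects: overall predicted rate times sample size."""
    return n_patients * predicted_mortality_rate


def observed_to_expected(observed_deaths, n_patients, predicted_mortality_rate):
    """O:E ratio (SMR). Values below 1 mean the model overestimated mortality."""
    return observed_deaths / expected_deaths(n_patients, predicted_mortality_rate)


# Values from Table 4, Dutta and colleagues: n=121, 5 deaths.
N_PATIENTS = 121
OBSERVED_DEATHS = 5
predicted_rates = {"POSSUM": 0.165, "P-POSSUM": 0.058, "O-POSSUM": 0.099}

for model, rate in predicted_rates.items():
    smr = observed_to_expected(OBSERVED_DEATHS, N_PATIENTS, rate)
    print(f"{model}: SMR {smr:.2f}")
```

Under these assumptions the three ratios round to 0.25, 0.71, and 0.42, matching the SMRs in Table 4. With only five observed deaths, one death more or fewer would move each ratio by a fifth of its value.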
Clinical impact studies
Two studies34 40 reported using their models in clinical practice to reduce perioperative mortality, but these were not within prospective impact studies.
Case mix differences between modelling and validation samples
Case mix details are reported in Supplementary material. Differences between POSSUM modelling and validation samples included mortality definition, for example, the use of 30 day mortality,43 45 49 percentages of elective cases,43 44 and proportions of gastrectomy and oesophagectomy.43 44 Most centres reported similar mixes of squamous and adenocarcinoma, but Lai and colleagues’42 sample from Hong Kong was exclusively squamous. Sample mortality rates also varied, for example, the O-POSSUM study reported 8.6% ‘in hospital mortality’, but the external validation studies reported mortality between 3.6%44 and 12.7%.45 Reported operative volumes ranged from nine cases43 to 56 cases36 annually. The large population databases frequently did not report details of operative volume, overall mortality rates, and histological or operative details.
Risk of bias in primary studies
Sixteen studies did not report exclusions from surgery for fitness or other reasons,4 33–35 37 39–49 and in 13,4 31–33 35–37 39 41–43 48 49 it was not clear whether samples included consecutive operated cases (Fig. 2). Data were retrospectively extracted from medical records in six studies,31 42 45 46 48 49 and in nine,4 32–34 37 39 40 43 44 it was unclear whether data collection was prospective or retrospective. Data validation (e.g. data audit) was not performed or was poorly reported in 17 studies.4 31 32 34 36–38 40 41 43–49 Reporting of the quantity and handling of missing data was poor or unclear in 14 studies.32–41 46–49 The larger database studies were generally better at reporting data validation methods and management of missing data. Explicit reporting of survivor and non-survivor counts was not clear in 12 studies.4 31 32 34 36 38–40 42 46 48 49
Discussion
We conducted this systematic review to identify a prediction model that could be used as an aid to decision-making for patients or to assist in comparative audit. We found 11 models, of which the ‘POSSUM’-based models and those developed by Steyerberg and colleagues,31 Ra and colleagues,32 and Bartels and colleagues34 have been validated in new patients. Reported discrimination was weak for all models and predicted mortality frequently exceeded observed mortality. In comparisons of POSSUM models, all tended to overestimate mortality but the P-POSSUM was most accurate. Poor reporting of case selection and missing data management was common, sample sizes were frequently smaller than currently recommended, particularly in validation studies, and there was evidence of potentially important case mix differences in validation samples compared with original development samples. Steyerberg and colleagues’ model31 was subjected to the most rigorous validation and appeared to use predictors more likely to be reliably ‘transportable’ to other settings.
Unreliable prediction in new patients may occur when a model is too closely aligned to random variations in data from development samples (‘overfitting’). This can occur if candidate predictors are selected using statistically significant associations between predictors and outcome (‘data driven’ methods), rather than using clinical and evidential knowledge to make selections.64 65 Most models except for Steyerberg and colleagues’31 and the P-POSSUM66 used ‘data driven’ methods to some degree. Overfitting is also common in small samples, especially when there are fewer than 10 outcome events per screened predictor.67 This was the case for the single-centre models and in Bailey and colleagues’35 model. The P-POSSUM66 did not use ‘data driven’ selection and used a fairly large sample, perhaps partially explaining its superior performance in new sample comparisons with the original POSSUM and O-POSSUM.
Sample size is also important in validation studies and some investigators recommend using up to 100 outcome events68 69 to allow valid model comparisons, and to reduce random imbalance of predictors. Except for Steyerberg and colleagues’ study,31 small samples were used and, in calibration, the high-risk categories frequently contained very few events, limiting precision and reliability. Since high-risk categories may be of particular interest in decision-making, much larger samples are likely to be needed to improve model utility.
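The two sample size rules of thumb above reduce to simple arithmetic, sketched here in Python. This is an illustration under stated assumptions rather than a formal sample size calculation: the function names are hypothetical, the 30 deaths and 12 candidate predictors are invented, and the mortality rates are taken from the range reported in this review (3.6%, 8.6%, and 12.7%) only to show the scale involved.

```python
import math
from fractions import Fraction


def events_per_variable(n_events, n_candidate_predictors):
    """Outcome events (deaths) per candidate predictor screened."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def patients_needed(target_events, mortality_rate):
    """Smallest cohort expected to contain `target_events` deaths.

    `mortality_rate` is passed as a decimal string (e.g. "0.086") so that
    exact rational arithmetic is used and rounding up is not affected by
    floating-point error.
    """
    rate = Fraction(mortality_rate)
    if not 0 < rate <= 1:
        raise ValueError("mortality_rate must be in (0, 1]")
    return math.ceil(Fraction(target_events) / rate)


# Hypothetical development sample: 30 deaths, 12 candidate predictors screened.
epv = events_per_variable(30, 12)
print(epv, "events per variable; below 10:", epv < 10)

# Cohort sizes expected to yield 100 deaths at mortality rates seen in this review.
for rate in ("0.036", "0.086", "0.127"):
    print(rate, patients_needed(100, rate))
```

At a mortality of 8.6%, for example, about 1163 operated patients would be expected to yield 100 deaths, which helps explain why single-centre validation samples usually fall short.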
Case mix differences can also affect prediction in new populations8 by affecting the distribution of predictors. These may be explicit; for instance, ‘POSSUM’ development and validation samples differed in outcome definitions and in some potentially important variables.42–45 49 Differences in mortality rates were also apparent and could reflect differences in important predictors (measured or not). Implicit and less obvious case mix differences may also have arisen from biases in case selection and the handling of missing data, both of which were often poorly reported. Subjectively interpreted predictors may not be reliably reproducible and may also contribute to case mix differences, for example, the physician assessment of ‘cardiac risk’ in Bartels and colleagues’ model. Other contemporary scores are available for cardiac comorbidity70 or heart failure,71 which may be more reliable than subjective assessment and could be considered.
The poor discrimination between survivors and non-survivors reflects the weak association between currently available predictors and perioperative mortality. Age was the most consistently used and reliable predictor, but the most discriminating predictors (e.g. the presence of ascites had an odds ratio of 15.7)35 were unlikely to be relevant to current practice because their presence would exclude such patients from surgery. Similarly, the Glasgow coma score and the extreme
categories of haematological and biochemical tests of the POSSUM models are probably not relevant to contemporary elective surgical populations, having been developed on a mixed elective and emergency surgical population.
Clinical prediction models are unlikely to perform well if important predictors are omitted.7 The considerable variation in individual predictors and their definition across the included studies highlights the current uncertainty about which candidate predictors are important. There is also a need to identify stronger candidate predictors, and a potential candidate is cardiopulmonary exercise (CPX) testing. Unlike multivariate models, this is used reasonably widely in the UK72 as part of risk stratification for surgery. Given the likely multifactorial nature of perioperative and medium-term outcome, it would seem reasonable to examine the potential role for CPX within robustly validated multivariate prognostic models. Similarly, the development of rapid genotyping may present opportunities to identify individuals susceptible to perioperative complications such as acute lung injury73 and could be a candidate for prognostic study.
Much work has been done to develop clinical prediction models for oesophagogastric surgery, but the heterogeneity of methods, presentation of results, and candidate predictor definition makes direct model comparison difficult. Further advances require better predictor characterization and large high-quality validation studies. This was done prospectively for the cardiothoracic EuroSCORE, when the data from 19 030 procedures were collected from 128 European centres in a 3 month period in 1995.74 For oesophagectomy, a project on a similar scale would take considerably longer, because clinical units do fewer cases in 3 months than the 120 cases submitted by each cardiothoracic centre. For instance, the 2010 National Oesophago-Gastric Cancer Audit3 took nearly 2 yr to collect data from 16 264 patients.
With limited time and resources, we should consider alternative ways to make use of available information. Pooling results from currently available data or systematic reviews of individual patient data may be an option. This method has been used to estimate predictor effects in cancer survival studies.75–78 Large databases, such as the ICNARC database79 or the National Oesophago-Gastric Cancer audits,3 could be suitable sources from which to estimate some predictor effects. However, ultimately, large-scale prospective data collection will be necessary to validate and assess potential clinical impact.
Finally, this review highlights some of the potential biases which may be encountered in studies of outcomes in high-risk surgery. This applies to most studies of clinical prediction models because they are observational and likely to have used secondary data sources such as clinical notes, and clinical or administrative databases. Outcome prediction studies in any high-risk surgical speciality need to consider how best to manage and report these potential risks of bias. Good practice guidelines are available both for the design of clinical prediction studies7–9 13 80 81 and, more generally, for the design82 and reporting83 of studies of clinical outcomes which use secondary data sources.
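To make the idea of pooling published predictor effects concrete, the following Python sketch combines odds ratios from several studies with a fixed-effect, inverse-variance weighted average on the log scale. This is one simple, commonly used approach chosen here for illustration; it is not claimed to be the method used in the cited pooled analyses, it ignores between-study heterogeneity, and the function name and the three example studies are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def pool_odds_ratios(studies):
    """Fixed-effect inverse-variance pooling of odds ratios.

    `studies` is a list of (odds_ratio, ci_lower, ci_upper) tuples with 95%
    confidence limits. Standard errors are recovered from the interval width
    on the log scale. Returns (pooled_or, ci_lower, ci_upper).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        log_or = math.log(odds_ratio)
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * log_or
    if total_weight == 0.0:
        raise ValueError("no studies supplied")
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - Z_95 * pooled_se),
        math.exp(pooled_log_or + Z_95 * pooled_se),
    )


# Hypothetical odds ratios for one predictor reported by three studies.
studies = [(1.8, 1.1, 2.9), (2.4, 1.2, 4.8), (1.5, 0.9, 2.5)]
pooled, low, high = pool_odds_ratios(studies)
print(f"pooled OR {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Pooling in this way narrows the confidence interval relative to any single study, but it presupposes that the predictor is defined and measured comparably across studies, which this review found was often not the case.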
Strengths and weaknesses
Systematic review methods for prediction models are less well developed than for interventions; however, we used PRISMA guidelines and adapted current recommendations for prognostic models.14 80 84 85 Inevitably, parts of the review process were iterative. For example, we modified the search strategy, when studies known to the reviewers were not retrieved and redefined risk of bias items which were found to be difficult to apply. Iteration may introduce bias, but has been recognized as an acceptable part of systematic review methodology.16 86 We intended to apply conclusions to ‘high surgical volume’ centres, such as our own. Therefore, we only included studies from either large population databases, which are likely to be widely applicable because of the larger sample sizes and more general predictors, or ‘high volume’ clinical centres. However, restriction to ‘high volume’ studies may have introduced bias which could adversely affect application to less specialized settings. Publication bias may have affected this study just as it has been reported in other prognostic and outcome studies for oesophageal cancer surgery87 and limitation to English language articles may have biased selection to studies with statistically significant results.88 Searches did not include the ‘grey literature’ and we did not contact authors. Neither did we use formal quantitative methods (e.g. funnel plots) to assess potential publication bias because of the heterogeneous nature of the reported summary statistics.88
Conclusion
None of the models identified in this review can currently be applied in clinical practice with any confidence, because performance was generally unreliable, discrimination poor, and validation studies were too small. Potential study biases were poorly managed or reported in some studies. Steyerberg and colleagues’ model31 has more potential for future validation and application than POSSUM models because the constituent predictors are more transportable and relevant to current elective surgical groups. Further model development requires achieving consensus on predictor definition and effect, and also validation and model comparison in large samples using currently acceptable methods and reporting. These are unlikely to be obtainable from single clinical centres. Existing UK databases and published studies may be useful sources for data synthesis, but prospective high-quality validation in large samples requires coordinated multicentre or large database studies.
Acknowledgements
We would like to thank Erika Gynett (Walton Library) for help structuring the search strategy and Dr Nick Steen (Institute of Health and Society) for statistical advice.
Supplementary material
Supplementary material is available at British Journal of Anaesthesia online.
Authors’ contributions
I.W.: review design, data extraction and interpretation, writing up, and revising article. M.C.: data extraction and interpretation and revising article. M.E.: review design and revising article.
Declaration of interest
None declared.
References
1 Secretary of State for Health. Equity and Excellence: Liberating the NHS. HM Government, Department of Health, 2010
2 Secretary of State for Health. Liberating the NHS: No Decision About Me, Without Me. Further Consultation on Proposals to Secure Shared Decision-Making. HM Government, Department of Health, 2012
3 Cromwell D, Palser T, van der Meulen J, et al. National Oesophago-Gastric Cancer Audit 2010. The NHS Information Centre, 2010
4 McCulloch P, Ward J, Tekkis PP, ASCOT Group of Surgeons, British Oesophago-Gastric Cancer Group. Mortality and morbidity in gastro-oesophageal cancer surgery: initial results of ASCOT multicentre prospective cohort study. Br Med J 2003; 327: 1192–7
5 Blazeby JM, Farndon JR, Donovan J, Alderson D. A prospective longitudinal study examining the quality of life of patients with esophageal carcinoma. Cancer 2000; 88: 1781–7
6 Association of Upper Gastrointestinal Surgeons of Great Britain and Ireland. Outcomes data. Available from http://www.augis.org/surgical-outcomes/outcomes-data.htm (accessed 14 October 2013)
7 Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. Br Med J 2009; 338: 1432
8 Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. Br Med J 2009; 338: b606
9 Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? Br Med J 2009; 338: b375
10 Shende MR, Waxman J, Luketich JD. Predictive ability of preoperative indices for esophagectomy. Thorac Surg Clin 2007; 17: 337–41
11 Copeland P, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg 1991; 78: 355–60
12 Dutta S, Horgan P, McMillan D. POSSUM and its related models as predictors of postoperative mortality and morbidity in patients undergoing surgery for gastro-oesophageal cancer: a systematic review. World J Surg 2010; 34: 2076–82
13 Royston P, Moons KGM, Altman DG, Vergouwe Y. Prognosis and prognostic research: developing a prognostic model. Br Med J 2009; 338: b604
14 Hayden JA, Cote P, Bombardier C. Evaluation of the quality of prognosis studies in systematic reviews. Ann Intern Med 2006; 144: 427–37
15 Steyerberg EW. Patterns of external validity. In: Gail M, Tsiatis A, Krickeberg K, Sarnet J, eds. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009; 335
16 Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009; 62: 1006–12
17 Ochsner JL, DeBakey M. Surgical aspects of carcinoma of the esophagus; review of the literature and report of 4 cases. J Thorac Surg 1941; 10: 401–45
18 Clinical Effectiveness Unit, The Royal College of Surgeons of England. National Oesophago-Gastric Cancer Audit, 2013
19 Al-Sarira AA, David G, Willmott S, Slavin JP, Deakin M, Corless DJ. Oesophagectomy practice and outcomes in England. Br J Surg 2007; 94: 585–91
20 Dimick JB, Wainess RM, Upchurch GR Jr, Iannettoni MD, Orringer MB. National trends in outcomes for esophageal resection. Ann Thorac Surg 2005; 79: 212–6
21 Hofstetter W, Swisher SG, Correa AM, et al. Treatment outcomes of resected esophageal cancer. Ann Surg 2002; 236: 376–84
22 Rouvelas I, Zeng W, Lindblad M, Viklund P, Ye W, Lagergren J. Survival after surgery for oesophageal cancer: a population-based study. Lancet Oncol 2005; 6: 864–70
23 Sauvanet A, Mariette C, Thomas P, et al. Mortality and morbidity after resection for adenocarcinoma of the gastroesophageal junction: predictive factors. J Am Coll Surg 2005; 201: 253–62
24 Jamieson GG, Mathew G, Ludemann R, Wayman J, Myers JC, Devitt PG. Postoperative mortality following oesophagectomy and problems in reporting its rate. Br J Surg 2004; 91: 943–7
25 Killeen SD, O’Sullivan MJ, Coffey JC, Kirwan WO, Redmond HP. Provider volume and outcomes for oncological procedures. Br J Surg 2005; 92: 389–402
26 Bachmann MO, Alderson D, Edwards D, et al. Cohort study in South and West England of the influence of specialization on the management and outcome of patients with oesophageal and gastric cancers. Br J Surg 2002; 89: 914–22
27 Allareddy V, Allareddy V, Konety BR. Specificity of procedure volume and in-hospital mortality association. Ann Surg 2007; 246: 135–9
28 Birkmeyer JD, Siewers AE, Finlayson EVA, et al. Hospital volume and surgical mortality in the United States. N Engl J Med 2002; 346: 1128–37
29 Pennefather SH. Anaesthesia for oesophagectomy. Curr Opin Anaesthesiol 2007; 20: 15–20
30 Shaw IH. Anaesthetic aspects and case selection for oesophageal and gastric surgery. In: Griffin SM, Raimes S, eds. Oesophagogastric Surgery: A Companion to Specialist Surgical Practice, 4th Edn. Saunders Elsevier, 2008
31 Steyerberg EW, Neville BA, Koppert LB, et al. Surgical mortality in patients with esophageal cancer: development and validation of a simple risk score. J Clin Oncol 2006; 24: 4277–84
32 Ra J, Paulson EC, Kucharczuk J, et al. Postoperative mortality after esophagectomy for cancer: development of a risk prediction model. Ann Surg Oncol 2008; 15: 1577–84
33 Tekkis PP, McCulloch P, Poloniecki JD, Prytherch DR, Kessaris N, Steger AC. Risk-adjusted prediction of operative mortality in oesophagogastric surgery with O-POSSUM. Br J Surg 2004; 91: 288–95
34 Bartels H, Stein HJ, Siewert JR. Preoperative risk analysis and postoperative mortality of oesophagectomy for resectable oesophageal cancer. Br J Surg 1998; 85: 840–4
35 Bailey SH, Bull DA, Harpole DH, et al. Outcomes after esophagectomy: a ten-year prospective cohort. Ann Thorac Surg 2003; 75: 217–22
36 Law SYK, Fok M, Wong J. Risk analysis in resection of squamous cell carcinoma of the esophagus. World J Surg 1994; 18: 339–46
37 Liu JF, Watson DI, Devitt PG, Mathew G, Myburgh J, Jamieson GG. Risk factor analysis of post-operative mortality in oesophagectomy. Dis Esophagus 2000; 13: 130–5
38 Sanz L, Ovejero VJ, Gonzalez JJ, et al. Mortality risk scales in esophagectomy for cancer: their usefulness in preoperative patient selection. Hepatogastroenterology 2006; 53: 869–73
39 Dhungel B, Diggs B, Hunter J, Sheppard B, Vetto J, Dolan J. Patient and peri-operative predictors of morbidity and mortality after esophagectomy: American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), 2005–2008. J Gastrointest Surg 2010; 14: 1492–501
40 Zhang GH, Fujita H, Yamana H, Kakegawa T. A prediction of hospital mortality after surgical treatment for esophageal cancer. Surg Today 1994; 24: 122–7
41 Schroder W, Bollschweiler E, Kossow C, Holscher AH. Preoperative risk analysis—a reliable predictor of postoperative outcome after transthoracic esophagectomy? Langenbeck’s Arch Surg 2006; 391: 455–60
42 Lai F, Kwan TK, Yuen WC, Wai A, Shung YCSE. Evaluation of various POSSUM models for predicting mortality in patients undergoing elective oesophagectomy for carcinoma. Br J Surg 2007; 94: 1172–8
43 Nagabhushan JS, Srinath S, Weir F, Angerson WJ, Sugden BA, Morran CG. Comparison of P-POSSUM and O-POSSUM in predicting mortality after oesophagogastric resections. Postgrad Med J 2007; 83: 355–8
44 Lagarde SM, Maris AKD, de Castro S, Busch ORC, Obertop H, van Lanschot JJB. Evaluation of O-POSSUM in predicting in-hospital mortality after resection for oesophageal cancer. Br J Surg 2007; 94: 1521–6
45 Zafirellis KD, Fountoulakis A, Dolan K, Dexter SPL, Martin IG, Sue-Ling HM. Evaluation of POSSUM in patients with oesophageal cancer undergoing resection. Br J Surg 2002; 89: 1150–5
46 Zingg U, Langton C, Addison B, et al. Risk prediction scores for postoperative mortality after esophagectomy. J Gastrointest Surg 2009; 13: 611–8
47 Bosch DJ, Pultrum BB, de Bock GH, Oosterhuis JK, Rodgers MGG, Plukker JTM. Comparison of different risk-adjustment models in assessing short-term surgical outcome after transthoracic esophagectomy in patients with esophageal cancer. Am J Surg 2011; 202: 303–9
48 Ball C, Butterworth J, Seidel J. Predictive value of P-POSSUM scoring for Ivor-Lewis oesophagectomy. Abstracts of ESICM LIVES 2011, Berlin, 1–5 October 2011. Intensive Care Med 2011; 37: S61
49 Dutta S, Al-Mrabt N, Fullarton G, Horgan P, McMillan D. A comparison of POSSUM and GPS models in the prediction of post-operative outcome in patients undergoing oesophago-gastric cancer resection. Ann Surg Oncol 2011; 18: 2808–17
50 Yamashita S, Takeno S, Moroga T, et al. E-PASS (the Estimation of Physiologic Ability and Surgical Stress) scoring system helps the prediction of postoperative morbidity and mortality in esophageal cancer operation. Dis Esophagus 2010; 23(Suppl. S1): 54A
51 Chamogeorgakis T, Toumpoulis I, Tomos P, et al. External validation of the modified Thoracoscore in a new thoracic surgery program: prediction of in-hospital mortality. Interact Cardiovasc Thorac Surg 2009; 9: 463–6
52 Luna A, Rebasa P, Navarro S, et al. An evaluation of morbidity and mortality in oncologic gastric surgery with the application of POSSUM, P-POSSUM, and O-POSSUM. World J Surg 2009; 33: 1889–94
53 Guest RV, Chandrabalan VV, Murray GD, Auld CD. Application of variable life adjusted display (VLAD) to risk-adjusted mortality of esophagogastric cancer surgery. World J Surg 2012; 36: 104–8
54 Noble F, Curtis N, Harris S, et al. Risk assessment using a novel score to predict anastomotic leak and major complications after oesophageal resection. J Gastrointest Surg 2012; 16: 1083–95
55 Sunpaweravong S, Ruangsin S, Laohawiriyakamol S. Prediction of post-operative complications and survival for esophageal carcinoma. 12th World Congress of the International Society for Diseases of the Esophagus, 2010. Dis Esophagus 2010; 23(Suppl. S1): 14A
56 Vashist Y, Loos J, Dedow J, et al. Glasgow prognostic score is a predictor of perioperative and long-term outcome in patients with only surgically treated esophageal cancer. Ann Surg Oncol 2011; 18: 1130–8
57 Ferguson MK, Celauro AD, Prachand V. Assessment of a scoring system for predicting complications after esophagectomy. Dis Esophagus 2011; 24: 510–5
58 Grotenhuis BA, van Hagen P, Reitsma JB, et al. Validation of a nomogram predicting complications after esophagectomy for cancer. Ann Thorac Surg 2010; 90: 920–5
59 Lagarde SM, Reitsma JB, Maris AKD, et al. Preoperative prediction of the occurrence and severity of complications after esophagectomy for cancer with use of a nomogram. Ann Thorac Surg 2008; 85: 1938–45
60 Charlson M, Pompei P, MacKenzie C. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987; 40: 373–83
61 Karnofsky D. Reporting results of cancer treatment. Cancer 1984; 1: 634–5
62 Steyerberg EW. Validation of prediction models. In: Gail M, Tsiatis A, Krickeberg K, Sarnet J, eds. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009; 301
63 Charlson ME, Szatrowski TP, Peterson J, Gold J. Validation of a combined comorbidity index. J Clin Epidemiol 1994; 47: 1245–51
64 Harrell FE. Multivariable modelling strategies. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer, 2001; 60–4
65 Steyerberg EW. Overfitting and optimism in regression models. In: Gail M, Tsiatis A, Krickeberg K, Sarnet J, eds. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009; 87
66 Prytherch DR, Whiteley MS, Higgins B, Weaver PC, Prout WG, Powell SJ. POSSUM and Portsmouth POSSUM for predicting mortality. Br J Surg 1998; 85: 1217–20
67 Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996; 49: 1373–9
68 Peek N, Arts DGT, Bosman RJ, van der Voort PHJ, de Keizer NF. External validation of prognostic models for critically ill patients required substantial sample sizes. J Clin Epidemiol 2007; 60: 491.e1–13
69 Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 2005; 58: 475–83
70 Lee TH, Marcantonio ER, Mangione CM, et al. Derivation and prospective validation of a simple index for prediction of cardiac risk of major noncardiac surgery. Circulation 1999; 100: 1043–9
71 National Institute for Clinical Excellence. Chronic heart failure: management of chronic heart failure in adults in primary and secondary care, 2010. Available from www.nice.org.uk/nicemedia/live/13099/50526/50526.pdf (accessed 26 August 2013)
72 Huddart S, Young E, Smith R-L, Holt P, Prabhu P. Preoperative cardiopulmonary exercise testing in England—a national survey. Periop Med 2013; 2: 4
73 Nonas S, Finigan J, Garcia J. Functional genomic insights into acute lung injury: role of ventilators and mechanical stress. Proc Am Thorac Soc 2005; 2: 188–94
74 Roques F, Nashef SAM, Michel P, et al. Risk factors and outcome in European cardiac surgery: analysis of the EuroSCORE multinational database of 19030 patients. Eur J Cardiothorac Surg 1999; 15: 816–23
75 Look MP, van Putten WLJ, Duffy MJ, et al. Pooled analysis of prognostic impact of urokinase-type plasminogen activator and its inhibitor PAI-1 in 8377 breast cancer patients. J Natl Cancer Inst 2002; 94: 116–28
76 Steyerberg EW, Eijkemans MJC, Van Houwelingen JC, Lee KL, Habbema JDF. Prognostic models based on literature and individual patient data in logistic regression analysis. Stat Med 2000; 19: 141–60
77 Steyerberg EW. Estimation with external information. In: Gail M, Tsiatis A, Krickeberg K, Sarnet J, eds. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009; 243
78 Riley RD, Lambert P, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. Br Med J 2010; 340: 521
79 Park DP, Welch CA, Harrison DA, et al. Outcomes following oesophagectomy in patients with oesophageal cancer: a secondary analysis of the ICNARC Case Mix Programme Database. Crit Care 2009; 13(Suppl. 2): S1
80 Hemingway H, Riley RD, Altman DG. Ten steps towards improving prognosis research. Br Med J 2009, 10.1136/bmj.b4184
81 Hayden JA, van der Windt DA, Cartwright JL, Cote P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med 2013; 158: 280–6
82 International Society for Pharmacoeconomics and Outcomes Research. ISPOR Good Practices for Outcomes Research Index, 2013. Available from http://www.ispor.org/workpaper/practices_index.asp (accessed 30 November 2013)
83 von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbrouke JPSI. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008; 61: 344–9
84 Centre for Reviews and Dissemination, York University. CRD’s Guidance for undertaking reviews in health care: Chapter 2.3, Prognostic Tests, 2009. Available from http://www.york.ac.uk/inst/crd/SysRev/!SSL!/WebHelp/SysRev3.htm (accessed July 2009)
85 Cochrane Prognosis Methods Group. 2011. Available from http://prognosismethods.cochrane.org
86 Pope C, Mays N, Popay J. Synthesising Qualitative and Quantitative Health Evidence: A Guide to Methods. Maidenhead: McGraw Hill, Open University Press, 2007; 22
87 Jamieson GG, Mathew G, Ludemann R, et al. Postoperative mortality following oesophagectomy and problems in reporting its rate. Br J Surg 2004; 91: 943–7
88 Rothstein HR, Sutton JA, Borenstein M, eds. Publication Bias in Meta-analysis. Chichester: John Wiley and Sons, Ltd, 2005
Handling editor: J. G. Hardman