Downloaded from http://gut.bmj.com/ on October 19, 2015 - Published by group.bmj.com
Gut Online First, published on October 16, 2015 as 10.1136/gutjnl-2015-310393 Inflammatory bowel disease
ORIGINAL ARTICLE
Development and validation of a histological index for UC
Mahmoud H Mosli,1,2,3 Brian G Feagan,1,2,4 Guangyong Zou,1,4 William J Sandborn,1,5 Geert D’Haens,1,6 Reena Khanna,1,2 Lisa M Shackelton,1 Christopher W Walker,1 Sigrid Nelson,1 Margaret K Vandervoort,1 Valerie Frisbie,1 Mark A Samaan,1 Vipul Jairath,1,7,8 David K Driman,9 Karel Geboes,10 Mark A Valasek,11 Rish K Pai,12 Gregory Y Lauwers,13,14 Robert Riddell,15 Larry W Stitt,1,4 Barrett G Levesque1,5
▸ Additional material is published online only. To view these files please visit the journal online (http://dx.doi.org/10.1136/gutjnl-2015-310393)
For numbered affiliations see end of article.
Correspondence to Dr Barrett G Levesque, Division of Gastroenterology, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA; [email protected]
Received 17 July 2015 Revised 15 September 2015 Accepted 22 September 2015
ABSTRACT
Objective Although the Geboes score (GS) and modified Riley score (MRS) are commonly used to evaluate histological disease activity in UC, their operating properties are unknown. Accordingly, we developed an alternative instrument.
Design Four pathologists scored 48 UC colon biopsies using the GS, MRS and a visual analogue scale global rating. Intra-rater and inter-rater reliability for each index and individual index items were measured using intraclass correlation coefficients (ICCs). Items with high reliability were used to develop the Robarts histopathology index (RHI). The responsiveness/validity of the RHI and multiple histological, endoscopic and clinical outcome measures were evaluated by analyses of change scores, standardised effect size (SES) and Guyatt’s responsiveness statistic (GRS) using data from a clinical trial of an effective therapy.
Results Inter-rater ICCs (95% CIs) for the total GS and MRS scores were 0.79 (0.63 to 0.87) and 0.80 (0.69 to 0.87). The correlation estimates between change scores in RHI and change score in GS and MRS were 0.75 (0.67 to 0.82) and 0.84 (0.79 to 0.88), respectively. The SES and GRS estimates for GS, MRS and RHI were: 1.87 (1.54 to 2.20) and 1.23 (0.97 to 1.50), 1.29 (1.02 to 1.56) and 0.88 (0.65 to 1.12), and 1.05 (0.79 to 1.30) and 0.88 (0.64 to 1.12), respectively.
Conclusions The RHI is a new histopathological index with favourable operating properties.
INTRODUCTION
To cite: Mosli MH, Feagan BG, Zou G, et al. Gut Published Online First: [please include Day Month Year] doi:10.1136/gutjnl-2015-310393
Disease activity in UC has traditionally been evaluated by symptoms and endoscopy.1 2 However, a need exists to develop additional objective measures for use in drug development and clinical care. Histological grading of disease activity has promise as an outcome measure in clinical trials of therapy for UC and as a prognostic marker in practice.3–8 Notwithstanding that disease-specific histological activity indices currently exist,9 the limited data available indicate that their operating properties may be suboptimal.7 10 11 Ideally an evaluative index should be reproducible, respond to clinically meaningful change in disease activity, remain unchanged in stable patients and
Significance of this study What is already known on this subject?
▸ Histological disease activity is an important outcome measure in controlled studies of therapy for UC.
▸ A need exists for reproducible and responsive histological indices.
▸ The Geboes score (GS) and modified Riley score (MRS), which are the currently available histopathological instruments, have not been adequately validated.
▸ Intra-rater reliability for histological assessment of UC disease activity by expert pathologists is adequate, but inter-rater reliability is suboptimal.
▸ Sources of scoring disagreement among pathologists have been identified through a formal consensus process.
What are the new findings?
▸ Scoring conventions improved inter-rater reliability among central expert pathologists; intraclass correlation coefficients for inter-rater reliability for histological disease activity using GS, MRS and 100 mm visual analogue score (VAS) in UC are ‘substantial’ to ‘almost perfect’.
▸ Although all components of the GS were shown to be reliable, the relationship between GS and a global measure of histological inflammation (VAS) was not linear, indicating that GS is not an optimal index.
▸ Component items of GS that best predicted VAS were chronic inflammatory infiltrate, lamina propria neutrophils, neutrophils in the epithelium, and erosion or ulceration. These items were included in a new index, the Robarts histopathology index (RHI).
▸ RHI was shown to be reproducible, responsive and valid.
detect relevant changes in disease status over time. The responsiveness of an evaluative instrument has important implications for sample size calculations
Mosli MH, et al. Gut 2015;0:1–9. doi:10.1136/gutjnl-2015-310393
Copyright Article author (or their employer) 2015. Produced by BMJ Publishing Group Ltd (& BSG) under licence.
Significance of this study How might it impact on clinical practice in the foreseeable future?
▸ RHI is a new evaluative instrument that was developed using rigorous methodology.
▸ External validation of RHI using an independent data set is required.
▸ RHI is likely to be a valuable outcome measure for use in clinical trials.
in clinical trials, since a responsive instrument can minimise the number of patients required to show significant differences between interventions.12 In a previous study, we assessed the reliability among expert pathologists in evaluating the Geboes score (GS) and the modified Riley score (MRS), the two most commonly used histological instruments.9 Although high levels of intra-rater reliability were observed for both measures, inter-rater reliability was less satisfactory. Six index items were responsible for the majority of the variability in scoring. These findings have important negative connotations for the conduct of large clinical trials where it may be necessary, based upon logistic considerations, to have multiple pathologists score biopsies. In an attempt to understand and minimise sources of disagreement a RAND consensus process examined the component items of these two scores.13 The RAND appropriateness methodology uses a modified ‘Delphi’ panel approach to combine the best available evidence and the personal clinical experience of experts in the field.14 The panel aids decision-making through an iterative process in which questions are identified, alternative viewpoints are defined, experts rate the appropriateness of statements and then, using predefined criteria, consensus is obtained. The RAND appropriateness methodology is considered a robust and highly regarded method for evaluating appropriateness that has been widely used in medicine for a broad range of purposes.15 In this exercise, 10 pathologists with specific expertise in IBD participated in multiple electronic surveys with a goal to identify and minimise variances in item scoring. Ultimately, histological scoring conventions were generated (see online supplementary table S1 and figure S1) for the problematic items. 
Given this previous research, the objectives of the current study were: (1) to reassess intra-rater and inter-rater reliability for the existing histological indices and their constituent items following introduction of the scoring conventions, (2) to derive a new index using reliable histological items, (3) to establish the construct validity of the new index, and (4) to assess the responsiveness of the new index by evaluating the validity of its change scores and its ability to detect the treatment effect of a proven effective medical therapy for UC.
METHODS
Study population
To perform an integrated reproducibility/responsiveness/index development and validation study, we used colonic biopsies obtained during the conduct of a multicentre, randomised, placebo-controlled phase 2 induction study of MLN02.16 MLN02 was a previous formulation of vedolizumab, a monoclonal antibody to the α4β7 integrin, which is now approved in the USA and Europe for the treatment of UC. In this trial MLN02 was
evaluated as induction therapy in 181 adult patients with UC with clinically active disease (UC Clinical Score (UCCS) between 5 and 9 points with a score of at least 1 point on stool frequency or rectal bleeding), an endoscopic modified Baron score (MBS) >1 and a minimum of 25 cm of disease extent on sigmoidoscopic examination. Participants were randomly assigned in a 1:1:1 ratio to receive intravenous MLN02 at a dose of 0.5 mg/kg (n=58), 2.0 mg/kg (n=60) or placebo (n=63) at baseline and day 29. Clinical remission at week 6 was the primary outcome measure. Flexible sigmoidoscopy with colonic biopsy was performed at baseline, week 4 and week 6. At 6 weeks, 33%, 32% and 14% of patients in the 0.5 mg/kg, 2.0 mg/kg and placebo groups, respectively, were in clinical remission (overall p=0.03; p=0.02 for both comparisons with placebo). Corresponding improvements in favour of MLN02 were observed for mucosal healing, histopathology and quality of life.16 Subsequently the efficacy of vedolizumab, an improved formulation of MLN02, was confirmed in large-scale phase 3 trials for the induction and maintenance of remission in UC.17 For the purpose of this study, data from the two MLN02 dose groups were pooled for analysis because no important differences in efficacy were observed between these groups.
Study material
Original tissue blocks collected during the conduct of the MLN02 trial were used. Biopsies were prepared (paraffin embedded, sectioned, H&E stained) and scanned at ×400 magnification (40× objective × 10× microscope eyepiece) on a Ventana whole slide scanner (Mt Sinai Services, Toronto, Canada). Scanned images were compressed using Web Microscope Compressor (V.1.064) and made available to the pathologist readers through the Robarts Web Microscope database, hosted on a secure remote server.
Study design and analytical approaches
The overall design of the study is shown in figure 1. Digital images were read by expert histopathologists (MAV, DKD, GL and RKP), who were unaware of clinical information, treatment assignment or visit number. In the first part of the study, which assessed reliability, histological disease activity was evaluated according to the GS, MRS and a global rating of histopathological severity measured on a 100 mm visual analogue scale (VAS). The GS is a seven-item ordinal instrument that has been used as an outcome measure in clinical trials. It classifies histological changes as grade 0 (structural change only); grade 1 (chronic inflammation); grade 2 (a, lamina propria eosinophils; b, lamina propria neutrophils); grade 3 (neutrophils in the epithelium); grade 4 (crypt destruction) and grade 5 (erosions or ulcers), and generates a score from 0 to 5.4, with higher scores indicating greater inflammation.10 Several methods have been used for calculating GS and no generally accepted scoring convention exists. In this study we calculated a total GS score using the original ordinal 6-point scale, which specifies grades 0–5. MRS is a six-item instrument, graded on a 4-point scale (none, mild, moderate or severe), that assesses the presence of an acute inflammatory cell infiltrate (neutrophils in the lamina propria), crypt abscesses, mucin depletion, surface epithelial integrity, chronic inflammatory cell infiltrate (round cells in the lamina propria) and crypt architectural irregularities. Scores range from 0 (no inflammation) to 7 (severe acute inflammation).7 16 VAS is an evaluative tool commonly used to measure constructs that range across a continuum of possible responses when no gold standard is available. It requires the evaluator to place a mark on a 100 mm line, where 0 means no disease activity and 100 means severe disease
activity.18 Total scores and individual components for GS and MRS were assessed. Fifty images of biopsies taken during the MLN02 trial were randomly selected from the complete population of images; two were ultimately determined to be unusable due to poor image quality, leaving a total sample of 48 images. The sampling procedure was stratified according to MBS, an endoscopic measure of disease severity, to ensure that a full spectrum of disease activity was available. MBS ranges from 0 to 4 (0, normal mucosa; 1, granular mucosa with an abnormal vascular pattern; 2, friable mucosa; 3, microulceration with spontaneous bleeding; 4, gross ulceration).16 Images were scored three times by all four readers, in a random order, at least 2 weeks apart. GS, MRS and VAS were evaluated on each reading. The participating pathologists were extensively educated on the item scoring conventions developed during the previously described consensus process (see online supplementary figure S1). Representative examples of all of the items were provided to the readers in an electronic atlas (see online supplementary figure S1). Inter-rater and intra-rater reliability statistics were calculated for the overall histological scores and the component items of the GS. For the new index development (Part 2 of the study), items with at least a ‘moderate’ level of reliability were selected as candidate items in developing a new index, using the VAS score as the anchor. Reliability was graded using Landis and Koch benchmarks, whereby intraclass correlation coefficients (ICCs) of below 0.00, 0.00 to 0.20, 0.21 to 0.40, 0.41 to 0.60, 0.61 to 0.80 and above 0.80 constitute ‘poor,’ ‘fair,’ ‘moderate,’ ‘good,’ ‘substantial’ and ‘almost perfect’ reliability, respectively.19

Figure 1 Overall study design illustrating the major phases of index development for RHI. RCT, randomised-controlled trial; VAS, visual analogue scale; GS, Geboes score; MRS, modified Riley score; ICC, intraclass correlation coefficient; RHI, Robarts histopathology index.
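As a minimal sketch of how a reliability coefficient of this kind can be computed, the following Python function estimates an ICC under a one-way random effects model, the model named in the sample size justification. The article does not state which ICC formulation produced the reported estimates, so the model choice, the function name `one_way_icc` and the data layout (one row of repeated scores per image) are assumptions made for illustration only.

```python
from typing import Sequence


def one_way_icc(ratings: Sequence[Sequence[float]]) -> float:
    """Estimate an ICC under a one-way random effects model (assumed model).

    `ratings` holds one row per image; each row holds the k scores given
    to that image. All rows must have the same length k (at least 2).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two images are required")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every image needs the same number (>= 2) of scores")

    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n

    # Between-image and within-image mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (score - m) ** 2 for row, m in zip(ratings, row_means) for score in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores show no variation; ICC is undefined")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical scores for three images, each scored twice.
    print(one_way_icc([[1, 2], [2, 3], [3, 4]]))
```

Confidence intervals, which the article reports alongside each ICC, are outside the scope of this sketch.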
These empirical benchmarks, which were originally developed for grading kappa statistics, have become widely adopted for assessment of ICCs. The development of the new index, which was designated the Robarts Histopathology Index (RHI), centred on building the model that best predicts the VAS score. Exploratory bivariate analyses between the VAS and each of the items selected based on reliability were performed first to guide the coding of each item. Specifically, we prespecified that GS item variables would be coded as continuous if a linear relationship was demonstrated between change in score and change in VAS. If a linear relationship was not evident, the bivariate relationships were used to collapse item levels. A full model was then obtained using all items, followed by a step-down model building approach with p=0.05 used as the criterion for item selection. Residuals from the final model were examined using statistical diagnostics. The stability of the final model was assessed and calibrated using the bootstrap method with 2000 replicates.20 For ease of calculation of RHI, we standardised the regression coefficients by dividing by the smallest coefficient and rounding to integers.
The construct validity of the new index (Part 3 of the study) was evaluated by comparing Pearson product-moment correlations between the index and clinical, endoscopic and health related quality of life instruments. In this approach a priori predictions were made regarding the correlations between RHI and other valid measures of disease activity (VAS, MBS, sum of the aggregate Mayo Clinic Score (MCS)21 rectal bleeding and stool frequency subscores and the IBD Questionnaire (IBDQ)22). The full MCS21 is composed of four subscores (bleeding, stool
frequency, physician assessment and endoscopic appearance) rated from 0 to 3 that are summed to give a total score ranging from 0 to 12, with higher scores representing more severe disease. The IBDQ consists of 32 questions divided into four dimensions: bowel symptoms (10 items), systemic symptoms (5 items), emotional function (12 items) and social function (5 items). Every question has graded responses from 1 (worst situation) to 7 (best situation). Total scores range from 32 to 224 with higher scores representing better quality of life.22 These a priori correlation predictions were then compared with the observed correlations. A valid instrument should show appropriate, ordered relationships with the other relevant measures of disease activity. We also calculated RHI scores corresponding to usual definitions of clinical and endoscopic remission (MBS of 0 or 1, Mayo Clinic rectal bleeding score of 0, Mayo Clinic stool frequency score of ≤1, and a sum of the Mayo Clinic rectal bleeding and stool frequency scores of 0 or 1).
For measurement of responsiveness (Part 4 of the study), images from the baseline biopsies were paired with their corresponding week 6, post-treatment image (a week 4 image was used if no week 6 image was available). The original study included 181 patients, of whom 154 had a baseline and a week 6 (or week 4) image available (18 patients did not have a week 4 or week 6 image, 1 patient did not have a baseline image, 8 patients had unusable images at either baseline or week 4 or week 6). For these analyses, a single central reader (RKP) reviewed each pair of images and scored GS, MRS and RHI. Correlations between change scores were used to assess the validity of the new index; correlations exceeding a threshold of 0.7 were considered acceptable.
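The simplification step described for the index (dividing each regression coefficient by the smallest coefficient and rounding to integers) can be sketched as follows. The item names and coefficient values below are hypothetical placeholders, not estimates from the study, and `scale_to_integer_weights` is an illustrative name rather than anything used by the authors.

```python
from typing import Dict


def scale_to_integer_weights(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Divide every coefficient by the smallest one and round to integers."""
    if not coefficients:
        raise ValueError("no coefficients supplied")
    smallest = min(abs(value) for value in coefficients.values())
    if smallest == 0:
        raise ValueError("the smallest coefficient must be non-zero")
    return {name: round(value / smallest) for name, value in coefficients.items()}


if __name__ == "__main__":
    # Hypothetical regression coefficients, for illustration only.
    example = {"item_a": 3.0, "item_b": 6.4, "item_c": 8.9}
    print(scale_to_integer_weights(example))
```

Integer weights trade a small loss of precision for a score that can be added up by hand, which appears to be the motivation given in the text ("for ease of calculation").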
To further evaluate the potential value of RHI as an outcome measure in clinical trials and to facilitate index interpretability, the standardised effect size (SES) and Guyatt’s responsiveness statistic (GRS)23 were calculated using treatment allocation with patients assigned to MLN02 considered ‘changed’ and those assigned to placebo considered ‘unchanged.’ Furthermore, these statistics were also calculated using the total population and the following clinical definitions of change: (1) a 1-point difference from baseline in the MBS and (2) a 2-point change in the sum of the aggregate rectal bleeding and stool frequency subscores of the MCS. The degree of index responsiveness was classified using previously described conventions whereby effect sizes of 0.2, 0.5 and 0.8 represent low, moderate and large degrees of responsiveness.24
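The two responsiveness statistics can be illustrated with a short Python sketch. The article does not spell out the formulas, so the definitions used here are the commonly cited ones and should be read as assumptions: SES as the mean change divided by the SD of baseline scores, and GRS as the mean change in the ‘changed’ group divided by the SD of change in the ‘unchanged’ group. Change is taken as baseline minus follow-up so that improvement is positive; the function names are illustrative.

```python
from statistics import mean, stdev
from typing import List, Sequence


def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Paired change scores, baseline minus follow-up (improvement > 0)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def standardised_effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Assumed definition: mean change divided by the SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def guyatt_responsiveness(changed: Sequence[float], unchanged: Sequence[float]) -> float:
    """Assumed definition: mean change in the 'changed' group divided by
    the SD of change in the 'unchanged' group."""
    return mean(changed) / stdev(unchanged)


if __name__ == "__main__":
    # Hypothetical index scores, for illustration only.
    treated_baseline = [10, 12, 14, 16]
    treated_week6 = [6, 8, 10, 12]
    placebo_change = [-1, 0, 1, 0]
    print(standardised_effect_size(treated_baseline, treated_week6))
    print(guyatt_responsiveness(change_scores(treated_baseline, treated_week6), placebo_change))
```

In the study design described above, patients assigned to MLN02 would supply the ‘changed’ group and those assigned to placebo the ‘unchanged’ group.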
Sample size justification
In Part 1 of the study, the sample size calculation was based on a one-way random effects model as discussed by Zou.25 Assuming a true ICC of 0.7, rating of 50 images three times would have an 80% chance of obtaining a one-sided 95% lower bound >0.5. For Part 2, the sample size was determined by applying the ‘rule-of-10’ which states that 10 observations per item are sufficient in a regression analysis. The purpose of the rule is to assure stability of the model estimates and although it was empirically derived, considerable experience exists that supports the validity of this approach.26 A total of 150 images was regarded as sufficient for the present study since 15 items were evaluated in the regression model. This sample size was also large enough to distinguish a difference in ICCs of 0.8 from 0.7 at the 5% significance level, suggesting the sample sizes for Part 3 and for responsiveness in Part 4 were sufficient. The sample size for assessing the magnitude of effect size in Part 4 was determined according to a formula for an imbalanced study with smaller group sizes of n and m, respectively (ie, where r is the ratio of the larger group to smaller group, ES is the target effect size, z is the quantile value of the standard normal distribution). Thus, for r=2 (based upon a treatment assignment of 1:1:1 to placebo, low dose MLN02 and high dose MLN02), a total of 150 images (50 placebo and 100 pooled MLN02) would have 88% power to detect an effect size of 0.3 at a two-sided 5% significance level.
Ethical considerations The biopsies analysed in this study were obtained from a clinical trial that complied with all applicable regulatory requirement(s). The consent of study participants included the use of the collected data for other medical purposes, and thus additional consent for the present study was not obtained. All participant information used in the present study was de-identified and the pathologists were blinded to clinical information.
RESULTS Study population Baseline characteristics of the study patients are summarised in table 1. There were no important differences in baseline demographics or disease characteristics between patients treated with MLN02 and those treated with placebo, and the characteristics were generally representative of participants in induction trials of treatment for active UC.
Index reliability Individual ICCs for VAS, MRS and GS are summarised in table 2. Intra-rater ICCs (95% CI) for VAS, MRS and GS (5-point scale) were 0.83 (0.77 to 0.87), 0.85 (0.77 to 0.91) and 0.88 (0.79 to 0.93), respectively, indicating ‘almost perfect’ intra-rater reliability. Inter-rater ICCs (95% CI) for VAS, MRS and GS (5-point scale) were 0.67 (0.55 to 0.74), 0.80 (0.69 to 0.87) and 0.79 (0.63 to 0.87) indicating ‘substantial’ to ‘almost perfect’ inter-rater reliability (table 2). Intra-rater ICCs were also above the ‘substantial’ benchmark for scoring of all items
Table 1 Baseline demographics and clinical characteristics

Characteristic                           All patients (N=155)   Placebo (N=53)   MLN02 (N=102)
Age                                      41.7±14.1              39.8±13.1        42.8±14.6*
Male sex, n (%)                          87 (56.1)              29 (54.7)        58 (56.9)
Months since diagnosis                   77.9±82.6              80.6±82.2        76.5±83.2
Current smoker, n (%)                    8 (5.2)                4 (7.5)          4 (3.9)
UC clinical score†                       6.9±1.5                6.6±1.6          7.0±1.4
Stool frequency                          2.2±0.8                2.2±0.9          2.3±0.8
Rectal bleeding                          1.5±0.8                1.4±0.9          1.6±0.8
Assessment by the patient                1.3±0.7                1.3±0.8          1.3±0.7
Assessment by the physician              1.9±0.3                1.8±0.4          1.9±0.3
Rectal bleeding+stool frequency (MCS)    3.7±1.1                3.6±1.1          3.8±1.1
Modified Baron score                     2.7±0.7                2.7±0.8          2.7±0.7
Modified Riley histopathological index   5.9±1.3                5.6±1.6          6.1±1.0
Mesalamine use, n (%)                    124 (80.0)             41 (77.4)        83 (81.4)
Haemoglobin concentration                13.4±1.7               13.2±1.7         13.5±1.7
White cell count                         8.4±2.6                8.4±2.1          8.5±2.8

*Plus–minus values are means±SD.
†The UC Clinical Score, a modification of the Mayo Clinic Score (MCS), consists of four items: rectal bleeding, stool frequency, functional assessment by the patient and global assessment by the physician. Items are scored on a scale from 0 (normal) to 3 (severe disease). The composite score ranges from 0 (inactive disease) to 12 (severe disease activity).16
Table 2 Reliability of VAS, MRS and GS, ICC (95% CI)*

Index or item                          Intra-rater            Inter-rater
VAS                                    0.83 (0.77 to 0.87)    0.67 (0.55 to 0.74)
MRS                                    0.85 (0.77 to 0.91)    0.80 (0.69 to 0.87)
GS
(0) Structural/architectural change    0.81 (0.72 to 0.88)    0.60 (0.39 to 0.73)
(1) Chronic inflammatory infiltrate    0.86 (0.73 to 0.92)    0.75 (0.54 to 0.86)
(2a) Lamina propria eosinophils        0.72 (0.63 to 0.79)    0.48 (0.34 to 0.58)
(2b) Lamina propria neutrophils        0.74 (0.66 to 0.81)    0.61 (0.48 to 0.69)
(3) Neutrophils in epithelium          0.86 (0.81 to 0.89)    0.56 (0.43 to 0.67)
(4) Crypt destruction                  0.78 (0.70 to 0.85)    0.51 (0.35 to 0.64)
(5) Erosion or ulceration              0.90 (0.84 to 0.94)    0.79 (0.66 to 0.86)
Total (5-point scale)                  0.88 (0.79 to 0.93)    0.79 (0.63 to 0.87)

*Values represent ICCs. GS, Geboes score; ICC, intraclass correlation coefficient; MRS, modified Riley score; VAS, visual analogue score.
included in GS. Inter-rater ICCs for the scoring of the individual items were all above the benchmark indicating ‘good’ reliability, with the lowest inter-rater ICC (0.48 (0.34 to 0.58)) observed for scoring of lamina propria eosinophils. These results suggest that all GS items following application of scoring conventions may be regarded as reliable and, hence were eligible candidate items for further index development. It is also notable that the point estimates for all of the index scores and the component items, with the exception of structural/architectural change, were superior to those obtained in a previous reliability study.9 This finding suggests that the scoring conventions improved inter-rater reliability. This impression was confirmed by examining the scores of the single reader (RKP) who participated in both studies. Clear evidence of a training effect was demonstrated (see online supplementary table S2).
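The benchmark labels applied to these ICCs can be expressed as a small lookup. The sketch below pairs the labels used in this article with the conventional Landis and Koch cut points (0.00, 0.20, 0.40, 0.60 and 0.80); the cut points and the function name `classify_icc` are assumptions made for illustration and should be checked against the cited benchmark reference.

```python
from typing import List, Tuple

# Upper bound of each band (assumed conventional Landis and Koch cut points),
# paired with the label this article uses for that band.
_BANDS: List[Tuple[float, str]] = [
    (0.20, "fair"),
    (0.40, "moderate"),
    (0.60, "good"),
    (0.80, "substantial"),
]


def classify_icc(icc: float) -> str:
    """Map an ICC to a reliability label under the assumed cut points."""
    if icc < 0.0:
        return "poor"
    for upper_bound, label in _BANDS:
        if icc <= upper_bound:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Inter-rater point estimates for the GS items, as reported in table 2.
    for estimate in (0.60, 0.75, 0.48, 0.61, 0.56, 0.51, 0.79):
        print(estimate, classify_icc(estimate))
```

Under these assumed bands the lowest inter-rater estimate, 0.48 for lamina propria eosinophils, is labelled ‘good’, which matches the statement above.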
Item selection for index development
Figure 2A, which shows the bivariate relationships between each of the GS items and the VAS score, demonstrates a linear relationship between increments in VAS scores and the GS items ‘structural/architectural change’, ‘chronic inflammatory infiltrate’ and ‘lamina propria neutrophils.’ Thus, the scores for these items were treated as continuous values in the model. Figure 2A also shows that a VAS score of approximately 50 corresponds to scores of 1, 2 and 3 for the GS item ‘lamina propria eosinophils,’ which justified collapsing these three levels into one level. Therefore, for model development, ‘lamina propria eosinophils’ scores were recoded as 0 and 1. Similarly, ‘neutrophils in epithelium’ was recoded as 0, 1 (original scores=1 and 2) and 2 (original score=3), ‘crypt destruction’ was recoded as 0 and 1 (original score=1, 2 and 3), and ‘erosion or ulceration’ was recoded as 0, 1 (original scores=1 and 2), 2 (original score=3) and 3 (original score=4). The model building process started with all seven GS item variables, followed by a step-down procedure with a bootstrap of 2000 resamples, which selected a final model with ‘chronic inflammatory infiltrate’, ‘lamina propria neutrophils’, ‘neutrophils in the epithelium’ and ‘erosion or ulceration’ as items that best predicted VAS with an R² value of 0.82 (table 3). The calibration plot using a bootstrap of 2000 resamples (figure 2B) shows that the final model has reasonable external validity. After simplification of the model, RHI can be calculated as:
Figure 2 (A) Univariable summaries of VAS scores as stratified by the levels of Geboes items. The figure shows the VAS scores for the histopathological items evaluated according to each of their levels. These data were used to select the number of item levels for regression analysis. For example, a linear relationship is present for ‘lamina propria neutrophils’ so four levels were included whereas only two levels were appropriate for ‘crypt destruction.’ The numbers of images are shown in the right margin. (B) Calibration plot of actual versus predicted VAS using the final model with four variables (chronic inflammatory infiltrate, lamina propria neutrophils, neutrophils in epithelium, and erosion or ulceration). The perfect prediction (Ideal) is shown by the 45° line. The model performance as assessed by the derivation sample is shown by the dotted line (Apparent). The model performance as assessed by bootstrap validation with 2000 replications is shown by the dashed line (Bias-corrected). The closeness between the Apparent and Bias-corrected plots suggests stability of the model performance in data sets other than that used to derive the model. VAS, visual analogue scale.
$$\mathrm{RHI} = 1 \times \text{chronic inflammatory infiltrate level (4 levels)} + 2 \times \text{lamina propria neutrophils (4 levels)} + 3 \times \text{neutrophils in epithelium (4 levels)} + 5 \times \text{erosion or ulceration (4 levels after combining Geboes 5.1 and 5.2)}$$

The total score ranges from 0 (no disease activity) to 33 (severe disease activity). The intra-rater and inter-rater ICCs
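The RHI calculation stated above can be sketched in a few lines of Python. The weights (1, 2, 3 and 5) and the 0 to 33 range come from the text; the function and variable names are illustrative, and representing the Geboes grade 5 sub-levels 5.0 to 5.4 as the integers 0 to 4 is an assumption about data encoding.

```python
# Integer weights taken from the RHI formula in the text.
RHI_WEIGHTS = {
    "chronic_inflammatory_infiltrate": 1,
    "lamina_propria_neutrophils": 2,
    "neutrophils_in_epithelium": 3,
    "erosion_or_ulceration": 5,
}

# Recoding of the Geboes grade 5 sub-level (assumed to be encoded 0 to 4)
# into four levels, combining 5.1 and 5.2 as described in the text.
_EROSION_RECODE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}


def recode_erosion(geboes_sublevel: int) -> int:
    """Collapse the five Geboes erosion/ulceration sub-levels into four."""
    if geboes_sublevel not in _EROSION_RECODE:
        raise ValueError("Geboes erosion sub-level must be an integer from 0 to 4")
    return _EROSION_RECODE[geboes_sublevel]


def robarts_histopathology_index(
    chronic_inflammatory_infiltrate: int,
    lamina_propria_neutrophils: int,
    neutrophils_in_epithelium: int,
    erosion_or_ulceration: int,
) -> int:
    """Weighted sum of the four item levels, each coded 0 to 3."""
    levels = {
        "chronic_inflammatory_infiltrate": chronic_inflammatory_infiltrate,
        "lamina_propria_neutrophils": lamina_propria_neutrophils,
        "neutrophils_in_epithelium": neutrophils_in_epithelium,
        "erosion_or_ulceration": erosion_or_ulceration,
    }
    for name, level in levels.items():
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3")
    return sum(RHI_WEIGHTS[name] * level for name, level in levels.items())


if __name__ == "__main__":
    # Hypothetical biopsy: item levels 1, 2, 1 and Geboes erosion sub-level 5.2.
    print(robarts_histopathology_index(1, 2, 1, recode_erosion(2)))
```

With every item at its highest level the function returns 33, the maximum stated in the text.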
Table 3 Final regression model for Robarts Histopathology Index

Columns: Component; Coefficient (SE); p Value

Intercept
Chronic inflammatory infiltrate
0=No increase
1=Mild but unequivocal increase
2=Moderate increase
3=Marked increase
Lamina propria neutrophils
0=None
1=Mild but unequivocal increase
2=Moderate increase
3=Marked increase
Neutrophils in epithelium
0=None
1=