American Journal of Epidemiology © The Author 2009. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail:
[email protected].
Vol. 170, No. 7 DOI: 10.1093/aje/kwp195 Advance Access publication August 13, 2009
Practice of Epidemiology The Need for Validation of Statistical Methods for Estimating Respiratory Virus–Attributable Hospitalization
Rodica Gilca, Gaston De Serres, Danuta Skowronski, Guy Boivin, and David L. Buckeridge Initially submitted September 17, 2008; accepted for publication June 9, 2009.
child; database; hospitalization; influenza, human; models, statistical; prospective studies; respiratory syncytial viruses
Abbreviations: P&I, pneumonia and influenza; RSV, respiratory syncytial virus.
Quantification of disease burden caused by respiratory viruses is of considerable interest to public health experts when recommending, evaluating, and prioritizing prevention and control programs. Given the large number of respiratory viruses, it is difficult to estimate the proportion of hospital admissions attributable to a single type of virus. This proportion can be measured directly by prospective study, but such an approach is cumbersome and expensive. As an alternative, ecologic approaches have been adopted to quantify disease burden indirectly by using readily accessible administrative databases, virus surveillance data, and a range of statistical methods. Initially, Serfling (1) derived estimates of influenza-related mortality from the seasonal patterns of deaths
attributed to pneumonia and influenza (P&I). Simonsen et al. (2, 3) developed this approach further by applying a periodic regression model to summer deaths and attributing the deaths above the epidemic threshold of the model to influenza. Neuzil et al. (4) used another approach, in which morbidity attributable to influenza is based on the risk difference between influenza and non- or peri-influenza periods. Modified versions were subsequently developed to adjust for cocirculation of respiratory syncytial virus (RSV), but these approaches remained essentially univariate (5, 6). Nicholson (7) applied multivariate linear regression to 4-week aggregated mortality data to assess deaths attributable to influenza and RSV by including in the model different covariates, such as seasons and temperature.
Correspondence to Dr. Rodica Gilca, Institut National de Santé Publique du Québec, 2400 d’Estimauville, Quebec, PQ, Canada, G1E 7G9 (e-mail:
[email protected]).
Am J Epidemiol 2009;170:925–936
Downloaded from http://aje.oxfordjournals.org/ by guest on October 27, 2015
Public policy regarding influenza has been based largely on the burden of hospitalization estimated through ecologic studies applying increasingly sophisticated statistical methods to administrative databases. None are known to have been validated by observational studies. The authors illustrated how 6 commonly applied statistical methods estimate virus-attributable hospitalization of children 6–23 months of age and compared the estimates with results obtained from a prospective study using virologic assessment. The proportions of pneumonia and influenza and of bronchiolitis hospitalizations attributable to respiratory syncytial virus and/or influenza were derived by using Serfling regression, periseason differences, Poisson regression with log link, negative binomial regression with identity link, and a Box-Jenkins transfer function. No method provided accurate or consistent estimates for both viruses and outcomes. Virus-attributable hospitalization estimates varied widely between statistical methods and between seasons, with greater between-season variation for admissions attributed to influenza compared with respiratory syncytial virus. Sophistication of statistical methods may have been interpreted as assurance that results are more accurate. Without validation against epidemiologic data, with viral etiology confirmed in individual patients, the accuracy of statistical methods in ecologic studies is simply not known. Until these methods are validated, their methodological limitations should be made explicit and proxy estimates used cautiously in guiding public policy.
926 Gilca et al.
MATERIALS AND METHODS

Sources of data

Prospective study. Nasopharyngeal aspirates were collected at admission from children less than 3 years of age hospitalized with acute respiratory illness at the Centre Hospitalier Universitaire de Québec during the winters of 2001–2002 and 2002–2003. Nasopharyngeal aspirates were tested for influenza A/B and RSV, as described elsewhere (18, 19).

Hospital discharge database. Hospital discharge diagnoses for infants and toddlers 6–23 months of age were obtained from an administrative database, MED-ECHO, which records all acute care hospitalizations in the province of Quebec, Canada, a population of approximately 7.6 million. Records with a primary diagnosis based on the International Classification of Diseases, Ninth Revision for P&I (codes 480–486 and 487, respectively) or bronchiolitis (code 466.1) were included. Data were extracted from August 1998 through August 2005 and were aggregated according to Centers for Disease Control and Prevention (Atlanta, Georgia) weeks for viral surveillance. Denominators for hospitalization rates were obtained from census data provided by the Institut de la statistique du Québec.

Provincial viral surveillance data. A weekly count of laboratory-confirmed RSV and influenza detection from more than 30 hospitals is produced year-round by the Laboratoire de santé publique du Québec.

Statistical methods

Serfling-type regression model. A Serfling-type periodic regression model (1) was applied to P&I and bronchiolitis weekly admission data for the entire study period according to the modification described by Simonsen et al. (2, 3) for influenza virus (Appendix). Excess admissions were calculated as the observed minus predicted admissions for all “epidemic weeks.”

Periseason differences I. To estimate the influenza-attributable rate of hospitalization, the rate of hospitalization during the peri-influenza season (November 1–April 30 without influenza activity) was subtracted from the rate during the influenza season, as described by Neuzil et al. (4). The number of cases attributable to influenza was calculated by using the corresponding person-months and length of the influenza season each year.
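The periseason rate difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' SAS implementation: the function name `periseason_attributable` and all input numbers are hypothetical, and the attributable cases are obtained by applying the rate difference to the person-months of the virus season.

```python
def periseason_attributable(cases_season, person_months_season,
                            cases_peri, person_months_peri):
    """Rate-difference sketch (hypothetical helper, not from the paper).

    Returns the attributable rate per person-month and the number of
    cases attributed to the virus during its season.
    """
    rate_season = cases_season / person_months_season
    rate_peri = cases_peri / person_months_peri
    # Everything above the periseason (baseline) rate is attributed to the virus.
    attributable_rate = rate_season - rate_peri
    attributable_cases = attributable_rate * person_months_season
    return attributable_rate, attributable_cases


# Hypothetical inputs: 300 admissions over 100,000 person-months in season,
# 100 admissions over 50,000 person-months in the periseason.
rate, cases = periseason_attributable(300, 100_000, 100, 50_000)
proportion = 100.0 * cases / 300  # percentage of in-season admissions attributed
```

Because the whole in-season excess over baseline is assigned to one virus, any other pathogen circulating during the same weeks is counted as well, which is the limitation the Discussion raises for univariate methods.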
The same approach was then applied to calculate hospitalizations attributable to RSV. Influenza (RSV) season was defined as consecutive weeks with at least 1% of the annual influenza (RSV) specimens being positive.

Periseason differences II. Izurieta et al. (5) defined the RSV or influenza season as 2 or more consecutive weeks in which each week accounted for at least 5% of the total annual number of specific virus isolates. For the “predominant” period, the frequency of one virus was above while the other was below 5%. The rate of hospitalization attributable to a specific virus, for each year and diagnosis, was the difference between the mean weekly hospitalization rate during predominant periods of the specific virus and the mean weekly hospitalization rate during the periseasonal baseline (2 or more consecutive weeks during October–May with less than 5% of the total annual isolates of influenza and/or RSV). The mean weekly number of cases attributable to each virus was calculated from the attributable rate by using the corresponding person-months denominator.
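The season and predominant-period definitions above are simple threshold rules on weekly surveillance counts, and a minimal Python sketch may make them concrete. The function names, the default parameters, and the example counts are hypothetical; the 5% threshold and the requirement of 2 or more consecutive weeks follow the periseason II description, and setting `threshold=0.01` would approximate the periseason I rule.

```python
def weekly_shares(counts):
    """Share of the annual total contributed by each week (0 if no detections)."""
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


def season_weeks(counts, threshold=0.05, min_run=2):
    """Indices of weeks in runs of >= min_run consecutive weeks at or above threshold."""
    flags = [share >= threshold for share in weekly_shares(counts)]
    weeks, run = [], []
    for index, flag in enumerate(flags):
        if flag:
            run.append(index)
            continue
        if len(run) >= min_run:
            weeks.extend(run)
        run = []
    if len(run) >= min_run:  # close a run that reaches the end of the series
        weeks.extend(run)
    return weeks


def predominant_weeks(own_counts, other_counts, threshold=0.05, min_run=2):
    """Season weeks of one virus during which the other virus is below threshold."""
    other_shares = weekly_shares(other_counts)
    return [week for week in season_weeks(own_counts, threshold, min_run)
            if other_shares[week] < threshold]


# Hypothetical weekly detections for one year (10 weeks shown for brevity).
rsv = [0, 1, 10, 20, 30, 20, 10, 1, 8, 0]
flu = [0, 0, 0, 0, 2, 10, 20, 10, 8, 0]
rsv_predominant = predominant_weeks(rsv, flu)
flu_predominant = predominant_weeks(flu, rsv)
```

When the two epidemic curves overlap almost completely, as reported for 2 of the 7 seasons studied, the predominant period for one or both viruses can shrink to very few weeks or vanish, which is consistent with periseason II estimates being unavailable for some seasons.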
Although this model was the first to explore the independent effects of influenza and RSV, its complexity precludes interpretation and limits large-scale application. Thompson et al. (8) added a trigonometric function and a quadratic term to a Poisson multivariate regression (8), which enabled estimation of independent influenza and RSV effects on mortality with adjustment for seasonality. Mangtani et al. (9) used a multivariate Poisson regression model and added autoregressive terms to account for the residual correlation observed after inclusion of all covariates. The above studies using Poisson regression relied on a log-link function. This link implies multiplicative effects of respiratory viruses, an unrealistic assumption. To model additive effects, recent approaches have used an identity link (10, 11). A Box-Jenkins or autoregressive integrated moving average transfer function model has been used to circumvent 2 problems of traditional regression models: 1) the unrealistic assumption that consecutive influenza counts or admissions are independent and 2) their inability to detect time-delayed associations, which often exist between increasing influenza activity and hospitalization or mortality. Box-Jenkins transfer function models enable estimation of the strength of the association between 2 time series while controlling for autocorrelation and allowing for a delayed or lagged correlation between the 2 series (12, 13), and they have been used to estimate the health care utilization attributable to respiratory viruses (14, 15). Each of these statistical methods is still variously in use. Despite heterogeneity in methods and underlying assumptions, results from studies that use these methods are cited widely and are used to inform public policy and decision making. Newer and more sophisticated statistical methods may have been interpreted as an assurance that the results are more accurate. 
In fact, to our knowledge, indirect estimates generated from statistical methods have not been compared with direct epidemiologic measures based on prospective data with an individual-level virus diagnosis. The purpose of this study was to illustrate how estimates derived from 6 statistical methods of RSV- and influenza-attributable hospitalization compare with observations from a prospective study with virus detection in children hospitalized for acute respiratory illness. Infants and toddlers aged 6–23 months were selected because this age group was recently added to the recommended list of persons eligible for routine, publicly funded influenza immunization in Canada and the United States (2004) based on indirect estimates of hospitalization (16, 17).
Respiratory Virus–Attributable Hospitalization
RESULTS

Prospective study

During the 2001–2002 and 2002–2003 seasons, 448 children with acute respiratory illness were enrolled, of whom 230 (51%) were 6–23 months of age. Of these children, 168/230 (73%) had at least one respiratory virus identified by polymerase chain reaction, antigen detection, or viral culture, including RSV in 119 (52%) and influenza A/B in 39 (17%). A diagnosis of P&I was given to 55/230 (24%) and of bronchiolitis to 121/230 (53%) children 6–23 months of age. In 2001–2002 and 2002–2003, respectively, RSV was detected in 48% and 47% of infants/toddlers hospitalized with P&I and in 60% and 74% hospitalized with bronchiolitis. Influenza-attributable proportions for P&I hospitalization were 36% and 17% and for bronchiolitis were 25% and 6%, respectively. As many as 11% of P&I or bronchiolitis hospitalizations were associated with RSV/influenza coinfection during these 2 seasons.

Provincial viral surveillance data
For the 1998–1999 through 2004–2005 seasons, a seasonal average of 2,233 RSV detections (range, 1,621–2,619) was reported, providing an average rate of 29.6 detections per 100,000 population overall. During the same period, there was a seasonal average of 1,927 influenza virus detections (range, 842–4,159), giving an average rate of 25.5 per 100,000 inhabitants. The weekly number of positive tests for RSV ranged from 0 to 276 (mean, 43; median, 13). The weekly number of positive tests for influenza ranged from 0 to 520 (mean, 37; median, 1). RSV seasons were always longer than influenza seasons (mean, 22 weeks vs. 17 weeks; P = 0.002) (Web Figure 1; this supplementary figure is posted on the Journal’s website (http://aje.oupjournals.org/)). For 1999–2000 and 2002–2003, the period of circulation of the 2 viruses overlapped completely. During the 5 other seasons, the peak weeks of RSV and influenza were 3–8 weeks apart.

Primary hospital discharge diagnoses
During the study period, there were an annual average of 1,478 P&I admissions (range, 1,305–1,634) and 1,555 bronchiolitis admissions (range, 1,528–1,619) of infants/toddlers. The weekly average number of P&I admissions was 28 (range, 1–94; median, 22) compared with 30 for bronchiolitis (range, 1–121; median, 21) (Web Figure 1). Bronchiolitis and P&I admissions were positively correlated with viral counts (Figure 1). The magnitude of the correlation was much greater for RSV viral counts with P&I (R = 0.84) and bronchiolitis (R = 0.91) hospital admissions than for influenza viral counts with P&I and bronchiolitis admissions (R = 0.64 and R = 0.46, respectively). Within multiplicative models, the Poisson regression with autoregressive terms (Appendix) provided the best fit to the data. However, its estimates were almost 2-fold less than those for models without autoregressive terms and 2-fold to 4-fold less than those for the prospective study. Therefore, in this paper, we present the results obtained by using only the Poisson regression model without autoregressive terms,
Generalized linear models. Multivariate analysis was performed with generalized linear models by using a log- or identity-link function and a Poisson or negative-binomial distribution of the response variable (Appendix). The dependent variables were P&I or bronchiolitis admissions; explanatory variables were RSV and influenza weekly positive tests. A product term between RSV and influenza was added to the models to allow for nonadditive effects of the 2 viruses. Such an effect may occur if there is coinfection, with the 2 viruses causing one disease instead of 2, and the cases attributable to RSV and influenza subsequently not summing to the total number of cases because the infections are not mutually exclusive. By including seasonal patterns for annual and biannual cycles that are common in most time-varying risk factors, we adjusted for those factors (including variables not measured in our study, for example, other respiratory pathogens or environmental factors). We also included covariates for mean outdoor weekly temperature (registered at the Dorval meteorological station situated in the Montreal area of Canada) and indicator variables for the Christmas and New Year periods (indicative of health care utilization). For the multiplicative (log-link) model, the proportions of admissions attributable to RSV and to influenza were calculated by using the adjusted rate ratios for RSV or influenza for each week of the series, assuming that the whole population is exposed to the respiratory viruses (attributable proportion = (rate ratio − 1)/rate ratio × 100) (9, 20, 21). For the additive model with an identity link, RSV- and influenza-attributable admissions were calculated as the difference between the model-predicted admissions with RSV and influenza and the model-predicted admissions in the absence of RSV or influenza.
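The two attribution rules just described can be illustrated with a short Python sketch. It is a simplification under explicit assumptions and not the fitted models of the study: the coefficient `beta` stands for an already-estimated virus effect, the weekly rate ratio is assumed to be exp(beta × weekly virus count), the RSV–influenza product term and all other covariates are omitted, and every function name and number is hypothetical.

```python
import math


def attributable_fraction(rate_ratio):
    """Attributable proportion as a fraction: (RR - 1) / RR."""
    return (rate_ratio - 1.0) / rate_ratio


def log_link_attributable(admissions, virus_counts, beta):
    """Multiplicative model: weekly admissions attributed to one virus.

    Assumes the weekly rate ratio is exp(beta * count); this is a
    simplification made for illustration.
    """
    attributed = []
    for observed, count in zip(admissions, virus_counts):
        rate_ratio = math.exp(beta * count)
        attributed.append(observed * attributable_fraction(rate_ratio))
    return attributed


def identity_link_attributable(virus_counts, beta):
    """Additive model: prediction with the virus minus prediction without it.

    With no interaction term, that difference reduces to beta * count.
    """
    return [beta * count for count in virus_counts]


def seasonal_proportion(attributed, admissions):
    """Percentage of a season's admissions attributed to the virus."""
    return 100.0 * sum(attributed) / sum(admissions)


# Hypothetical weekly data and coefficients.
admissions = [20, 40, 60, 40]
rsv_counts = [0, 50, 100, 50]
multiplicative = log_link_attributable(admissions, rsv_counts, beta=0.005)
additive = identity_link_attributable(rsv_counts, beta=0.3)
share_additive = seasonal_proportion(additive, admissions)
```

Under the log link the attributed fraction for a week depends only on the rate ratio, so two viruses that each carry a large rate ratio in the same week can together account for nearly all admissions; under the identity link the contributions simply add.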
The proportion attributable to each virus was then calculated by dividing the sum of weekly specific admissions associated with each virus for every season (and for the total study period) by total admissions for every season (and for the total study period). Box-Jenkins model. Box-Jenkins (autoregressive integrated moving average) models were fit for every series by using the standard approach of identification, estimation, and checking (12, 13) (Appendix). Transfer-function models were developed to describe the relation between the admission (target or output) series and each virus count (explanatory or input) series (12). To assess the global effect of the virus on admissions, we calculated the effect of the transfer function for each virus (Appendix). The proportion attributable to each virus was then calculated by dividing the number of P&I and bronchiolitis admissions associated with each virus by the total P&I and bronchiolitis admissions. We compared estimates of the seasonal proportion of P&I and bronchiolitis hospital admissions attributable to RSV or influenza based on the above statistical methods relative to each other and to observations from the prospective study by using virus confirmation. We also compared the temporal correspondence of weekly estimates with the epidemic curves for hospitalization and viral surveillance. SAS software (version 9.1, SAS Institute, Inc., Cary, North Carolina) was used for all analyses. A P value of 0.05). During the 2 seasons of the prospective study, no single statistical method consistently reflected prospective estimates (Figure 2). With the exception of periseason method I for influenza during 2002–2003, all statistical point estimates underrepresented the influenza-attributable proportion measured prospectively. 
Estimates were more consistent and were higher for RSV for both P&I and bronchiolitis hospitalization and were most closely matched to prospective results by the negative binomial regression and Box-Jenkins models.

Comparison of weekly estimates from different methods
Weekly admissions attributed to RSV and influenza by the 6 statistical methods are compared in Figure 3 for the 2001–2002 season (similar patterns were observed for other
Figure 1. Pneumonia/influenza and bronchiolitis weekly hospital admissions of children aged 6–23 months plotted against the weekly provincial count of positive viral surveillance tests, with least-squares lines and Pearson correlation coefficients (all P < 0.0001). A) Pneumonia/influenza and respiratory syncytial virus (RSV), B) pneumonia/influenza and influenza, C) bronchiolitis and RSV, D) bronchiolitis and influenza.
Figure 2. Proportions of hospitalizations attributable to respiratory syncytial virus (RSV) and influenza virus obtained by using different statistical methods, by study year, compared with prospective study results obtained in 2001–2002 and 2002–2003. Black squares with thick black vertical lines: prospective study with 95% confidence interval; white diamonds: periseason I; black diamonds: periseason II; white circles: Serfling method; black circles: Poisson regression; white triangles: negative binomial regression; black triangles: Box-Jenkins method. NA, estimates not available for these seasons with the periseason method II.
seasons). The periseason methods often attributed the same hospitalizations to RSV and influenza because of overlapping “epidemic weeks” (Figure 3A). The Serfling regression estimates followed the number of hospital admissions more closely, but not the periods and intensity of RSV or influenza transmission (Figure 3A). Weekly estimates provided by all multivariate methods followed indicators of virus circulation well but were less well aligned with weekly admissions (Figure 3B).

DISCUSSION
In this study, we compared estimates of hospitalizations for RSV and influenza calculated by using 6 statistical methods applied to ecologic data to estimates obtained from
a prospective study with virologic confirmation. These 6 statistical methods are still in use, and studies using these methods have directly influenced policy decisions about influenza vaccination. None of the statistical methods examined provided accurate or consistent estimates of disease burden for both viruses as compared with the prospective study. Moreover, our comparison revealed substantial within-season and between-method variation in estimates obtained by using the statistical methods. The variability between the 6 statistical methods is likely due to a combination of incorrect or violated assumptions, limits imposed by nonspecific syndromic outcomes incompletely recorded and captured in administrative databases, and reliance on laboratory surveillance data for virus circulation aggregated at the population level without accounting
Figure 3. Weekly respiratory syncytial virus (RSV) and influenza-attributable hospitalizations according to different statistical methods for obtaining pneumonia/influenza estimates for one season (Centers for Disease Control and Prevention weeks 40 to 20 are presented). A) Univariate methods, B) multivariate methods. Dashed, thick black lines: influenza; solid, thick black lines: RSV; solid, thick gray lines: observed pneumonia/influenza admissions; dark gray areas: pneumonia/influenza attributable to influenza; light gray areas: pneumonia/influenza attributable to RSV; hatched areas: pneumonia/influenza attributable to both RSV and influenza. The upper 95% confidence interval limit of the model of Serfling pneumonia/influenza attributable to influenza is shown by the dashed, thin black line. The upper 95% confidence interval limit of the model of Serfling pneumonia/influenza attributable to RSV is shown by the solid, thin black line.
Table 1. Assumptions of Statistical Methods Used to Estimate Hospitalizations Attributable to Respiratory Viruses

Item Considered | Periseason I | Periseason II | Serfling-type Regression | Poisson Regression With Log Link | Negative Binomial Regression With Identity Link | Box-Jenkins Method

Virus circulation: The virus of interest circulates during epidemic periods only.^a | Virus can circulate throughout the year.
Disease attributed to the virus: During epidemic periods, all disease above the baseline rate is attributed to the virus of interest. Disease attributed to the virus is similar in all weeks of the season.^a | All disease above the upper 95% CI predicted by the model (baseline) is attributed to the virus of interest.^a | Disease attributed to the virus of interest is proportional to the quantity of the virus circulated in a given week.
No. of respiratory viruses: Only one virus is causing the excess in the rate during the epidemic period.^a | Only RSV and influenza cause the excess in the rate during the epidemic period.^a | Only one virus is causing the excess in the rate during the epidemic period.^a | Models can include any number of viruses.
Epidemic periods: Consecutive weeks with ≥1% of annual positive specimens. | Consecutive weeks with ≥5% of annual positive specimens. | Periods when observed data exceed the upper 95% CI predicted by the model. Baseline cycle length and peak height occur at the same date each year throughout the study period.^a | NA | NA | NA
Independence of observations: NA | NA | Observations are independent.^a | Observations are independent.^a | Observations are independent.^a | Consecutive observations are correlated.
Virus virulence: NA | NA | The virulence of the virus is the same throughout the study period.^a
Coinfection: There is no coinfection.^a | There is no coinfection.^a | Coinfection may exist.
Mathematical relation: NA | NA | The relation between the frequency of virus detection and the frequency of the outcome is log-linear.^a | Each additional virus isolate multiplies the number of admissions.^a | Each additional virus isolate contributes additively to the admissions. | The relation between the frequency of virus detection and the frequency of the outcome is linear.

Abbreviations: CI, confidence interval; NA, not applicable; RSV, respiratory syncytial virus.
^a Assumptions that are violated or unrealistic.
from the Box-Jenkins time-series model, which accounts for autocorrelation of sequential observations.

An ideal model for attributing hospital admissions to a specific respiratory virus should be able to account for variation in the timing, intensity, and virulence of activity within and between seasons. It should also deal appropriately with the cocirculation of other viruses causing infections with a similar clinical presentation and accurately estimate their separate and combined impact. Finally, the model should also be able to accommodate the strong correlation in outcomes and covariates between consecutive weeks. Despite all these qualities, such a model would not be ideal unless validation against epidemiologic studies, with viral confirmation of disease, demonstrates that it accurately predicts or explains reality.

Our validation process has several limitations. First, we used prospective data from a single hospital during a 2-year period to serve as the basis for validity assessment. This hospital serves 10% of Quebec children and may not represent the pediatric RSV/influenza epidemiology for the province. Second, we relied on provincial virus surveillance data aggregated across all age groups, which may not accurately reflect viral transmission in the subset of children 6–23 months of age. Onset, duration, and intensity of virus circulation also vary across space and time (25), so provincial data may not reflect local associations. Finally, clinical suspicion and frequency of specimen collection/testing vary throughout the winter season and can influence the measured effect size (26). Notwithstanding these limitations, our study provides insight into the relative behavior of different statistical models: all estimates used the same source data over 7 winter seasons.

Over the past decades, considerable effort has been directed toward developing more sophisticated statistical models to estimate the disease burden of respiratory infections.
The greater complexity of these methods confers the impression of accurate results. Using the example of hospitalizations due to P&I and bronchiolitis in young children, however, we have illustrated that routinely used methods are neither accurate nor reliable. Quite simply, the only way to establish the accuracy of these methods is to validate them through large epidemiologic studies incorporating prospective laboratory confirmation. Otherwise, these statistical methods will remain analogous to a hall of mirrors offering distorted images, with no one knowing which is closest to the reality. This validation is costly, and the results will likely apply to only the viruses, outcomes, and age groups assessed. Because major public health decisions are made and population programs are promoted on the basis of estimates from these statistical methods, investment in their careful validation is essential to guide rational public policy decisions. Until then, their limitations should be made explicit and estimates used cautiously.
ACKNOWLEDGMENTS
Author affiliations: Department of Social and Preventive Medicine, Faculty of Medicine, Laval University, Quebec,
for variation by age group or other drivers of diagnostic testing or reporting.

Univariate methods are simple, but their core assumptions are grossly unrealistic (Table 1). In the typical scenario of cocirculation of viruses, they are unable to properly estimate the effect of each virus. When attributing to the virus the aggregated periseason difference, such methods assume a similar burden of disease during each week of the season, which is unrealistic. By using trigonometric terms, Serfling regression assumes a constant cycle length and a constant peak height corresponding to the average curve from previous years. In reality, epidemic seasons occur at different times every year, which biases calculations based on an excess above the upper confidence interval of the “normal average” cycle. Finally, univariate methods assume no exposure to the virus of interest during the baseline period, even when there is one.

Multivariate methods address cocirculation and estimate independent viral effects on hospitalization, but they have other limitations. Multivariate methods are all based on the correlation between laboratory surveillance for virus circulation and hospital admissions. This correlation varies between viruses, outcomes, and age groups. For young children, with higher correlation coefficients for RSV than for influenza, these methods appeared more capable of consistently estimating the trends of RSV-attributable hospitalization compared with influenza. That correlation would be different in older age groups, where RSV is not such a dominant cause of hospitalization but where influenza plays a greater role.

All of the statistical methods examined calculate a single estimate of the overall strength of association between circulating viruses and disease that reflects the risk of being hospitalized for each unit of virus observed from viral surveillance.
In fact, this association is not constant between years and varies more from year to year for influenza than for RSV. In our study, there appeared to be more between-season fluctuation in influenza-attributable hospitalizations compared with RSV, but this seasonal variation was matched by an equivalent or greater degree of variation between methods within a single season for estimating influenza burden. The 1999–2000, 2003–2004, and 2004–2005 influenza seasons were relatively severe in Canada, with a particular pediatric predilection during 2003–2004 (22, 23). No multivariate method detected these differences (Figure 2).

When 2 viruses are cocirculating, log-linear models unrealistically assume multiplication of their risk to cause hospital admissions. This assumption results in an overestimation well illustrated during weeks in which nearly all pneumonia was attributed to RSV and influenza (Figure 3B). Regression models with an identity link treat these infections as additive risks, a more realistic assumption. Negative binomial regression with an identity link provided estimates closer to prospective evaluation than those obtained from Poisson regression with a log link.

Linear regression assumes that observed data are the result of independent random variables. In fact, virus transmission creates strong dependencies between consecutive outcomes because diseased individuals are linked through chains of contact (24). Although the assumption of independence was violated in the negative binomial regression with identity link, its estimates were similar to those obtained
REFERENCES
1. Serfling RE. Methods for current statistical analysis of excess pneumonia-influenza deaths. Public Health Rep. 1963;78(6):494–506.
2. Simonsen L, Clarke MJ, Williamson GD, et al. The impact of influenza epidemics on mortality: introducing a severity index. Am J Public Health. 1997;87(12):1944–1950.
3. Simonsen L, Reichert TA, Viboud C, et al. Impact of influenza vaccination on seasonal mortality in the US elderly population. Arch Intern Med. 2005;165(3):265–272.
4. Neuzil KM, Mellen BG, Wright PF, et al. The effect of influenza on hospitalizations, outpatient visits, and courses of antibiotics in children. N Engl J Med. 2000;342(4):225–231.
5. Izurieta HS, Thompson WW, Kramarz P, et al. Influenza and the rates of hospitalization for respiratory disease among infants and young children. N Engl J Med. 2000;342(4):232–239.
6. Fleming DM, Pannell RS, Elliot AJ, et al. Respiratory illness associated with influenza and respiratory syncytial virus infection. Arch Dis Child. 2005;90(7):741–746.
7. Nicholson KG. Impact of influenza and respiratory syncytial virus on mortality in England and Wales from January 1975 to December 1990. Epidemiol Infect. 1996;116(1):51–63.
8. Thompson WW, Shay DK, Weintraub E, et al. Mortality associated with influenza and respiratory syncytial virus in the United States. JAMA. 2003;289(2):179–186.
9. Mangtani P, Hajat S, Kovats S, et al. The association of respiratory syncytial virus infection and influenza with emergency admissions for respiratory disease in London: an analysis of routine surveillance data. Clin Infect Dis. 2006;42(5):640–646.
10. Schanzer DL, Langley JM, Tam TW. Hospitalization attributable to influenza and other viral respiratory illnesses in Canadian children. Pediatr Infect Dis J. 2006;25(9):795–800.
11. Markov PV, Crowcroft NS. Modelling the unidentified mortality burden from thirteen infectious pathogenic microorganisms in infants. Epidemiol Infect. 2007;135(1):17–26.
12. Helfenstein U. The use of transfer function models, intervention analysis and related time series methods in epidemiology. Int J Epidemiol. 1991;20(3):808–815.
13. Helfenstein U. Box-Jenkins modelling in medical research. Stat Methods Med Res. 1996;5(1):3–22.
14. Upshur RE, Knight K, Goel V. Time-series analysis of the relation between influenza virus and hospital admissions of the elderly in Ontario, Canada, for pneumonia, chronic lung disease, and congestive heart failure. Am J Epidemiol. 1999;149(1):85–92.
15. Schull MJ, Mamdani MM, Fang J. Community influenza outbreaks and emergency department ambulance diversion. Ann Emerg Med. 2004;44(1):61–67.
16. Harper SA, Fukuda K, Uyeki TM, et al. Prevention and control of influenza: recommendations of the Advisory Committee on Immunization Practices (ACIP). MMWR Recomm Rep. 2004;53(RR-6):1–40.
17. Orr P. National Advisory Committee on Immunization. An Advisory Committee Statement (ACS). National Advisory Committee on Immunization (NACI). Statement on influenza vaccination for the 2004–2005 season. Can Commun Dis Rep. 2004;30:1–32.
18. Boivin G, De Serres G, Côté S, et al. Human metapneumovirus infections in hospitalized children. Emerg Infect Dis. 2003;9:634–640.
19. Gilca R, De Serres G, Tremblay M, et al. Distribution and clinical impact of human respiratory syncytial virus genotypes in hospitalized children over two winter seasons. J Infect Dis. 2006;193(1):54–58.
20. Tobias A, Díaz J, Saez M, et al. Use of Poisson regression and Box-Jenkins models to evaluate the short-term effects of environmental noise levels on daily emergency admissions in Madrid, Spain. Eur J Epidemiol. 2001;17(8):765–771.
21. Linares C, Díaz J, Tobias A, et al. Impact of urban air pollutants and noise levels over daily hospital admissions in children in Madrid: a time series analysis. Int Arch Occup Environ Health. 2006;79(2):143–152.
22. Public Health Agency of Canada. Influenza in Canada: 2003–2004 season. Can Commun Dis Rep. 2005;31(1):1–18.
23. Public Health Agency of Canada. Influenza in Canada: 2004–2005 season. Can Commun Dis Rep. 2006;32(6):57–74.
24. Koopman J. Modeling infection transmission. Annu Rev Public Health. 2004;25:303–326.
25. Mullins JA, Lamonte AC, Bresee JS, et al. Substantial variability in community respiratory syncytial virus season timing. Pediatr Infect Dis J. 2003;22(10):857–862.
26. Quenel P, Dab W, Hannoun C, et al. Sensitivity, specificity and predictive values of health service based indicators for the surveillance of influenza A epidemics. Int J Epidemiol. 1994;23(4):849–855.
27. Brumback BA, Ryan LM, Schwartz JD, et al. Transitional regression models, with application to environmental time series. J Am Stat Assoc. 2000;95:16–26.
28. Touloumi G, Atkinson R, Le Tertre A, et al. Analysis of health outcome time series data in epidemiological studies. Environmetrics. 2004;15:101–117.
29. Agresti A. Categorical Data Analysis. New York, NY: John Wiley & Sons; 2002.
30. Box GE, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: John Wiley & Sons Inc; 2008.
31. Helfenstein U. Box-Jenkins modelling of some viral infectious diseases. Stat Med. 1986;5(1):37–47.
32. Brocklebank JC, Dickey DA. SAS for Forecasting Time Series. 2nd ed. Cary, NC: SAS Institute, Inc; 2003.
Canada (Rodica Gilca, Gaston De Serres); Department of Medical Biology, Laval University, Quebec, Canada (Guy Boivin); Direction Risques Biologiques, Environnementaux et Occupationnels, Institut National de Santé Publique du Québec, Québec, Canada (Rodica Gilca, Gaston De Serres); British Columbia Centre for Disease Control, British Columbia, Canada (Danuta Skowronski); and Department of Epidemiology and Biostatistics, McGill University, Montréal, Québec, Canada (David L. Buckeridge).

This work was supported by a graduate scholarships doctoral award from the Canadian Institutes of Health Research to R. G. G. D. S. is supported by a scholarship from the Fonds de la recherche en santé du Québec. D. L. B. is supported by a Canada Research Chair in Public Health Informatics.

The authors appreciate the help of Michel Couillard from the Laboratoire de santé publique du Québec for providing provincial viral surveillance data and of Philippe de Wals for useful comments on the manuscript.

Conflict of interest: none declared.
APPENDIX

Serfling-type cyclic regression model

We applied a seasonal regression model to the series with weekly viral-positive tests, excluding values for the period December–April:

$$Y_t = \alpha + \beta_1 t + \beta_2 \cos(2\pi t/52) + \beta_3 \sin(2\pi t/52) + e_t,$$

where $Y_t$ is the observed weekly number of viral-positive tests, $t$ is the index for the week, and $e_t$ is the error term. We identified influenza or RSV epidemic periods by applying this procedure to the weekly number of influenza or RSV surveillance positive tests. The "epidemic" seasons were defined as those weeks for which viral-positive tests exceeded the upper 95% confidence limit of that predicted by the model. Then, we applied the obtained models to bronchiolitis and P&I admissions. The observed admissions exceeding the upper 95% confidence limit of that predicted by the model were attributed to the corresponding virus.

Generalized linear models

Multiplicative (log-linear) models. For the multiplicative (log-link) model, inspection of model residuals suggested that admissions in any given week were correlated with those up to 3 weeks previously. To account for this serial correlation in admissions, we added as many as 3 autoregressive terms to the model (27). This approach has been used extensively to analyze environmental pollution effects on human health (28). Recently, it was used to estimate associations of RSV and influenza with emergency admissions for respiratory disease (9). A scale dispersion parameter (Pearson) was used to correct for overdispersion. Because, among the multivariate models, the Poisson regression models with autoregressive terms produced the estimates of admissions attributable to respiratory viruses that agreed least well with the prospective study, only the results from the Poisson regression without autoregressive terms are presented here. Multiplicative (log-linear) models were developed as follows:

$$
\begin{aligned}
\text{Admissions} = \exp\bigl(&\beta_1[\text{Influenza}] + \beta_2[\text{RSV}] + \beta_3[\text{Influenza}][\text{RSV}] \\
&+ \beta_4[\sin(2\pi\,\text{week}/52)] + \beta_5[\cos(2\pi\,\text{week}/52)] \\
&+ \beta_6[\sin(2\pi\,\text{week}/26)] + \beta_7[\cos(2\pi\,\text{week}/26)] \\
&+ \beta_8[\text{Year}] + \beta_9[\text{Christmas and New Year}] + \beta_{10}[\text{temperature}]\bigr),
\end{aligned}
$$

where Admissions = number of bronchiolitis or pneumonia/influenza admissions for a given week, $\beta_1$ and $\beta_2$ are coefficients associated with weekly counts of influenza and RSV, $\beta_3$ is the coefficient of the product term between influenza and RSV, $\beta_4$–$\beta_7$ account for seasonal changes in admissions, $\beta_8$ and $\beta_9$ are the coefficients of the indicator variables for year and for the Christmas and New Year periods, and $\beta_{10}$ accounts for changes in temperature.

Additive-type (linear) models. For the additive model with an identity link, we used the negative binomial distribution, which assumes that the conditional distribution of the response variable is Poisson but that the mean parameter for the subjects follows a gamma distribution (29). This mixture distribution accounts for subject heterogeneity and overdispersion. Additive-type (linear) models were developed as follows (explanations for the coefficients are similar to those presented above):

$$
\begin{aligned}
\text{Admissions} ={}& \beta_1[\text{Influenza}] + \beta_2[\text{RSV}] + \beta_3[\text{Influenza}][\text{RSV}] \\
&+ \beta_4[\sin(2\pi\,\text{week}/52)] + \beta_5[\cos(2\pi\,\text{week}/52)] \\
&+ \beta_6[\sin(2\pi\,\text{week}/26)] + \beta_7[\cos(2\pi\,\text{week}/26)] \\
&+ \beta_8[\text{Year}] + \beta_9[\text{Christmas and New Year}] + \beta_{10}[\text{temperature}].
\end{aligned}
$$

Box-Jenkins approach

In the Box-Jenkins model, the current time-series observation $Y_t$ is explained by a linear combination of the $p$ previous observations $Y_{t-1}, \ldots, Y_{t-p}$ (denoted as autoregressive component $p$), a linear combination of the $q$ previous random shocks or disturbances $a_{t-1}, \ldots, a_{t-q}$ (denoted as moving average component $q$), and a constant term. The error term is represented by $a_t$ (30). For adequate Box-Jenkins modeling, the time series must be stationary. Stationarity (or statistical equilibrium) means that the series has a constant mean and variance over time (for each seasonal period). If the variance is related to the mean, then a variance-stabilizing transformation has to be applied. The mean-range plot (the range of the data plotted against the mean for each seasonal period) (31) and the studentized Dickey-Fuller test (32) suggested that the time series of admissions and respiratory viruses did not need transformation.

Univariate models. Identification enables specification of the model, that is, identification of the order of the autoregressive ($p$) and moving average ($q$) operators. The dependency structure of the time series was ascertained by the autocorrelation function and the partial autocorrelation function. The autocorrelation function measures the correlation between $Y_t$ and $Y_{t+k}$, where $k$ is the number of intervals between consecutive observations. The partial autocorrelation function measures the correlation between $Y_t$ and $Y_{t+k}$ except that the effect of the intervening observations $Y_{t+1}, Y_{t+2}, \ldots, Y_{t+k-1}$ is removed. The patterns of the autocorrelation function and the partial autocorrelation function indicate the order of the autoregressive and moving average terms. Once the model was specified, parameters of the model were estimated by using the maximum likelihood method. Diagnostic checking of the models was conducted by examining their residuals. Final models were those in which residuals behaved as white noise; that is, no significant autocorrelations between residuals were detected. Akaike's Information Criterion was used to identify the appropriate model by minimizing the residual variance. The most parsimonious model with the fewest parameters was chosen. No moving average terms were retained in the final models. The final univariate models can be presented as follows:

$$Y_t = \text{Constant} + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t.$$
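As a rough illustration of the univariate autoregressive form above, the following Python sketch estimates the constant and the autoregressive coefficients from a weekly series. It is only a sketch under stated assumptions: the models in this appendix were estimated by maximum likelihood, whereas the code substitutes the simpler Yule-Walker equations solved by the Levinson-Durbin recursion. The function names `sample_acf` and `yule_walker_ar` and the simulated series are hypothetical and do not come from the study. The last coefficient produced at each order of the recursion is the partial autocorrelation at that lag, which is the quantity inspected during model identification.

```python
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series (r_0 is 1)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def yule_walker_ar(y, p):
    """Estimate an AR(p) model by the Levinson-Durbin recursion.

    Returns (constant, phi, pacf), where phi[i] multiplies Y_{t-(i+1)}
    and pacf[k-1] is the partial autocorrelation at lag k.
    This is a Yule-Walker estimate, not maximum likelihood.
    """
    r = sample_acf(y, p)
    phi = []    # coefficients of the model of the current order
    pacf = []
    err = 1.0   # prediction error variance relative to the series variance
    for k in range(1, p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        pacf.append(kappa)
        err *= 1.0 - kappa * kappa
    mean = sum(y) / len(y)
    constant = mean * (1.0 - sum(phi))
    return constant, phi, pacf


# Simulated weekly series (illustrative only, not study data):
# an AR(1) process with constant 5 and coefficient 0.6.
random.seed(1)
series = [12.5]
for _ in range(519):
    series.append(5.0 + 0.6 * series[-1] + random.gauss(0.0, 1.0))

constant, phi, pacf = yule_walker_ar(series, p=2)
```

For a series generated by a first-order process such as this one, the second coefficient and the lag-2 partial autocorrelation should be close to zero, which is the kind of pattern used to choose the autoregressive order.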
In the univariate models, $Y_t$ denotes the observations (P&I admissions, bronchiolitis admissions, RSV, and influenza counts) at week $t$; $Y_{t-1}, \ldots, Y_{t-p}$ are observations lagged by 1 to $p$ weeks; $\phi_1, \ldots, \phi_p$ are autoregressive terms; and $e_t$ is the error term.

Cross-correlation function. The relation between 2 time series is established by the cross-correlation function, which determines the correlation between the 2 series as a function of the lag. The cross-correlation estimates at different lags may be correlated because of the autocorrelation within each individual series. To overcome this, the series are whitened by using the structure of the input series as a filter for the output series. First, we applied the univariate model for the input series (RSV, influenza) to convert the correlated series into approximately independent series $\text{RSV}_{\text{ind}}$, $\text{influenza}_{\text{ind}}$. Second, we applied the identical procedure to the output series (admissions), which produced new series of admissions ($\text{P\&I}_{\text{filtered}}$, $\text{bronchiolitis}_{\text{filtered}}$). The cross-correlation function between $\text{RSV}_{\text{ind}}$, $\text{influenza}_{\text{ind}}$ and $\text{P\&I}_{\text{filtered}}$, $\text{bronchiolitis}_{\text{filtered}}$ (the prewhitened cross-correlation function) shows at which lags the input and output series are related (30). The cross-correlation function between admissions and virus showed significant correlations at time lag 1, which, in statistical terms, means that admissions affect the virus counts with a 1-week lag. A cross-correlation function of this form does not allow a transfer function approach to be applied to the data. Given that health service indicators advance detection of outbreaks by 1–4 weeks (26), in the subsequent analysis, we used time series of admissions delayed by 1 week. The cross-correlation function between the 1-week delayed admissions and input series suggested that RSV increases the P&I admissions immediately, after 1 week, and after 2 weeks; influenza increases the P&I admissions immediately; RSV increases the bronchiolitis admissions immediately and after 1 week; and influenza does not have a significant impact on bronchiolitis admissions.

Transfer function models. Transfer functions estimate the dynamic linear relation between the input and the output series. First, we identified preliminary parameters of transfer functions for the relation between admissions and respiratory viruses based on the results of the cross-correlation function. Second, a Box-Jenkins model was fitted to the remaining noise, and residuals were tested for white noise structure. Third, a repeat identification of the transfer function was conducted. Finally, we included in the models input series for mean weekly temperature, years, and the Christmas and New Year periods. Final models were chosen after repeated identification of the order of autoregressive operators, estimation of the parameters, and diagnostic checking of the models. The final models can be presented as follows:

$$
\begin{aligned}
\text{P\&I admissions}_t ={}& \text{Constant} + \phi_{\text{flu}}[\text{Influenza}]_t + \sum_{i=0}^{2} \phi_{\text{RSV}}\,\text{RSV}_{t-i} + \phi_{\text{year}}[\text{Year}]_t \\
&+ \phi_{\text{christmas-newyear}}[\text{Christmas and New Year}]_t + \phi_{\text{temp}}[\text{temperature}]_t + n_t;
\end{aligned}
$$

$$
\begin{aligned}
\text{Bronchiolitis}_t ={}& \text{Constant} + \phi_{\text{flu}}[\text{Influenza}]_t + \sum_{i=0}^{1} \phi_{\text{RSV}}\,\text{RSV}_{t-i} + \phi_{\text{year}}[\text{Year}]_t \\
&+ \phi_{\text{christmas-newyear}}[\text{Christmas and New Year}]_t + \phi_{\text{temp}}[\text{temperature}]_t + n_t;
\end{aligned}
$$

where $\phi$ represents the impact of the input series of RSV, influenza, temperature, year, and Christmas and New Year on the admissions in week $t$, and $n_t$ is an acute respiratory illness process, as described above, that represents the unexplained part of $\text{admissions}_t$. Each input series represented a step transfer function, which predicts that, following the known step change in the input series, an output series step change is produced immediately or with a delay. This is in contrast to an impulse function, which represents an unusual event that acts only at time $T$. The step function can be modeled with numerator factors ($\omega_0$ for immediate effect or $\omega_{1\text{--}k}$ for lagged effect). If the change in the output (steady state) is reached only gradually, the transfer function includes a denominator and has the form $\omega_0/(1 - \delta)$. The admissions attributable to RSV or influenza were modeled as a linear function of the current and recent values of the viruses and can be presented as follows:

Model for P&I admissions

$$\text{P\&I admissions attributable to RSV}_t = \omega_{\text{RSV}\,0}\,\text{RSV}_t + \omega_{\text{RSV}\,1}\,\text{RSV}_{t-1} + \omega_{\text{RSV}\,2}\,\text{RSV}_{t-2}$$

$$\text{P\&I admissions attributable to influenza}_t = \omega_{\text{Flu}\,0}\,\text{Influenza}_t$$

Model for bronchiolitis admissions

$$\text{Bronchiolitis admissions attributable to RSV}_t = \omega_{\text{RSV}\,0}\,\text{RSV}_t + \omega_{\text{RSV}\,1}\,\text{RSV}_{t-1}$$

$$\text{Bronchiolitis admissions attributable to influenza}_t = \omega_{\text{Flu}\,0}\,\text{Influenza}_t \quad (\text{the } \omega_{\text{Flu}\,0} \text{ is not significant})$$

Transfer function models showed a significant immediate (1 week) and delayed (2 weeks) impact of the RSV time series on the P&I admissions. The influenza virus time series had a significant, immediate impact on the P&I admissions. The RSV time series had a significant immediate and 1-week delayed impact on the bronchiolitis admissions, whereas the impact of the influenza virus time series was not significant. Because the estimated transfer function had a step pattern and consisted of numerator factors only, the expected number of admissions associated with each virus for a given week was calculated by multiplying the viral counts for the corresponding week by the transfer function model coefficients for the corresponding week and their 95% confidence interval. Because the transfer function for RSV contained 2 or more significant coefficients at different time lags (2 for bronchiolitis and 3 for P&I), the admissions attributable to RSV for a given week were the sum of the 2 or 3 numbers estimated at different lags (e.g., the bronchiolitis admissions attributable to RSV for a given week were the sum of the effects of RSV in that week and in the previous week). The total admissions attributable to a specific virus were the sum of admissions estimated for every week of the series.
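The calculation of attributable admissions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `attributable_admissions`, the weekly counts, and the coefficient values are all hypothetical, and weeks at the start of the series simply omit lag terms for which no earlier count exists.

```python
def attributable_admissions(virus_counts, lag_coefficients):
    """Weekly admissions attributed to one virus.

    virus_counts: weekly positive-test counts, oldest week first.
    lag_coefficients: [omega_0, omega_1, ...] for lags of 0, 1, ... weeks.
    Lag terms that would reach before the first week are omitted
    (an assumption made only for this illustration).
    """
    weekly = []
    for t in range(len(virus_counts)):
        total = 0.0
        for lag, omega in enumerate(lag_coefficients):
            if t - lag >= 0:
                total += omega * virus_counts[t - lag]
        weekly.append(total)
    return weekly


# Made-up numbers purely for illustration (not estimates from the study).
rsv_counts = [0, 2, 5, 10, 4, 1]
omega_rsv = [0.5, 0.25]  # hypothetical lag-0 and lag-1 coefficients

weekly = attributable_admissions(rsv_counts, omega_rsv)
total = sum(weekly)  # total attributable admissions over the series
```

Calling the same function with the lower and upper 95% confidence limits of the coefficients in place of the point estimates would give a corresponding range for the attributable admissions.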