diseases: From statistical method to application. Michael ... Example: Monitoring German Salmonella Newport Cases .....
Automated outbreak detection of communicable diseases: From statistical method to application Michael H¨ ohle1 joint work with
M.
Salmon2 ,
D.
Schumacher2 ,
1 Department 2 (currently
H. Burmann2 , C. Frank2 and H. Claus2
of Mathematics, Stockholm University, Sweden or formerly) Robert Koch Institute, Germany
Seminar, 24th May 2016 RIVM, Bilthoven, The Netherlands
Outbreak detection
M. H¨ ohle
1 / 31
Outline
1
Motivation for the Online Monitoring of Count Data
2
A System for Automated Outbreak Detection in Germany
3
Statistical Framework for Aberration Detection Simple Algorithm for Ad-Hoc Detection Farrington Algorithm and Beyond
4
Discussion
Outbreak detection
M. H¨ ohle
2 / 31
Motivation for the Online Monitoring of Count Data
Outline
1
Motivation for the Online Monitoring of Count Data
2
A System for Automated Outbreak Detection in Germany
3
Statistical Framework for Aberration Detection Simple Algorithm for Ad-Hoc Detection Farrington Algorithm and Beyond
4
Discussion
Outbreak detection
M. H¨ ohle
3 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
15 10 5
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Example: Monitoring German Salmonella Newport Cases
5
10
15
q0.95 of predictive distribution
●
0
No. reported cases
20
German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):
2009 10
20
30
2010 40
50
10
20
30
2011 40
50
10
20
30
40
50
Year/Reporting Week
During Oct-Nov 2011 there was an outbreak associated with mung bean sprouts (RKI, 2012) Outbreak detection
M. H¨ ohle
4 / 31
Motivation for the Online Monitoring of Count Data
Challenges of surveillance data Issues making the statistical modelling and monitoring of surveillance time series a challenge: Lack of clear case definitions Under-reporting and reporting delays Often no denominator data Seasonality Low number of reported cases Presence of past outbreaks Existence of concurrent “explanatory” processes
Outbreak detection
M. H¨ ohle
5 / 31
Motivation for the Online Monitoring of Count Data
Methodological Feature Extraction
From a statistical viewpoint: The data are univariate or multivariate count time series There is a need to handle covariates such as time, seasonality, or other concurrent processes → regression framework We face a a sequential decision making problem → statistical process control There might be a need to adjust the analysis for occurred-but-not-yet-reported events
Outbreak detection
M. H¨ ohle
6 / 31
Motivation for the Online Monitoring of Count Data
Feature Extraction from a User’s Point of View
The amount of data collected exceeds the resources for case-based analysis → statistical summaries, visualization and automatization is needed Alarms are useless if there are too many, but missing an important outbreak is also fatal Automatic procedures provide a safety net, but I I I
Outbreak detection
The algorithms lack insights about the data generating processes Every pathogen has its special features (importance, consequences) Large accumulations in short time are noticed at the local level – no advantage of automatic monitoring here
M. H¨ ohle
7 / 31
Motivation for the Online Monitoring of Count Data
Summary
The Challenge of Outbreak Detection Systems Because of too many signals and a misalignment between users’ needs and signal presentation, output of automatic outbreak detection systems often has little impact on the practical work of public health institutions.
Our aims: System architecture and design should focus on improving impact → increase user focus Integrate new statistical developments in algorithmic biosurveillance
Outbreak detection
M. H¨ ohle
8 / 31
A System for Automated Outbreak Detection in Germany
Outline
1
Motivation for the Online Monitoring of Count Data
2
A System for Automated Outbreak Detection in Germany
3
Statistical Framework for Aberration Detection Simple Algorithm for Ad-Hoc Detection Farrington Algorithm and Beyond
4
Discussion
Outbreak detection
M. H¨ ohle
9 / 31
A System for Automated Outbreak Detection in Germany
Design Decisions (1) Follow checklist for computer supported outbreak detection by Hulth et al. (2010) – see also Triple-S guidelines in Hulth (2014) Follow the manifesto for agile software development1 : I I I I
Individuals and interactions Working software Customer collaboration Responding to change
over processes and tools
over comprehensive documentation
over contract negotiation
over following a plan
Start by developing a micro system for monitoring Salmonellosis in close collaboration with the responsible epidemiologist(s) Progressively scale up the system to other reporting categories by 1-on-1 discussions with the users 1
http://www.agilemanifesto.org/
Outbreak detection
M. H¨ ohle
10 / 31
A System for Automated Outbreak Detection in Germany
Design Decisions (2) – System Design To accommodate the need for weekly reports as well as ad-hoc analyses we created both a manual- and an automatic component.
Figure source: Salmon et al. (2016a) – available under a CC BY 4.0 license
Signals from the automatic component are stored in a signal database. Outbreak detection
M. H¨ ohle
11 / 31
Statistical Framework for Aberration Detection
Outline
1
Motivation for the Online Monitoring of Count Data
2
A System for Automated Outbreak Detection in Germany
3
Statistical Framework for Aberration Detection Simple Algorithm for Ad-Hoc Detection Farrington Algorithm and Beyond
4
Discussion
Outbreak detection
M. H¨ ohle
12 / 31
Statistical Framework for Aberration Detection
Statistical Framework for Aberration Detection (1)
Univariate time series {yt , t = 1, 2, . . .} to monitor For each time t we differentiate between two underlying states: in-control (everything is fine) & out-of-control (something is wrong). At time s ≥ 1, the available information is ys = {yt ; t ≤ s}. Based on ys an automatic detection procedure has to decide if there is unusual activity at time s (or not).
Outbreak detection
M. H¨ ohle
13 / 31
Statistical Framework for Aberration Detection
Statistical Framework for Aberration Detection (2)
The detectors are initially only based on the one-step-ahead predictive distribution at each time point (Shewhart-like control chart): I
I
I
Let G (ys |y1 , . . . , ys−1 ; θ) be the distribution of Ys in case everything is in-control. If the actual observed value Ys = ys is extreme in G , this is evidence against things being in-control. The alarm threshold a1−α,s at each time point is calculated as the (1 − α)’th quantile of the predictive distribution. If ys > a1−α,s then we have an alarm.
This can be generalized to more sequential control charts accumulating information, e.g. cumulative sum (CUSUM) methods.
Outbreak detection
M. H¨ ohle
14 / 31
Statistical Framework for Aberration Detection
Intermezzo: Estimation, prediction and uncertainty Data y are the observed value of a random variable Y characterized by a parametric model with density f (y ; θ). Aim: predict the value of a random variable Z , which, conditionally on Y = y has distribution function G (z|y ; θ), depending on θ. Simplest form of the prediction problem: iid
Y1 , . . . , Yn ∼ f (y ; θ), and the task is to predict Z = Yn+1 . In time series 1-step-ahead prediction the observations are correlated and the aim is to predict Z = Yn+1 .
Outbreak detection
M. H¨ ohle
15 / 31
Statistical Framework for Aberration Detection
Simple Algorithm for Ad-Hoc Detection
Example: Predicting a new N(µ, σ 2 ) observation (1) iid
Example: Let Y1 , . . . , Yn ∼ N(µ, σ 2 ) with unknown µ and σ 2 . Then Yn+1 − Y q ∼ t(n − 1), 1 s 1+ n where Y and s 2 are the sample mean and sample variance of Y . A (1 − 2α) · 100% two-sided prediction interval (PI) is thus given by r 1 Y ± t1−α (n − 1) · s · 1 + . n A plug-in (1 − 2α) · 100% two-sided prediction interval for µ is: Y ± z1−α · s. Both of these are not to be confused with a (1 − 2α) · 100% two-sided confidence interval for µ: s Y ± z1−α · √ . n Outbreak detection
M. H¨ ohle
16 / 31
Statistical Framework for Aberration Detection
Simple Algorithm for Ad-Hoc Detection
Example: Predicting a new N(µ, σ 2 ) observation (2) 0.25
Illustration: PIs based on n = 5 observations from N(µ, σ 2 ).
0.15 0.10 0.00
0.05
density
0.20
normal (plug−in) non−standard t
2
4
6
8
10
12
14
95% Pred. intervals
z
For n = 5 the 95% plug-in PI corresponds to a 85% PI. The 95% CI for µ is 7.0–9.8, which only corresponds to a 53% PI. Outbreak detection
M. H¨ ohle
17 / 31
Statistical Framework for Aberration Detection
Simple Algorithm for Ad-Hoc Detection
Summary: Ad-Hoc Outbreak Detection Algorithm
Predict value ys at time s = (s w , s y ) using a set of reference values from window of size 2w + 1 up to b years back. Let n = b(2w + 1) and compute threshold as the upper 97.5% quantile of the predictive distribution for ys , i.e. r 1 a0.975,s = y + t0.975 (n − 1) · s · 1 + . n Sound alarm, if ys > a0.975,s .
Outbreak detection
M. H¨ ohle
18 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Farrington algorithm (1) – basic model Predict value ys at time s = (s w , s y ) using a set of reference values from window of size 2w + 1 up to b years back.
300
● ● ● ● ●● ●● ●
● ● ●● ●● ●● ●
●
● ● ● ● ● ●●● ●
●● ●● ●●● ●
●
200
● ● ● ● ● ●● ● ●
0
100
No. infected
400
500
Prediction at time t=718 with b=5,w=4
450
500
550
600
650
700
Fit over-dispersed Poisson generalized linear model (GLM) to the b(2w + 1) reference values where E(yt ) = µt , Var(yt ) = φ · µt with log µt = α + βt and φ > 0. Outbreak detection
M. H¨ ohle
19 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Farrington algorithm (2) – outbreak detection
Predict and compare: An approximate (1 − α) one-sided prediction interval for ys based on p the GLM has upper limit a1−α,s = µ ˆs + z1−α · Var(ys − µ ˆs ) If the observed ys is greater than a1−α,s , then flag s as outbreak Refinements of the algorithm include: Computation of the prediction interval on a transformed scale Use a re-weighted fit with weights based on Anscombe residuals in order to correct for outliers
Outbreak detection
M. H¨ ohle
20 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Improvements of the Farrington algorithm Noufaily et al. (2013) suggest a number of improvements I
I I
use all past data as part of a 11-knot zero-order spline consisting of a 7-week reference period and nine 5-week periods only re-weight severely observations (st = 2.58) extreme µs and plug-in of µ ˆs and φˆ to find assume ys ∼ NegBin µs , φ−1
quantile of the predictive distribution
Salmon et al. (2016a) I
I
Outbreak detection
generalize the improvements to flexible zero-order splines for any type of periodicity integrate estimation uncertainty using a parametric bootstrap based on the asymptotic normality of the estimated GLM coefficients
M. H¨ ohle
21 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Application on Salmonella Montevideo 2009-2010 Results from the extended Farrington procedure using last five years as reference values:
Figure source: Salmon et al. (2016a) – available under a CC BY 4.0 license
Outbreak detection
M. H¨ ohle
22 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Salmonella Report for W41–46 of 2013 Weekly Report at National Level:
Table source: Salmon et al. (2016a) – available under a CC BY 4.0 license
Outbreak detection
M. H¨ ohle
23 / 31
Statistical Framework for Aberration Detection
Farrington Algorithm and Beyond
Excerpt of Salmonella report for W41–46 of 2013 Cluster Analysis:
Table source: Salmon et al. (2016a) – available under a CC BY 4.0 license
Outbreak detection
M. H¨ ohle
24 / 31
Discussion
Outline
1
Motivation for the Online Monitoring of Count Data
2
A System for Automated Outbreak Detection in Germany
3
Statistical Framework for Aberration Detection Simple Algorithm for Ad-Hoc Detection Farrington Algorithm and Beyond
4
Discussion
Outbreak detection
M. H¨ ohle
25 / 31
Discussion
Discussion The presented methods are implemented in the R package surveillance (Salmon et al., 2016a) Salmon et al. (2016b) documents the use of the above methods as backbone of the German IfSG routine surveillance system at the Robert Koch Institute (RKI) Developing, maintaining and improving automatic outbreak detection systems is an interdisciplinary activity! I I
Even more work could be put into user adaptation. Delay adjusted monitoring (Salmon et al., 2015)
The system proved to be a good insurance against missing anything important – see e.g. Gertler et al. (2015)
Outbreak detection
M. H¨ ohle
26 / 31
Discussion
Acknowledgments Persons: Ma¨elle Salmon, Dirk Schumacher and many other previous colleagues at the Robert Koch Institute, Berlin Sebastian Meyer, Michaela Paul and Leonhard Held – all currently or previously at the Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Switzerland Financial Support: German Science Foundation (DFG, 2003–2006) Munich Center of Health Sciences (2007–2010) Swiss National Science Foundation (SNF, 2007–2015) Robert Koch Institute (RKI, 2012–2015) Swedish Research Council (VR, 2016-2019)
Outbreak detection
M. H¨ ohle
27 / 31
Discussion
The End
That’s it. Thanks for the attention. q()
Michael H¨ohle, Stockholm University http://www.math.su.se/~hoehle m hoehle
Outbreak detection
M. H¨ ohle
28 / 31
Discussion
Literature I Gertler, Maximilian et al. (2015). “Outbreak of cryptosporidium hominis following river flooding in the city of Halle (Saale), Germany, August 2013”. In: BMC Infectious Diseases 15.1, p. 88. issn: 1471-2334. doi: 10.1186/s12879-015-0807-1. url: http://www.biomedcentral.com/1471-2334/15/88. Hulth, A. (2014). “First European guidelines on syndromic surveillance in human and animal health published”. In: Eurosurveillance 19.41. Available from http://www.eurosurveillance.org/ViewArticle. aspx?ArticleId=20927, pii=20927. Hulth, A., N. Andrews, S. Ethelberg, J. Dreesman, D. Faensen, W. van Pelt, and J. Schnitzler (2010). “Practical usage of computer-supported outbreak detection in five European countries”. In: Eurosurveillance 15.36.
Outbreak detection
M. H¨ ohle
29 / 31
Discussion
Literature II Noufaily, Angela, Doyo G. Enki, Paddy Farrington, Paul Garthwait, Nick Andrews, and Andr´e Charlett (2013). “An improved algorithm for outbreak detection in multiple surveillance systems”. In: Statistics in Medicine 32.7, pp. 1206–1222. RKI (2012). “Salmonella Newport-Ausbruch in Deutschland und den Niederlanden, 2011”. In: Epidemiologisches Bulletin 20. Available as http://www.rki.de/DE/Content/Infekt/EpidBull/Archiv/2012/ Ausgaben/20_12.pdf, pp. 177–184. Salmon, M., D. Schumacher, and M. H¨ ohle (2016a). “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance”. In: Journal of Statistical Software 70.10. Also available as vignette of the R package surveillance. doi: 10.18637/jss.v070.i10.
Outbreak detection
M. H¨ ohle
30 / 31
Discussion
Literature III
Salmon, M., D. Schumacher, K. Stark, and M. H¨ ohle (2015). “Bayesian outbreak detection in the presence of reporting delays”. In: Biometrical Journal 57.6. http://dx.doi.org/10.1002/bimj.201400159, pp. 1051–1067. Salmon, M., Dirk Schumacher, Hendrik Burman, Christina Frank, Hermann Claus, and Michael H¨ ohle (2016b). “A system for automated outbreak detection of communicable diseases in Germany”. In: Eurosurveillance 21.13, pii=3018. doi: 10.2807/1560-7917.ES.2016.21.13.30180.
Outbreak detection
M. H¨ ohle
31 / 31