Accepted Manuscript

Title: Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model
Author: Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rosalind M. Eggo, W. John Edmunds
PII: S1755-4365(16)30044-5
DOI: http://dx.doi.org/doi:10.1016/j.epidem.2016.11.003
Reference: EPIDEM 228
To appear in: Epidemics
Received date: 8-7-2016
Revised date: 17-11-2016
Accepted date: 17-11-2016

Please cite this article as: Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rosalind M. Eggo, W. John Edmunds, Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model, (2016), http://dx.doi.org/10.1016/j.epidem.2016.11.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model

Sebastian Funk (a,∗), Anton Camacho (a), Adam J. Kucharski (a), Rosalind M. Eggo (a), W. John Edmunds (a)

(a) Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom

Abstract

Real-time forecasts of infectious diseases can help public health planning, especially during outbreaks. If forecasts are generated from mechanistic models, they can be further used to target resources or to compare the impact of possible interventions. However, parameterising such models is often difficult in real time, when information on behavioural changes, interventions and routes of transmission is not readily available. Here, we present a semi-mechanistic model of infectious disease dynamics that was used in real time during the 2013–2016 West African Ebola epidemic, and show fits to an Ebola Forecasting Challenge conducted in late 2015 with simulated data mimicking the true epidemic. We assess the performance of the model in different situations and identify strengths and shortcomings of our approach. Models such as the one presented here, which combine the power of mechanistic models with the flexibility to include uncertainty about the precise outbreak dynamics, may be an important tool in combating future outbreaks.

Keywords: forecasting, real-time modelling, infectious disease dynamics, outbreak

Introduction

Forecasting the incidence of infectious diseases is an important part of public health and intervention planning. This was especially true during the 2013–2016 West African Ebola epidemic, when the rapid expansion of the outbreak triggered an enormous national and international public health response in late summer 2014. From November 2014, the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) at the London School of Hygiene & Tropical Medicine produced weekly situation reports presenting updates of publicly available epidemiological data, model fits and forecasts, and estimates of key epidemiological parameters. These reports were distributed by email to a wide range of public health planners, policy makers, field workers and academics in several countries, and were made publicly available on a dedicated web site [1].

The forecasts in these situation reports were produced using a stochastic semi-mechanistic model of Ebola transmission [2]. The model was mechanistic in the sense that it relied on a compartmental description of the epidemiological status of the population based on known aspects of Ebola infection such as the incubation and infectious periods. To model transmission between individuals, however, we used a more phenomenological, stochastic approach. During an emergency such as the Ebola epidemic, it is difficult to determine the precise factors underlying disease transmission that are required to inform a fully mechanistic model. Information about the relative importance and intensity of transmission in the community, in hospitals or at funerals [3], about the exact extent of control measures and their impact [4], about behavioural changes in the community [5], and about the potential role of seasonality [6] or genetic changes in the virus [7] was not available in real time.

To capture the overall change in transmission arising from these different mechanisms, we modelled transmission between individuals using a time-varying stochastic rate. Capturing the uncertainty in transmission in a stochastic term gives the model the flexibility to match the data in the presence of noise and uncertainty. In addition, the inferred trajectories of the transmission rate can be interpreted directly as changes in the reproduction number and thus provide valuable information for decision makers, for example by indicating how far the outbreak is from being under control.

Here, we present model fits and forecasts generated as part of the Ebola Forecasting Challenge conducted in 2015 after the true epidemic had waned. The challenge was based on four scenarios of synthetic data inspired by the outbreak in Liberia, with increasing levels of noise and uncertainty (see Vespignani et al., same issue). We fitted a model similar to the one used during the Ebola epidemic to the simulated outbreak trajectories generated as part of the Challenge, and used it to produce forecasts of the number of cases at upcoming time points. We particularly focus on methodological issues and forecasting performance. We report results on county-level data of scenario 1, and assess forecasts at time points 1, 2, 4 and 5 (time point 3 was not considered for logistical reasons). Results for the other scenarios are shown in the supplementary material.

∗ Corresponding author. Email address: [email protected] (Sebastian Funk)
Preprint submitted to Epidemics, October 31, 2016

Methods

Our semi-mechanistic model of Ebola dynamics is a modified Susceptible-Exposed-Infectious-Recovered (SEIR) model, accounting for delays in notification and time-varying transmission (Fig. 1). At time $t$, the force of infection experienced by susceptible individuals is $\lambda_t = \beta_t I_t / N$, where $I_t$ is the overall number of infectious individuals, $N$ the population size and $\beta_t$ the time-varying transmission rate, which follows a random walk [8]:

$$\mathrm{d} \log \beta_t = \sigma \, \mathrm{d}W_t \tag{1}$$

Here, $W_t$ denotes a Wiener process [9], $\sigma$ the volatility of the transmission rate, and the log-transform ensures positivity of $\beta_t$. Modelling the time-varying transmission rate as a random walk means it is auto-correlated: the transmission rate on any day is most likely to be the same as on the previous one.

Susceptible individuals (S) are infected at rate $\lambda_t$ and become exposed (E). Exposed individuals become infectious at a rate given by the reciprocal of the incubation period ($1/D_{\mathrm{inc}}$). We used two exposed sub-compartments in sequence to obtain an Erlang-distributed incubation period with shape $k = 2$ [10], and split the estimated initial number of exposed individuals evenly between these two sub-compartments. To account for delays in the reporting of new cases, the infectious compartment was split into two compartments representing infectious but not yet reported (C) and infectious and potentially reported (Q) cases. The transition between C and Q occurs at a randomly varying rate with a mean equal to the reciprocal of the average reporting delay ($1/D_{\mathrm{rep}}$) and 10% overdispersion following a Gamma distribution. This stochastic variation is introduced to capture non-independence in the time until cases get reported, for example in situations where several members of the same family might be reported simultaneously [2]. Lastly, infectious individuals are removed (R) from the Q compartment when they recover or die, at a rate equal to the reciprocal of the difference between the infectious period and the reporting delay ($D_{\mathrm{out}} = D_{\mathrm{inf}} - D_{\mathrm{rep}}$). The model can be formulated as a set of stochastic differential equations, which were simulated with the noise term fixed for a time step of 1 day and a Runge-Kutta method solving the remaining ordinary differential components [11].
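The compartmental structure described above can be sketched in a few lines of Python. This is an illustration only, not the LibBi implementation used for the fits: the function name `simulate_seir`, the number of Runge-Kutta sub-steps, the reading of "10% overdispersion" as a Gamma-distributed rate with a coefficient of variation of 0.1, and all numerical values in the example call are assumptions made for the sake of a runnable example.

```python
import math
import random


def simulate_seir(n_days, pop, beta0, sigma, d_inc, d_rep, d_inf, e0, rng, substeps=4):
    """Return daily (beta, state) pairs; state = [S, E1, E2, C, Q, R, Z]."""
    d_out = d_inf - d_rep  # time spent in Q before removal
    # Initial exposed individuals are split evenly between E1 and E2.
    state = [pop - e0, e0 / 2, e0 / 2, 0.0, 0.0, 0.0, 0.0]
    log_beta = math.log(beta0)
    shape = 1 / 0.1 ** 2  # Gamma shape giving a coefficient of variation of 0.1 (assumed)
    out = []
    for _ in range(n_days):
        # Noise is drawn once per day and held fixed while the ODE part is solved.
        log_beta += sigma * rng.gauss(0.0, 1.0)  # d log(beta) = sigma dW with dt = 1 day
        beta = math.exp(log_beta)
        rep_rate = rng.gammavariate(shape, (1 / d_rep) / shape)  # mean 1 / d_rep

        def deriv(y):
            s, e1, e2, c, q, _r, _z = y
            lam = beta * (c + q) / pop  # force of infection, with I = C + Q
            return [
                -lam * s,
                lam * s - 2 / d_inc * e1,
                2 / d_inc * (e1 - e2),
                2 / d_inc * e2 - rep_rate * c,
                rep_rate * c - q / d_out,
                q / d_out,
                rep_rate * c,  # Z accumulates the flow from C into Q
            ]

        h = 1.0 / substeps
        for _ in range(substeps):  # classical fourth-order Runge-Kutta
            k1 = deriv(state)
            k2 = deriv([y + h / 2 * k for y, k in zip(state, k1)])
            k3 = deriv([y + h / 2 * k for y, k in zip(state, k2)])
            k4 = deriv([y + h * k for y, k in zip(state, k3)])
            state = [y + h / 6 * (a + 2 * b + 2 * c + d)
                     for y, a, b, c, d in zip(state, k1, k2, k3, k4)]
        out.append((beta, list(state)))
    return out


# Illustrative values only (not fitted): 6-day incubation, 7-day reporting delay,
# 7.8-day infectious period, initial reproduction number 1.5.
rng = random.Random(1)
run = simulate_seir(70, pop=100_000, beta0=1.5 / 7.8, sigma=0.1,
                    d_inc=6.0, d_rep=7.0, d_inf=7.8, e0=4.0, rng=rng)
# Weekly incidence: new entries into Q over each 7-day window.
cum = [0.0] + [state[6] for _, state in run]
weekly_incidence = [cum[i + 7] - cum[i] for i in range(0, len(run), 7)]
```

Holding the two noise terms fixed within each day turns the system into an ordinary differential equation over that day, which is why a deterministic Runge-Kutta step can be used inside the loop.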

The only stochastic components are the trajectory of the transmission rate and the reporting noise, in contrast to the model used for the situation reports during the true Ebola epidemic, which also included demographic noise. The observation process was modelled to operate on the weekly incidence ($Z_t$), given by the number of infectious individuals entering the Q compartment. The observed incidence ($\tilde{Z}_t$) was assumed to follow a normal approximation (chosen for computational efficiency) to the negative binomial distribution with reporting probability $p$ and overdispersion $\phi$:

$$\tilde{Z}_t \sim \mathcal{N}\left(p Z_t,\; p(1 - p) Z_t + p^2 Z_t^2 \phi^2\right) \tag{2}$$

where standard deviations smaller than 1 were rounded up to 1 to avoid the singularity at $Z_t = 0$. Note that stochastic variation here captures variability in the probability that cases get reported, whereas the stochastic variation acting on the transition from C to Q captures variability in the delay until cases can get reported.

The model thus has 8 parameters, which we either estimated from the line list of cases, took from a study on a pre-2014 outbreak of Ebola [12], or estimated as part of the model fitting process (Table 1). Prior ranges of the transmission rate volatility and reporting overdispersion were established in preliminary runs and chosen to be able to sufficiently capture sudden changes in cases without allowing a degree of variation that would render the algorithm unstable.

Table 1: Parameters used in the model and their values/ranges.

Parameter | Description                                   | Value or prior range | Source
Dinc      | Mean delay from infection to symptoms         | variable             | line list (where available)
          |                                               | 6 days               | [12]
Drep      | Mean delay from symptom onset to notification | 1 week               | assumption
Dinf      | Mean delay from symptom onset to outcome      | variable             | line list (where available)
          |                                               | 7.8 days             | [12]
p         | Proportion of cases reported                  | 0.7                  | [12]
σ         | Volatility of the transmission rate           | U(0, 0.5)            | Fitted
φ         | Overdispersion in reporting                   | U(0, 0.5)            | Fitted
E?        | Initial number of exposed individuals         | U(0, 5)              | Fitted
R?        | Initial reproduction number                   | U(0, 5)              | Fitted

We used a Metropolis-Hastings particle Markov chain Monte Carlo (pMCMC) algorithm to sample from the joint posterior distribution of the estimated parameters and states of the model (i.e. the trajectories). In brief, at each MCMC step, a particle filter is used to estimate the likelihood of the proposed parameter set, and to generate a sampled trajectory of the states of the model and the transmission rate $\beta_t$ from their marginal posterior distribution [13].

Our forecasts were generated under a “no change” hypothesis: we assumed that the transmission rate would remain constant after the last observed data point. More precisely, we sampled 10,000 parameter sets from the posterior distribution in combination with the associated states and estimated values of $\beta_t$ at the last observed data point, and simulated the model forward one year. The future number of reported cases was generated by applying the observation process to the forecast incidence. Predicted reported peak incidence, death counts and final sizes were calculated from the sampled forecast observation trajectories. County-level forecasts were obtained under the assumption that no transmission occurred between counties.

Model fits were generated using a fully automated algorithm applied to all the regional and national data sets. The algorithm was implemented to facilitate convergence of the computationally intensive pMCMC sampler and to avoid long burn-in and low effective sample sizes, and proceeded as follows. First, Metropolis-Hastings MCMC was run on the deterministic equivalent of the transmission model (with constant transmission rate and no multiplicative noise) to determine a reasonable starting location in parameter space. Then, the covariance matrix of the multivariate normal proposal distribution of the MCMC was adapted using the empirical variance in each parameter within accepted proposals, iterating and adapting an overall scale factor until an acceptance rate between 0.1 and 0.5 was achieved. Using the last parameter sample from this procedure in the fully stochastic model, the number of particles in the particle filter was calibrated numerically as the minimum number of particles required to obtain a variance of the likelihood estimator below 1. Starting from the covariance matrix of the proposal distribution obtained by fitting the deterministic model, the fully stochastic model was fitted using pMCMC with the calibrated number of particles, and the proposal distribution was then adapted again using the same procedure as in the first step. This adapted proposal was used to run a final pMCMC for 10,000 iterations. The procedure was implemented using LibBi [14] and the R packages rbi [15] and rbi.helpers [16]. A typical model fit took 5–20 minutes on an 8-core 2.5GHz node (with 8 parallel threads). All the code used is freely available at http://github.com/sbfnk/ebola_forecasting_challenge.

Results

Our model was able to reproduce the observed trajectories both in scenarios of exponential increase followed by rapid decline and in scenarios of more sustained, unchanged transmission (see Fig. 2 for fits to an example time point of an example scenario; others are shown in the supplementary material). Greater variability in individual trajectories was captured by greater volatility in the transmission rate ($\sigma$) as well as overdispersion in the observation process ($\phi$) (Fig. 3). The two corresponding noise terms enter the model in different ways, and the estimated variability in each depends on the trajectory in question. In the model, transmission noise is correlated in time because the transmission rate is assumed to follow a random walk. In other words, the transmission rate at any time varies stochastically around the transmission rate at the previous time. Observation noise, on the other hand, is uncorrelated, and the proportion of cases reported at any time is independent of the proportion of cases reported at a previous time.

Trajectories of the regional reproduction numbers ($R_t = \beta_t D_{\mathrm{inf}}$ in the absence of significant depletion of susceptibles) showed a decline over the course of the epidemic in most counties, with occasional peaks and sometimes longer periods of a reproduction number close to 1, indicating the potential for sustained transmission (Fig. 4). There was no obvious common pattern in the reproduction number trajectories, indicating that any attempt at finding a mechanistic basis for the time-varying behaviour of the transmission rate would have had to take into account differences between regions.

Retrospectively comparing the credible intervals of our district-level forecasts made at 4 with data provided later showed a decline in reliability as the forecast horizon increased (Fig. 5). The number of data points inside the 50% credible interval was slightly but consistently underestimated, with between 50% and 60% of data points inside the interval, although the errors were consistent with a level of 50%. This bias became stronger when we increased the forecasting horizon, with more than 70% of data points inside the 50% credible interval at a 10-week forecast horizon, and errors no longer consistent with a level of 50%. Further, the number of data points inside the 95% credible interval was slightly but consistently underestimated, while remaining consistent with a level of 95%. At this level, no obvious change in performance could be observed as the forecasting horizon was expanded to 10 weeks. Lastly, our predictions were consistent with 50% of data being greater than the predicted median. There was, however, a declining trend in the proportion of observations greater than the median, hinting at an overestimation of cases as the forecasting horizon increased.
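The calibration checks reported above (proportion of observations inside a central credible interval, or above the forecast median) could be computed along the following lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the uniform Beta(1, 1) prior is an assumed choice of conjugate prior, and the posterior interval is approximated by Monte Carlo draws rather than exact quantiles.

```python
import random
import statistics


def central_interval(samples, level):
    """Empirical central credible interval of a list of forecast samples."""
    s = sorted(samples)
    lo_idx = int(round((1 - level) / 2 * (len(s) - 1)))
    hi_idx = int(round((1 + level) / 2 * (len(s) - 1)))
    return s[lo_idx], s[hi_idx]


def coverage(forecasts, observations, level):
    """Count observations falling inside the central interval of their forecast."""
    hits = 0
    for samples, obs in zip(forecasts, observations):
        lo, hi = central_interval(samples, level)
        hits += lo <= obs <= hi
    return hits, len(observations)


def above_median(forecasts, observations):
    """Count observations strictly greater than the forecast median."""
    return sum(obs > statistics.median(s) for s, obs in zip(forecasts, observations))


def beta_summary(hits, n, rng, draws=20_000):
    """Posterior mean and approximate 95% interval of a proportion, Beta(1, 1) prior."""
    post = sorted(rng.betavariate(1 + hits, 1 + n - hits) for _ in range(draws))
    mean = (1 + hits) / (2 + n)
    return mean, post[int(0.025 * draws)], post[int(0.975 * draws)]


# Toy input: two forecasts with samples 0..100 and one observation each.
toy_forecasts = [list(range(101)), list(range(101))]
toy_observations = [50, 90]
hits, n = coverage(toy_forecasts, toy_observations, 0.5)
mean, lower, upper = beta_summary(hits, n, random.Random(0))
```

In practice one would pool `hits` and `n` across counties and time points for each forecast horizon, then compare the resulting interval with the nominal level (50% or 95%).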


Discussion

We used a stochastic semi-mechanistic model of Ebola transmission to fit the simulated trajectories in the Ebola Forecasting Challenge, and to produce forecasts that were compared to subsequent data points. Our model is flexible enough to accommodate changes in the transmission rate that might occur due to unobserved processes. This is particularly useful in an outbreak situation where a combination of behavioural changes, potential biological changes such as viral evolution, and interventions by a variety of national and international organisations make it practically impossible to identify the key drivers of the outbreak dynamics required to inform a fully parameterised mechanistic model.

While the predictions of the model were good when transmission did not change, there were divergences when predictions were made just before the epidemic peaked and the transmission rate declined (Fig. 2, Grand Gedeh), where there was resurgence or successive waves (Fig. 2, Gbarpolu), or where the epidemic was highly stochastic due to the small number of reported cases (Fig. 2, Sinoe). In the latter case, including demographic stochasticity [17] may have improved the reliability of the forecasts.

More generally, the quality of the forecasts could be seen to decrease as the forecasting horizon was expanded, with the proportion of data points inside the 50% credible interval increasing to more than 70%, and fewer than 40% of data points greater than the median. This would indicate that the model tended to overestimate the expected number of cases. A possible explanation for this is that all the data provided to us followed a scenario of an overall decreasing reproduction number, while our forecasts were derived under an assumption of no change in the transmission rate. While the simulated trajectories presented in the Ebola Forecasting Challenge resembled the true trajectories of the West African Ebola epidemic, our model allowed for a broad range of possible scenarios including widespread epidemics where there was no clear evidence of a sustained decline. Some of our predictions may therefore appear to vastly overestimate future case numbers, but without any further information on the context of the epidemics, these scenarios could not be deemed unrealistic. Moreover, our model did not contain any mechanism for predicting increases or decreases before they occurred. In a real situation, other information could have been sourced to exclude some scenarios, or to find indicators of an impending increase or decline. With respect to the Ebola Forecasting Challenge, using the provided situation reports may have helped to yield such information.

We chose not to change the model between forecasting time points, to provide a fair assessment of the performance of our model. However, a number of changes could have been made to improve the particular predictions requested. Our assumption of no change from the last data point meant that the forecasts were highly sensitive to fluctuations in the transmission rate at the prediction point. At that point, the model was relatively poorly informed by the data. The data at time $T$ only inform the trajectory of the transmission rate up to time $T - D_{\mathrm{rep}} - D_{\mathrm{inc}}$, that is, the latest data point minus the delays due to the reporting and incubation periods. After that, the transmission rate mostly follows a random walk on the logarithmic scale, which creates the potential for large variations in the transmission rate that are then propagated into the prediction. A possible improvement would have been to average the transmission rate over the last few time points.
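The difference between fixing the forecast transmission rate at its last value and averaging it over the last few time points can be illustrated with a small experiment. This is a sketch under simplifying assumptions: `forecast_beta` and `random_walk_path` are hypothetical helpers, the paths are unconditioned random walks standing in for posterior samples of the transmission rate, and the window of 3 time points and all numerical values are arbitrary.

```python
import math
import random
import statistics


def forecast_beta(beta_path, window=1):
    """Constant rate used for forecasting: mean of the last `window` values."""
    tail = beta_path[-window:]
    return sum(tail) / len(tail)


def random_walk_path(beta0, sigma, n_steps, rng):
    """Random walk on the log scale, one value per time point."""
    log_beta = math.log(beta0)
    path = []
    for _ in range(n_steps):
        log_beta += sigma * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_beta))
    return path


rng = random.Random(42)
# Stand-in for sampled trajectories; a real analysis would use pMCMC output.
paths = [random_walk_path(0.2, 0.1, 10, rng) for _ in range(2000)]
spread_last = statistics.pstdev([forecast_beta(p, 1) for p in paths])
spread_avg = statistics.pstdev([forecast_beta(p, 3) for p in paths])
```

Across samples, the averaged rate varies less than the last value, because the most recent random-walk increments are partly smoothed out before being carried into the forecast.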

For the weekly forecasts we generated during the true West African Ebola epidemic, we presented both forecasts based on an average of transmission rates over the last three weeks and forecasts based on the last value. Replacing our stochastic model for the transmission rate with a finite-variance process such as an Ornstein-Uhlenbeck process [18] would have allowed us to propagate the estimated volatility instead of keeping the transmission rate fixed. Moreover, our parameter estimates, including the volatility of the transmission rate, were informed by the entire fitted trajectories including early stages of the epidemic, which may not have been relevant to later times, especially the declining phase. An approach where only the last few data points are used in the fits that inform the forecasts may have yielded more reliable forecasting results.

Because of its relatively simple mechanistic skeleton, our model is more suitable for short-term predictions of incidence than long-term predictions of final size or peak size and timing. The stochastic nature of the transmission rate allows the model to flexibly fit any trajectory and could mask underlying mis-specifications in the deterministic core. Our approach does, however, come with the important advantage of being able to test the impact of interventions or to inform vaccine trials, as has been successfully done during the West African Ebola epidemic [19]. With this key capability, mechanistic models such as the one presented here, combined with improvements in forecasting methodology, can be expected to play a key role in informing the response to future outbreaks.
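To make the Ornstein-Uhlenbeck alternative raised in the Discussion concrete, the sketch below compares how forecast uncertainty in the log transmission rate grows under a random walk and under a mean-reverting process. It is not part of the fitted model: the mean-reversion rate `theta`, the long-run level `mu` and all numerical values are hypothetical parameters introduced only for this illustration.

```python
import math
import random


def ou_step(x, mu, theta, sigma, dt, rng):
    """Exact one-step update of dx = theta * (mu - x) dt + sigma dW."""
    decay = math.exp(-theta * dt)
    sd = sigma * math.sqrt((1 - decay ** 2) / (2 * theta))
    return mu + (x - mu) * decay + sd * rng.gauss(0.0, 1.0)


def ou_variance(sigma, theta, horizon):
    """Variance of the process `horizon` time units ahead, given its current value."""
    return sigma ** 2 * (1 - math.exp(-2 * theta * horizon)) / (2 * theta)


def random_walk_variance(sigma, horizon):
    """Variance of a random walk with volatility sigma after `horizon` time units."""
    return sigma ** 2 * horizon


# Propagate log(beta) forward for 52 weekly steps from an arbitrary starting value.
rng = random.Random(7)
x = math.log(0.2)
ou_path = []
for _ in range(52):
    x = ou_step(x, mu=math.log(0.15), theta=0.5, sigma=0.1, dt=1.0, rng=rng)
    ou_path.append(x)
```

The random-walk variance grows without bound with the horizon, whereas the Ornstein-Uhlenbeck variance saturates at `sigma ** 2 / (2 * theta)`, which is what would allow volatility to be propagated into long forecasts without the spread becoming arbitrarily large.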


References

[1] Centre for the Mathematical Modelling of Infectious Diseases, Visualisation and projections of the Ebola outbreak in West Africa (2015). URL http://ntncmch.github.io/ebola/

[2] A. Camacho, A. Kucharski, Y. Aki-Sawyerr, M. A. White, S. Flasche, M. Baguelin, T. Pollington, J. R. Carney, R. Glover, E. Smout, A. Tiffany, W. J. Edmunds, S. Funk, Temporal changes in Ebola transmission in Sierra Leone and implications for control requirements: a real-time modelling study, PLoS Curr 7. doi:10.1371/currents.outbreaks.406ae55e83ec0b5193e30856b9235ed2. URL http://dx.doi.org/10.1371/currents.outbreaks.406ae55e83ec0b5193e30856b9235ed2

[3] O. Faye, P.-Y. Boëlle, E. Heleze, O. Faye, C. Loucoubar, N. Magassouba, B. Soropogui, S. Keita, T. Gakou, E. H. I. Bah, L. Koivogui, A. A. Sall, S. Cauchemez, Chains of transmission and control of Ebola virus disease in Conakry, Guinea, in 2014: an observational study, The Lancet Infectious Diseases 15 (3) (2015) 320–326. doi:10.1016/s1473-3099(14)71075-8. URL http://dx.doi.org/10.1016/S1473-3099(14)71075-8

[4] WHO Ebola Response Team, West African Ebola epidemic after one year – slowing but not yet under control, New England Journal of Medicine 372 (6) (2015) 584–587. doi:10.1056/nejmc1414992. URL http://dx.doi.org/10.1056/NEJMc1414992

[5] S. Funk, G. M. Knight, V. A. A. Jansen, Ebola: the power of behaviour change, Nature 515 (7528) (2014) 492. doi:10.1038/515492b. URL http://dx.doi.org/10.1038/515492b

[6] A. Groseth, H. Feldmann, J. E. Strong, The ecology of Ebola virus, Trends in Microbiology 15 (9) (2007) 408–416. doi:10.1016/j.tim.2007.08.001. URL http://dx.doi.org/10.1016/j.tim.2007.08.001

[7] M. W. Carroll, D. A. Matthews, J. A. Hiscox, M. J. Elmore, G. Pollakis, A. Rambaut, R. Hewson, I. García-Dorival, J. A. Bore, R. Koundouno, S. Abdellati, B. Afrough, J. Aiyepada, P. Akhilomen, D. Asogun, B. Atkinson, M. Badusche, A. Bah, S. Bate, J. Baumann, D. Becker, B. Becker-Ziaja, A. Bocquin, B. Borremans, A. Bosworth, J. P. Boettcher, A. Cannas, F. Carletti, C. Castilletti, S. Clark, F. Colavita, S. Diederich, A. Donatus, S. Duraffour, D. Ehichioya, H. Ellerbrok, M. D. Fernandez-Garcia, A. Fizet, E. Fleischmann, S. Gryseels, A. Hermelink, J. Hinzmann, U. Hopf-Guevara, Y. Ighodalo, L. Jameson, A. Kelterbaum, Z. Kis, S. Kloth, C. Kohl, M. Korva, A. Kraus, E. Kuisma, A. Kurth, B. Liedigk, C. H. Logue, A. Lüdtke, P. Maes, J. McCowen, S. Mély, M. Mertens, S. Meschi, B. Meyer, J. Michel, P. Molkenthin, C. Muñoz-Fontela, D. Muth, E. N. C. Newman, D. Ngabo, L. Oestereich, J. Okosun, T. Olokor, R. Omiunu, E. Omomoh, E. Pallasch, B. Pályi, J. Portmann, T. Pottage, C. Pratt, S. Priesnitz, S. Quartu, J. Rappe, J. Repits, M. Richter, M. Rudolf, A. Sachse, K. M. Schmidt, G. Schudt, T. Strecker, R. Thom, S. Thomas, E. Tobin, H. Tolley, J. Trautner, T. Vermoesen, I. Vitoriano, M. Wagner, S. Wolff, C. Yue, M. R. Capobianchi, B. Kretschmer, Y. Hall, J. G. Kenny, N. Y. Rickett, G. Dudas, C. E. M. Coltart, R. Kerber, D. Steer, C. Wright, F. Senyah, S. Keita, P. Drury, B. Diallo, H. de Clerck, M. V. Herp, A. Sprecher, A. Traore, M. Diakite, M. K. Konde, L. Koivogui, N. Magassouba, T. Avšič-Županc, A. Nitsche, M. Strasser, G. Ippolito, S. Becker, K. Stoecker, M. Gabriel, H. Raoul, A. D. Caro, R. Wölfel, P. Formenty, S. Günther, Temporal and spatial analysis of the 2014–2015 Ebola virus outbreak in West Africa, Nature 524 (7563) (2015) 97–101. doi:10.1038/nature14594. URL http://dx.doi.org/10.1038/nature14594

[8] J. Dureau, K. Kalogeropoulos, M. Baguelin, Capturing the time-varying drivers of an epidemic using stochastic dynamical systems, Biostatistics 14 (3) (2013) 541–555. doi:10.1093/biostatistics/kxs052. URL http://dx.doi.org/10.1093/biostatistics/kxs052

[9] R. Durrett, Brownian Motion and Martingales in Analysis, The Wadsworth mathematics series, Wadsworth, Belmont, 1984.

[10] A. L. Lloyd, Realistic distributions of infectious periods in epidemic models: changing patterns of persistence and dynamics, Theor Popul Biol 60 (1) (2001) 59–71. doi:10.1006/tpbi.2001.1525. URL http://dx.doi.org/10.1006/tpbi.2001.1525

[11] G. N. Milstein, M. V. Tretyakov, Numerical methods for SDEs with small noise, Stochastic Numerics for Mathematical Physics. doi:10.1007/978-3-662-10063-9_3. URL http://dx.doi.org/10.1007/978-3-662-10063-9_3

[12] A. Camacho, A. Kucharski, S. Funk, J. Breman, P. Piot, W. Edmunds, Potential for large outbreaks of Ebola virus disease, Epidemics 9 (2014) 70–78. doi:10.1016/j.epidem.2014.09.003. URL http://dx.doi.org/10.1016/j.epidem.2014.09.003

[13] C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, J R Stat Soc Ser B 72 (3) (2010) 269–342. doi:10.1111/j.1467-9868.2009.00736.x.

[14] L. M. Murray, Bayesian state-space modelling on high-performance hardware using LibBi, arXiv:1306.3277. URL http://arxiv.org/abs/1306.3277

[15] P. E. Jacob, S. Funk, rbi: R interface to LibBi, R package version 0.4.1 (2016). URL https://CRAN.R-project.org/package=rbi

[16] S. Funk, rbi.helpers: rbi helper functions, R package version 0.2 (2016). URL https://github.com/sbfnk/RBi.helpers

[17] M. J. Keeling, P. Rohani, Modeling Infectious Diseases in Humans and Animals, Princeton University Press, Princeton, 2008.

[18] J. L. Doob, The Brownian movement and stochastic equations, The Annals of Mathematics 43 (2) (1942) 351. doi:10.2307/1968873. URL http://dx.doi.org/10.2307/1968873

[19] A. Camacho, R. M. Eggo, S. Funk, C. H. Watson, A. J. Kucharski, W. J. Edmunds, Estimating the probability of demonstrating vaccine efficacy in the declining Ebola epidemic: a Bayesian modelling approach, BMJ Open 5 (12) (2015) e009346. doi:10.1136/bmjopen-2015-009346. URL http://dx.doi.org/10.1136/bmjopen-2015-009346

[20] A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, D. Rubin, Bayesian Data Analysis, 3rd Edition, CRC Press, Boca Raton, USA, 2013.

Figures

[Figure 1 diagram: S → E1 → E2 → C → Q → R, with rates λ (S to E1), 2(Dinc)^−1 (E1 to E2 and E2 to C), (Drep)^−1 (C to Q) and (Dout)^−1 (Q to R).]

Figure 1: Flow between compartments of the transmission model.

[Figure 2 panels: Bomi, Bong, Gbarpolu, Grand Bassa, Grand Cape Mount, Grand Gedeh, Lofa, Margibi, Montserrado, Nimba, Sinoe. Horizontal axis: week. Vertical axis: incidence.]

Figure 2: Fitted (black) and predicted (blue) incidence at time point 4 (week 35) of scenario 1. Median lines and 50% (dark) and 95% (light) credible interval ranges are shown, calculated across all trajectories at every time point. Fitted data points are shown in red, future data points (not included in the fits) in black.

[Figure 3: box plots of σ and φ for each county (Bomi, Bong, Gbarpolu, Grand Bassa, Grand Cape Mount, Grand Gedeh, Lofa, Margibi, Montserrado, Nimba, Sinoe). Horizontal axis from 0.0 to 0.5.]

Figure 3: Parameter estimates of the transmission rate volatility σ (top, blue) and reporting overdispersion φ (bottom, red) at time point 4 (week 35) of scenario 1. Shown are the median (vertical bar), interquartile range (box), the most extreme values within 1.5 times the interquartile range (outer lines) and outliers (dots).

[Figure 4 panels: Bomi, Bong, Gbarpolu, Grand Bassa, Grand Cape Mount, Grand Gedeh, Lofa, Margibi, Montserrado, Nimba, Sinoe. Horizontal axis: week. Vertical axis: R0.]

Figure 4: Fitted (black) and predicted (blue) trajectories of the transmission rate at time point 4 (week 35) of scenario 1, shown here rescaled with the infectious period to correspond to the reproduction number $R_t$. Median lines and 50% (dark) and 95% (light) credible interval ranges are shown. The horizontal dashed lines are at $R_t = 1$.

[Figure 5 panels: inside 50% CI, inside 95% CI, greater than median. Horizontal axis: number of weeks forecast (0 to 10). Vertical axis: proportion of observations (0% to 100%).]

Figure 5: Forecasting performance, shown as proportion of data points (across regions) within the 50% credible interval (left), 95% credible interval (centre) and above the median (right) as a function of the distance in weeks predicted ahead. Shown are the mean and Bayesian 95% confidence interval using a conjugate beta prior [20] across time points and counties of scenario 1. Trend lines for the means were obtained using locally weighted smoothing.

Supplementary material

Fits and predictions for all scenarios and time points. See the caption of Fig. 2 for details.

Highlights (for review)

- A Bayesian stochastic semi-mechanistic model was used to generate forecasts for the Ebola Forecasting Challenge
- A Susceptible-Exposed-Infectious-Recovered (SEIR) model with time-varying transmission rate was fitted to the simulated data using particle Markov-chain Monte Carlo (pMCMC)
- Posterior samples of the model parameters were used to generate forecast trajectories
- The quality of the forecasts was assessed against the subsequently released simulated data points