Predicting pandemics and forecasting epidemics based on influenza mortality and morbidity data
Hans WACKERNAGEL¹, Christian LAJAUNIE¹, Magali LEMAITRE², Huey Chyi LEE¹,³,⁴, Thomas ROMARY¹, Fabrice CARRAT², Laurent BERTINO³, Alex COOK⁴
¹ Equipe de Géostatistique, Centre de Géosciences, MINES ParisTech
² INSERM UMR-S 707
³ Mohn-Sverdrup Center / NERSC
⁴ National University of Singapore
eVITA Meeting, Bergen, 25 January 2010
The eVITA EnKF project
Geostatistics group • MINES ParisTech
Pre-diction or pre-vision?
Prédire: to predict, to foretell.
Prévoir: to forecast, to foresee.
(German: Vorhersage, Voraussicht)
Influenza epidemics
1) Long term prediction (mortality): extreme value analysis of epidemics
2) Short term forecast (morbidity): data assimilation by particle filtering
Extreme value analysis of US influenza excess mortality data
Magali Lemaitre¹, Geffroy Brandicourt¹,², Hans Wackernagel²
¹ Unité 707, INSERM
² Equipe de Géostatistique, MINES ParisTech
(with input from: Fabrice CARRAT, Mark WILSON)
Extreme value theory
Statistics of extremes concerns modelling risks from rare events with potentially large impacts:
- environmental hazards: rain, snow, storms, hurricanes, earthquakes, typhoons, high tides, ...
- structural failures: bridges, dams, oil rigs, ...
- reliability: mechanical failure due to corrosion, etc.
- finance: market crashes.
Statistical modelling of extremes relies on the limit distributions of maxima. These distributions all belong to one family: the Generalized Extreme Value (GEV) distribution, with 3 parameters (location, scale, shape).
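To make the role of the three parameters concrete, here is a minimal sketch of the GEV cumulative distribution function in its usual textbook parameterization. The function name and the example values are illustrative assumptions; nothing here is fitted to the mortality data of this talk.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """CDF of the Generalized Extreme Value distribution.

    mu: location, sigma: scale (> 0), xi: shape.
    xi = 0 is the Gumbel limit, xi > 0 gives a heavy upper tail,
    xi < 0 gives a distribution with a finite upper bound.
    """
    if sigma <= 0:
        raise ValueError("scale must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Illustrative values only.
for shape in (-0.2, 0.0, 0.2):
    print(shape, gev_cdf(2.0, mu=0.0, sigma=1.0, xi=shape))
```

The shape parameter is the one that matters most for rare events, since it controls how quickly the upper tail decays.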
Decreasing mortality (all causes)
Mortality rates (all causes) standardized by age, taking the 2000 US population as reference. The effect of the aging population has thus been removed. Mortality is diminishing due to improving health care, hygiene, etc.
Mortality in California (all causes, not standardized)
Mortality in New York (all causes, not standardized)
Excess US influenza mortality
Epidemiology is intimately linked to demography. The yearly excess mortality due to influenza has been computed following Simonsen et al. (1997, 2005).
The maximum monthly excess influenza mortality was extracted for the epidemic season in each year (block maxima approach).
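The block maxima step can be sketched as follows. The July start of the epidemic season and the numbers in `records` are assumptions made for illustration; the talk does not state how seasons were delimited.

```python
from collections import defaultdict

def seasonal_block_maxima(monthly, season_start_month=7):
    """Return {season: max monthly value} for (year, month, value) records.

    A season starting in `season_start_month` of year y is labelled y, so a
    winter epidemic spanning two calendar years falls into a single block.
    """
    blocks = defaultdict(list)
    for year, month, value in monthly:
        season = year if month >= season_start_month else year - 1
        blocks[season].append(value)
    return {season: max(values) for season, values in sorted(blocks.items())}

# Made-up monthly excess mortality values, for illustration only.
records = [
    (1990, 11, 120.0), (1990, 12, 450.0), (1991, 1, 900.0), (1991, 2, 300.0),
    (1991, 11, 80.0), (1991, 12, 200.0), (1992, 1, 650.0), (1992, 2, 700.0),
]
maxima = seasonal_block_maxima(records)
print(maxima)
```

The resulting series of one maximum per season is what a GEV distribution would then be fitted to.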
Return levels for 20 years
Excess influenza mortality in US (left) and France (right)
[Figure: two panels plotting NLLH against the 20-year return level (excess flu mortality), with the 90%, 95% and 99% levels marked.]
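For reference, a T-year return level is the quantile of the fitted GEV that is exceeded with probability 1/T in any one season. The sketch below uses the standard closed-form expression; the parameter values are placeholders, not the estimates behind the figure, and the likelihood profiling that presumably produced the confidence levels is not reproduced.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T blocks (here: seasons)."""
    if T <= 1:
        raise ValueError("return period must exceed one block")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Placeholder parameters (hypothetical, not fitted to any data).
for T in (10, 20, 50):
    print(T, round(gev_return_level(T, mu=50000.0, sigma=15000.0, xi=0.1)))
```

A positive shape parameter makes the return level grow faster with T, which is why its uncertainty dominates long-term predictions.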
Influenza epidemics
1) Long term prediction (mortality): extreme value analysis of epidemics
2) Short term forecast (morbidity): data assimilation by particle filtering
Flu epidemics and meteorological conditions
by Etienne GOUDAL, Christian LAJAUNIE, Hans WACKERNAGEL, Laurent BERTINO
Links between the onset of influenza epidemics and meteorological parameters were analyzed using:
- exploratory statistics
- generalized linear models (GLM)
- logistic regression
- support vector machines (SVM)
Data from FP7 ENSEMBLES project
6-hourly data from 1/1/1984 to 31/8/2002 (ERA40, ECMWF)
ILI (influenza-like illness) data from the Rhône-Alpes region
ILI cases against temperature and humidity
ILI cases against temperature and humidity: deviations from seasonal means
Logistic regression
The probability of being in an epidemic state is modelled as:
$$\operatorname{logit}\{P(Z)\} = \log \frac{P(Z)}{1 - P(Z)} = Zw$$
where P(Z) = P(Y = 1 | Z) is the probability that a given day is epidemic. Binary data are used, with the first 14 years as the training period and the subsequent 3 years for validation.
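A minimal sketch of this kind of model, fitted by plain gradient ascent on the log-likelihood, is given below. Everything in it is an assumption made for illustration: the data are synthetic (an intercept plus one standardized temperature anomaly), the fitting routine is generic, and only the 14:3 chronological split mirrors the setup described above.

```python
import math
import random

def sigmoid(a):
    # Numerically safe logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def predict_proba(w, row):
    """P(Y = 1 | Z) for one row Z of predictors."""
    return sigmoid(sum(wj * zj for wj, zj in zip(w, row)))

def fit_logistic(Z, y, lr=1.0, n_iter=300):
    """Estimate w in logit P(Y=1|Z) = Zw by gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(Z[0])
    n = len(Z)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(Z, y):
            err = target - predict_proba(w, row)
            for j, zj in enumerate(row):
                grad[j] += err * zj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Synthetic stand-in data: colder days are made more likely to be epidemic.
rng = random.Random(0)
Z, y = [], []
for _ in range(1700):
    temp = rng.gauss(0.0, 1.0)
    Z.append([1.0, temp])
    y.append(1 if rng.random() < sigmoid(-1.5 - 1.0 * temp) else 0)

# Chronological split in a 14:3 proportion (training, then validation).
n_train = 1400
w = fit_logistic(Z[:n_train], y[:n_train])
hits = sum((predict_proba(w, row) >= 0.5) == bool(t)
           for row, t in zip(Z[n_train:], y[n_train:]))
accuracy = hits / (len(Z) - n_train)
print("weights:", w, "validation accuracy:", round(accuracy, 3))
```

Accuracy alone is a weak score when epidemic days are rare, so a table of predicted against observed states is more informative.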
Logistic regression using daily data
Logistic regression from temperature (left) and seasonal mean temperature (right)
Logistic regression with all meteorological parameters
Relatively good fit at the onset of epidemics
Support vector machines (SVM)
Machine learning: the machine “learns” from part of the data, then validates on the rest.
Classification: a given day is classified as epidemic or non-epidemic.
A large number of predictors and a large amount of data can be used, with reasonable computing times.
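SVM: temperature only
Validation (rows) against observations (columns):
                          Observed non-epidemic    Observed epidemic
Classified non-epidemic                    1896                  301
Classified epidemic                         126                  161

SVM: all parameters
Validation (rows) against observations (columns):
                          Observed non-epidemic    Observed epidemic
Classified non-epidemic                    1714                  189
Classified epidemic                         308                  273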
More epidemics, but also more false epidemics!
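The trade-off can be quantified from the two tables with a few lines of arithmetic. The reading assumed here is that rows are the SVM classification on the validation period and columns are the observations. That reading gives the same observed totals in both tables (2022 non-epidemic days and 462 epidemic days).

```python
def rates(cls_neg_obs_neg, cls_neg_obs_pos, cls_pos_obs_neg, cls_pos_obs_pos):
    """Summary rates for a 2x2 table: classification in rows, observations in columns."""
    tn, fn, fp, tp = cls_neg_obs_neg, cls_neg_obs_pos, cls_pos_obs_neg, cls_pos_obs_pos
    return {
        "sensitivity": tp / (tp + fn),       # observed epidemic days that were detected
        "false_alarm_rate": fp / (fp + tn),  # observed non-epidemic days flagged as epidemic
        "precision": tp / (tp + fp),         # flagged days that were truly epidemic
    }

temperature_only = rates(1896, 301, 126, 161)
all_parameters = rates(1714, 189, 308, 273)

for name, r in (("temperature only", temperature_only), ("all parameters", all_parameters)):
    print(name, {k: round(v, 3) for k, v in r.items()})
```

With all parameters, the share of detected epidemic days rises from about 35% to about 59%, while the false alarm rate rises from about 6% to about 15% and precision falls from about 56% to about 47%.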
Conclusions
So far, no relation between meteorological conditions and the onset of epidemics could be established.
Could the SVM results be improved by using a different kernel?
A more explicit treatment of the seasonality of meteorological conditions is needed.
H1N1 2009 epidemic in Singapore
Alex COOK's web site (National University of Singapore):
http://www.stat.nus.edu.sg/staff/alexcook/flu/flu.html
Bayesian version of the particle filter programmed by Huey Chyi LEE (Honours thesis) in collaboration with MINES ParisTech. Alex COOK and Mark CHEN (NUS) developed the epidemic model.
Estimated ILIs per GP
[Figure: ILI cases per day per doctor (scale 0 to 7), June to October.]
Daily reported ILIs per GP
[Figure: estimated ILI cases per family doctor (scale 0 to 8), late June to November.]
Reported (red) & forecast (black/grey) ILI cases
Daily predicted ILIs (000s)
[Figure: total ILI patients seeking treatment (per day), scale 0 to 8, late June to November.]
Estimated & forecast total number of patients (in thousands) with ILI seeking treatment per day, aggregated across all private and poly-clinics. The weekend effect has been removed for easier interpretation.
Predicted population infected
[Figure: proportion of population infected or recovered (scale 0% to 30%), late June to November.]
Estimated and forecast total number of people who:
1) are currently symptomatic,
2) have recovered,
3) had pre-existing immunity (a very small proportion).
Conclusion and Perspectives
Conclusion
The particle filter can be an efficient tool for the early detection of a change in epidemic state.
Improvements were examined (by Huey Chyi LEE in Bergen/Fontainebleau, 2009):
- in the particle filter algorithm (resampling algorithms),
- in the underlying SIR model and its parameterization.
An important by-product of the system is an estimate of the total number of infected people at a given time for a given region. As a scenario simulator, it can also provide an error estimate for the assessed severity of an epidemic.
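To illustrate the idea, here is a minimal sketch of a bootstrap particle filter on a discrete-time stochastic SIR model, with systematic resampling. It is not the Bayesian filter or the epidemic model used for Singapore. The SIR discretization, the uniform prior on the transmission rate `beta`, the Gaussian observation model on the daily fraction of newly infected people, and all numerical values are assumptions chosen for illustration.

```python
import math
import random

def sir_step(state, beta, gamma, rng, noise=0.05):
    """One day of an SIR model on population fractions, with noisy transmission."""
    s, i, r = state
    new_inf = min(s, beta * s * i * math.exp(rng.gauss(0.0, noise)))
    new_rec = gamma * i
    return (s - new_inf, i + new_inf - new_rec, r + new_rec), new_inf

def systematic_resample(weights, rng):
    """Indices drawn by systematic resampling from normalized weights."""
    n = len(weights)
    u = rng.random() / n
    indices, cum, j = [], weights[0], 0
    for k in range(n):
        while cum < u + k / n and j < n - 1:
            j += 1
            cum += weights[j]
        indices.append(j)
    return indices

def particle_filter(observations, n_particles=500, gamma=0.25, obs_sd=0.002, seed=1):
    rng = random.Random(seed)
    # Each particle carries an SIR state and its own unknown transmission rate.
    particles = [((0.999, 0.001, 0.0), rng.uniform(0.2, 0.8)) for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        proposed, logw = [], []
        for state, beta in particles:
            beta = max(1e-6, beta + rng.gauss(0.0, 0.01))  # jitter keeps diversity
            new_state, new_inf = sir_step(state, beta, gamma, rng)
            proposed.append((new_state, beta))
            logw.append(-0.5 * ((obs - new_inf) / obs_sd) ** 2)
        top = max(logw)
        w = [math.exp(l - top) for l in logw]
        total = sum(w)
        w = [x / total for x in w]
        # Weighted mean of the infected fraction: the by-product mentioned above.
        estimates.append(sum(wi * p[0][1] for wi, p in zip(w, proposed)))
        particles = [proposed[j] for j in systematic_resample(w, rng)]
    return estimates, particles

# Synthetic epidemic (beta = 0.5) observed with noise, for illustration only.
rng = random.Random(0)
state, truth, observations = (0.999, 0.001, 0.0), [], []
for _ in range(60):
    state, new_inf = sir_step(state, 0.5, 0.25, rng)
    truth.append(state[1])
    observations.append(max(0.0, new_inf + rng.gauss(0.0, 0.002)))

estimates, particles = particle_filter(observations)
mean_beta = sum(beta for _, beta in particles) / len(particles)
print("final mean beta:", round(mean_beta, 3))
```

The spread of the particles around the weighted mean is what would give the error estimate, and swapping `systematic_resample` for another scheme is one of the algorithmic variations mentioned above.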
Perspectives I: Inclusion of climate parameters. Alternative filters
Seasonal influenza: characterize the meteorological configurations leading to an outbreak.
Possible links between reanalysed meteorological data and observed incidences have been explored (Etienne GOUDAL).
Should relevant climatic parameters be included in the early detection system for epidemics?
As the dimensionality of the system grows, the particle filter is in danger of becoming impracticable. Alternative: the Ensemble Kalman Filter.
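For comparison with the particle filter, the analysis step of a stochastic Ensemble Kalman Filter (with perturbed observations) can be sketched as below for a single scalar observation. This is a generic textbook form, not the project's implementation. The two-component state (susceptible and infected fractions), the ensemble size, and the numbers are assumptions for illustration.

```python
import random

def enkf_update(ensemble, obs, obs_var, obs_index, rng):
    """Stochastic EnKF analysis step for one scalar observation of state[obs_index]."""
    n, dim = len(ensemble), len(ensemble[0])
    mean = [sum(m[j] for m in ensemble) / n for j in range(dim)]
    hx = [m[obs_index] for m in ensemble]
    hx_mean = mean[obs_index]
    var_hx = sum((h - hx_mean) ** 2 for h in hx) / (n - 1)
    # Sample covariance between each state component and the observed component.
    cov = [sum((m[j] - mean[j]) * (h - hx_mean) for m, h in zip(ensemble, hx)) / (n - 1)
           for j in range(dim)]
    gain = [c / (var_hx + obs_var) for c in cov]
    updated = []
    for m, h in zip(ensemble, hx):
        # Each member assimilates its own perturbed copy of the observation.
        innovation = obs + rng.gauss(0.0, obs_var ** 0.5) - h
        updated.append([m[j] + gain[j] * innovation for j in range(dim)])
    return updated

# Illustrative forecast ensemble of (S, I) fractions; S and I are negatively correlated.
rng = random.Random(0)
forecast = []
for _ in range(200):
    i = rng.gauss(0.10, 0.03)
    forecast.append([0.8 - i + rng.gauss(0.0, 0.005), i])

analysis = enkf_update(forecast, obs=0.05, obs_var=0.01 ** 2, obs_index=1, rng=rng)

def column_mean(ens, j):
    return sum(m[j] for m in ens) / len(ens)

print("I:", round(column_mean(forecast, 1), 3), "->", round(column_mean(analysis, 1), 3))
print("S:", round(column_mean(forecast, 0), 3), "->", round(column_mean(analysis, 0), 3))
```

Every member is shifted rather than reweighted, so no members are lost to degenerate weights. This is the usual argument for the EnKF when the state dimension grows, at the price of a Gaussian approximation in the update.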
Perspectives II: Other diseases
The system can easily be adapted to handle diseases other than influenza:
- gastro-enteritis, chickenpox (also monitored by Sentinelles)
- bacterial meningitis (sub-Sahelian zone: meningitis belt; China)
- dengue (Singapore)
- ...