Infection, Disease & Health (2016) 21, 184–191
Available online at www.sciencedirect.com
ScienceDirect journal homepage: http://www.journals.elsevier.com/infectiondisease-and-health/
Research
Parameter estimation of tuberculosis transmission model using Ensemble Kalman filter across Indian states and union territories
Pankaj Narula a, Vihari Piratla b, Ankit Bansal c, Sarita Azad a,*, Pietro Lio d
a School of Basic Sciences, Indian Institute of Technology Mandi, Mandi 175001, Himachal Pradesh, India
b School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi 175001, Himachal Pradesh, India
c Department of Mechanical and Industrial Engineering, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
d Computer Laboratory, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, University of Cambridge, UK
Received 6 August 2016; received in revised form 6 November 2016; accepted 7 November 2016
Available online 7 December 2016
KEYWORDS Tuberculosis; India; Infection
Abstract
Background: Tuberculosis (TB) is one of the main causes of mortality worldwide. Despite the full implementation of the Revised National Tuberculosis Control Programme (RNTCP), TB continues to be a major public health problem in India.
Methods: In the present study, parameters of a TB model are estimated using the Ensemble Kalman filter (EnKf) approach. The infection rate and the fraction of smear positive cases of TB are estimated in the context of India.
Results and Conclusions: Results reveal that the infection rate is highest in Manipur and the fraction of smear positive cases is highest in Pondicherry. The infection rate of TB in Manipur is found to be 2.57 per quarter for the period 2006–2011.
© 2016 Australasian College for Infection Prevention and Control. Published by Elsevier B.V. All rights reserved.
Highlights
A deterministic TB model is used to model TB transmission in India.
Parameters of the model are estimated using the Ensemble Kalman filter for the period 2006–2011.
The infection rate is found to be highest in Manipur.
* Corresponding author. E-mail address: [email protected] (S. Azad).
http://dx.doi.org/10.1016/j.idh.2016.11.001
2468-0451/© 2016 Australasian College for Infection Prevention and Control. Published by Elsevier B.V. All rights reserved.
Introduction
Tuberculosis (TB) is a well-known infectious disease caused by the bacterium M. tuberculosis, which generally spreads through the air. In 2011, 2.0–2.5 million new TB cases were estimated in India out of a global annual incidence of 9.4 million cases [1,2]. A large number of factors, such as high prevalence of HIV/AIDS and diabetes, poor hygiene, crowding, illiteracy and lack of awareness, make the TB situation critical in the Indian context. All these factors directly contribute to high infection rates among the population. In 1997, the Government of India, with the help of the World Bank, initiated RNTCP based on the internationally recommended Directly Observed Treatment Short-course (DOTS) strategy [2,3]. RNTCP is the largest TB control programme in terms of treatment of patients, with full nationwide coverage.

Mathematical models and statistical techniques play a significant role in understanding the transmission dynamics of TB. In a simple deterministic model of infectious disease, the number of susceptible persons who are infected by an infectious individual per unit of time is proportional to the total number of susceptible persons. This proportionality coefficient is defined as the infection rate. Estimating the parameters of a mathematical model, for instance the infection rate, helps to better quantify the spread of disease. Generally, inference of these parameters is a difficult task because of poor compatibility between observed data and models. Simulations and epidemiological data have been used to estimate the key parameters of deterministic models.

Different techniques have been introduced and applied to estimate the parameters of TB models. An approximate Bayesian computation approach has been used to estimate TB transmission rate parameters for the United States [4]. A synchronisation-based method has been implemented to infer parameters such as the treatment rate, disease-induced mortality rate and infection rate of a TB model; in particular, the infection rate in that study is estimated to be 2.04 for quarterly data during 2003–2007 for Cameroon [5]. Liu et al. estimated the reactivation and infection rates of a TB model for China by assuming these rates to be sinusoidal functions; the infection rate is estimated to be 2.23 persons per month for the period 2005–2009 [6]. A qualitative analysis of a TB model for Nigeria has been performed to analyse the effect of the DOTS strategy [7]. Mandal et al. estimated parameters such as the infection rate and treatment rate using annual prevalence and incidence data of TB [8]. In particular, the infection rate of TB has been estimated to be 11.03 per year for India [8]. Mishra et al. integrated a quarantine compartment, which incorporates multidrug-resistant TB patients, into a TB model. The model is further analysed and simulated using TB data of Jharkhand, India [9].

In the present paper, we use the Ensemble Kalman filter (EnKf) approach to estimate the parameters of a deterministic model of TB. The Kalman filter has been extensively used to infer the parameters of models of various infectious diseases [10–14]. Parameters of an HIV/AIDS model have been estimated using the Kalman filter approach [10]. An extension of the Kalman filter has been implemented to analyse the spatio-temporal behaviour of measles outbreaks using count data for the period 1960–1970 in London [12]. Influenza data from different cities within the United States have been analysed, and ensemble filters were found to be more accurate than other filters in predicting the peaks of influenza [13].
Methods
Dynamic model
In this paper, we use a variation of the SIR (Susceptible-Infected-Recovered) model defined as SLIS (Susceptible-Latent-Infected-Susceptible). There are three mutually exclusive groups of individuals, namely susceptible, S; latently infected, L (infected with M. tuberculosis but not infectious); and actively infected with M. tuberculosis, I (infected and infectious). The model does not take into account genetic and demographic heterogeneity. The governing differential equations for the rate of change of the population in the various compartments are

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} + \gamma (I + L) \\
\frac{dL}{dt} &= (1 - p)\,\frac{\beta S I}{N} - \gamma L - \mu L \\
\frac{dI}{dt} &= p\,\frac{\beta S I}{N} - \gamma I + \mu L
\end{aligned}
\qquad (1)
$$

where β is the transmission rate (number of contacts made by an infectious person per quarter), γ is the recovery rate (assumed to be 0.8 for both active and latent infections) [2], μ is the re-infection rate from latent to active disease (assumed to be 0.1) [15], and N is the total population (assumed to be constant). Patients with the latent form of infection are assumed to develop active tuberculosis at an average rate of τ, with a 5–10% lifetime risk of a latent infection reactivating to active TB disease. The rate at which M. tuberculosis infected people spread active TB to susceptible people is proportional to p, and the rate at which latent cases occur is proportional to (1 − p). In general, untreated patients can infect 10–15 persons each year [2].
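To make the dynamics of the SLIS equations concrete, the following is a minimal sketch that integrates them with a classical fourth-order Runge-Kutta scheme. The recovery and latent-to-active rates (0.8 and 0.1) are those stated in the text, and the values of the transmission rate and smear positive fraction (1.72 and 0.60) are the national means reported later in the paper. The initial compartment sizes, the step size and the function names `slis_rhs`, `rk4_step` and `simulate` are illustrative assumptions and are not taken from the study.

```python
def slis_rhs(state, beta, p, gamma=0.8, mu=0.1):
    """Right-hand side of the SLIS equations; state = (S, L, I)."""
    S, L, I = state
    N = S + L + I                      # total population (conserved)
    new_inf = beta * S * I / N         # new infections per quarter
    dS = -new_inf + gamma * (I + L)
    dL = (1.0 - p) * new_inf - gamma * L - mu * L
    dI = p * new_inf - gamma * I + mu * L
    return (dS, dL, dI)


def rk4_step(state, h, beta, p):
    """One classical Runge-Kutta step of size h (measured in quarters)."""
    def shifted(k, c):
        return tuple(s + c * h * ki for s, ki in zip(state, k))

    k1 = slis_rhs(state, beta, p)
    k2 = slis_rhs(shifted(k1, 0.5), beta, p)
    k3 = slis_rhs(shifted(k2, 0.5), beta, p)
    k4 = slis_rhs(shifted(k3, 1.0), beta, p)
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, beta, p, quarters, steps_per_quarter=50):
    """Return the state at the end of each quarter, starting with state0."""
    h = 1.0 / steps_per_quarter
    state = tuple(float(x) for x in state0)
    traj = [state]
    for _ in range(quarters):
        for _ in range(steps_per_quarter):
            state = rk4_step(state, h, beta, p)
        traj.append(state)
    return traj


# Hypothetical initial condition: 1,000,000 people, 5,000 latent, 5,000 active.
traj = simulate((990_000, 5_000, 5_000), beta=1.72, p=0.60, quarters=24)
```

Because the three right-hand sides sum to zero, the total population stays constant along the trajectory, which gives a simple sanity check on any implementation of the model.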
State space formulation
The parameters β and p in the dynamic SLIS model are modelled using a state-space formulation and EnKf. These two parameters cannot be observed directly. The methodology employs a two-step forecasting model. The coefficients of the SLIS model are written as a very simple state-space model representing a Markov process:

$$
a_{t+1} = \begin{bmatrix} \beta \\ p \end{bmatrix}_{t+1} = \begin{bmatrix} \beta \\ p \end{bmatrix}_{t} + w_t \qquad (2)
$$

$$
a_{t+1} = a_t + w_t, \qquad w_t \sim \mathcal{N}\!\left(0,\ \operatorname{diag}\!\left(0.5^2,\ 0.02^2\right)\right) \qquad (3)
$$

where $w_t$ is the uncertainty in the model parameters, assumed to be Gaussian white noise with standard deviations 0.5 and 0.02 for β and p, respectively. The measurement model for the observed number of active and latent cases can be written as

$$
y_t = \begin{bmatrix} \mathrm{In}_L \\ \mathrm{In}_I \end{bmatrix}_{t+1} = \begin{bmatrix} \dfrac{(1 - p)\,\beta S I}{N} \\[2mm] \dfrac{p\,\beta S I}{N} \end{bmatrix}_{t} + v_t \qquad (4)
$$

$$
y_t = H(a_t) + v_t
$$

where $v_t$ is the noise in the measured variables, assumed to be Gaussian distributed with a standard deviation of 5%.

Figure 1. Reported number of new smear positive (Sp) TB cases in Manipur.
Figure 2. Estimated infection rate for Indian states (2011, fourth quarter).
Figure 3. Estimated fraction of smear positive cases for Indian states (2011, fourth quarter).

Ensemble Kalman filter
The Kalman filter equations are expressed in two steps: the forecast step, in which the state is propagated through the time series model, and the analysis step, in which information from the measurements is used to obtain the assimilated value through the Kalman gain matrix [16]. In the forecast step, we prepare an ensemble of q forecasted states with random sampling error as

$$
a^{f_i}_{t+1} \triangleq a^{f_i}_{t} + w^{i}_{t}, \qquad i = 1, 2, 3, \ldots, q \qquad (5)
$$

where the superscript $f_i$ refers to the i-th member of the ensemble. The ensemble mean is defined as

$$
\bar{a}^{f}_{t+1} \triangleq \frac{1}{q} \sum_{i=1}^{q} a^{f_i}_{t+1} \qquad (6)
$$

The ensemble error matrix for the state variable is defined as

$$
E^{f}_{a_t} \triangleq \left[\, a^{f_1}_{t+1} - \bar{a}^{f}_{t+1} \;\; \cdots \;\; a^{f_q}_{t+1} - \bar{a}^{f}_{t+1} \,\right] \qquad (7)
$$

and the ensemble error matrix for the observed variables is defined as

$$
E^{a}_{y_t} \triangleq \left[\, y^{f_1}_{t+1} - \bar{y}^{f}_{t+1} \;\; \cdots \;\; y^{f_q}_{t+1} - \bar{y}^{f}_{t+1} \,\right] \qquad (8)
$$
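The ensemble construction can be sketched in code as one complete assimilation cycle for the two parameters: a random-walk forecast, ensemble means and error matrices, a Kalman gain, and an update of each member against a perturbed observation. This is a minimal sketch under stated assumptions and not the authors' MATLAB implementation: S, I and N are held fixed at hypothetical values, the observation is synthetic, the function names are illustrative, and the observation-noise variance is added to the output covariance before inversion, which is a common stabilising choice that the text does not state.

```python
import random


def observe(beta, p, S, I, N):
    """Measurement operator H(a): new latent and new active cases per quarter."""
    flow = beta * S * I / N
    return [(1.0 - p) * flow, p * flow]


def mean_vec(rows):
    """Component-wise mean of a list of 2-vectors."""
    q = len(rows)
    return [sum(r[k] for r in rows) / q for k in range(2)]


def cross_cov(A, B):
    """2x2 sample covariance (1/(q-1)) * sum_i A_i B_i^T of centred rows."""
    q = len(A)
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) / (q - 1)
             for j in range(2)] for i in range(2)]


def enkf_cycle(ensemble, y_obs, S, I, N, rng, w_sd=(0.5, 0.02), v_rel_sd=0.05):
    """One forecast + analysis cycle for an ensemble of [beta, p] members."""
    # Forecast: random-walk parameter model with Gaussian noise.
    fc = [[b + rng.gauss(0.0, w_sd[0]), p + rng.gauss(0.0, w_sd[1])]
          for b, p in ensemble]
    ys = [observe(b, p, S, I, N) for b, p in fc]
    a_bar, y_bar = mean_vec(fc), mean_vec(ys)
    # Ensemble error (anomaly) matrices, stored row-wise per member.
    Ea = [[a[k] - a_bar[k] for k in range(2)] for a in fc]
    Ey = [[y[k] - y_bar[k] for k in range(2)] for y in ys]
    Pay, Pyy = cross_cov(Ea, Ey), cross_cov(Ey, Ey)
    # ASSUMPTION: add observation-noise variance to Pyy before inverting.
    v_sd = [v_rel_sd * abs(y) for y in y_obs]
    M = [[Pyy[0][0] + v_sd[0] ** 2, Pyy[0][1]],
         [Pyy[1][0], Pyy[1][1] + v_sd[1] ** 2]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Minv = [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]
    K = [[sum(Pay[i][k] * Minv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Analysis: update each member with its own perturbed observation.
    updated = []
    for a, y in zip(fc, ys):
        innov = [y_obs[k] + rng.gauss(0.0, v_sd[k]) - y[k] for k in range(2)]
        updated.append([a[i] + K[i][0] * innov[0] + K[i][1] * innov[1]
                        for i in range(2)])
    return updated


rng = random.Random(0)
S, I, N = 900_000.0, 5_000.0, 1_000_000.0      # hypothetical, held fixed
y_obs = observe(2.0, 0.5, S, I, N)             # synthetic observation
ens = [[rng.gauss(1.0, 0.3), rng.gauss(0.3, 0.05)] for _ in range(100)]
for _ in range(10):
    ens = enkf_cycle(ens, y_obs, S, I, N, rng)
beta_hat = sum(a[0] for a in ens) / len(ens)
p_hat = sum(a[1] for a in ens) / len(ens)
```

In this toy setting the sum of the two observed incidences is linear in the transmission rate, so the ensemble mean of that parameter moves close to the value used to generate the synthetic observation within a few cycles.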
Analysis steps are defined as

$$
a^{a_i}_{t} \triangleq a^{f_i}_{t} + K_t \left( y_t + v^{i}_{t} - y^{f_i}_{t} \right) \qquad (9)
$$

where $K_t$ is the Kalman gain matrix given by

$$
K_t \triangleq P^{f}_{ay_t} \left( P^{f}_{yy_t} \right)^{-1} \qquad (10)
$$

Error covariance matrices are given by

$$
P^{f}_{ay_t} \triangleq \frac{1}{q - 1}\, E^{f}_{a_t} \left( E^{a}_{y_t} \right)^{T}, \qquad
P^{f}_{yy_t} \triangleq \frac{1}{q - 1}\, E^{a}_{y_t} \left( E^{a}_{y_t} \right)^{T} \qquad (11)
$$

The observed variables in our model are incidences of active and latent cases.

Data
India consists of 35 states and union territories (UTs) with populations ranging from less than 0.7 to 175 million. The data for the study were taken from quarterly RNTCP performance reports published by the Central TB Division, Directorate General of Health Services, Ministry of Health and Family Welfare, Government of India, from 2006 to 2011 [2]. DOTS programmes report quarterly data on the number of new TB cases.

Figure 4. Estimated infection rate β (1/quarter).
Figure 5. Estimated fraction of smear positive cases.

Results
The SLIS-EnKf framework explained in Section Methods and data is implemented in MATLAB, and results are post-processed and plotted in the R statistical programming language using “spplot”. In Fig. 1, reported smear positive cases of TB are plotted for Manipur state for the period 2006 to 2011. It is observed that there is a declining trend in the number of cases. A seasonal pattern is observed in the number of cases over the years, with the peak infection rate occurring in the middle of the year (April–July). Fig. 2 shows the contour plot of the estimated infection rate (β) for all the states in 2011 (fourth quarter). Our results show that Manipur is the worst infected state. It was reported that lack of adequate
diagnostic facilities has also been cited as one of the reasons for the higher rate of infection in the state. Further, it is observed from Fig. 2 that Maharashtra, Chhattisgarh, Bihar, Mizoram and Andaman & Nicobar have high infection rates. It is also noticed that Delhi has a relatively higher infection rate than its neighbouring states. This may be due to high population density and large social contact rates. On the other hand, Gujarat, West Bengal, Uttaranchal and Tripura are found to have the lowest infection rates. Fig. 3 shows the distribution of the estimated fraction of smear positive cases, p, over the country in 2011 (fourth quarter). In Figs. 4 and 5, estimated trends in the infection rates and fractions of smear positive cases are plotted from 2006 to 2011 for the states Manipur and Pondicherry. The mean value of the infection rate for Manipur is 2.57 per quarter, which is higher than the national average of 1.72 per quarter. In contrast,
Table 1 Estimated values of β and p.

| Sr. no. | States and union territories | Number of smear positive TB cases in fourth quarter, 2011 | β (2011, fourth quarter) | p (2011, fourth quarter) |
|---|---|---|---|---|
| 1 | Andaman & Nicobar | 101 | 2.01 | 0.42 |
| 2 | Andhra Pradesh | 18,665 | 1.65 | 0.59 |
| 3 | Arunachal Pradesh | 314 | 1.63 | 0.51 |
| 4 | Assam | 4898 | 1.62 | 0.55 |
| 5 | Bihar | 10,555 | 1.84 | 0.52 |
| 6 | Chandigarh | 536 | 1.46 | 0.59 |
| 7 | Chhattisgarh | 3050 | 1.94 | 0.44 |
| 8 | D & N Haveli | 70 | 1.53 | 0.56 |
| 9 | Daman & Diu | 48 | 1.44 | 0.59 |
| 10 | Delhi | 5091 | 1.82 | 0.48 |
| 11 | Goa | 323 | 1.63 | 0.52 |
| 12 | Gujarat | 13,816 | 1.34 | 0.69 |
| 13 | Haryana | 5134 | 1.47 | 0.60 |
| 14 | Himachal Pradesh | 1480 | 1.51 | 0.56 |
| 15 | Jammu & Kashmir | 1944 | 1.64 | 0.55 |
| 16 | Jharkhand | 5080 | 1.68 | 0.54 |
| 17 | Karnataka | 11,185 | 1.56 | 0.57 |
| 18 | Kerala | 3540 | 1.62 | 0.52 |
| 19 | Lakshadweep | 4 | 1.50 | 0.50 |
| 20 | Madhya Pradesh | 12,493 | 1.69 | 0.54 |
| 21 | Maharashtra | 18,007 | 1.78 | 0.53 |
| 22 | Manipur | 259 | 2.31 | 0.36 |
| 23 | Meghalaya | 581 | 1.71 | 0.50 |
| 24 | Mizoram | 164 | 2.22 | 0.37 |
| 25 | Nagaland | 396 | 1.73 | 0.50 |
| 26 | Orissa | 6637 | 1.58 | 0.53 |
| 27 | Pondicherry | 647 | 1.19 | 0.74 |
| 28 | Punjab | 4552 | 1.49 | 0.57 |
| 29 | Rajasthan | 15,938 | 1.49 | 0.57 |
| 30 | Sikkim | 162 | 1.70 | 0.48 |
| 31 | Tamil Nadu | 10,716 | 1.64 | 0.50 |
| 32 | Tripura | 409 | 1.40 | 0.62 |
| 33 | Uttar Pradesh | 42,593 | 1.54 | 0.56 |
| 34 | Uttaranchal | 2031 | 1.40 | 0.63 |
| 35 | West Bengal | 13,752 | 1.44 | 0.60 |
Pondicherry has the lowest infection rate, with a mean value of 1.34 per quarter. Fig. 5 shows that the fraction of smear positive cases in Manipur is almost constant, with a mean value of 0.32, while the fraction of smear positive cases in Pondicherry increased from 0.3 in 2006 to 0.7 in 2011. The mean value for India is calculated by taking an average over all the states. It is observed that p remains constant at 0.6 for the entire country. In Table 1, the number of smear positive cases, infection rates and fractions of smear positive cases for 2011 (fourth quarter) are listed for all the states.

We also estimated the values of the parameters β and p using the least squares method, by minimising the squared difference between observed and model data points. The estimated values of β and p are found to be 2.32 per quarter and 0.29, respectively, which are close to the mean values obtained from our EnKf model. The parameters of the same model have also been estimated using the Bayesian melding method for the period 2006–2011 [17]. However, it is to be noted that EnKf estimates parameter values as a function of time, while the least squares minimisation and Bayesian melding approaches give point estimates. In Table 2, values of β and p estimated using the three different methods are listed.

The sensitivity of the estimated parameters β and p to the measurement noise is analysed. Figs. 6 and 7 show the change in parameter values for three different levels of measurement noise, that is, uncertainty in the number of reported smear positive cases. The value of the infection rate increases with the increase in the measurement noise. With the increase in the measurement noise, there is a larger variation in the reported cases.
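The least squares comparison mentioned in this section can be illustrated with a minimal sketch that searches a grid of candidate parameter pairs and keeps the pair whose simulated quarterly counts of new active cases are closest, in the squared-error sense, to the observed series. Everything specific here is an assumption made for illustration: the forward-Euler integration, the initial compartment sizes, the grid ranges, the synthetic "observed" series and the function names. The paper does not state which optimiser or which model output was used.

```python
def new_active_cases(beta, p, quarters=24, gamma=0.8, mu=0.1,
                     state0=(990_000.0, 5_000.0, 5_000.0), substeps=20):
    """Quarterly totals of p*beta*S*I/N from a forward-Euler SLIS run."""
    S, L, I = state0
    N = S + L + I
    h = 1.0 / substeps
    series = []
    for _ in range(quarters):
        total = 0.0
        for _ in range(substeps):
            flow = beta * S * I / N
            total += p * flow * h                  # new active cases in this substep
            dS = -flow + gamma * (I + L)
            dL = (1.0 - p) * flow - gamma * L - mu * L
            dI = p * flow - gamma * I + mu * L
            S, L, I = S + h * dS, L + h * dL, I + h * dI
        series.append(total)
    return series


def sse(model, observed):
    """Sum of squared differences between model and observed series."""
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def grid_least_squares(observed, betas, ps):
    """Return the (beta, p) grid point with the smallest squared error."""
    n = len(observed)
    return min(((b, p) for b in betas for p in ps),
               key=lambda bp: sse(new_active_cases(*bp, quarters=n), observed))


betas = [1.0 + 0.1 * k for k in range(21)]     # candidate rates, about 1.0 to 3.0
ps = [0.1 + 0.05 * k for k in range(17)]       # candidate fractions, about 0.1 to 0.9
truth = (betas[7], ps[8])                      # hypothetical "true" parameters
observed = new_active_cases(*truth)            # synthetic, noise-free series
beta_hat, p_hat = grid_least_squares(observed, betas, ps)
```

With noise-free synthetic data the search returns the generating grid point. With reported case counts the minimum would instead be a single point estimate for the whole period, which is the contrast with the time-varying EnKf estimates drawn in the text.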
Discussion
We have implemented EnKf in conjunction with a deterministic model of TB. There are many alternative assimilation approaches that may be tested further [18]. The parameter estimation framework presented here captures seasonality in the data well, which could not be expected from standard likelihood methods. The technique is also computationally inexpensive when compared to Monte Carlo methods. India’s DOTS programme is the fastest expanding programme, placing more than 100,000 patients on treatment every month. Despite such efforts, the number of new cases continues to grow. The TB notification data published by RNTCP do provide estimates of the overall TB burden over the country, but these data alone cannot be used as an indicator of the severity of TB transmission. Since the
Table 2 β (infection rate) and p (fraction of smear positive cases) of TB model for India using three methods (2006–2011).

| Parameters | Least square method | Bayesian melding method | Kalman filter method |
|---|---|---|---|
| β | 2.32 | 1.44 | 1.72 |
| p | 0.29 | 0.55 | 0.60 |
Figure 6. Sensitivity of the model parameter β with respect to measurement noise.
Figure 7. Sensitivity of the model parameter p with respect to measurement noise.
number of new cases reported every quarter depends on the transmission parameter β, it is an important parameter to indicate the severity of the disease. The infection rate β represents the average number of contacts a susceptible person requires to get infected with TB. The estimated infection rate is highest for Manipur (2.31), which shows that an infectious individual infects approximately 10 persons per year, which matches well with reported data of TB in India [2]. This study provides an estimate of the transmission parameter, which could be an important measure to check the status of TB across Indian states and UTs.

Our results reveal that infection rates are high in seven states and union territories, namely Manipur, Mizoram, Andaman & Nicobar, Chhattisgarh, Bihar, Delhi and Maharashtra. The north eastern region is reported to be one of the highest smoking prevalence regions in India [19–21]. The probability of transmission of TB increases significantly with high smoking prevalence [22]. This may lead to the high infection rate of TB in this region. A high value of the infection rate is also observed in Andaman & Nicobar. This may be due to a significant rise in the TB burden in some regions of Andaman & Nicobar [23]. Also, fractions of smear positive cases are found to be high in six states, namely Gujarat, Haryana, Uttaranchal, West Bengal, Tripura and Pondicherry. High HIV prevalence, malnutrition and poverty are major reasons for the situation of TB in this part of the country.

Our model does not capture demographic and genetic heterogeneity. Incorporation of monthly notification of TB, age- and sex-based categorisation of data, multi-drug resistant TB cases, co-morbidity of TB with other diseases and TB patients treated in the private sector may allow us to assimilate more parameters and compartments into the model. This is, however, subject to the availability of data. For instance, in spite of the constant efforts of RNTCP, many TB patients seek treatment outside the programme settings and are missed by the notification system [24]. Count data of TB cases are accessible in the form of quarterly reports. The limited availability of data has restricted the present study to a simple model. However, the estimate of the infection rate is in close agreement with previously reported results [8].
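As a quick check of the annual figure quoted for Manipur in this section, and assuming the quarterly rate simply scales over four quarters, the arithmetic is:

$$
4 \ \text{quarters/year} \times 2.31 \ \text{per quarter} = 9.24 \ \text{per year} \approx 10 \ \text{persons per year}
$$

This sits just below the range of 10–15 persons per year cited for untreated patients in the Methods [2].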
Conclusion
To make the RNTCP programme more effective, a shift in strategies is needed for states with higher transmission rates as compared with states with a higher fraction of smear positive cases. For example, states with a higher transmission rate require more focus on educational and awareness programmes to mitigate the transmission of TB bacteria from infected to susceptible individuals. Human behaviour plays an important role in the spread of an infectious agent; transmission of the disease can be reduced by increasing the perception of risk about a disease. For example, in states like Bihar and Chhattisgarh, people should be educated to reduce their contact with TB infected people. On the other hand, in densely populated areas like Delhi, infected people should be encouraged to use masks. A significantly different strategy is needed in states like Manipur and Mizoram, where TB patients are co-infected with HIV. For states with lower transmission rates but a high value of p, the strategy should be to increase the number of susceptible people examined for TB. This will help in increasing the notification rate of smear positive cases and reduce the overall burden.

Our model provides an understanding of TB transmission, while the Kalman filter technique supplies robust estimates of the parameters of this model to identify the severity of TB in various regions of India. Accurate estimation of these parameters helps in developing TB control programmes and treatment strategies. Advanced diagnostic tools and treatment techniques should be acquired to rapidly lower TB transmission in states with high infection rates, such as Manipur, Mizoram, Chhattisgarh, Bihar and Delhi. This will also help in the optimal use of resources.
Ethics
No ethical permission is required to conduct the present study.

Authorship contribution
SA, AB and PL designed the study. PN and VP carried out the analysis. All authors critically reviewed the manuscript.

Conflicts of interest
The authors of this paper have no conflicts of interest to declare.

Funding sources
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Provenance and peer review
Not commissioned; externally peer reviewed.
References
[1] Global tuberculosis report 2012, WHO, page 3. http://apps.who.int/iris/bitstream/10665/75938/1/9789241564502_eng.pdf. [Accessed 5 July 2016].
[2] RNTCP annual status report (2006-2013) TB India. http://tbcindia.nic.in/. [Accessed 5 July 2016].
[3] Dye C, Williams BG. The population dynamics and control of tuberculosis. Science 2010 May 14;328(5980):856–61.
[4] Tanaka MM, Francis AR, Luciani F, Sisson SA. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics 2006;173:1511–20.
[5] Bowong S, Kurths J. Parameter estimation based synchronisation for an epidemic model with application to tuberculosis in Cameroon. Phys Lett A 2010;374:4496–505.
[6] Liu L, Zhao XQ, Zhou Y. A tuberculosis model with seasonality. Bull Math Biol 2010;72:931–52.
[7] Okuonghae D, Korobeinikov A. Dynamics of tuberculosis: the effect of direct observation therapy strategy (DOTS) in Nigeria. Math Model Nat Pheno 2007;2:113–28.
[8] Mandal S, Arinaminpathy N. Transmission modeling and health systems: the case of TB in India. Int Health 2015;7:114–20.
[9] Mishra BK, Srivastava J. Mathematical model on pulmonary and multidrug-resistant tuberculosis with vaccination. J Egypt Math Soc 2013;22:311–6.
[10] Tan WY, Ye Z. Estimation of HIV infection and incubation via state space models. Math Biosci 2000;167(1):31–50.
[11] Dukic V, Lopes HF, Polson NG. Tracking epidemics with Google flu trends data and a state-space SEIR model. J Am Stat Assoc 2012 Dec 1;107(500):1410–26.
[12] Chiogna M, Gaetan C. Hierarchical space-time modelling of epidemic dynamics: an application to measles outbreaks. Stat Methods Appl 2004 Apr 1;13(1):55–71.
[13] Yang W, Karspeck A, Shaman J. Comparison of filtering methods for the modelling and retrospective forecasting of influenza epidemics. PLoS Comput Biol 2014 Apr 24;10(4):e1003583.
[14] Chen S, Fricks J, Ferrari MJ. Tracking measles infection through non-linear state space models. J R Stat Soc Ser C 2012;61(1):117–34.
[15] Murphy BM, Singer BH, Anderson S, Kirschner D. Comparing epidemic tuberculosis in demographically distinct heterogeneous populations. Math Biosci 2002;180:161–85.
[16] Gillijns S, Mendoza OB, Chandrasekar J, Moor BLR, Bernstein DS, Ridley A. What is the Ensemble Kalman Filter and how well does it work? In: Proceedings of the American Control Conference. Minneapolis, Minnesota, USA; 2006.
[17] Narula P, Azad S, Lio P. Bayesian melding approach to estimate the reproduction number for tuberculosis transmission in Indian states and union territories. Asia Pac J Public Health 2015 Oct 1;27(7):723–32.
[18] Shaman J, Karspeck A. Forecasting seasonal outbreaks of influenza. Proc Natl Acad Sci U S A 2012;109:20425–30.
[19] Sinha DN, Gupta PC, Pednekar MS. Tobacco use among students in the eight North-eastern states of India. Indian J Cancer 2003;40(2):43–59.
[20] Rani M, Bonu S, Jha P, Nguyen SN, Jamjoum L. Tobacco use in India: prevalence and predictors of smoking and chewing in a national cross-sectional household survey. Tob Control 2003;12:e4. http://dx.doi.org/10.1136/tc.12.4.e4.
[21] Global Adult Tobacco Survey India (GATS-India), Ministry of Health and Public Welfare, Government of India, 2009-2010. http://mohfw.nic.in/WriteReadData/l892s/1455618937GATS%20India.pdf. [Accessed 5 July 2016].
[22] d’Arc Lyra Batista J, de Fátima Pessoa Militão de Albuquerque M, de Alencar Ximenes RA, Rodrigues LC. Smoking increases the risk of relapse after successful tuberculosis treatment. Int J Epidemiol 2008;37(4):841–51. http://dx.doi.org/10.1093/ije/dyn113.
[23] Chakraborty AK. Epidemiology of tuberculosis: current status in India. Indian J Med Res 2004;120:248–76.
[24] Satyanarayana S, Nair SA, Chadha SS, Shivashankar R, Sharma G, Yadav S, et al. From where are tuberculosis patients accessing treatment in India? Results from a cross-sectional community based survey of 30 districts. PLoS One 2011 Sep 2;6(9):e24160.