Application of Restricted Maximum Likelihood in Log Empirical Bayesian Kriging for Spatiotemporal Mapping of Tuberculosis
Ms. Kasturi Basu
Applied Statistics Unit, Indian Statistical Institute, Kolkata, India
Introduction
• Tuberculosis (TB), caused by Mycobacterium tuberculosis, is an airborne, communicable and curable disease, and a leading cause of death worldwide.
• In 2015, there were an estimated 10.4 million new (incident) TB cases worldwide.
• India, Indonesia, China, Nigeria, Pakistan and South Africa accounted for sixty percent of the new cases.
• TB has a devastating social and economic impact.
GIS Technique
• RNTCP map of North 24 Parganas District: treated as the background map of the district.
• Centroids of the TUs (22 in the district): calculated and digitized using ArcGIS 10.2 software.
• PNSPP values are attached to the respective TU centroids.
Figure 1: North 24 Parganas District
Semivariogram Model in EBK
$\gamma(h) = \mathrm{Nugget} + b\,|h|^{\alpha}$ (1)
Study Design
• Study area: North 24 Parganas District, West Bengal, India.
• Tuberculosis Units (TUs): nodal points for TB control activities in the sub-district.
• Data source: Revised National Tuberculosis Control Program (RNTCP) Performance Report, separately for each quarter (1 to 4), 2008-10.
• Study variable: Proportion of New Smear Positive Patients (PNSPP) initiated on treatment.
  [PNSPP in a quarter = (sum of new smear positive patients in the respective TU in the specified quarter / RNTCP population covered by that TU in that quarter) × 100,000]
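The PNSPP definition is a rate per 100,000 population. A minimal sketch of that arithmetic follows; the function name and the input figures are hypothetical and are not taken from the RNTCP reports.

```python
def pnspp(new_smear_positive: int, population_covered: int) -> float:
    """New smear positive patients per 100,000 population for one TU in one quarter."""
    if population_covered <= 0:
        raise ValueError("population_covered must be positive")
    return 100_000 * new_smear_positive / population_covered


# Hypothetical TU: 45 new smear positive patients, 250,000 people covered.
rate = pnspp(45, 250_000)
print(rate)
```

Values computed this way would be the attributes attached to each TU centroid before interpolation.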
Empirical Bayesian Kriging (EBK)
• A kriging-based interpolation method.
• The posterior distribution is specified by means of simulations.
• Simulated values at the measurement points are treated as new data sets.
• Accounts for uncertainty in semivariogram estimation by simulating many semivariograms from the input data.
• The simulation and estimation process is repeated to obtain the posterior distribution used for prediction.
• Uses restricted maximum likelihood (REML).
• Allows accurate predictions for small data sets.
Log Empirical Bayesian Kriging (LEBK): EBK with REML applied to the log-transformed random field.
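The poster does not state the likelihood that REML maximizes. As a hedged reference point, the textbook restricted log-likelihood for a Gaussian model $y \sim N(X\beta, \Sigma(\theta))$ is shown below. The symbols are assumptions introduced for illustration: $y$ would hold the log-transformed PNSPP values at the TU centroids, $X$ the trend design matrix (a single column of ones for a constant mean), and $\theta$ the semivariogram parameters.

```latex
\ell_R(\theta) = -\tfrac{1}{2}\Big[
    \log\lvert\Sigma(\theta)\rvert
  + \log\lvert X^{\top}\Sigma(\theta)^{-1}X\rvert
  + (y - X\hat\beta)^{\top}\Sigma(\theta)^{-1}(y - X\hat\beta)
\Big] + \text{const},
\qquad
\hat\beta = \big(X^{\top}\Sigma(\theta)^{-1}X\big)^{-1} X^{\top}\Sigma(\theta)^{-1} y
```

The middle term is what separates REML from ordinary maximum likelihood: it adjusts for the degrees of freedom spent on estimating the trend, which matters most for small data sets.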
Other Kriging Methods
• Calculate the semivariogram from known data locations.
• Use a single semivariogram to make predictions at unknown locations.
• Treat the estimated semivariogram as the true semivariogram for the interpolation region.
• Underestimate the standard errors of prediction.
Important Result
• Spatial clusters of NSPT were identified in each quarter, 2008-10.
• Prevalence of NSPT was gradually reduced by the 4th quarter of 2010, compared to 2008 and 2009.
• The number, shape, size and direction of the clusters changed over the years 2008-10.
Future research: more advanced geostatistical methodologies with covariates and real-time surveillance are needed to address simultaneously the co-morbidities of TB associated with HIV, diabetes, smoking and other diseases.
Acknowledgements
Applied Statistics Unit and CSSC Lab of the Indian Statistical Institute, Kolkata, for providing infrastructural facilities in using the ArcGIS 10.2 software package.
References
[1] N. Cressie. Statistics for Spatial Data. Wiley, New York, 2nd rev. ed., 1993.
[2] Spock G., Pilz J., Pluch P. Bayesian kriging with lognormal data and uncertain variogram parameters. Geostatistics for Environmental Applications, conference paper, pages 51-62, December 2005.
[3] Davis CA., Leyland AH. Empirical Bayes methods for disease mapping. Stat Methods Med Res, 14(1):17-34, Feb 2005.
Cross Validation
For all predicted surfaces: cross-validation used all possible sets of 21 of the 22 TU centroids with their corresponding attributes, treating the remaining one as unknown.
Quarter   2008   2009   2010
Qtr 1     0.15   1.27   0.50
Qtr 2     0.31   0.36   0.08
Qtr 3     0.50   0.19   0.11
Qtr 4     0.27   0.13   0.10
Table 1: Relative Standard Error Percentage
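The cross-validation scheme, in which each TU centroid in turn is treated as unknown and predicted from the others, can be sketched as follows. This is not the ArcGIS EBK procedure used for the poster: it is a plain ordinary kriging stand-in with one fixed power semivariogram, and the coordinates, values and parameters are synthetic placeholders chosen only to make the sketch runnable.

```python
import math
import random


def power_semivariogram(h, nugget, b, alpha):
    """gamma(h) = nugget + b*|h|**alpha for h > 0; gamma(0) = 0 by convention here."""
    h = abs(h)
    return nugget + b * h ** alpha if h > 0 else 0.0


def solve(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(pts, z, p0, nugget, b, alpha):
    """Ordinary kriging prediction at p0 from points pts with values z."""
    n = len(z)

    def gam(p, q):
        return power_semivariogram(math.dist(p, q), nugget, b, alpha)

    a = [[gam(pts[i], pts[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])  # constraint: weights sum to one
    rhs = [gam(p, p0) for p in pts] + [1.0]
    weights = solve(a, rhs)[:n]  # last entry is the Lagrange multiplier
    return sum(w * zi for w, zi in zip(weights, z))


def leave_one_out(pts, z, nugget, b, alpha):
    """Predict each point from the remaining n - 1 points."""
    preds = []
    for i in range(len(z)):
        rest_pts = pts[:i] + pts[i + 1:]
        rest_z = z[:i] + z[i + 1:]
        preds.append(ordinary_kriging(rest_pts, rest_z, pts[i], nugget, b, alpha))
    return preds


# Synthetic stand-ins for the 22 TU centroids and their log-transformed rates.
random.seed(0)
pts = [(random.uniform(0, 50), random.uniform(0, 50)) for _ in range(22)]
z = [math.log(random.uniform(5, 40)) for _ in range(22)]
preds = leave_one_out(pts, z, nugget=0.01, b=0.05, alpha=1.0)
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, z)) / len(z))
print(len(preds), rmse)
```

Comparing each held-out prediction with its observed value gives an error summary per quarter. The poster reports relative standard error percentages rather than the RMSE printed here.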
Objective
• Mapping of tuberculosis using real-life data relevant to the Indian context.
• Novelty: application of Restricted Maximum Likelihood (REML) in Log Empirical Bayesian Kriging (LEBK) to identify visually:
  • spatial clusters of risk surfaces of New Smear Positive Tuberculosis (NSPT),
  • changes, if any, in the risk pattern over space and time.
• Estimate the associated uncertainties.
In the semivariogram model (1), $\gamma$ is the semivariogram value, $h$ the distance, Nugget the error term, $b$ the slope and $\alpha$ the power. REML in LEBK imposes restrictions: for a given distance $h$, the slope $b$ is positive and $\alpha$ lies between 0.25 and 1.75.
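A minimal sketch of the power semivariogram with the stated restrictions on the slope and power is given below. The function name and parameter values are hypothetical, and treating the bounds on the power as inclusive is an assumption.

```python
def power_semivariogram(h, nugget, b, alpha):
    """Return gamma(h) = nugget + b * |h|**alpha under the stated restrictions."""
    if b <= 0:
        raise ValueError("slope b must be positive")
    # Bounds treated as inclusive here (an assumption, not stated in the source).
    if not 0.25 <= alpha <= 1.75:
        raise ValueError("power alpha must lie between 0.25 and 1.75")
    return nugget + b * abs(h) ** alpha


# Hypothetical parameter values, chosen only for illustration.
for h in (1.0, 4.0, 16.0):
    print(h, power_semivariogram(h, nugget=0.1, b=0.5, alpha=0.5))
```

With a power below 1 the curve rises quickly at short distances and flattens out; with a power above 1 it keeps steepening as distance grows.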
• Spatiotemporal
Future Research
Conclusion
Application of REML in LEBK resulted in:
• Accurate predictions for all 12 quarters.
• Negligible Percentage Relative Standard Errors (PRSE) for all 12 data sets.
• PRSE varying only between 0.08 and 1.27 over the years 2008-10.
Contact Information
Applied Statistics Unit
Indian Statistical Institute
203, B.T. Road, Kolkata
Pin code: 700108
West Bengal, India
Phone (O): +91 (033) 25752800
Mobile: +91 9831554563
URL: www.isical.ac.in/ asu
Email: [email protected]