Extending Geospatial Data to Support Epidemiological Modeling

Stefan Edlund
IBM Almaden Research Center
650 Harry Rd.
San Jose, CA 95120
+1(408) 927-1766

Matthew Davis
IBM Almaden Research Center
650 Harry Rd.
San Jose, CA 95120
+1(408) 927-1029

James Kaufman
IBM Almaden Research Center
650 Harry Rd.
San Jose, CA 95120
+1(408) 927-2477


ABSTRACT

This paper describes the Spatiotemporal Epidemiological Modeler (STEM), an open source disease modeling application available through the Eclipse Foundation. The most distinguishing aspect of STEM is that it provides an open platform for researchers to build, run, share, and reuse models of infectious disease. We explain why we believe an extensible architecture is desirable for these types of applications, allowing new models to be built on top of existing proven models. We also describe some of the features in STEM and explain how GIS data must be extended to support interactions between regions, including the mass action terms that drive transmission of infectious disease.

Categories and Subject Descriptors
I.6.3 [Computing Methodologies]: Simulation and Modeling Applications

General Terms
Algorithms, Design, Experimentation.

Keywords
Epidemiological Modeling, GIS, Simulations, Eclipse, OSGi

1. INTRODUCTION

Given the increase in global transportation and the acceleration of global commerce, the threat from emerging disease has increased. Scientists and public health officials require new tools to anticipate how disease risk is changing in time and space. Epidemiologists have developed a wide range of mathematical models to describe the future state of a disease at a location given an initial condition, and these models depend on a variety of GIS data. Disease transmission occurs not only through human travel, but also through insect vectors whose population and range are affected by global climate change and local microclimate factors. Integrating this wide range of geospatial data into layers of models requires an extensible software framework and relevant GIS data extended in ways that describe spatiotemporal relationships.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGSPATIAL GIS '12, November 6-9, 2012, Redondo Beach, CA, USA. Copyright © 2012 ACM ISBN 978-1-4503-1691-0/12/11...$15.00

2. STEM

Traditional approaches to software development may be too slow to respond on the timescale relevant to an unexpected threat to public health. The OSGi framework [1] provides a modular architecture that separates an application into smaller bundles, which are independently managed and deployed into an application. This ability, together with the ability to rapidly extend existing software components and build on the work of others, accelerates the development and testing of new software. The Spatiotemporal Epidemiological Modeler (STEM, www.eclipse.org/STEM) is an Eclipse Rich Client Application based on Eclipse Equinox (a reference implementation of the OSGi standard). The principal idea behind STEM is to provide an open simulation platform where epidemiologists and public health experts can develop and share models and data relevant to the study of disease [2]. Equinox brings to STEM a “plug-and-play” software architecture that allows GIS data models and computational models to be seamlessly reused and extended. As an open source project, STEM is free and available to any scientist. Researchers are also free to contribute back to its growing library of models, computer code, and denominator data. A video demonstration of STEM is available at http://www.youtube.com/watch?v=OoiFLemepw4.

3. FEATURES

This section discusses some of the features and functions available in STEM.

3.1 Building a STEM Scenario

At its core, STEM provides a simulation engine that solves differential equations. Scientists have created mathematical models to study the dynamics of disease at least since Ronald Ross published his famous differential-equation-based model for malaria in 1911 [3]. STEM provides built-in epidemiological models, including many classic textbook deterministic and stochastic compartment models; basic examples are available in textbooks such as Infectious Diseases of Humans [9]. Examples include seasonal influenza [4, 5], as well as many advanced models of vector-borne diseases such as malaria [6] and dengue fever [7]. Models for zoonotic diseases such as salmonella have also recently become available [8]. The differential equations defined in a disease model can be solved using a choice of solvers, including a fast finite-difference integrator or more accurate adaptive numerical integration algorithms [10, 11]. Like most things in STEM, solvers are pluggable, and users are free to contribute their own if needed.

Users can easily introduce their own new models, extending the disease algorithms already in STEM or defining entirely new differential equations for newly emerging diseases. Epidemiological models depend on data, including a large range of denominator data describing human and animal populations, transportation, etc. Many of STEM’s data plug-ins are GIS data sets defining geography, transportation systems, and population for 244 countries and dependent areas. Some of these data sets describe GIS relationships between locations, such as nearest-neighbor connectivity, road travel, or air transportation. For most countries, STEM provides population data at administrative level 2 (counties in the US).

The most important executable object in STEM (i.e., an object that can be processed by the simulation engine) is called a scenario. Scenarios encapsulate a GIS data model of a region of interest, as well as computational models for one or more populations and/or diseases. STEM separates the modeling of population dynamics (including births, deaths, transportation, and migration) from the modeling of the disease itself. It is possible to run a simulation in STEM without any disease at all and, for example, model only the effects of climate or other factors on an important vector population such as the Anopheles mosquito. This vector model can then be used together with a model of an emerging vector-borne disease.

After creating a GIS data model for a region to study, including one or more populations and diseases with the desired parameters, a user needs to define the “initial state” of the simulation. STEM uses the concepts of “infectors” and “inoculators” to do this. An infector describes the number of infectious individuals at a given location, whereas an inoculator moves individuals into a “removed” (or immune) state, in which an individual is neither susceptible nor infectious. There are several options for creating infectors and inoculators in STEM.
If the initial condition is based on historic surveillance data, it is possible to generate infectors and inoculators by importing public health incidence data from a file. Alternatively, a simple user interface (UI) wizard can be used to define the initial number of infected or immunized people at a particular location.
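To make the roles of infectors and inoculators concrete, the following Java sketch seeds the initial state of two regions. It is an illustration under stated assumptions, not STEM's API: the class and method names (RegionState, applyInfector, applyInoculator), the region names, and the population counts are hypothetical, and we assume that both infectors and inoculators draw individuals from the susceptible compartment, so that the total population at each location is unchanged.

```java
/** Hypothetical sketch: seeding an initial S/I/R state with infectors and inoculators. */
public class InitialStateSketch {

    /** Compartment counts for one region; names are illustrative, not STEM classes. */
    static final class RegionState {
        double s, i, r; // susceptible, infectious, removed (immune)

        RegionState(double population) { this.s = population; } // everyone starts susceptible

        double total() { return s + i + r; }
    }

    /** Infector: moves up to `count` susceptible individuals into the infectious state. */
    static void applyInfector(RegionState st, double count) {
        double moved = Math.min(count, st.s); // cannot infect more people than are susceptible
        st.s -= moved;
        st.i += moved;
    }

    /** Inoculator: moves up to `count` susceptible individuals into the removed state. */
    static void applyInoculator(RegionState st, double count) {
        double moved = Math.min(count, st.s);
        st.s -= moved;
        st.r += moved;
    }

    public static void main(String[] args) {
        RegionState a = new RegionState(1000.0); // illustrative populations only
        RegionState b = new RegionState(500.0);
        applyInfector(a, 10.0);    // a single infector in region A
        applyInoculator(a, 100.0); // two inoculators, one per region
        applyInoculator(b, 50.0);
        System.out.printf("RegionA S=%.0f I=%.0f R=%.0f N=%.0f%n", a.s, a.i, a.r, a.total());
        System.out.printf("RegionB S=%.0f I=%.0f R=%.0f N=%.0f%n", b.s, b.i, b.r, b.total());
        // prints:
        // RegionA S=890 I=10 R=100 N=1000
        // RegionB S=450 I=0 R=50 N=500
    }
}
```

Capping each move at the number of susceptible individuals keeps every compartment non-negative, whatever counts are imported from a file or entered in a wizard.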

Figure 2. Screenshot from the Simulation Perspective.

The upper right of Figure 1 shows how scenarios are organized. A scenario contains a geospatial model (Thailand.model), which itself contains a sub-model (THA_0_1_2_population.model) with GIS data on geographic regions down to administrative level 2, including human population statistics as well as common border relationships. This sub-model is pre-built in STEM and can easily be dragged and dropped from the STEM library into a user-defined model. Thailand.model (which was created by the user) contains an aging population model (ChildrenAdults.standard) that subdivides the population into adults and children (discussed below). In addition, the model contains a model for the disease itself (Flu.standard), with epidemiological parameters set by the user. Inside the scenario (ThailandDemo.scenario) we find the top-level model, a single infector, and two inoculators. Generally in STEM, objects higher up in the hierarchical structure depend on objects deeper down. The scenario also contains a sequencer that determines the flow of time in the simulation, i.e., when to start, when to stop, and how often to generate results for outputs such as map viewers and log files.

Modeling many diseases, including diseases of childhood, requires dividing populations into subgroups. This is possible in STEM using an aging population model. For instance, a human population can be divided into demographic groups (children under 2, adults over 65, etc.). When multiple population groups interact, the disease parameters change form: for example, the recovery rate becomes a vector of values and the transmission rate becomes a matrix. A simple two-group example is shown in Figure 1, where the user is populating the transmission rate matrix between children and adults. The disease state of the population is computed as the population ages and transitions from one demographic group to another.
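The vector- and matrix-valued parameters can be illustrated with a small Java sketch of the right-hand side of a two-group SIS model (children and adults). The mixing form is an assumption, not taken from STEM: we use frequency-dependent mixing, in which the force of infection on group a is λ_a = Σ_b β_ab I_b / N, with N the total population at the location, and a per-group recovery rate γ_a. Aging between groups is omitted, and the class name, method name, and parameter values are hypothetical.

```java
/** Hypothetical sketch of a two-group SIS model with a transmission matrix and recovery vector. */
public class TwoGroupSis {

    /**
     * Returns d[a] = {dS_a/dt, dI_a/dt} for each group a, assuming frequency-dependent mixing:
     * lambda_a = sum over b of beta[a][b] * I_b / N, where N is the total population.
     */
    static double[][] derivatives(double[] s, double[] i, double[][] beta, double[] gamma) {
        int groups = s.length;
        double n = 0.0;
        for (int a = 0; a < groups; a++) n += s[a] + i[a];

        double[][] d = new double[groups][2];
        for (int a = 0; a < groups; a++) {
            double lambda = 0.0; // force of infection acting on group a
            for (int b = 0; b < groups; b++) lambda += beta[a][b] * i[b] / n;
            double newInfections = lambda * s[a];
            double recoveries = gamma[a] * i[a]; // group-specific recovery rate
            d[a][0] = -newInfections + recoveries; // dS_a/dt
            d[a][1] = newInfections - recoveries;  // dI_a/dt
        }
        return d;
    }

    public static void main(String[] args) {
        // Group 0 = children, group 1 = adults; all numbers are illustrative.
        double[] s = {300.0, 600.0};
        double[] i = {50.0, 50.0};
        double[][] beta = {{0.6, 0.2}, {0.2, 0.3}}; // beta[a][b]: transmission to group a from group b
        double[] gamma = {0.25, 0.2};
        double[][] d = derivatives(s, i, beta, gamma);
        for (int a = 0; a < s.length; a++) {
            System.out.printf("group %d: dS=%.3f dI=%.3f%n", a, d[a][0], d[a][1]);
        }
    }
}
```

With a single group (β a 1×1 matrix and γ a single value), the force of infection reduces to β I / N and the incidence to the familiar mass-action term β S I / N.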
Figure 2 shows the UI displayed when a simulation is running. A map is displayed indicating the relative number of individuals in a given disease state, in this case the ‘I’ (infectious) state for adults. The initial case originated in Bangkok and spread via mixing across common borders to other regions in Thailand. In the lower right, the epidemic curve for one region is shown in the form of a time series. The user can pick any region (by double-clicking on the map) to visualize one or more time series plots or phase space plots.

Figure 1. Screenshot from the Designer Perspective.


Users frequently want to generate log files containing the output of a simulation. Any combination of data, by population group and/or disease state, can be logged. STEM provides a variety of logger plug-ins, such as delimited file loggers and map video loggers. To include these, a user simply drags loggers into any scenario.
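Logging "any combination of data" amounts to choosing columns and writing one record per reporting step. The following hypothetical Java sketch writes comma-separated values, one row per time step and one column per selected population group and disease state; the class name, the "group:state" column keys, and the CSV layout are assumptions for illustration, not STEM's logger format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a logger that writes selected group/state values as CSV rows. */
public class CsvLoggerSketch {
    private final List<String> columns; // selected "group:state" keys, e.g. "adults:I"
    private final Writer out;

    CsvLoggerSketch(List<String> columns, Writer out) throws IOException {
        this.columns = columns;
        this.out = out;
        out.write("time," + String.join(",", columns) + "\n"); // header row
    }

    /** Writes one row for a time step; values missing from the snapshot are left empty. */
    void log(double time, Map<String, Double> snapshot) throws IOException {
        StringBuilder row = new StringBuilder(Double.toString(time));
        for (String c : columns) {
            row.append(',');
            Double v = snapshot.get(c);
            if (v != null) row.append(v);
        }
        out.write(row.append('\n').toString());
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter(); // a file writer could be used instead
        CsvLoggerSketch logger = new CsvLoggerSketch(List.of("children:I", "adults:I"), sink);
        logger.log(0.0, Map.of("children:I", 5.0, "adults:I", 10.0));
        logger.log(1.0, Map.of("adults:I", 12.5));
        System.out.print(sink);
        // prints:
        // time,children:I,adults:I
        // 0.0,5.0,10.0
        // 1.0,,12.5
    }
}
```

The sketch does no quoting, so it assumes column keys contain no commas.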

3.2 Spatial Interactions

A model that deals only with the trajectory of a disease in time implicitly assumes that the population (or populations) in question is well mixed across all locations of interest (i.e., that there is no need to model the spatial distribution of people). However, for very large-scale simulations, the details of population distribution, transportation, trade, and even wild bird migration can all be important factors in understanding the evolution of an infectious disease in space and time. The STEM framework allows users to “plug in” custom models of transportation for any population in any graph. As a convenience, STEM implements a basic nearest-neighbor mixing model for human populations, derived from the common border edges in STEM’s built-in geographic data graphs. Common border edges describe physical adjacency between two geographic regions.

As an example of a very simple epidemiological model, consider a compartment model of the common rhinovirus (a cause of the common cold). In this model, people are considered to be either susceptible to the rhinovirus or infectious. The rhinovirus changes so fast in a population that people do not typically develop long-term immunity, so once they leave the infectious state they simply become susceptible again. This defines an SI (or SIS) compartmental model: any member of the population is either in the Susceptible compartment or in the Infectious compartment. A set of differential equations describes the disease dynamics. An SIS model, like most epidemiological models, contains a mass-action term that drives transmission of an infectious disease as infectious individuals, I, mix with susceptible individuals, S, at a region j:

$$\frac{dS_j}{dt} = -\beta \frac{S_j}{N_j} I_j + \gamma I_j$$

$$\frac{dI_j}{dt} = \beta \frac{S_j}{N_j} I_j - \gamma I_j$$

where $\gamma$ is the recovery rate, and $\beta$ is the transmission rate, defined as the number of effective contacts (contacts resulting in a new infectious case) in a fully susceptible population per unit of time. $N_j$ is the total population at location $j$. We simplified the equations by not including the background birth and death rates of the population. To capture relationships between different locations in a graph, STEM automatically includes the rate of infection in population $j$ due to mixing with the population at a neighboring region $k$. The incidence, or rate, of new infection becomes:

$$\Delta I_j \propto \beta S_j \frac{\sum_{k=1}^{K} m_{jk} I_k}{\sum_{k=1}^{K} m_{jk} N_k}$$

where $m_{jk}$ is the mixing rate between regions $j$ and $k$ (note that $m_{jj} = 1$), $I_k$ is the number of infectious individuals in region $k$, $N_k$ is the total population in region $k$, and $K$ is the total number of regions bordering region $j$. All models of disease in STEM take into account the mixing of populations across borders. Users can customize mixing rates on the graph for individual border edges, or set a global mixing rate based on the relative areas of neighboring regions and a user-specified average commuting distance.
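The spatial incidence term can be sketched in Java as follows. The sketch assumes that the proportionality in the incidence expression is an equality, that the sums run over region j itself (with m_jj = 1) together with the regions bordering it (non-bordering pairs get m_jk = 0), and that the SIS equations are advanced with a simple forward-Euler step as a stand-in for a finite-difference integrator. The class name, method signature, mixing matrix, and parameter values are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: SIS dynamics on a region graph with nearest-neighbor mixing,
 * advanced with a forward-Euler (finite-difference) step.
 */
public class SpatialSis {

    /**
     * One Euler step of size dt. m[j][k] is the mixing rate between regions j and k, with
     * m[j][j] = 1 and m[j][k] = 0 for regions that do not share a border. Updates s and i in place.
     */
    static void step(double[] s, double[] i, double[][] m, double beta, double gamma, double dt) {
        int regions = s.length;
        double[] ds = new double[regions];
        double[] di = new double[regions];
        for (int j = 0; j < regions; j++) {
            double mixedInfectious = 0.0; // sum_k m_jk * I_k
            double mixedPopulation = 0.0; // sum_k m_jk * N_k, with N_k = S_k + I_k in an SIS model
            for (int k = 0; k < regions; k++) {
                mixedInfectious += m[j][k] * i[k];
                mixedPopulation += m[j][k] * (s[k] + i[k]);
            }
            double incidence = beta * s[j] * mixedInfectious / mixedPopulation;
            double recovery = gamma * i[j];
            ds[j] = -incidence + recovery;
            di[j] = incidence - recovery;
        }
        // All derivatives are computed before any update, so every region uses the same time level.
        for (int j = 0; j < regions; j++) {
            s[j] += dt * ds[j];
            i[j] += dt * di[j];
        }
    }

    public static void main(String[] args) {
        // Three regions in a line: 0 borders 1, and 1 borders 2. Values are illustrative only.
        double[][] m = {
            {1.0, 0.1, 0.0},
            {0.1, 1.0, 0.1},
            {0.0, 0.1, 1.0}
        };
        double[] s = {990.0, 1000.0, 1000.0};
        double[] i = {10.0, 0.0, 0.0}; // infection seeded in region 0 only
        for (int t = 0; t < 200; t++) step(s, i, m, 0.5, 0.2, 0.1);
        System.out.println("I = " + Arrays.toString(i));
    }
}
```

Because each region's incidence is added to I_j and subtracted from S_j, the population S_j + I_j of every region is conserved. For a single isolated region, setting dI_j/dt = 0 gives S_j/N_j = γ/β, so when β > γ the solution settles at the endemic level I_j = N_j(1 - γ/β).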

4. STEM DESIGN AND ARCHITECTURE

STEM is platform independent and is available in versions for Apple, Linux, and Microsoft operating systems. All of its main components (the representational framework, simulation engine, graphical user interface, disease model algorithms, vector capacity and population models, and various data sets) are distributed as independent bundles or plug-ins. Each of these components can be developed, deployed, and used through declarative software extension points, making it possible for users to compose models-on-models, extend old models, and create new ones. STEM provides the basic elements for developing sophisticated simulations of disease spread.
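The idea of composing models on models can be sketched in plain Java, without OSGi. In this sketch, ordinary Java interfaces stand in for STEM's declarative extension points, and we assume, for illustration only, that every model reports a change in state (a "delta") for a time step given the current state, and that a composite model combines its parts by adding their deltas. The types State, Model, PopulationModel, SisModel, and CompositeModel are hypothetical, not STEM classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of pluggable models that each report a "delta" (change in state)
 * for a time step, and a composite model that combines the deltas of its sub-models.
 */
public class ComposableModels {

    /** Illustrative state: counts in two compartments at one node. */
    record State(double susceptible, double infectious) {
        State plus(State d) {
            return new State(susceptible + d.susceptible, infectious + d.infectious);
        }
    }

    /** A model contributes a change in state given the current state and a time step. */
    interface Model {
        State delta(State current, double dt);
    }

    /** Population model: births enter the susceptible pool; deaths leave every compartment. */
    record PopulationModel(double birthRate, double deathRate) implements Model {
        public State delta(State c, double dt) {
            double n = c.susceptible() + c.infectious();
            return new State(dt * (birthRate * n - deathRate * c.susceptible()),
                             dt * (-deathRate * c.infectious()));
        }
    }

    /** Single-location SIS disease model (no mixing between regions). */
    record SisModel(double beta, double gamma) implements Model {
        public State delta(State c, double dt) {
            double n = c.susceptible() + c.infectious();
            double flow = dt * (beta * c.susceptible() * c.infectious() / n - gamma * c.infectious());
            return new State(-flow, flow); // net movement from S to I
        }
    }

    /** A composite model is itself a Model, so composites can contain other composites. */
    static final class CompositeModel implements Model {
        private final List<Model> parts = new ArrayList<>();

        CompositeModel add(Model m) {
            parts.add(m);
            return this;
        }

        public State delta(State c, double dt) {
            State total = new State(0.0, 0.0);
            for (Model m : parts) total = total.plus(m.delta(c, dt)); // every part sees the same state
            return total;
        }
    }

    public static void main(String[] args) {
        Model scenario = new CompositeModel()
            .add(new PopulationModel(0.01, 0.01)) // illustrative rates
            .add(new SisModel(0.5, 0.2));
        State state = new State(990.0, 10.0);
        for (int t = 0; t < 10; t++) state = state.plus(scenario.delta(state, 0.1));
        System.out.println(state);
    }
}
```

Because CompositeModel is itself a Model, a composite can be added to another composite, mirroring the way STEM models can contain other models; with equal birth and death rates, the total population stays constant while the disease model moves individuals between compartments.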

Figure 3. Composing a scenario for avian influenza in STEM. STEM scenarios contain building blocks of reusable components.


STEM represents the world, or any region of the world, as a graph. The nodes define any physical locations and are “decorated” with labels derived from GIS data including, for example, human or livestock populations, land area, climate variables, etc. The graph also contains edges that define relationships between nodes; for example, the fact that two geographic regions are connected by a common border or interstate highway. One pluggable graph of edges defines the global air transportation system. More details on the STEM air transportation model can be found in [12]. Edges and nodes can also be decorated with dynamic labels. For example, a set of labels describes the dynamic disease state of a population and tracks that state on a given node.
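A decorated graph of this kind can be sketched with a few Java collections. In this hypothetical sketch, each node carries a map of numeric labels (static values such as population, or dynamic values such as the number of infectious individuals that a simulation would update), and each edge records the kind of relationship it represents, such as a common border or an air route. The class names, label keys, and region identifiers are illustrative assumptions, not STEM's data model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a "decorated" region graph: nodes are locations carrying labels
 * (static GIS data or dynamic state), and edges relate pairs of nodes.
 */
public class DecoratedGraph {

    /** An undirected relationship between two nodes; kind is e.g. "border" or "air". */
    record Edge(String from, String to, String kind) {}

    static final class Node {
        final String id;
        final Map<String, Double> labels = new HashMap<>(); // e.g. "population", "I"

        Node(String id) { this.id = id; }
    }

    final Map<String, Node> nodes = new HashMap<>();
    final List<Edge> edges = new ArrayList<>();

    /** Returns the node with this id, creating it on first use. */
    Node node(String id) { return nodes.computeIfAbsent(id, Node::new); }

    void connect(String a, String b, String kind) { edges.add(new Edge(a, b, kind)); }

    /** Neighbors of a node over edges of one kind, treating edges as undirected. */
    List<String> neighbors(String id, String kind) {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.kind().equals(kind)) continue;
            if (e.from().equals(id)) out.add(e.to());
            else if (e.to().equals(id)) out.add(e.from());
        }
        return out;
    }

    public static void main(String[] args) {
        DecoratedGraph g = new DecoratedGraph();
        g.node("A").labels.put("population", 1000.0); // static labels (illustrative values)
        g.node("B").labels.put("population", 500.0);
        g.node("C").labels.put("population", 800.0);
        g.connect("A", "B", "border");
        g.connect("B", "C", "border");
        g.connect("A", "C", "air");

        g.node("A").labels.put("I", 10.0); // dynamic label a simulation would update each step
        System.out.println(g.neighbors("B", "border")); // prints [A, C]
        System.out.println(g.neighbors("A", "air"));    // prints [C]
    }
}
```

Storing the edge kind on each edge lets one pluggable set of edges (for example, air routes) be queried separately from another (common borders).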

Decorated graphs can be combined and wrapped inside a STEM model to form a self-contained, reusable unit. One example of a decorator is a model for a disease. STEM models and graphs are composable: models can contain graphs as well as other models. The STEM user interface supports dragging and dropping graphs and models to make up a given scenario. Figure 3 shows an example of how to compose a scenario for avian influenza in STEM. The user starts with two models containing graphs representing regional geographic data for Canada and the United States, as well as graphs for human population data and road networks, and then overlays a model of the North American bird population and bird migration routes. On top of this, an algorithmic disease model such as H5N1 or H3N2 is added, and finally everything is contained inside a scenario. The essential components to model an avian influenza outbreak are then in place.

STEM components are written in Java™. A population or disease model defines the difference, or “delta,” in the state of a population (or of a disease in a population), given a time step and the current state. Population models, analogous to disease models, dynamically update population numbers over time based on specified birth rates, death rates, and user-defined algorithms. Insect population models compute “vector capacity” based on historic or predicted GIS climate and weather data.

5. CONCLUSION

STEM offers many features not described in this paper. These include support for zoonotic models, the ability to set up triggers and actions (e.g., to implement a vaccination program) when a condition is satisfied during a simulation, and a feature for fitting the parameters of a disease to clinical case report data. A powerful graphical editor is also available, giving the user full control over data stored in STEM graphs. New extensions to STEM’s platform are planned, including support for modeling food production and foodborne disease. The STEM wiki at Eclipse (http://wiki.eclipse.org/index.php/STEM) provides complete user documentation, developer tutorials, and links to video tutorials on YouTube in several languages.

6. REFERENCES

[1] OSGi, The Dynamic Module System for Java™. http://www.osgi.org
[2] Kaufman, J., Edlund, S., Douglas, J. Infectious disease modeling: creating a community to respond to biological threats. Statistical Communications in Infectious Diseases, Vol. 1, Issue 1, Article 1. The Berkeley Electronic Press.
[3] Ross, R. 1911. The Prevention of Malaria. 2nd edition. Murray, London.

[4] Edlund, S., Kaufman, J., Lessler, J., Douglas, J., Bromberg, M., Kaufman, Z., Bassal, R., Chodick, G., Marom, R., Shalev, V., Mesika, Y., Ram, R., Leventhal, A. Comparing three basic models for seasonal influenza. Epidemics 3, 3 (Aug. 2011), 135-142. DOI=10.1016/j.epidem.2011.04.002
[5] Edlund, S., Bromberg, M., Chodick, G., Douglas, J., Ford, D., Kaufman, Z., Lessler, J., Marom, R., Mesika, Y., Ram, R., Shalev, V., Kaufman, J. A spatiotemporal model for influenza. HIC 2009, Frontiers of Health Informatics (Canberra, Australia, August 19-21, 2009).
[6] Edlund, S., Davis, M., Pieper, J., Kershenbaum, A., Waraporn, N., Kaufman, J. A global study of malaria climate susceptibility. Epidemics 3 (Boston, MA, November 29-December 2, 2011).
[7] Hu, K., Thoens, C., Edlund, S., Davis, M., Kaufman, J.H. Disease dynamics in three models of dengue fever. In Proceedings of the 2012 Industrial and Systems Engineering Research Conference (Orlando, FL, USA, May 2012).
[8] Hu, K., Thoens, C., Edlund, S., Davis, M., Kaufman, J.H. STEM: A platform for modeling complex zoonotic disease. Industrial and Systems Engineering Research Conference (Orlando, FL, USA, May 21, 2012).
[9] Anderson, R.M., and May, R.M. 1991. Infectious Diseases of Humans: Dynamics and Control. Oxford Science Publications, New York and London.
[10] Cash, J.R., and Karp, A.H. A variable order Runge-Kutta method for initial value problems with rapidly varying right-hand sides. ACM Transactions on Mathematical Software 16 (1990), 201-222. DOI=10.1145/79505.79507
[11] Dormand, J.R., and Prince, P.J. A family of embedded Runge-Kutta formulae. Journal of Computational and Applied Mathematics 6, 1 (1980), 19-26. DOI=10.1016/0771-050X(80)90013-3
[12] Lessler, J., Kaufman, J.H., Ford, D.A., and Douglas, J.V. The cost of simplifying air travel when modeling disease spread. PLoS ONE 4, 2 (Feb. 6, 2009): e4403. DOI=10.1371/journal.pone.0004403
