An Architecture for Social Simulation Models to Support Spatial Planning

Mark Birkin¹, Paul Townend², Andy Turner¹, Belinda Wu¹ and Jie Xu²

¹ School of Geography, University of Leeds, LS2 9JT, UK
² School of Computing, University of Leeds, LS2 9JT, UK
Email address of corresponding author: [email protected]

Abstract. We present an architecture for grid-enabled simulation modeling. The use of grid resources provides access to the substantial data storage and processing power which are necessary to translate such models from computational tools into genuine planning aids. As well as providing access to virtualized compute resources, the architecture allows customized applications to meet the needs of an array of potential user organizations.
1. Background

MoSeS (Modelling and Simulation for e-Social Science) is a research node of the National Centre for e-Social Science (NCeSS). MoSeS aims to use e-Science techniques to develop a national demographic model and simulation of the UK population, specified at the level of individuals and households.

There is an abundance of simulation games relating to people, cities and societies (past, present and future). We ask what the impact would be of transferring such simulations into a real-world environment. Our specific interest is in cities and regions, with the aim of building simulation models of the interactions between individuals, groups or neighbourhoods within large metropolitan areas. Such simulations can form the basis of a wide range of applications in both e-Research and public policy analysis.

Specifically, MoSeS aims to develop scenarios in the domains of health, transport and business. For example, one health scenario would provide perspectives on medical and social care within local communities for a dynamic and ageing population. A scenario in the transport domain might concern the sustainability of transport networks in response to demographic change and economic restructuring: for example, what kind of transport network is capable of sustaining long-term economic growth in West Yorkshire, Greater Manchester and the intervening areas (the ‘Northern Way’)? A scenario in the business domain might address the impact of diurnal population movements on retail location and profitability, or the impact of a changing retirement age on personal wealth and living standards.

The MoSeS project stands to benefit from e-Science technologies in a number of ways. In particular, the simulation model will draw on diverse, virtualised data sources; it will deploy models which are richly specified and therefore computationally intensive; and it will provide outputs to a spatially distributed community of researchers and policy-makers.
In this paper, we describe our experience in the design and implementation of an architecture to support Moses. In Section 2, we present the Moses architecture and describe its constituent features. A potential use case for simulation modelling in spatial planning is reviewed in Section 3. In the final section of the paper, we discuss our experience to date and our plans for the future.
2. Architecture
The architecture for a grid-enabled spatial decision support system has evolved over a period of time (e.g. AHM 2003, eSS 2005) into the relatively stable configuration shown in Figure 1. The main user interface is accessed through a JSR168 portlet container, usually referred to simply as the ‘Moses portal’. The portal accesses processing resources through the UK National Grid Service, and also interacts with a Storage Resource Broker cluster which maintains a simulation data archive. The Moses technologies shown at the top of the diagram interact with a broader domain of distributed resources, in particular those providing social, demographic and map data.

Figure 1. Moses Architecture (components shown: User; JSR-168 compliant portlet container with authentication and authorisation; Moses Selection, Analysis, Charting, Mapping and Archiving portlets; Moses demographic module, virtual population, forecasting module and forecasting store; Moses archive store; SRB authentication and authorisation; Storage Resource Broker cluster; HPC cluster / Grid; and, via the Internet, Google Maps, EDINA and CASWeb data resources).
The Moses portal has been implemented using GridSphere (Novotny et al, 2004). The two technologies which we evaluated in reasonable detail were GridSphere and Sakai. Of the two, Sakai is probably better supported within the UK e-Science and e-Social Science community, with particular expertise in the Lancaster and Daresbury research node of the National Centre for e-Social Science. However, at the time of our evaluation, Sakai was not JSR168 compliant and therefore could not guarantee transferability between devices and hosting environments. We were keen to maintain maximum flexibility, for example with the option of providing decision support tools on mobile devices. We also considered the ‘iframes’ design templates within Sakai to be somewhat restrictive, and preferred the greater freedom offered by the XSLT-based XML support within GridSphere. For these reasons, GridSphere was selected ahead of Sakai as the Moses technology framework.¹

¹ Note that Sakai has recently become JSR168 compliant (Summer 2007). In practice, this should mean that the Moses portlets are fully reusable within Sakai.

The current implementation of Moses combines portlets for selection, analysis, charting, mapping and scenarios. We conceive of a portlet as a pluggable user interface component which is managed and displayed within a container or web portal. Each portlet generates fragments of markup code, usually HTML, which are aggregated into a portal page. The Java Portlet Specification (JSR168) defines a contract between the portlet container and the portlets, and provides a convenient programming model for the creation of portlets.

The selection portlet (see Figure 2) allows the user to choose the city or region of the UK which is to be examined, and the level of geographical detail which is required: from relatively aggregate ward-level geographies, through mid-level super output areas, right down to very detailed output areas.² The selection of a subset of areas within a defined region is also supported by this portlet. The analysis portlet is essentially a processing engine: it takes the detailed model simulation results, comprising lists of individual and household characteristics together with activity, attitudinal and behavioural data, and aggregates them into geographically referenced indicators (for example, levels of car ownership within Leeds wards). The charting portlet allows users to represent and explore various patterns within the simulation results. For example, a chart of ‘age’ against ‘diabetes’ would show variations in the incidence of diabetes between different age groups. Charts can also be used to represent variations in an indicator, such as diabetes, over time, or to observe spatial variations between different areas. Output formats include line graphs, pie charts and bar graphs.
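The aggregation performed by the analysis portlet and the cross-tabulation drawn by the charting portlet can both be read as a group-by over individual records. The sketch below is a minimal illustration, assuming each simulated person is a small record carrying an output-area code, an age and two yes/no attributes; the record layout, the output-area-to-ward lookup and all names are hypothetical, not the Moses data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, List


@dataclass(frozen=True)
class Person:
    """Hypothetical simulated individual (not the actual Moses record layout)."""
    output_area: str      # finest census geography, e.g. an output area code
    age: int
    car_household: bool   # lives in a car-owning household
    diabetic: bool


def rate_by(people: Iterable[Person],
            key: Callable[[Person], Hashable],
            flag: Callable[[Person], bool]) -> Dict[Hashable, float]:
    """Share of people for whom `flag` is true, within each group given by `key`."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [flagged, total]
    for p in people:
        c = counts[key(p)]
        c[0] += int(flag(p))
        c[1] += 1
    return {g: hits / total for g, (hits, total) in counts.items()}


# Hypothetical output-area-to-ward lookup; the geography chosen in the
# selection portlet determines which lookup is applied.
OA_TO_WARD = {"OA1": "W1", "OA2": "W1", "OA3": "W2"}

people = [
    Person("OA1", 23, False, False),
    Person("OA2", 67, True, True),
    Person("OA2", 71, True, False),
    Person("OA3", 45, True, False),
]

# Analysis portlet: a geographically referenced indicator (car ownership by ward).
car_by_ward = rate_by(people, lambda p: OA_TO_WARD[p.output_area], lambda p: p.car_household)

# Charting portlet: incidence of diabetes by ten-year age band.
diabetes_by_age = rate_by(people, lambda p: 10 * (p.age // 10), lambda p: p.diabetic)
```

In this sketch, moving from wards to super output areas changes only the `key` function, which is one way a single processing engine can serve every geography offered by the selection portlet.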
Figure 2. The Moses Selection Portlet. Source: 2001 Census, Output Area Boundaries. Crown copyright 2003.

Aggregations from the analysis portlet can also be displayed cartographically within the mapping portlet, which is often the most convenient way to display substantial quantities of information about urban and regional populations. In essence, the maps are simple thematic or choropleth maps showing variations in the intensity of the indicator under study. At present, Moses supports two rather different approaches to the presentation of map data. First, we can present spatial data against the backdrop of a Google Maps image. This allows users to make real inferences about map distributions in relation to the underlying local geographies, for example that a specific area has high levels of crime or deprivation. This approach draws on development work on the ‘Google Map Creator’ which has been undertaken by the GeoVue team, also within the NCeSS network (UCL, 2007). Sample output is shown in Figure 3. An alternative approach is to link directly to national vector datasets from the Ordnance Survey and elsewhere, which are curated through Digimap at the University of Edinburgh. Moses is collaborating with developers at Edinburgh, Manchester and elsewhere in the SeeGeo project (Secure Access to Geospatial Services) in the production and testing of grid-enabled mapping technologies.

² Census wards are primarily an administrative geography. There are about 11,000 wards in the UK, with an average population of the order of 2,000 households. Super output areas are a convenient means of presenting highly disaggregated statistics at a neighbourhood scale. There are more than 200,000 output areas, with an average population of the order of 100 households (see Rees, Martin and Williamson, 2002).
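Whichever background is used, a choropleth map requires each area's indicator value to be assigned to a shading class. The sketch below uses quantile breaks, a common choice for choropleth maps; the classification method used by Moses is not specified here, so the quantile scheme, the five-class default, the colour ramp and the area codes are all assumptions.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def quantile_breaks(values: Sequence[float], n_classes: int = 5) -> List[float]:
    """Class boundaries: a value v falls in class i when breaks[i-1] <= v < breaks[i]."""
    ordered = sorted(values)
    n = len(ordered)
    # n_classes classes need n_classes - 1 interior boundaries.
    return [ordered[(i * n) // n_classes] for i in range(1, n_classes)]


def classify(indicator: Dict[str, float], n_classes: int = 5) -> Dict[str, int]:
    """Map each area code to a class index 0..n_classes-1 (0 = lowest values)."""
    breaks = quantile_breaks(list(indicator.values()), n_classes)
    return {area: bisect_right(breaks, v) for area, v in indicator.items()}


# Hypothetical sequential colour ramp, light to dark, one colour per class.
RAMP = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]

# Hypothetical indicator: share of households with limiting long-term illness.
llti = {"A1": 0.12, "A2": 0.31, "A3": 0.18, "A4": 0.25, "A5": 0.09,
        "A6": 0.22, "A7": 0.15, "A8": 0.28, "A9": 0.20, "A10": 0.35}

classes = classify(llti)
fills = {area: RAMP[c] for area, c in classes.items()}
```

Quantile classes put roughly the same number of areas in each shade, which keeps fine geographies such as output areas readable; equal-interval or natural-break schemes would be equally valid alternatives.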
Figure 3. The Moses Mapping Portlet. Source: 2001 Census: Standard Area Statistics (England and Wales); 2001 Census, Output Area Boundaries. Crown copyright 2003.

The advantage of the Google Maps route is that the background environment is increasingly familiar to a wide variety of users, and the maps appear to be available free of licensing or copyright restrictions. On the other hand, the quality and continued supply of the maps cannot be guaranteed, and there are currently no facilities for full grid integration of Google Maps: in effect, Google Map Creator is a web service which cuts a Google image of the local area and pastes it under a thematic map of the developer’s specification. The advantage of the SeeGeo route is that it potentially provides access to a wide range of high-quality spatial datasets using fully configurable grid services. Other data providers, notably MIMAS (census data services), are also collaborators within SeeGeo, which provides seamless access to multiple datasets in the bottom layer of the Moses architecture in Figure 1. Finally, both SeeGeo and Moses are linked through an interest in the Open Grid Consortium, which makes this route compatible with our commitment to complete flexibility and transferability within the portlet technologies. On the downside, the datasets within Digimap are strictly controlled under copyright. This means that access portlets need to support effective security and authentication procedures, and that users could be restricted by the need to obtain appropriate permissions for access to the data (not usually an issue for academics, but possibly a concern for commercial or public sector users). The SeeGeo project is also ‘further from the market’ than Google Map Creator: even though the final product may be at least equivalent in quality to Google Maps, significant investment in the service will be required to reach and maintain this level.
The choice of mapping technology is therefore a relatively complex question, and we are content to keep both options open for Moses at present.
The scenario portlet is designed to support ‘what if?’ analyses of social evolution or of specific policies, planning decisions and strategies. This portlet is therefore at the heart of the policy value of Moses. Scenario analysis ranges from the projection of existing patterns and social processes into the future, through relatively local analysis of the impacts of a specific planning decision, such as a decision to merge two hospitals, to the impact of national social interventions, such as changes in income tax or retirement ages, and of regional or national planning strategies, such as brownfield versus greenfield development.

Scenarios are supported by detailed and complex model-based simulations of social and demographic structures. These have been discussed in reasonable technical detail elsewhere (Birkin et al, 2006; Wu, Birkin and Rees, 2007), and a full specification is neither necessary nor appropriate here. A brief overview is given here in order to substantiate two important observations about the architecture. As noted earlier, Moses is a national-level, individual- and household-based simulation. It represents 60 million synthetic individuals, grouped into 24 million households across the UK. In the current version of Moses, we represent individual and household characteristics from the UK Census Sample of Anonymised Records, plus limited health information extracted from the British Household Panel Survey. In total, this provides 60 attributes for each individual, and another 20 for the households into which they are grouped. Work is also well advanced on a method for the synthetic linkage of data extracts, which would allow hundreds of other attributes to be estimated with acceptable confidence. The base population is therefore a very rich, detailed and, above all, large dataset!
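The base population described above, together with the annual cycle of transitions it undergoes, can be sketched as a toy dynamic microsimulation. This is a minimal illustration under stated assumptions: the class layout, the two transitions (death and birth) and their age-only probabilities are hypothetical placeholders, whereas the real model carries around 60 individual and 20 household attributes, many more processes, and rates estimated from demographic data.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Individual:
    pid: int
    age: int
    sex: str            # "F" or "M"
    alive: bool = True


@dataclass
class Household:
    hid: int
    area: str           # census geography code, e.g. an output area
    members: List[Individual] = field(default_factory=list)


# Hypothetical annual probabilities conditioned on age and sex only;
# the real model conditions each transition on many attributes.
def p_death(ind: Individual) -> float:
    return 0.002 if ind.age < 65 else 0.04


def p_birth(ind: Individual) -> float:
    return 0.06 if ind.sex == "F" and 20 <= ind.age < 40 else 0.0


def step_one_year(households: List[Household], rng: random.Random, next_pid: int) -> int:
    """Apply one annual cycle of transitions; return the next unused person id."""
    for hh in households:
        newborns: List[Individual] = []
        for ind in hh.members:
            if not ind.alive:
                continue
            if rng.random() < p_death(ind):      # mortality transition
                ind.alive = False
                continue
            if rng.random() < p_birth(ind):      # fertility transition
                newborns.append(Individual(next_pid, 0, rng.choice("FM")))
                next_pid += 1
            ind.age += 1                         # survivors age by one year
        hh.members.extend(newborns)              # babies join the mother's household
    return next_pid


def project(households: List[Household], years: int = 25, seed: int = 1) -> List[Household]:
    """Project the population forward on an annual cycle (modifies it in place)."""
    rng = random.Random(seed)
    next_pid = 1 + max(i.pid for hh in households for i in hh.members)
    for _ in range(years):
        next_pid = step_one_year(households, rng, next_pid)
    return households


population = [
    Household(1, "OA1", [Individual(1, 30, "F"), Individual(2, 32, "M")]),
    Household(2, "OA2", [Individual(3, 70, "F")]),
]
project(population, years=25, seed=1)
```

For scale: at full size, one annual snapshot holds roughly 60 million × 60 + 24 million × 20 ≈ 4.1 billion attribute values, and a loop like the one above must visit every individual in each of the 25 years, which is why the computation is handed to cluster and grid resources rather than run inside the portal.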
We then address the genuinely hard challenge of projecting this base population forward on an annual cycle for 25 years. In each annual cycle, every member of the population passes through a series of transition processes. The current implementation focuses on the main demographic processes: births, deaths, migration, household formation, marriage, divorce and illness. We use the best available demographic datasets to build realistic representations of these transitions in relation to their underlying causal factors; for example, migration is most strongly determined by factors such as age or life-cycle stage, household size, marital status, housing type and ethnicity. Each annual iteration of the model therefore involves each of millions of individuals in dozens of potential transitions or decision-making processes. The resulting computational task is daunting: we estimate that a single model scenario generates something like 3 terabytes of data and requires of the order of 900 giga floating-point operations.³

The first important point arising from this discussion is that the simulation model has been completely decoupled from the decision-support portal within the Moses architecture. In other words, we construct scenarios and feed the results into the portal for users to examine from many angles and perspectives. What we do not allow is for users to generate scenarios directly, for example in relation to the latest policy option or planning proposal. We have taken this decision for two reasons, both important but rather different. Firstly, as discussed above, each scenario involves substantial computation. In principle, computational grids provide a means of executing such simulations in real time (and experience with users indicates that ‘real time’ is really rather short: models which take more than about a minute to execute soon lose popularity with regular users).
In practice, we still seem to be a fair way from having unrestricted, cheap access to grid resources at this scale. The second reason is a concern with the technical ability of users to specify, execute, annotate and archive complex spatial scenarios. The training and system design implications of trying to support such an activity are daunting.

³ The good news is that a substantial portion of this data is ‘temporary’ and does not need to be stored in the long term. For example, within a single model iteration we may need to generate a file of ‘migrants’ so that they can be assigned to new dwelling units by the model, but once the assignment is complete, the migrant file is no longer needed. The bad news is that we have not yet considered the many activities, behaviours and other processes beyond the purely demographic, which would increase the complexity of the simulation by several orders of magnitude.

The second important point, arising from the first, is that both the model and the portal present significant e-Science challenges in their design and execution, although the challenges are somewhat different in the two components. The modelling requires high-performance computation, and most of this computation is probably associated with experimental work in model construction and testing. The model is a piece of social research: it is built in accordance with certain design principles, but it is not a piece of software engineering implementing a robust specification. Rather, the specification is part of a learning process about the system of interest. The challenges associated with the portal, on the other hand, tend to centre on questions of usability, reusability, security and collaboration. In short, within the Moses project at least, most of the exciting challenges for computer science lie in the portal architecture, while most of the exciting issues for social science lie in the model design and deployment. We observe a certain tension at the interdisciplinary boundary between social science and computer science in this problem domain.

An additional piece of functionality which is still under development is an archiving portlet. As we have seen, the creation of scenarios is potentially time-consuming, so a means is needed to access these scenarios for interrogation. At present, of course, the scenarios are all pre-created within the existing Moses applications.
If users are allowed to configure and execute their own scenarios, it becomes even more important to store the results, whether for later examination, for sharing with other users, or for modification and the evaluation of alternatives. However, as explained above, the simulation results are also extremely voluminous. For this reason we have deployed the Storage Resource Broker (Rajasekar et al, 2003) as a pillar for data storage within the Moses architecture. The SRB allows data to be stored across multiple physical nodes within a high-performance computational network, but to be accessed as a virtualised resource which is independent of both the software and data platforms.

In view of the computational demands of the simulations, access to high-performance processing capability is also a key feature of the Moses architecture. In the current application, a Beowulf cluster with 64 parallel nodes is deployed, but the application is scalable to the White Rose Grid (WRG) and other resources within the UK National Grid Service (NGS). The limitations of the Beowulf cluster, both practical and conceptual, are obvious. First, it lacks the power necessary for our purpose: existing applications still take several days to execute, and although it may be possible to increase the efficiency of these implementations by several orders of magnitude, the resource would still not be scalable, for example to support multiple users each with their own requirements for scenario development. Second, access to the resource is limited by competition with other users, although in fairness this is equally true of other WRG and NGS resources. Finally, the Beowulf is not future-proofed: we have no idea whether it will still be available in 12 months, let alone in five or ten years' time.

Security of access is another big issue which requires proper consideration within the Moses architecture, and it is a question of generic importance to spatial decision support systems of this type.
Users may wish to create scenarios using data which is restricted or confidential, and perhaps to share scenarios with collaborators in a specific part of the (virtual) organisation, or with colleagues at a particular level of seniority. Shared access for users from multiple organisations is certainly unlikely to be without friction. However, security is also an immediate practical concern for a number of reasons. First, Moses provides access to distributed data and mapping resources which are themselves under restricted access: usage of CASWeb and Digimap is controlled through strict licensing and regulation arrangements. Second, we are already beginning to negotiate data sharing arrangements with individual users, such as Leeds Teaching Hospitals Trust and Leeds City Social Services, for the provision of confidential patient data. Such arrangements will be subject to ethical and legal commitments regarding the use of the data. Third, in order to fully exploit distributed computational resources, we need a way of transferring bundles of data and executable software in a secure and encrypted fashion, to protect the intellectual property of the developers as well as the data of the users. For example, we would consider deploying a resource such as CROWN (China Research and Development environment Over Wide-area Network) for processing, but only with appropriate guarantees about data and software integrity.

At present, security is regulated in a somewhat pragmatic and ad hoc fashion. Standard authentication mechanisms within GridSphere are used to register users. The individual portlets are also configured in a role-based manner, with users being manually assigned different roles when their accounts are created. The Moses administrators take explicit responsibility for ensuring that all users hold the registrations required to access the data and mapping resources. Data within the SRB cluster is controlled through further SRB authentication procedures, which again involve manually granting access rights to individuals when user accounts are created.
We are currently testing a Shibboleth-based security mechanism for the Moses portal, which will allow registered users to gain automatic access to census and map data without placing the burden of validation on the Moses administrators.
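The role-based configuration described above, in which administrators assign roles to users by hand and each portlet or data collection admits only certain roles, can be sketched as a deny-by-default access check. This is an illustrative sketch only: the class, role names, user names and resource names are hypothetical, and it does not use the GridSphere, SRB or Shibboleth APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class AccessPolicy:
    """Hypothetical role-based policy of the kind currently configured by hand."""
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    resource_roles: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def grant(self, user: str, role: str) -> None:
        # An administrator assigns a role when creating or updating an account.
        self.user_roles.setdefault(user, set()).add(role)

    def protect(self, resource: str, *roles: str) -> None:
        # A portlet or data collection admits users holding any of these roles.
        self.resource_roles[resource] = frozenset(roles)

    def can_access(self, user: str, resource: str) -> bool:
        required = self.resource_roles.get(resource)
        if required is None:
            return False          # deny by default for unregistered resources
        return bool(self.user_roles.get(user, set()) & required)


policy = AccessPolicy()
policy.protect("mapping_portlet", "registered", "admin")
policy.protect("digimap_layers", "digimap_licensed")   # licensed map data
policy.protect("patient_data", "nhs_partner")          # confidential records

policy.grant("alice", "registered")
policy.grant("alice", "digimap_licensed")
policy.grant("bob", "registered")
```

Under a federated scheme such as Shibboleth, the role set would instead be derived from attributes asserted by the user's home institution, which is what would remove the manual `grant` step from the administrators.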
3. Use Cases for Moses
In the introduction, we explained that Moses is applicable to many substantive problem domains, and that three more specific demonstrations are sought within our project plan. In this section, we report on progress in one of those domains, which is concerned with the health care and social service requirements of the Leeds population. Through discussion with users in Leeds Teaching Hospitals Trust and Leeds City Council Social Services, we have identified a number of strategic issues to which Moses is applicable. A particular concern is the health care demands of an increasingly elderly population. Specific questions and issues are outlined in Table I. Some of the questions are reasonably straightforward, but others play directly to the strengths of the Moses modelling approach. For example, co-dependency of the elderly population within households, and the growth of ethnic minority populations, can both be projected with a high level of confidence from the individual-based modelling. The individual-based outputs can also be aggregated flexibly to construct indices of need or service uptake (physical disability, sensory impairment etc.) by combining estimates from survey data, such as the Health Survey for England or the BHPS, or perhaps using data about existing social services users in the Leeds area. A third class of results requires the integration of behavioural models of service utilisation, which can only be captured by incorporating third-party data on both the provision and the uptake of services: for example, where are the existing day centres, and from where do they draw their users? Once such models have been estimated, however, the evaluation of alternative delivery options or future requirements becomes much more feasible.
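One way to construct such an index of need is to apply age- and sex-specific prevalence rates, estimated from survey data, to each simulated individual and to sum the resulting expected counts by area. The sketch below assumes such a rate table exists; the rates, age bands and area codes are invented placeholders, not figures from the Health Survey for England or the BHPS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical prevalence of sensory impairment by (age band, sex).
# Real values would be estimated from survey data; these are placeholders.
PREVALENCE: Dict[Tuple[str, str], float] = {
    ("65-74", "F"): 0.10, ("65-74", "M"): 0.12,
    ("75-84", "F"): 0.20, ("75-84", "M"): 0.22,
    ("85+", "F"): 0.40,   ("85+", "M"): 0.38,
}


def age_band(age: int) -> str:
    if age >= 85:
        return "85+"
    if age >= 75:
        return "75-84"
    if age >= 65:
        return "65-74"
    return "under 65"


def expected_need(people: Iterable[Tuple[str, int, str]]) -> Dict[str, float]:
    """Expected number of people with the condition in each area.

    `people` yields (area, age, sex); individuals outside the tabulated
    groups contribute zero.
    """
    totals: Dict[str, float] = defaultdict(float)
    for area, age, sex in people:
        totals[area] += PREVALENCE.get((age_band(age), sex), 0.0)
    return dict(totals)


simulated = [("W1", 70, "F"), ("W1", 80, "M"), ("W1", 40, "F"),
             ("W2", 90, "F"), ("W2", 67, "M")]
need = expected_need(simulated)
```

Because the population is individual-based, applying the same function to a later simulated year would give a projection of need consistent with the model's demographic scenario.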
Target population: Older People (OP)

Prevalence / incidence data:
- Limiting long-term illness (LLTI)
- Physical disability; sensory impairment
- Cardiovascular disease (e.g. stroke / heart attack)
- Ethnicity

Examples of data sources:
- Census 2001, including Theme Table 06
- DoH Health Survey for England 2000 and 2004
- Public Health Observatories
- Ageing: Scientific Aspects (House of Lords)
- Policy Research Institute on Ageing & Ethnicity

Examples of questions to consider:
- Can self-reported limiting long-term illness be validated against other data sources?
- Does the profile for LLTI correlate with other data, e.g. morbidity data?
- Which wards / localities have the most OP with LLTI?
- Is there a correlation with the distribution of services, e.g. home care, equipment and adaptations, hospital admission data?
- Can a small-area analysis validate the city-wide picture?
- Is there any correlation between the current distribution of services and proximity to District centres (e.g. for shopping) and health care facilities?
- Is there any significance in the age and geographical distribution of people who attend day centres for older people?
- What is the current and likely future level of co-dependency among older couples?
- What is the likely impact on service delivery of the projected growth in the ethnic minority population (both in numbers and in the age profile of the projected increases)?

Table I. A ‘needs’ analysis for social services (Source: Leeds City Council)
In order to develop a Moses application for social services for the elderly, our general approach is to build a customised ‘instance’ of the technology for this particular domain. In essence, the argument is that the Moses portlets provide the functionality required by users, but must be configured to meet their specific needs. The requirements for customisation are primarily two-fold. First, in relation to the modelling, we need to estimate activities and behaviours which are specific to this service domain; for example, we use specific social and demographic characteristics which are already embedded in the Moses core model to assess sensory and physical disabilities. Second, in relation to the portal, we need to package the model outputs in a format which is suitable for this group of users. To a degree, this is just a configuration issue: highlighting or prioritising the variables which are of specific interest to this community. We also believe that this level of customisation can be achieved through the cross-integration of the existing portlet technologies into a reporting function, so that tables, charts, maps and analysis can be combined, probably across different scenarios and time periods, into a series of summary reports which meet these particular needs. Current work on this demonstrator is therefore concentrating on a) the extension of models into a social services domain; b) development of the portal to include a reporting capability; and c) specific design work to outline the reporting requirements within this domain.

We believe that by decomposing a spatial decision support system into individual components (the portlets), users will be able to combine these elements flexibly in order to solve different kinds of problems. In other words, the Moses portal supports a variety of different workflows. In considering the nature of the components and their integration, we regard the following activities within the system as the most important:

1. Diagnostic activity – users assess variations in a target indicator to identify key differences between local areas or demographic groups, with a view to targeting investment or other resources to provide greater equity in service utilisation.
2. Forecasting activity – indicators are assessed across a period of time to identify key trends which may require policy intervention.
3. Impact activity – having formed an idea of the potential policy response to a problem, users may wish to simulate the impact and effectiveness of such an intervention (‘what if?’).
4. Optimisation activity – users may wish to evaluate a variety of policy interventions or ‘what if?’ scenarios in order to identify the best available response.
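The four activities can be read as different compositions of the same building blocks applied to pre-computed scenario results. The sketch below is a hypothetical illustration: it assumes results are already aggregated to one indicator value per area and year, and that a lower value means less need; the scenario names, areas, years and figures are invented for the example rather than taken from Moses.

```python
from typing import Dict, List

# Hypothetical pre-computed results: scenario -> year -> area -> indicator value.
Results = Dict[str, Dict[int, Dict[str, float]]]

RESULTS: Results = {
    "baseline":       {2001: {"W1": 0.20, "W2": 0.30}, 2026: {"W1": 0.26, "W2": 0.41}},
    "new_day_centre": {2001: {"W1": 0.20, "W2": 0.30}, 2026: {"W1": 0.25, "W2": 0.33}},
}


def diagnose(results: Results, scenario: str, year: int, threshold: float) -> List[str]:
    """Diagnostic: areas whose indicator exceeds a target threshold."""
    values = results[scenario][year]
    return sorted(a for a, v in values.items() if v > threshold)


def forecast_change(results: Results, scenario: str, start: int, end: int) -> Dict[str, float]:
    """Forecasting: change in the indicator for each area between two years."""
    s, e = results[scenario][start], results[scenario][end]
    return {a: e[a] - s[a] for a in s}


def impact(results: Results, policy: str, year: int, base: str = "baseline") -> Dict[str, float]:
    """Impact ('what if?'): difference between a policy scenario and the baseline."""
    p, b = results[policy][year], results[base][year]
    return {a: p[a] - b[a] for a in b}


def optimise(results: Results, year: int) -> str:
    """Optimisation: scenario with the lowest total indicator (assumed: lower is better)."""
    return min(results, key=lambda s: sum(results[s][year].values()))
```

Each function reads the same results structure, so in this sketch a reporting function could chain them (for example, diagnose first, then compare the impact of candidate interventions) without any of them re-running the simulation.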
To date, these requirements have been assessed and catered for in a relatively intuitive way. We do not rule out the possibility that future discussions with users, backed up by a more formal analysis of workflows, might yield significant enhancements to the Moses project. We do believe, however, that in our domain such conversations are most productive within the context of an existing demonstration technology. Users cannot be expected to articulate the requirements of complex technologies without being given some frame of reference for what is possible and what might be on offer.
4. Discussion
We have described an architecture for a grid-enabled spatial decision support system which combines a series of functional portlets into a user-oriented application portal, accessible through a thin web interface. The Moses portal has been constructed using GridSphere, which we consider offers the maximum flexibility in the creation of transferable, platform-independent applications.

Moses draws on a variety of e-infrastructure components. External data services for the provision of demographic and map data are combined with locally managed simulation routines for scenario generation. The use of third-party mapping services is a particular conundrum, given the availability of two solutions: one (Google Maps) is light and elegant but may not be generic or sustainable; the other (SeeGeo) is perhaps more cumbersome and requires investment, but is probably more robust.

We have found it useful to separate the production of simulation outputs from their analysis. The simulation process is data-rich and compute-intensive, requires a detailed understanding of the underlying model architectures, and generates substantial results archives. We consider that this process is best kept under the control of the Moses custodians.

Distributed computation and virtualised storage are both valuable resources for social simulation. The Storage Resource Broker has been deployed to maintain the Moses data archives. Computational options include a local Beowulf cluster, the White Rose Grid, the National Grid Service and the CROWN Grid. We are currently seeking to encapsulate a series of functional but messy security and authentication procedures into a robust framework enabled by Shibboleth. Greater security would allow access to a wider variety of distributed resources without raising ethical or legal concerns.

Not the least benefit of this architecture is that the complexities of the engineering are entirely shielded from the users. The individual portlets provide flexible building blocks from which customized applications can be tailored to the needs of specific user organizations. For example, transport engineers, town planners or health policy-makers might all use distinct versions of Moses which share a common architecture and components.

Current priorities for the further development of Moses are the robust implementation of the archiving and security features discussed in this paper, and the development of full use cases in the health, business and transport domains. Enhancing the demographic simulation methods to incorporate a wider range of behaviours and activities is also a priority.
Acknowledgements

This research has been funded by ESRC through the National Centre for e-Social Science (Award Reference RES-149-25-0034). Census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen's Printer for Scotland.
References

Birkin, M., Clarke, M., Chen, H., Dew, P., Keen, J., Rees, P. and Xu, J. (2005) MoSeS: Modelling and Simulation for e-Social Science. Proceedings of the First International Conference on e-Social Science, National Centre for e-Social Science, Manchester.

Birkin, M., Turner, A. and Wu, B. (2006) A Synthetic Demographic Model of the UK Population: Methods, Progress and Problems. Proceedings of the Second International Conference on e-Social Science, National Centre for e-Social Science, Manchester.

Novotny, J., Russell, M. and Wehrens, O. (2004) GridSphere: a portal framework for building collaborations. Concurrency and Computation: Practice and Experience, 16(5), March 2004.

Rajasekar, A., Wan, M., Moore, R., Schroeder, W., Kremenek, G., Jagatheesan, A., Cowart, C., Zhu, B., Chen, S.-Y. and Olschanowsky, R. (2003) Storage Resource Broker: Managing Distributed Data in a Grid. Computer Society of India Journal, Special Issue on SAN, 33(4), 42–54, October 2003.

Rees, P., Martin, D. and Williamson, P. (2002) The Census Data System. Wiley, Chichester.

UCL (2007) Google Map Creator. UCL Centre for Advanced Spatial Analysis, http://www.casa.ucl.ac.uk/software/googlemapcreator.asp

Wu, B., Birkin, M. and Rees, P. (2007) A Spatial Microsimulation Model with an ABM Insight. RGS/IBG Annual Meeting, London, August.