Geographical Analysis (2013) 45, 49–76

Using Simulated Data to Examine the Determinants of Acute Hospital Demand at the Small Area Level

Karyn Morrissey¹, Cathal O’Donoghue², Graham Clarke³, Jinjing Li⁴

¹People, Space and Place, School of Environmental Sciences, University of Liverpool, Liverpool, UK; ²Rural Economic Development Programme, Teagasc, Athenry, Co. Galway, Ireland; ³School of Geography, University of Leeds, Leeds, UK; ⁴The National Centre for Social and Economic Modeling, University of Canberra, Canberra, Australia

The aim of this article is to establish whether spatial variation exists in acute hospital utilization in Ireland and, if it does, to identify the microlevel factors influencing this variation. First, an alignment process is used to calibrate the acute inpatient attendance and nights-spent-in-hospital variables produced by a spatial microsimulation model at both the national and the subnational levels. Comparing the results of the national and subnational alignment allows us to examine whether spatial variation exists. Second, after establishing that hospital utilization displays a significant spatial pattern, we use a nationally representative survey to determine which individual-level factors significantly affect inpatient attendance and the number of nights spent in hospital. Using the calibrated data from the spatial microsimulation model, we then examine whether the spatial patterns of the variables found to influence hospital utilization match the spatial pattern of actual hospital utilization rates at the small area (electoral division) level. That is, are the individuals and areas with the highest demand for acute hospital services actually using them? Finally, the results of this research are discussed in relation to both the national and international literature.

Correspondence: Karyn Morrissey, People, Space and Place, School of Environmental Sciences, Roxby Building, University of Liverpool, Liverpool L69 7ZT, UK e-mail: [email protected]

Submitted: June 29, 2010. Revised version accepted: March 27, 2012. doi: 10.1111/gean.12000 © 2012 The Ohio State University

Introduction

To represent completely the health status of a country’s population, a wide variety of demographic, socioeconomic, health, environmental, and locational data are required at the individual, microlevel. Establishing the relationship between individual determinants of health and health status is integral to improving outcomes in this area. Furthermore, understanding the underlying influences that affect health outcomes allows public policy to target health service provision in a more effective and efficient manner. Quantitative models help to understand the pathways and determinants of health outcomes by attempting to capture and quantify the effects of individual health determinants and the interdependencies between these factors (Sassi and Hurst 2008). However, quantitative models of health have their limitations. Since the release of the Black Report (Black 1980), the dominant conceptual framework underlying the analysis of ill health in the social sciences has been the multifactorial model of disease causation (Shim 2002; Williams 2003). This model posits that most illnesses are the result of multiple causes, determinants, and risks involving a complex set of interactions between individuals, the environment, and other factors (Gordis 2000; Shim 2002). Based on this model, social research seeks to identify characteristics that increase an individual’s likelihood of developing a particular disease. Much of the post-Black Report research focuses on occupation or socioeconomic group as an indicator of ill health. Implicit in this approach is the assumption that environmental, social, political, economic, and cultural forces are exogenous to individual health. However, a number of papers (Navarro and Shi 2001; Graham 2002; Williams 2003) point out that while individual characteristics are key to understanding social structures and health, social class is only a partial indicator of health (Shim 2002). Williams (2003) continues the debate by arguing that recent developments in multilevel modeling (MLM), which attempt to separate the contextual and compositional factors that influence health, create yet another false dichotomy. These developments undermine our ability to understand how interacting social influences that are “etched in space” (Soja 1980; Harvey 1989) affect health.
Research on the links between place and health demonstrates that place has an independent effect on health over and above individual or area-based characteristics (Macintyre, Ellaway, and Cummins 2002; Wilson et al. 2009). However, while there is often unequivocal evidence that individual outcomes differ between places, the source of such differences remains far from clear (Dorling et al. 2000). Dorling et al. (2000) investigate the intangible effect of geography and social processes. Analyzing patterns of poverty and mortality in London across a 100-year time span, they found that the geographic patterns remained similar after 100 years, even though the people in the surveys were completely different. Mitchell (2001), commenting on these results, points out that this finding is not due to a contextual effect but rather reflects a geography of poverty in London that transcends the individual population: persistent and stable social and economic processes maintained a spatial distribution of poverty and high mortality in London. Assertions that health is socially constructed across a broad set of parameters (Macintyre 1997; Williams 2003; Windle 2010) require the use of modeling techniques that incorporate all the necessary parameters and social constructs, be they environmental, social, political, economic, or cultural. The key motivating reason for using MLM is to make a formal distinction between individual- and area-based characteristics (Fotheringham and Brunsdon 1999; Duncan and Jones 2000; Mitchell 2001). In contrast, spatial microsimulation as a data-generation technique allows one to model the distribution of individuals across space and thus incorporate intangible (both micro and macro) spatial processes that MLM fails to capture.
Using “geography” as the target component and control parameter, spatial microsimulation is a data-generation process that creates a spatially representative population with appropriate spatial conditional characteristics (e.g., age, sex, education level, socioeconomic group). Moreover, because spatial microsimulation seeks to represent the “true” spatial distribution of individuals, the intangible effects of macrolevel social processes are implicitly maintained at the local level. Over the last decade, spatial microsimulation techniques have been increasingly used to examine health and health inequalities (Morrissey et al. 2008, 2010; Edwards et al. 2009; Smith, Clarke, and Harland 2009). Finally, a variety of techniques from the wider statistical literature may also be used to produce small area estimates. These techniques predict a single target parameter at the small area level. For example, in the case of income, known predictors of income (i.e., labor force participation, education level, gender, and age) are used to predict income levels for each individual in a small area. These methods include synthetic and sample-size-dependent estimators, best linear unbiased predictors, and a variety of Bayesian estimators (Ghosh and Rao 1994; Chaudhuri and Ghosh 2011); all of them borrow “strength” (information) from a standard population, such as a national data set, to increase the effective sample size for each small area of interest. In contrast, spatial microsimulation may be used to estimate several different processes simultaneously if the predictor variables for each of the behaviors are the same (Smith, Pearce, and Harland 2011). Modeling processes rather than single target parameters is particularly advantageous in health studies because many different processes may be modeled using a number of key variables. Furthermore, given the computational costs of small area estimation in terms of resources and time, there is a strong efficiency rationale for developing a model that can be used to examine a number of different policy questions. However, spatial microsimulation, like any other data fusion or statistical matching technique, is prone to breaches of its core assumption of conditional independence (D’Orazio, Di Zio, and Scanu 2006).
This assumption holds that the variables not used in the matching process in both data sets (spatial and micro) are independent of each other, conditional on the matched variables; in other words, the spatial variability of the nonmatched individual characteristics is captured via the spatial variability of the matched variables. We use a method, calibration through alignment, that allows us to correct for breaches of this assumption by permitting exogenous data, either qualitative or quantitative, to be introduced into a simulated data set. This method allows researchers to introduce or link external, noncompositional effects that determine health, such as high levels of intangible social capital in an area.

Data requirements and a SMILE (simulation model of the Irish economy)

As outlined in the introduction, a variety of data are required to estimate health statistics. In terms of data availability, the Living in Ireland (LII) Survey is the Irish component of the European Community Household Panel data set. The LII Survey is a 7-year longitudinal survey that began in 1994 and ended in 2001. The sampling frame used for the LII is the Irish Register of Electors. The SMILE is a static microsimulation model; thus, the longitudinal nature of the LII data set was disregarded when developing it. The SMILE is based on the 2000 LII data set, which contains 13,067 individuals. In addition to information about a variety of individual, demographic, and socioeconomic characteristics, the LII also contains detailed information about individuals’ health status (both physical and mental) and health service utilization in the previous year (visits to a general practitioner [GP], optician, and dentist, as well as the number of nights spent in a hospital). However, like most microlevel data sets, the LII only contains spatial identifiers for each individual at an aggregate level. The LII contains two location variables: a NUTS-3 regional variable (nomenclature of territorial units for statistics; containing eight regions) and a 12-category locational variable, categorized into the five cities in Ireland, Dublin County, an open-countryside group, and five categories for towns of varying sizes. As such, any health analysis using the LII Survey is constrained to the national level. In contrast, the Irish Small Area Population Statistics (SAPS) data set contains a rich set of census information at the small area level, in this case for electoral divisions (EDs, referred to in this article simply as districts). The SAPS data set is created by the Central Statistics Office from the Irish Census of Population. Censuses represent an official count of the population at one point in time. Because the SAPS data set is created by geographically disaggregating the census data by ED, it is the most reliable and robust estimate of the population at the small area level. The SAPS data set may be spatially disaggregated at a number of levels. Of interest here are the regional (NUTS-3), county, and district levels. Table 1 provides an overview of the regional- and county-level distribution of Ireland’s population in 2002.

Table 1 Distribution of the Irish Population at Regional and County Levels, 2002

County population
County | Persons | Percentage distribution
Leitrim | 25,799 | 0.7
Longford | 31,068 | 0.8
Carlow | 46,014 | 1.2
Monaghan | 52,593 | 1.3
Roscommon | 53,774 | 1.4
Cavan | 56,546 | 1.4
Sligo | 58,200 | 1.5
Laoighis | 58,774 | 1.5
Tipperary North | 61,010 | 1.6
Offaly | 63,663 | 1.6
Westmeath | 71,858 | 1.8
Tipperary South | 79,121 | 2.0
Kilkenny | 80,339 | 2.1
Waterford City and County | 101,546 | 2.6
Louth | 101,821 | 2.6
Clare | 103,277 | 2.6
Wicklow | 114,676 | 2.9
Wexford | 116,596 | 3.0
Mayo | 117,446 | 3.0
Kerry | 132,527 | 3.4
Meath | 134,005 | 3.4
Donegal | 137,575 | 3.5
Kildare | 163,944 | 4.2
Limerick City and County | 175,304 | 4.5
Galway City and County | 209,077 | 5.3
Cork City and County | 447,829 | 11.4
Dublin City and County | 1,122,821 | 28.7

Regional population
Region | Persons | Percentage distribution
Border | 432,534 | 11.0
Dublin | 1,122,821 | 28.7
Mid-East | 412,625 | 10.5
Midland | 225,363 | 5.8
Midwest | 339,591 | 8.7
Southeast | 423,616 | 10.8
Southwest | 580,356 | 14.8
West | 380,297 | 9.7

Source: CSO.


The smallest region, the Midlands, has 6% of the Irish population; in contrast, the Dublin region has 29% (Table 1). At the county level, the smallest county, Leitrim, has 0.7% of the population, while Dublin city and county has 29%. Districts are the smallest geographical output areas for all statistics produced in Ireland. Within the broader small area estimation literature, an area is regarded as small if the sample data from the area are not large enough to produce a direct estimate of the target parameter with adequate precision (Rao 2003). For our purposes, however, the term small area simply refers to the lowest geographical scale for which data are available; it carries none of the statistical connotation it has in the broader small area estimation literature. Ireland has 3,440 districts. The population of a district ranges from 55 to 14,238 individuals, with an average across all districts of 885. However, as with most censuses, the data available on individuals’ health status are limited. Therefore, although the SAPS contains a multitude of socioeconomic and demographic variables at the district level, the lack of any significant health status variables means that the SAPS alone cannot support an in-depth analysis of health status or health care utilization. When matching data sets across common variables, a researcher must ensure that the variables are measured in the same manner. The LII contains two weights, a household weight and an individual weight, to ensure a nationally representative depiction of the population. Also to ensure the representativeness of the sample, the LII was reweighted to Census of Population totals for a number of key variables, such as age, sex, education level, and socioeconomic category (Nolan et al. 2002). Nolan et al. (2002) report that the results of this reweighting have been satisfactory for each administration of the LII Survey (1994 onward).
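The reweighting procedure used by Nolan et al. (2002) is not detailed here, so the following is only a hypothetical sketch of one common way to reweight a sample to census margins: raking, also called iterative proportional fitting. Each pass rescales the weights of everyone in a category so that the weighted total for that category hits its census target, cycling over the variables until no factor changes appreciably. The function name `rake`, the variables, the category codes, and the target totals are all invented for illustration.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    weights:    initial weights, one per respondent
    categories: variable name -> list of category codes, one per respondent
    targets:    variable name -> {category code: population total}
    Returns adjusted weights whose weighted margins match every target.
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        max_change = 0.0
        for var, codes in categories.items():
            for cat, total in targets[var].items():
                members = [i for i, c in enumerate(codes) if c == cat]
                current = sum(w[i] for i in members)
                if current == 0:
                    continue  # nobody in this category; target cannot be met
                factor = total / current
                for i in members:
                    w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break  # all margins reproduced to within tolerance
    return w


# Hypothetical example: four respondents and two census margins.
w_example = rake(
    [1, 1, 1, 1],
    {"sex": ["m", "m", "f", "f"], "age": ["young", "old", "young", "old"]},
    {"sex": {"m": 60, "f": 40}, "age": {"young": 70, "old": 30}},
)
print(w_example)
```

In this toy case the weights converge to 42, 18, 28, and 12, which reproduce both the sex and the age margins at the same time.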
Three of the five variables that the SMILE uses to match the LII to the SAPS (age, sex, and education level) are among those tested for consistency against the census, which supports the reliability of matching across these variables. The Hospital In-Patient Enquiry (HIPE) from acute hospitals in Ireland is a health information system designed to collect administrative data about discharges and deaths (Layte and Wiley 2006). The HIPE was established in 1971 and is the principal source of national data on discharges from acute general hospitals. Discharge data from 60 hospitals across Ireland (two of which are private) include information on each patient’s demographic characteristics, medical diagnosis, and the medical procedures undergone. The data are made available at the county level across a number of categories, such as age, gender, and private insurance status. Thus, although the HIPE data set may be used to study the demographic and social structure of inpatient discharges at the county level, this level of aggregation is too coarse to draw any meaningful microlevel conclusions. By using a combinatorial optimization matching process, such as simulated annealing (Ballas et al. 2007), to match the data in the LII with the district-level census data and the HIPE data set, we obtain a much richer data set that allows us to investigate health status and health service utilization patterns at the small area level. We use spatial microsimulation techniques to accomplish this.

Spatial microsimulation and calibration

Most government policies have a geographical impact, irrespective of whether they are geographically targeted. Therefore, to inform current and future policy making, data are needed that allow the socioeconomic and spatial impact of policy decisions to be examined (Ballas and Clarke 2001; Ballas et al. 2007). Spatial microsimulation is a means of synthetically creating large-scale microdata sets at different geographical scales. The development and application of spatial microsimulation models offer considerable scope to analyze the individual composition of an area so that specific policies may be directed to the areas with the highest need. The SMILE is a static spatial microsimulation model (Morrissey et al. 2008). It uses a combinatorial optimization technique, simulated annealing, implemented in Java, to match the LII (2000) to the SAPS (2002) observations. The data are matched across five variables common to the LII and SAPS: an age-sex cross-tabulation, education level, number of individuals in a district, number of individuals in a household, and whether or not an individual has a farm. The sample weights in the LII Survey are not used in the microsimulation process. As Smith, Clarke, and Harland (2009) point out, because the matching algorithm only chooses individuals who are representative of an area (based on the distribution of the common variables at the district level), there is no need to use the LII weights. The LII and SAPS data sets are the initial inputs into the SMILE. The SAPS data set provides the target spatial unit, the district (Morrissey et al. 2008). The two spatial categories within the LII data set are not used in the matching process. The spatial structure observed once the simulation is completed is based entirely on the association that the variables from the LII data set have with the data from the SAPS. Table 2 presents a summary of the main demographic, socioeconomic, and health variables and their source data set. Through the simulated annealing process, a microlevel synthetic data set for the entire population of Ireland is created. For a full discussion of the algorithm used to create the statistical match, see Morrissey et al. (2008). The data set created by the SMILE is the final output of the model. Fig. 1 shows the inputs, modeling process, and outputs involved in creating the spatially disaggregated data.
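The SMILE’s own simulated annealing routine is written in Java and documented in Morrissey et al. (2008); the Python sketch below is only a simplified, hypothetical illustration of the general idea for a single district. It draws survey records with replacement, repeatedly proposes replacing one selected record with another, always accepts proposals that reduce the total absolute error against the district’s census counts, and accepts worse proposals with probability exp(−Δ/T) while the “temperature” T is still high. The error measure, the move, the cooling schedule (`t0`, `cooling`, `steps`), and the function names are assumptions, not the SMILE’s settings.

```python
import math
import random
from collections import Counter


def total_abs_error(selection, survey, constraints):
    """Sum over constraint variables of |simulated count - census count|."""
    error = 0
    for var, target in constraints.items():
        simulated = Counter(survey[i][var] for i in selection)
        for cat in set(target) | set(simulated):
            error += abs(simulated.get(cat, 0) - target.get(cat, 0))
    return error


def anneal_district(survey, constraints, n_people, t0=10.0, cooling=0.995,
                    steps=20_000, seed=0):
    """Choose n_people survey records (with replacement) for one district."""
    rng = random.Random(seed)
    selection = [rng.randrange(len(survey)) for _ in range(n_people)]
    error = total_abs_error(selection, survey, constraints)
    t = t0
    for _ in range(steps):
        if error == 0:
            break  # district constraints reproduced exactly
        pos = rng.randrange(n_people)
        old = selection[pos]
        selection[pos] = rng.randrange(len(survey))  # propose a replacement
        new_error = total_abs_error(selection, survey, constraints)
        delta = new_error - error
        # Accept improvements always; accept deteriorations with prob exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            error = new_error
        else:
            selection[pos] = old  # reject: undo the replacement
        t = max(t * cooling, 1e-12)  # cool down without reaching zero
    return selection, error


# Hypothetical survey records and district census counts.
survey = [
    {"sex": "M", "edu": "low"}, {"sex": "M", "edu": "high"},
    {"sex": "F", "edu": "low"}, {"sex": "F", "edu": "high"},
]
constraints = {"sex": {"M": 3, "F": 2}, "edu": {"low": 4, "high": 1}}
selection, error = anneal_district(survey, constraints, n_people=5)
print(error, Counter(survey[i]["sex"] for i in selection))
```

On this toy district the search should end with zero error, that is, a selection of five records that reproduces both constraint tables. A production model would update the error incrementally rather than recounting every table at each step.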
The data set created by the SMILE contains demographic, socioeconomic, labor force, and income variables at the microlevel for both individuals and family units. However, for the purpose of this article, only the individual-level data are used. Of particular interest here is that the SMILE contains a health component (Morrissey et al. 2008). The health data created by the SMILE have been used previously to examine differences in GP utilization rates between urban and rural areas in County Galway (Morrissey et al. 2008) and access to acute hospitals in Ireland for individuals suffering from depression (Morrissey et al. 2010). As with all data-generation techniques that rely on a random process, validation of the newly created data is an important component of model development (Morrissey and O’Donoghue 2011). In spatial microsimulation modeling, one samples from a microdata set to make it representative at a lower spatial scale than that at which the survey was originally collected. Two important aspects of simulating representative spatial data are ensuring that the simulated matching variables reproduce the original values precisely and maintaining the original multivariate distributions of the variables. To check the precision of the match, the predicted and observed (simulated and actual) values must be compared (Voas and Williamson 2001). The SMILE contains a number of internal validation methods based on goodness-of-fit statistics, namely Z-scores (Hynes et al. 2009); the mean square error (MSE) would be an alternative. The Z-score is based on the difference between the relative size of a category in the synthetic and actual populations, although an adjustment is made when dealing with zero counts. Z-scores can be squared and summed to provide a measure of tabular fit similar to a chi-squared statistic.
If the absolute value of a cell’s Z-score exceeds the critical value, the cell is deemed not to fit. The Z-score is calculated using equation (1).


Table 2 The Main Demographic, Socioeconomic, and Health Variables in the SMILE

Variable | LII | Census/SAPS | Status
Demographic profile
Sex | Yes | Cross-tabulated with age | Matched variable
Marital status | Yes | Yes | Nonmatched variable
Household size | Yes | Yes | Matched variable
Number of children | Yes | Yes | Nonmatched variable
Household income | Yes | No | Nonmatched variable
Socioeconomic profile
Household income | Yes | No | Nonmatched variable
Education level | Yes | Yes | Matched variable
Employment status | Yes | Yes | Nonmatched variable
Occupation | Yes | Yes | Nonmatched variable
Health status profile
Health status (five-category variable) | Yes | No | Nonmatched variable
Long-term illness | Yes | No | Nonmatched variable
Physically/emotionally hampered | Yes | No | Nonmatched variable
Depression | Yes | No | Nonmatched variable
Utilization variables
GP utilization | Yes | No | Nonmatched variable
Hospital inpatient utilization | Yes | No | Nonmatched variable
Nights spent as an inpatient | Yes | No | Nonmatched variable
Medical specialist consultant | Yes | No | Nonmatched variable
Coverage status
Private health insurance coverage | Yes | No | Nonmatched variable
Medical card coverage | Yes | No | Nonmatched variable
Risk factors
Smoker or not | Yes | No | Nonmatched variable

Source: SMILE.

Figure 1. A spatial microsimulation process. [Inputs: LII data set; SAPS data set. Modeling process: SMILE (underlying algorithm: simulated annealing), which matches the LII to the SAPS data set. Output: spatially referenced data set.]


Table 3 MSE-Based Comparison between Simulated Results and the Original SAPS Aggregates

1. Actual SAPS table
District | Education 1 | Education 2 | Education 3 | Education 4
101003 | 675 | 283 | 177 | 255
101004 | 1,503 | 584 | 441 | 561
101005 | 1,157 | 319 | 332 | 567
101006 | 1,643 | 1,146 | 476 | 410

2. Simulated table results
District | Education 1 | Education 2 | Education 3 | Education 4
101003 | 675 | 283 | 179 | 255
101004 | 1,503 | 584 | 441 | 561
101005 | 1,157 | 319 | 332 | 567
101006 | 1,643 | 1,146 | 476 | 410

3. Z-score
District | Education 1 | Education 2 | Education 3 | Education 4
101003 | 0 | 0 | 0.16 | 0
101004 | 0 | 0 | 0 | 0
101005 | 0 | 0 | 0 | 0
101006 | 0 | 0 | 0 | 0

Source: SMILE.
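The cell-level Z-score defined in equation (1) below can be checked against Table 3 with a short sketch. It assumes that the total N = ΣO is taken over the cells of one district’s table, which reproduces the 0.16 reported for district 101003, cell 3. The zero-count adjustment mentioned in the text is not implemented, and the two-sided 95% critical value of 1.96 for a standard normal statistic is an assumption, since the text does not state the value used. The function name `z_scores` is illustrative.

```python
import math


def z_scores(simulated, observed):
    """Cell-level Z-scores for one district table, following equation (1).

    z = ((T - O) / N) / sqrt(p * (1 - p) / N), with p = O / N and N = sum(O).
    """
    n = sum(observed)
    scores = []
    for t, o in zip(simulated, observed):
        p = o / n
        if p in (0.0, 1.0):
            # The article mentions an adjustment for zero counts; it is not
            # reproduced in this sketch.
            raise ValueError("zero-count adjustment not implemented")
        scores.append(((t - o) / n) / math.sqrt(p * (1 - p) / n))
    return scores


# District 101003 from Table 3 (Education 1-4).
actual = [675, 283, 177, 255]
simulated = [675, 283, 179, 255]
z = z_scores(simulated, actual)
tabular_fit = sum(s * s for s in z)  # squared and summed, chi-squared-like
CRITICAL_95 = 1.96                   # assumed two-sided 95% critical value
print([round(s, 2) for s in z], round(tabular_fit, 3))
print(all(abs(s) <= CRITICAL_95 for s in z))
```

Normalizing the simulated count by its own total (1,392) instead of the census total (1,390) would give roughly 0.14 for that cell, so the sketch uses the form that matches Table 3.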

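$$z_{ij} = \frac{\dfrac{T_{ij} - O_{ij}}{\sum_{ij} O_{ij}}}{\sqrt{\dfrac{1}{\sum_{ij} O_{ij}}\left(\dfrac{O_{ij}}{\sum_{ij} O_{ij}}\right)\left(1 - \dfrac{O_{ij}}{\sum_{ij} O_{ij}}\right)}} \qquad (1)$$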

where $T_{ij}$ is the estimated (simulated) count in column $i$, row $j$, $O_{ij}$ is the census count in column $i$, row $j$, and the sums run over all cells of a district’s table. The sample of Z-score results presented in Table 3 (an example for four specific districts) illustrates which tables and which districts fit. Information about the relative error and the Z-scores is output automatically in the static simulation. The first line in section 3 of Table 3 shows the associated 95% critical value for the Z-score. Taking district 101004 as an example, a Z-score of zero across all cells indicates that the estimated cells fit the actual census cells perfectly, and hence that the estimated table fits the actual table. In contrast, for district 101003, the Z-score for cell 3 is 0.16. This value is greater than zero but does not exceed the critical value; that is, the cell still fits the actual cell at the 95% confidence level, indicating that the estimated table still fits the actual table very well. The Z-score statistic measures the difference between the actual and simulated numbers of cases and, as such, captures one dimension of model uncertainty: the deviation of a single sample from the actual values. However, each sample potentially results in a different set of households and thus generates a different Z-score. To measure the sampling variability, one would need to bootstrap the data and run the match many times. Given the computational time required for each individual match, this task is beyond the capacity of the current model.

With regard to maintaining the underlying multivariate distribution of the variables, studies using combinatorial optimization techniques (Williamson, Birkin, and Rees 1998; Voas and Williamson 2001) highlight the difficulty of reliably simulating microdata based on nontabulated data (Smith, Clarke, and Harland 2009). Validating both the newly created variables on their own and their multivariate distribution is particularly important for variables that have not been used in the weighting, calibration, or matching process. However, validating spatial microsimulation models is limited by the difficulty of finding spatial data at varying levels of aggregation against which to compare the newly created data. This article uses the out-of-sample validation technique defined by Caldwell (1996), which involves comparing the synthetically created microdata with new, external data. Based on this method, Table 4 compares the simulated hospital attendance with the HIPE hospital attendance variable at the county level. Differences exist between the matched results and the exogenous real results. Overall, hospital attendance is lower in the former, with simulated attendance totaling only 69% of the exogenous HIPE-based total; this difference is spatially heterogeneous.

Table 4 Validation of the SMILE’s Health Variables at the County Level

County administrative boundary | Simulated hospital attendance (count) (SMILE) | Real hospital attendance (count) (HIPE) | Ratio
Cavan | 4,970 | 8,398 | 0.59
Donegal | 12,026 | 19,420 | 0.62
Leitrim | 3,067 | 3,913 | 0.78
Louth | 8,989 | 14,882 | 0.60
Monaghan | 4,546 | 7,444 | 0.61
Sligo | 5,187 | 9,280 | 0.56
Laois | 5,130 | 7,769 | 0.66
Longford | 2,617 | 4,842 | 0.54
Offaly | 5,413 | 9,055 | 0.60
Westmeath | 6,160 | 11,489 | 0.54
Galway | 17,794 | 33,998 | 0.52
Mayo | 10,573 | 21,797 | 0.49
Roscommon | 4,973 | 9,521 | 0.52
Kildare | 13,746 | 16,626 | 0.83
Meath | 11,230 | 14,030 | 0.80
Wicklow | 9,919 | 11,560 | 0.86
Clare | 9,062 | 21,276 | 0.43
Limerick | 11,367 | 15,025 | 0.76
Tipperary North | 5,418 | 10,328 | 0.52
Carlow | 3,964 | 6,997 | 0.57
Kilkenny | 6,926 | 10,391 | 0.67
Tipperary South | 7,163 | 10,758 | 0.67
Waterford | 8,966 | 15,635 | 0.57
Wexford | 10,460 | 17,159 | 0.61
Cork | 39,404 | 50,257 | 0.78
Kerry | 11,945 | 15,203 | 0.79
Dublin | 98,042 | 112,696 | 0.87

Data sources: HIPE and SMILE.

Hospital attendance may be underreported in the simulated data, either because multiple visits are conflated into a single report or because the LII Survey instrument’s primary objective is to collect income data using very detailed questions, whereas health data are collected with simpler recall questions, so that responses may suffer from recall bias. Numerous studies indicate that age, sex, and education levels are consistently good indicators of ill health and hospital attendance (Nolan and Nolan 2005; Layte and Wiley 2006). Therefore, it is unsurprising that the validation presented in Table 4 indicates that the initial matching process reproduces 69% of the exogenous hospital attendance total. However, ill health and hospital attendance are also significantly influenced (to a greater or lesser extent) by a variety of other individual attributes, such as socioeconomic status, income level, smoking, diet, and proximity to health care services. Because these variables are not included in the initial matching process, the health data output by the SMILE may not meet the exogenously specified targets provided by the HIPE data set, indicating a breach of the conditional independence assumption. Essentially, the spatially heterogeneous differential tells us that hospital attendance is related to spatial factors not captured by the demographic characteristics used to create the spatial match. Furthermore, as outlined in the introduction, previous research reports significant spatial heterogeneity in health outcomes and in the demand for service provision. This heterogeneity may arise from the demographic and socioeconomic profile of an area, from contextual/area-based effects, or from the impact of service provision, or its absence, in an area.
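The county ratios in Table 4 and the overall 69% figure can be recomputed directly from the table’s counts. The sketch below simply transcribes Table 4; it shows that the 69% figure is the ratio of the national totals (simulated over HIPE), whereas an unweighted average of the 27 county ratios is lower, because large counties such as Dublin and Cork have comparatively high ratios and therefore pull the pooled figure up. The variable names are illustrative.

```python
# County counts transcribed from Table 4: (simulated SMILE, real HIPE).
TABLE4 = {
    "Cavan": (4970, 8398), "Donegal": (12026, 19420), "Leitrim": (3067, 3913),
    "Louth": (8989, 14882), "Monaghan": (4546, 7444), "Sligo": (5187, 9280),
    "Laois": (5130, 7769), "Longford": (2617, 4842), "Offaly": (5413, 9055),
    "Westmeath": (6160, 11489), "Galway": (17794, 33998), "Mayo": (10573, 21797),
    "Roscommon": (4973, 9521), "Kildare": (13746, 16626), "Meath": (11230, 14030),
    "Wicklow": (9919, 11560), "Clare": (9062, 21276), "Limerick": (11367, 15025),
    "Tipperary North": (5418, 10328), "Carlow": (3964, 6997),
    "Kilkenny": (6926, 10391), "Tipperary South": (7163, 10758),
    "Waterford": (8966, 15635), "Wexford": (10460, 17159),
    "Cork": (39404, 50257), "Kerry": (11945, 15203), "Dublin": (98042, 112696),
}

ratios = {county: sim / real for county, (sim, real) in TABLE4.items()}
sim_total = sum(sim for sim, _ in TABLE4.values())
real_total = sum(real for _, real in TABLE4.values())
aggregate_ratio = sim_total / real_total                  # ratio of national totals
mean_county_ratio = sum(ratios.values()) / len(ratios)    # unweighted mean

print(f"aggregate: {aggregate_ratio:.2f}, unweighted county mean: {mean_county_ratio:.2f}")
print("lowest:", min(ratios, key=ratios.get), "highest:", max(ratios, key=ratios.get))
```

The spread between the lowest (Clare) and highest (Dublin) county ratios is one way to see the spatial heterogeneity the text describes.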
To ensure that the data output by the SMILE account for this spatial heterogeneity and to maintain the original multivariate distributions of the variables, we use an adjustment method from the dynamic microsimulation literature to correct for validation failures. This process is known as calibration through alignment (Morrissey and O’Donoghue 2011). The objective of calibrating a spatial microsimulation model is to ensure that the simulated output matches exogenous totals (non-SAPS data) at varying levels of spatial disaggregation (Baekgaard 2002). Like the Cornell Microsimulation Model (Caldwell and Keister 1996) and the Dynamic Microsimulation Model for Canada (Morrison 2006), the SMILE incorporates an array of alignment processes. The alignment technique ensures that the variables produced by the SMILE are representative of the Irish population across a number of spatial levels of aggregation. Ensuring the spatial representativeness and validity of the data produced by the SMILE is essential if the synthetically created data are to be used within a policy domain. Data from a number of different data sets are used as exogenous targets within the SMILE. For the purpose of this article, however, two sources are used: the HIPE data set, which provides calibration totals at the county level, and a weighted version of the LII (the sample weights used within the LII are outlined in the section “Results of the alignment process”), which provides regional calibration totals. Data from the SAPS data set are not used as calibration totals here. A number of different alignment processes may be used; the choice depends on the type of data output by the microsimulation model and the data type of the exogenous target data. The data output by the microsimulation model may be of three broad types: binary, continuous, and count. The actual process used to align each of these three data types is very different.
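As a point of reference for calibrating simulated output to exogenous totals, the sketch below shows one generic way to force a binary outcome (such as inpatient admission) to match an exogenous event count in each area (such as a county total from the HIPE). It follows the idea of alignment by sorting: within each area, individuals are ranked by their predicted probability minus a uniform random draw, and the top k receive the event, where k is the area’s target. The function name `align_binary`, the noise rule, and the example probabilities and targets are hypothetical; the SMILE’s actual alignment procedure may differ.

```python
import random
from collections import defaultdict


def align_binary(probs, areas, targets, seed=0):
    """Force a 0/1 outcome to hit an exogenous event count in each area.

    probs:   predicted probabilities for each simulated individual
    areas:   area label (e.g., county) for each simulated individual
    targets: area label -> exogenous number of events in that area
    """
    rng = random.Random(seed)
    members_by_area = defaultdict(list)
    for idx, area in enumerate(areas):
        members_by_area[area].append(idx)

    outcome = [0] * len(probs)
    for area, members in members_by_area.items():
        k = targets[area]
        if not 0 <= k <= len(members):
            raise ValueError(f"target for {area} must be between 0 and {len(members)}")
        # Rank by probability minus a uniform draw: higher-risk individuals are
        # more likely, but not certain, to receive the event.
        ranked = sorted(members, key=lambda i: probs[i] - rng.random(), reverse=True)
        for i in ranked[:k]:
            outcome[i] = 1
    return outcome


# Hypothetical probabilities for six simulated people in two areas.
probs = [0.05, 0.40, 0.10, 0.70, 0.20, 0.15]
areas = ["A", "A", "A", "B", "B", "B"]
targets = {"A": 1, "B": 2}
print(align_binary(probs, areas, targets))
```

A percentage or rate target can be handled the same way by first converting it to a count, for example round(rate * n) for an area with n simulated individuals.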
This article is solely concerned with the alignment of two variables: hospital inpatient admission (binary data) and the number of nights patients spent in hospital (count data). Most alignment in the literature calibrates a binary or continuous process (O’Donoghue 2010). To our knowledge, there has been little or no use of this method for count data models. As we are not calibrating continuous variables, the alignment procedure used for them is not discussed here. Binary events, such as the presence of an illness and health service attendance (yes, attended; no, did not attend), may be modeled using either a logistic regression or a probit model to estimate the probability of the event occurring. When the dependent variable, such as the number of nights spent in hospital, is count data, a count data regression model must be used (Greene 1993). Depending on the characteristics of the data, a number of statistical models can be used to analyze count data. For example, if the variance of the data equals the mean, a Poisson regression model may be used. However, given the characteristics of health data (individuals first must be admitted as inpatients, and the frequency declines gradually as the number of nights spent in hospital increases), the assumptions that underlie the Poisson specification are rarely met (Greene 1993). Thus, because of the right-skewed, overdispersed distribution of the variable and the nontruncated nature of the data, a negative binomial regression model becomes the obvious choice. The negative binomial model accounts for extra-Poisson variation (i.e., overdispersion) by replacing the Poisson model’s fixed mean with a gamma-distributed mean. The distinction between data types is only one of the properties that define an alignment technique (Baekgaard 2002). Another important difference between alignment processes is the method by which the simulated output is matched to the exogenous targets. These methods include aggregated total alignment, percentage/rate alignment, and average value alignment. Aggregated total alignment forces an exact match with the exogenous target data.
For example, exogenous census data may reveal that 54 males in one district are physically disabled. The alignment process ensures that this total is met for that district. For percentage/rate alignment, the simulated data are matched to exogenous rates. For example, 80% of the 19–25-year-old population in one district may have excellent health. The simulated data are aligned to match this rate. Average value alignment is generally used to align monetary values. For example, for a monetary value such as private health insurance contributions, the simulated output must produce the same aggregate contributions at the county or national level. The next section describes the alignment process used to calibrate both the binary inpatient admission variable and the count data nights spent in hospital variable contained in the SMILE to the HIPE data set for 2002. Each alignment process takes place at the county level, ensuring that the spatial heterogeneity required for modeling health outcomes and demand for health service provision is introduced into the data set.
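As a rough illustration of how the three kinds of exogenous target differ, the Python sketch below applies each one to simulated output for a single group. It is not the SMILE implementation: the function names are hypothetical, rate targets are assumed to be converted to whole-person counts by rounding, and average value alignment is assumed to work by proportional rescaling so that the simulated values reproduce the exogenous aggregate.

```python
# Hypothetical helpers illustrating the three alignment-target types.
# None of these names come from the SMILE code; they are for exposition only.


def total_target(exogenous_count: int) -> int:
    """Aggregated total alignment: the exogenous count is used as-is."""
    return exogenous_count


def rate_target(exogenous_rate: float, group_size: int) -> int:
    """Percentage/rate alignment: turn a rate into a whole-person count.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(exogenous_rate * group_size)


def average_value_align(values: list[float], exogenous_total: float) -> list[float]:
    """Average value alignment, sketched as proportional rescaling so the
    simulated values sum to the exogenous aggregate (e.g. total private
    health insurance contributions in a county)."""
    current_total = sum(values)
    if current_total == 0:
        raise ValueError("cannot rescale values that sum to zero")
    factor = exogenous_total / current_total
    return [v * factor for v in values]


# Examples: 54 disabled males in one district; 80% of a hypothetical group of
# 250 people aged 19-25; two hypothetical contribution values rescaled to 800.
print(total_target(54))
print(rate_target(0.80, 250))
print(average_value_align([100.0, 300.0], 800.0))
```

In this sketch, total and rate targets both reduce to a required number of cases per group, whereas average value alignment adjusts the simulated values themselves.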

The alignment process

The initial step in the calibration process involves the estimation of a logistic model for the binary inpatient admission variable. Using the unweighted LII data set and the independent variables listed in Table 5, a logistic model of acute hospital admission was estimated. The logistic model may be defined as follows:

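$$y_i^* = \beta_0 + \sum_k X_{ik}\beta_k + \varepsilon_i, \qquad \operatorname{logit}(P_i) = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \sum_k X_{ik}\beta_k \tag{2}$$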

such that

$$y_i = 1 \quad \text{if } y_i^* > 0 \tag{3}$$

Table 5 Logistic Regression Model for Inpatient Attendance (Weighted LII)
Dependent variable: inpatient attendance

Variable                                        Coefficient Standard error
Long-term illness (0 = no, 1 = yes)              0.563†     0.129
Age (continuous)                                 0.020*     0.009
Age-squared (continuous)                         0.000*     0.000
Gender (0 = female, 1 = male)                   -0.370†     0.102
Primary school education (0 = no, 1 = yes)       0.542*     0.226
Lower secondary education (0 = no, 1 = yes)      0.512†     0.196
Upper secondary education                        0.305      0.164
Higher diploma education (0 = no, 1 = yes)       0.460†     0.163
Special education                                0.010      0.315
Income (continuous, €)                           0.000      0.000
Medical card possession (0 = no, 1 = yes)        0.339*     0.156
Private insurance (0 = no, 1 = yes)              0.327*     0.141
GP visit (0 = no, 1 = yes)                       1.310†     0.238
Consultant visit (0 = no, 1 = yes)               2.149†     0.117
Married (ref. widowed)                           0.472†     0.114
Single (ref. widowed)                            0.722      0.440
Divorced (ref. widowed)                         -0.856      0.728
Never married (ref. widowed)                     0.038      0.152
Border (ref. Dublin region)                      0.614†     0.184
Mideast (ref. Dublin region)                     0.989†     0.185
Midlands (ref. Dublin region)                    0.187      0.209
Midwest (ref. Dublin region)                     0.017      0.212
Southeast (ref. Dublin region)                   0.464*     0.180
Southwest (ref. Dublin region)                   0.521†     0.166
West (ref. Dublin region)                        0.580†     0.196
Good health status (ref. v. good health)         0.208      0.127
Fair health status (ref. v. good health)         0.406*     0.178
Bad health status (ref. v. good health)          0.976†     0.281
Very bad health status (ref. v. good health)     1.089†     0.631
Constant                                        -5.036†     0.313
No. of individuals                               11662
Pseudo-R²                                        0.2643
Probability > χ²                                 0.00

*Significant at the 0.05 level; †Significant at the 0.01 level.

where $y_i$ is a variable indicating that person $i$ was admitted to a hospital, and $y_i^*$ is an unobserved latent continuous variable that notionally determines the propensity for admission to hospital. The stochastic term, $\varepsilon_i$, is generated in a manner similar to Monte Carlo simulation; namely, a transition happens if a uniform random number is drawn that is less than the predicted probability $P_i$ from the equation. Accordingly, where we observe an actual transition ($y_i = 1$), we require a residual related to a uniform random variable $u_i$ that is less than or equal to $P_i$, and one that is greater than $P_i$ when $y_i = 0$. Thus,

$$y_i = 1 \quad \text{if } u_i < \operatorname{logit}^{-1}\left(\beta_0 + \sum_k X_{ik}\beta_k\right) = P_i \tag{4}$$

The stochastic term $\varepsilon_i$ may be derived as

$$\varepsilon_i = \ln\left(\frac{u_i}{1 - u_i}\right) \tag{5}$$

where $u_i$ is a uniform random number on the interval (0, 1). Difficulties arise when logistic models are used for predictive purposes (Elbers, Lanjouw, and Lanjouw 2001; Demombynes et al. 2002). Used alone, a logistic regression model may under- or overpredict the number of events (Duncan and Weeks 1998), particularly if the average probability is far from 0.5. Although deriving the stochastic term within the model largely corrects this problem, it does not correct for missing spatial heterogeneity. Simply running a Monte Carlo simulation with these derived stochastic terms yields an appropriate national distribution but tends to smooth over spatial differences. Therefore, to improve the spatial heterogeneity of the model, an alignment procedure may be used to ensure that the SMILE’s inpatient admissions match the true spatial distribution of inpatient admissions as specified by the HIPE data set. The calibration routine ranks $y_i^*$, described in equation (2), from highest to lowest and assigns admissions down the ranking, such that

$$y_i = 1 \quad \text{for the highest } y_i^* \tag{6}$$

until the required number of cases, $n$, is reached. This method is applied to each of the simulated processes, so that the aggregate number of cases of each variable $y$ is consistent with the county control totals from the HIPE data set, while the association with the observed explanatory variables is maintained. To reduce errors in status assignment, the alignment process is constrained by seven age categories (0–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, and 85+ years of age), which are further subdivided by sex. As such, these groups may be expressed as an A34 × A2 × A7 table, where A34 represents the number of administrative counties in Ireland, A2 represents sex, and A7 represents the seven age categories. $y_i^*$ is simulated using equation (2) and ranked, and the exogenously specified number of inpatients for each age/sex and county cell is selected on the basis of that rank, starting from the highest $y_i^*$.

The stochastic term must be included in the ranking variable in equation (2). Without it, only high-risk individuals would be selected, and no low-risk individuals would be selected at all. Risk relates to the average incidence and is not exclusionary: the stochastic term ensures that some low-risk people are selected and some high-risk people are not, as we observe in reality. In this way, the calibration process introduces unobserved spatial heterogeneity when spatially representative data are not available. Thus, calibration may be used to introduce spatial fixed effects at the county level into simulated data.

Count data alignment

Once the correct numbers of individuals by age/sex and county have been aligned to match the HIPE totals, the count data variable nights spent in hospital can be aligned. The process for aligning a count data variable, such as nights spent in hospital, is similar to the process for aligning a binary choice variable.
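Because the count data procedure reuses the rank-and-select logic of the binary case, it may help to see that binary step (equations (2)–(6)) written out. The Python sketch below is a minimal illustration, not the SMILE code: it assumes each simulated person already carries a precomputed linear predictor ("xb") and an age/sex/county cell key, draws the logistic residual of equation (5), and marks the n highest latent propensities in each cell as admitted. All names are hypothetical.

```python
"""Rank-based binary alignment, sketched from equations (2)-(6).

Assumptions (illustrative only): each simulated person is a dict with a
precomputed linear predictor "xb" (beta_0 + sum_k X_ik beta_k) and a
"cell" key such as (county, sex, age_band); `targets` maps each cell to
its exogenous number of inpatients n.
"""
import math
import random
from collections import defaultdict


def logistic_residual(rng: random.Random) -> float:
    """Equation (5): eps = ln(u / (1 - u)) with u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, which would give log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def align_binary(people: list[dict], targets: dict, seed: int = 0) -> list[int]:
    """Return y_i (0/1) with min(n, cell size) cases in each cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, person in enumerate(people):
        # Equation (2): latent propensity = linear predictor + logistic error.
        y_star = person["xb"] + logistic_residual(rng)
        cells[person["cell"]].append((y_star, idx))

    y = [0] * len(people)
    for cell, members in cells.items():
        members.sort(reverse=True)  # highest y* first
        for _, idx in members[: targets.get(cell, 0)]:
            y[idx] = 1  # equation (6): fill the n highest-ranked slots
    return y


# Usage: one cell of 50 low-risk and 50 high-risk people, 50 admissions.
people = [{"cell": "A", "xb": -1.0} for _ in range(50)]
people += [{"cell": "A", "xb": 1.0} for _ in range(50)]
y = align_binary(people, {"A": 50})
print("low-risk selected:", sum(y[:50]), "high-risk selected:", sum(y[50:]))
```

In the usage example, some of the low-risk group is typically selected and some of the high-risk group is not, which is the point the text makes about the stochastic term; sorting on "xb" alone would select only the high-risk half.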

The first stage of count data alignment involves estimating the number of nights spent as an inpatient (conditional on admission to an acute hospital). The number of nights spent in hospital is reported as a discrete, nonnegative integer value. Such data may be modeled using either a Poisson or a negative binomial probability distribution for the dependent variable. The Poisson model has been criticized because of its implicit assumption that the conditional mean of $T_i$ (in this case, the expected number of nights spent in hospital) is equal to its conditional variance (Greene 1993). This mean-variance equality has proven problematic in applied work because real data frequently exhibit overdispersion, whereby the conditional variance is greater than the conditional mean (Greene 1993). The Poisson distribution can be generalized from a constant to a varying mean to take this overdispersion into account. The generalization most often used in the literature is the negative binomial probability distribution (Curtis 2002), in which an individual, unobserved effect is introduced into the conditional mean. Thus, using the unweighted LII data set and a negative binomial model, the probability distribution of the number of nights spent in hospital for inpatients is estimated as

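$$\Pr(T_i^*) = f(T_i^*) = \frac{\Gamma\left(T_i^* + 1/\alpha\right)}{\Gamma\left(T_i^* + 1\right)\Gamma\left(1/\alpha\right)}\,(\alpha\lambda_i)^{T_i^*}\,(1 + \alpha\lambda_i)^{-\left(T_i^* + 1/\alpha\right)} \tag{7}$$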

for individual $i$. In the context of this article, $T_i^*$ is the number of nights spent in hospital for the $i$th individual in the LII who had been simulated as having been admitted to an acute hospital, and $\lambda_i$ is an exponential function known as the intensity of the process, which may be represented as

$$\lambda_i = \exp\left(\beta_0 + \sum_k X_{ik}\beta_k\right) \tag{8}$$

The negative binomial regression model can capture the factors that influence the number of occurrences of a dependent variable, in this case the number of hospital nights. It is a generalization of the Poisson model that allows for the heteroscedasticity (overdispersion) often found in count data. The likelihood function for the negative binomial is

$$L = \prod_i \frac{\Gamma\left(T_i + 1/\alpha\right)}{\Gamma\left(T_i + 1\right)\Gamma\left(1/\alpha\right)} \cdot \frac{(\alpha\lambda_i)^{T_i}}{(1 + \alpha\lambda_i)^{T_i + 1/\alpha}} \tag{9}$$

where $\alpha$ and $\lambda_i$ are the parameters of the negative binomial distribution. The expected value of $T_i$ conditional on the vector of explanatory variables $X_i$ can be defined as

$$E(T_i \mid X_i) = \exp\left(X_i^{\mathsf{T}}\beta\right) = \exp\left(\beta_0 + \sum_k X_{ik}\beta_k\right) = \lambda_i \tag{10}$$
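To make equations (7)–(10) concrete, the following Python sketch evaluates the negative binomial log-probability, the intensity, and the log-likelihood using only the standard library. It assumes the NB2 form written above, whose variance is λ + αλ²; the parameter values in the check are made up for illustration and are not the Table 6 estimates. In practice the parameters are estimated by maximizing this log-likelihood with a statistical package, as the text does with STATA.

```python
import math


def nb2_log_pmf(t: int, lam: float, alpha: float) -> float:
    """log f(t) from equation (7), with mean lam > 0 and alpha > 0."""
    inv_a = 1.0 / alpha
    return (
        math.lgamma(t + inv_a)
        - math.lgamma(t + 1)
        - math.lgamma(inv_a)
        + t * math.log(alpha * lam)
        - (t + inv_a) * math.log1p(alpha * lam)
    )


def intensity(x: list[float], beta0: float, beta: list[float]) -> float:
    """Equations (8)/(10): lambda_i = exp(beta_0 + sum_k x_ik * beta_k)."""
    return math.exp(beta0 + sum(xk * bk for xk, bk in zip(x, beta)))


def nb2_log_likelihood(ts, xs, beta0, beta, alpha) -> float:
    """log L from equation (9): sum of log f(T_i) over individuals."""
    return sum(nb2_log_pmf(t, intensity(x, beta0, beta), alpha) for t, x in zip(ts, xs))


# Sanity check with made-up values: probabilities sum to 1, the mean is lam,
# and the variance is lam + alpha * lam**2 (larger than the Poisson variance).
lam, alpha = 3.0, 0.5
probs = [math.exp(nb2_log_pmf(t, lam, alpha)) for t in range(2000)]
mean = sum(t * p for t, p in enumerate(probs))
var = sum((t - mean) ** 2 * p for t, p in enumerate(probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))
```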

Without calibration, once the parameters $\beta$ have been estimated with a standard statistical package such as STATA, the number of nights in hospital can be simulated without difficulty. However, because the data do not incorporate spatial heterogeneity, external spatially representative data are required to calibrate the simulated data. The calibration data contain the number of people who spent a given number of nights in hospital across different spatial units (in this instance, at the regional level). To calibrate the data, as in the preceding binary logistic-based alignment, alignment is performed by sorting individuals and assigning the lowest ranks few nights and higher ranks more nights. However, as in the binary case, ranking on $X_i^{\mathsf{T}}\beta$ or $\exp(X_i^{\mathsf{T}}\beta)$ alone means that all high-risk people are ranked highest and hence receive the greatest number of nights in hospital (conditional on being hospitalized). Therefore, an error term, $\varepsilon_i$, must be generated, so that ranking on $\exp(X_i^{\mathsf{T}}\beta + \varepsilon_i)$ captures this individual heterogeneity. To recover the error term $\varepsilon_i$, the following model by Greene (2008), which incorporates latent heterogeneity, may be adapted:

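$$T_i = E(T_i \mid X_i, \varepsilon_i) = \exp\left(X_i^{\mathsf{T}}\beta + \varepsilon_i\right) = \lambda_i \exp(\varepsilon_i) = \lambda_i h_i \tag{11}$$

where

$$h_i = \exp(\varepsilon_i) \tag{12}$$

and

$$h_i \sim \operatorname{gamma}\left(\frac{1}{\alpha}, \alpha\right) \tag{13}$$

Therefore,

$$\varepsilon_i = \ln(T_i) - X_i^{\mathsf{T}}\beta \tag{14}$$

where we observe the number of nights in hospital in the data. Calibrating nights spent in hospital in the model may result in the simulation of individuals who did not spend a night in hospital in the original data. These individuals have no observed value of $\varepsilon_i$, so for them $h_i$ is sampled from the gamma distribution in equation (13) and $\varepsilon_i = \ln(h_i)$. Having generated $\varepsilon_i$ for the entire sample, one can rank $X_i^{\mathsf{T}}\beta + \varepsilon_i$ directly, because the exponential function is monotonically increasing and therefore preserves the ranking of $\exp(X_i^{\mathsf{T}}\beta + \varepsilon_i)$. This rank is defined as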

$$R_i = \operatorname{rank}\left(X_i^{\mathsf{T}}\beta + \varepsilon_i\right) \quad \text{if hospitalization} = 1 \tag{15}$$

where $N_j$ is the number of people in the county who spent $j$ nights in hospital, $j = 1, \ldots, N$. One may simulate the number of nights as

$$T_i = j \tag{16}$$

where $j$ is the lowest number of nights spent in hospital such that $R_i \le \sum_{k=1}^{j} N_k$.
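A minimal Python sketch of equations (11)–(16) for a single county follows. It is illustrative rather than the authors' code: it assumes each hospitalized person carries a linear predictor ("xb") and, where available, observed nights ("t_obs"); residuals come from equation (14) for observed stays and from the gamma draw of equations (12) and (13) otherwise; people are then ranked in ascending order of xb + ε and assigned nights using the cumulative category totals N_1, N_1 + N_2, and so on. The names, and the treatment of anyone ranked beyond the target total, are assumptions.

```python
"""Count-data alignment sketched from equations (11)-(16).

Assumptions (illustrative only): each hospitalized person is a dict with
"xb" = X_i' beta and "t_obs" = observed nights (None if not observed in
hospital in the survey). `targets` = [N_1, N_2, ...], the county counts
of people with 1, 2, ... nights; the last entry is the open-ended top
category. Function names are hypothetical.
"""
import bisect
import itertools
import math
import random
import sys


def residual(person: dict, alpha: float, rng: random.Random) -> float:
    """epsilon_i for one hospitalized person."""
    t_obs = person.get("t_obs")
    if t_obs is not None and t_obs > 0:
        # Equation (14): eps_i = ln(T_i) - X_i' beta when nights are observed.
        return math.log(t_obs) - person["xb"]
    # Equations (12)-(13): h_i ~ gamma(shape 1/alpha, scale alpha), eps_i = ln(h_i).
    h = rng.gammavariate(1.0 / alpha, alpha)
    return math.log(max(h, sys.float_info.min))  # guard against an underflowed 0.0


def align_nights(people: list[dict], targets: list[int], alpha: float, seed: int = 0) -> list[int]:
    """Assign nights 1, 2, ...; category j gets targets[j-1] people when
    sum(targets) equals the number of people."""
    rng = random.Random(seed)
    # Equation (15): ranking X_i' beta + eps_i gives the same order as exp(...).
    scores = [p["xb"] + residual(p, alpha, rng) for p in people]
    order = sorted(range(len(people)), key=scores.__getitem__)  # R_i = 1 is lowest
    cumulative = list(itertools.accumulate(targets))  # N_1, N_1 + N_2, ...
    nights = [0] * len(people)
    for rank, idx in enumerate(order, start=1):
        # Equation (16): lowest j with R_i <= N_1 + ... + N_j. Anyone ranked
        # beyond the target total is put in the top category (an assumption).
        j = bisect.bisect_left(cumulative, rank)
        nights[idx] = min(j, len(targets) - 1) + 1
    return nights


people = [{"xb": 0.0, "t_obs": None} for _ in range(9)] + [{"xb": 100.0, "t_obs": None}]
nights = align_nights(people, targets=[5, 3, 2], alpha=0.5)
print(nights)
```

Here the person with a very large linear predictor lands in the highest category, while the residuals decide how the others are spread across the lower categories.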

Using the independent variables presented in Table 6, the resulting model was used to predict the distribution of hospital nights within the SMILE database for those who are simulated as being admitted to an acute hospital in the first instance. This ensures that each individual who is simulated as being admitted to an acute hospital has an estimated probability distribution for the number of nights she or he spent in hospital. For the target calibration parameter, nights spent in hospital, obtained from the HIPE data set, each night (up to 23 nights) is classified as a discrete category of the variable nights (individuals staying more than 23 nights in hospital are grouped into one category). The resulting target calibration table contains the number of individuals in each nights category by region, age, and sex. For example, 230 males aged 0–24 in the western region of Ireland spent 15 nights in an acute hospital. As with the binary variable, by including the exogenously specified county variable, the calibration process introduces unobserved spatial heterogeneity when spatially representative data are not available.

Table 6 Negative Binomial Model for Nights Spent in Hospital (Weighted LII)
Dependent variable: number of nights spent in hospital

Variable                                        Coefficient Standard error
Long-term illness (0 = no, 1 = yes)              0.216      0.229
Age (continuous)                                 0.030      0.004
Gender (0 = female, 1 = male)                   -0.519†     0.168
Primary school education (0 = no, 1 = yes)      -1.786†     0.332
Lower secondary education (0 = no, 1 = yes)      1.371†     0.247
Upper secondary education                        1.288†     0.221
Higher diploma education (0 = no, 1 = yes)       0.644†     0.184
Special education                               -0.112      0.337
Income (continuous, €)                           0.000      0.000
Medical card possession (0 = no, 1 = yes)        0.197      0.198
Private insurance (0 = no, 1 = yes)              0.327      0.203
GP visit (0 = no, 1 = yes)                       1.914†     0.267
Consultant visit (0 = no, 1 = yes)               2.047†     0.162
Married (ref. widowed)                           1.134†     0.184
Single (ref. widowed)                            1.972†     0.745
Divorced (ref. widowed)                         -0.893      0.682
Never married (ref. widowed)                     0.274      0.226
Good health status (ref. v. good health)         0.244      0.170
Fair health status (ref. v. good health)         1.437†     0.310
Bad health status (ref. v. good health)          2.724†     0.473
Very bad health status (ref. v. good health)     2.354†     0.500
Border (ref. Dublin region)                      0.100      0.284
Mideast (ref. Dublin region)                     0.296      0.236
Midlands (ref. Dublin region)                   -0.186      0.263
Midwest (ref. Dublin region)                    -0.775†     0.241
Southeast (ref. Dublin region)                   0.008      0.375
Southwest (ref. Dublin region)                   0.203      0.225
West (ref. Dublin region)                        0.111      0.244
Constant                                        -4.612      0.391
No. of individuals                               11662
Wald χ²(28)                                      930.40
Probability > χ²                                 0.00
α (overdispersion parameter)                     15.16

*Significant at the 0.05 level; †Significant at the 0.01 level.

After ranking, many individuals were given an equally high ranking probability for several different nights categories. Baekgaard (2002) refers to this as the problem of flow identifiability. For example, a large number of individuals had an equally high probability of spending two, three, or four nights in the hospital. This meant that an individual with an equally high probability of spending two, three, or four nights in the hospital was automatically placed in category one (as category one was the first to be aligned), leaving fewer people in the population stock to be exogenously aligned to the remaining k − 1 categories. To overcome this issue, the ranking procedure was carried out separately for each of the k nights categories. Categories were filled sequentially, from the lowest (1 night) to the highest (23+ nights). When the ranking procedure was applied to the next category, the individuals already assigned to a category were taken out of the pool, and the remaining ranked residuals were assigned independently of those already used. This process was repeated across the 23 nights categories until everyone who had spent a night in hospital was assigned a number of nights. In an exclusion procedure like this one, as the pool of residuals shrinks, individuals are increasingly likely to be assigned a category that does not represent their actual number of nights spent in hospital. However, weighing the complexity of the alignment procedure against the alternative, an uncalibrated nights spent in hospital variable, this approach was deemed reasonable. The next section presents the results of the initial national-level alignment at the district level (the calibration totals are the national level of inpatient admission and the national number of nights spent in hospital), compared with the spatial alignment at the district level (the calibration totals are the levels of inpatient admission and the number of nights spent in hospital by county).
Because data for hospital visits and nights spent in hospital are not available at the district level, the calibration process, through the logistic and negative binomial estimates, uses explanatory variables to select individuals in each district with the highest probability of being admitted to an acute hospital. Given that the initial logistic regression and negative binomial estimates for both alignments are the same, any spatial variation between the two alignments may be attributed to differences in the utilization of acute hospital services across different locations; that is, any differences in utilization patterns can be attributed to spatial influences.
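The logic of this comparison can be illustrated with a small Python sketch. Assuming hypothetical inputs (a county label and a linear predictor for each simulated person), it draws one latent score per person and then aligns twice: once to a national total only and once to separate county totals. Because the scores are shared, any difference in the resulting county counts comes from the alignment targets rather than from the model, which mirrors the argument above. The names and data are invented for illustration.

```python
"""National-only versus county-level alignment from the same propensities.

Assumptions (illustrative only): county[i] and xb[i] hold each simulated
person's county and linear predictor; a single logistic residual per
person is shared by both alignments. Names are hypothetical.
"""
import math
import random
from collections import Counter


def latent_scores(xb: list[float], seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    scores = []
    for b in xb:
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        scores.append(b + math.log(u / (1.0 - u)))  # y* = xb + logistic error
    return scores


def top_n(indices, scores, n):
    return sorted(indices, key=lambda i: scores[i], reverse=True)[:n]


def compare_alignments(county, xb, county_targets, seed=0):
    scores = latent_scores(xb, seed)
    everyone = list(range(len(xb)))
    # National alignment: only the national total is imposed.
    national = top_n(everyone, scores, sum(county_targets.values()))
    # County alignment: each county's own total is imposed separately.
    by_county = []
    for c, n in county_targets.items():
        by_county += top_n([i for i in everyone if county[i] == c], scores, n)
    return Counter(county[i] for i in national), Counter(county[i] for i in by_county)


county = ["A"] * 50 + ["B"] * 50
xb = [0.0] * 100  # identical composition in both counties
national, by_county = compare_alignments(county, xb, {"A": 20, "B": 5})
print("national:", dict(national), "county:", dict(by_county))
```

With identical composition, the national alignment has no reason to favor either county, whereas the county alignment reproduces the 20/5 split imposed by the targets.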

Results of the alignment process

Both hospital utilization variables are aligned to match exogenously specified variables at the county level. However, to demonstrate the necessity of aligning variables at the county level, rather than relying on national-level aggregates alone, a national-level alignment was also calibrated. It did not include a spatial breakdown of hospital admissions; it aligned both hospital utilization variables only to the age/sex distribution provided by the HIPE data set for 2002. It therefore corrects for the underreporting of bed nights observed in Table 4. Figs. 2 and 3 provide the results of the alignment for the number of individuals admitted as inpatients to acute hospitals in 2002. Given the range of average inpatient attendance at the district level (0%–30%), four categories were deemed sufficient to highlight the spatial distribution of inpatient attendance in Figs. 2 and 3. The average percentage of inpatients for each district was 12%. As such, the ranges displayed in Figs. 2 and 3 (0%–6%, 7%–13%, 14%–20%, and 21%–30%) were chosen so that the district average fell within one of the middle ranges. Fig. 2 portrays the alignment at the national level; the aggregate number of admissions to acute hospitals is, for the most part, proportionately assigned across space. However, the mideast region and the Dublin region were assigned a lower level of inpatient admissions. This pair of allocations indicates that, given the average demographic and socioeconomic characteristics of individuals in these two regions, these individuals have a lower than average probability of being admitted to an acute hospital.

Figure 2. National alignment for inpatient admittance (Data source: SMILE).

However, Fig. 3 clearly shows that once location is accounted for by aligning inpatient attendance at the county level, considerable variation occurs in inpatient attendance across Ireland. Fig. 3 indicates that the west and midwest have considerably higher inpatient attendance than the east and south of the country. Thus, national-level alignment alone does not allow for spatial variation or for the effect that location may have on hospital utilization. Table 4 indicates that large variations exist at the county level between the SMILE’s simulated data and the actual HIPE data. As outlined in the section “Data requirements and a SMILE,” owing to a host of factors, notably the limited correlation between the SMILE’s constraining variables and hospital utilization, the initial output of the SMILE was deemed unrealistic. Calibration at the county level allowed the introduction of spatial fixed effects into the SMILE data set at that level. However, Figs. 2 and 3 suggest that intracounty spatial variations in hospital admissions are difficult to establish.

Map legend: mean percentage of inpatients for each ED (0–6%, 7–13%, 14–20%, 21–30%); scale bar 0–100 km.
Figure 3. County alignment for inpatient admittance (Data source: SMILE).

Fig. 4 presents intracounty variations among the original, simulated SMILE data and the calibrated SMILE data for hospital attendance in County Galway. It shows that the original, uncalibrated SMILE data display very little spatial variation among districts at the intracounty level, and that the percentage of individuals admitted to a hospital is underreported. In contrast, the calibrated data show a large variation across districts (this pattern is smoothed in Fig. 3 at the national level). Thus, although the calibration process does not produce district-specific spatial effects, Figs. 3 and 4 demonstrate that the SMILE calibration process accounts for both inter- and intracounty variation, and that spatial heterogeneity is reduced (although not fully eliminated) using this method. The large variation across counties with regard to inpatient admission is of particular interest, and the next section summarizes further analysis of the determinants of this variation.

Panels (a) and (b), County Galway; legend: intracounty simulation for hospital attendance (0–14%, 15–17%, 18–19%, 20–30%); scale bar 0–40 km.
Figure 4. Intracounty variation among the original, simulated SMILE data, and the calibrated SMILE data for hospital attendance in County Galway (Data source: SMILE).

Figs. 5 and 6 present the results of the national and spatial alignment for average nights spent in hospital at the district level in 2002. Given the small range of values for this variable (0–23 nights), only three categories are used in Figs. 5 and 6. The average number of nights in hospital at the district level was six. As such, the ranges displayed in Figs. 5 and 6 (one to five, six, and seven or more) were chosen so that the mean forms the middle category. In mapping the results of this alignment, any individual who had not spent a night in an acute hospital was removed. As Fig. 5 indicates, with national-level alignment, residents in the northwest and west spent a greater number of nights in hospital than residents of other regions in Ireland. This means that, given the demographic and socioeconomic profile of these districts, the negative binomial model assigned their residents a higher probability of spending a greater amount of time in hospital.

Figure 5. National alignment for nights spent in hospital (Data source: SMILE).

However, Fig. 6 clearly shows that once location is accounted for (by aligning at the county level), the Dublin and southwest regions have a considerably higher than average number of nights spent in hospital compared with other regions in Ireland. In contrast, the northwest and west have lower numbers of nights spent in hospital than predicted by the negative binomial model. This outcome indicates that influences other than the demographic, spatial, and socioeconomic characteristics included in the negative binomial model are important determinants of the number of nights spent in hospital. Comparing the national-level alignment with the county-level alignment demonstrates the importance of aligning variables at the subnational level as a way to introduce spatial heterogeneity into the data. Once this spatial variation has been established, the question becomes: what factors determine the relationship between individual location and acute hospital utilization? The aim of the remainder of this article is to identify the individual-level factors that influence the spatial variation in acute hospital utilization across Ireland.

Figure 6. County alignment for nights spent in hospital (Data source: SMILE).

The spatial influence of acute hospital utilization

As part of the alignment process, a logistic regression model was estimated with STATA to examine the determinants of inpatient admission at the national level using the weighted LII Survey. This data set contains sample weights for each individual and household within it. The purpose of sample weighting is to compensate for any biases in the distribution of characteristics in the completed survey sample compared with the population of interest. Nolan et al. (2002) outline the weighting method used in the LII data set. Given that the SMILE is representative at the national level, the weighted LII Survey was used for model estimation, increasing the consistency of estimation across both data sets. The spatial microsimulation does not use these sample weights. Table 5 presents the results of the logistic regression. The general fit of the model, measured by the pseudo-R² statistic, is 0.26. Based on the estimated coefficients, the main drivers of inpatient admission are

• very bad health status (1.089),
• bad health status (0.976),
• a consultant specialist visit in the previous year (2.149),
• a GP visit in the previous year (1.310), and
• age (a one-unit increase in age increases the odds of being admitted as an inpatient by about 2%).
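Table 5 contains negative values, which suggests that its figures are logit coefficients (log odds) rather than odds ratios, since odds ratios cannot be negative. Under that reading, a short Python calculation converts the coefficients listed above into odds ratios; for example, exp(0.020) is about 1.02, which is where an approximately 2% change in the odds of admission per year of age comes from. This is an interpretive sketch under that assumption, not a recomputation from the underlying data.

```python
import math

# Coefficients quoted in the list above (Table 5), read as log odds (an assumption).
coefficients = {
    "very bad health status": 1.089,
    "bad health status": 0.976,
    "consultant visit": 2.149,
    "GP visit": 1.310,
    "age (per year)": 0.020,
}

for name, b in coefficients.items():
    odds_ratio = math.exp(b)  # odds ratio = exp(coefficient)
    change = 100.0 * (odds_ratio - 1.0)  # % change in the odds of admission
    print(f"{name}: odds ratio {odds_ratio:.2f}, {change:+.1f}% change in odds")
```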

These results are consistent with previous research examining factors that influence inpatient admissions to acute hospitals. Previous research on Ireland has found that females are 2% more likely than males to be admitted to an acute hospital (Layte and Wiley 2006). Lower levels of education are also significant and positively associated with admission to an acute hospital. One of the main points of interest from the analysis here is that annual income does not significantly covary with inpatient admission. This result persists even when income is categorized into five bands. The 0.000 coefficient and 0.000 standard error reported for income in Table 5 are due to scale: income in the LII is measured in euros, and the variable (minimum €0, maximum €250,000) is on a much larger scale than the other continuous and categorical variables, so its per-euro coefficient rounds to zero. Furthermore, this result is consistent with previous research on the relationship between income and acute hospital utilization in Ireland (Layte and Wiley 2006). The strong relationship that the logistic regression reveals between health status and inpatient admission is reassuring. International research indicates that health status is the most important covariate of hospital utilization, although demographic and socioeconomic factors also tend to have strong relationships (Benzeval and Judge 1994). We are interested in examining the key drivers of inpatient admission to an acute hospital. A logistic regression with the three variables found to have the strongest significant relationship with hospital admissions (a visit to a GP in the previous year, age, and ill health) yields a pseudo-R² of 0.20, compared with 0.26 for the full model in Table 5; these three variables alone therefore account for much of the model's explanatory power at the national level.
Using data from SMILE, one can map the spatial distribution of these three variables at the district level to determine whether the spatial pattern of hospital admission is driven (in part) by their spatial distribution. Fig. 7 portrays the spatial relationship between age, ill health, and GP visits, and hospital attendance at the district level in Ireland. It shows a strong positive relationship between age and ill health and hospital attendance: older individuals and those with long-term illness (LTI) are more likely to have higher demand for hospital services. The spatial relationship between GP visits and hospital attendance, although positive, is not as strong as that for the age and LTI variables. Thus, we find that the spatial distribution of acute hospital inpatient admission at the district level (Fig. 3) covaries with age and LTI.

Figure 7 panels: scatter plots of electoral division (ED) means with fitted values, showing mean ED age, mean ED LTI, and mean ED GP visits against mean ED hospital attendance, and mean age, mean LTI, and mean GP visits against mean nights spent in hospital.
Figure 7. Spatial relationship between age, ill health, and GP visits and hospital attendance at the district level (graphs 1–3) and the spatial relationship between age, ill health, and GP visits and nights spent in hospital at the district level in Ireland (graphs 4–6).

As part of the alignment process (as with the logistic model for inpatient admissions), a negative binomial model was estimated to analyze the determinants of nights spent in hospital at the national level using the weighted LII Survey. The STATA statistical package was again used for estimation. Table 6 presents the results of the negative binomial model for the number of nights spent in hospital. The overdispersion parameter, α, is positive and large (15.16), indicating that the data are overdispersed. Thus, the negative binomial model is preferable to the Poisson model for estimating nights spent in hospital. Table 6 indicates that the main covariates for the number of nights spent in hospital are

• whether an individual has visited a GP (1.914),
• whether an individual has visited a consultant (2.047),
• being single (compared with the base category, widowed; 1.972),
• bad to very bad self-assessed health (relative to very good self-assessed health status; 2.724 and 2.354, respectively), and
• age (a one-unit increase in age increases the number of nights spent in an acute hospital by 2%).
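Under the NB2 form of equation (7), the conditional variance is μ + αμ², so any positive α implies a variance above the mean. The Python sketch below checks this by simulating a Poisson count whose mean is multiplied by a gamma-distributed term with mean 1, which is how the negative binomial arises. The mean and α used here are illustrative values chosen so the simulation runs quickly; they are not the Table 6 estimates, and the Poisson sampler is a simple textbook method suited only to small means.

```python
"""Gamma-Poisson mixture check of overdispersion (illustrative values only)."""
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; adequate for the small means used here.
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def nb2_draw(mu: float, alpha: float, rng: random.Random) -> int:
    # Poisson with a gamma-distributed mean: mu * h, h ~ gamma(1/alpha, alpha).
    return poisson_draw(mu * rng.gammavariate(1.0 / alpha, alpha), rng)


rng = random.Random(42)
mu, alpha, n = 3.0, 0.5, 100_000
draws = [nb2_draw(mu, alpha, rng) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / (n - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
print(f"NB2 variance mu + alpha*mu^2 = {mu + alpha * mu**2:.3f}; Poisson variance = {mu:.3f}")
```

The sample variance should come out close to the NB2 value and well above the Poisson value, which is the pattern a positive α signals.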

Again of interest is that annual household income does not significantly covary with the number of nights an individual spends in hospital. Previous research on the factors determining the number of nights spent as an inpatient in Ireland also found that income level is not a significant covariate (Economou, Nikolaou, and Theodossiou 2008). Our analysis also shows that gender (−0.519 for males relative to females) and lower levels of education relative to degree level significantly covary with the number of nights an individual spent as an inpatient. Fig. 7 portrays the district-level relationship between age, LTI, and GP visits, and the number of nights spent in hospital. Similar to the relationship observed for actual hospital attendance, a strong positive relationship exists between age and ill health and nights spent in hospital. Older individuals and those with LTI are more likely to have higher demand for hospital services. The spatial relationship between GP visits and nights, although positive, is not as strong as that for the age and LTI variables. Thus, we find that the spatial distribution of the number of nights spent in an acute hospital at the district level (Fig. 6) covaries with age and LTI.

In terms of the validity of the regression coefficients presented in Tables 5 and 6, the SMILE produces synthetic data for each individual by district; however, it does not capture intradistrict spatial variation for the population. Rather than producing clustered data, with a subset of the population in each cluster, the SMILE creates a spatially referenced data set for the entire Irish population. As such, the regressions or nonparametric estimates do not need to be adjusted to account for clustering. From a methodological point of view, the implications of error arising both from the use of the SMILE’s simulated data and from the use of statistical methods (e.g., the logistic regression and negative binomial models) employing simulated data must be noted.
Although the SMILE's matching variables reproduce representative distributions of age/sex, education, farm/nonfarm status, the number of individuals in each household, and the number of households in each district for all of Ireland (Morrissey et al. 2008; Table 3), the unmatched variables produced by the SMILE are subject to error. The reasons why the initial SMILE match fails to meet its target are outlined in the section “Data requirements and a SMILE.” This article introduces an alignment methodology that may be used to overcome the error inherent in the initial SMILE match, which results from breaches of conditional independence. However, simulated data are not alone in containing error: survey and administrative data also contain measurement, observational, or reporting error. Such error is simply of a different type from the estimation error present in synthetic or simulated data.
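The alignment step described above forces a simulated binary outcome to reproduce an exogenous target. The sketch below shows one common binary alignment approach. Each individual is ranked by their predicted probability minus a uniform random draw, and the event is assigned to the top N. This is an illustration under that assumption, not necessarily the exact procedure used to calibrate the SMILE. `align_by_sorting` and the toy probabilities are hypothetical.

```python
import random
from typing import List, Optional

def align_by_sorting(probs: List[float], target_count: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Assign a binary event (e.g., inpatient admission) to exactly
    `target_count` individuals in one area.

    Each individual is scored as p_i - u_i with u_i ~ Uniform(0, 1); the
    `target_count` highest scores receive the event. Individuals with higher
    predicted probabilities are more likely, but not certain, to be chosen."""
    if not 0 <= target_count <= len(probs):
        raise ValueError("target_count must lie between 0 and len(probs)")
    rng = rng or random.Random()
    scores = [p - rng.random() for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:target_count])
    return [1 if i in selected else 0 for i in range(len(probs))]

# Hypothetical area: model-based admission probabilities for 8 simulated
# individuals, aligned to an exogenous target of 3 admissions.
probs = [0.05, 0.10, 0.40, 0.20, 0.15, 0.30, 0.08, 0.12]
outcome = align_by_sorting(probs, target_count=3, rng=random.Random(42))
print(sum(outcome))  # 3
```

Two variants follow from this design:

- **Subnational alignment:** apply the function separately to each area, such as a region, with that area's own target.
- **National alignment:** apply it once with a single national target.

Comparing the two is the kind of contrast the article uses to examine whether spatial variation exists.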

Conclusion

Current international research shows that strong spatial variations exist in health service utilization (Oliveira and Bevan 2003). Quantitative models may be used to help understand the pathways and determinants of these spatial variations. However, quantitative models of health have their limitations (Mitchell 2001; Williams 2003), namely their inability to capture intangible social processes, such as macrolevel political decisions and cultural influences, that determine individuals’ health (Graham 2002; Macintyre, Ellaway, and Cummins 2002). The aim of this article is to take an existing methodology, spatial microsimulation, a step further and to develop a methodology that creates a spatial distribution of individuals according to individual, spatial, and social processes. Adapting an alignment technique from the dynamic simulation literature (Morrison 2006) to calibrate variables of interest to exogenous data allows social dynamics to be incorporated into the simulated data. For example, by calibrating the hospital admissions variable, we observe that midwest Ireland has a higher-than-average hospital admission rate. Although this regional feature is partly determined by the compositional characteristics of the individuals in midwestern Ireland, it may also reflect a higher propensity for admission, that is, a cultural inclination of hospitals in this area to admit patients. Further research on this spatial variation could therefore use qualitative analysis to identify any sociocultural determinants. Using the calibration method, we find that the residential location of individuals covaries significantly with hospital utilization patterns in Ireland. Having established spatial variations in acute hospital utilization, the next step was to determine why these variations exist.
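The distinction drawn above between compositional characteristics and a higher propensity to admit can be illustrated with a simple observed-to-expected ratio, a form of indirect standardization. The sketch below assumes a national model supplies each simulated resident's predicted admission probability. The function name, region labels, and numbers are hypothetical, and this is not presented as the calculation used in the article. A ratio above 1 means a region records more admissions than its composition alone would predict. That leaves room for contextual explanations such as local admission practices.

```python
from typing import Dict, List, Tuple

def observed_expected_ratio(observed: float, predicted_probs: List[float]) -> float:
    """Observed admissions divided by the admissions expected from individual
    characteristics alone (the sum of predicted probabilities)."""
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected admissions must be positive")
    return observed / expected

# Hypothetical regions: (observed admissions, predicted probabilities from a
# national model applied to each simulated resident).
regions: Dict[str, Tuple[int, List[float]]] = {
    "region_a": (13, [0.25] * 40),  # expected admissions = 10.0
    "region_b": (9, [0.25] * 40),   # expected admissions = 10.0
}
for name, (obs, probs) in regions.items():
    print(name, round(observed_expected_ratio(obs, probs), 2))
# region_a 1.3
# region_b 0.9
```

In this toy case, the two regions share the same composition but differ in observed admissions. Only region_a exceeds what its residents' characteristics predict.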
Examining, at the small area level, the key covariates of acute hospital utilization identified in the national-level analysis reveals that the individual risk factors influencing hospital utilization covary with the spatial pattern of acute hospital utilization in Ireland: hospital utilization is at least partly driven by health service needs, such as poor health and age. This article presents a methodology, spatial microsimulation, that has been used extensively to estimate the effects of public policies across space (Ballas and Clarke 2001). Extending the methodology to include a calibration process allows for a fuller analysis of the relationships between individual characteristics, location, sociocultural processes, and acute hospital utilization in Ireland. Our results have two main implications for policy and research. First, from a policy perspective, they indicate that, given the demographic, socioeconomic, and health profile of the Irish population, the northwest, west, and midwest regions have the highest demand for acute hospital services. Although these regional-level results could have been obtained using the weighted LII Survey, a district-level profile of hospital attendance could have been obtained only by using the data produced by the SMILE. The analysis in the section on the alignment process clearly indicates clusters of districts with high hospital utilization rates across different regions of Ireland. The subsequent section, on the results of the alignment process, also demonstrates that the data produced by the SMILE may be used to pinpoint clusters of districts with a high probability of hospital demand given their socioeconomic and demographic profiles. Given the rising cost of providing health care to an expanding population, the focus on developing needs-based indicators of health care is at an all-time high (de Looper and Lafortune 2009).
Therefore, the data produced by the SMILE allow policy makers to pinpoint areas with high healthcare demand at a very local level, and thus support cost-effective public expenditure. Second, from a methodological perspective, the use of aggregate indicators to measure health service demand across different locations may be inappropriate. Hospital utilization patterns strongly covary with location, and as such, any policy-oriented analysis must capture this variation. As outlined in the introduction, because of the relationship between health and location, health resources should be allocated along some spatial dimension to achieve equity of provision. This article demonstrates that a spatial microsimulation model, when combined with a spatial calibration process, can indicate spatial variation in health service utilization and its individual-level covariates. These results, in turn, can be used to target healthcare resources at the areas with the highest demand and therefore to optimize government intervention and the use of public resources.
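The targeting step described above can be sketched under one assumption: a logistic model of admission, estimated on the national survey, is applied to every simulated individual, and the predicted probabilities are averaged by district. The covariate names, coefficients, and the above-average threshold are hypothetical placeholders, not the estimates reported in this article. In practice the inputs would be the SMILE's district-level microdata.

```python
import math
from collections import defaultdict
from typing import Dict, List

def admission_probability(person: Dict, coefs: Dict[str, float], intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    eta = intercept + sum(b * person.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-eta))

def district_mean_probability(people: List[Dict], coefs: Dict[str, float],
                              intercept: float) -> Dict[str, float]:
    """Average predicted admission probability for each district."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for person in people:
        totals[person["district"]] += admission_probability(person, coefs, intercept)
        counts[person["district"]] += 1
    return {d: totals[d] / counts[d] for d in totals}

# Hypothetical coefficients and simulated individuals (not the article's estimates).
coefs = {"age_decades": 0.3, "lti": 1.0}
intercept = -3.0
people = [
    {"district": "A", "age_decades": 2, "lti": 0},
    {"district": "A", "age_decades": 3, "lti": 0},
    {"district": "B", "age_decades": 7, "lti": 1},
    {"district": "B", "age_decades": 6, "lti": 0},
]
rates = district_mean_probability(people, coefs, intercept)
overall = sum(admission_probability(p, coefs, intercept) for p in people) / len(people)
high_demand = sorted(d for d, r in rates.items() if r > overall)
print(high_demand)  # ['B']
```

Averaging individual probabilities, rather than plugging district-average covariates into the model, respects the nonlinearity of the logistic function. This is one reason microdata can give different district profiles from aggregate indicators.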

References

Baekgaard, H. (2002). “Micro-Macro Linkages and the Alignment of Transition Processes.” Technical Paper No. 25, National Centre for Social and Economic Modeling, University of Canberra.

Ballas, D., and G. P. Clarke. (2001). “The Local Implications of Major Job Transformations in the City: A Spatial Microsimulation Approach.” Geographical Analysis 33(4), 291–311.

Ballas, D., G. P. Clarke, D. Dorling, and D. Rossiter. (2007). “Using SimBritain to Model the Geographical Impact of National Government Policies.” Geographical Analysis 39(1), 44–77.

Benzeval, M., and K. Judge. (1994). “The Determinants of Hospital Utilization: Implications for Resource Allocation in England.” Health Economics 3(2), 105–10.

Black, D. (1980). Inequalities in Health. London: Penguin.

Caldwell, S. (1996). “Health, Wealth, Pensions and Life Paths: The CORSIM Dynamic Microsimulation Model.” In Microsimulation and Public Policy, 505–22, edited by A. Harding. Amsterdam: North Holland.

Caldwell, S., and L. Keister. (1996). “Wealth in America: Family Stock Ownership and Accumulation, 1960–1995.” In Microsimulation for Urban and Regional Policy Analysis, 88–116, edited by G. P. Clarke. London: Pion.

Chaudhuri, S., and M. Ghosh. (2011). “Empirical Likelihood for Small Area Estimation.” Biometrika 98(2), 473–80.

Curtis, J. A. (2002). “Estimating the Demand for Salmon Angling in Ireland.” The Economic and Social Review 33(3), 319–32.

Demombynes, G., C. Elbers, J. Lanjouw, P. Lanjouw, J. Mistiaen, and B. Özler. (2002). “Producing an Improved Geographic Profile of Poverty: Methodology and Evidence from Three Developing Countries.” Discussion Paper No. 2002/39, United Nations University, Helsinki.

D’Orazio, M., M. Di Zio, and M. Scanu. (2006). Statistical Matching: Theory and Practice. Chichester: Wiley.

Dorling, D., R. Mitchell, M. Shaw, S. Orford, and G. Davey Smith. (2000). “The Health Effects of Poverty in London in 1896 and 1991.” British Medical Journal 321, 1547–51.

Duncan, A., and M. Weeks. (1998). “Simulating Transitions Using Discrete Choice Models.” Papers and Proceedings of the American Statistical Association 106, 151–56.

Duncan, C., and K. Jones. (2000). “Using Multilevel Models to Model Heterogeneity: Potential and Pitfalls.” Geographical Analysis 32(4), 279–305.

Economou, A., A. Nikolaou, and I. Theodossiou. (2008). “Socio-Economic Status and Healthcare Utilization: A Study of the Effects of Low Income, Unemployment and Hours of Work on the Demand for Health Care in the European Union.” Health Service Management 21, 40–59.

Edwards, K., J. Cade, J. Ransley, and G. Clarke. (2009). “A Cross Section Study Examining the Pattern of Childhood Obesity in Leeds: Affluence Is Not Protective.” Archives of Disease in Childhood 69, 1127–34.

Elbers, C., J. Lanjouw, and P. Lanjouw. (2001). “Welfare in Villages and Towns: Micro-Level Estimation of Poverty and Inequality.” Mimeo, DECRG–World Bank, Washington, DC.

Fotheringham, A. S., and C. Brunsdon. (1999). “Local Forms of Spatial Analysis.” Geographical Analysis 31(4), 340–58.

Ghosh, M., and J. Rao. (1994). “Small Area Estimation: An Appraisal.” Statistical Science 9(1), 55–93.

Gordis, L. (2000). Epidemiology. Philadelphia, PA: WB Saunders Company.

Graham, H. (2002). “Building An Interdisciplinary Science of Health Inequalities: The Example of Lifecourse Research.” Social Science and Medicine 55, 2005–16.

Greene, W. (2008). “Functional Forms for the Negative Binomial Model for Count Data.” Economics Letters 99, 585–90.

Greene, W. H. (1993). Econometric Analysis, 2nd ed. New York: Macmillan.

Harvey, D. (1989). The Urban Experience. Oxford: Basil Blackwell.

Hynes, S., K. Morrissey, C. O’Donoghue, and G. Clarke. (2009). “Building A Static Farm Level Spatial Microsimulation Model for Rural Development and Agricultural Policy Analysis in Ireland.” International Journal of Agricultural Resources, Governance, and Ecology 8(2), 282–99.

Layte, R., and M. Wiley. (2006). “Equity in the Utilization of Hospital In-Patient Services in Ireland: An Improved Approach to the Measurement of Health Need and Differential Cost.” Working Paper No. 19, Research Program on Health Services, Health Inequalities and Health and Social Gain. Available at http://www.esri.ie/UserFiles/publications/20080905114906/OPEA024.pdf; last accessed on 22 October 2012.

de Looper, M., and G. Lafortune. (2009). “Measuring Disparities in Health Status and in Access and Use of Health Care in OECD Countries.” OECD Health Working Papers No. 43, OECD Publishing. http://dx.doi.org/10.1787/225748084267.

Macintyre, S. (1997). “The Black Report and beyond: What Are the Issues?” Social Science and Medicine 44, 723–45.

Macintyre, S., A. Ellaway, and S. Cummins. (2002). “Place Effects on Health: How Can We Conceptualise, Operationalise and Measure Them?” Social Science and Medicine 55(1), 125–39.

Mitchell, R. (2001). “Multilevel Modeling Might Not Be the Answer (A Response to McCulloch, A. [2001]).” Environment and Planning A 33, 1357–60.

Morrison, R. (2006). “Make It So: Event Alignment in Dynamic Microsimulation.” DYNACAN Working Paper, Ottawa, Canada.

Morrissey, K., and C. O’Donoghue. (2011). “The Spatial Distribution of Labour Force Participation and Market Earning at the Sub-National Level in Ireland.” Review of Economic Analysis 2, 80–101.

Morrissey, K., G. Clarke, D. Ballas, S. Hynes, and C. O’Donoghue. (2008). “Analysing Access to GP Services in Rural Ireland Using Micro-Level Analysis.” Area 40(3), 354–64.

Morrissey, K., S. Hynes, G. Clarke, and C. O’Donoghue. (2010). “Examining the Factors Associated with Depression at the Small Area Level in Ireland Using Spatial Microsimulation Techniques.” Irish Geography 43(1), 1–22.

Navarro, V., and L. Shi. (2001). “The Political Context of Social Inequalities and Health.” Social Science and Medicine 52, 481–91.

Nolan, A., and B. Nolan. (2005). “A Panel Data Analysis of the Utilisation of GP Services in Ireland: 1995–2000.” Working Paper No. 13, Research Programme on “Health Services, Health Inequalities and Health and Social Gain,” ESRI, ISSC and University of Ulster.

Nolan, B., B. Gannon, R. Layte, D. Watson, C. Whelan, and J. Williams. (2002). “Monitoring Poverty Trends in Ireland: Results from 2000 Living in Ireland Survey.” Working Paper No. 45, Policy Research Series, Dublin: Economic and Social Research Institute.

O’Donoghue, C. (2010). Life-Cycle Microsimulation Modeling: Constructing and Using Dynamic Microsimulation Models. London: Lambert Academic Publishing.

Oliveira, M. D., and G. Bevan. (2003). “Measuring Geographic Inequities in the Portuguese Health Care System: An Estimation of Hospital Care Needs.” Health Policy 66(3), 177–92.

Rao, J. (2003). Small Area Estimation. New York: Wiley.

Sassi, F., and J. Hurst. (2008). “The Prevention of Lifestyle-Related Chronic Diseases: An Economic Framework.” OECD Health Working Paper 32. Available at http://www.oecd.org/dataoecd/57/14/40324263.pdf; last accessed on 22 October 2012.

Shim, J. K. (2002). “Understanding the Routinized Inclusion of Race, Socio-Economic Status and Sex in Epidemiology: The Utility of Concepts from Technoscience Studies.” Sociology of Health and Illness 24, 129–50.

Smith, D., G. Clarke, and K. Harland. (2009). “Improving the Synthetic Data Generation Process in Spatial Microsimulation Models.” Environment and Planning A 41, 1251–68.

Smith, D., J. Pearce, and K. Harland. (2011). “Can A Deterministic Spatial Microsimulation Model Provide Reliable Small-Area Estimates of Health Behaviors? An Example of Smoking Prevalence in New Zealand.” Health and Place 17, 618–24.

Soja, E. W. (1980). “The Socio-Spatial Dialectic.” Annals of the Association of American Geographers 70, 207–77.

Voas, D., and P. Williamson. (2001). “Evaluating Goodness-of-Fit Measures for Synthetic Microdata.” Journal of Geographical and Environmental Modelling 1(2), 177–200.

Williams, G. (2003). “The Determinants of Health: Structure, Context and Agency.” Sociology of Health and Illness 25(1), 131–54.

Williamson, P., M. Birkin, and P. Rees. (1998). “The Estimation of Population Microdata Using Data from Small Area Statistics and Samples of Anonymised Records.” Environment and Planning A 30, 785–816.

Wilson, K., J. Eyles, S. Elliott, and S. Keller-Olaman. (2009). “Health in Hamilton Neighbourhoods: Exploring the Determinants of Health at the Local Level.” Health and Place 15(1), 374–82.

Windle, M. (2010). “A Multilevel Developmental Contextual Approach to Substance Use and Addiction.” BioSocieties 5(1), 124–36.