Spatial Autoregressive Model for Population Estimation at the Census ...

Spatial Autoregressive Model for Population Estimation at the Census Block Level Using LIDAR-derived Building Volume Information

Fang Qiu, Harini Sridharan, and Yongwan Chun

ABSTRACT: The collection of population data by census is laborious, time consuming, and expensive, and the data are often available only at limited temporal and spatial scales. Remote sensing based population estimation has been employed as a viable alternative for providing population estimates based on indicators that make use of two-dimensional areal information of buildings or one-dimensional length information of roads. The recent advancement of LIDAR remote sensing provides the opportunity to add the third dimension of height information into the modeling of population distribution. This study explores the use of building volumes derived from LIDAR as a population indicator. Our study shows that the volume-based model consistently outperforms area- and length-based models at the census block level. Additionally, the study examines the impact of spatial autocorrelation, the presence of which violates the independence assumption of traditional OLS models. To address this problem, a spatial autoregressive model is employed to account for the spatial autocorrelation in the regression residuals. By incorporating the spatial pattern, the volume-based spatial error model achieves a goodness of fit (R²) of 85 percent, with a significant improvement in model performance and estimation accuracy in comparison with its OLS counterpart. The study confirms building volume as a more valuable indicator and estimator for block level population distribution, especially if an appropriate spatial autoregressive model is adopted.

KEYWORDS: Population estimation, Lidar, building volume, spatial models

Introduction

The distribution of population is one of the fundamental inputs for many research tasks (such as urban planning, facilities allocation, and quality of life assessment) because the dispersion of resources and energy among various geographical areas is strongly dependent on the population size (Hoque 2008). Traditional methods for the collection of population data involve a census, which entails extensive planning, surveying, and data processing and is therefore time consuming, laborious, and expensive (Lo 2006). Due to the huge financial investments involved, census data, such as those

Fang Qiu, University of Texas at Dallas, 800 W Campbell Rd. GR31, Richardson, Texas 75080-3021. Tel: 972-883-4134.
Harini Sridharan, University of Texas at Dallas, 800 W Campbell Rd. GR31, Richardson, Texas 75080-3021.
Yongwan Chun, University of Texas at Dallas, 800 W Campbell Rd. GR31, Richardson, Texas 75080-3021. Tel: 972-883-4719.

provided by the U.S. Census Bureau, are usually collected only at a fixed time period (e.g., once every ten years) and at a set of pre-defined geographic units (e.g., the census tract, block group, and block). The fact that census data are provided at limited temporal and spatial scales restricts their application to tasks that do not critically rely on the most current information on the population distribution at detailed levels. In addition to conducting a census every ten years, the U.S. Census Bureau also provides intercensal population count estimates at state, county, and city levels based on projection techniques (Smith 1998), but the estimates are not available at any finer geographical scale such as the tract or block level. The U.S. Census Bureau has recently introduced the American Community Survey (ACS), which aims at providing population information every year instead of every ten years. The ACS selects a sample of households for surveying and provides yearly estimates based on this sample. Currently, only three-year estimates from 2006-2008 are available and five-year estimates are expected to

Cartography and Geographic Information Science, Vol. 37, No. 3, 2010, pp. 239-257

be available by the end of 2010. These estimates will be available only at the tract level or above and will remain unavailable at the more detailed block group and block levels. To support activities that need suitable population information at a fine-scale level for a given year between two consecutive decennial censuses (e.g., emergency response planning), population estimation is still needed. At this scale, reliance must be placed on third-party commercial demographic models, which are often expensive and complex. These models may provide reasonable estimates but often involve significant manpower for demographic analysis. Because multiple inputs must be collected, the success of the models relies heavily on the quality of these inputs and on the performance of the models in earlier time periods, which are not always reliable (Qiu et al. 2003).

In order to estimate population at various levels of detail, remote sensing imagery and its derived datasets have been employed as viable alternatives in population prediction. Remote sensing provides a synoptic view of a large area, which can be acquired at a small and regular time interval or within a short period of time when it is needed (Lo 2006). With the right techniques, remote sensing based population estimation can serve as a cheaper and less laborious replacement for commercial demographic models. Over the past 40 years, a variety of remote sensing products with different spatial resolutions have been employed to estimate population at different scale levels. For example, low to medium resolution satellite images, such as Landsat TM imagery, have been used to conduct city level population estimation (Iisaka and Hegedus 1982; Qiu et al. 2003; Wu and Murray 2007), while large-scale aerial photographs have been employed to support the modeling of population counts at a community level, such as the census tract (Porter 1956; Dueker and Horton 1971; Lo 1989).
The advent of IKONOS, Quickbird and other very high spatial resolution satellite imagery enables us to drill down to even more detailed levels, such as the block and block group (Haverkamp 2004; Lu et al. 2006; Liu et al. 2006; Chen 2002). The increasing refinement in spatial resolutions for aerial and satellite images and their growing availability to the public have motivated a wider adoption of remote sensing technology for population estimation, aimed at achieving better accuracy at a finer scale level than was previously possible. The promise of remote sensing technology for population estimation lies in the fact that its derived data products contain rich information on the

distribution of human settlements, which can serve as reliable indicators of population. In many past studies, a variety of indicator variables of human settlement distribution have been extracted from remotely sensed imagery to provide timely and detailed population estimation, including extent of urban development (Tobler 1969; Ogrosky 1975; Lo and Welch 1977; Iisaka and Hegedus 1982; Weber 1994), area of residential land use (Lo 1989; Wu et al. 2005; Lu et al. 2006; Qiu et al. 2003), counts of dwelling buildings (Lo 1995; Wu et al. 2008; Lwin and Murayama 2009), length of roads (Qiu et al. 2003), and spectral index of pixel values (Liu et al. 2006; Harvey 2002). Utilization of these variables to model population distribution makes use of only two-dimensional (2-D) areal or one-dimensional (1-D) length information. The recent development in Light Detection And Ranging (LIDAR) technology provides the opportunity to go beyond this limit (Xie 2006). With the accurate height information acquired by LIDAR, we can now incorporate a third dimension of information in population estimation. In this research, we explore the utilization of the volume of residential buildings as a potential indicator to estimate population distribution at fine scales, such as the census block.

Resolution refinement in remote sensing data offers the opportunity to model population at finer scales, but it also demands a corresponding enhancement in population estimation methodology. Previous approaches to population estimation have been largely based on ordinary least squares (OLS) regression. The OLS approach assumes that observations are independent of each other, ignoring the possible spatial relationship that may exist between them.
This assumption may not be valid for variables representing a geographically varying phenomenon, such as population, because as the spatial scale moves down, especially to the block level, the spatial autocorrelation of population among neighboring blocks is likely to become stronger (Brown 1995). This may lead to the violation of the OLS independence assumption, a problem that needs to be addressed using an improved methodology to ensure a valid model of population estimation.

The purpose of this research is to test whether the volume of residential buildings, derived using the latest LIDAR remote sensing techniques, can serve as an effective indicator variable to estimate population at the block level. The effectiveness of this 3-D volume based population estimation will be assessed using comparisons with traditional 2-D and 1-D approaches based on the area of residential

buildings, the area of residential land use, and the length of road networks. This paper proposes the use of spatial autoregressive regression to address the spatial autocorrelation problem of traditional OLS models when applied in fine-scale population estimation. This study will examine the possible presence of spatial autocorrelation in block level population and develop appropriate spatial models according to its type. The accuracy and robustness of these spatial models will be compared with those of the OLS models.

Literature Review

Estimation of population using remote sensing products dates back to the 1960s when Nordbeck (1965) found an exponential relationship between the population and area of urban settlement based on allometric growth models. This relationship was tested by Tobler (1969), who correlated the radius of built-up areas with population and established a correlation coefficient of 0.87. Popularization of Landsat imagery in the 1970s in land use / land cover mapping promoted the use of these data for population estimation at city and census tract levels. Lo and Welch (1977) estimated the population of Chinese cities by extending Nordbeck’s model to the measurement of built-up area extracted from Landsat images and achieved a correlation coefficient of 0.82. Chen (2002) used Landsat TM data to classify residential land uses into different density levels and modeled census counts as a function of the area of three residential land uses at high, medium, and low density levels. Qiu et al. (2003) modeled the growth in population at city and census tract levels using Landsat TM based change detection. It was also found that new developments in the road network were a better indicator of population growth than the change in urban land use. These studies usually estimate population at a predetermined geographic unit such as the city, county, or census tract, with population counts regressed against the area of urban or residential land use, or length of road networks, aggregated to that geographic unit.

Characteristics derived from the individual pixels in satellite images have also been utilized as population indicators. Iisaka and Hegedus (1982) established a mathematical relationship between population density and the radiance measured from Landsat data to estimate the counts. Li and Weng (2005) correlated population with various pixel level

variables using spectral signatures, principal components, vegetation indices, textures, and temperatures extracted from Landsat TM data. Harvey (2002) classified each pixel of Landsat TM data to delineate residential area and used an EM algorithm to iteratively regress the population at each pixel level with its spectral values and adjust the results through redistributing the errors at the aggregation zone into each of the individual pixels. With the availability of high spatial resolution remotely sensed imagery, residential areas can be delineated with a greater precision and accuracy so that the estimation of population can be attempted at finer levels. To estimate population at the block group level, Almeida et al. (2007) performed a multi-resolution segmentation on a Quickbird image to delineate homogeneous residential areas of different occupation density based on spectral, geometrical, and topological features of the segmentations as well as their context and semantics. With improvements in image resolution and processing technologies, it is now possible to model population based on the number of individual dwelling units automatically determined by object-oriented or object-based classifications (Haverkamp 2004; Zhang and Maxwell 2006), rather than the laborious manual counting from aerial photographs previously employed in many earlier studies (Green 1956; Hadfield 1963; Binsell 1967). However, as early as Green (1956) it was noticed that population was often underestimated when using dwelling counts derived from aerial photographs because the number of vertically stacked dwelling units cannot be easily identified from the 2-D photographs. This problem persists in the automatic determination of dwelling units from very high resolution satellite images because usually only the roof of the buildings is visible from the satellite’s perspective. 
Population estimation should be improved by incorporating third-dimension height information, which can provide an insight into the structure of the dwelling units. The potential of LIDAR data for the extraction of building height (Maas and Vosselman 1999; Barnsley et al. 2003) and the reconstruction of the 3-D building structure (Hongjian and Shiqiang 2006; Gurram et al. 2007; Sampath and Shan 2007) has been explored in the recent literature. However, population estimation using LIDAR-derived building structure information is still in its nascent stage, and a search of the literature yielded only a few related works. Xie (2006) proposed a theoretical framework of areal interpolation for population to the level of housing

units. The housing units were extracted and classified based on high resolution DOQQ imagery and LIDAR data. The population was then allocated to each unit based on a variety of factors, one of which was the volume of buildings. This theoretical framework could be used to estimate population at any aggregation zone level, including the census block. However, no actual population estimation with detailed accuracy assessment was conducted. Wu et al. (2008) developed a deterministic equation for estimating population at block and sub-block levels by multiplying the number of housing units by the occupancy rate and the average number of people per unit. The number of housing units was derived from the ratio between LIDAR-derived building volume and the average space per unit. The research was able to achieve an average percent error of 0.11 at the block level because, in addition to the remotely sensed building volume, detailed information on the occupancy rate, average space per housing unit, and average number of persons per household was also included. These variables, however, are not always available for intercensal years.

The research reported here is based on the premise that the volumetric information of a building can serve as a better indicator than building counts or 2-D area data for population estimation at a fine-scale level. Volumetric information derived from building heights automatically extracted from LIDAR can provide a better surrogate for the number of residential units in buildings with one or more stories (i.e., vertically stacked structures). However, an appropriate methodology must be employed to take advantage of this detailed information. Most remote sensing based population estimation models utilize ordinary least squares (OLS) to fit a multivariate regression between population counts and various demographic indicators (Wu et al. 2005).
Ordinary least squares models assume the independence of the observations (Bailey and Gatrell 1995), which is rarely true for a geographically varying variable such as population. This assumption ignores the spatial process of the underlying phenomena and its effect on the estimation. Modeling of geographically varying variables based on OLS is therefore likely to produce biased or inefficient estimation (Anselin 1988). For example, Lo (2008) discovered that the relationship between population and classified land cover/land use is not constant across space (non-stationarity), a problem that OLS models cannot deal with. He applied a local geographically weighted regression (GWR) model for population estimation that permits the regression coefficients to vary geographically and achieved an improvement in estimation accuracy of 28 percent over the OLS model at the census tract level. Lo (2008) suggested that the spatial non-stationarity problem can be mitigated if smaller spatial units are used. At finer scales such as the census block level, the spatial process determining population does not vary as rapidly as do processes at much coarser levels, and can often be considered to be stationary.

However, another problem appears at this level, which is the presence of spatial autocorrelation. In the case of population, this means that counts tend to have similar values within a short distance. As a result, spatial autocorrelation tends to be more pronounced at a finer-scale level, because population counts vary minimally from those of neighboring units when aggregated in smaller zones such as census blocks. According to the Geographic Area Reference Manual provided by the U.S. Census Bureau (1994), census blocks in urban and surrounding suburban areas are designed as relatively small grids bounded by road networks and may be disrupted here and there by water features, topographic relief, and transportation features such as rail yards and airports. If a block contains an array of residential buildings, then it is highly probable that its neighboring blocks also contain similar residential developments, leading to a corresponding correlation in the population count with neighboring blocks. To address this spatial autocorrelation issue, spatial autoregressive models (Anselin 1988) need to be used in place of OLS models. The current study implements this by first examining the spatial pattern in the dependent variable and the OLS regression residuals using a set of spatial diagnostics. Then an appropriate spatial autoregressive model is selected, which explicitly accounts for the spatial autocorrelation present in the population estimation.

Study Area and Data

To test our proposed approach, we selected a part of Round Rock, Texas, as our study area (Figure 1). Located 15 miles north of Austin, Round Rock is a suburban city in the Austin-Round Rock metropolitan area and the home of a major computer manufacturer, Dell Inc. According to American FactFinder by the U.S. Census Bureau, it is one of the fastest growing cities in the U.S., with a population of 63,136 in 2000 and an estimate of 86,175 in 2006, a 40.95 percent increase over a period of six years. A subset of the city located between 30°30′ N and 30°31′ N, and between 97°39′ W and 97°41′ W was used in this study, which covered an area of 10 square km, comprised 220 census blocks, and had a total population of 9550 residents in 2000. This is a typical residential area with a mixed composition of single and multifamily buildings.

Figure 1. Census block population density of the study area.

The population data were obtained from the 2000 U.S. Census Bureau block data (Figure 1). The 220 blocks in the study area had populations ranging from 0 to about 1290 people per block in 2000. Since a non-zero intercept model is to be developed, blocks with zero population were eliminated, which resulted in 175 blocks available for subsequent analyses. The blocks are mostly near rectangular or square grids, with a few large, irregularly shaped blocks in the central part of the region due to a water body. A visual examination of the population map suggested a noticeable positive spatial autocorrelation in the distribution of population. For example, blocks with high population density were clustered predominantly in the north-western and lower southern portions of the study area.

The various population indicators used in the study, including the area of residential land use, the length of road networks, and the area and volume of residential buildings, were either extracted from remote sensing imagery or obtained from remote sensing derived GIS data provided by the City of Round Rock’s GIS Information Center. The parcel-level land use and road network GIS data were both derived from 2006 aerial photographs using a combination of automated digital image processing and visual photo interpretation. The parcel-level land use data include 16 categories, and only the categories for residential land use (that is, single-, two-, and multi-family) were retained for analysis. The road networks in the study area are mostly city streets surrounding the blocks, except for a U.S. Highway running through the middle of the city.
Since a block is often surrounded by roads or other linear features, and there may be a few internal roads within the block, one of which may go completely through, the total length of the street network surrounding each block plus its internal roads is used as an indicator of population per block. Building footprints were derived from high-resolution aerial photographs by the Round Rock GIS Information Center through combined digital image classification and visual photo interpretation; aerial photographs are the most accurate source currently available for footprint data. Building footprints can also be extracted automatically using LIDAR data (Maas and Vosselman 1999; Barnsley et al. 2003). However, since the average resolution for most LIDAR data currently available for cities is often coarser than that of high-resolution aerial

photographs and a perfect differentiation between buildings and nearby tall trees is still very challenging, the accuracy of the subsequent building volume calculation will be compromised unless substantial post-processing manual editing is conducted. Buildings used for residential purposes were identified by overlaying them with the parcel level land use data described above. Both land use and building footprint data were available only for the year of 2006, but block-level population census data were available for the year 2000. Visual comparison of 2000 and 2006 aerial photographs indicated that no buildings present in 2000 were demolished in 2006, and the only changes since 2000 were new housing developments. In order to derive a footprint map for the year of 2000, any new housing developments that appeared only on the 2006 aerial photographs were eliminated. The LIDAR dataset was collected by Williamson County in 2006 with an average point spacing of 1.4 m and a vertical accuracy of 10 cm. A similar approach was used to eliminate all LIDAR points corresponding to the buildings constructed after 2000 to match the 2000 census data. The heights of residential buildings were determined using a method aimed at extracting height information from LIDAR point cloud data. First a two-way buffer along the boundary of the building footprint is created with a buffer distance of one foot on both sides of the boundary (Figure 2). This buffer serves as an exclusion zone that excludes the LIDAR points potentially bounced off from the walls of the buildings. Then a one-way buffer is performed along the outer boundaries of these exclusion zones using a buffer distance of two feet. The LIDAR points within the outer buffer zone are primarily those corresponding to returns from the ground. The median elevation of the points within this buffer zone is then calculated and taken as the elevation of the ground around the building. 
The use of the median value instead of the mean prevents the influence of extreme values from points reflected from tall trees or low pits. Another one-way buffer is then created along the inner boundaries of the exclusion zones, also with a buffer distance of two feet. The median elevation of the points within the inward buffer zone is used as the elevation of the building roof. The use of the median value in this case prevents extreme values from points reflected from TV antennas, chimneys, and other above-roof structures from influencing the analysis. Next, the difference between the median elevations of the inward and outward buffer zones is computed and assigned to each building as the above-ground height of the building. The building height thus determined is then multiplied by the area of the building footprint to obtain the volume of individual buildings. Finally, the volumes of all the individual buildings within a block are aggregated for use as a block level indicator variable in the subsequent regression analysis.

Figure 2. Building height determination from LIDAR.
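The height and volume steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the `(signed_distance, elevation)` point representation, and the sample numbers are hypothetical, and the signed distance of each LIDAR return to the footprint boundary (negative inside, positive outside) is assumed to have been computed beforehand in a GIS. All lengths are assumed to share one linear unit.

```python
from statistics import median


def building_height_and_volume(points, footprint_area, exclusion=1.0, ring=2.0):
    """Estimate above-ground height and volume of one building.

    points: iterable of (signed_distance, elevation) pairs, where
        signed_distance is the distance from the LIDAR return to the
        footprint boundary (negative inside, positive outside).
        This representation is an assumption made for the sketch.
    exclusion: half-width of the two-way exclusion buffer (wall returns).
    ring: width of the outward (ground) and inward (roof) buffers.
    """
    # Outward buffer: returns just beyond the exclusion zone, mostly ground.
    ground = [z for d, z in points if exclusion < d <= exclusion + ring]
    # Inward buffer: returns just inside the exclusion zone, mostly roof.
    roof = [z for d, z in points if -(exclusion + ring) <= d < -exclusion]
    if not ground or not roof:
        raise ValueError("not enough LIDAR returns in one of the buffer zones")
    # Medians resist outliers such as trees, pits, chimneys, and antennas.
    height = median(roof) - median(ground)
    return height, height * footprint_area


# Hypothetical returns: five ground points (one hits a tree), five roof
# points (one hits a chimney), and one wall point inside the exclusion zone.
sample_points = [
    (2.0, 100.0), (2.5, 101.0), (1.5, 99.0), (2.2, 100.0), (2.8, 130.0),
    (-2.0, 110.0), (-2.5, 110.0), (-1.5, 109.0), (-2.2, 111.0), (-2.8, 118.0),
    (0.5, 105.0),
]
height, volume = building_height_and_volume(sample_points, footprint_area=150.0)
```

Block-level volume would then be the sum of the per-building volumes for all residential buildings inside the block.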

Methodology

To test whether the volumes of residential buildings, derived from the additional height information furnished by the LIDAR data, can serve as a valid and reliable indicator for population estimation, an OLS-based allometric growth model is first developed. The results from this model are compared with similar OLS models using the traditional indicator variables, such as area of residential buildings, area of residential land use, and length of road network in a block. To determine the potential presence of spatial autocorrelation in the model, and to identify an appropriate spatial autoregressive model to address it, a series of diagnostics were performed.

To estimate model parameters and to evaluate model performance, we divided the study area into north and south sections along U.S. Highway 79, which passes through the region (Figure 1). The south section of the study area contains 102 blocks and was used for training the models, and the north section has 73 blocks used for testing the models. Unlike the commonly used spatially random selection of training and testing sites, the practice of dividing the study area into two sections preserves the spatial contiguity of the census blocks and allows the construction of the spatial connectivity matrix, a requirement of the spatial autoregressive model for both training and testing purposes. A set of error measures is calculated using the north section testing data in order to evaluate the performance of models derived from the training data in the south section.

The OLS Models

The OLS models used in this study are based on the widely used allometric growth equation (Nordbeck 1965), which suggests an exponential relationship between population (P) and built-up area (A), as shown in Equation (1):

$$P = \alpha \times A^{b} \tag{1}$$

where α and b are model coefficients. The allometric growth equation has been modified by using other 1-D or 2-D population indicator variables. For example, Tobler (1969) replaced the built-up area in the original allometric growth equation with the radius of urban development (r) to estimate population, assuming that cities are usually circular in shape. We extended this idea by substituting the volume of residential buildings, aggregated at the census block level, for the built-up area, as shown in the following equation:

$$P = \alpha \times V^{\beta} \tag{2}$$

where β is the rate at which the population (P) changes with the volumes of residential buildings (V). When the value of β is less than 1, it indicates a negative allometry, where the population grows at a slower rate than the volumes of the residential buildings. A β value close to 1 indicates that the growth rates of both variables are equal and population growth is in a state of isometry. A positive allometric growth, where the population grows at a higher rate than building volume, occurs when β is greater than 1. In order to fit a regression model of population on the volume of residential buildings using OLS, a logarithmic transformation is applied to both sides of the allometric growth Equation (2). As a result, the original exponential equation is converted into a linear form as follows:

$$\log(P) = \log(\alpha) + \beta \cdot \log(V) \tag{3}$$

A scatterplot of the log of building volumes versus the log of population at the block level confirms there is a strong linear relationship between the two variables (Figure 3). The Pearson’s correlation coefficient between population and building volume was 0.914. The logarithmic transformation is often performed to make count data (which are commonly assumed to follow a Poisson distribution) normally distributed. It also ensures a linear functional form for the allometric growth Equation (2). The OLS model of population estimation involves fitting a linear statistical relationship (Burt and Barber 1996) between a dependent variable (Y) and one or more independent variables (X):

$$Y = X\beta + e \tag{4}$$

where β is the vector of regression parameters, and e is the residual. The best linear unbiased estimates of the regression parameters are determined by minimizing the sum of squares of the residuals (e). In this study, population at the census block level is first estimated using the volume of residential buildings based on the linearized allometric Equation (3). For comparison, OLS models of population estimation were also developed based on various traditional 2-D and 1-D indicator variables, including area of residential buildings, area of residential land use, and length of road network. The total area of residential buildings in each census block was estimated using the residential building footprint data. The total area of residential land use was aggregated from the parcel level residential land use data. The area of residential land use is different from (and larger than) that of the residential buildings because it also includes the area of the parcel land surrounding the buildings. The length of the roads around a block was summed and used as another indicator variable for population estimation. To be consistent with the volume-based model (Equation (3)), logarithmic transformations of these 2-D and 1-D variables and of the population counts were performed.
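As an illustration of the linearized allometric fit in Equation (3), the following Python sketch estimates α and β by simple OLS on log-transformed values. It is only a sketch under stated assumptions: the function names and the synthetic block data are hypothetical, natural logarithms are used (the base affects only how α is recovered, not β), and all volumes and populations are assumed to be strictly positive, which is consistent with the removal of zero-population blocks.

```python
import math


def fit_allometric(volumes, populations):
    """Fit log(P) = log(alpha) + beta * log(V) by ordinary least squares.

    Returns (alpha, beta). Inputs must be strictly positive.
    """
    x = [math.log(v) for v in volumes]
    y = [math.log(p) for p in populations]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form OLS slope and intercept for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    log_alpha = mean_y - beta * mean_x
    return math.exp(log_alpha), beta


def predict_population(alpha, beta, volume):
    """Back-transform to the original scale: P = alpha * V ** beta."""
    return alpha * volume ** beta


# Synthetic, noise-free block data generated with alpha = 0.05, beta = 0.9
# (illustrative values only, not results from the study).
block_volumes = [1_000.0, 5_000.0, 20_000.0, 80_000.0]
block_populations = [0.05 * v ** 0.9 for v in block_volumes]
alpha_hat, beta_hat = fit_allometric(block_volumes, block_populations)
```

A fitted β below 1 would correspond to the negative allometry described above, where population grows more slowly than building volume.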

Spatial Diagnostics

Spatial autocorrelation is a direct implication of Tobler’s law (Tobler 1970), which states that things nearer to each other are more similar than those farther away. Spatial autocorrelation commonly occurs with geographically varying attributes such as population, as suggested by the visual examination of the block-level population distribution in Figure 1, where blocks with similar population counts appear to cluster in space. The existence of spatial autocorrelation makes the use of OLS-based regression questionable because its assumptions require that the observations are independent of each other or that the residuals from the OLS estimation are uncorrelated (Haining 1990). Violation of these assumptions may result in biased and inefficient estimation of the parameters of the linear regression.

In addition to the above casual visual examination of Figure 1, which suggests a spatially autocorrelated data distribution, a series of diagnostics can also be performed to statistically determine the presence of spatial autocorrelation and to identify the form of the spatial process (spatial lag or spatial error) causing it. Moran’s I (Moran 1950) is often used to measure the degree of spatial autocorrelation of the dependent variable and the residuals of the OLS model. Although useful in identifying the general misspecification of the OLS model, Moran’s I falls short of providing any insight into the form of the spatial process causing the autocorrelation. Therefore, further evaluation is needed to decide which of the two common spatial processes, spatial lag or spatial error, is the cause.

Figure 3. Relationship between population and volume of residential buildings.

Vol. 37, No. 3

Lagrange Multiplier (LM) tests can be used for this purpose (Anselin 1996). There are two different types of LM tests: LM Lag and LM Error. The LM Lag test evaluates if the lagged dependent variable should be included in the model, and the LM Error test assesses if the lagged residual should be included. Additionally, Robust LM tests can be conducted to verify if the spatial lag dependence is robust and therefore spatial error dependence can be ignored, or vice versa.
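The global Moran’s I statistic mentioned above can be illustrated with a minimal sketch. The five-block layout, the binary (not row-standardized) contiguity weights, and the values below are hypothetical and chosen only so that the result can be checked by hand; they are not the study data, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in weights)                 # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))      # weighted cross-products
    ss = sum(d * d for d in dev)                          # sum of squared deviations
    return (n / s0) * (cross / ss)


# Hypothetical example: five blocks in a row, each touching its immediate neighbors.
contiguity = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
clustered = [1.0, 2.0, 3.0, 4.0, 5.0]     # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]   # dissimilar values sit next to each other

i_clustered = morans_i(clustered, contiguity)       # positive autocorrelation
i_alternating = morans_i(alternating, contiguity)   # negative autocorrelation
```

A positive value indicates that neighboring blocks tend to have similar values, and a negative value indicates the opposite pattern.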

Spatial Autoregressive Models
The biased and inefficient results of the OLS models that are due to the presence of spatial autocorrelation necessitate the adoption of regression approaches that explicitly account for the autocorrelation. Without taking spatial autocorrelation into consideration, the OLS models may result in biased and inconsistent model coefficients with inefficient variance estimates and incorrect confidence intervals, which may lead to Type I errors, that is, rejecting the null hypothesis when it is actually true (Anselin 1988; Bailey and Gatrell 1995). In simple terms, spatial autocorrelation may cause the researcher to conclude that a relationship exists when that is not actually the case. Two common spatial autoregressive models can be employed: the spatial lag model and the spatial error model. In the spatial lag model, the value of a dependent variable Y at a location is modeled as a function of the independent variables X in that location as well as the values of the dependent variable at the neighboring locations, that is, the spatial lag. A spatial lag is basically the weighted average of the dependent variable values at the neighboring locations (Anselin 1988), included as an additional explanatory variable in the model as shown in the following equations.

$$Y = \rho W Y + X\beta + \varepsilon \tag{5}$$

or

$$Y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon \tag{6}$$

where ρ is the coefficient of the spatial lag component, ε is white noise, W is a spatial weight matrix, and I is an identity matrix. The spatial error model addresses the spatial autocorrelation existing in the regression residuals (ε) of the OLS models. The value of the dependent variable Y in a location is redefined as a function of the independent variables X and the regression residuals of the neighboring locations, that is, the spatial error. A spatial error is fundamentally a weighted average of the individual residuals of the neighboring locations, which is added into the model as an additional explanatory variable, as shown in the following equations.

$$Y = X\beta + \xi, \quad \xi = \rho W \xi + \varepsilon \tag{7}$$

or

$$Y = X\beta + (I - \rho W)^{-1} \varepsilon \tag{8}$$

where ρ is the coefficient for the spatial autocorrelation. In this paper, the spatial weight matrix, W, is defined as a row standardization of the first order spatial contiguity matrix, C, whose element c_ij equals 1 if census blocks i and j share a common boundary or vertex, and 0 otherwise. While only the first order neighbors are considered in W, spatial influence is not limited to just immediately adjacent neighbors. The matrix $(I - \rho W)^{-1}$ used in the spatial lag models (Equation (6)) and spatial error models (Equation (8)) incorporates the influence of higher order neighbors (LeSage and Pace 2009). Unlike W, which is a sparse matrix with 0s for all the higher order neighbors, this matrix is no longer sparse as a consequence of the inverse operation. Since most of the elements in this matrix have a non-zero value, the influence of higher order neighbors is implicitly considered in the spatial autoregressive models. As mentioned earlier, such a spatial weights matrix requires that testing and training data be based upon the division of the study area into two regions so that the spatial contiguity of the samples will not be disrupted by a commonly used spatially random selection scheme. Similar spatial autoregressive models were also developed using the other 1-D and 2-D indicator variables, so that the results of these models could be directly compared.
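The point about higher order neighbors can be made concrete with a small sketch. It row-standardizes a hypothetical four-block contiguity matrix and approximates $(I - \rho W)^{-1}$ by the power series $I + \rho W + \rho^2 W^2 + \dots$, which converges for a row-standardized W when |ρ| < 1. The layout, the value ρ = 0.5, and the function names are illustrative assumptions, not values or routines from the study.

```python
def row_standardize(c):
    """Divide each row of a 0/1 contiguity matrix by its row sum."""
    out = []
    for row in c:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spatial_multiplier(w, rho, terms=200):
    """Approximate (I - rho*W)^-1 by summing rho^k * W^k for k = 0..terms."""
    n = len(w)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]                     # starts as the identity
    for _ in range(terms):
        power = [[rho * x for x in row] for row in matmul(power, w)]
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical layout: four blocks in a row (block 0 touches 1, 1 touches 2, ...).
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w = row_standardize(contiguity)
multiplier = spatial_multiplier(w, rho=0.5)
```

In this example `w[0][3]` is zero because blocks 0 and 3 are not adjacent, while `multiplier[0][3]` is positive: the k-th term of the series carries the influence of k-th order neighbors, which is why the inverse is dense.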

Measures of Estimation Error
To assess the accuracy of the population estimation models derived from the 102 census blocks in the training data, several error measures were computed based on the difference between the true population (P) and estimated population (P’) of the 73 census blocks in the testing dataset. These measures included root mean square error, adjusted root mean square error, mean absolute percentage error, median absolute percentage error, and population weighted mean absolute error. Detailed equations for these measures are shown in Table 1. Root mean square error (RMSE) computes the square root of the mean squared error for the census blocks and is widely used to assess population estimation models (Fisher and Langford 1995). If the population of the blocks varies greatly, the blocks with high populations tend to have large estimation errors and therefore will heavily affect the RMSE measure, while blocks with low populations tend to have small estimation errors and less influence on the RMSE measure. For this reason, the adjusted root mean square error (Adj-RMSE) was proposed (Gregory 2000), which computes the square root of the mean squared percentage error so that the drastic impacts of the blocks with extremely large populations can be avoided. To avoid the possibility of accuracy overestimation due to cancelling out of negative and positive errors, the Mean Absolute Percentage Error (MAPE) can be computed, which takes the average of the absolute percentage error. The inclusion of percentage errors in Adj-RMSE and MAPE, however, may make them disproportionally affected by the blocks with extremely small population, because the same amount of estimation error in population counts may lead to a larger percentage error when the true population of the block is small. To overcome the problem caused by the impacts from both extremely large and extremely small values, Qiu et al. (2003) used the Median Absolute Percentage Error (MedAPE), which is the 50th percentile of the local absolute percentage error. Qiu et al. (2003) introduced another error measure called Population Weighted Mean Absolute Error (PWMAE), which uses estimated population as a weight to normalize the absolute percentage error. This measure has the advantage of eliminating the effects of extreme populations. MedAPE and PWMAE are believed to be more objective error measures that are not heavily impacted by blocks with either extremely high or low populations.

Table 1. Error measures based on the difference between the estimated and the true population.

Measure | Formula
Root Mean Square Error (RMSE) | $RMSE = \sqrt{\frac{\sum_{t=1}^{N_t} (P_t' - P_t)^2}{N_t}}$
Adjusted Root Mean Square Error (Adj-RMSE) | $Adj\_RMSE = \sqrt{\frac{\sum_{t=1}^{N_t} \left( \frac{P_t' - P_t}{P_t} \right)^2}{N_t}}$
Mean Absolute Percentage Error (MAPE) | $MAPE = 100 \times \frac{\sum_{t=1}^{N_t} \frac{|P_t' - P_t|}{P_t}}{N_t}$
Median Absolute Percentage Error (MedAPE) | $MedAPE = 100 \times \mathrm{Median}\left( \frac{|P_t' - P_t|}{P_t} \right)$
Population Weighted Mean Absolute Error (PWMAE) | $PWMAE = 100 \times \frac{\sum_{t=1}^{N_t} P_t' \times \frac{|P_t' - P_t|}{P_t}}{\sum_{t=1}^{N_t} P_t'}$

Results and Discussion
The results of the 3-D volume based model using the traditional OLS approach were compared with those derived from the OLS models based on conventional 2-D and 1-D population indicators, such as area of residential buildings, area of residential land use, and length of road network. Table 2 gives the coefficient of determination (R2 and pseudo R2) and the Akaike Information Criterion (AIC) computed from the training data, and the five measures of estimation error computed for the testing data. Among the spatial autoregressive models, only the results of the spatial error models are included, in line with the spatial diagnostics discussed below. The R2 for the OLS models, the pseudo R2 for the spatial models, and AIC all reveal the goodness of fit of a specific model to the training data. Specifically, R2 and pseudo R2 reflect how much variation of the population in the training area is explained by its respective indicator variable, i.e., whether or not the variable is a strong population indicator. The Akaike Information Criterion is a model selection criterion based on the distance between the estimates of the model and the true values, which allows models of different types to be compared directly (Akaike 1974). A smaller AIC value indicates a better goodness of fit and therefore the model with the lowest AIC is regarded as the best model. Since the OLS models report goodness of fit using R2, while the spatial autoregressive models use pseudo R2, they are not directly comparable (Veall and Zimmermann 1996). Therefore, AIC is the best measurement to compare OLS and spatial autoregressive models.

Table 2. Comparison of different population estimation models.

Models | R2/Pseudo R2 | AIC | RMSE | Adj-RMSE | MAPE | MedAPE | PWMAE | Moran’s I of residuals (p value)
OLS
Building volume based | 0.8361 | 131.04 | 28.42 | 0.4023 | 28.34 | 18.69 | 27.10 | 0.1708 (0.0073)
Building area based | 0.8153 | 139.41 | 53.58 | 0.7268 | 56.08 | 37.99 | 57.18 | 0.1305 (0.0257)
Land use area based | 0.5087 | 207.88 | 53.62 | 0.4381 | 39.72 | 43.49 | 38.32 | 0.4020 (0.0000)
Road length based | 0.6433 | 185.48 | 244.50 | 0.9090 | 48.85 | 38.71 | 215.13 | 0.2235 (0.0011)
Spatial Error Models
Building volume based | 0.8498 | 128.84 | 28.17 | 0.2880 | 23.74 | 18.16 | 24.27 | 0.0203 (0.3317)
Building area based | 0.8244 | 138.96 | 35.07 | 0.4840 | 33.86 | 20.59 | 34.92 | 0.0160 (0.3525)
Land use area based | 0.6666 | 189.61 | 53.88 | 0.4400 | 39.98 | 43.82 | 38.55 | -0.04463 (0.6456)
Road length based | 0.7068 | 178.55 | 74.77 | 0.5460 | 50.24 | 57.40 | 62.78 | -0.0337 (0.5945)

Among the four OLS models, the residential building volume based model achieves the highest R2 of 0.8361, indicating a strong fit for the training data. The volume-based OLS model also demonstrates consistently better population estimation when applied to the testing area, evidenced by the lower values for all five error measures when compared with those of the 2-D area and 1-D road length based OLS models. It is also observed that the indicator variables based on the characteristics of individual buildings (building volume and area) have better goodness of fit measures than parcel-level land use area and road length. Since residential buildings can provide more precise information on the actual distribution of the population, while parcel-level land use and road network are only proxies of where the population may be located, it is understandable that the two building-level indicator variables can better explain the variations of the population in the training data. For the same reason, one would expect that these indicators would also produce better population estimation when the OLS models are applied to the testing area. However, this is only true for the building volume based model, but not for the building area based model. As a matter of fact, out of the five measures of estimation error, the building area based model only has two measures (RMSE, MedAPE) smaller than those of the land use area based model, and four measures (RMSE, Adj-RMSE, MedAPE, and PWMAE) smaller than those of the road length based model. A high goodness of fit (R2 = 0.8153) obtained using training data accompanied by inconsistent error measures derived from the testing data is often an indication of possible over-fitting of the


model to the training data. Even with higher-resolution images, building area is not a stable indicator variable for population estimation. This is probably because “building area” covers only the area of the building footprint, rather than the total area of all the residential units within the building. Consequently, the population living in buildings with vertically stacked residential units is likely to be underestimated, and the population in one-story buildings overestimated. In contrast, the building volume measure, which embeds additional third dimension information on height, is the actual total volume of all the residential units in the building. Consequently, the ability of the LIDAR data to inform about the vertical stacking of the buildings makes volume a stable indicator of population at the residential unit level, which is the finest level of detail at which quantitative estimation of population can be currently achieved using remote sensing data. This explains why the building volume model has been demonstrated to be both a stronger population indicator and a more accurate estimator than the building area. Another interesting finding is that at the block level, the road length based model, when compared with that of the land use area based model, has lower accuracies in population estimation for the testing area, as revealed by the larger errors for four of the five measures (RMSE, Adj-RMSE, MAPE, and PWMAE). This disagrees with what was previously found at much coarser scales (Qiu et al. 2003), where at both the city and census tract


levels, population growth models based on road development were mostly better than those based on land use change detected in satellite imagery. The fact that road networks traverse cities and census tracts, but not always census blocks, may help to explain the contradicting results obtained at these two different scales. The use of high spatial resolution aerial photographs to derive parcel level land use as opposed to utilizing low spatial resolution TM satellite imagery to obtain urban land use may also contribute to the difference in findings. This clearly demonstrates the famous “modifiable areal unit problem,” in which different results may be expected from data with different geographic units (here, census blocks versus tracts, or cities) or scales.

In order to use the OLS models for real world population estimation, the models need to be validated against the OLS assumptions of normality and independence. To test for the normality of the log transformed population counts, graphic diagnostics and a statistical significance test were conducted. For the graphic diagnostics, a normal QQ-plot of the regression residuals from the volume-based OLS model was produced (Figure 4). When the quantiles of the regression residuals are plotted against the quantiles of the standard normal probability distribution, the points lie approximately on a straight line, indicating that the residuals are normally distributed. To assess the statistical significance, the Jarque-Bera test, which measures the departure from normality based on skewness and kurtosis, was performed (Jarque and Bera 1980). The null hypothesis of the Jarque-Bera test is that the skewness and excess kurtosis are jointly zero; in other words, the data tested come from a normal distribution. This test resulted in a Jarque-Bera statistic of 1.8909 with a p value of 0.3885, failing to reject the null hypothesis at the 95 percent confidence level, which is consistent with normally distributed OLS residuals.

Figure 4. Normal QQ-Plot of the regression residuals of the OLS volume based model.

The spatial diagnostics described above were also used to test the independence assumption of the OLS models, based on both the dependent variable (i.e., log of population) and the regression residuals. Moran’s I for the dependent variable illustrated a significantly positive spatial autocorrelation in population (with a Moran’s I value of 0.5587 and p value of 0.0000). This concurs with the conclusions drawn from the visual examination of the block level population map (Figure 1). The volume indicator variable of the OLS model can partially account for the spatial autocorrelation in population. However, the residuals of the OLS model still present a significant level of positive spatial autocorrelation, having a Moran’s I value of 0.1708 and p value of 0.0073. The presence of spatial autocorrelation in the data violates the independence assumption of the OLS regression and demands explicit treatment with a spatial autoregressive model. In order to select between the spatial lag and spatial error models, LM and LM Robust tests were performed. The LM Lag test and LM Lag Robust test both resulted in positive test statistics (0.4063 and 0.1890, respectively) but, with p values of 0.5239 and 0.6637, respectively, they fell short of rejecting the null hypothesis of zero lag dependence at the 95 percent confidence level. The LM Error and LM Error Robust tests do, however, reject the null hypothesis of zero error dependence with significant p values of 0.0434 and 0.0494, respectively. The LM and LM Robust tests thus consistently indicate that the spatial error model is the preferred alternative to use for addressing the spatial autocorrelation in the data. Similar tests for the normality and independence assumptions were performed on the other OLS models (detailed results not shown). The normality tests confirmed that the regression residuals of the OLS models are all normally distributed.
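The Jarque-Bera p value quoted above can be checked with a short sketch. It assumes the usual form of the statistic, JB = n/6 × (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis, and the asymptotic chi-square reference distribution with 2 degrees of freedom, whose upper-tail probability is exp(−x/2). The function names and the toy residuals are illustrative and are not taken from the study.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n   # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n   # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, chi2_sf_2df(jb)


# p value implied by the statistic reported in the text (1.8909).
p_reported = chi2_sf_2df(1.8909)

# Hypothetical residuals, used only to exercise the function.
jb_toy, p_toy = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Under these assumptions `p_reported` agrees with the reported p value of 0.3885 to four decimal places.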
In addition, the spatial diagnostics detected the misspecification of these OLS models due to the presence of spatial autocorrelation and determined the spatial error model as the appropriate model for population estimation. The Moran’s I values in Table 2 show that while the residuals of the OLS models have a significant positive spatial autocorrelation, the spatial error models show no signs of spatial autocorrelation in their residuals, as evidenced by their low absolute Moran’s I values and large p values (>0.05). The spatial error models were built using the four indicator variables with the training data; they were subsequently applied to the testing data for population estimation. A general trend similar to that for the OLS models was observed for the spatial autoregressive models, with the building volume based model providing the best results, followed by the building area based model. The building volume based model again achieved the highest goodness of fit (pseudo R2 = 0.8498) and consistently produced the lowest values for all the five error measures, reconfirming volume as both the strongest indicator and the most accurate estimator for population. Models based on the characteristics of individual buildings (volume and area) outperformed models based on the characteristics of larger geographic units, not only in the models’ goodness of fit, but also in their estimation accuracy. In contrast to the OLS models, the building area based spatial model now has all five error measures lower than the road length based model and four error measures lower than the land use area based model, with the only exception being a slightly larger value in the adjusted RMSE, an error measure heavily affected by extreme values. This consistent improvement is attributable to the ability of the spatial error model to account for the autocorrelation in the residuals caused by unknown factors not included in the model. One of these unknown factors is obviously the aforementioned third dimensional building height information. The spatial error model based only on building area was able to partially compensate for the missing height information and achieve estimation accuracy closer to that of the OLS model based on building volume, which included the missing factor of height. The addition of the height information in the volume based spatial error model further enhanced the model’s ability in population estimation, evidenced by the reduction in values of error measurements. For example, the more objective PWMAE measure is now reduced by more than 10 percentage points (from 34.92 percent to 24.27 percent) when the additional height information is included. Compared to its OLS counterpart, the building volume based spatial model also exhibits improvement in estimation accuracy with a noticeable reduction in the error measures.

The likelihood ratio (LR) test was conducted to statistically confirm the improvement of the building volume based spatial error model over its OLS counterpart. The LR, defined as $LR = -2[\ln L(\upsilon_{ols}) - \ln L(\upsilon_{sp})]$ (where L(υ) is a likelihood), follows a chi-square distribution with m degrees of freedom (where m is the difference between the number of parameters of the two models). The LR test statistic for the OLS and spatial error models can be calculated with the AIC values because AIC = −2·lnL + 2·k (where L is a likelihood value and k is the number of parameters). For the building volume based models, the LR value is 4.20 (i.e., AICols − AICsp − 2(kols − ksp)) with 1 (i.e., ksp − kols) degree of freedom, and its p value is 0.0403, showing a significant improvement of the spatial error model over the corresponding OLS model. Presumably, this improvement is achieved by having addressed the spatial autocorrelation in the residuals, which might have been caused by other unidentified factors.

However, the enhancement in estimation accuracy for the two spatial models based on non-building characteristics is not as obvious or consistent as for the building characteristics based spatial models. All five error measures for the land use area based spatial model were minimally higher when compared to its OLS counterpart, and three of the five error measures for the road length based spatial model were increased, including the more objective MedAPE measure (from 38.71 percent to 57.40 percent). Also, the road length based model had a higher goodness of fit than the land use area based model. However, its estimation accuracy is still lower than that of the land use area based model, as demonstrated by higher values for all five error measures, which is consistent with the findings obtained with the OLS models. These varying and sometimes inconsistent results seem to suggest that the effectiveness of spatial autoregressive modeling is greatest when applied to more fine-scaled data such as residential building units rather than larger-scaled measures such as parcel area and road network length.

Table 3. Parameters of the spatial error models.

Model | Parameter | Estimate (p value)
Building Volume Based | Intercept | -4.8559 (0.0000)
Building Volume Based | Slope (β) | 0.9627 (0.0000)
Building Area Based | Intercept | -3.9148 (0.0000)
Building Area Based | Slope (β) | 1.0135 (0.0000)
Landuse Area Based | Intercept | -4.0417 (0.0000)
Landuse Area Based | Slope (β) | 0.7618 (0.0000)
Road Network Based | Intercept | -5.4257 (0.0000)
Road Network Based | Slope (β) | 1.3318 (0.0000)

Among all the models, the volume based spatial model has the lowest AIC value, showing the best goodness of fit. When compared with their OLS counterparts, the AIC values of the spatial models are consistently smaller for all four population indicators.
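The LR value and p value quoted above for the building volume based models can be reproduced from the AIC values in Table 2. The sketch below assumes parameter counts of 2 for the OLS model and 3 for the spatial error model; only their difference of 1, which is stated in the text, affects the result. The chi-square upper-tail probability with 1 degree of freedom is computed as erfc(sqrt(x/2)). The function names are illustrative.

```python
import math


def lr_from_aic(aic_restricted, k_restricted, aic_full, k_full):
    """Recover LR = 2*(lnL_full - lnL_restricted) from AIC = -2*lnL + 2*k."""
    return (aic_restricted - aic_full) - 2 * (k_restricted - k_full)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# AIC values from Table 2; the parameter counts 2 and 3 are assumed for illustration.
lr = lr_from_aic(aic_restricted=131.04, k_restricted=2, aic_full=128.84, k_full=3)
p_value = chi2_sf_1df(lr)
```

With these inputs the statistic evaluates to 4.20 and the p value to approximately 0.04, in line with the reported 0.0403; the small difference in the last digit is presumably due to rounding of the published AIC values.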
The improvement in goodness of fit in these spatial models is not a surprise because an additional factor in the form of a spatial component is included in the regression equations to explain the variation in the population. An examination of the regression parameters also helps to shed light on the spatial models (Table 3). The slope parameters (β) for all the spatial models were significant at the 95 percent confidence level. The volume based model had a slope parameter (β) of 0.9627, which is close to 1. The model based on building area had a similar slope parameter close to 1 (1.0135). These are consistent with the expectation that the development of residential buildings (and their associated volume and area) is fundamentally dependent on the degree of population growth in the area. The residential land use area based model has a lower value of 0.7618 for the slope parameter. This may be due to the fact that land use, once established, remains unchanged as more buildings are added to the established residential area to accommodate the growing population. The road length based model, however, is found to have a larger β of 1.3318, indicating a faster development rate for the road network than for population growth. This positive allometry is easy to comprehend since roads usually have to be constructed prior to the development of new residential communities. Further, road networks constantly expand in order to serve not only the residing population, but also the dynamic commuter population of a city.

The above analyses clearly identify building volume as a strong population indicator that produces superior estimation results at the census block level, especially when a spatial error model is employed. To further verify the validity of this conclusion, two additional visual assessments were conducted. First, a scatterplot was drawn between the observed population and the estimated population derived by the volume based spatial error model from 175 blocks in the study area (Figure 5). This scatterplot reveals a generally good fit between the two, except for a few outliers of overestimation and underestimation. To examine these outliers in detail, choropleth maps of the observed and estimated populations were produced (Figure 6). There is an overall good agreement between the two maps, with a few large blocks being overestimated and small blocks being underestimated, similar to the result obtained from the scatterplot. A detailed scrutiny of these overestimated and underestimated blocks reveals that the outliers are primarily located at the edge of the study area, where the edge effect of spatial autoregressive models tends to be high due to the lack of neighboring blocks (Bailey and Gatrell 1995). The division of the study area into training and testing sections also caused edge effects along the division line, although these were less pronounced than those along the boundaries of the study area. This is probably because the division line in this case is a U.S. highway, which imposed a natural barrier between the blocks on its two sides and led to reduced impacts from the blocks on the other side of the highway. The implicit assumption of a 100 percent occupancy rate for the buildings in this research might have contributed to the overestimation of a few large blocks in the middle of the study area.

Figure 5. Observed vs. estimated population from a volume-based spatial error model.

Assuming the isometric relationship between building volume and population (defined by the β parameter), and the spatial autocorrelation pattern (defined by the ρ parameter), remain relatively constant over a period of 10 years, the derived volume based spatial model could be used to predict the population in any year within that period. To this end, we applied the volume based spatial model developed above to the building volume data derived from the original building footprint and LIDAR data for 2006 and obtained a total population estimate of 17,245 people for the study area, reflecting a 44.62 percent population growth rate over the six-year period. This rate is comparable to the Census Bureau’s projected 40.95 percent growth rate for the entire city of Round Rock, further supporting the effectiveness of the building volume based spatial model in providing an accurate estimation of population at any given year between two decennial censuses.

Figure 6. Observed and estimated 2000 block level population derived from the volume based spatial error model.

Due to the lack of block level population data for 2006, the evaluation of the prediction was possible only at the aggregated level of the entire study area. Extending the model developed with data from a census year to predict the population at another year is based on the assumptions that the population density and the spatial configuration of the majority of the blocks do not change significantly between the two points in time (which would be less than 10 years), and the population increase results primarily from the expansion of new communities. The validity of these assumptions would be best verified with block level data from an intercensal year, but this is not normally available.
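The five error measures of Table 1, which underlie the accuracy comparisons in this section, can be sketched as follows. The function assumes that every block has a positive true population, since the percentage-based measures divide by it; the three-block input is hypothetical and chosen only so that the results can be verified by hand.

```python
import math
import statistics


def error_measures(true_pop, est_pop):
    """Compute the Table 1 error measures for paired true and estimated populations."""
    n = len(true_pop)
    diffs = [e - t for t, e in zip(true_pop, est_pop)]
    # Absolute error of each block as a proportion of its true population.
    ape = [abs(d) / t for d, t in zip(diffs, true_pop)]
    return {
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "Adj-RMSE": math.sqrt(sum(a * a for a in ape) / n),
        "MAPE": 100.0 * sum(ape) / n,
        "MedAPE": 100.0 * statistics.median(ape),
        # Weights each block's proportional error by its estimated population.
        "PWMAE": 100.0 * sum(e * a for e, a in zip(est_pop, ape)) / sum(est_pop),
    }


# Hypothetical blocks: true populations and model estimates.
measures = error_measures([100.0, 200.0, 50.0], [110.0, 180.0, 60.0])
```

Note that, as in Table 2, Adj-RMSE is expressed as a proportion while MAPE, MedAPE, and PWMAE are expressed in percent.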

Conclusions
Population estimation from remotely sensed data provides an alternative to complex and expensive commercial demographic models for estimating population in intercensal years. However, modeling population at such a fine spatial scale as the census block, and utilizing the latest LIDAR technology to extract the characteristics of individual residential buildings for use as indicator variables, had not previously been attempted. This study explored the modeling of population at the census block level by using building volumes which were derived from the footprint area extracted from high spatial resolution remote sensing imagery and building height obtained from LIDAR data. The analysis showed that the models based on volumetric information outperformed those based on conventional areal and linear indicators in both their goodness of fit and estimation accuracy, due to the inclusion of height information. This additional information has made building volume a more valuable population indicator at the residential unit level, the finest scale that can be achieved by remote sensing based population estimation. When the modeling of population moves to a finer scale level, the impacts of neighboring communities become more pronounced, a factor that has not previously been fully investigated. To address this issue, the study also examined the potential influence of spatial autocorrelation, the presence of which violates the independence assumption of the traditional OLS models widely used previously. Spatial error models were selected as the appropriate spatial autoregressive models to account for the spatial autocorrelation among the OLS model residuals. The accuracy assessment of the results demonstrated that the spatial error models consistently surpassed their corresponding OLS models in goodness of fit, and in estimation accuracy for the building-based indicators. Because of the incorporation of spatial patterns into the modeling process, the volume-based spatial autoregressive model achieved a significant improvement over its traditional counterpart, demonstrating that the spatial autoregressive model was not only the correct model to use but also a more effective approach to finer scale population estimation.

REFERENCES
Almeida, C.M., I.M. Souza, C.D. Alves, C.M. Pinho, M.N. Pereira, and R.Q. Feitosa. 2007. Multilevel object-oriented classification of QuickBird images for urban population estimates. In: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, November 7-9, 2007, Seattle, Washington. GIS ‘07. ACM, New York, New York, 1-8. [DOI: acm.org/10.1145/1341012.1341029.]
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6): 716-23.
Anselin, L. 1988. Spatial econometrics: Methods and models. New York, New York: Springer.
Anselin, L. 1996. Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics 26: 77-104.
Bailey, T.C., and A.C. Gatrell. 1995. Interactive spatial data analysis. Harlow, U.K.: Longman.
Barnsley, M.J., A.M. Steel, and S.L. Barr. 2003. Determining urban land use through an analysis of the spatial composition of buildings identified in LIDAR and multispectral image data. In: V. Mesev (ed), Remotely sensed cities. London, U.K.: Taylor & Francis. pp. 83-108.
Brown, D.G. 1995. Spatial statistics and GIS applied to internal migration in Rwanda, Central Africa. In: S.L. Arlinghaus and D.A. Griffith (eds), Practical handbook of spatial statistics. Boca Raton, Florida: CRC Press. pp. 156-60.
Burt, J.E., and G.M. Barber (eds). 1996. Elementary statistics for geographers. New York, New York: The Guilford Press. pp. 425-500.
Binsell, R. 1967. Dwelling unit estimation from aerial photography. Department of Geography, Northwestern University, Evanston, Illinois.
Chen, K. 2002. An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing 23(1): 37-48.
Couloigner, I., and T. Ranchin. 2000. Mapping of urban areas: A multi-resolution modelling approach for semi-automatic extraction of streets. Journal of the American Society for Photogrammetry and Remote Sensing 66(7): 867-75.
Dueker, K., and F. Horton. 1971. Towards geographic urban change detection systems with remote sensing inputs. Technical Papers. 37th Annual Meeting, American Society of Photogrammetry. pp. 204-18.
Fisher, P.F., and M. Langford. 1995. Modeling the errors in areal interpolation between zonal systems by Monte Carlo simulation. Environment & Planning A 27: 211-24.
Green, N.E. 1956. Aerial photographic analysis of residential neighborhoods: An evaluation of data accuracy. Social Forces 35: 142-7.
Gregory, I.N. 2000. An evaluation of the accuracy of the areal interpolation of data for the analysis of long-term change in England and Wales. Geocomputation 2000, Proceedings of the 5th International Conference on GeoComputation. Manchester, U.K.
Gurram, P., H. Rhody, J. Kerekes, S. Lach, and E. Saber. 2007. 3D scene reconstruction through a fusion of passive video and LIDAR imagery. Applied Imagery Pattern Recognition Workshop, 2007. 36th IEEE. pp. 133-8.
Hadfield, S.A. 1963. Evaluation of land use and dwelling unit data derived from aerial photography. Chicago Area Transportation Study, Urban Research Section, Chicago, Illinois.
Haining, R. (ed.). 1990. Spatial data analysis in the social and environmental sciences. Cambridge, U.K.: Cambridge University Press.
Harvey, J.T. 2002. Population estimation models based on individual TM pixels. Photogrammetric Engineering and Remote Sensing 68(11): 1181-92.
Haverkamp, D. 2004. Automatic building extraction from IKONOS imagery. In: Proceedings of ASPRS 2004 Conference, Denver, Colorado, May 23-28, 2004.
Hongjian, Y., and Z. Shiqiang. 2006. 3D building reconstruction from aerial CCD image and sparse laser sample data. Optics and Lasers in Engineering 4(6): 555-66.
Hoque, N. 2008. An evaluation of population estimates for counties and places in Texas for 2000. In: S.H. Murdock and D.A. Swanson (eds), Applied demography in the 21st century. Dordrecht, The Netherlands: Springer. pp. 125-48.
Jarque, C.M., and A.K. Bera. 1980. Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6: 255-9.
LeSage, J., and R.K. Pace. 2009. Introduction to spatial econometrics.
Boca Raton, Florida: CRC Press. Li, G., and Q. Weng. 2005. Using Landsat ETM+ imagery to measure population density in Indianapolis, Indiana, USA. Photogrammetric Engineering and Remote Sensing 71(8): 947-58. Lisaka, J., and E. Hegedus. 1982. Population estimation from Landsat imagery. Remote Sensing of Environment 12(4): 259-72. Liu, X., K. Clarke, and M. Herold. 2006. Population density and image texture: A comparison study. Photogrammetric Engineering and Remote Sensing 72(2): 187-96. Lo, C.P., and R. Welch. 1977. Chinese urban population estimates. Annals of the Association Cartography and Geographic Information Science

of American Geographers. Association of American Geographers 67(2): 246-53. Lo, C.P. 1989. A raster approach to population estimation using high-altitude aerial and space photographs. Remote Sensing of Environment 27(1): 59-71. Lo, C.P. 1995. Automated population and dwelling unit estimation from high resolution satellite images: A GIS approach. International Journal of Remote Sensing 16(1): 17-34. Lo, C.P. 2006. Estimating population and census data. In: M.K. Ridd and J.D. Hipple (eds), Remote Sensing of Human Settlements. ASPRS 5: 337-73. Lo, C.P. 2008. Population estimation using geographically weighted regression. GIScience & Remote Sensing 45(2): 131-48. Lu, D., Q. Weng, and G. Li. 2006. Residential population estimation using a remote sensing derived impervious surface approach. International Journal of Remote Sensing 27: 3553-70. Lwin, K. and Murayama, Y. 2009. A GIS Approach to Estimation of Building Population for Micro-spatial Analysis. Transactions in GIS, 13(4): 401-414. Maas, H.-G., and G. Vosselman. 1999. Two algorithms for extracting building models from raw laser altimetry data. ISPRS Journal of Photogrammetry and Remote Sensing 54: 153-63. Moran, P.A.P. 1950. Notes on continuous stochastic phenomena. Biometrika 37: 17-23. Nordbeck, S. 1965. The law of allometric growth. Discussion Paper 7, Michigan Inter University Community of Mathematical Geographers. Ann Arbor, Michigan: University of Michigan. Ogrosky, C.E. 1975. Population estimates from satellite imagery. Photogrammetric Engineering and Remote Sensing 41: 707-12. Porter, P.W. 1956. Population distribution and land use in Liberia, Ph.D. thesis, London School of Economics and Political Science, London, UK. p. 213. Qiu, F., K. Woller, and R. Briggs. 2003. Modeling urban population growth from remotely sensed imagery and TIGER GIS road data. Photogrammetric Engineering and Remote Sensing 69: 1031-42.

Vol. 37, No. 3

Sampath, A., and J. Shan. 2007. Building boundary tracing and regularization from airborne LIDAR point clouds. Photogrammetric Engineering and Remote Sensing 73(7): 805-12. Smith, A.S. 1998. The American community survey and intercensal population estimates: Where are the crossroads? Technical Working Paper 31. Washington, D.C.: U.S. Census Bureau, Population Division. December 1998. Tobler, W.R. 1969. Satellite confirmation of settlement size coefficients. Area 1: 30-34. Tobler, W.R. 1970. A computer movie simulating urban growth in Detroit region. Economic Geography 46: 230-40. U.S. Census Bureau. 1994. Census blocks and block groups. Geographic Area Reference Manual. [http:// www.census.gov/geo/www/garm.html]. Veall, M.R., and F.K. Zimmermann. 1996. Pseudo R2measure for some common limited dependent variable models. Journal of Economic Surveys 10(3): 241-59. Weber, C. 1994. Per-zone classification of urban land use cover for urban population estimation. In: G.M. Foody and P.J. Curran (eds), Environmental remote sensing from regional to global scales. Chichester, UK: New York, New York: Wiley. pp. 142-48. Wu, C., and A.T. Murray. 2007. Population estimation using Landsat enhanced thematic mapper. Geographical Analysis 39: 26-43. Wu, S., X. Qiu, and L. Wang. 2005. Population estimation methods in GIS and remote sensing: A review. GIScience & Remote Sensing 42: 80-96. Wu, S., L. Wang, and X. Qiu. 2008. Incorporating GIS building data and census housing statistics for subblock-level population estimation. The Professional Geographer 60(1): 121-35. Xie, Z. 2006. A framework for interpolating the population surface at the residential-housing-unit level. GIScience & Remote Sensing 43(3): 233-51. Zhang, Y., and T. Maxwell. 2006. A fuzzy logic approach to supervised segmentation for objectoriented classification. ASPRS Annual Conference, Reno, Nevada.

257