A. Stewart Fotheringham and F. Benjamin Zhan

A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns

This paper compares the performance of three exploratory methods for cluster detection in spatial point patterns where the at-risk population is known. After a review of two existing methods, those of Openshaw et al. (1987) and Besag and Newell (1991), an alternative method is introduced. These three methods are then compared empirically using two point patterns drawn from a disaggregate housing database consisting of 28,832 observations. Each observation in the data set contains attributes of single-family detached dwellings in the City of Amherst, New York. This paper provides some new insights into the performance of the three methods, as previous applications have used spatially aggregated (and hence rather inaccurate) data. The paper also demonstrates the utility of GIS for this type of spatial analysis.

The idea for this paper began as part of Research Initiative #14, “Spatial Analysis and GIS,” of the National Center for Geographic Information and Analysis in the United States, supported by a grant from the National Science Foundation (SBR-88-10917). Continued support for A. Stewart Fotheringham was provided by the North-East Regional Research Laboratory in the United Kingdom and for F. Benjamin Zhan by a faculty research enhancement grant from Southwest Texas State University. The authors thank Dr. Barry Lentnek for allowing them to use the Amherst housing data; David Phillips and Martin Camacho for their assistance with the data set; and Professor Stan Openshaw for his comments. Generous help from Fuxiang Xia and Ge Lin is also greatly appreciated.

A. Stewart Fotheringham is professor of geography at the North-East Regional Research Laboratory, University of Newcastle. F. Benjamin Zhan is assistant professor of geography and planning, Southwest Texas State University.

Geographical Analysis, Vol. 28, No. 3 (July 1996) © 1996 Ohio State University Press. Final version accepted 12/20/94.

1. INTRODUCTION

The analysis of spatial point patterns has long been an important concern in geographical inquiry [see, for example, Boots and Getis (1988) and the references therein]. The availability of georeferenced point data in digital form and the advantages that geographical information systems (GIS) offer for analyzing spatial point data suggest that interest in spatial point pattern analysis will increase. Indeed, there has been much recent interest from researchers across several disciplines (Clayton and Kaldor 1987; Openshaw et al. 1987; Stone 1988; Doll 1989; Gardner 1989; Hills and Alexander 1989; Wheldon 1989; Cuzick and Edwards 1990; Besag and Newell 1991), and particularly in the study of spatial patterns of disease (Marshall 1991). Given the rich availability of data in GIS, and the nature of spatial point pattern analysis, where the underlying statistical assumptions are often hard to specify and selection biases are usually present (Besag and Newell 1991), it seems particularly important to examine the detection of clusters using exploratory techniques (Openshaw et al. 1987; Besag and Newell 1991).

Two broad categories of point patterns can be identified: those for which the at-risk population is known, and those for which the at-risk population is unknown. While it is recognized that there may be many situations where the at-risk population is unknown, such as the occurrence of lightning strikes, this paper concentrates solely on the former. The reason for this is that knowledge of the spatial distribution of the at-risk population allows more interesting clusters to be distinguished from those that arise purely from spatial variations in the density of the at-risk population. For example, a map of the incidence of some disease is relatively uninformative if the underlying distribution of the population is unknown: “clusters” of the disease will inevitably appear in areas of high population density. The geographically interesting question is not “where is the sample clustered?” but “where is the sample clustered relative to the population?”

Regardless of the specific technique used for cluster detection, the general procedure for hypothesis testing is basically the same: a null hypothesis ($H_0$) and an alternative (research) hypothesis ($H_1$) are specified; a test statistic is computed from the observed point pattern; and a technique is chosen for assessing the significance of the statistic. Ideally the test statistic should be computed from a comparison of the observed points and the underlying at-risk population. This is a problem if data are aggregated to a certain level, as in Openshaw et al. (1987) and Besag and Newell (1991), where the observed cases and the population at risk are aggregated into census enumeration districts (EDs) or census tracts and georeferenced to the centroids of these zones.

The purpose of this paper is to compare the performance of three exploratory methods used for detecting clusters in spatial point patterns using examples from a file containing georeferenced data on 28,832 houses in Amherst, New York. We first give a brief review of the existing exploratory methods for cluster detection in Section 2. Section 3 presents an alternative method to those that currently exist. The design of the empirical research is presented in Section 4 and the results are discussed in Section 5. Conclusions are drawn in Section 6.

2. TWO EXISTING METHODS FOR DETECTING SPATIAL POINT CLUSTERS

Reviews of general point pattern analysis can be found in Ripley (1981), Diggle (1983), and Upton and Fingleton (1985), and those particularly related to geographical research can be found in Boots and Getis (1988). Reviews of the methods used for the analysis of clusters in spatial point patterns concerned with disease are provided by Hills and Alexander (1989) and Marshall (1991). Because our concern here is with the detection of clusters in spatial point patterns using exploratory methods, the literature review is focused on such methods.

The first attempt to detect spatial point clusters using exploratory methods was the Geographical Analysis Machine (GAM) developed by Openshaw et al. (1987). For convenience, the method will be called the Openshaw method hereafter. GAM consists of four components: “(1) a spatial hypothesis generator, (2) a procedure for assessing significance, (3) a GIS to handle retrieval of spatial data, and (4) a geographical display and map processing system” (Openshaw et al. 1987, p. 338).


The technique used by Openshaw et al. (1987) is illustrated in Figure 1. First, a universe of all possible circle-based hypotheses is generated using the following algorithm.
(1) Construct an initial grid over the area of interest, and define the minimum, maximum, and incremental values of the radii of the circles to be located at the intersections on the grid. The length of each side of the grid and the radius of a circle are chosen in such a way that the initial grid lattice is sufficiently fine-grained and that the circles can overlap to a large degree.
(2) For a constructed grid mesh and a determined circle size, move the circle so that it is located on each grid intersection systematically. Compute the test statistic for each circle at each grid intersection. If the test statistic passes the significance test (see below), the location and the circle are stored for later visualization.
(3) Increase the radius of the circle by the specified increment, and construct a new grid mesh accordingly.
(4) Repeat steps 2 and 3 until the radius reaches the maximum value.

Openshaw et al. (1987), in their GAM, used Monte Carlo simulation to assess significance. Circles are located systematically in the study area as discussed above. The “count of observed cases” within each circle is used as the test statistic. That is, the “count of observed cases” within a circle for the observed point data is compared with the “count of simulated cases” in the circle for each of the $a - 1$ sampled data sets. The circle and its location are recorded if and only if the “count of observed cases” in the circle for the observed point data is the largest among the $a$ test statistics. For $a - 1$ simulated sample data sets, the significance level is $1/a$ (for example, 19 simulated data sets correspond to a significance level of 0.05).

Using a Monte Carlo significance test has a number of advantages as described by Hope (1968, p. 582). Essentially, the technique is assumption free and can always be used when underlying distributions are unknown or when the necessary conditions for applying a test are not met. It may also be used when only vague alternative hypotheses exist and when only a vague definition of the test criteria can be given.

However, as identified by Besag and Newell (1991) and Marshall (1991), there are weaknesses in the method used by Openshaw et al. (1987). One such weakness is that “there is no control for multiple testing both locally and globally” (Besag and Newell 1991, p. 148). The global aspect means that clusters may be produced by chance alone when the circle used is large. The local aspect is related to the problem that the change of radius and the shifts in location are not taken into account in the calculation of the significance levels. Second, it is very difficult to calculate the observed cases and to define the population at risk within the circular area given that the data are aggregated into irregular districts.

More recently, Besag and Newell (1991) proposed a method (hereafter referred to as the Besag method) that avoids some of these deficiencies. In the Besag method, under the null hypothesis $H_0$ it is assumed that the total observed cases in a circle (defined in the same way as in the Openshaw method) are located randomly among the population at risk ($p_m$) with the mean probability $P_{mean} = n/N$ ($n$ is the total number of observed cases and $N$ is the total population at risk). The probability of observing exactly $x$ cases among the population at risk can then be approximated by the Poisson term:

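$$\frac{e^{-\lambda}\lambda^{x}}{x!} \quad \text{for } x = 1, 2, 3, \ldots,$$

where $\lambda = P_{mean} \times p_m$. It follows that, for a prespecified value $k$, the probability of observing $k$ or more cases among $p_m$ is

$$1 - \sum_{x=0}^{k-1} \frac{e^{-\lambda}\lambda^{x}}{x!}.$$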

FIG. 1. Openshaw et al.’s Procedure for Cluster Detection. The flowchart consists of the following steps, in order:
(1) input the observed data (number of observations = n);
(2) input the data of population at risk (number of observations = N);
(3) for a significance test at level α, randomly sample (1/α) - 1 data sets from the population at risk, and make sure that each data set contains n observations;
(4) set the minimum, maximum, and incremental values for the radii of the circles;
(5) obtain the radius of a circle and construct a grid mesh so that the length of the side of a cell in the grid mesh is some fraction of the radius of the circle;
(6) move the circle so that each time it is located on one of the intersections of the grid mesh consecutively;
(7) compute a test statistic for the observed data within the circle;
(8) compute a test statistic for each of the (1/α) - 1 sampled data sets within the circle;
(9) use the Monte Carlo significance test to assess the significance;
(10) store the circle and the location.
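The scanning procedure summarized in Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the original GAM code: the function names (`count_in_circle`, `n_simulations`, `gam_scan`) and the `grid_fraction` and `min_cases` parameters are hypothetical, points are plain `(x, y)` tuples, the study area is taken to be the bounding box of the at-risk population, and a tie between the observed count and a simulated count is treated as not significant.

```python
import random


def count_in_circle(points, cx, cy, r):
    """Count the (x, y) points lying within distance r of the centre (cx, cy)."""
    r2 = r * r
    return sum(1 for px, py in points if (px - cx) ** 2 + (py - cy) ** 2 <= r2)


def n_simulations(alpha):
    """Simulated data sets needed for level alpha, e.g. 19 for alpha = 0.05."""
    return round(1.0 / alpha) - 1


def gam_scan(cases, population, alpha, r_min, r_max, r_step,
             grid_fraction=0.5, min_cases=1, seed=0):
    """Return (cx, cy, r, count) for every circle judged significant."""
    if r_step <= 0:
        raise ValueError("r_step must be positive")
    rng = random.Random(seed)
    # Each simulated data set has as many points as the observed pattern
    # and is drawn at random from the at-risk population.
    sims = [rng.sample(population, len(cases))
            for _ in range(n_simulations(alpha))]
    xs = [p[0] for p in population]
    ys = [p[1] for p in population]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    kept = []
    r = r_min
    while r <= r_max + 1e-9:
        step = grid_fraction * r  # grid spacing is a fraction of the radius
        nx = int((x_hi - x_lo) / step) + 1
        ny = int((y_hi - y_lo) / step) + 1
        for i in range(nx + 1):
            for j in range(ny + 1):
                cx, cy = x_lo + i * step, y_lo + j * step
                observed = count_in_circle(cases, cx, cy, r)
                if observed < min_cases:
                    continue
                # Assumption: the observed count must strictly exceed
                # the count in every simulated data set.
                if all(observed > count_in_circle(s, cx, cy, r) for s in sims):
                    kept.append((cx, cy, r, observed))
        r += r_step
    return kept
```

Calling `gam_scan(cases, population, alpha=0.05, r_min=..., r_max=..., r_step=...)` returns the circles that would be stored for display. The cost grows with the number of circles times the number of simulated data sets, which is why very small significance levels are expensive with this kind of scan.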

This tail probability is used to calculate the significance level for each potential cluster. In this method, a cluster is detected by asking whether an observed case forms the center of a cluster of cases; this is done by examining the number of nearest zones, $M$, needed to accumulate a prespecified number of cases, $k$. Suppose that at least one case is observed in zone $i = 0$, labeled $A_0$. In order to check if there is a cluster around $A_0$, all other zones are labeled $A_i$, $i = 1, 2, \ldots$, sequentially, the sequence depending on the distance between the centroid of a zone $i \neq 0$ and the centroid of zone $i = 0$. Let $x_i$ be the observed number of cases in zone $i$ and $y_i$ be the population at risk in zone $i$. The accumulated number of cases and the accumulated population at risk in the zones can be defined, respectively, as follows:

$$D_i = \sum_{j=0}^{i} x_j \tag{3}$$

$$p_i = \sum_{j=0}^{i} y_j \tag{4}$$

Let

$$M = \min\{i : D_i \ge k\} \tag{5}$$

where $k$ is a predetermined number of observed cases (for example, $k = 4$) and $M$ is defined in such a way that zones $A_0, \ldots, A_M$ contain at least $k$ cases. When the value of $M$ is small, it is indicative of a cluster around $A_0$. It should be noted that $D_i$ and $p_i$ as defined in (3) and (4) are slightly different from the definition in Besag and Newell (1991) in that no observed case is discounted. Formulas (3) and (4) are used in the experiment conducted in this paper because of the use of disaggregated data.

To understand the physical basis of this method, one has to appreciate that the observed cases and the population at risk are aggregated data and are georeferenced to the centroids of zones [census enumeration districts (EDs) or census tracts] distributed over the study area. If individual data were available, the method would be more subjective because of the lack of predefined zones. The method can also be criticized because the value of $k$ is chosen in an ad hoc manner, although the results for different values of $k$ can obviously be displayed to obviate this problem.

In each example presented by Openshaw et al. (1987) and Besag and Newell (1991), the data used are aggregated into census tracts or enumeration districts. The observed cases and population at risk are georeferenced to the centroids of these zones. There is an obvious fundamental problem in computing the test statistic and defining the population at risk for any given circle when the Openshaw method is used (Besag and Newell 1991) because the computation is based on the census enumeration districts (EDs) whose centroids are within the circle. This apparently does not reflect the situation in reality. It would be more desirable to conduct the analysis using the true coordinates of the observed cases and the population at risk using disaggregated data.
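The zone-accumulation rule in (3) to (5), combined with the Poisson tail probability, might be sketched as follows. This is an illustrative sketch rather than the code used by Besag and Newell or by the authors: the names `poisson_tail` and `besag_newell` and the tuple layout `(x, y, cases, population)` for a zone are assumptions, and, as in this paper's variant, no observed case is discounted from the accumulations.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): 1 - sum_{x=0}^{k-1} e^{-lam} lam^x / x!."""
    below = sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in range(k))
    return 1.0 - below


def besag_newell(zones, center, k, p_mean):
    """Find M and its significance for the zone with index `center`.

    zones  : list of (x, y, cases, population) tuples, one per zone centroid
    center : index of the zone playing the role of A_0
    k      : prespecified number of cases
    p_mean : overall probability n / N

    Returns (M, tail_probability), or None if fewer than k cases exist.
    """
    cx, cy = zones[center][0], zones[center][1]
    # A_0 first, then the remaining zones by increasing centroid distance.
    order = sorted(
        range(len(zones)),
        key=lambda i: (i != center,
                       (zones[i][0] - cx) ** 2 + (zones[i][1] - cy) ** 2),
    )
    cum_cases = 0  # D_i in (3)
    cum_pop = 0    # p_i in (4)
    for m, i in enumerate(order):
        cum_cases += zones[i][2]
        cum_pop += zones[i][3]
        if cum_cases >= k:  # M = min{i : D_i >= k}, equation (5)
            return m, poisson_tail(k, p_mean * cum_pop)
    return None
```

A small returned `M` together with a small tail probability indicates that $k$ cases were reached within an unusually small accumulated population around $A_0$. Running the function for several values of `k` mirrors the suggestion that results for different values of $k$ be displayed side by side.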


3. AN ALTERNATIVE METHOD FOR DETECTING SPATIAL POINT CLUSTERS

A third method for detecting spatial point clusters is introduced in this paper. The procedure for the method is illustrated in Figure 2. It differs from the Openshaw method in two main respects: the location and size of a circle are determined randomly within specified ranges, and the Poisson probability distribution is used directly for assessing significance. Let the total population at risk in the area of interest be $N$ and the total number of observed cases (with a particular attribute) be $n$; then the mean probability of observing a case in the entire area is

$$P_{mean} = \frac{n}{N}. \tag{6}$$

For any circle whose location and radius are determined randomly, the number of cases ($x$) and the population at risk ($y$) within the circle can be obtained. The expected number of cases ($\lambda$) in the circle can then be determined as

$$\lambda = P_{mean} \times y. \tag{7}$$

The probability $P(x \mid \lambda)$ of observing exactly $x$ cases in the circle with expected cases $\lambda$ can then be determined using the Poisson distribution (Getis and Boots 1978, p. 19):

$$P(x \mid \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}. \tag{8}$$
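The computation in (6) to (8) for randomly seeded circles can be illustrated with a short Python sketch. It is not the authors' implementation: the names `poisson_pmf`, `circle_probability`, `seed_random_circles`, and `screen` are hypothetical, the study area is assumed to be the bounding box of the at-risk population, radii and centres are assumed to be drawn uniformly, and the `min_cases` default and the screening rule (keep a circle when its Poisson probability is below a chosen threshold) are assumptions about how the probabilities would be used.

```python
import math
import random


def poisson_pmf(x, lam):
    """P(x | lam) = exp(-lam) * lam**x / x!, evaluated in log space."""
    if lam == 0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def count_in_circle(points, cx, cy, r):
    """Count the (x, y) points lying within distance r of the centre (cx, cy)."""
    r2 = r * r
    return sum(1 for px, py in points if (px - cx) ** 2 + (py - cy) ** 2 <= r2)


def circle_probability(cases, population, cx, cy, r):
    """Return (x, y, lam, p) for one circle."""
    p_mean = len(cases) / len(population)        # equation (6)
    x = count_in_circle(cases, cx, cy, r)        # observed cases in the circle
    y = count_in_circle(population, cx, cy, r)   # population at risk in the circle
    lam = p_mean * y                             # equation (7)
    return x, y, lam, poisson_pmf(x, lam)        # equation (8)


def seed_random_circles(cases, population, n_circles, r_min, r_max,
                        min_cases=1, seed=0):
    """Seed circles at random; return (cx, cy, r, x, lam, p) records."""
    rng = random.Random(seed)
    xs = [p[0] for p in population]
    ys = [p[1] for p in population]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    records = []
    for _ in range(n_circles):
        r = rng.uniform(r_min, r_max)    # random radius within the range
        cx = rng.uniform(x_lo, x_hi)     # random location in the study area
        cy = rng.uniform(y_lo, y_hi)
        x, y, lam, p = circle_probability(cases, population, cx, cy, r)
        if x >= min_cases:
            records.append((cx, cy, r, x, lam, p))
    return records


def screen(records, alpha):
    """Assumed screening rule: keep circles whose probability is below alpha."""
    return [rec for rec in records if rec[5] < alpha]
```

Because each circle is evaluated against a closed-form probability rather than against simulated data sets, the cost per circle is a pair of point counts, which keeps the random-seeding approach cheap even at very small thresholds.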

Two methods can be used to assess the significance. In the first, $P(x \mid \lambda)$ in (8) is used directly as the measure of significance. That is, if $P(x \mid \lambda) < \alpha$, where $\alpha$ is a prespecified level of significance, the radius and location of the circle generating $P(x \mid \lambda)$ are stored. The second method of significance testing that can be applied is that adopted in the Besag method described above. The only difference here is that the locations and radii of the circles are determined randomly, and $k$ is the number of observations with a particular attribute that lie within each circle. In what follows, the results of the two significance testing procedures are very similar, so only the results of the first one are reported. Hereafter, this third method is called the Fotheringham and Zhan method, or Fotheringham method for short.

4. RESEARCH DESIGN FOR COMPARING THE THREE METHODS

The methods described above for detecting point clusters were coded in C and linked with ARC/INFO 6.1, running on a SUN workstation. The purpose of this section is to discuss the experimental design for testing these methods in terms of the preparation of data and the choice of search parameters in the programs.

4.1 The Preparation of Data

In this study, the objects under investigation are houses in Amherst, New York. A master database consisting of 28,832 houses (observations) is constructed, stored, and managed using the ARC/INFO database. The data in the master database are derived from a data file containing information about the single-family detached dwellings in the City of Amherst, New York. These houses are geocoded using two-dimensional coordinates, and the locations of all 28,832 houses are shown in Figure 3. In addition to the x, y coordinates, other attributes such as age, type of construction, quality, and price are available for every observation. The 28,832 houses can be regarded as the “population at risk” in the area, and houses with various attributes are drawn from this population. Polygons of census tracts covering Amherst are also created and added to the map, but are used solely for visualization.

FIG. 2. Fotheringham and Zhan's Procedure for Cluster Detection. The flowchart consists of the following steps, in order:
(1) input the observed data (number of observations = n);
(2) input the data of population at risk (number of observations = N);
(3) compute the mean probability: n / N;
(4) set the minimum and maximum values of the radii of the circles;
(5) randomly select the radius of a circle within the specified range of radius values;
(6) randomly locate the circle in the area of interest;
(7) compute the number of points from the observed data within the circle: x;
(8) compute the number of population at risk within the circle;
(9) compute the expected number of points in the circle using the mean probability and the population at risk in the circle;
(10) compute the probability of observing x points in the circle using the Poisson distribution and the expected number of points;
(11) store the circle and the location;
(12) sufficient number of circles seeded?

Two data sets were drawn from the total population at risk. Data set 1 consists of houses whose overall construction quality is rated in the lowest category (1-5) and contains the 277 points mapped in Figure 4. Data set 2, illustrated in Figure 5, consists of two hundred randomly selected points from the master database. In Figures 4 and 5, various clusters seem to be present, although different clusters may be apparent to different people. It is also not clear whether a cluster is worthy of further investigation because it results from some clustering process or whether it is simply a reflection of the distribution of the underlying “at-risk” population.

For example, Table 1 shows that the use of standard tests such as the variance-to-mean ratio and nearest neighbor analysis indicates extremely strong clustering in both spatial distributions (one of which is a random drawing) in Figures 4 and 5. This is because the distribution of the at-risk population in each case is strongly clustered but this is ignored in the calculation of the variance-to-mean and nearest neighbor statistics. These statistics cannot differentiate between distributions that are clustered because of the distribution of the underlying population and those that exhibit clusters that are geographically interesting. Conversely, there might well be points that do not appear to merit investigation when examined without exogenous knowledge, and only appear as significant clusters when compared to the “at-risk” population. The three methods described above are designed to remove these problems by using information on the at-risk population to automate the identification of spatial clusters that warrant further geographic investigation.
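To make concrete why such classical statistics ignore the at-risk population, the sketch below computes a quadrat-based variance-to-mean ratio and a nearest neighbor ratio R from point coordinates alone. The paper does not give the formulas it used, so the textbook forms here are an assumption: the variance-to-mean ratio is taken over counts in a regular grid of quadrats, and R is the observed mean nearest neighbor distance divided by the value expected under complete spatial randomness, 1 / (2 * sqrt(density)). The function names and arguments are hypothetical.

```python
import math


def variance_mean_ratio(points, x_lo, y_lo, x_hi, y_hi, nx, ny):
    """Variance-to-mean ratio of point counts in an nx-by-ny grid of quadrats."""
    counts = [0] * (nx * ny)
    for px, py in points:
        # Clamp so that points on the upper boundary fall in the last quadrat.
        i = min(int((px - x_lo) / (x_hi - x_lo) * nx), nx - 1)
        j = min(int((py - y_lo) / (y_hi - y_lo) * ny), ny - 1)
        counts[j * nx + i] += 1
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


def nearest_neighbor_r(points, area):
    """Observed mean nearest neighbor distance over its expectation under randomness."""
    n = len(points)
    total = 0.0
    for a, (ax, ay) in enumerate(points):
        total += min(math.hypot(ax - bx, ay - by)
                     for b, (bx, by) in enumerate(points) if b != a)
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected
```

Neither function takes the at-risk population as an argument, so a ratio well above 1 (or an R well below 1) only says that the points are clustered in space, not that they are clustered relative to where houses exist.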

FIG. 3. Houses in the Study Area (At-Risk Population): The Master Database

FIG. 4. Test Data Set 1: Lowest [Construction Quality] Category (1-5)

FIG. 5. Test Data Set 2: Randomly Selected Points from the At-Risk Population

TABLE 1 Classical Point-Pattern Analysis Results for Data Sets 1 and 2

                         Data Set 1 (Figure 4)   Data Set 2 (Figure 5)
Variance-Mean ratio      21.24                   3.39
t value                  171.17                  18.54
R (nearest neighbor)     0.54*                   0.66*

NOTE: *significantly different from 1.0 at the 99 percent confidence level.

4.2 The Choice of Search Parameters

Before each program is run, a number of search parameters must be decided. One parameter that is used in all three methods is the minimum number of observed cases to be considered in a circle. This parameter is set to one (1) for the Fotheringham and Openshaw methods, which means that significance assessment is conducted as long as there is at least one case within a given circle at a particular location. Because the Besag method directly uses a prespecified number (k), the minimum value of k is set to two (2), following the experiment conducted by Besag and Newell (1991).

Other search parameters to be set are the minimum and maximum radii of the circles. After a number of trials, the minimum radius is set to 182.88 meters (600 feet), and the maximum radius is set to 640.08 meters (2,100 feet). The Amherst study area extends 10882.5 meters (35,704 feet) in the east-west direction and 15199.1 meters (49,866 feet) in the north-south direction. It should be pointed out here that the range of the radii is data dependent and only becomes clear after several trials. Circles that are too small will not detect the extent of large clusters and may miss clusters altogether, while circles that are too large risk hiding variations at smaller scales. This is one of the reasons why exploratory data analysis is important, and the optimal choice of the parameters remains subject to further investigation. For the Openshaw and the Besag methods, one other parameter, the increment of the radius, has to be determined; based on the results of a number of trials, it is here set to 76.2 meters (250 feet).

Various tests of sensitivity of each of the three methods are employed. For the Openshaw method, the numbers of simulated sample data sets are chosen as 19, 49, 99, 199, and 499, so that they are equivalent to significance levels of 0.05, 0.02, 0.01, 0.005, and 0.002, respectively. A significance level of 0.001 is not used for the Openshaw method because of the computer time required to investigate 999 simulated samples. The Besag method is sensitive to the value of k and six values are reported (k = 2, 3, 4, 5, 6, and 7), all at the 0.05 significance level. In the Fotheringham method, significance levels are set to 0.05, 0.02, 0.01, 0.005, 0.002, and 0.001 and clusters are displayed at each of the levels.

Because data set 2 is a sample drawn randomly from the at-risk population and hence is subject to sampling variation, ten such samples are drawn and results are reported for the average of all ten. For the Openshaw and the Besag tests, 29,218 circles were seeded for each random sample at each significance level and the proportion of circles displayed (that is, those containing significant clusters of points) is calculated for each significance level in the case of the Openshaw method and for each value of k in the case of the Besag method. For the Fotheringham test, five thousand circles were seeded and the proportion of circles displayed is calculated for each significance level. The Fotheringham method uses a random placement of circles and hence fewer circles need to be seeded.

5. RESULTS

5.1 Visualization

In order to demonstrate the relative performance of the three techniques of automatic cluster detection, the results (all retained circles and locations) were displayed using the ARC/INFO GIS and are reported in Figures 6 and 7. The two figures refer to the data displayed in Figures 4 and 5, respectively, and each is composed of three sets of circles derived from the Openshaw, Besag, and Fotheringham methods, respectively. Every circle represents a significant cluster of points using the significance testing procedures described above. Figures 6a-c contain results using data set 1, defined as houses with the lowest construction quality ranking, and Figures 7a-c show the results from the two hundred points in data set 2, which are drawn randomly from the master database. Both sets of figures contain a separate window displaying the point pattern on which the results are based.

FIG. 6a. Detecting Spatial Point Clusters Using the Openshaw Method: Actual Point Pattern [legible panel labels: observed point pattern; significance level = 0.01; significance level = 0.002]

FIG. 6b. Detecting Spatial Point Clusters Using the Besag and Newell Method: Actual Point Pattern [panels: observed point pattern; k = 2, 3, 4, 5, 6, 7]

FIG. 6c. Detecting Spatial Point Clusters Using the Fotheringham and Zhan Method: Actual Point Pattern [legible panel labels: significance level = 0.02, 0.01, 0.005, 0.002, 0.001]

FIG. 7a. Detecting Spatial Point Clusters Using the Openshaw Method: Random Point Pattern [legible panel labels: observed point pattern; significance level = 0.05, 0.02, 0.01]

FIG. 7b. Detecting Spatial Point Clusters Using the Besag and Newell Method: Random Point Pattern [panels: observed point pattern; k = 2, 3, 4, 5, 6, 7]

FIG. 7c. Detecting Spatial Point Clusters Using the Fotheringham and Zhan Method: Random Point Pattern [panels: observed point pattern; significance level = 0.05, 0.02, 0.01, 0.005, 0.002, 0.001]

Figure 6a shows the results of applying Openshaw’s method to data set 1. It is clear that the technique identifies a large number of clusters, especially at traditional significance levels (0.05 and 0.01), where almost every point appears as a significant cluster. Even at significance levels as extreme as 0.002, the technique identifies large numbers of clusters. The Besag method, the results of which are shown in Figure 6b, is much more conservative, although the results are clearly dependent on k, a predefined number of points within a circle. Above k = 4, the technique picks out only the clusters of points in the southern part of the map and only two areas of the map appear to have interesting clusters. The Fotheringham methodology appears marginally more selective than Openshaw’s at more extreme levels of significance but essentially depicts similar results. One general finding is that the techniques would appear to be more useful when used with an extreme significance level such as 0.001 so that a limited number of significant clusters is identified. At less extreme values, the techniques possibly identify too many clusters to be useful as an exploratory tool.

To place the above results in perspective, each of the three methods is applied to a set of points randomly drawn from the at-risk population and these results are shown in Figures 7a-c. It is important to emphasize that the point pattern in this data set does not appear random because the distribution reflects the distribution of the at-risk population from which the sample is drawn. Given that the population is nonrandomly located in space, the sample is similarly spatially nonrandom. A logical test of each method is therefore to see whether it can separate a visual cluster from a geographically interesting one, the latter being a set of points that is significantly more clustered than the distribution of the underlying at-risk population would suggest. The Openshaw technique performs slightly less satisfactorily in this regard: clusters appear at significance levels even as extreme as 0.002. It is more difficult to evaluate the Besag technique because although clusters are identified at all values of k, the significance level is 0.05. The Fotheringham technique identifies relatively fewer clusters and identifies none at a significance level above 0.005.
These results are encouraging because, if geographically interesting clusters can be separated in the random samples from clusters that result merely from the distribution of the underlying population, clusters that are identified in the nonrandom distributions can be treated as “geographically interesting.” For instance, the results of the Fotheringham method at significance levels 0.002 and 0.001 with the random data suggest that the clusters identified from this method in data set 1 are of geographic interest in that they probably arise from a spatial process and not from variations in the underlying population density.

5.2 A Further Test Based on Random Distributions

A further test of the three techniques is undertaken by examining the performance of each technique on ten different random drawings from the at-risk population. These results are summarized in Table 2, where each of the ten distributions consists of two hundred randomly drawn points.

TABLE 2 Performance Indicators of the Three Techniques on Ten Random Samples

Significance level   Number of        Average number of    Average proportion of
or k value           circles seeded   circles displayed    circles displayed

a. Openshaw et al. method
.050                 29218            395.5                .01314
.020                 29218            147.9                .00506
.010                 29218            74.5                 .00265
.005                 29218            37.1                 .00127
.002                 29218            10.9                 .00036

b. Besag and Newell method (significance level: 0.05)
2                    29218            288.1                .00986
3                    29218            178.1                .00609
4                    29218            98.5                 .00337
5                    29218            62.2                 .00213
6                    29218            41.0                 .00140
7                    29218            26.6                 .00091

c. Fotheringham and Zhan method
.050                 5000             120.0                .02400
.020                 5000             38.3                 .00766
.010                 5000             18.5                 .00370
.005                 5000             7.2                  .00140
.002                 5000             3.0                  .00060
.001                 5000             0.1                  .00002

Table 2a contains the results of the Openshaw technique applied to each of the ten distributions. At each significance level, 29,218 circles are seeded and the average number of these circles that are displayed (and hence contain a significant cluster of points) is given in column 3. These average frequencies are converted to average proportions in column 4. Given that the distributions are random drawings from the at-risk population, a comparison of these proportions across the different techniques yields some insights into the probability of each technique identifying false positives (although it says nothing about failure to identify real positives). Unfortunately, the Besag results in Table 2b depend on the value of k, the minimum number of points within a circle, and so are not directly comparable. The results for the Fotheringham technique, shown in Table 2c, result from only five thousand seeded circles at each significance level because in this technique the circles are seeded randomly, whereas in the other two techniques the circles are uniformly placed over the study area. The results for all three methods are encouraging in that the average proportion of circles displayed is always less than half the significance level (circles are displayed only when a significantly larger number of points is observed than would be expected). The Besag procedure is particularly impressive when it is noted that the proportions are all calculated at a significance level of 0.05.
The results again suggest that the circles identified at extreme significance levels in Figures 6a-c represent the outcomes of some interesting geographic processes. It is useful to note that the Fotheringham method appears to be less sensitive than the other two methods at low levels of significance but is more sensitive at higher levels of significance. This suggests that the simpler procedure of randomly assigning circles (the Fotheringham method) works just as well as comprehensively covering the study area, and may in fact be more selective when extreme significance levels are used.
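The statement that the average proportion of circles displayed is always less than half the significance level can be checked directly against the proportions printed in Table 2. The short sketch below does only that arithmetic; the dictionary layout and the helper name `all_below_half_level` are assumptions, and the numbers are copied from the table as printed rather than recomputed.

```python
# Average proportions of circles displayed, copied as printed in Table 2.
# Keys are significance levels for the Openshaw and Fotheringham methods.
OPENSHAW = {0.050: 0.01314, 0.020: 0.00506, 0.010: 0.00265,
            0.005: 0.00127, 0.002: 0.00036}
FOTHERINGHAM = {0.050: 0.02400, 0.020: 0.00766, 0.010: 0.00370,
                0.005: 0.00140, 0.002: 0.00060, 0.001: 0.00002}
# Besag and Newell results are keyed by k; all use a significance level of 0.05.
BESAG_LEVEL = 0.05
BESAG = {2: 0.00986, 3: 0.00609, 4: 0.00337,
         5: 0.00213, 6: 0.00140, 7: 0.00091}


def all_below_half_level(pairs):
    """True if every (level, proportion) pair has proportion < level / 2."""
    return all(proportion < level / 2.0 for level, proportion in pairs)


openshaw_ok = all_below_half_level(OPENSHAW.items())
fotheringham_ok = all_below_half_level(FOTHERINGHAM.items())
besag_ok = all_below_half_level((BESAG_LEVEL, p) for p in BESAG.values())
```

The tightest case in the printed table is the Fotheringham method at the 0.05 level, where the proportion of .02400 sits just under the bound of .025.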

5.3 Sensitivity to Circle Definition

All three methods of point pattern analysis depend upon a definition of circle size. The above results, for instance, are for circles that have a radius between 182.88 and 640.08 meters. In order to examine the potential sensitivity of the results to this definition, some other ranges were selected and the methodology described in Section 5.1 repeated. The results for one significance level corresponding to data set 1 are shown in Figures 8a-c. Each technique has a similar sensitivity to circle definition in that as the circles increase in size, the circles in which significant clusters occur increasingly overlap and give an exaggerated appearance to a cluster. That is, regardless of the effect on statistical detection, varying the size of circles used affects the perception of the results. Given that all three techniques are intended to be used in an exploratory mode, this perceptual sensitivity needs more attention. It could be argued that an advantage of exploratory techniques is that analyses can be undertaken under many different conditions and in this case maps can be reported with different circle ranges.

6. CONCLUSIONS

The increasing prevalence of GIS technology and the concomitant access to disaggregate spatial data sets will lead to a greater demand for automated cluster detection techniques. Such techniques have obvious applications in the

FIG. 8a. Detecting Spatial Point Clusters Using the Openshaw Method: The Effect of Circle Size [legible panel labels: 183 m ≤ R < 259 m; observed point pattern; 411 m ≤ R < 564 m]

[Panel labels from the following figure: 183 m ≤ R < 259 m; 259 m ≤ R]