Proximity Operators for Qualitative Spatial Reasoning

Mark Gahegan
Department of Geographic Information Systems, Curtin University, PO BOX U 1987, Perth 6001, WA, Australia.
tel: +619 351 3309, fax: +619 351 2819, email: [email protected]

ABSTRACT One way to increase the power of Qualitative Spatial Reasoning is to introduce proximity operators (such as close and far) that are surrogates for distance measures. These operators appear to be semi-quantitative in nature as opposed to purely qualitative. In the light of observations drawn from psychometric testing of perceived proximity, this paper discusses how a model to support proximal reasoning could be constructed. The relationships between the model and the raw data are described. Fuzzy set membership is used to reason about the degree of closeness. The formulation of queries involving proximity is presented, with the meaning of linguistic variables being instantiated within a given context at execution time.

1 Introduction

Qualitative Spatial Reasoning is concerned primarily with the abstracted relationships between objects in space, as opposed to the underlying geometry defined by the raw spatial data. Consequently, it is centred around topological relationships [Egenhofer and Franzosa, 1991], [Smith and Park, 1992], [Molenaar, 1994]. Qualitative spatial reasoning offers a means of transforming all relationships in the data to give a view of the universe of discourse (U) that is centred around a single object, here called the reference object R. In the discussion that follows, relationships are defined between geographic features of interest (here termed objects). For simplicity, each object is considered to be defined in two-dimensional space by a single point o(i,j). The reference object is similarly defined, R(h,k). This 'egocentric' view of the world can be very useful as a means of constructing queries concerning relationships between a particular object of interest and other objects. These relationships can be expressed as: "Which objects are East-of R?" or "Which objects overlap with R?" An example query might be:

"Find a petrol station East of current location" (where 'current location' is set to be R).

A key reason for the importance of qualitative spatial reasoning is that it allows a user to express queries in a way that is much more intuitive; that is, relationships inherent within the data are used, without the user becoming swamped in the actual values (co-ordinates and angles) that comprise the raw data. To expect a GIS user to be familiar enough with the raw data to reason using only geometry would seem to be asking too much in all but a few cases. Qualitative spatial reasoning is offered as an alternative, supplementary, means of working with the data. Some precision is lost by moving away from absolute values, but this may be compensated by the increase in accessibility afforded to the user.

As discussed by Frank [1992], there is evidence that qualitative reasoning follows along similar lines to the thought processes of humans. Furthermore, as pointed out by Sharma et al. [1994], qualitative spatial reasoning separates the absolute numeric properties of the data from the "determination of magnitudes and events, which may be assessed differently depending on the context". The implication is that if a model for context can be created, then the qualitative relationships can be defined in the light of this model, whereas quantitative values must remain fixed. The spatial reasoning is then to some extent adaptive to the context of the current task, and hence may appear more intuitive to the user. The support of context for proximity forms the main subject of this paper.

1.1 Proximity and Context

The idea that one object has a qualitative relationship by distance to another is an important concept, since it can be used as a predicate on which to relate together objects in a more natural way than resorting to geometry. The example given above can then be further qualified:

"Find a petrol station South of and close to current location"

Indeed, it has been argued that the directional operators alone are insufficiently powerful, and need to be supplemented by a qualitative proximity measure [Gapp, 1994]. This allows more of the underlying data to be utilised in analysis and therefore increases the qualifying power of the reasoning. Most topological operators are strictly qualitative, and their definition, once given, is fixed. However, the concept of proximity does not appear to behave in this fashion, but is instead highly dependent on the context within which it is applied. It will be argued in this paper that proximity depends on many factors, including the spatial distribution of the data under consideration, and various aspects concerning the current task. In the light of this dependency, it might be more accurate to refer to proximity as semi-quantitative [Roberts and Gahegan, 1991], [Gahegan, 1994]. The definition of a semi-quantitative operator allows adaptation to the data under consideration.

1.2 Recent Research on Proximity

In order to use proximity as a qualifier in spatial reasoning, distances between objects must be described by linguistic variables such as close and far. These variables correspond to some type of distance metric. Two alternative approaches are given by Frank [1992] and Dutta [1990]. In the former, a tolerance space and relation are used, with distances being ranked into a number of intermediate steps from 0 to n - 1. In the latter, a fuzzy membership function is constructed, μF : U → [0, 1], where μF (u) denotes the degree of membership (between zero and one inclusive) of u in the fuzzy set F. In both of the above approaches, all objects can be graded according to their distance from R. Proximity is described here by a fuzzy membership function, since there are well defined methods for reasoning with data in this form [Zadeh, 1975]. One possible mapping to linguistic variables is:

0.0 → 0.1    very far
0.1 → 0.3    far
0.7 → 0.9    near
0.9 → 1.0    very near

Note that a value of 1.0 signifies co-incidence, and a value of 0.0 signifies maximal separation. Two other important linguistic variables are used to represent the extrema; namely nearest and farthest. These are analogous to the SQL functions MIN and MAX.

An alternative approach to that developed here is taken by Robinson [1990], in which a fuzzy spatial relation is learnt via interaction with a computer in a question and answer style dialogue. The notion of proximity is constructed implicitly, but the results are equally valid.

Proximity appears to be a richer concept than is currently modelled, and there is a danger that the problem has been over-simplified to some extent. Proximity measures that are included in spatial reasoning must behave in a way that follows a human perception of proximity. Failure to achieve this goal will result in counter-intuitive interaction with the GIS, with a consequence that unreliable results will be produced. There are some issues that remain to be addressed, namely:

• How do humans reason about proximity?
• How does scale affect proximity?
• Does the nature of the task to be undertaken have any effect?
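The mapping from fuzzy membership to linguistic variables given earlier in this section can be sketched in a few lines of Python. This is only an illustration: the function name `linguistic_label` is hypothetical, the treatment of band boundaries (half-open intervals, with 1.0 included in the top band) is an assumption, and values between 0.3 and 0.7 are left unlabelled because the mapping does not name them.

```python
from typing import Optional

# Bands from the mapping in Section 1.2: (lower bound, upper bound, label).
# Memberships between 0.3 and 0.7 have no label in that mapping.
BANDS = [
    (0.0, 0.1, "very far"),
    (0.1, 0.3, "far"),
    (0.7, 0.9, "near"),
    (0.9, 1.0, "very near"),
]


def linguistic_label(mu: float) -> Optional[str]:
    """Return the linguistic variable for a membership value, or None.

    Assumption: each band is half-open [low, high), except that a value
    of exactly 1.0 (co-incidence) belongs to the top band.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership must lie in [0, 1]")
    for low, high, label in BANDS:
        if low <= mu < high or (mu == 1.0 and high == 1.0):
            return label
    return None
```

For instance, a membership of 0.95 is reported as "very near", whereas 0.5 falls in the unlabelled middle range and yields `None`.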

1.3 Psychometric Testing of Proximity

Various psychometric studies have been conducted to assess how humans make subjective judgements regarding distances across a wide range of different data domains, including geographical space [Lundberg and Ekman, 1973], [Guttman, 1968]. In order to gain further understanding as to how humans judge proximity, a simple test was conducted on fifty subjects, each of whom was asked to comment on the closeness of a series of objects to a reference point. Further details of the test can be found in Appendix A. Whilst it is not claimed that these tests are conclusive in any way, some interesting points were observed, namely:

(1) In the absence of other objects, subjects reasoned about proximity in a geometric fashion, and furthermore, the relationship between distance and proximity can be approximated by a simple linear relationship.
(2) When other objects of the same type are introduced, proximity is judged in part by relative distance, that is, the distance between an object and R is modified according to the distances from other objects to R.
(3) Distance is affected by the size of the area being considered, or alternatively, it has some relationship to perceived scale.

The above observations are discussed in the section that follows and their consequences are formulated into a contextual model for proximity.

2 A Model Supporting Context in Proximal Relationships

There are several issues that need to be addressed before measures of proximity can be defined at all. Not least of these is the formulation of a metric that can be used to measure proximity given a set of objects.

2.1 Absolute Distance Metrics

Observation (1) suggests that, in the simple case, proximity is directly proportional to distance and that proximity can be modelled adequately with a linear Euclidean distance metric. So, for two-dimensional data:

\[ \rho\big(o(i,j),\, R(h,k)\big) \propto \sqrt{(i-h)^2 + (j-k)^2} \]

To give an actual numeric value to proximity, it is necessary to compare the distance from o to R with some sort of maximal distance, M. This could be measured in a variety of ways, for example as the distance to (i) the furthest object, (ii) the furthest corner of a bounding rectangle, or (iii) the distance given by the diagonal of a bounding rectangle. Figure 1 shows these options as 1, 2 and 3 respectively, with the reference point denoted by (♦). From the test results obtained, the most appropriate measure would seem to be 3, the diagonal of the bounding rectangle. This represents the maximum possible separation between o and R.

Figure 1. Options for computing maximal distance (M)

If the size of the bounding rectangle is described by (0, 0 : x, y), the proximity (ρ) of object o to R is given by:

\[ \rho\big(o(i,j),\, R(h,k)\big) = 1 - \frac{\sqrt{(i-h)^2 + (j-k)^2}}{\sqrt{(0-x)^2 + (0-y)^2}} \]

When using an absolute approach, whilst there is a continuum for proximity, (very close > close > far > very far) this cannot be extended to include closest and farthest. This is because it is entirely possible for a point to be the closest but not close, or farthest but not far. As a consequence, there is a loss of qualifying power. The fact that a point is close says nothing about the distribution of points in general; that is, it is not close compared to other points. A distribution function for geometric proximity is shown in Figure 2. Based on the observed results, the function is modelled simply as a linear relationship, where the diagonal distance M defines the extent of the X axis. This approach has the advantage from a computational perspective that proximity has a fixed relationship to the distance between o and R only, and so the insertion and deletion of objects will not alter the quality of relationships at all. However, from the point of view of the user, this can be somewhat counter-intuitive.
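As an illustration of the absolute measure, the following Python sketch evaluates ρ = 1 − d/M, with M taken as the diagonal of a bounding rectangle (0, 0 : x, y). The function name `absolute_proximity` and its parameters are hypothetical, and clamping the result at zero for points further from R than M is an added assumption rather than part of the model.

```python
import math


def absolute_proximity(o, r, x, y):
    """Absolute proximity of point o to reference point r.

    o and r are (i, j) and (h, k) coordinate pairs; x and y give the
    extent of the bounding rectangle (0, 0 : x, y), whose diagonal is
    the maximal distance M.
    """
    d = math.hypot(o[0] - r[0], o[1] - r[1])  # Euclidean distance o to R
    m = math.hypot(x, y)                      # diagonal of the rectangle
    if m == 0:
        raise ValueError("bounding rectangle must have a non-zero diagonal")
    # Assumption: clamp at 0 so that the result stays a valid membership.
    return max(0.0, 1.0 - d / m)
```

With a 6 by 8 rectangle (diagonal 10), an object 5 units from R has a proximity of 0.5, regardless of how many other objects are present.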

Figure 2. A possible fuzzy distribution function for absolute distance (vertical axis: Fuzzy Membership, 0 to 1; horizontal axis: Geometric Distance, 0 to max, running from close to far)

2.2 Relative Distance Metrics

In the presence of other objects, observation (2) implies that modifications to proximity measures must be made to account for the patterns of distribution of these objects. There are several possible methods for calculating relative proximity, ranging from simple ordinal techniques to point distribution functions from spatial statistics.

The simplest measure is an ordinal approach. The n objects under consideration are simply ranked according to their distance from R, with the most proximal objects being first (R is assigned the zeroth ranking). Using the mapping to linguistic variables given in Section 1.2, an object (o) is considered close to R according to:

very close (o, R):  ranking (o, R) ≤ (n * 0.1)
close (o, R):       ranking (o, R) > (n * 0.1) ∧ ranking (o, R) ≤ (n * 0.3)
far (o, R):         ranking (o, R) ≥ (n * 0.7) ∧ ranking (o, R) < (n * 0.9)
very far (o, R):    ranking (o, R) ≥ (n * 0.9)
closest (o, R):     ranking (o, R) = 1
farthest (o, R):    ranking (o, R) = n
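The ordinal rules above can be sketched as follows. The function names `rank_by_distance` and `ordinal_labels` are hypothetical; ties in distance are broken by input order, which is an assumption, and the thresholds are compared in tenths (10 × ranking against n, 3n, 7n and 9n) purely to avoid floating-point rounding at the boundaries.

```python
import math


def rank_by_distance(objects, r):
    """Map each object name to its ranking by distance from r (1 = nearest).

    objects is a dict of name -> (x, y); r is the reference point (x, y).
    """
    ordered = sorted(objects, key=lambda name: math.dist(objects[name], r))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def ordinal_labels(ranking, n):
    """Return the linguistic variables that apply to a ranking among n objects."""
    labels = []
    if 10 * ranking <= n:                    # ranking <= n * 0.1
        labels.append("very close")
    elif 10 * ranking <= 3 * n:              # n * 0.1 < ranking <= n * 0.3
        labels.append("close")
    if 7 * n <= 10 * ranking < 9 * n:        # n * 0.7 <= ranking < n * 0.9
        labels.append("far")
    elif 10 * ranking >= 9 * n:              # ranking >= n * 0.9
        labels.append("very far")
    if ranking == 1:
        labels.append("closest")
    if ranking == n:
        labels.append("farthest")
    return labels
```

With ten objects, the nearest is both very close and closest, rankings 2 and 3 are close, rankings 4 to 6 receive no label, and the tenth is both very far and farthest.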

This measure works well if the distribution of objects is fairly even, and provided the number of objects is not too small (that is, not n < 10). One obvious disadvantage, however, is that objects can be separated from R by large distances, but still be regarded as close, if there are enough objects even further from R.¹

¹ A more sophisticated approach is to use the statistical distribution between points to calculate proximity. Nearest Neighbour distribution functions such as those described by Cressie [1991] are perhaps most suitable, where the probability of finding a neighbour in a circle centred on R, of radius r, is modelled. An example would be:

\[ G(r) = 1 - \exp(-\lambda \pi r^2), \quad r \ge 0 \]

where G(r) is the probability that the distance from R to the nearest object is less than or equal to r, and λ is the Poisson intensity function.

The fuzzy distribution function used is shown in Figure 3. For the present it is kept deliberately simple as a linear relationship, and is similar to Figure 2, but in this case, the extent of the X axis is defined by n.

Figure 3. A distribution function for relative distance (vertical axis: Fuzzy Membership, 0 to 1; horizontal axis: Relative Distance, 0 to n, running from close to far)

2.3 A Combined Proximity Measure

The above two approaches represent extremes in the way in which proximity can be measured; the first relies entirely on absolute distance, the second entirely on relative distance. A more intuitive approach makes use of both approaches together. A number of methods for combining fuzzy values exist, for example the fuzzy AND, fuzzy OR, and fuzzy Algebraic Product. By choosing such a method, the data can be examined for points that are both close in a geometric sense and/or close in a relative sense.

The membership functions for the sets absolute proximity (Pa) and relative proximity (Pr) are applied to all objects {o} with respect to the current reference point. With the distribution functions described above, the fuzzy union operator produces useful results.

\[ P_a \cup P_r \equiv \{\, (o, \max(\mu_a(o), \mu_r(o))) \,\} \]
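The fuzzy union, together with the other two combination methods named in this section, might be sketched as below. The fuzzy sets are represented as plain dictionaries from object name to membership, which is an assumption made for illustration, as are all the function names; an object missing from one set is treated as having membership 0 in it.

```python
def fuzzy_or(mu_a, mu_r):
    """Fuzzy OR (union): the larger of the two memberships."""
    return max(mu_a, mu_r)


def fuzzy_and(mu_a, mu_r):
    """Fuzzy AND (intersection): the smaller of the two memberships."""
    return min(mu_a, mu_r)


def algebraic_product(mu_a, mu_r):
    """Fuzzy algebraic product: the memberships multiplied together."""
    return mu_a * mu_r


def combined_proximity(p_a, p_r, combine=fuzzy_or):
    """Combine absolute (p_a) and relative (p_r) proximity sets.

    Both arguments are dicts of object name -> membership in [0, 1].
    The default combination is the fuzzy union max(mu_a(o), mu_r(o)).
    """
    names = p_a.keys() | p_r.keys()
    return {o: combine(p_a.get(o, 0.0), p_r.get(o, 0.0)) for o in names}
```

For example, an object with absolute membership 0.2 and relative membership 0.6 receives 0.6 under the union, but only 0.2 under the fuzzy AND.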

That is to say, an object is considered to be close if it is either geometrically close OR relatively close. However, as the model for proximity is developed, it will probably become necessary to adopt a more subtle method for evidence combination, such as the Gamma Operator [Zimmermann and Zysno, 1980], which allows all contributing evidence to affect the result, rather than being entirely dictated by the most extreme values.

2.4 The Dynamic Nature of Spatial Data

It follows from Observation (2) that the relative nature of proximity may lead to inconsistencies if the data under consideration changes, since the addition, deletion or displacement of objects will change the distribution of the point set.

From a database perspective, the GIS is a dynamic environment. Features of interest may change shape, relocate, and they may not be persistent over time; that is, they may begin their existence at some point in time, and end it at another. The point being made here is that any pre-defined notion of proximity may need to adapt to changes in the data under investigation. It is therefore not sufficient to define an initial proximity zone and to expect its validity to remain constant. Rather it is more appropriate to take the opposite approach; that is to calculate proximity only at execution time, when all influencing factors are known, and can be taken into account. Obviously the calculations involved place some additional burden on the computer, but the compensation is that the proximity measures are guaranteed to be as up-to-date and appropriate as the current raw data allows.

2.5 The Effect of Scale

Observation (3) indicates that proximity may be interpreted as a function of scale. What might be considered close at one scale may also be considered far at another. For example, there is no logical inconsistency in the following:

London (UK) is close to Paris (France)
London (UK) is far from Perth (Australia)

and

London (UK) is close to Perth (Australia)
London (UK) is far from Sea of Tranquillity (Moon)

A good deal of research has been conducted into the problems associated with scale changes, and specifically into providing seamless and continuous views of space as scale is changed [Roberts et al., 1991], [Richardson, 1994]. Indeed, these notions have become established in some commercial systems (for example Smallworld GIS). It is also desirable to include the same flexibility in qualitative spatial reasoning.

Scale in isolation only defines a ratio of displayed data to on-the-ground distances. Before proximity measures can be calculated, it is also necessary to know a frame of reference against which all distances can be compared. A useful default is to adopt the extents of the current viewing window onto the underlying data; that is, the area that the user is currently viewing on the screen. Assuming that this area is rectangular, the theoretical maximum distance between two points is given by the diagonal extent of the area, and used as described in Section 2.1. The area selected defines the objects under consideration, in this case via the use of a simple point in polygon intersection algorithm. All included objects form a subset which is then used to calculate Pa and Pr, with respect to R.

2.6 The "Attractiveness" of Objects

Although scale would appear to have the major effect, it is also true that a user's perception of what might be considered close is not consistent amongst all pairs of object types, and would appear to change according to a perceived attractiveness. For example, "close to the shops" may be 1 km or less, whereas "close to the Toxic Waste Dump" may be 10 km or less, at the same scale. One way of coping with this is to allow the object type to affect the steepness of the absolute fuzzy proximity distribution. A steeper gradient effectively reduces the distance that is considered to be close, and a shallower gradient increases it. Before this concept may be used effectively, a much greater level of understanding is required regarding a user's perceptions of the real world. For now, this aspect of proximity is noted but not implemented.

2.7 The Effect of Reachability

Proximity may not simply imply closeness in a geometric sense, but also closeness topologically. For example, it does not matter that a is close to b, if one is trying to get to b from a by train and there is no track. In other words, there may be an imposed transportation network that constrains movement between objects. Distances must then be calculated along this network.

From an implementation perspective, reachability causes little difficulty. Instead of calculating a straight-line distance between points, a network distance is computed, using any imposed restrictions on accessibility, flow, and direction of links. In some static applications, it may be worth storing the pre-calculated distances in a matrix for re-use, since their computation can be expensive.
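One common way to obtain such network distances is Dijkstra's shortest-path algorithm; the paper does not name a particular method, so the sketch below is an assumption offered for illustration. The graph representation (a dictionary of directed, non-negatively weighted links) and the function name `network_distances` are likewise hypothetical. Objects that cannot be reached from the source are simply absent from the result.

```python
import heapq


def network_distances(graph, source):
    """Shortest network distance from source to every reachable node.

    graph maps node -> {neighbour: link length}. Links are directed, so a
    one-way link appears in one direction only; lengths must be >= 0.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, length in graph.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist
```

The returned dictionary plays the role of one row of the pre-calculated distance matrix mentioned above, and its values could replace the straight-line distance when computing proximity.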

3 A Definition of Proximity with Context

To summarise, the concept of proximity does not appear to be fixed, but rather it is defined by:

• The (sub)set of objects under consideration.
• The path of connection or route between objects.
• The spatial distributions inherent in the actual data (at a specific time).
• The scale at which the data is being viewed.
• The attractiveness of objects.

The objects under consideration form a set {o} of type O. This set may be restricted using any valid qualifying clauses (for example, see Worboys et al., [1991]), so that only objects with certain properties are considered in calculating proximity. The relative positions and number of objects in {o} define relative proximity. The bounding rectangle is supplied either as part of the query or as a default from the current viewing area. The bounding box and the geometric positions of the objects define absolute proximity. The distance metric is defined in the query as "Euclidean", or "Network" (although others are possible). Objects with a positive or negative attraction should be noted as such in the object definition, although no allowance for this is currently made.

To paraphrase, the context is taken to be "Is object o of type O close to R, using the absolute and relative distances from o to R, where distance is calculated using a specified method, within the current region of interest".

As a simple example, consider the query: "Which petrol stations are close to R and South of R by road?" The query is executed in two stages. The concept of proximity as modelled here is semi-quantitative and hence driven from the data. So, before the query can be executed, it is necessary to instantiate the CloseTo relation from the data using the current context as given by:

close:O = CloseTo (o:O, {o}, R, {x1,y1,x2,y2}, DistanceMethod)

The resulting set of objects is denoted as close, and is also of type O. The final stage is to join {close} with the other query qualifiers via:

\[ \{\, p : \text{petrol station} \mid \exists\, c : \text{close} \; (\text{southof}(p, R) \land c = p) \,\} \]

Concepts such as 'South of' can also be expressed in fuzzy terms, allowing a straightforward combination of set membership to be applied when formulating the result [Dutta, 1990]. In the above example, the expression can be optimised so that only the objects in {close} are tested for also being South. Note however, that the entire set of objects was used to calculate the context for {close}. A different result may be obtained if {close} was instantiated after the qualifier 'South of' had been applied to {p}, since the context then becomes close with respect to petrol stations South of R. Exactly how, or indeed if, this additional flexibility should be presented to the user is a perplexing issue that requires further investigation.
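The two-stage execution described above might look roughly like the following Python sketch. It is not the author's implementation: the names `close_to`, `south_of` and `close_and_south` are hypothetical, only the Euclidean distance method is covered, 'South of' is treated as a crisp test assuming that y increases northwards, the relative membership is taken as 1 − ranking/n (a reading of the linear function in Figure 3), and the 0.7 cut-off for "close" borrows the lower bound of "near" from Section 1.2.

```python
import math


def close_to(objects, r, window, threshold=0.7):
    """Instantiate a 'close' set for the current context.

    objects: dict of name -> (x, y); r: reference point (x, y);
    window: (x1, y1, x2, y2) region of interest.
    Returns name -> combined membership for objects judged close.
    """
    x1, y1, x2, y2 = window
    m = math.hypot(x2 - x1, y2 - y1)  # maximal distance: window diagonal
    if m == 0:
        raise ValueError("window must have a non-zero diagonal")
    # Only objects inside the window contribute to the context.
    inside = {name: p for name, p in objects.items()
              if x1 <= p[0] <= x2 and y1 <= p[1] <= y2}
    ordered = sorted(inside, key=lambda name: math.dist(inside[name], r))
    n = len(ordered)
    close = {}
    for ranking, name in enumerate(ordered, start=1):
        mu_a = 1.0 - math.dist(inside[name], r) / m  # absolute proximity
        mu_r = 1.0 - ranking / n                     # relative proximity (assumed)
        mu = max(mu_a, mu_r)                         # fuzzy union
        if mu >= threshold:
            close[name] = mu
    return close


def south_of(p, r):
    """Crisp 'South of' test, assuming y increases northwards."""
    return p[1] < r[1]


def close_and_south(objects, r, window):
    """Second stage: join {close} with the 'South of' qualifier."""
    close = close_to(objects, r, window)
    return {name for name in close if south_of(objects[name], r)}
```

Here the context for closeness is computed from every object in the window before the directional test is applied; filtering by `south_of` first and then calling `close_to` on the remainder would change the rankings and could therefore give a different answer.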

4 Conclusions

There is an established need to support proximity measures as a part of qualitative spatial reasoning. Psychometric testing has implied that humans do not judge proximity in a purely Euclidean manner, with the consequence that measuring and modelling proximity is not a straightforward process. There would seem to be strong justification for adopting a semi-quantitative approach where relationships are only calculated when the context of the problem is fully understood, i.e. at query execution time. A model for such a context has been developed and presented. The model attempts to account for absolute and relative methods of judging distances, and the effects of scale, data and task. It is based on the results of preliminary tests and so may need altering and/or refining in the light of further investigation. The tests conducted thus far are only exploratory in nature, and additional, more rigorous tests must be constructed before any definite conclusions should be drawn concerning human perception of proximity. Specifically, further testing will focus on the distribution functions described in Sections 2.1 and 2.2, and also on the combination of evidence in Section 2.3. It may also be appropriate for an automated learning strategy to be applied, or to make use of expert judgement about what might be considered close at a particular scale.

5 References

Cressie N A C (1993), Statistics for Spatial Data. John Wiley and Sons, USA, Ch. 8.

Dutta S (1990), Qualitative Spatial Reasoning: A Semi-Quantitative Approach Using Fuzzy Logic. In: Design and Implementation of Large Spatial Databases (Eds. Buchmann A, Günther O, Smith T R and Wang Y-F) Springer-Verlag, New York, pp 345-364.

Egenhofer M J and Franzosa R D (1991), Point-Set Topological Spatial Relations. Int. J. Geographical Information Systems, Vol. 5, No. 2, pp 161-174.

Frank A U (1992), Qualitative Spatial Reasoning about Distances and Directions in Geographic Space. Journal of Visual Languages and Computing, Vol. 3, pp 343-371.

Gahegan M N (1994), Support for the Contextual Interpretation of Data Within an Object-Oriented GIS. Proc. Spatial Data Handling '94, (Ed. Waugh T C and Healey R G), pp 988-1001, Edinburgh, Scotland.

Gapp K-P (1994), A Computational Model of the Basic Meanings of Graded Composite Spatial Relations in 3D Space. Proc. Int. Workshop on Advanced Geographic Data Modelling (AGDM '94), (Ed. M Molenaar and S De Hoop), pp 66-79, Delft, Netherlands.

Guttman L (1968), A General Nonmetric Technique for Finding the Smallest Coordinate Space for a Configuration of Points. Psychometrika, Vol. 33, No. 4, pp 469-506.

Lundberg U and Ekman G (1973), Subjective Geographic Distance: A Multidimensional Comparison. Psychometrika, Vol. 38, No. 1, pp 113-122.

Molenaar M (1994), A Syntactic Approach for Handling the Semantics of Fuzzy Spatial Objects. Proc. European Science Foundation, GISDATA, Baden, Austria.

Richardson D E (1994), Generalization of Spatial and Thematic Data Using Inheritance and Classification and Aggregation Hierarchies. Proc. Spatial Data Handling '94, (Ed. Waugh T C and Healey R G), pp 957-972, Edinburgh, Scotland.

Roberts S A, Gahegan M N, Hogg J and Hoyle B S (1991), Application of Object-Oriented Databases to Geographic Information Systems. Information and Software Technology, Vol. 33, No. 1, pp 38-46.

Roberts S A and Gahegan M N (1991), Supporting the Notion of Context Within a Database Environment for Intelligent Reporting and Query Optimisation. European Journal of Information Systems, Vol. 1, No. 1, pp 13-22.

Robinson V B (1990), Interactive Machine Acquisition of a Fuzzy Spatial Relation. Computers and Geosciences, Vol. 16, No. 6, pp 857-872.

Sharma J, Flewelling D M and Egenhofer M J (1994), A Qualitative Spatial Reasoner. Proc. Spatial Data Handling '94, (Ed. Waugh T C and Healey R G), pp 665-681, Edinburgh, Scotland.

Smith T R and Park K K (1992), Algebraic Approach to Spatial Reasoning. Int. J. Geographical Information Systems, Vol. 6, No. 3, pp 177-192.

Worboys M F, Hearnshaw H M, Maguire D J (1991), Object-Oriented Data and Query Modelling for Geographical Information Systems. Proc. 4th International Symposium on Spatial Data Handling, (Ed. Brassel K & Kishimoto H), Dept. of Geography, University of Zurich, Switzerland, Vol. 2, pp 679-688.

Zadeh L A (1975), The Concept of a Linguistic Variable and its Application to Approximate Reasoning. Information Sciences, Vol. 8 pp 199-249 (part I), Vol. 8 pp 301-357 (part II), Vol. 9 pp 43-80 (part III).

Zimmermann H J and Zysno P (1980), Latent Connectives in Human Decision Making. Fuzzy Sets and Systems, Vol. 4, pp 37-51.

Appendix A: Details of the Psychometric Testing

A group of 50 subjects took part in a test designed to investigate how humans reason about proximity. The subjects all had some practical exposure to the use of GIS. About half of those tested were using GIS in relation to their employment, the other half were full-time students. There did not appear to be any significant differences between the two groups based on employment, but there were some differences observable based on experience with handling geographic data. Specifically, the more experienced subjects tended to respond with more weight given to relative proximity than their less experienced colleagues.

Subjects were asked to rate diagrams similar to that shown in Figure 4, according to how close they judged point (e) was to R, shown as (♦). The distance to e was varied (linearly), and averaged responses produced are shown graphically in Figure 6. For the most part, the relationship appears to be linear (the rightmost value in the graph may be giving a lower result due to proximity with other objects and the edge of the diagram).

The response for the distance between e and R in Figure 4 averaged 0.67. This can be contrasted with Figure 5, where the actual distance between e and R is the same but additional, closer points have been inserted. The response in this case was significantly lower at 0.50. This, and similar observations, indicate that a relative measure for proximity is being used, at least in part.

The effect of changing the size of the study region was also examined, but the results were problematic. A modified Figure 4 was shown, with the only difference being that the bounding box itself was enlarged, whilst the pattern of objects was kept constant. Responses fell into two categories, some judging point e to be closer, and others further away. A possible explanation is that some subjects judged the distance from e to R in terms of the boundary, and others judged it in terms of relative distance to other objects. Further investigation is needed to fully explore these trends.

Figure 4. Sample test diagram, ρ (e, R) = 0.67

Figure 5. Extra points added, ρ (e, R) = 0.50

Figure 6. Results of assessing proximity to R from Figure 4 (vertical axis: Proximity measure [0-1]; horizontal axis: Linear Distance of an object to R)