University of Leeds

SCHOOL OF COMPUTER STUDIES RESEARCH REPORT SERIES Report 93.17

A Model To Support the Integration of Image Understanding Techniques within a GIS*

by Mark Gahegan & Julien Flack

Division of Operational Research and Information Systems

April 1993

*Presented at the 25th International Symposium on Remote Sensing and Global Environmental Change, Graz, Austria, 4-8th April 1993.

ABSTRACT

Traditionally, GIS have used data generated from remote sensing, but only after the data has been pre-processed in some way to provide a suitable classification and/or segmentation. This leads to several weaknesses: the resulting system is inflexible and cannot support multiple (possibly overlapping) segmentations of the same area. We describe a new GIS under development that includes a set of image understanding algorithms that work to extract features of interest from the raw images. The algorithms employed are chosen by the system to best emphasise the types of features (rivers, fields, forests etc.) that the user is currently investigating. The GIS model has been extended to allow multiple representations of the same features to co-exist, enabling the image segmentation to be adaptive rather than fixed.

1.0 SYSTEM OVERVIEW

Existing GIS often incorporate remotely-sensed data in the form of a single classified overlay of a scene, formed according to established pixel-based rules. The overlay is formed externally to the GIS, often with the aid of an image processing system. A good summary of this process is given by De Cola (1989). The resulting overlay has general applicability to many types of study, but is not specifically targeted at any one problem. This is because the binding between the image and the overlay is fixed in advance, when the overlay is created, and this places limits on the flexibility of the system.

The flexibility issue is similar to that of (say) a network database when compared to a relational database. In most implementations of the relational model, the bindings between the various layers are not fixed until query execution time, giving rise to greater interpretative power. In contrast, a conceptual model such as the network or hierarchical model supports only fixed, pre-defined access paths, thus limiting the type of queries that can be solved. In order to provide an image interpretation more directly related to the current task, we must defer the choice of interpretation until the task is known, i.e. at execution time. In this manner we can provide (or construct) the most suitable interpretation possible, given our understanding of the domain. We call such an interpretation an image view.

As a practical example, consider a GIS applied to hydrological modelling. For such a task we may require data concerning the stream and river network. Features such as these do not show up well in many remotely-sensed images, such as Landsat TM, because they occupy a width of less than one pixel in the image. After classification has taken place they often become even more obscure. Their presence may be indicated in the raw image (depending on the scale and resolution) only as an aliasing or edge effect. In order to emphasise this type of feature we must apply an entirely different set of tools to the task of image interpretation. This in turn produces an image view in which these features are emphasised, while other (irrelevant) features may be less well defined or even absent. It is this view that we use to form the current spatial description of the features under investigation.


To support this functionality we extend the GIS model in the following two ways. Firstly, a series of image understanding tools is included within (or made accessible to) the GIS. Secondly, rules are required to describe the manifestation of the feature-types in the image data. These rules must encompass such properties as spectral reflectance, shape and scaling behaviour. In our model this knowledge forms part of the definition of each feature-type.

At the external level (that at which the user operates), the system presents the data as a set of geographic features of certain pre-defined types, which the user manipulates to form maps. Various attribute-types, properties, methods and behaviour are associated with the feature-types in an object-oriented manner (Gahegan and Roberts, 1988; Van Oosterom and Van Den Bos, 1989). As such, the aspatial data is associated with an instance of a feature, and not simply with a spatial description. The spatial description is therefore just one more attribute of a feature, which in this case can assume more than one value. In fact its value is not fixed until query time, when it is instantiated from a particular image view deemed by the rule base to be appropriate.

1.1 SYSTEM ARCHITECTURE

The system is built as a series of layers, each one being a further abstraction of the raw image data, in much the same way as is found with conventional (relational) databases. However, in our case the mappings between the physical data and the conceptual model of the domain are necessarily more complicated. At execution time the subject of the current query or operation is found (this is similar to setting the context of a query). Next, the system must supply an appropriate view of the image data that provides the best possible emphasis on the subject. This is achieved by examining the stored rules concerning feature-type occurrences, and using them to form the image view before the query is processed. In practice, views that are created for a particular task are normally saved for future reference, along with a header that describes how they were formed. Consequently the system first checks to see if an appropriate view exists before creating one.

Within this model there is an extra layer of indirection placed between the conceptual model and the physical image data, as shown in figure 1. Its purpose is to provide a mapping between the raw data and the geographical features of interest to the user. The interface between the user and the data is consistent and independent of the type or source of the data. The model is an expansion of the general three-layer architecture for relational databases given by Korth and Silberschatz (1991) that is often used to describe conventional (non-spatial) database models. Note that other types of spatial data, such as classified overlays and feature boundaries, can also be included directly in the model as view data and feature data.

Most of the feature extraction tools employed produce representations of the data that are in themselves inconclusive, that is, they do not offer a perfect segmentation. Consequently, several different techniques are often used together, with the resulting evidence being combined by a process of relaxation to improve the quality of the interpretation, as shown in figure 2. The intermediate results, which are themselves image views, may also be saved since they may be useful in identifying other feature-types.
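As an illustration of the deferred binding and view caching described above, the following Python sketch shows one hypothetical shape the mechanism could take. The FeatureType and ViewStore classes, the rule keys and the header format are inventions for this example; the model itself is described only in prose.

```python
# Hypothetical sketch only: no API is specified in the text, so every name
# here (FeatureType, ViewStore, the "method" rule key) is illustrative.
from dataclasses import dataclass, field


@dataclass
class FeatureType:
    """A feature-type definition, bundling the manifestation rules
    (spectral behaviour, shape, scale) that the model stores per type."""
    name: str                                   # e.g. "river", "forest"
    rules: dict = field(default_factory=dict)   # e.g. {"method": "edge", "band": 4}


class ViewStore:
    """Defers the image-to-feature binding until query time, caching each
    derived view under a header that records how it was formed."""

    def __init__(self, extractors):
        self.extractors = extractors  # rule "method" -> extraction routine
        self.cache = {}               # header -> previously derived image view

    def view_for(self, feature_type, image):
        header = (feature_type.name, tuple(sorted(feature_type.rules.items())))
        if header in self.cache:      # reuse a saved view where one exists
            return self.cache[header]
        extract = self.extractors[feature_type.rules["method"]]
        view = extract(image, feature_type.rules)  # form the view on demand
        self.cache[header] = view
        return view
```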
The end result of this stage is that a derived spatial representation becomes associated with each feature instance. For regional features this will most often be in the form of a set of pixels; however, the representation can be given as a set of vectors if this is more appropriate (for linear features, or for operations where vectors are more desirable for computational reasons).

For many applications it is only necessary to provide a single view of a feature-type, but there are two cases where multiple representations are appropriate. The first is in situations where the spatial descriptions of the features being considered are subject to uncertainty, either because the data is not accurate enough or because the perimeter is not well-defined. In this case we can introduce bounds on the uncertainty by producing (say) minimum and maximum likely extents for the spatial representation. These can then be used in a best-case/worst-case analysis. Hence we can model the effects of using different interpretation methods on the same feature. The second use of multiple representations is where the change in features over time is to be studied. Each image is tagged with a time value that is used to order the temporal data. Queries can then be restricted to use views derived from a particular time window. This allows the system to model the effects of land-use change, provided that the temporal views can be reconciled to one another. Further details of the model are given in Gahegan (1993).
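The two multiple-representation cases lend themselves to a short sketch. The View class and the threshold values below are assumptions made for illustration; only the ideas (a time-windowed query restriction and paired minimum/maximum extents) come from the text.

```python
# Illustrative sketch of the two multiple-representation cases above.
from dataclasses import dataclass
import numpy as np


@dataclass
class View:
    mask: np.ndarray   # boolean pixel set giving one spatial representation
    timestamp: float   # time value tagged onto the source image


def views_in_window(views, t_start, t_end):
    """Restrict a query to views derived from images in a time window."""
    return [v for v in views if t_start <= v.timestamp <= t_end]


def extent_bounds(certainty, strict=0.8, relaxed=0.4):
    """Best-case / worst-case bounds for an uncertain regional feature:
    a high-certainty minimum extent and a relaxed maximum extent.
    The 0.8 / 0.4 thresholds are arbitrary illustrative values."""
    return certainty >= strict, certainty >= relaxed
```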

(Figure 1 diagram: users at the external level; a conceptual feature level bound to a view level by feature selection and feature extraction; the view level bound by view selection and view extraction to the physical level, which comprises feature data, view data and image data.)

Figure 1. Overview of Layers and Bindings

2.0 THE SCENE UNDERSTANDING TOOLS

From the viewpoint of image understanding, the design of an interactive system incorporating image analysis tools within a GIS provides a goal-oriented approach that is lacking from systems that treat image understanding as a separate pre-process. A problem often encountered in image segmentation is that of specifying accurately what constitutes a feature (Marr, 1982). Consequently, such interpretations are often regarded as non-purposive segmentations or classifications, which provide a general overview of the image contents but which may fail to satisfy the specific requirements of the user. Based on the information supplied by the image, the user and a knowledge base (or long-term memory), we provide the context in which to apply a purposeful segmentation.

An outline of the structure of the image understanding mechanisms can be seen in figure 2, which shows the derivation of a view to emphasise a particular feature-type by a combination of image evidence. Solid lines indicate the flow of data; dashed lines indicate the flow of knowledge. The low level analysis essentially focuses attention on areas of the image that are interesting with respect to a particular image property of the feature-type under investigation: for example, an intensity transition, which may indicate a boundary between adjacent features. This evidence is then combined to form a classification of the image. Entities identified by this process may then be subject to manual or automatic evaluation.

(Figure 2 diagram: raw multi-band image data feeds edge evidence and spectral evidence; these intermediate results pass to a combination-of-evidence stage and then to evaluation and interaction, with the knowledge base informing each stage.)

Figure 2. Image View Derivation

2.1 LOW LEVEL ANALYSIS

Edge evidence is widely used in computer vision and image processing in order to segment images. A survey on edge detection is given by Torre and Poggio (1986). The basic concept involves marking edges at positions in the image at which there are significant changes in intensity. The problem with this definition is specifying what a significant change is. In computer vision the problem may be addressed by scale space analysis, in which edges are marked only if they occur at several resolutions within the image. In an interactive environment we can use the information regarding the resolution of the image, together with the knowledge relating to the scale of the feature-type we are interested in, to determine what constitutes a significant intensity change. For example, given a query concerning roads within a Landsat TM data set we would look for small scale (aliased) edges. We may also associate structural edge information with a feature-type, i.e. whether the edges are usually sharp step edges, indicating a sudden change, or blurred ramp edges, indicating a more gradual change.

In remotely sensed data we are usually presented with a number of image bands; the application of standard edge detection processes is therefore not straightforward. Multi-spectral edge detectors, as discussed by Nagao and Matsuyama (1980), may be used to address this problem. However, in an interactive environment we may use knowledge about the feature-type of interest to select the most appropriate single band within which to evaluate the edge evidence. Following the previous example, the system would select a near-infrared band (Landsat TM band 4) in order to detect roads, as this is where they are often most prominent. The extraction of other feature-types may involve combining edge evidence from two or more spectral bands.

The following analysis is based on two datasets. The first is a Landsat TM scene taken over an area near Minehead, West Somerset, England. The second is a Landsat MSS scene of the Penobscot drainage basin in Maine, New England. Figure 3a shows band 4 of the Minehead scene, and figure 3b shows band 4 of the Penobscot scene. Both images have been subjected to various registration processes and contrast-stretched to improve their visual appearance. In figure 4a large scale edges are detected by means of a Canny detector (Canny, 1986), applied to band 5 of the Minehead data set. Figure 4b shows a Canny detector tuned to identify smaller edge structures, applied to band 4 of the same scene. Notice that large features such as the coastline (at the top of the image) appear clearly in figure 4a, but not in figure 4b. Small scale features are only apparent in figure 4b.
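A minimal sketch of this scale- and band-aware edge detection is given below, using scikit-image's Canny implementation as a modern stand-in for the detector used in the paper. The rule table, the zero-based band indices and the sigma values are assumptions for illustration.

```python
# Sketch: pick the band and edge scale from per-feature-type knowledge,
# then run a Canny detector at that scale. Values are illustrative only.
from skimage.feature import canny

# Knowledge-base fragment: band index (zero-based into the stacked array)
# and smoothing scale suited to each feature-type.
EDGE_RULES = {
    "road":      {"band": 3, "sigma": 1.0},  # TM band 4: small, aliased edges
    "coastline": {"band": 4, "sigma": 4.0},  # TM band 5: large-scale structure
}


def edge_evidence(image_bands, feature_type):
    """image_bands: (bands, rows, cols) array of a multi-band scene.
    Returns a boolean edge map tuned to the feature-type's scale."""
    rule = EDGE_RULES[feature_type]
    band = image_bands[rule["band"]]
    # A larger sigma smooths away small structure, so only edges at the
    # feature-type's characteristic scale survive detection.
    return canny(band, sigma=rule["sigma"])
```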

Figure 3. Initial Images of (a) Minehead and (b) Penobscot



Figure 4. Canny Edge Detection

Edge detectors such as the Canny detector operate by examining small, local windows of the image. Errors may therefore occur due to noise and texture. Edge post-processing is used in order to reduce such classification errors (Hancock and Kittler, 1989). It propagates the constraint that edge points should be connected to other edge points to form line segments. An edge has an associated certainty, which initially depends upon the strength of the intensity transition at that point. Edge certainties are subsequently processed by a relaxational technique: edge points which are isolated (and are therefore likely to be due to noise or texture) have their probabilities reduced, whereas edge points which neighbour other edge points (and are therefore likely to be connected to a valid line segment) have their probabilities increased. The images depicted in figure 5 indicate how a relaxational edge enhancer may increase the edge response for points which lie on connected line segments. Figure 5a shows a close-up of detected edges; figure 5b shows the same area after post-processing has occurred. Notice how the features of interest are now emphasised much more strongly.
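The relaxation step can be sketched as a simple iterative update in which each edge point's certainty drifts towards the support it receives from its eight neighbours. This is a simplification of the scheme in Hancock and Kittler (1989); the update rule and constants below are assumed for illustration.

```python
# Simplified relaxational edge post-processing: isolated responses are
# suppressed, well-supported ones reinforced. Not the published scheme.
import numpy as np
from scipy.ndimage import convolve

NEIGHBOURS = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]], dtype=float)


def relax_edges(certainty, iterations=5, gain=0.2):
    """certainty: edge probabilities in [0, 1], initially edge strength."""
    p = certainty.astype(float)
    for _ in range(iterations):
        support = convolve(p, NEIGHBOURS, mode="constant") / 8.0
        # Points whose neighbours are strong edges drift upwards; isolated
        # points (support near zero) drift downwards.
        p = np.clip(p + gain * (support - 0.5) * p, 0.0, 1.0)
    return p
```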

Figure 5. Edge Enhanced Image


2.2 IMAGE CLASSIFICATION

Multi-spectral data is a good source of information for the classification of remotely sensed images. The relationship between spectral responses and the corresponding ground cover is generally known (Lillesand and Kiefer, 1979). Such information may be applied in a variety of approaches, ranging from simple spectral classifiers (Duda and Hart, 1973) to more sophisticated expert and knowledge-based systems (Wharton, 1987; Ton et al., 1991). In the following experiments, a database administrator or expert user has provided spectral knowledge of the relevant feature-types in the form of training-sets. A training-set is associated with each feature-type and is used to identify areas of the image that are likely candidates for a given feature-type. A probabilistic classification technique is used here (Eklundh et al., 1980). When using this technique, the spectral signature of a point in the image is compared to the multi-dimensional cluster formed by the training-set. The distance between the cluster centroid and the spectral signature of the image point provides a measure of similarity, which is used to indicate the certainty of the point belonging to a particular feature-type. Figure 6 shows the probabilistic classification of the Minehead scene, based on a training-set constructed from known (mainly deciduous) forested areas. The lighter the intensity, the more likely the pixel represents forest cover.
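The classifier reduces to a distance computation against the training-set cluster. The sketch below uses a Mahalanobis distance mapped through an exponential to give a certainty in (0, 1]; the paper does not name the exact metric or mapping, so both are assumptions.

```python
# Sketch of a centroid-distance (probabilistic) classifier. The choice of
# Mahalanobis distance and the exp(-d^2/2) mapping are our assumptions.
import numpy as np


def spectral_certainty(pixels, training_set):
    """pixels: (n, bands) signatures; training_set: (m, bands) samples of
    one feature-type. Returns per-pixel certainty of membership."""
    centroid = training_set.mean(axis=0)
    cov = np.cov(training_set, rowvar=False)
    inv_cov = np.linalg.pinv(cov)   # pseudo-inverse guards against singularity
    diff = pixels - centroid
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis
    return np.exp(-0.5 * d2)        # nearer the centroid, higher the certainty
```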

Figure 6. Probabilistic Classification for Forest

Figure 7. Forest Regions

3.0 COMBINING EVIDENCE

By combining evidence from a number of low level processes we reduce the reliance of the interpretation on any one particular property of a feature-type, and thereby improve its quality. For example, most natural forest features have relatively poor spectral discrimination (caused by irregular tree density), yet may still be identified by virtue of their boundary definition against the surrounding areas. Such concepts form the basis of recent work attempting to unify results from low level processes to form a more accurate image segmentation (Pavlidis and Liow, 1990). Additionally, by weighting the importance of one attribute over another we may successfully apply the same extraction technique to a range of dissimilar feature-types.

The problem presented by this technique is exactly how the various sources of information (or intermediate results) may be combined. The approach taken here is to defer the formation of regions until all available information has been considered, as opposed to forming regions during low level analysis and subsequently splitting, merging or discarding them. In this situation the low level processing is simply used as a focusing mechanism to identify regions of the image which may merit further investigation. As indicated by figure 2, the combination of evidence is dependent upon knowledge describing the feature-type of interest. This dependence is initially divided into two broad categories: regional feature-types and linear feature-types. The scale and resolution of the raw data is used to estimate how a feature-type will appear within the image. For example, a road will usually appear as an aliasing effect in Landsat TM images, and hence is linear in nature. The same road may appear as a region in high resolution aerial photographic images, and so the method for detection will change.

For the detection of regional feature-types, an approach based loosely on Milgram's method (Milgram and Kahl, 1979) is applied. Successive regions are formed by including points of increasingly lower spectral certainty (as evaluated by the distance of the signature of an image point from the training-set centroid). At each step the boundaries of the regions formed by this process are intersected with the edge evidence. A score is assigned based on the strength and connectivity of the edge evidence along the region boundary. The regions with the highest intersection score form the final classification. The overlay in figure 7 shows the classification of the Minehead data set, resulting from a query concerning forest cover.

For linear feature-types, points of high spectral certainty are examined for bounding parallel edge evidence; such points are marked as candidate seed points and are used to form a linear network. The example in figure 8 illustrates how this technique is used to extract road segments from the Penobscot data set. Figure 8a shows the road seed points and figure 8b shows the resulting detected road network. For more details of this approach refer to Gahegan and Flack (1993).
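The regional case can be sketched as a threshold sweep: admit pixels of decreasing spectral certainty, and at each step score the region boundary against the edge evidence. The thresholds and the mean-strength score below are illustrative simplifications; in particular the published method also weighs edge connectivity, which this sketch omits.

```python
# Loose sketch of convergent-evidence region formation: keep the region
# whose boundary is best supported by the edge evidence. Simplified.
import numpy as np
from scipy.ndimage import binary_erosion


def best_supported_region(spectral, edges, thresholds=np.linspace(0.9, 0.3, 13)):
    """spectral: per-pixel feature-type certainty; edges: post-processed
    edge strength map. Returns the best-supported region mask."""
    best_mask, best_score = None, -np.inf
    for t in thresholds:
        mask = spectral >= t                     # admit weaker pixels each step
        boundary = mask & ~binary_erosion(mask)  # one-pixel region boundary
        if not boundary.any():
            continue
        score = edges[boundary].mean()           # edge support along boundary
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask
```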

Figure 8. Extracting Road Features

3.1 EVALUATION/INTERACTION

Given a query and a data set, instantiations of the required feature-types are identified. These features may then be evaluated, either manually by the user or automatically by the system. The final classification, as presented in the previous section, is simply the classification at which the spectral information is supported most strongly by the edge evidence. In certain situations the user may wish to examine the validity of a particular region, or group of regions. To facilitate this, the constraints on region formation may be relaxed or tightened to provide minimum and maximum extent views of the feature. Figure 9 shows the results of applying this technique to a forest feature taken from the Minehead image. The dark area represents the high certainty region; the light area represents the maximum extent of the region as the forming constraints are relaxed.

Simple queries concerning feature properties may help the user to discard invalid regions and track changes over time. Figure 10 shows the change over time of a water body detected in the Penobscot dataset. Given the information provided in the knowledge base and the context of the current query, the system is capable of post-processing the recognised regions. For example, an (expert) user may specify that instances of a particular feature-type are only valid if they occupy an area greater than a minimum size threshold; instances of regions violating this constraint are discarded from the current view. Additionally, since the system is essentially a GIS, the user may qualify a query or operation by using the value of any other attribute-type on which the GIS holds data.

Figure 9. Minimum and Maximum Extents

Figure 10. Detecting Changes

In comparison with figure 7, figure 11 shows how a size constraint may provide a more desirable classification. In this case regions smaller than 300 m² (10 pixels) are discarded from the classification.
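A sketch of this size-constraint filter is shown below, using connected-component labelling to measure each region; the function name and the default of 10 pixels (matching the example above) are illustrative.

```python
# Sketch: discard connected regions of the classification smaller than a
# minimum pixel count, as in the figure 11 example (10 pixels).
import numpy as np
from scipy.ndimage import label


def filter_small_regions(mask, min_pixels=10):
    """mask: boolean classification. Returns the mask with small regions
    removed from the current view."""
    labelled, n_regions = label(mask)   # connected-component labelling
    keep = np.zeros_like(mask, dtype=bool)
    for region_id in range(1, n_regions + 1):
        region = labelled == region_id
        if region.sum() >= min_pixels:
            keep |= region
    return keep
```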


Figure 11. Final Forest Extraction

Future work will focus on providing intelligent, automatic evaluation of the classified regions. By extending the knowledge base to include characteristic values for relevant feature properties, the significance of detected features may be used to influence the certainty of the classification. For example, a detected feature that is smaller than any previously encountered feature of the same type could lead the system to reduce the certainty assigned to that feature. For more information on significance and context within a database environment see Roberts and Gahegan (1991).

4.0 CONCLUSION

An extended GIS has been developed that encompasses an image interpretation system. The binding between the images and the features of interest is delayed until run time, and this approach gives more flexibility in scene interpretation than is currently available when using an image processor as a precursor to the GIS. The results presented demonstrate the adaptable behaviour of the system when presented with different types of queries. By using a variety of independent image processing tools and combining their results, the accuracy (and usefulness) of the resulting interpretation is improved.

There is, of course, more work to do. Specifically, we are keen to build into the system some automated means of query evaluation that can modify expert rules in the light of experience. We are presently investigating the tracking of features between temporally sequenced images as a means of monitoring land-use change.

5.0 ACKNOWLEDGEMENTS

Our thanks go to Emily Bryant of the Image Processing Lab, Department of Earth Sciences, Dartmouth College, Hanover, NH 03755, USA, for helping with some of our data requirements.


REFERENCES

Canny J (1986), A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, pp 679-698.

De Cola L (1989), Fractal Analysis of a Landsat Scene. Photogrammetric Engineering and Remote Sensing, Vol. 55, No. 5, pp 601-610.

Duda R O and Hart P E (1973), Pattern Classification and Scene Analysis. New York, Wiley.

Eklundh J O, Yamamoto H and Rosenfeld A (1980), A Relaxation Method for Multispectral Pixel Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 2, No. 1, pp 72-75.

Gahegan M N and Roberts S A (1988), An intelligent, object-oriented geographical information system. International Journal of GIS, Vol. 2, No. 2, pp 101-110.

Gahegan M N and Flack J C (1993), Query-centred interpretation of remotely sensed images within a GIS. To appear in Proc. European Conference on Geographic Information Systems, Genoa, Italy, March 1993.

Gahegan M N (1993), A consistent user-model for a GIS incorporating remotely-sensed data. To appear in Proc. GIS Research UK (1), Keele University, UK, March 1993.

Hancock E R and Kittler J (1989), Edge Postprocessing: A Comparative Study. Proceedings of the Fifth Alvey Vision Conference, University of Reading, 25th-28th Sept. 1989, pp 245-249.

Korth H F and Silberschatz A (1991), Database System Concepts (second edition), Chapter 1. McGraw-Hill.

Lillesand T M and Kiefer R W (1979), Remote Sensing and Image Interpretation. New York, Wiley.

Marr D (1982), Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman and Company, p. 270.

Milgram D L and Kahl D J (1979), Region Extraction Using Convergent Evidence. Computer Graphics and Image Processing, Vol. 11, pp 1-12.

Nagao M and Matsuyama T (1980), A Structural Analysis of Complex Aerial Photographs. New York, Plenum Press, p. 199.

Pavlidis T and Liow Y T (1990), Integrating Region Growing and Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 3, pp 225-233.

Roberts S A and Gahegan M N (1991), Supporting the notion of context within a database environment for intelligent reporting and query optimisation. European Journal of Information Systems, Vol. 1, No. 1, pp 13-22.

Ton J, Sticklen J and Jain A K (1991), Knowledge-Based Segmentation of Landsat Images. IEEE Transactions on Geoscience and Remote Sensing, Vol. 29, No. 2, pp 223-231.

Torre V and Poggio T (1986), On Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 2, pp 147-163.

Van Oosterom P and Van Den Bos J (1989), An object-oriented approach to the design of geographic information systems. Computers & Graphics, Vol. 13, No. 4, pp 409-418.

Wharton S W (1987), A Spectral Knowledge-Based Approach for Urban Land-Cover Discrimination. IEEE Transactions on Geoscience and Remote Sensing, Vol. 25, No. 3, pp 273-283.

