
Data Mining and Spatial Reasoning for Satellite Image Characterization

Corina VADUVA, Daniela FAUR, Inge GAVAT
Applied Electronics and Information Technology Department
University Politehnica of Bucharest, Bucharest, Romania
[email protected]


Abstract—High-level image understanding and content extraction require the analysis of image regions in order to reveal the spatial interactions between them. This paper aims to engender new attributes for scene description considering the relative positions of the objects inside. A visual grammar of the scene is built using an extension of a Knowledge-based Image Information Mining (KIM) system. The objects are extracted with statistical models and machine learning through the KIM system, according to the user's interest. Further, an affine invariant descriptor of the relative position between two objects is computed. This is the force histogram, regarded as a spatial signature that characterizes configurations of regions based on the attraction forces between the composing objects. Thereby, new patterns can be defined from similar object configurations, in order to enhance the effectiveness of content-based image retrieval in large databases.

Keywords—high level image understanding; spatial relationships; invariant signatures

I. INTRODUCTION


In the field of pixel-based analysis, intelligent concepts of interactive learning and probabilistic retrieval of user-specific cover types have brought significant improvements to feature extraction techniques. The knowledge-based image information mining (KIM) system [2] uses such a method and is built on human-centered concepts in order to fully exploit the human-computer synergy: the user guides the interactive learning process, while the system continuously gives the operator relevance feedback about the performed training actions and searches the archive for relevant images. Although there are other content-based retrieval concepts for large-scale images (e.g., GeoIRIS, C. R. Shyu [6]), KIM is easy to use and performs very good local classification.


Once they are extracted, the regions can have different spatial arrangements, so the same scene can admit several interpretations. The relative position, a fundamental concept in computer vision, helps discern between "parks in a city" and "houses near a forest".


Because the amount of information received from satellites is constantly increasing, image information mining becomes extremely difficult. In fields as diverse as agriculture, forestry, flood monitoring, city planning, cartography and defense, high-level remote sensing data must be transformed into useful products to assess land infrastructure.

The histogram of forces, introduced by Matsakis in [4], [5], is an alternative way to properly define the spatial relationships between regions in a scene. While Aksoy and Tilton [3] proposed a method that provides information about the topology of the image, the histogram of forces is sensitive to the shape of the objects, their orientation, their size and the distance between them.

Automatic content extraction, classification and content-based retrieval are highly necessary to develop intelligent databases for effective and efficient processing of remotely sensed imagery. Therefore, a high-level semantic interpretation of images is considered a challenging problem.

The aim of this paper is to engender new attributes for the land-cover classification performed by the KIM system. The proposed processing chain for image information mining begins with the extraction of objects/regions of interest through an interactive learning process inside the KIM system tool. The regions (2D objects) will be handled as longitudinal sections. A simplified histogram of forces (HF) interpretation is used to describe the relative position between regions of interest. Therefore, any configuration of two regions is reduced to a signature invariant to rotation, translation and scale. The newly obtained attribute revealing spatial relationships can then be attached to the envisaged pair of objects as a new label in further complex content-based image retrieval.

Most of the previous approaches try to solve the content extraction problem only by building pixel-based classification and retrieval models using spectral and texture features. But this is only the first level of a hierarchical scene modeling with a visual grammar that aims to bridge the gap between features and semantic interpretation [1]. In order to increase the efficiency of image characterization, our purpose is to obtain region-level features describing properties shared by groups of pixels. The conversion from image data to the thematic information required by end users is a constraining factor for the further dissemination of remote sensing applications. The idea of the image processing chain is first to extract objects (pixel-level analysis) and then to understand the configuration of regions (object-level analysis).

978-1-4244-6363-3/10/$26.00 ©2010 IEEE


The outline of the paper is as follows: the proposed method is described in Section II; the next sections explain, step by step, the notions used throughout the processing chain: the knowledge-based information mining system for image retrieval (III), the spatial relationships used and the histogram of forces (IV), and finally the extraction of invariant signatures for object configurations (V). The last sections analyse the achieved results (VI) and consider further improvements of the algorithm (VII).

The knowledge-driven information mining system KIM is composed of three subsystems:
1. A library of algorithms for extracting the primitive image features and representing them as a code of the image classes;
2. A human-machine communication module including a machine learning algorithm and a graphical user interface; automated image classification is propagated to the entire database;
3. A database management system (DBMS) that stores and manages an image information catalogue and records the semantics defined by the user through queries to the system.

II. PROPOSED METHOD

In order to provide knowledge-based image interpretation, a visual grammar algorithm models the scene hierarchically on three levels: pixel level, region level and scene level. Pixel-level representations include labels for individual pixels, computed in terms of spectral, texture, geometry and segmentation cluster features. Region-level representations include land cover labels for groups of pixels obtained through region segmentation. Scene-level representations include interactions of different regions, computed in terms of their spatial relationships.

Figure 2. Overall KIM system logical model

The logical structure of the KIM system is presented in Figure 2. At the time of data ingestion, multi-mission images are tiled into sub-images, indexed and stored in a repository. From these images, primitive features are extracted, such as spectral, texture and geometrical attributes. To obtain a quasi-complete description of the entire image, textural and geometrical features are extracted at multiple scales. For data reduction reasons, the features are separately compressed by a global unsupervised clustering across all images in the archive. In the inventory, the DBMS catalogue system contains all relevant types of meta-information.
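The data-reduction step just described can be illustrated with a small sketch. The clustering below is a deliberately minimal k-means (deterministic initialisation spread across the data), not KIM's actual algorithm; all names and parameters are illustrative assumptions. The point is that per-pixel feature vectors are replaced by one-byte cluster indices plus a small codebook of centroids.

```python
import numpy as np

# Minimal sketch (assumed details) of compressing primitive features by
# global unsupervised clustering: each feature vector is replaced by the
# index of its nearest centroid, so only indices + centroids are stored.
def kmeans_quantize(features, k=4, iters=20):
    # deterministic initialisation: k points spread across the data
    idx = np.linspace(0, len(features) - 1, k).astype(int)
    centroids = features[idx].astype(float).copy()
    for _ in range(iters):
        # assign each feature vector to its nearest centroid
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):  # recompute centroids from their members
            if np.any(labels == c):
                centroids[c] = features[labels == c].mean(axis=0)
    return labels.astype(np.uint8), centroids

# toy "spectral/texture" features for 200 pixels, two obvious cover types
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (100, 3)), rng.normal(1, 0.1, (100, 3))])
labels, cents = kmeans_quantize(feats, k=2)
```

Each pixel is now stored as a single byte instead of three floating-point values, which is the kind of archive-wide compression the paragraph above refers to.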

Figure 1. Image data workflow: KIM classification, segmentation, object grouping, generation of the reference object, HF computation.

The data workflow of the new approach is given in Figure 1. The first step is to find meaningful and representative regions inside the scene through an automatic KIM land cover classification. Next, through a mean shift segmentation process, region-level features efficiently describe the properties shared by groups of pixels. Further, the objects are put together into configurations; a synthetic circular object is always placed in the centre of the configuration; then, the histogram of forces is computed for this reference object against the entire configuration, beginning with the angle of the principal axis. The last step of the process consists of normalizing the histogram of forces to [0, 1]. A rotation-, translation- and scaling-invariant signature is thus generated for every group of objects, as a new attribute describing the image at the scene level.
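One step of the workflow above can be sketched in isolation: placing the synthetic circular reference object at the centre of a configuration. The sketch below assumes the configuration is given as a binary pixel mask; the function name and the radius are hypothetical choices, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch of the reference-object step: a small disc is
# rasterised at the centroid of the configuration mask.
def reference_disc(config_mask, radius=2):
    ys, xs = np.nonzero(config_mask)
    cy, cx = ys.mean(), xs.mean()          # centroid of the configuration
    yy, xx = np.mgrid[:config_mask.shape[0], :config_mask.shape[1]]
    return ((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2

mask = np.zeros((20, 20), bool)
mask[2:5, 2:5] = True      # object 1
mask[14:17, 14:17] = True  # object 2
disc = reference_disc(mask, radius=2)
```

For the two toy objects above, the centroid falls at (9, 9), midway between them, which is where the synthetic reference object is placed before the histogram of forces is computed.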

KIM assigns meaning to the primitive features through a learning phase [1]. Using sample images, the user marks areas of interest by giving positive and negative examples, refining the definition of the derived feature through an iterative process. Structures of interest appear in red on the gray-scale visualization of the scene. Once this training has been satisfactorily completed, the definition can be saved and reused afterwards simply by requesting images containing the derived features.

IV. SPATIAL RELATIONSHIPS

Spatial information is an important element of high resolution image understanding and scene description. After the objects of interest are extracted, the next step is to model the relations between them. The relative position of two regions can be defined in terms such as right of, left of, below, above, near, far. The fuzzy evaluation of directional spatial relationships between areal objects often relies on the computation of the histogram of angles, which is considered to provide a good representation of the relative position of one object with regard to another based only on their geometry (shape and distance) [3].
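The histogram of angles mentioned above can be sketched in a brute-force form: for every pair of pixels, one in the argument object A and one in the referent B, the direction of the vector from the B pixel to the A pixel is accumulated. This is an illustrative version under assumed (x, y) coordinates, not the exact formulation of [3].

```python
import numpy as np

# Illustrative sketch of a histogram of angles: directions of all
# b -> a vectors, binned over [0, 2*pi) and normalised to frequencies.
def angle_histogram(pts_a, pts_b, bins=8):
    diffs = pts_a[:, None, :] - pts_b[None, :, :]        # all b -> a vectors
    ang = np.arctan2(diffs[..., 1], diffs[..., 0]) % (2 * np.pi)
    hist, _ = np.histogram(ang.ravel(), bins=bins, range=(0, 2 * np.pi))
    return hist / hist.sum()                             # fuzzy frequencies

a = np.array([[5.0, 0.0]])   # single pixel of A, to the right of B
b = np.array([[0.0, 0.0]])   # single pixel of the referent B
h = angle_histogram(a, b)
```

With A strictly to the right of B, all the mass falls into the bin containing direction 0, which is how a fuzzy "right of" relation would be read off the histogram.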

III. KNOWLEDGE BASED INFORMATION MINING CONCEPTS FOR IMAGE RETRIEVAL

The KIM system is a new approach to exploring image catalogues based on human-centered concepts [2]. It allows the user to search an archive by giving examples directly on the image. A graphical user interface provides a variety of mining tools, including semantic querying by image content or image example, interactive classification and learning of image content.

Through a generalization, the histogram of forces was obtained: a function sensitive to the shape of the regions, their



orientation, their size and the distance between them. It represents the relative position of a 2D object A with regard to another object B by a function FAB from ℝ into ℝ+. For any direction θ, the value FAB(θ) is the scalar resultant of the elementary forces exerted by the points of A on those of B, which tend to move B towards A in direction θ (Figure 3a). Thus, we define FAB as the histogram of forces associated with the pair (A, B) via F, where A is the argument and B the referent.

Figure 3. a. Elementary forces between objects; b. Handling of segments when computing the histogram of forces.

Figure 4. Algorithm for computing spatial signatures specific to groups of objects. Step 1: object grouping, generation of the reference object; Step 2: counterclockwise computation of the HF, starting from the principal axis.

There are many constraints in the equation of this histogram regarding the distance between the objects concerned. In this paper, some assumptions were made in order to simplify the computational part. Instead of trying to find out whether object A is moved towards object B from the right, from the left, from above or from below, we are only interested in the force with which the two objects attract one another along a specific direction θ over a 2π period. The equation becomes easier to compute and involves only the sections of objects A and B (Figure 3b) that are crossed by lines in direction θ:

FAB(θ) = Σi Σj f(xi, yij, zj)    (1)

where xi and zj are the lengths of the sections of A and B lying on the same line parallel to θ, yij is the distance between them and f is the elementary attraction between two such sections.

The synthetic object ensures the invariance of the signature to translation within the image. Taking the principal-axis angle as the starting point for the computation of the histogram of forces provides invariance to rotation of the group inside the scene, while the normalization keeps the signature unchanged if the objects are scaled.

At the output, this method provides a function graphically represented as in Figure 3b. Its periodicity is given by the fact that the forces of attraction between two objects are the same for directions in (0, π) as for (π, 2π). This reasoning is very useful for building spatial signatures for groups of objects, as will be shown in the next section.
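Under the simplifying assumptions above, a discrete version of the force computation can be sketched. Grouping pixels into lines parallel to θ, using the constant-force product xi·zj per pair of sections and attenuating by the distance between the sections are our own illustrative choices, not the paper's exact implementation; the π-periodicity stated above holds for this sketch as well.

```python
import numpy as np

# Hedged, heavily simplified sketch of a force computation in the spirit of
# equation (1): pixels of A and B are grouped into discrete lines parallel to
# direction theta; sections on the same line contribute x * z, attenuated by
# an assumed factor depending on the gap between them.
def force(pix_a, pix_b, theta, line_width=1.0):
    d = np.array([np.cos(theta), np.sin(theta)])     # direction of attraction
    n = np.array([-np.sin(theta), np.cos(theta)])    # normal: indexes the lines
    la = np.round(pix_a @ n / line_width).astype(int)
    lb = np.round(pix_b @ n / line_width).astype(int)
    total = 0.0
    for line in np.intersect1d(la, lb):              # lines crossing both objects
        sa, sb = pix_a[la == line] @ d, pix_b[lb == line] @ d
        x, z = len(sa), len(sb)                      # section lengths
        gap = abs(sa.mean() - sb.mean())             # distance between sections
        total += x * z / (1.0 + gap)                 # assumed attenuation
    return total

def force_histogram(pix_a, pix_b, n_dirs=16):
    thetas = np.linspace(0, 2 * np.pi, n_dirs, endpoint=False)
    return np.array([force(pix_a, pix_b, t) for t in thetas])

A = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
B = A + np.array([10.0, 0.0])   # the same 3x3 square, shifted to the right
hist = force_histogram(A, B)
```

For this pair, the attraction is strongest along the horizontal direction that the objects share, vanishes for the perpendicular direction (no common lines) and is identical for θ and θ + π, consistent with the periodicity discussed above.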

Figure 5. Illustration for the invariance of the spatial signatures to translation, rotation and scale.

V. SPATIAL SIGNATURES

Given a configuration of objects, we need to combine all the pixel- and region-level features, using the relationships between the objects involved, in order to build scene-level attributes. There are many possible ways to group the regions into configurations, and thus many high-level features that are not easy to operate with. But some groups can be alike, for example pairs of houses in a residential area. The main problem is to be able to assign the same label to many groups of regions when they are similar.
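The grouping of regions into configurations can be sketched as follows. The paper does not specify the grouping rule, so this is a hypothetical single-linkage scheme: object centroids closer than a chosen threshold are merged into the same configuration via a small union-find.

```python
import numpy as np

# Hypothetical sketch of the grouping step (threshold is an assumption):
# single-linkage clustering of object centroids with a union-find.
def group_objects(centroids, max_dist=5.0):
    n = len(centroids)
    parent = list(range(n))
    def find(i):                              # root of i, with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centroids[i] - centroids[j]) <= max_dist:
                parent[find(i)] = find(j)     # merge the two components
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

cents = np.array([[0.0, 0.0], [3.0, 0.0], [20.0, 20.0]])
groups = group_objects(cents)
```

Here the two nearby objects form one configuration and the distant one stays alone, which is the kind of "pairs of houses" grouping the paragraph above describes.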

An illustration of this method is given in Figure 5. We can observe that the same configuration appears at different positions in the image and at different scales. The three rectangles are characterized by the same vector; one attribute can thus be assigned to all four groups.

VI. EXPERIMENTAL RESULTS

The theory is also proven on real imagery. As an example, we used an aerial image acquired over Louisiana, USA (Figure 6a). Areas of interest, such as houses, roads or vegetation types, can be extracted from the scene using the KIM system. In Figure 6b, houses are considered the objects of interest and are used in the further analysis.

Therefore, we propose an algorithm that reduces a specific configuration to an invariant signature, regardless of its position in the image. Its diagram is represented in Figure 4. First, we choose a group of objects in the scene, A1 (the correspondent of object A in Figure 3). Then, a punctiform reference object, B1 (the correspondent of object B in Figure 3), is generated at the centroid of the group. Further, we analyze the attraction between the reference object and the whole configuration. The histogram of forces between A1 and B1 is thus computed beginning with the direction θp corresponding to the principal axis, through the entire trigonometric circle (360°). The result is normalized to [0, 1] in order to obtain a signature for the group of objects.
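The two invariance mechanisms just described, starting the histogram at the principal-axis angle and normalizing to [0, 1], can be sketched as follows. The principal axis is taken here as the dominant eigenvector of the coordinate covariance, an assumed but common choice; a precomputed force histogram sampled at equally spaced directions is also assumed.

```python
import numpy as np

# Hedged sketch of the signature construction: the principal-axis angle of
# the configuration fixes the starting bin (rotation invariance) and a
# division by the maximum rescales the values to [0, 1] (scale invariance).
def principal_axis_angle(pixels):
    centred = pixels - pixels.mean(axis=0)
    cov = np.cov(centred.T)                      # 2x2 covariance of coordinates
    vals, vecs = np.linalg.eigh(cov)
    major = vecs[:, np.argmax(vals)]             # eigenvector of largest eigenvalue
    return np.arctan2(major[1], major[0]) % (2 * np.pi)

def signature(hf, theta_p):
    n = len(hf)
    start = int(round(theta_p / (2 * np.pi) * n)) % n
    rotated = np.roll(hf, -start)                # begin at the principal axis
    return rotated / rotated.max()               # normalise to [0, 1]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
theta_p = principal_axis_angle(pts)              # horizontal configuration
sig = signature(np.array([1.0, 2.0, 4.0, 2.0]), np.pi)
```

Rotating the whole configuration rotates θp by the same amount, so the rolled histogram starts at the same relative direction; scaling multiplies all forces by a common factor, which the division removes.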

In order to accomplish the analysis of spatial relationships inside a collection of objects (Figure 6c), a grouping of nearby regions is needed. Figure 7 displays the results for some of the pairs of objects. As can be seen, similar configurations are defined by similar signatures. For example, just


by looking at their signatures, we are able to say that the pairs 1, 2, 4 or 3, 5 illustrate the same group of objects. Indeed, if we compare the configurations, they are very much alike, but translated and rotated through the image. The very small differences appear because the real regions do not have

exactly the same shape. Applying the same method, we can find not only similar groups of houses, but more complex configurations, like “house and its access to the main road” or “garden near house”.
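The retrieval of similar configurations implied above reduces to comparing signatures. The matcher below is an illustrative sketch, not from the paper: a small normalised L2 distance between two signatures marks the configurations as instances of the same pattern; the tolerance value is an assumption.

```python
import numpy as np

# Illustrative signature matcher (threshold is an assumed parameter):
# configurations whose normalised signatures are close count as one pattern.
def same_pattern(sig1, sig2, tol=0.15):
    return np.linalg.norm(sig1 - sig2) / np.sqrt(len(sig1)) <= tol

s1 = np.array([1.0, 0.5, 0.2, 0.5])
s2 = np.array([1.0, 0.55, 0.18, 0.5])   # same configuration, slight noise
s3 = np.array([0.2, 1.0, 0.9, 0.1])     # a different configuration
```

With such a matcher, pairs whose configurations differ only through translation, rotation, scale and small shape noise receive the same label, as described for pairs 1, 2, 4 and 3, 5 above.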

Figure 6. a) Original aerial image; b) In red, the regions extracted with the KIM system according to a user's interest; c) Final objects used in the spatial relationship analysis

Figure 7. In red, the pairs of objects characterized by the spatial signature on the right. Close regions in the image lead to a large lobe in the signature, while distant regions lead to a narrow lobe, because they attract less.

ACKNOWLEDGMENT

This method can assign similar signatures to resembling regions whose similarity may be imperceptible to the human eye. A user would immediately observe that pairs 3 and 5 are alike, but could miss the similarity between pairs 2 and 4.

The authors would like to thank Prof. Mihai DATCU for helpful comments and suggestions.

REFERENCES

Considering the signatures as semantic attributes of the image at the region level, complex semantic classification could be performed automatically.



VII. CONCLUSIONS

We described in this paper an algorithm for a visual grammar useful in the high-level semantic interpretation of large collections of remote sensing images. An invariant spatial signature was proposed to describe the spatial relationships between objects extracted automatically, using a knowledge-driven information mining system, according to a user's interest.



This attribute could provide new information for complex semantic classifications able to separate configurations of objects with different interpretations (for example, residential area vs. industrial park, forest near a village vs. park inside a city). Two scenes can thus be distinguished based on different interpretations of their content.




[1] M. Schroder, H. Rehrauer, K. Seidel, M. Datcu, "Interactive Learning and Probabilistic Retrieval in Remote Sensing Image Archives", IEEE Transactions on Geoscience and Remote Sensing, vol. 38, no. 5, September 2000, pp. 100-119.
[2] M. Datcu, K. Seidel, "Human-Centered Concepts for Exploration and Understanding of Earth Observation Images", IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, March 2005, pp. 601-609.
[3] S. Aksoy, K. Koperski, C. Tusk, G. Marchisio, J. C. Tilton, "Learning Bayesian classifiers for scene classification with a visual grammar", IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, March 2005, pp. 581-589.
[4] P. Matsakis, L. Wendling, "A new way to represent the relative position between areal objects", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 7, July 1999, pp. 634-643.
[5] P. Matsakis, J. M. Keller, L. Wendling, J. Marjamaa, O. Sjahputera, "Linguistic description of relative position in images", IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 31, no. 4, 2001, pp. 573-588.
[6] C. R. Shyu, M. Klaric, G. J. Scott, A. S. Barb, C. H. Davis, K. Palaniappan, "GeoIRIS: Geospatial Information Retrieval and Indexing System—Content Mining, Semantics Modelling, and Complex Queries", IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 4, April 2007, pp. 839-852.