First NOBLESSE Workshop on Nonlinear Methods in Model-Based Image Interpretation, Lausanne, Switzerland, September 1996
USING INVARIANT FEATURES FOR CONTENT BASED DATA RETRIEVAL H. Schulz-Mirbach, H. Burkhardt, S. Siggelkow
Technische Universität Hamburg-Harburg, Technische Informatik I, 21071 Hamburg, Germany

ABSTRACT
The paper discusses a region based approach for content based image retrieval. Due to the explosive growth of image databases there is a great need for new methods to handle large amounts of data efficiently. Besides manual image annotation, automatic feature extraction is necessary. It is desirable not only to compare whole images but also to look up single objects in an image database.
1. INTRODUCTION
Due to the explosive growth of the World Wide Web and computer networks in general there is an increasing demand for effective methods to search large heterogeneous data collections (e.g. distributed digital image databases). Traditional database techniques rely on specific tags (often text identifiers) which are introduced solely for the purpose of data retrieval. These tags must be inserted manually, and a database search is based on an exact match of a user query with these identifiers. Clearly, this strategy is unfeasible for large heterogeneous data collections. The steps of database population (i.e. insertion of new data) and data retrieval should keep the necessity of manual interaction at a minimum. Since search in and across large data collections is one of the key technological issues, there is currently considerable research activity in the field. We just mention the US Digital Library Initiative (DLI), where search methods for image databases are among the core activities [2]. At present several DLI projects utilize straightforward image processing techniques like color histograms for content based image retrieval, although more advanced methods are already under investigation [7, 8]. In this paper we describe an emerging region based approach for content based data retrieval. For the sake of concreteness we describe the algorithms for 2D gray scale images, but the basic methodology is applicable to other data types (e.g. 3D tomographic images) as well.

The research described in this article is supported by the European Union in the Reactive Long Term Research Project 'Nonlinear Model-Based Analysis and Description of Images for Multimedia Applications' (NOBLESSE), Project number 20229.
2. USING IMAGE REGIONS FOR CONTENT BASED DATA RETRIEVAL
Without suitable indexing an image database cannot serve as an information resource. Aiming at a fully content based search/retrieval, our goal is an automatically generated index to an image database. It has to be stressed that the extraction of semantic information from unconstrained images has proven to be an extremely difficult computer vision task which is not yet solved satisfactorily. Therefore we refrain from extracting semantic information from the images and focus on a region based approach instead. We use a segmentation algorithm [6] to split the image into different regions. For every region we calculate a vector of numerical features. The numerical values of these features are used as an index into the image database. The crucial point is that the features are robust (i.e. have only a low variation) with respect to specific transformations of the image regions. These transformations include:

- Geometric transformations; e.g. Euclidean, affine, or projective image transformations.
- Simple luminance transformations; e.g. uniform contrast and/or brightness changes.
- Small boundary variations; e.g. missing parts of the boundary which are due to segmentation errors.
- Topological object deformations; e.g. a hand with the fingers moving independently.
The basic philosophy behind this approach is that image information is located in spatial image regions and that the mentioned region transformations do not change the 'information content' as far as database indexing is concerned. Consider e.g. an aerial image showing an airport. By using our segmentation algorithm we are eventually able to extract regions corresponding to different airplanes. For indexing purposes the content of these regions does not change when we take the images from a different altitude or at different times (e.g. one image in the morning and one in the afternoon). The induced luminance transformations and scalings should not affect the feature values.
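The overall population/retrieval pipeline described above can be sketched as follows. This is a hypothetical illustration of the control flow only: `segment` and `region_features` are stubs standing in for the segmentation algorithm [6] and the invariant features of section 3, which are not reproduced here.

```python
import numpy as np

def segment(image):
    # stub: treat the two image halves as "regions" (placeholder for [6])
    h = image.shape[0] // 2
    return [image[:h], image[h:]]

def region_features(region):
    # stub: simple statistics as a stand-in for the invariant features
    return np.array([region.mean(), region.std(), region.size], dtype=float)

def build_index(images):
    """Database population: store one feature vector per extracted region."""
    index = []
    for img_id, image in enumerate(images):
        for region in segment(image):
            index.append((img_id, region_features(region)))
    return index

def query(index, region, k=1):
    """Retrieval: rank reference regions by Euclidean feature distance."""
    feats = region_features(region)
    ranked = sorted(index, key=lambda entry: np.linalg.norm(entry[1] - feats))
    return [img_id for img_id, _ in ranked[:k]]
```

Because the index holds per-region feature vectors rather than whole-image tags, a query for a single object can match a region inside a larger scene, which is exactly the behaviour motivated above.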
3. CONSTRUCTION OF INVARIANT IMAGE FEATURES
We confine ourselves here to describing algorithms for the extraction of image features which are invariant with respect to image rotations and translations [3, 4]. An extension of these techniques to affine and projective image transformations is currently under investigation; first results are described in [5]. A discussion based on nonlinear system theory is given in [1].
3.1. Invariant features for gray scale images
Gray scale images are denoted by uppercase boldface letters, e.g. M, and are written in the form M = (M[i,j]), 0 ≤ i, j < N. The number M[i,j] is called the gray value at pixel coordinate (i,j). There is a transformation group G with elements g ∈ G acting on the images. For an image M and a group element g ∈ G the transformed image is denoted by gM. Given a gray scale image M and an element g ∈ G of the group of image rotations and translations, there exist an angle φ ∈ [0, 2π) and a translation vector t = (t_0, t_1)^T ∈ R^2 such that

(gM)[i,j] = M[k,l]   (1)

with

\begin{pmatrix} k \\ l \end{pmatrix} =
\begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix} -
\begin{pmatrix} t_0 \\ t_1 \end{pmatrix}   (2)

All indices are understood modulo N. Note that due to this convention the range of the translation vector t = (t_0, t_1)^T can be restricted to 0 ≤ t_0, t_1 < N. An invariant feature is a complex valued function F(M) which is invariant with respect to the action of the transformation group on the images, i.e.
F(gM) = F(M) \quad \forall g \in G   (3)

We use uppercase letters (e.g. F) for denoting invariant features and lowercase letters (e.g. f) for complex valued functions which are not necessarily invariant.
3.2. How to construct invariant features
For a given gray scale image M and a complex valued function f(M) it is possible to construct an invariant feature A[f](M) by integrating f(gM) over the transformation group G:

A[f](M) = \int_G f(gM)\, dg   (4)
This averaging technique for constructing invariant features is explained in detail in [3, 4] for general transformation groups. For the group of image rotations and translations the integral over the group can be written as

A[f](M) = \frac{1}{2\pi N^2} \int_{t_0=0}^{N} \int_{t_1=0}^{N} \int_{\varphi=0}^{2\pi} f(gM)\, d\varphi\, dt_0\, dt_1   (5)

A[f] is called the group average of f. One can show [4] that the group average A[f] can be calculated by first evaluating, for every pixel, a local function which depends only upon the gray values in a specific neighbourhood of that pixel. The second step is to add up the results of the local computations. This interpretation of the group average (5) will be useful for determining the feature properties of scenes with multiple (possibly overlapping) objects (cf. section 3.4).
3.3. Modifications for object based invariants
Up to now only global invariants for whole images have been considered. However, as explained above, for content based image retrieval it is desirable to have features for single objects in an image. A straightforward approach is to segment the image into its objects before calculating the invariants. The above mentioned gray scale invariants have to be modified only slightly for this purpose. The object can be part of a larger image scene as well as of a small image containing only this one object. Therefore the integration over the rest of the image is omitted, and so is the normalization by the image size; instead the integration is done over the object only. Thus the two step strategy remains: first some local function is calculated for each object pixel, and then the results are summed up. The normalization is now done with the object size.
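As an illustration (not the paper's actual feature set), this modification can be sketched with a toy local function: the product of each pixel with the mean of its four direct neighbours, which is invariant under the subgroup of 90-degree rotations and cyclic translations. Pixels outside the object are replaced by a neutral gray value of one (the convention mentioned in section 3.4), and the normalization uses the object size.

```python
import numpy as np

def local_fn(img):
    """Toy local function: each pixel times the mean of its four direct
    neighbours (indices wrap modulo N)."""
    nbs = sum(np.roll(img, s, axis=a) for a in (0, 1) for s in (-1, 1))
    return img * nbs / 4.0

def object_invariant(img, mask):
    """Object based invariant: integrate the local function over the
    object only and normalize by the object size instead of N^2."""
    padded = np.where(mask, img.astype(float), 1.0)  # outside pixels set to one
    return local_fn(padded)[mask].sum() / mask.sum()
```

Replacing the outside pixels makes the feature value independent of the background: two images containing the same object on different backgrounds yield exactly the same invariant, which is the behaviour exploited in the retrieval experiments below.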
3.4. Additional useful properties
In this section some special properties of the above explained invariants shall be stressed which are useful in the context of content based image retrieval. As already mentioned, it is desirable to obtain invariants for single objects instead of for the whole image; therefore a segmentation is performed. But the segmentation can cause deformations of the object contour, since the real object contour is cut to the pixel raster. This would increase the error of differential techniques. The introduced gray scale invariants are based on an averaging process instead, which is advantageous since the small discretisation errors of the segmentation have only a small effect. This property is also useful for objects that are partly deformable, such as the truck in figure 3: the errors resulting from these deformations are small if the rest of the object remains unchanged. But there is an even more important property of the gray scale invariants: by choosing appropriate functions f for the group averaging one can enforce the so called 'additivity in feature space'. That means that the size-weighted sum of the invariant features of two single objects O1 and O2 is nearly the same as the feature of both objects considered as one object:

|O_1| F(O_1) + |O_2| F(O_2) \approx (|O_1| + |O_2|) F(O_1 \cup O_2)   (6)

This can be seen directly from the two calculation steps: first a local function is calculated for each pixel, and then the results are integrated over the object. The error in equation (6) results from the difference in the calculation of the local function at the common boundary of both objects. If the objects are considered separately, the pixels outside each object's boundary are set to one in our case, in order to make the object invariants independent of the image content around the object. If both objects are considered as one, the integration is done across the boundary without replacing the values there. The resulting error depends on the size of the neighbourhood considered for the local function and on the ratio between the boundary length and the objects' sizes. For big objects and a small common boundary the error is small because of the averaging effect; for small objects and a long boundary the error is larger.

For the classification of objects this property is useful. Automatic segmentation mostly does not conform to the subjective image partitioning of a human being, so the invariants would be calculated for objects that do not correspond to semantic objects. But if one ensures that the image is oversegmented, it is possible to check the reference object invariants also against the sum of connected objects. Thus it is possible to support the computer at the segmentation task only for the first few objects of a class, by manually linking all automatically extracted objects together to one subjective object. Later on it can be tested automatically whether some connected objects together correspond to an object in the database. This should work for database population as well, not only for retrieval.

Objects:          Ref. car 1   Ref. car 2   Ref. car 3   Ref. car 4
Dark police car   1.147e+04    1.174e+05    2.964e+02    1.128e+04
Light small car   67.53        6.257e+03    8.690e+04    3.584e+04
Light big car     6.771e+03    3.390e+03    9.270e+04    3.145e+04

Table 1: Classification example: weighted Euclidean distances of the three car objects in figure 1 to the reference cars

Figure 1: Classification example
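Equation (6) can be checked numerically with a toy local function (a sketch, not the paper's feature set): the product of each pixel with the mean of its four direct neighbours, with outside pixels set to one as described above. Splitting one object into two halves changes the local function only along the common boundary, so the size-weighted feature sums nearly agree.

```python
import numpy as np

def local_fn(img):
    # toy local function: pixel value times the mean of its four neighbours
    nbs = sum(np.roll(img, s, axis=a) for a in (0, 1) for s in (-1, 1))
    return img * nbs / 4.0

def object_feature(img, mask):
    # outside pixels are set to one; normalize by the object size
    padded = np.where(mask, img.astype(float), 1.0)
    return local_fn(padded)[mask].sum() / mask.sum()

def additivity_error(img, mask1, mask2):
    """Relative error of equation (6) for two adjacent object parts."""
    union = mask1 | mask2
    lhs = (mask1.sum() * object_feature(img, mask1)
           + mask2.sum() * object_feature(img, mask2))
    rhs = union.sum() * object_feature(img, union)
    return abs(lhs - rhs) / abs(rhs)
```

For a 12x12 object split into two halves inside a 24x24 image the error stays in the low percent range; it grows as the common boundary gets longer relative to the object size, as argued above.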
4. EXPERIMENTAL RESULTS
In this section experimental results are given to illustrate the use of the gray scale invariants for content based image retrieval. The invariants were constructed from twelve different monomials f of orders up to three. The maximum support size was three, which means that the local parts of the calculations could be performed on a 7x7 window around each pixel. Table 1 shows some classification results. The features of four different cars had been learned beforehand. These are compared with the features of the three car objects in the scene shown in figure 1, left side. The values in the table show the weighted Euclidean distance of the three cars' gray scale features (extended by the object's size) to the mean reference features. The reference cars 3 (dark police car), 1 (light small car), and 2 (light big car) have been correctly detected in this case. Due to the object based feature extraction this also works for other backgrounds, as shown in figure 1, right side, where cars 1 and 3 are correctly detected, too.

Figure 2: Example of additivity error (absolute percental error plotted over feature number)

In figure 2 a result on the additivity property is shown. The displayed car has been treated as one object on the one hand and divided into three objects (upper right labels) on the other hand. The error in the feature vector is displayed at the bottom. There the influence of the local function can be seen very well. The features 1, 2, and 6 include no neighbour values, so there is no difference in the integration at the boundary between the two cases, and the error is zero. For the other features the error remains moderate (at most 1.8 percent). Figure 3 shows three examples of a deformed object. As can be seen at the bottom, the difference of the features stays small, so that the object can still be classified although it is not only rotated and translated. This follows directly from the integrative property of the presented invariants.
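The weighted Euclidean distance used in Table 1 can be written down directly. The paper does not specify the weighting, so inverse per-feature variances of the reference samples (a common choice) are assumed in this sketch.

```python
import numpy as np

def weighted_euclidean(x, ref_mean, weights):
    """Weighted Euclidean distance of feature vector x to a reference mean.
    `weights` is assumed to be e.g. the inverse per-feature variance of the
    reference samples (an assumption; the paper does not state the weighting)."""
    d = np.asarray(x, dtype=float) - np.asarray(ref_mean, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(weights, dtype=float) * d * d)))

def classify(x, references, weights):
    """Assign x to the reference class with the smallest weighted distance."""
    return min(references, key=lambda name: weighted_euclidean(x, references[name], weights))
```

With per-feature weights, features on very different numerical scales (such as the gray scale invariants extended by the object's size) can be compared in one distance.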
Figure 3: Example of deformation error (absolute percental error plotted over feature number)

5. CONCLUSION
In this paper a region based approach for content based data retrieval has been presented. The features have several properties that are useful in combination with segmentation and indexing. The first results obtained are promising for the use of these features in searching large image databases. In the future it is planned to extend the features to colour invariants that also take changes in illumination into account, and to perform experiments with large databases.
6. REFERENCES
[1] H. Burkhardt, H. Schulz-Mirbach. A contribution to nonlinear system theory. Proc. of the IEEE Workshop on 'Nonlinear Signal and Image Processing', vol. II, pp. 823-826, Halkidiki, Greece, June 1995.
[2] B. Schatz, H. Chen. Building Large-Scale Digital Libraries. IEEE Computer, pp. 22-26, May 1996.
[3] H. Schulz-Mirbach. Constructing invariant features by averaging techniques. Proc. of the 12th International Conference on Pattern Recognition, vol. II, pp. 387-390, Jerusalem, Israel, 1994.
[4] H. Schulz-Mirbach. Anwendung von Invarianzprinzipien zur Merkmalgewinnung in der Mustererkennung. VDI Fortschritt-Bericht, Reihe 10, Nr. 372, VDI Verlag, 1995.
[5] H. Schulz-Mirbach. Affine and projective invariant gray scale features. Internal Report 5/95, Technische Informatik I, Technische Universität Hamburg-Harburg, July 1995.
[6] S. Siggelkow, R.-R. Grigat, and A. Ibenthal. Segmentation of image sequences for object oriented coding. Proceedings of the International Conference on Image Processing (ICIP), Lausanne, September 1996.
[7] H. D. Wactlar, T. Kanade, M. A. Smith, S. M. Stevens. Intelligent Access to Digital Video: Informedia Project. IEEE Computer, pp. 46-52, May 1996.
[8] R. Wilensky. Toward Work-Centered Digital Information Services. IEEE Computer, pp. 37-44, May 1996.