"Nonlinear Model-Based Analysis and Description of Images for Multimedia Applications" EC { DG III/F-Industry Project Number: 20229
Deliverable 12
The Volterra theory of nonlinear systems and algorithms for the construction of invariant image features
Hanns Schulz-Mirbach (TUHH)
published as: Internal Report 7/96, TU Hamburg-Harburg, Technische Informatik I, October 1996
Technische Universität Hamburg-Harburg, Technische Informatik I

The Volterra theory of nonlinear systems and algorithms for the construction of invariant image features
Internal Report 7/96
Hanns Schulz-Mirbach
TU Hamburg-Harburg, Technische Informatik I, 21071 Hamburg, Germany
[email protected]
October 1996
Abstract

The Volterra theory gives a characterization of nonlinear systems in terms of polynomial functionals. In particular, it provides stability criteria and methods for the synthesis of optimal (in the sense of minimal mean square approximation error) systems of a given order. It is the purpose of this paper to apply the Volterra theory to the construction of invariant image features. These are image characteristics which remain constant if the images are transformed according to the action of a transformation group. We derive conditions on the Volterra kernels which ensure the invariance of the resulting functionals. The methods developed are applicable both to 2D gray scale images and to 3D data sets like CT (computed tomography) or MR (magnetic resonance) images.
1 Introduction

Invariant features are data characteristics which remain constant when the data are transformed according to the action of specific transformations. We will focus mainly on image processing applications of such invariant features (cf. the survey article [8]). It is the purpose of this paper to apply the Volterra theory of nonlinear systems (cf. [3]) to the construction of invariant image features. The Volterra theory gives a system characterization in terms of polynomial functionals. In particular, it provides stability criteria and methods for the synthesis of optimal (in the sense of minimal mean square approximation error) systems of a given order.

It is for several reasons not straightforward to apply Volterra techniques directly to the construction of invariant features. The first point is that in the Volterra theory the system input and output are elements of the same signal space. This is undesirable for the purpose of feature extraction, where one of the major goals is data reduction. The second point concerns the special role of signal translations: on the one hand the system output in Volterra theory is supposed to have a specific transformation law with respect to signal translations, and on the other hand there are transformation groups which require translation invariance of the system output. Clearly, these two points are not independent. Since we want the system output to be of lower dimension than the input, it is unclear how to define output translations.

A possible solution for these problems is to concentrate on the functional aspect of the Volterra series. The essence of the Volterra representation is that, at every location of the signal, we pick an appropriate subset of the signal, evaluate a function of this subset (in the Volterra theory these functions are monomials of the signal values), multiply this value by the value of a function which depends only on the coordinates of the subset, and add up the results of these computations. That is the interpretation of the Volterra approach which is most useful in the context of feature extraction.

We derive conditions on the Volterra kernels which ensure the invariance of the resulting functionals. The methods are applicable both to 2D gray scale images and to 3D data sets like CT (computed tomography) or MR (magnetic resonance) images. It has to be emphasized that the invariant features developed in this paper are not geometric invariants, which rely on a parametric description of the object boundaries and on the calculation of appropriate derivatives.
2 Basic concepts The purpose of this section is to introduce our terminology and to explain some essentials concerning the action of transformation groups on the data under consideration.
Signals (we will also call them images or patterns) are denoted by uppercase boldface letters, e.g. $\mathbf{M}$, and are defined as complex valued maps

$$\mathbf{M}: S \to \mathbb{C}, \quad u \mapsto \mathbf{M}[u]$$

on the support space $S$. The support space $S$ may be the whole of $\mathbb{R}^n$, $n \in \{1, 2, 3\}$, or only a subset like $\mathbb{N}^n$ or $\mathbb{Z}^n$. The vectors $u \in S$ are called pixel coordinates. The number $\mathbf{M}[u]$ is called the gray value at the pixel coordinate $u$. We denote by $\mathcal{S}_S$ the set of all signals with support space $S$. For $n = 1$ one often speaks of time signals, for $n = 2$ of gray scale images, and for $n = 3$ of 3D images. On $\mathcal{S}_S$ we have the action of two different transformation types. The first type are signal translations, described by operators $\mathcal{T}(t): \mathcal{S}_S \to \mathcal{S}_S$:

$$(\mathcal{T}(t)\mathbf{M})[u] = \mathbf{M}[u - t], \quad t \in S. \quad (1)$$
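As a minimal illustration (not part of the formal development), the translation operator (1) can be realized for discrete 2D signals as follows; cyclic boundary handling via np.roll is one possible convention for a finite support space:

```python
import numpy as np

def translate(M: np.ndarray, t: tuple) -> np.ndarray:
    """Translation operator (1): (T(t)M)[u] = M[u - t].

    Cyclic boundary handling (np.roll) is an implementation choice
    for a finite support space, not prescribed by the definition."""
    return np.roll(M, shift=t, axis=(0, 1))

M = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 gray scale image
Mt = translate(M, (1, 2))
assert Mt[1, 2] == M[0, 0]                    # (T(t)M)[u] = M[u - t]
```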
The second type of signal transformations is defined by a group $G$ and linear operators $\mathcal{T}(g): \mathcal{S}_S \to \mathcal{S}_S$, $g \in G$. Transformed signals $\mathcal{T}(g)\mathbf{M}$ are sometimes also denoted by $\tilde{\mathbf{M}}$. The transformations we consider in this paper can be described in terms of the action of the group $G$ on the pixel coordinates:

$$\tilde{\mathbf{M}}[u] = \mathbf{M}[\tilde{u}], \quad \tilde{u} = gu, \quad g \in G. \quad (2)$$
The group elements $g \in G$ can in general be described by square matrices. It is assumed throughout the rest of the paper that these matrices are nonsingular, i.e. invertible. In the following we mention some groups which are of particular interest for computer vision applications:
- the group $G_{rot}$ of image rotations and translations:

$$\tilde{u} = \begin{pmatrix} \tilde{u}_1 \\ \tilde{u}_2 \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}. \quad (3)$$
Some examples for rotated and translated images are shown in Figure 1.

- the group $G_{aff}$ of affine image transformations:

$$\tilde{u} = \begin{pmatrix} \tilde{u}_1 \\ \tilde{u}_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}. \quad (4)$$
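As an illustration of (2) and (4), a transformed image can be computed by evaluating $\tilde{\mathbf{M}}[u] = \mathbf{M}[Au + t]$ at every pixel. The following sketch uses nearest-neighbor sampling and assigns gray value 0 to coordinates falling outside the support (anticipating the background convention of section 5.1); it is a minimal illustration, not an exact resampling scheme:

```python
import numpy as np

def affine_transform(M: np.ndarray, A: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Evaluate the transformed image Mtilde[u] = M[A u + t], cf. (2), (4).

    Nearest-neighbor sampling; coordinates falling outside the support
    are assigned gray value 0 (the background convention of section 5.1)."""
    N1, N2 = M.shape
    Mt = np.zeros_like(M)
    for u1 in range(N1):
        for u2 in range(N2):
            v = A @ np.array([u1, u2]) + t        # utilde = g u
            i, j = int(round(v[0])), int(round(v[1]))
            if 0 <= i < N1 and 0 <= j < N2:
                Mt[u1, u2] = M[i, j]
    return Mt

# rotation by phi plus translation is the special case (3):
phi = np.pi / 6
A_rot = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
t = np.array([2.0, -3.0])
M = np.random.default_rng(0).random((32, 32))
M_tilde = affine_transform(M, A_rot, t)
```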
Figure 2 shows some examples for affinely transformed images.

Invariant image features are complex valued functionals $F: \mathcal{S}_S \to \mathbb{C}$ defined on the signal space $\mathcal{S}_S$ which are invariant with respect to the action of the transformation group $G$ on the signals, i.e.

$$F(\mathcal{T}(g)\mathbf{M}) = F(\mathbf{M}) \quad \forall g \in G,\ \mathbf{M} \in \mathcal{S}_S. \quad (5)$$
Figure 1: Examples for the action of the group of image rotations and translations.
Figure 2: Some examples for the action of the group of affine image transformations.

We use uppercase letters (e.g. $F$) for denoting invariant features and lowercase letters (e.g. $f$) for functionals which are not necessarily invariant. It is the major purpose of this paper to apply the Volterra theory of nonlinear systems to the construction of invariant features in the sense of (5).
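As a trivial first example of (5), the total gray value $F(\mathbf{M}) = \sum_u \mathbf{M}[u]$ is invariant under (cyclic) translations; a minimal numerical check:

```python
import numpy as np

M = np.random.default_rng(0).random((8, 8))   # toy gray scale image
F = lambda img: img.sum()                     # candidate feature F
Mt = np.roll(M, shift=(3, 5), axis=(0, 1))    # cyclic translation of M
assert np.isclose(F(M), F(Mt))                # F(T(t)M) = F(M), cf. (5)
```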
3 The Volterra theory of nonlinear systems

We briefly summarize in this section the main aspects of the Volterra theory which are of interest for our intended application. A comprehensive treatment of this topic can be found in [3]. A system is viewed as a black box which maps an input function $x(u)$ to an output function $y(u)$. We denote the system operator by $T$. Then the system is described by

$$y = T[x].$$

Note that both $x$ and $y$ are functions defined on the support space $S$. The system is called shift invariant if the translation operators $\mathcal{T}(t)$ (cf. (1)) commute with the system operator $T$, i.e.

$$T[\mathcal{T}(t)x] = \mathcal{T}(t)T[x] \quad \forall t. \quad (6)$$
Unfortunately this terminology is somewhat misleading, since shift invariance does not mean that the system output is unaffected by translations of the system input. It only means that a shift of the system input induces a shift of the system output by the same amount, while the shape of the response remains unaltered. The fundamental theorem in Volterra theory states that for a shift invariant system $T$ which fulfills some additional restrictions the relation between the input and the output can be expressed by the Volterra series:

$$y = T[x] = \sum_{n=1}^{\infty} H_n[x]. \quad (7)$$
In this series the $n$-th order Volterra operator $H_n$ is given by the following expression:

$$H_n[x](u) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_n(u_1, \ldots, u_n)\, x(u - u_1) \cdots x(u - u_n)\, du_1 \cdots du_n. \quad (8)$$
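For concreteness, a direct (unoptimized) discretization of a second-order Volterra operator for a 1-D signal might look as follows; the finite kernel support and cyclic indexing stand in for the infinite integration range in (8):

```python
import numpy as np

def volterra2(x: np.ndarray, h2: np.ndarray) -> np.ndarray:
    """Discrete second-order Volterra operator, cf. (8):

        H2[x](u) = sum_{u1,u2} h2[u1, u2] * x[u - u1] * x[u - u2]

    Cyclic indexing replaces the infinite integration range."""
    N, K = len(x), h2.shape[0]
    y = np.zeros(N)
    for u in range(N):
        for u1 in range(K):
            for u2 in range(K):
                y[u] += h2[u1, u2] * x[(u - u1) % N] * x[(u - u2) % N]
    return y

x = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
h2 = np.outer([0.5, 0.25], [0.5, 0.25])   # a simple separable kernel
y = volterra2(x, h2)                      # output is quadratic in the input
```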
The functions $h_n(u_1, \ldots, u_n)$ are called the Volterra kernels of the system. The Volterra representation (7) is essentially a functional power series. Therefore one must be prepared to encounter convergence problems. Convergence may be a difficult issue, which we will neglect to a large extent in this paper. In any case, the Volterra series gives in many cases a good approximation of a nonlinear system even if it converges only for a limited range of input signals.

One important point is the stability of the system $T$. In accordance with the treatment in [3] we say that the system $T$ is stable if every bounded input gives rise to an output that is also bounded. It is shown in [3] that a system is stable if its Volterra kernels $h_n$ satisfy the following condition:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |h_n(u_1, \ldots, u_n)|\, du_1 \cdots du_n < \infty \quad \forall n \in \mathbb{N}. \quad (9)$$
It is important to note that condition (9) is sufficient but not necessary for the stability of a system.
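As a simple illustration of (9) (an example added here for concreteness), for separable kernels the condition reduces to a one-dimensional one: if $h_n(u_1, \ldots, u_n) = \prod_{i=1}^{n} g(u_i)$ with $\int_{-\infty}^{\infty} |g(u)|\, du = c < \infty$, then

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |h_n(u_1, \ldots, u_n)|\, du_1 \cdots du_n = \left( \int_{-\infty}^{\infty} |g(u)|\, du \right)^n = c^n < \infty,$$

so such a system is stable.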
4 Utilizing Volterra techniques for constructing invariant features

It is for several reasons not straightforward to apply the techniques described in section 3 to the construction of invariant features. First, note that in the Volterra series (7) the system input and output are elements of the same signal space. This is undesirable for the purpose of feature extraction, where one of the major goals is data reduction. The second point concerns the special role of signal translations. On the one hand the systems in Volterra theory are supposed to be shift invariant (cf. (6)), and on the other hand there are transformation groups (cf. (3) and (4)) which require translation invariance of the system output. Clearly, these two points are not independent. Since we want the system output to be of lower dimension than the input, it is unclear how to define output translations.

A possible solution for these problems is to concentrate on the functional aspect of the Volterra series (7). The formula (7) can be evaluated for every pixel $u$ of the support space:

$$y(u) = T[x](u) = \sum_{n=1}^{\infty} H_n[x](u). \quad (10)$$
The pointwise value of the Volterra operators is given by (8). Now we interpret $u$ only as an auxiliary parameter which labels the features $y(u)$. The essence of formula (8) is that, at every location of the signal, we pick an appropriate subset of the signal, evaluate a function of this subset (in the Volterra theory these functions are monomials of the signal values), multiply this value by the value of a function which depends only on the coordinates of the subset, and add up the results of these computations. That is the interpretation of the Volterra approach which is most useful in the context of feature extraction.

Generically, we denote by $f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])$ a real valued function of the $n$ gray values $\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n]$ (the actual value of $n$ will hopefully be clear from the context) and by $I(u_1, \ldots, u_n)$ a real valued function of the $n$ pixel coordinates $u_1, \ldots, u_n$.
For a given function $f$ and a given gray scale image $\mathbf{M}$ we denote by $A[f](\mathbf{M})$ the following integral:

$$A[f](\mathbf{M}) = \int_{u_1} \cdots \int_{u_n} f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])\, I(u_1, \ldots, u_n)\, du_1 \cdots du_n. \quad (11)$$

Note that $A[f]$ depends on the entire gray scale image $\mathbf{M}$, whereas $f$ only depends on $n$ gray values. We call $I(u_1, \ldots, u_n)$ the geometric kernel of order $n$ and $f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])$ the gray scale kernel of order $n$. The feature $A[f]$ is called an $n$-th order gray scale feature. We will derive appropriate gray scale and geometric kernels so that the features $A[f]$ constructed according to (11) are invariant with respect to the action of the transformation group $G$.
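On a discrete pixel grid the integral (11) becomes a sum over all $n$-tuples of pixels. A minimal sketch for $n = 2$, with an illustrative monomial gray scale kernel and a distance-based geometric kernel (both chosen here only as examples):

```python
import numpy as np

def A(M: np.ndarray, f, I) -> float:
    """Discrete second-order feature, cf. (11):

        A[f](M) = sum_{u1} sum_{u2} f(M[u1], M[u2]) * I(u1, u2)

    f is the gray scale kernel, I the geometric kernel (order n = 2)."""
    coords = [(i, j) for i in range(M.shape[0]) for j in range(M.shape[1])]
    total = 0.0
    for u1 in coords:
        for u2 in coords:
            total += f(M[u1], M[u2]) * I(np.array(u1), np.array(u2))
    return total

M = np.random.default_rng(1).random((8, 8))
f = lambda v1, v2: v1 * v2                           # monomial gray scale kernel
I = lambda u1, u2: np.exp(-np.linalg.norm(u1 - u2))  # distance-based geometric kernel
value = A(M, f, I)
```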
5 Constructing invariant image features by integration

5.1 Convergence of the integrals
For a given function $f$ we want to construct an invariant image feature $A[f]$ according to formula (11). Let us first make a few preliminary remarks about the convergence of this integral. First we note that the integrations in (11) are over a bounded region. That is an obvious restriction due to the finite extent of the camera. On the one hand this is comfortable, since we need not worry about the convergence of (11) for $u_j,\ j = 1, \ldots, n$ going to infinity. On the other hand it causes some trouble concerning the relationship between a gray scale image $\mathbf{M}$ and the transformed image $\tilde{\mathbf{M}}$ as formulated in (2). The reason for this trouble is best understood from a simple example. The affine group allows for image zooming. Given a zoom factor $\alpha$ we have to calculate the transformed image $\tilde{\mathbf{M}}$ according to (2) as

$$\tilde{\mathbf{M}}[u] = \mathbf{M}[\tilde{u}] = \mathbf{M}[(\alpha u_1, \alpha u_2)^T]. \quad (12)$$

For $\alpha > 1$ there are pixel coordinates $u$ so that $(\alpha u_1, \alpha u_2)^T$ exceeds the bound $0 \le u_1, u_2 < N$. Therefore we do not know how to calculate the gray values of the transformed image for these pixel coordinates. In practice zooming simply means that the distance between camera and object changes and that a greater (smaller) part of the background becomes visible. However, the structure of the background is not known beforehand, and therefore it is impossible to predict the transformed image from the original one. The aforementioned problem is circumvented by imposing the following (severe) restrictions, which are also necessary for other reasons (cf. below):
1. there is only one object in the scene;
2. in the transformed image the entire object is visible;
3. the object can be segmented from the background; the background intensity is set to zero;
4. the gray scale kernel $f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])$ equals zero if one of the gray values $\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n]$ equals zero. (The monomial kernel $f = \mathbf{M}[u_1] \cdots \mathbf{M}[u_n]$, for example, satisfies this requirement automatically.)
These conditions guarantee that only gray values coming from the object in the image contribute to the feature $A[f]$ in (11). It is interesting to note that the situation is much better for the group of image rotations and translations: here it is not necessary to segment the object from the background, and it is also possible to cope with several objects in a single scene (cf. [6, 7]). Since we are here mainly interested in the affine and projective group, we will not pursue these questions any further.
6 Geometric kernels and gray scale kernels for invariant image features

We start with a gray scale image $\mathbf{M}$ and a transformed image $\tilde{\mathbf{M}}$. In order to derive appropriate kernels we evaluate (11) for the transformed image $\tilde{\mathbf{M}}$:

$$A[f](\tilde{\mathbf{M}}) = \int_{u_1} \cdots \int_{u_n} f(\tilde{\mathbf{M}}[u_1], \ldots, \tilde{\mathbf{M}}[u_n])\, I(u_1, \ldots, u_n)\, du_1 \cdots du_n \quad (13)$$
$$= \int_{u_1} \cdots \int_{u_n} f(\mathbf{M}[\tilde{u}_1], \ldots, \mathbf{M}[\tilde{u}_n])\, I(u_1, \ldots, u_n)\, du_1 \cdots du_n. \quad (14)$$
Now the basic idea is to apply the transformation formula for multidimensional integrals (cf. Appendix A, Theorem 2) in order to relate $A[f](\tilde{\mathbf{M}})$ (equation (14)) and $A[f](\mathbf{M})$ (equation (11)). Two steps are necessary for this purpose.

1. Express $I(u_1, \ldots, u_n)$ in terms of $I(\tilde{u}_1, \ldots, \tilde{u}_n)$. We assume that this can be done by an appropriate function $R$:

$$I(u_1, \ldots, u_n) = R\left(I(\tilde{u}_1, \ldots, \tilde{u}_n)\right). \quad (15)$$

2. Calculate the Jacobian $J_{\tilde{U}}(U)$ of the transformation

$$\tilde{U} := (\tilde{u}_1, \ldots, \tilde{u}_n)^T \to U := (u_1, \ldots, u_n)^T. \quad (16)$$
Note that we have defined two vectors $U, \tilde{U}$, each with $2n$ components, and that the functional matrix (cf. Appendix A) from which the Jacobian is calculated is a $2n \times 2n$ matrix. Let us assume for the moment that we have solved these questions for the transformation groups in question, and let us proceed to examine the consequences for the corresponding kernels. By using the function $R$ we can write for (14):

$$A[f](\tilde{\mathbf{M}}) = \int_{u_1} \cdots \int_{u_n} f(\mathbf{M}[\tilde{u}_1], \ldots, \mathbf{M}[\tilde{u}_n])\, R\left(I(\tilde{u}_1, \ldots, \tilde{u}_n)\right) du_1 \cdots du_n. \quad (17)$$

Since we want to apply the transformation formula, we must relate the function $R$ and the Jacobian $J_{\tilde{U}}(U)$. We assume that the following key identity is fulfilled:

$$R\left(I(\tilde{u}_1, \ldots, \tilde{u}_n)\right) = J_{\tilde{U}}(U)\, I(\tilde{u}_1, \ldots, \tilde{u}_n). \quad (18)$$

Recalling the definition of $R$, this can be rewritten in the more suggestive form

$$I(u_1, \ldots, u_n) = J_{\tilde{U}}(U)\, I(\tilde{u}_1, \ldots, \tilde{u}_n). \quad (19)$$
Equation (19) simply states that the geometric kernel $I(u_1, \ldots, u_n)$ must be a relative invariant with weight $J_{\tilde{U}}(U)$. That statement is somewhat sloppy, since the term "relative invariant" is used in the literature only if $J_{\tilde{U}}(U)$ depends exclusively on the transformation parameters $g$; here $J_{\tilde{U}}(U)$ may depend on both the parameters $g$ and the pixel coordinates $u_i$. By using (18) we can write for $A[f](\tilde{\mathbf{M}})$ (cf. equation (17))

$$A[f](\tilde{\mathbf{M}}) = \int_{u_1} \cdots \int_{u_n} f(\mathbf{M}[\tilde{u}_1], \ldots, \mathbf{M}[\tilde{u}_n])\, J_{\tilde{U}}(U)\, I(\tilde{u}_1, \ldots, \tilde{u}_n)\, du_1 \cdots du_n. \quad (20)$$

But according to the transformation formula for multidimensional integrals (cf. Appendix A, Theorem 2), equation (20) can be rewritten as

$$A[f](\tilde{\mathbf{M}}) = \int_{\tilde{u}_1} \cdots \int_{\tilde{u}_n} f(\mathbf{M}[\tilde{u}_1], \ldots, \mathbf{M}[\tilde{u}_n])\, I(\tilde{u}_1, \ldots, \tilde{u}_n)\, d\tilde{u}_1 \cdots d\tilde{u}_n = A[f](\mathbf{M}). \quad (21)$$
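As a worked example of the key identity (19) (added here for illustration), consider the group $G_{rot}$: every coordinate transforms as $\tilde{u}_i = Au_i + t$ with $A$ orthogonal, hence $u_i = A^{-1}(\tilde{u}_i - t)$, and the functional matrix of $\tilde{U} \to U$ is block diagonal with $n$ copies of $A^{-1}$, so

$$J_{\tilde{U}}(U) = \left| \det A^{-1} \right|^n = 1.$$

In this case (19) demands absolute invariance, $I(u_1, \ldots, u_n) = I(\tilde{u}_1, \ldots, \tilde{u}_n)$, which holds for instance for geometric kernels depending only on the pairwise distances $\|u_i - u_j\|$, since rotations and translations preserve them.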
In summary, we have proven the following theorem:
Theorem 1. Let a gray scale image $\mathbf{M}$ and an $n$-th order gray scale kernel $f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])$ be given which fulfill the requirements 1.-4. from section 5.1. For every $n$-th order geometric kernel $I(u_1, \ldots, u_n)$ which is a relative invariant with weight $J_{\tilde{U}}(U)$, i.e.

$$I(u_1, \ldots, u_n) = J_{\tilde{U}}(U)\, I(\tilde{u}_1, \ldots, \tilde{u}_n), \quad (22)$$

the feature $A[f]$ constructed according to

$$A[f](\mathbf{M}) = \int_{u_1} \cdots \int_{u_n} f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])\, I(u_1, \ldots, u_n)\, du_1 \cdots du_n \quad (23)$$

is an invariant $n$-th order gray scale feature, i.e. $A[f](\mathbf{M}) = A[f](\tilde{\mathbf{M}})$, provided that the integral exists.

Theorem 1 is a useful result because it suggests a universal strategy for constructing invariant image features for general transformation groups. However, one has to be cautious about the existence of the integrals (as stated in the last sentence of the theorem). In a forthcoming publication we will discuss the application of Theorem 1 to the construction of affine and projective invariant gray scale features. There we will see that singularities may occur in the geometric kernels, which must be handled with some care. The obvious disadvantage of this method is its rather high computational complexity: for an $N \times N$ gray scale image and kernels of order $n$, the calculation of $A[f]$ requires $O(N^{2n})$ operations. However, by using a priori knowledge or algebraic invariants it is possible to reduce the computational complexity significantly. That will be discussed in a separate publication.
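The following sketch checks Theorem 1 numerically for $G_{rot}$, using the monomial gray scale kernel $f = \mathbf{M}[u_1]\mathbf{M}[u_2]$ and the distance-based geometric kernel $I = e^{-\|u_1 - u_2\|}$ from the worked example above (both illustrative choices); on a discrete grid the two feature values agree only up to resampling error:

```python
import numpy as np

def feature(M):
    """Second-order feature (23) with f(v1, v2) = v1 * v2 and geometric
    kernel I = exp(-|u1 - u2|), a relative invariant of weight 1 for
    rotations and translations (J = 1)."""
    ys, xs = np.nonzero(M)                        # only object pixels contribute
    P = np.stack([ys, xs], axis=1).astype(float)
    v = M[ys, xs]
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    return float(np.sum(np.outer(v, v) * np.exp(-d)))

def rotate_nn(M, phi, c):
    """Nearest-neighbor realization of (2)/(3): Mt[u] = M[g u], rotation
    about the pixel (c, c); background is filled with gray value 0."""
    out = np.zeros_like(M)
    co, si = np.cos(phi), np.sin(phi)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            y = co * (i - c) - si * (j - c) + c
            x = si * (i - c) + co * (j - c) + c
            yi, xi = int(round(y)), int(round(x))
            if 0 <= yi < M.shape[0] and 0 <= xi < M.shape[1]:
                out[i, j] = M[yi, xi]
    return out

# one segmented object on a zero background, fully visible after rotation
rng = np.random.default_rng(2)
M = np.zeros((33, 33))
M[12:20, 14:19] = rng.random((8, 5))
print(feature(M), feature(rotate_nn(M, np.pi / 7, c=16)))  # approx. equal
```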
7 Summary and Conclusion

We have presented in this paper new techniques for the construction of invariant gray scale features. Motivated by analogies with the Volterra theory of nonlinear systems, we have introduced in section 6 (Theorem 1) a general method for the determination of invariant image features for arbitrary transformation groups. The idea is to integrate the product of a gray scale kernel and a geometric kernel over a given image. The gray scale kernel only depends on the gray values of the image, and the geometric kernel depends on the corresponding pixel coordinates. Both kernels are in general nonlinear functions. Through the product a coupling between gray values and pixel coordinates is achieved. The desired invariance of the resulting features imposes specific restrictions on the geometric kernels: they must be relative invariants with respect to the transformation group, with the weight given by the Jacobian of the coordinate transformation and the order of the kernels (cf. equation (22) in Theorem 1). It has to be emphasized that the invariant gray scale features developed in this paper are not geometric invariants, which rely on a parametric description of the object boundaries and on the calculation of appropriate derivatives. In fact, our algorithms only require the segmentation of the objects from the background and the pixel coordinates together with the associated gray values.

The algorithms proposed in this paper offer a considerable amount of flexibility for the construction of invariant features. The gray scale kernel $f$ can be freely chosen. These degrees of freedom can be exploited to construct features which are adapted to specific applications. In fact, it is a promising subject for further research to investigate systematically the influence of the gray scale kernel on the quantitative feature properties. Finally, we mention that the extension of the proposed techniques to 3D data sets is straightforward. In a forthcoming publication we will discuss the extension of the proposed approach to affine and projective image transformations. Currently, extensive experimental investigations of the proposed methods are underway and will be discussed elsewhere.
A Transformation formula for multidimensional integrals

This section gives a summary of the transformation formula for multidimensional integrals (cf. [1] for a more detailed treatment). We denote by $U, V$ open subsets of $\mathbb{R}^m$. The elements of $\mathbb{R}^m$ are vectors $x = (x_1, \ldots, x_m)^T$. Let $\varphi: U \to V$ be a bijective mapping such that both $\varphi$ and $\varphi^{-1}: V \to U$ are continuously differentiable (i.e. the mappings are differentiable and the derivatives are continuous). We denote by $(\varphi_1, \ldots, \varphi_m)$ the components of the mapping $\varphi$. The functional matrix $D\varphi$ is the following $m \times m$ matrix:

$$D\varphi(x) = \begin{pmatrix} \frac{\partial \varphi_1}{\partial x_1}(x) & \cdots & \frac{\partial \varphi_1}{\partial x_m}(x) \\ \vdots & & \vdots \\ \frac{\partial \varphi_m}{\partial x_1}(x) & \cdots & \frac{\partial \varphi_m}{\partial x_m}(x) \end{pmatrix}. \quad (24)$$

The Jacobian $J_\varphi(x)$ of the transformation is defined as the magnitude of the determinant of the functional matrix:

$$J_\varphi(x) = |\det D\varphi(x)|. \quad (25)$$

Then the transformation formula for multidimensional integrals is:
Theorem 2. Let $U, V$ be open subsets of $\mathbb{R}^m$ and let $\varphi: U \to V$ be a mapping as defined above. Then for every continuous function $f: V \to \mathbb{R}$ with compact support

$$\int_U f(\varphi(x))\, J_\varphi(x)\, dx = \int_V f(x)\, dx.$$
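For instance (an illustration added here), for an affine map the functional matrix is constant: if $\varphi(x) = Ax + t$, then $D\varphi(x) = A$ and $J_\varphi(x) = |\det A|$, so Theorem 2 reduces to the familiar substitution rule

$$\int_U f(Ax + t)\, |\det A|\, dx = \int_V f(x)\, dx;$$

for rotations $|\det A| = 1$.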
B List of Symbols

$\mathbf{M}$ — gray scale image
$\tilde{\mathbf{M}}$ — transformed gray scale image
$u$ — coordinate vector of a pixel
$(u_1, u_2)^T$ — components of the pixel vector $u$
$\tilde{u}$ — transformed coordinate vector of a pixel
$\mathbf{M}[u]$ — gray value at the pixel coordinate $u$
$G_{rot}$ — group of image rotations and translations
$G_{aff}$ — group of affine image transformations
$G_{pro}$ — group of projective image transformations
$A$ — $2 \times 2$ matrix of affine transformation parameters
$P$ — $3 \times 3$ matrix of projective transformation parameters
$I(u_1, \ldots, u_n)$ — geometric kernel of order $n$
$f(\mathbf{M}[u_1], \ldots, \mathbf{M}[u_n])$ — gray scale kernel of order $n$
$T(u_i, u_j, u_k)$ — $2 \times 2$ matrix for affine geometric kernels
$D(u_i, u_j, u_k)$ — $3 \times 3$ matrix for projective geometric kernels
$\Delta$ — spacing of the pixel grid
$\varepsilon$ — cutoff for geometric kernels
$\alpha$ — factor for image zooming
$D(P, u)$ — $P_{31}u_1 + P_{32}u_2 + P_{33}$
$\delta(x)$ — Dirac distribution
$A(\mathbf{M})$ — area of the object in the image $\mathbf{M}$
$D_n$ — integration domain for $n$-th order features
$R_n$ — reduced integration domain for $n$-th order features
$D_{i,j,k}$ — area of the triangle spanned by the three pixels $u_i, u_j, u_k$
References

[1] O. Forster. Analysis 3. Integralrechnung im $\mathbb{R}^n$ mit Anwendungen. Vieweg, 1983.
[2] T. H. Reiss. Recognizing Planar Objects Using Invariant Image Features. Lecture Notes in Computer Science, no. 676, Springer, 1993.
[3] M. Schetzen. The Volterra and Wiener Theories of Nonlinear Systems. John Wiley & Sons, 1980.
[4] H. Schulz-Mirbach. Constructing invariant features by averaging techniques. Proc. of the 12th International Conference on Pattern Recognition, vol. II, pp. 387-390, Jerusalem, Israel, 1994.
[5] H. Schulz-Mirbach. Algorithms for the construction of invariant features. In W. G. Kropatsch and H. Bischof (eds.), Tagungsband Mustererkennung 1994 (16. DAGM Symposium), Reihe Informatik Xpress, no. 5, pp. 324-332, Wien, 1994.
[6] H. Schulz-Mirbach. Anwendung von Invarianzprinzipien zur Merkmalgewinnung in der Mustererkennung. VDI Fortschritt-Bericht, Reihe 10, no. 372, VDI Verlag, 1995.
[7] H. Schulz-Mirbach. Invariant features for gray scale images. In G. Sagerer, S. Posch, F. Kummert (eds.), Tagungsband Mustererkennung 1995 (17. DAGM Symposium), Reihe Informatik aktuell, pp. 1-15, Springer, 1995.
[8] I. Weiss. Geometric Invariants and Object Recognition. International Journal of Computer Vision, 10:3, pp. 201-231, 1993.