When Is It Possible to Identify 3D Objects from Single Images Using Class Constraints? Ronen Basri and Yael Moses Dept. of Applied Math The Weizmann Inst. of Science Rehovot, 76100, Israel Category: Object Recognition and Indexing
Abstract
One approach to recognizing objects seen from an arbitrary viewpoint is to extract invariant properties of the objects from single images. Such properties are found in images of 3D objects only when the objects are constrained to belong to certain classes (e.g., bilaterally symmetric objects). Existing studies that follow this approach propose how to compute invariant representations for a handful of classes of objects. A fundamental question regarding the invariance approach is whether it can be applied to a wide range of classes. To answer this question it is essential to study the set of classes for which invariance exists. This paper introduces a new method for determining the existence of invariance for classes of objects, together with the set of images from which these invariants can be computed. We develop algebraic tests that, given a class of objects undergoing affine projection, determine whether the objects in the class can be identified from single images. In addition, these tests allow us to determine the set of views of the objects which are degenerate. We apply these tests to several classes of objects and determine which of them are identifiable and which of their views are degenerate.
This research was supported by a grant from the Israel Science Foundation No. 184/96. The vision group at the Weizmann Inst. is supported in part by the Israeli Ministry of Science, Grant No. 8504. Ronen Basri is an incumbent of Arye Dissentshik Career Development Chair at the Weizmann Institute.
1 Introduction

Inferring the identity of objects despite variations due to changes in viewing position is a fundamental problem in object recognition. One problem that arises when we attempt to recognize 3D objects from single 2D images is that some information about the shape of objects is lost with projection. Consequently, a particular image could be the result of projecting any of infinitely many objects. Determining which of these objects has in fact produced the image is impossible unless further constraints are imposed. Two common approaches to solving this problem are the model-based and class-based invariance approaches to recognition.

Model-based methods (e.g., [4, 8, 12, 23]) approach recognition by storing a finite library of object models in the system memory. Given an image, the identity of objects in the image is determined by comparing the image to the models in the library. To avoid comparing the image to all the models, indexing tables were proposed [9, 11, 25]. To determine whether a given object can produce the image, the object model must contain 3D information about the object. Therefore, model construction generally requires the acquisition and matching of two or more images of the object.

Class-based invariance offers an alternative to the model-based approach. In this method objects are recognized by extracting image properties that are invariant to changes in viewing position. Invariance has successfully been used for certain classes of objects, e.g., planar, bilaterally symmetric, or polyhedral objects [7, 14, 15, 16, 18, 19, 20, 24]; see a recent review in [28]. A recognition system that uses the invariance approach typically proceeds in two stages. In the first stage the class of the object is recognized, and in the second stage the invariant properties of the class are used for identification. The advantage of using invariance for such classes is twofold.
First, invariance is computed from single images; hence object models can also be constructed from single images. Second, invariance distinguishes between objects even if models for these objects are not stored in the system memory.

Despite its appeal, the class-based invariance approach cannot be used directly as a method for identifying general 3D objects from single 2D images since, as has been shown in [2, 3, 14], single 2D images of general 3D objects exhibit no viewpoint invariance. The scope of the invariance approach, therefore, is limited to certain classes of objects. It is therefore of interest to study whether the invariance approach can be extended to handle a wide range of classes of objects. Existing studies are limited to a handful of examples of particular classes and offer no general tools for extending their results to other classes of objects.

In this paper we develop a method to determine the existence of invariance for classes of objects. (The problem of determining the class of the object is not addressed in this paper.) We consider only invariance that can distinguish between all the different objects in a given class (as opposed to invariance that distinguishes between subsets of the objects, in which case different objects may give rise to the same invariant values). We call these classes identifiable classes. We introduce necessary and sufficient conditions for such invariance to exist. For the identifiable classes we can also use our method to detect the degenerate views, that is, views from which the invariance cannot be computed. Our method is general and does not depend on a specific choice of a class.

Our approach is based on the following principles. A system can identify an object unequivocally from a given image if and only if that object is the only object, among the objects of interest, that can produce the given image. (Of course one can design systems that identify objects according to, say, maximum-likelihood principles. We do not deal with such a framework here.) We therefore develop a method for exploring the set of ambiguous images, that is, images that can be produced by at least two objects of the same class. Clearly, if no further constraints are imposed an object cannot be identified from an ambiguous image. Therefore, given a class of objects we attempt to determine the set of ambiguous images of the class. If we find that all the images of the class are ambiguous we conclude that the class is not identifiable from any view. If some of the images are non-ambiguous, then the class is identifiable, and the ambiguous images are considered degenerate.

Below we develop tests to determine the set of ambiguous images of a class. These tests depend on the choice of projection model. In this paper we introduce algebraic tests for objects (given as 3D point sets) undergoing affine projection (that is, a 3D-to-2D affine transformation).

The rest of the paper is organized as follows. Section 2 introduces the general framework of our work. In Section 3 we use this framework to develop algebraic tests for classes of objects that undergo affine projection. In Section 4 we apply these tests to a number of classes of objects. We conclude with a discussion of our results in Section 5.
2 General Framework

Object recognition ("naming") can be expressed as a mapping from a set of images to a set of object names. In general, we would like any image of a given object to be mapped to the object name. However, it is not always possible to define such a mapping. This is because an image of an object depends not only on the shape of the object, but also on its position, the projection model, and other imaging parameters (e.g., camera parameters). As a result, different objects may produce the same image, and therefore the inverse imaging function, from images to objects, cannot be defined in such cases. Below we limit our discussion to images of objects that vary because of camera position (that is, we do not consider changes of illumination direction, background, etc.).

In this section we discuss the conditions under which a mapping from a set of images to a set of object names can be defined. We first outline conditions which are independent of the projection model. Later, we develop tests for detecting these conditions for the case of objects undergoing an affine projection.

We call two objects, $O$ and $O'$, equivalent if and only if every image of $O$ is also an image of $O'$ and vice versa. Clearly, no recognition algorithm can discriminate between two equivalent objects.
The choice of projection model determines the set of equivalent objects. For example, under the affine projection model every two objects that are related by a 3D affine transformation are equivalent.

When two objects are not equivalent (different) they may still share some of their images. We call these images ambiguous. In this case too, no recognition algorithm can identify an object from an ambiguous image. If two objects share an image, they will also share other images which differ from each other by 2D transformations. Such images are called equivalent images. As with equivalent objects, the set of equivalent images is determined by the projection model. For example, for rigid objects undergoing orthographic projection two images are equivalent if they are related by a 2D rigid transformation, whereas under the affine projection model two images are equivalent if they are related by a 2D affine transformation. For a given object, we define a view to be a set of equivalent images. It is clear that if two objects share an image they will also share the set of images that are in the same view. It follows that no recognition function can identify an object from any of the images in its set of ambiguous views. We will therefore refer interchangeably in the rest of this paper to ambiguous views and ambiguous images.

Note that ambiguous views, which we also refer to as degenerate views, are related but not identical to accidental views. The term "accidental view" is often used to describe a view of an object from which a non-stable image is obtained (see, e.g., [26]). That is, a view is accidental if a small change in the viewing direction will cause a large change in the appearance of the object. Such instability is often caused by a change of aspect of the object due to self-occlusion. Degenerate views, in contrast, are views from which an object appears identical to another object of the same class.
Degeneracy, therefore, is a property which depends on the class, while accidentalness is a property of the object irrespective of the class. In many cases accidental views are also degenerate. For instance, consider the side views of a planar object. These views are accidental, since the object appears as a line. Likewise, within the class of planar objects these views are degenerate, since many different planar objects can be confused in images taken from these viewing directions.

Let us first consider an extreme case in which the class of objects contains all the objects described by $n$-tuples of 3D points. It is easy to show that, for this class of objects, every image is ambiguous (there are at least two different objects that can project to the image). Since it is impossible to identify a given object from its ambiguous views, it is impossible in general to construct a recognition mapping (a mapping which, given an image of an object, returns a unique name for the object). This is a weaker version of the result that for this class there exists no view invariance [2, 3, 14].

It follows that to define a recognition mapping it is necessary to restrict the domain of the naming function. In our work we restrict the domain by imposing two types of constraints. First, we constrain the set of objects considered by the system (e.g., the class of bilaterally symmetric objects as opposed to the set of general 3D objects). In addition, we may constrain the set of images considered for each object in the class (e.g., excluding the side viewing direction of planar objects). Note that when we consider all
3D point sets it is not sufficient to restrict only the set of views, since for this class of objects every view is ambiguous.

Another extreme case of class constraints is provided by the model-based approach to recognition. In this paradigm only finite sets of objects are considered. For a finite set of objects the set of ambiguous views is finite (at most quadratic in the number of objects in the set), and so by excluding the ambiguous views we obtain identifiable classes. Thus, every finite class of objects is identifiable.

Below we concentrate on infinite classes of objects. Our task then is to determine, given an infinite class of objects, the set of views from which the objects can be identified. Formally, given a class of objects $C$, let $I_C$ be the set of all images of $C$, and let $N_C$ denote the set of names of objects in $C$. We seek to determine the set of images $\bar{I}_C \subseteq I_C$ on which the naming mapping

$$ \mathcal{N}_C : \bar{I}_C \to N_C $$

can be defined. Note that usually $N_C$ will be isomorphic to $C$. In this case the existence of this mapping would imply that shape reconstruction of objects of the class from single images is in principle possible if the class constraints are known. Since without imposing further assumptions it is impossible to determine the identity of an object from an ambiguous view, the ambiguous views will restrict the domain of $\mathcal{N}_C$. The set $\bar{I}_C$, therefore, will denote the non-ambiguous images of $C$. Our general idea can be summarized in the following proposition:

Proposition 1: An object $O \in C$ is identifiable from a view $\hat{v}$ if and only if there exists no different object $O' \in C$ that shares the view $\hat{v}$ of $O$.

Below we consider the case of objects that undergo affine projection. For these objects we develop tests that, given a class of objects $C$, determine the set $\bar{I}_C$ of its non-ambiguous images. When $\bar{I}_C \neq \emptyset$ we say that $C$ is identifiable. In such a case we may also use our tests to determine the set of degenerate views of $C$, the viewing directions from which ambiguous images are produced. When $\bar{I}_C = \emptyset$ we say that $C$ is not identifiable from any view.
3 Identification under affine projection

In this section we develop tests for identification of 3D objects undergoing affine projection. An image is an affine projection of a given object if it is obtained by applying a 3D affine transformation to the object followed by an orthographic projection. Affine projection is the model of choice in several recognition studies [9, 10, 11, 22, 23] (see [9, 1] for more on this transformation). The tests are applied to objects given as ordered 3D point sets. We assume here that all the points are visible in every considered view. It is possible, however, to use our results in cases of partial (or self) occlusion by considering subsets of the points of the objects.
3.1 Notation
Given an object $O$ consisting of $n$ ordered points $p_1 = (x_1, y_1, z_1)^T, \ldots, p_n = (x_n, y_n, z_n)^T$, we write $O$ in matrix form as a $3 \times n$ matrix $O = [p_1, p_2, \ldots, p_n]$. Further, we denote the rows of $O$ by $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$ and by $\mathrm{row}(O)$ the row space of $O$ (the vector space spanned by $\mathbf{x}, \mathbf{y}, \mathbf{z}$). Throughout the paper we shall assume that the object points $p_1, \ldots, p_n$ are non-coplanar. This implies that $\mathrm{rank}(\mathrm{row}(O) \cup \{\mathbf{1}\}) = 4$, where $\mathbf{1} \in \mathbb{R}^n$ is the vector of all ones. We shall also assume that $n$ is sufficiently large to induce invariance whenever such invariance exists. Note that we consider two objects that contain the same set of points but ordered differently as different objects. This assumption can be dropped for some of our results (see Section 5).

We denote an image by a $2 \times n$ matrix $I$ that consists of the locations of the object points in the image. Let $I$ be an image obtained as a result of an affine projection of $O$. $I$ is produced by applying a 3D affine transformation to $O$ followed by an orthographic projection. Specifically, the image position $q_i \in \mathbb{R}^2$ of an object point $p_i \in \mathbb{R}^3$ of $O$ taken from a view $\hat{v}$ is given by

$$ q_i = A_{\hat{v}} p_i + t, $$

where $A_{\hat{v}}$ is a $2 \times 3$ matrix of rank 2 and $t \in \mathbb{R}^2$. The subscript $\hat{v}$ represents the viewing direction from which the image is observed. This will be defined below. The result of applying this affine transformation to $O$ can be written in matrix form as
$$ I = A_{\hat{v}} O + t \mathbf{1}^T. $$

Two objects, $O$ and $O'$, share an image $I$ if there exist two $2 \times 3$ matrices of rank 2, $A_{\hat{v}}$ and $A_{\hat{u}}$, and two vectors, $t, s \in \mathbb{R}^2$, such that

$$ I = A_{\hat{v}} O + t \mathbf{1}^T = A_{\hat{u}} O' + s \mathbf{1}^T. $$

A view under the affine projection model is the set of images that are 2D affine equivalent (namely, related by a 2D affine transformation). That is, $I$ and $I' = AI + t \mathbf{1}^T$ (where $A$ is a $2 \times 2$ non-singular matrix) are affine equivalent and therefore belong to the same view of $O$. Clearly, whenever two objects share an image from a given view, they will also share all images in that view.

An affine view $\hat{v}$ is defined by the set of matrices $A A_{\hat{v}}$, where $A$ is an arbitrary $2 \times 2$ non-singular matrix. It is not difficult to see that under affine projection a view is determined by a unit vector $\hat{v} \in \mathbb{R}^3$ such that $A_{\hat{v}} \hat{v} = 0$ (or, in other words, by a point on the unit sphere). Since $A A_{\hat{v}} \hat{v} = 0$ for every $2 \times 2$ matrix $A$, the vector $\hat{v} \in \mathbb{R}^3$ uniquely determines the set of affine-equivalent images that were taken from viewing direction $\hat{v}$. Below we shall use the notation $A_{\hat{v}}$ to denote a $2 \times 3$ matrix of a given view $\hat{v}$.

Given two objects, $O$ and $O'$, we define their $7 \times n$ joint matrix $J(O, O')$ to contain the rows of $O$
and $O'$ and the vector $\mathbf{1}$ stacked on top of each other in the form:

$$ J(O, O') = \begin{bmatrix} O \\ O' \\ \mathbf{1}^T \end{bmatrix} \tag{1} $$
In Section 3.2 below we show that the rank of $J(O, O')$ can be used to determine whether $O$ and $O'$ are affine equivalent, share a view, or are completely disjoint.
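The rank test on the joint matrix can be tried out numerically. The following is only a minimal sketch, assuming objects are stored as lists of three coordinate rows; the helper names (`rank`, `joint_matrix`, `classify`) and the sample points are hypothetical and not part of the paper. Exact rational arithmetic is used so that no numerical tolerance has to be chosen, and the label "no shared view" for ranks above 5 is our reading of "completely disjoint".

```python
from fractions import Fraction


def rank(rows):
    """Exact rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def joint_matrix(o, o_prime):
    """Stack O, O' and the all-ones row into the 7 x n matrix J(O, O')."""
    return o + o_prime + [[1] * len(o[0])]


def classify(o, o_prime):
    """Map rank(J) to the three cases (both objects assumed non-coplanar)."""
    r = rank(joint_matrix(o, o_prime))
    if r == 4:
        return "affine-equivalent"
    if r == 5:
        return "share a single view"
    return "no shared view"


# Hypothetical object O' with n = 8 non-coplanar points (one point per column).
O_PRIME = [
    [0, 1, 0, 0, 1, 1, 0, 2],  # x coordinates
    [0, 0, 1, 0, 1, 0, 1, 1],  # y coordinates
    [0, 0, 0, 1, 0, 1, 1, 3],  # z coordinates
]

# O = A O' + t 1^T with a non-singular A: an affine copy of O'.
A = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
T = [1, -2, 3]
O_EQUIV = [
    [sum(a * col[k] for k, a in enumerate(row)) + t for col in zip(*O_PRIME)]
    for row, t in zip(A, T)
]

# O = O' + v xi^T, where xi lies outside span(row(O'), 1).
V = [1, 2, 2]
XI = [0, 0, 0, 0, 1, 0, 0, 0]
O_ONE_VIEW = [[p + v * x for p, x in zip(row, XI)] for row, v in zip(O_PRIME, V)]

# Three independent perturbations, one per coordinate row.
O_DISJOINT = [row[:] for row in O_PRIME]
for k in range(3):
    O_DISJOINT[k][4 + k] += 1

print(classify(O_EQUIV, O_PRIME))
print(classify(O_ONE_VIEW, O_PRIME))
print(classify(O_DISJOINT, O_PRIME))
```

With these sample inputs the joint matrices have ranks 4, 5 and 7 respectively, so the three calls exercise each of the three cases.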
3.2 Tests for identification

Two objects, $O$ and $O'$, are called affine-equivalent if there exists an affine transformation, given by a $3 \times 3$ non-singular matrix $A$ and a vector $t \in \mathbb{R}^3$, such that $O = AO' + t \mathbf{1}^T$. In this case every image of $O$ is also an image of $O'$ and vice versa (see Proposition 2 below). Clearly, under affine projection, it is impossible to distinguish between objects that are affine-equivalent. Otherwise, if $O'$ cannot be obtained from $O$ by applying an affine transformation to $O$, the two objects are called affine-different. Following is a list of necessary and sufficient conditions for affine equivalence.
Proposition 2: The following conditions are equivalent:
(a) $O = AO' + t \mathbf{1}^T$ for some $3 \times 3$ non-singular matrix $A$ and a vector $t \in \mathbb{R}^3$.
(b) $\mathrm{rank}(J(O, O')) = 4$.
(c) Every image of $O$ is also an image of $O'$ and vice versa.
(d) $O$ and $O'$ share more than a single view.

Proof:
(a) $\Rightarrow$ (b): Assume that $O = AO' + t \mathbf{1}^T$. Since both $O$ and $O'$ are non-planar ($\mathrm{rank}(\mathrm{row}(O) \cup \{\mathbf{1}\}) = \mathrm{rank}(\mathrm{row}(O') \cup \{\mathbf{1}\}) = 4$) it follows that $\mathrm{rank}(J(O, O')) \geq 4$, and since the rows of $O$ contain only linear combinations of the row vectors of $O'$ and $\mathbf{1}$, it follows that $\mathrm{rank}(J(O, O'))$ is exactly 4.
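(b) $\Rightarrow$ (a): Assume that $\mathrm{rank}(J(O, O')) = 4$. By our assumption $\mathrm{rank}(\mathrm{row}(O') \cup \{\mathbf{1}\}) = 4$. It follows that $\mathrm{row}(O) \subseteq \mathrm{span}(\mathrm{row}(O') \cup \{\mathbf{1}\})$. In particular, it follows that there exist a $3 \times 3$ matrix $A$ and a vector $t \in \mathbb{R}^3$ such that $O = AO' + t \mathbf{1}^T$. The matrix $A$ is non-singular because otherwise it would contradict our assumption that $\mathrm{rank}(\mathrm{row}(O) \cup \{\mathbf{1}\}) = 4$.

(a) $\Rightarrow$ (c): Assume that $O = AO' + t \mathbf{1}^T$. Let an image of $O$ taken from a viewing direction $\hat{v}$ be

$$ I = A_{\hat{v}} O + t_1 \mathbf{1}^T. \tag{2} $$

Denote $A_{\hat{u}} = A_{\hat{v}} A$ and $t_2 = A_{\hat{v}} t + t_1$. The image of $O'$ which is identical to $I$ is given by

$$ A_{\hat{u}} O' + t_2 \mathbf{1}^T. \tag{3} $$

This can be easily verified since

$$ I = A_{\hat{v}} O + t_1 \mathbf{1}^T = A_{\hat{v}} (AO' + t \mathbf{1}^T) + t_1 \mathbf{1}^T = A_{\hat{u}} O' + t_2 \mathbf{1}^T. \tag{4} $$

The converse follows in the same way, since $A$ is non-singular and therefore $O' = A^{-1} O - A^{-1} t \mathbf{1}^T$.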
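(c) $\Rightarrow$ (d): Trivial.

(d) $\Rightarrow$ (a): Assume that $O$ and $O'$ share more than one view. Denote the common views of $O$ by $\hat{v}_1$ and $\hat{v}_2$ and of $O'$ by $\hat{u}_1$ and $\hat{u}_2$ respectively. Let $I_1$ and $I_2$ be two common images obtained with the different views,

$$ \begin{aligned} I_1 &= A_{\hat{v}_1} O = B_{\hat{u}_1} O' + t \mathbf{1}^T \\ I_2 &= A_{\hat{v}_2} O = B_{\hat{u}_2} O' + s \mathbf{1}^T, \end{aligned} \tag{5} $$

where

$$ A_{\hat{v}_1} = \begin{bmatrix} a_{x_1} \\ a_{y_1} \end{bmatrix}, \quad A_{\hat{v}_2} = \begin{bmatrix} a_{x_2} \\ a_{y_2} \end{bmatrix}, \quad B_{\hat{u}_1} = \begin{bmatrix} b_{x_1} \\ b_{y_1} \end{bmatrix}, \quad \text{and} \quad B_{\hat{u}_2} = \begin{bmatrix} b_{x_2} \\ b_{y_2} \end{bmatrix} $$

are $2 \times 3$ matrices of rank 2 (written here in terms of their rows) and $t = (t_x, t_y)^T$ and $s = (s_x, s_y)^T$ are two translation vectors. Since $I_1$ and $I_2$ are obtained from different views (and so they are not related by a 2D affine transformation), then without loss of generality $a_{x_2}$ is linearly independent of $a_{x_1}$ and $a_{y_1}$. Denote by

$$ A = \begin{bmatrix} a_{x_1} \\ a_{y_1} \\ a_{x_2} \end{bmatrix}, \quad B = \begin{bmatrix} b_{x_1} \\ b_{y_1} \\ b_{x_2} \end{bmatrix}, \quad \text{and} \quad t' = \begin{bmatrix} t_x \\ t_y \\ s_x \end{bmatrix}. $$

From Eq. 5 it follows that

$$ AO = BO' + t' \mathbf{1}^T. \tag{6} $$

By construction, $A$ is non-singular. We can therefore choose $A' = A^{-1} B$ and $t'' = A^{-1} t'$ to obtain

$$ O = A' O' + t'' \mathbf{1}^T. \tag{7} $$

The matrix $A'$ is non-singular because otherwise it would contradict our assumption that $\mathrm{rank}(\mathrm{row}(O) \cup \{\mathbf{1}\}) = 4$. $\Box$

To determine the set of ambiguous views of a class we need to develop necessary and sufficient conditions for two affine-different objects to share a view. These conditions are specified in Proposition 3 below.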
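Proposition 3: The following conditions are equivalent:
(a) There exist a single direction $\hat{v} \in \mathbb{R}^3$ and a translation $t \in \mathbb{R}^2$ such that $A_{\hat{v}} O = A_{\hat{u}} O' + t \mathbf{1}^T$, where $A_{\hat{v}}$ and $A_{\hat{u}}$ are $2 \times 3$ matrices of rank 2.
(b) $\mathrm{rank}(J(O, O')) = 5$.
(c) $O = AO' + t \mathbf{1}^T + \hat{v} \xi^T$, where $A$ is a $3 \times 3$ matrix with $\mathrm{rank}(A) \geq 2$, $t \in \mathbb{R}^3$, $\hat{v} \in \mathbb{R}^3$ is a non-zero vector, and $\xi \in \mathbb{R}^n$ is a non-zero vector orthogonal to $\mathrm{row}(O') \cup \{\mathbf{1}\}$. ($O'$ shares the view $\hat{v}$ of $O$.)
(d) $O = AO' + t \mathbf{1}^T + \hat{v} \xi^T$, where $A$ is a $3 \times 3$ matrix with $\mathrm{rank}(A) = 3$, $t \in \mathbb{R}^3$, $\hat{v} \in \mathbb{R}^3$ is a non-zero vector, and $\xi \in \mathbb{R}^n$ satisfies $\xi \notin \mathrm{span}(\mathrm{row}(O') \cup \{\mathbf{1}\})$. ($O'$ shares the view $\hat{v}$ of $O$.)

Proof:
(a) $\Rightarrow$ (b): Assume that there exists a single direction $\hat{v}$ such that $A_{\hat{v}} O = A_{\hat{u}} O' + t \mathbf{1}^T$. Let $B = [A_{\hat{v}}, -A_{\hat{u}}, -t]$ be a $2 \times 7$ matrix. We obtain that

$$ B J(O, O') = 0. \tag{8} $$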
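Since $\mathrm{rank}(A_{\hat{v}}) = \mathrm{rank}(A_{\hat{u}}) = 2$ it follows that $\mathrm{rank}(B) = 2$. This implies that $\mathrm{rank}(J(O, O')) \leq 5$. Since $O$ is non-planar ($\mathrm{rank}(\mathrm{row}(O) \cup \{\mathbf{1}\}) = 4$) it follows that $\mathrm{rank}(J(O, O')) \geq 4$. The rank must be different from 4 because $O$ and $O'$ share only a single view, and therefore they cannot be affine-equivalent (by the equivalence of (b) and (d) in Proposition 2). Therefore, $\mathrm{rank}(J(O, O')) = 5$.

(b) $\Rightarrow$ (c): Assume that $\mathrm{rank}(J(O, O')) = 5$. Denote the rows of $O$ by $\mathbf{x}_1, \mathbf{y}_1, \mathbf{z}_1$ and the rows of $O'$ by $\mathbf{x}_2, \mathbf{y}_2, \mathbf{z}_2$. At least one of the rows of $O$ must be independent of the rows of $O'$ and the vector $\mathbf{1}$. Assume without loss of generality that this row is $\mathbf{x}_1$. It follows that there exists a vector $\xi$ which is orthogonal to $\mathrm{row}(O') \cup \{\mathbf{1}\}$ such that

$$ \mathbf{x}_1 = c_x \mathbf{x}_2 + c_y \mathbf{y}_2 + c_z \mathbf{z}_2 + c_1 \mathbf{1} + \xi. \tag{9} $$

Since $\mathrm{rank}(J(O, O')) = 5$, it follows that $\mathbf{y}_1, \mathbf{z}_1 \in \mathrm{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}_2, \mathbf{z}_2, \mathbf{1}\}$ or, equivalently, $\mathbf{y}_1, \mathbf{z}_1 \in \mathrm{span}\{\xi, \mathbf{x}_2, \mathbf{y}_2, \mathbf{z}_2, \mathbf{1}\}$. Consequently,

$$ O = AO' + t \mathbf{1}^T + \hat{v} \xi^T. \tag{10} $$

Finally, since $\mathbf{1} \notin \mathrm{row}(O)$ it follows that $\mathrm{rank}(O - t \mathbf{1}^T) = 3$. The rank of $\hat{v} \xi^T$ is 1; therefore, $\mathrm{rank}(AO') \geq 2$. Since $\mathrm{rank}(O') = 3$ it follows that $\mathrm{rank}(A) \geq 2$. We next show that the view of $O$ shared with $O'$ is $\hat{v}$. Let $A_{\hat{v}}$ be a $2 \times 3$ matrix of rank 2 such that $A_{\hat{v}} \hat{v} = 0$. Then $O'$ shares the view $\hat{v}$ with $O$:

$$ A_{\hat{v}} O = A_{\hat{v}} (AO' + t \mathbf{1}^T + \hat{v} \xi^T) = A_{\hat{v}} A O' + A_{\hat{v}} t \mathbf{1}^T. \tag{11} $$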
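(c) $\Rightarrow$ (d): Assume that $O = AO' + t \mathbf{1}^T + \hat{v} \xi^T$, where $\hat{v} = (v_x, v_y, v_z)^T$. If $\mathrm{rank}(A) = 3$ the claim is immediate, so assume that $\mathrm{rank}(A) = 2$ and write

$$ A = \begin{bmatrix} a_x^T \\ a_y^T \\ a_z^T \end{bmatrix}. $$

Assume without loss of generality that $a_z$ depends on $a_x$ and $a_y$, and that $a_x$ is independent of $a_y$. Let $a = a_x \times a_y$, let

$$ B = \begin{bmatrix} (a_x - v_x (a_z - a))^T \\ (a_y - v_y (a_z - a))^T \\ (a_z - v_z (a_z - a))^T \end{bmatrix}, $$

and let $\xi'^T = \xi^T + (a_z - a)^T O'$. It can be readily verified that

$$ O = AO' + t \mathbf{1}^T + \hat{v} \xi^T = BO' + t \mathbf{1}^T + \hat{v} \xi'^T. \tag{12} $$

By construction $\mathrm{rank}(B) = 3$ and $\xi' \notin \mathrm{span}(\mathrm{row}(O') \cup \{\mathbf{1}\})$. It can be shown as in the previous case that the view of $O$ shared with $O'$ is $\hat{v}$.

(d) $\Rightarrow$ (a): Assume that $O = AO' + t_1 \mathbf{1}^T + \hat{v} \xi^T$. Consider the images of $O$ taken from a direction $\hat{v}$. Let $A_{\hat{v}}$ be a $2 \times 3$ matrix of rank 2 such that $A_{\hat{v}} \hat{v} = 0$. It follows that

$$ A_{\hat{v}} O = A_{\hat{v}} (AO' + t_1 \mathbf{1}^T + \hat{v} \xi^T) = A_{\hat{v}} A O' + A_{\hat{v}} t_1 \mathbf{1}^T. \tag{13} $$

Let $A_{\hat{u}} = A_{\hat{v}} A$ and $t = A_{\hat{v}} t_1$; then we obtain

$$ A_{\hat{v}} O = A_{\hat{u}} O' + t \mathbf{1}^T. \tag{14} $$

Since $\xi \notin \mathrm{span}(\mathrm{row}(O') \cup \{\mathbf{1}\})$, $O$ and $O'$ cannot be affine equivalent. Hence, according to Proposition 2, $\hat{v}$ must be unique. $\Box$

To determine whether a class is identifiable we use the above propositions in conjunction with Proposition 1.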
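The link between condition (d) and condition (a) of Proposition 3 can be checked on a small example. This is a sketch under our own assumptions: the sample object, the matrices and the helper names (`rank`, `matmul`, `add_translation`) are hypothetical, and the viewing direction is left unnormalised because only its direction matters for $A_{\hat{v}} \hat{v} = 0$.

```python
from fractions import Fraction


def rank(rows):
    """Exact rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_translation(m, t):
    """Return M + t 1^T."""
    return [[x + ti for x in row] for row, ti in zip(m, t)]


# Hypothetical non-coplanar object O' (one point per column).
O_PRIME = [
    [0, 1, 0, 0, 1, 1, 0, 2],
    [0, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1, 1, 3],
]
A = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # rank 3
T = [1, -2, 3]
V = [1, 2, 2]                          # shared viewing direction
XI = [0, 0, 0, 0, 1, 0, 0, 0]          # outside span(row(O'), 1)

# Condition (d): O = A O' + t 1^T + v xi^T.
O = [
    [x + v * xi for x, xi in zip(row, XI)]
    for row, v in zip(add_translation(matmul(A, O_PRIME), T), V)
]

# A rank-2 matrix whose rows are orthogonal to v projects along v.
A_V = [[2, -1, 0], [0, 1, -1]]
assert all(sum(a * v for a, v in zip(row, V)) == 0 for row in A_V)

# Condition (a): A_v O = A_u O' + t' 1^T with A_u = A_v A and t' = A_v t.
A_U = matmul(A_V, A)
T_IMG = [sum(a * t for a, t in zip(row, T)) for row in A_V]
image_of_o = matmul(A_V, O)
image_of_o_prime = add_translation(matmul(A_U, O_PRIME), T_IMG)

shared = image_of_o == image_of_o_prime
joint_rank = rank(O + O_PRIME + [[1] * len(O[0])])
print(shared, joint_rank, rank(A_U))
```

Here the two images coincide point for point, the joint matrix has rank 5 as condition (b) requires, and $A_{\hat{u}}$ keeps rank 2 because $A$ is non-singular.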
4 Examples

In this section we apply the tests developed in Section 3 to several classes of objects. For each class we determine the set of views from which the objects of the class can be identified. Some of these classes are shown to be identifiable from most or all views. Other classes are shown to be non-identifiable from any view. We begin with classes of objects which are composed of the same set of parts, but in which the relative position and orientation of the parts may vary across objects (Section 4.1). We then proceed to discuss classes of objects which have two identical parts, except that one part may appear at a different position and orientation with respect to the second part. In this case the shape of the parts may vary across objects (Section 4.2). Next, we consider classes that contain combinations of sub-structures, that is, one part of the object is a function of the other parts (Section 4.3). Finally, we consider classes of objects that can be expressed as combinations of prototype objects (Section 4.4).
4.1 Objects with same set of parts

Many classes of interest contain objects all of which are composed of the same set of parts. The identity of an object within the class is defined by the exact position and orientation of one part with respect to another, and by the relative size or stretch of one part with respect to another. An example is given in Figure 1, where three hammer-like shapes with identical handles and heads which differ by affine transformations are shown. Formally, a class $C_P$ is defined by the set of parts $\{Q_1, \ldots, Q_m\}$ where $Q_i$ are $3 \times n_i$ matrices for some arbitrary $n_i > 3$. An object $O \in C_P$ is given by $O = [P_1, \ldots, P_m]$, where the $3 \times n_i$ matrices $P_i = f_i(Q_i)$ describe the parts of $O$. The functions $f_i$ determine the identity of the objects in the class.
Figure 1: Three hammer-like shapes composed of identical parts whose heads differ by generic affine transformations.
Figure 2: An overlay of the leftmost hammer (solid) with the other two hammers (dashed). Notice that the handles of the hammers are identical whereas their heads differ.
Figure 3: An overlay of the leftmost hammer (solid) and each of the other two hammers (dashed) taken from their common views. The common views are (0.29, -0.41, 0.87) and (0.32, -0.73, 0.60). Notice that for every view of the leftmost hammer we can produce infinitely many different hammer-like shapes that look identical to this hammer.
Below we limit our discussion to $\{f_i\}$ that represent affine transformations. In this case, an object $O \in C_P$ can be written in matrix notation as

$$ O = [P_1, \ldots, P_m], \quad \text{where } P_i = B_i Q_i + s_i \mathbf{1}^T \text{ for } 1 \leq i \leq m. \tag{15} $$

$B_i$ are $3 \times 3$ non-singular matrices and $s_i \in \mathbb{R}^3$, which vary between the different objects in the class. Below we assume that every part $Q_i$ is non-planar, that is, $\mathrm{rank}(\mathrm{row}(Q_i) \cup \{\mathbf{1}\}) = 4$.
The next proposition establishes that if the parts may undergo general affine transformations from one object of the class to another, then the class is non-identifiable from any view. If, however, we restrict the transformations applied to the parts to scaling and stretching along the primary axes, we obtain classes which are identifiable from almost all views. This implies, in particular, that it is possible in principle to extract the relative scale and stretch of objects' parts from single images.
Proposition 4: Given $3 \times n_i$ non-planar matrices $Q_i$ ($1 \leq i \leq m$), each denoting an object part, the class $C_P = \{[P_1, \ldots, P_m] \mid P_i = B_i Q_i + s_i \mathbf{1}^T\}$ is:
1. Non-identifiable for general affine transformations, that is, for arbitrary $3 \times 3$ non-singular matrices $B_i$ and constant $s_i \in \mathbb{R}^3$. (Consequently, the class is non-identifiable also if we let $s_i$ be arbitrary.)
2. Non-identifiable for pure translation, that is, for $B_i = I$ and arbitrary $s_i \in \mathbb{R}^3$.
3. Identifiable from almost all views for scaling and stretching along the primary axes, that is, for diagonal $B_i$ and constant $s_i \in \mathbb{R}^3$. In this case degenerate views are obtained only for $\hat{v} = (v_1, v_2, v_3)^T$ such that $v_i = 0$ for some $1 \leq i \leq 3$.
Proof:
1. To show this we need to show that for every choice of object $O \in C_P$ and every choice of view $\hat{v}$ there exists an affine-different object $O' \in C_P$ that shares the view $\hat{v}$ of $O$. Let $O = [P_1, \ldots, P_m]$ where $P_i = B_i Q_i + s_i \mathbf{1}^T$ for $1 \leq i \leq m$, and let $O' = [P_1', \ldots, P_m']$ where $P_i' = B_i' Q_i + s_i \mathbf{1}^T$ for another set of $B_i'$ and for $1 \leq i \leq m$. Since the matrices $B_i$ determine the identity of the object, it follows that given $\{B_i\}$ and $\hat{v}$ we need to find matrices $\{B_i'\}$ and a view $\hat{u}$ such that $O$ and $O'$ share an image. We set $B_1' = B_1$ and, for $2 \leq i \leq m$, $B_i' = B_i + \hat{v} r_i^T$ for some non-zero vector $r_i \in \mathbb{R}^3$ such that $\mathrm{rank}(B_i') = 3$ and $B_i' \neq I$. It is readily verified that $O$ and $O'$ are affine-different (since the first parts of $O$ and $O'$ are related by the identity matrix, while the other parts are not). Furthermore, the two objects $O$ and $O'$ share an image when both are taken from viewing direction $\hat{v}$. That is, $A_{\hat{v}} O = A_{\hat{v}} O'$. To prove this, it is sufficient to show that $A_{\hat{v}} P_i' - A_{\hat{v}} P_i = 0$. This can be easily verified as follows:

$$ A_{\hat{v}} (P_i' - P_i) = A_{\hat{v}} (B_i' - B_i) Q_i = A_{\hat{v}} \hat{v} r_i^T Q_i, $$

and since $A_{\hat{v}} \hat{v} = 0$ we obtain that $A_{\hat{v}} (P_i' - P_i) = 0$.

2. To show this we need to show that for every choice of object $O$ (determined by $s_i$) and view $\hat{v}$ there exists an affine-different object $O'$ (determined by $s_i' \neq s_i$) that shares the view $\hat{v}$ of $O$. Let $O = [P_1, P_2, \ldots, P_m]$ such that $P_i = Q_i + s_i \mathbf{1}^T$, and let $O' = [P_1', P_2', \ldots, P_m']$ such that $P_i' = Q_i + s_i' \mathbf{1}^T$. We set $s_1' = s_1$ and $s_i' = s_i + \alpha_i \hat{v}$ for $2 \leq i \leq m$ for some arbitrary $\alpha_i \neq 0$. In this case $O$ and $O'$ are affine-different since their first parts are related by a different translation than their other parts, and they share the view $\hat{v}$ of $O$ since

$$ A_{\hat{v}} (P_i' - P_i) = A_{\hat{v}} (s_i' - s_i) \mathbf{1}^T = \alpha_i A_{\hat{v}} \hat{v} \mathbf{1}^T = 0. $$

3. To show this we need to show that, given an object $O$ and a view $\hat{v}$ with no zero components, any object $O'$ that shares the view $\hat{v}$ of $O$ must be affine equivalent to $O$. Let $O = [P_1, \ldots, P_m]$ where $P_i = B_i Q_i + s_i \mathbf{1}^T$, and let $O' = [P_1', \ldots, P_m']$ where $P_i' = B_i' Q_i + s_i \mathbf{1}^T$ for another set of $B_i'$. Since $O$ and $O'$ share the view $\hat{v}$ of $O$, there exists a view $\hat{u}$ such that
$$ A_{\hat{v}} O = A_{\hat{u}} O' + t \mathbf{1}^T. $$

Without loss of generality, we can assume that $B_1 = B_1' = I$ and $s_1 = s_1' = 0$ (since we can choose to consider any of the objects that are affine equivalent to $O'$). This implies that

$$ \begin{aligned} A_{\hat{v}} P_1 &= A_{\hat{u}} P_1' + t \mathbf{1}^T \\ A_{\hat{v}} (B_i Q_i + s_i \mathbf{1}^T) &= A_{\hat{u}} (B_i' Q_i + s_i \mathbf{1}^T) + t \mathbf{1}^T. \end{aligned} $$

The first equation implies that $\hat{u} = \hat{v}$ and that $t = 0$ (since $P_1 = P_1'$ and $P_1$ is non-planar), and so the second equation becomes

$$ A_{\hat{v}} (B_i - B_i') Q_i = 0. $$

Since $Q_i$ is non-planar we obtain that $A_{\hat{v}} (B_i - B_i') = 0$. If $B_i = B_i'$ for all $1 \leq i \leq m$ then $O$ and $O'$ are identical. Otherwise, it follows that $B_i - B_i' = \hat{v} r^T$ for some $r \neq 0$ (since the only vectors $w$ for which $A_{\hat{v}} w = 0$ are of the form $w = \lambda \hat{v}$). However, since both $B_i$ and $B_i'$ are diagonal, $\hat{v} r^T$ must also be diagonal. But since $\hat{v}$ contains no zero components, the only way to obtain zeros in all non-diagonal components is by setting $r = 0$. This, however, implies again that $B_i = B_i'$ for $1 \leq i \leq m$, and so $O$ and $O'$ are identical. $\Box$

An example of these results is shown in Figures 1-3. In these figures we show three hammer-like objects with identical handles, but whose heads differ by an arbitrary linear transformation. We constructed these hammers as follows. We first constructed the leftmost hammer. Then, we arbitrarily selected two views and modified the head of the hammer according to the proof of Proposition 4(1) so as to obtain two different hammers that share those views with our original hammer. Note that according to the proposition we could do this for any desired view, and that at every view we could find infinitely many different hammers (determined by the choice of $r$) which share this view with our original hammer. It follows therefore that the class of objects with the same parts which may differ by an arbitrary affine transformation is non-identifiable. Notice that according to Proposition 4(3), if we allow the parts only to scale and stretch in the directions of the primary axes, we would not be able to find two hammers with a common view except for a small set of degenerate views.

A special case of this proposition is the case in which the class is composed of objects with two identical parts ($Q_1 = Q_2$). Notice that the actual shape of the parts was not used in the proof, so even the negative results extend to classes in which all the parts of a given object are identical. In Section 4.2 below we further explore the case of objects in which the parts of a given object are identical, but the parts may differ across the objects in the class.

We can conclude that a set of objects that consist of an identical set of parts cannot be identified if the objects differ by arbitrary affine transformations or translations of their parts. However, if the objects in the class differ by scaling or stretching of the parts, then they can be identified from almost all of their images.
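The construction in the proof of Proposition 4(1), and the degenerate case of Proposition 4(3), can be reproduced on a toy two-part object. The sketch below is illustrative only: the parts `Q1` and `Q2`, the vectors `V` and `R`, and all helper names are hypothetical choices, with $B_1 = I$ and zero translations assumed for simplicity.

```python
from fractions import Fraction


def rank(rows):
    """Exact rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Two hypothetical non-planar parts with four points each.
Q1 = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Q2 = [[1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def build(b2):
    """Object [Q1, B2 Q2], i.e. B1 = I and all translations zero."""
    return [r1 + r2 for r1, r2 in zip(Q1, matmul(b2, Q2))]


def joint_rank(o, o_prime):
    """Rank of the joint matrix J(O, O')."""
    return rank(o + o_prime + [[1] * len(o[0])])


base = build(IDENTITY)

# Proposition 4(1): B2' = B2 + v r^T yields a different object that
# produces the same image as the base object when viewed along v.
V = [1, 2, 2]
A_V = [[2, -1, 0], [0, 1, -1]]  # rows orthogonal to V
R = [1, 0, 0]
B2_GENERAL = [[IDENTITY[i][j] + V[i] * R[j] for j in range(3)] for i in range(3)]
other = build(B2_GENERAL)
general_shares_v = matmul(A_V, base) == matmul(A_V, other)

# Proposition 4(3): a diagonal stretch along z. The joint rank of 5 means
# exactly one view is shared, and it is the direction (0, 0, 1), which has
# zero components; the images taken along V differ.
B2_STRETCH = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
stretched = build(B2_STRETCH)
A_Z = [[1, 0, 0], [0, 1, 0]]    # projection along (0, 0, 1)
stretch_shares_z = matmul(A_Z, base) == matmul(A_Z, stretched)
stretch_shares_v = matmul(A_V, base) == matmul(A_V, stretched)

print(general_shares_v, joint_rank(base, other))
print(stretch_shares_z, stretch_shares_v, joint_rank(base, stretched))
```

Changing `V` and `R` gives other confusable objects for other views, which mirrors the remark that infinitely many hammers share any chosen view in the general affine case.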
4.2 Repeated structures

The next set of classes that we consider consists of objects each of which contains two non-planar parts that are identical up to an affine transformation. An object from the class has the following form:

$$O = [P, \; BP + s\mathbf{1}^T].$$

In contrast to the classes discussed in Section 4.1, now the shape of the parts (denoted by $P$) may also vary across objects. We consider below two cases. First we consider the case that the affine transformation relating the two parts may vary across objects. Then we consider the case that this affine transformation is the same for all the objects in the class. A special case of this latter class is the class of bilaterally symmetric objects, in which the identical parts are related by a reflection.

Invariants for objects with repeated structures undergoing 3D-to-2D projective transformations were introduced in [15]. In addition, invariants for bilaterally symmetric objects and objects composed of planar repeated structures under both affine and projective transformations were presented in [6, 13, 14, 16, 20]. These studies showed that the class of repeated structures induces invariance on the set of images. The basic intuition is that an image of an object with repeated structures will in general contain two copies of the same structure. Thus, under the appropriate conditions, the shape of the repeated part can be recovered up to an affine (or projective) transformation by simply using a stereo algorithm [5, 10]. Nonetheless, these studies do not determine whether in general the identity of the objects can be determined from single images. Two objects with repeated structures may have exactly the same parts, but these parts may differ in their relative location or size across the objects. These studies determine the shape of the parts, but leave the relative position and size of the parts unknown. Below we show that if we permit the parts to be related by an arbitrary transformation then
the class will not be identifiable (and the objects which have the same parts will be confused). This implies that the relationships between the parts cannot be uniquely recovered from single images, which is equivalent to saying that the problem of calibrating an affine camera from two images is inherently ambiguous. If, however, we consider a class of objects with repeated structures in which the transformation relating the parts is the same across objects (as in the case of bilaterally symmetric objects), then for most choices of transformation the class is identifiable from almost all views.

It is straightforward at this point to show that the class of objects with repeated structures whose parts are related by an arbitrary affine transformation is not identifiable from any view. This can be seen by applying Proposition 4(1) with $Q_1 = Q_2$, which tells us that a subset of this class, the objects that have the same parts, cannot be identified from any of their views. Consequently, no invariants can distinguish between all the objects in this class.

Next we consider the case that the same affine transformation relates the two parts in every object. We show that for most choices of an affine transformation the class determined by this transformation is identifiable from almost all views. The degenerate views in this case lie along at most three great circles on the viewing sphere. We list these views in Proposition 5.
Proposition 5: Given a $3 \times 3$ non-singular matrix $B$ and a vector $s \in R^3$, the class $C_{B,s}$ is identifiable from all views $\hat{v} \in R^3$ unless $\hat{v}$ is an eigenvector of $B$ or $\hat{v}$ is located on one of at most three great circles on the viewing sphere, which depend on $B$. (The additional degenerate views are listed in Table 1.)
The proof is given in Appendix A. The additional degenerate views listed in Table 1 depend on the matrix $B$, and, in particular, on the number of eigenvectors and eigenvalues of $B$. The degenerate views include all the eigenvectors of $B$. For convenience, we list the degenerate views according to the Jordan form of the matrix $B$, $\bar{B} = A^{-1} B A$. The views are given in terms of $v'$, where $v' = A^{-1} \hat{v}$. Note that given a vector $v'$ it is straightforward to recover the corresponding view $\hat{v} = A v'$. Furthermore, $\hat{v}$ is an eigenvector of $B$ if and only if $v'$ is an eigenvector of the Jordan form $\bar{B}$. Note also that in certain cases the actual list of degenerate views is smaller than what is specified in Table 1, since the list of vectors $v'$ includes vectors that correspond to complex views $\hat{v}$. Evidently, only real view vectors are geometrically feasible.

As an example consider the class of bilaterally symmetric objects. In this case the matrix $B$ representing reflection about a plane is given by

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$

and $s = 0$. $B$ has three eigenvectors and two different eigenvalues, corresponding to the second row of the table. Since $B$ is already in its Jordan form, $v' = \hat{v}$. Accordingly, bilaterally symmetric
Case   No. of eigenvectors   No. of eigenvalues   Constraints on $v'$
a      3                     3                    $v'_i = 0$ for some $1 \le i \le 3$
b      3                     2                    $v'_1 = 0$
c      3                     1                    no identification from any view
d      2                     2                    $v'_3 = 0$ or $v'_2 = 0$
e      2                     1                    only eigenvectors
f      1                     1                    $v'_3 = 0$ or $v'_1 = v'_2 = 0$

Table 1: List of possible additional degenerate views for the class $C_{B,s}$, listed according to the number of eigenvalues and eigenvectors of $B$.
objects are identifiable from all views except for those that lie in the symmetry plane, $\hat{v} = \alpha [1, 0, 0] + \beta [0, 1, 0]$, and the direction perpendicular to the symmetry plane, $\hat{v} = [0, 0, 1]$.
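The eigenvector condition in the bilateral symmetry example is easy to check by hand or in code. The sketch below is illustrative only: the function names and the sample view vectors are assumptions, and it tests just the eigenvector part of the degeneracy condition (whether $B\hat{v}$ is parallel to $\hat{v}$) for the reflection matrix given above.

```python
# Illustrative check of which views are eigenvectors of the reflection B.
# Function names and sample vectors are assumptions made for this sketch.

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def apply(B, v):
    return [sum(B[i][k] * v[k] for k in range(3)) for i in range(3)]

def is_eigenvector(B, v):
    """True when B v is parallel to the non-zero vector v."""
    return cross(apply(B, v), v) == [0, 0, 0]

B_reflect = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # reflection about z = 0

in_plane = [3, -2, 0]   # a view lying in the symmetry plane
normal = [0, 0, 5]      # the direction perpendicular to the plane
generic = [1, 1, 1]     # a view in general position

for name, view in [("in_plane", in_plane), ("normal", normal), ("generic", generic)]:
    print(name, is_eigenvector(B_reflect, view))
```

Views in the symmetry plane and the perpendicular direction pass the test, while the view in general position does not.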
4.3 Combinations of sub-structures

Next we consider classes of objects in which one part can be expressed as a linear combination of the other parts. Below we consider only classes of objects with three or more sub-structures. When the number of parts is two we obtain the case of repeated structures, which was discussed in Section 4.2. One reason we are interested in such classes is that it is possible to build them so that their objects have the same number of degrees of freedom as planar objects do ($2n$, where $n$ is the number of points on the object). Yet, as is shown below, unlike the class of planar objects these classes are not identifiable from any view.

In a class that contains a combination of sub-structures an object is divided into $m \ge 2$ parts. The location of every point on the $m$'th part can be expressed as a linear function of the $m - 1$ corresponding points in the other parts, where the same linear function is applied to all the points. The shape of an object of this class, therefore, is given by

$$O = [P_1, P_2, \ldots, P_m],$$

where $P_i$ are $3 \times n$ matrices defining the shape of the $i$'th part ($1 \le i < m$), $B_i$ are $3 \times 3$ non-singular matrices, $s \in R^3$, and

$$P_m = B_1 P_1 + B_2 P_2 + \cdots + B_{m-1} P_{m-1} + s\mathbf{1}^T.$$

In Proposition 6 below we show that for $m > 2$, identification is impossible from all views even when all the matrices $B_i$ and $s$ are the same for all objects in the class.
Proposition 6: Given $3 \times 3$ non-singular matrices $B_i$ where $1 \le i \le m - 1$, $m \ge 3$, and given $s \in R^3$, the class $C_{B_1, \ldots, B_{m-1}, s}$ is not identifiable from any view.
Proof: Here we prove the proposition for $m > 3$. The proof for $m = 3$ is given in Appendix B. We need to show that for every choice of object $O$ and view $\hat{v}$ there exist an affine-different object $O'$ and a view $\hat{u}$ such that $O$ and $O'$ share the views $\hat{v}$ and $\hat{u}$, respectively. According to Proposition 3(d) it is sufficient to show that there exists an object $O'$ in the class such that $O = O' + \hat{v} \xi^T$ for some $\xi \perp O$. (We take $A = I$ and $t = 0$.) Let

$$O = [P_1, P_2, \ldots, P_m], \quad \text{for} \quad P_m = \sum_{i=1}^{m-1} B_i P_i + s\mathbf{1}^T. \qquad (16)$$

We set $O' = [P_1', P_2', \ldots, P_m']$ such that

$$P_i' = P_i + \hat{v} \xi_i^T \quad (1 \le i \le m), \qquad (17)$$

where $\xi^T = [\xi_1^T, \xi_2^T, \ldots, \xi_m^T]$. (The sign of $\xi$ is immaterial here, because the conditions on $\xi$ derived below are homogeneous.) The object $O'$ belongs to the class if and only if there exists a vector $\xi$ such that the $m$'th part of $O'$ can be written in terms of the first $m - 1$ parts. That is, there exists a vector $\xi$ such that

$$P_m' = \sum_{i=1}^{m-1} B_i P_i' + s\mathbf{1}^T. \qquad (18)$$

From Eq. 17 (the case $i = m$) and Eq. 18 we obtain that

$$P_m' = P_m + \hat{v} \xi_m^T = \sum_{i=1}^{m-1} B_i P_i' + s\mathbf{1}^T. \qquad (19)$$

Plugging the first $m - 1$ equations of Eq. 17 into Eq. 19, using Eq. 16, and rearranging we obtain

$$\sum_{i=1}^{m-1} B_i \hat{v} \xi_i^T - \hat{v} \xi_m^T = 0. \qquad (20)$$
It is left to show that for every view $\hat{v}$ there exists a non-trivial vector $\xi^T = [\xi_1^T, \ldots, \xi_m^T]$ that solves the above equations subject to the constraint $O\xi = 0$. This last constraint and Eq. 20 together contain $3n + 3$ homogeneous equations in $mn$ unknowns. Since $m \ge 4$, $mn > 3n + 3$ for $n > 3$, and so for any choice of $\hat{v}$ a non-trivial solution exists. It follows that for every $\hat{v}$ there exists an object given by $O' = O - \hat{v} \xi^T$ that shares the view $\hat{v}$ with $O$. Hence, the class is not identifiable. $\Box$

We next discuss the counting argument for the case $m = 3$. In this case, an object has the form $O = [P_1, P_2, P_3]$ where $P_3 = BP_1 + CP_2 + s\mathbf{1}^T$, $B$ and $C$ are $3 \times 3$ non-singular matrices, and $s \in R^3$. Below we assume in addition that $\mathrm{rank}(J(P_1, P_2)) = 7$ (see footnote 1). If we rely only on counting arguments we may be misled to believe that the class $C_{B,C,s}$ is identifiable. Consider for example the counting arguments given in [27]. Given an object $O \in C_{B,C,s}$, suppose $O$ includes $n$ points. An image of $O$ gives $2n$ measurements. Together with $n$ class constraints (the linear relations between the first two parts and the third part) they give rise to $3n$ equations. The variables in these equations are as follows. The shape of $O$ is defined by $3n$ variables. The projection parameters (the parameters of a 3D-to-2D affine transformation) number eight. A 3D affine reference frame can be obtained by picking the position of four points, hence such a frame involves setting 12 parameters. Therefore, the total number of variables is $3n + 8 - 12 = 3n - 4$. According to a counting argument the number of equations obtained, $3n$, is sufficiently large to determine the values of the $3n - 4$ unknowns, and so in theory using elimination we should be able to recover the shape of the object from the equations. However, the underlying assumption in counting arguments is that the counted equations are all independent and consistent. In classes which contain combinations of sub-structures this assumption is violated, and so the identity of the objects cannot be recovered from single images, as was shown above in Proposition 6.

Footnote 1: Note that dropping the assumption that $\mathrm{rank}(J(P_1, P_2)) = 7$ will not change our final result that $C_{B,C,s}$ is not identifiable, since extending the class with more objects cannot reduce the set of ambiguous views.
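The two counts used above can be written out as a short sketch. The function names are hypothetical and the code only reproduces the arithmetic in the text. Note that $n$ is the number of points per part in the proof of Proposition 6, but the total number of points in the counting argument.

```python
# Illustrative arithmetic only; function names are assumptions of this sketch.

def counting_argument_m3(n):
    """Equations and unknowns for an n-point object with three sub-structures."""
    image_measurements = 2 * n   # two image coordinates per point
    class_constraints = n        # linear relation defining the third part
    equations = image_measurements + class_constraints
    shape_unknowns = 3 * n       # 3D coordinates of the points
    projection_unknowns = 8      # 3D-to-2D affine projection
    frame_parameters = 12        # fixed by choosing a 3D affine frame
    unknowns = shape_unknowns + projection_unknowns - frame_parameters
    return equations, unknowns

def homogeneous_system_is_underdetermined(m, n):
    """Proposition 6 for m >= 4: 3n + 3 homogeneous equations in m*n unknowns."""
    return m * n > 3 * n + 3

equations, unknowns = counting_argument_m3(10)
print(equations, unknowns)                           # 3n versus 3n - 4
print(homogeneous_system_is_underdetermined(4, 4))   # more unknowns than equations
```

The first count suggests the system is over-determined, which is exactly the conclusion the text warns against: the count says nothing about whether the equations are independent.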
4.4 Combinations of prototypes

Finally, we consider classes of objects that can be expressed as linear combinations of some prototype objects. That is, let $\{O_1, O_2, \ldots, O_k\}$ be a set of prototype objects, and assume further that the row spaces of all $O_i$ ($1 \le i \le k$) are linearly independent (that is, the union of the row spaces is of rank $3k$), and that the vector $\mathbf{1}$ does not belong to the span of these spaces. The objects in the class can be described by combining the $k$ prototypes, that is, $O \in C$ if and only if it can be written as

$$O = \sum_{i=1}^{k} \alpha_i O_i, \quad \text{for} \quad \alpha_i \in R.$$

Poggio and Vetter [16] already showed that this class is identifiable. Our tools provide a short and elegant proof for this case and establish that identification is possible from all views.
Proposition 7: The objects in the class $C$ are identifiable from all their views.

Proof: We need to show that, given an object $O$ and a view $\hat{v}$, any object $O'$ that shares the view $\hat{v}$ of $O$ must be affine equivalent to $O$. Let $O = \sum_{i=1}^{k} \alpha_i O_i$ and let $O' = \sum_{i=1}^{k} \beta_i O_i$ share the view $\hat{v}$ with $O$. It follows that

$$A_{\hat{v}} \Big( \sum_{i=1}^{k} \alpha_i O_i \Big) = A_{\hat{u}} \Big( \sum_{i=1}^{k} \beta_i O_i \Big) + t\mathbf{1}^T. \qquad (21)$$

This can be written as

$$\sum_{i=1}^{k} (\alpha_i A_{\hat{v}} - \beta_i A_{\hat{u}}) O_i - t\mathbf{1}^T = 0. \qquad (22)$$

Since the row spaces of all the prototype objects are linearly independent we obtain that

$$\alpha_i A_{\hat{v}} - \beta_i A_{\hat{u}} = 0 \qquad (23)$$

and that $t = 0$. Thus, $A_{\hat{v}}$ and $A_{\hat{u}}$ are related by a scale factor, $c = \beta_i / \alpha_i$ for all $1 \le i \le k$. (Note that neither $\alpha_i$ nor $\beta_i$ can be zero because the rank of both $A_{\hat{v}}$ and $A_{\hat{u}}$ is 2.) This implies that $\hat{v} = \hat{u}$ and that $O$ and $O'$ are affine equivalent (since $O' = cO$). $\Box$

The reason this class is of interest is the following. Suppose that the set of prototype objects is composed of a few similar objects that belong to a single perceptual category, say, two chairs, and suppose that "reasonable" correspondences between feature points on the objects can be assigned (by applying form and function considerations). Then, if we construct new objects by taking averages of the prototype objects, these new objects would tend to look similar to the prototype objects and belong to the same perceptual category. If identification is indeed possible for this kind of class, then one may be able to distinguish between different exemplars of such a category even if the specific exemplar is being seen for the first time.
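The last step of the proof, that coefficient vectors related by a common factor $c$ give $O' = cO$, follows from linearity and can be sketched as follows. The prototype matrices, the coefficients, and the helper name `combine` are assumptions made for illustration, and the symbols $\alpha_i$, $\beta_i$ follow the notation used in this sketch.

```python
# Illustrative sketch: objects as linear combinations of prototype objects.
# The prototypes and coefficients below are made-up example values.

def combine(prototypes, coeffs):
    """Return sum_i coeffs[i] * prototypes[i] for 3 x n point matrices."""
    rows, cols = len(prototypes[0]), len(prototypes[0][0])
    return [[sum(c * P[i][j] for c, P in zip(coeffs, prototypes))
             for j in range(cols)] for i in range(rows)]

O1 = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # prototype 1 (3 x 4)
O2 = [[1, 2, 0, 3], [0, 1, 1, 2], [2, 0, 1, 1]]   # prototype 2 (3 x 4)

alpha = [2, 3]                      # coefficients of O
c = 5                               # common scale factor
beta = [c * a for a in alpha]       # coefficients of O' with beta_i = c * alpha_i

O = combine([O1, O2], alpha)
O_prime = combine([O1, O2], beta)
O_scaled = [[c * x for x in row] for row in O]
print(O_prime == O_scaled)          # O' = c O, so O and O' are affine equivalent
```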
5 Summary and Discussion

A fundamental question regarding the invariance approach is whether it can be applied to a wide range of classes. To answer this question it is essential to study the set of classes for which invariance exists. In this paper we investigated the invariant representations that discriminate between all the objects of a given class. We addressed the problem of determining, given a class of objects, the set of images from which the objects can be identified. Our approach is based on exploring the set of ambiguous images. We developed a number of algebraic tests to determine the ambiguous images under affine projection, and applied these tests to a number of classes of objects. We now consider a number of assumptions we have made and how they should be relaxed in future work.

Projection model: our tests were developed for objects which may undergo affine projection. We intend in the future to develop similar tests for other, more realistic projection models. We would like to note, however, that some of our results (e.g., Proposition 4) apply also to rigid objects undergoing weak-perspective projection.

Constructive tests: although our tests can determine which classes are identifiable and from what views, the tests in their present form are not constructive. That is, the tests cannot be used to derive the invariants for the objects. Consequently, our approach may serve only as a first step in deriving invariance for new classes of objects. In particular, it can be used to avoid seeking invariance for classes for which invariance does not exist. In addition, for identifiable classes our method can be used to determine the set of views from which objects are identifiable and to exclude the ambiguous views.

Dealing with noise: in our tests we did not take into account the effect of noise on identifiable classes. Obviously, if we allow for noise, more images may become ambiguous. One possible way to detect sensitivity to noise in classes of objects is by looking at the singular values of the matrix $J(O_1, O_2)$, which was introduced in Section 3.2. This issue is also left for future research.

The objects: in our tests we assumed that the objects are given as ordered point sets. In particular we regarded two objects that contain the same set of points ordered differently as two different objects. Several of our results are not affected by this assumption. For example, the class of objects with identical parts and the class of objects with repeated structures are closed under permutation of the objects' points. (One only has to find correspondence between the different parts of the same object.) Furthermore, classes which were shown to be non-identifiable remain non-identifiable even if we relax this requirement. We intend in the future to extend our tests to objects that consist of unordered points. In addition, it is clearly of interest to develop tests for contour images and grey-level images.

Invariance: in this paper we concentrated on analyzing whether given classes are identifiable. For classes that are not identifiable it may still be possible to extract invariants from single images. These invariants will not suffice to discriminate between all the objects of the class, but will distinguish only between subsets of the objects. Developing tests for such classes is left for future research.

Classifying the objects: the first step of applying class-based invariance to images involves classifying the object in the image. Unfortunately, it is impossible to both classify the object and recover its specific identity using invariance, since this would contradict the result that there exists no view invariant that can discriminate between all 3D objects. In practical systems it may be the case that many of the objects belong to a small number of classes, in which case one may enumerate all these classes, or find some properties which distinguish these classes from one another. This problem too is beyond the scope of this paper.
Appendix A Repeated Structures

In this appendix we prove Proposition 5.

Proof: We will show that the class $C_{B,s}$ is identifiable from all views $\hat{v} \in R^3$ except for the degenerate views, which are the views that are eigenvectors of $B$ and the views listed in Table 1. To show this we will show that, given an object $O \in C_{B,s}$ and a view $\hat{v}$, there exists an affine-different object $O' \in C_{B,s}$ that shares the view $\hat{v}$ of $O$ if and only if $\hat{v}$ is a degenerate view. Let $O$ and $O'$ be two affine-different objects in $C_{B,s}$ given by

$$O = [O_1, O_2] \in C_{B,s}, \quad \text{where} \quad O_2 = BO_1 + s\mathbf{1}^T, \qquad (24)$$

$$O' = [O_1', O_2'] \in C_{B,s}, \quad \text{where} \quad O_2' = BO_1' + s\mathbf{1}^T. \qquad (25)$$

Suppose that $O'$ shares only the view $\hat{v}$ with $O$. It follows that there exist two $2 \times 3$ full-rank matrices $A_{\hat{v}}$ and $A_{\hat{u}}$ and a vector $t_1 \in R^2$ such that

$$A_{\hat{v}} O = A_{\hat{u}} O' + t_1 \mathbf{1}^T. \qquad (26)$$

In particular, since $O$ and $O'$ share a view, their parts also share the same view. That is,

$$A_{\hat{v}} O_1 = A_{\hat{u}} O_1' + t_1 \mathbf{1}^T \qquad (27)$$
$$A_{\hat{v}} (BO_1 + s\mathbf{1}^T) = A_{\hat{u}} (BO_1' + s\mathbf{1}^T) + t_1 \mathbf{1}^T. \qquad (28)$$

Note that although we assumed that $O$ and $O'$ are affine different, it might still be the case that their parts are affine equivalent. Therefore, $\mathrm{rank}(J(O_1, O_1')) \le 5$. In addition, we assumed that both parts are non-planar, that is, $\mathrm{rank}(\mathrm{row}(O_1) \cup \{\mathbf{1}\}) = \mathrm{rank}(\mathrm{row}(O_1') \cup \{\mathbf{1}\}) = 4$. This implies that $\mathrm{rank}(J(O_1, O_1')) \ge 4$. Below we analyze each of the two cases, $\mathrm{rank}(J(O_1, O_1')) = 5$ and $\mathrm{rank}(J(O_1, O_1')) = 4$, separately.

Case I: Suppose that $\mathrm{rank}(J(O_1, O_1')) = 5$. According to Proposition 3(a), the two parts $O_1$ and $O_1'$ have exactly one view in common, $\hat{v}$. From Eq. 27 it follows that the projection matrix $A_{\hat{v}}$ belongs to the view $\hat{v}$, and similarly from Eq. 28 it follows that the projection matrix $A_{\hat{v}} B$ belongs to the view $\hat{v}$. Therefore, $A_{\hat{v}} \hat{v} = 0$ and also $A_{\hat{v}} B \hat{v} = 0$. By definition of $A_{\hat{v}}$, the only vectors $w$ that satisfy $A_{\hat{v}} w = 0$ are of the form $w = \lambda \hat{v}$. In particular it follows that $B\hat{v} = \lambda \hat{v}$. Consequently, $O'$ shares the view $\hat{v}$ of $O$ when $\mathrm{rank}(J(O_1, O_1')) = 5$ if and only if $\hat{v}$ is an eigenvector of $B$, which implies that $\hat{v}$ is a degenerate view.

Case II: Suppose that $\mathrm{rank}(J(O_1, O_1')) = 4$. According to Proposition 2, the parts $O_1$ and $O_1'$ are affine equivalent, and so there exist a $3 \times 3$ non-singular matrix $D$ and a vector $t \in R^3$ such that

$$O_1 = DO_1' + t\mathbf{1}^T. \qquad (29)$$
We will use the following two claims to prove that $O'$ shares a single view $\hat{v}$ with the object $O$ only if $\hat{v}$ is an eigenvector of $B$ or $\hat{v}$ is listed in Table 1.

Claim 8: $O'$ shares the view $\hat{v}$ of $O$ if and only if there exists a vector $r \in R^3$ such that $r \ne 0$, and the commutator matrix $[B, D] = BD - DB$ satisfies

$$[B, D] = \hat{v} r^T. \qquad (30)$$

Claim 9: For every pair of $3 \times 3$ non-singular matrices $B$ and $D$, if $[B, D] = \hat{v} r^T \ne 0$ for some $r \in R^3$ then $\hat{v}$ is an eigenvector of $B$ or $\hat{v}$ is listed in Table 1.
These two claims imply that $O'$ shares the view $\hat{v}$ of $O$ when $\mathrm{rank}(J(O_1, O_1')) = 4$ only if $\hat{v}$ is an eigenvector of $B$ or $\hat{v}$ is listed in Table 1. We now turn to proving these two claims.

Proof of Claim 8: If $O'$ shares the view $\hat{v}$ of $O$ then Eq. 27 and Eq. 28 must be satisfied. Since we also assume here that their parts are affine equivalent, Eq. 29 must be satisfied as well. Plugging Eq. 29 into Eq. 27 we obtain that

$$A_{\hat{v}} (DO_1' + t\mathbf{1}^T) = A_{\hat{u}} O_1' + t_1 \mathbf{1}^T. \qquad (31)$$

Rearranging, we obtain

$$(A_{\hat{v}} D - A_{\hat{u}}) O_1' + (A_{\hat{v}} t - t_1) \mathbf{1}^T = 0. \qquad (32)$$

Since we assume that $O_1'$ is non-planar ($\mathrm{rank}(\mathrm{row}(O_1') \cup \{\mathbf{1}\}) = 4$), the coefficients of $O_1'$ and $\mathbf{1}$ in the last equation must vanish, namely,

$$A_{\hat{v}} D - A_{\hat{u}} = 0, \qquad A_{\hat{v}} t - t_1 = 0. \qquad (33)$$

Consider now Eq. 28. Replacing $A_{\hat{u}}$ by $A_{\hat{v}} D$ and $O_1$ by $DO_1' + t\mathbf{1}^T$ we obtain

$$A_{\hat{v}} B (DO_1' + t\mathbf{1}^T) = A_{\hat{v}} DBO_1' + t_2 \mathbf{1}^T, \qquad (34)$$

where $t_2 = (A_{\hat{v}} D - A_{\hat{v}}) s + t_1$. Rearranging, we get

$$A_{\hat{v}} (BD - DB) O_1' + (A_{\hat{v}} Bt - t_2) \mathbf{1}^T = 0. \qquad (35)$$

Again, since $O_1'$ is non-planar the coefficients must vanish, namely,

$$A_{\hat{v}} (BD - DB) = 0 \qquad (36)$$

$$A_{\hat{v}} Bt - t_2 = 0. \qquad (37)$$

It is immediate to see that there always exists a $t$ that satisfies Eq. 37 for any viewing direction $\hat{v}$ (since $t$ should satisfy a system of two independent linear equations in three unknowns). Since $\mathrm{rank}(A_{\hat{v}}) = 2$ it follows that there are two cases in which Eq. 36 can be satisfied. The first is $BD - DB = 0$. In this case Eq. 36 holds for all viewing directions $\hat{v}$. In particular it follows that in this case $O$ and $O'$ share all their views (that is, $O$ and $O'$ are affine equivalent). We are therefore left with the second case, where $BD - DB = \hat{v} r^T$ for some non-zero vector $r \in R^3$. $\Box$

Proof of Claim 9: To prove this claim we will use the Jordan form of the matrix $B$. The Jordan form, $\bar{B}$, is obtained from $B$ by a similarity transformation, that is, $\bar{B} = A^{-1} B A$ for some $3 \times 3$ non-singular matrix $A$. If $D$ and $r$ satisfy Eq. 30 for a given matrix $B$ and a vector $\hat{v}$, then $D' = A^{-1} D A$ and $r'^T = r^T A$ satisfy the same equation for $\bar{B}$ and $v' = A^{-1} \hat{v}$. Thus, rather than showing that for
every two non-singular matrices $B$ and $D$ and a view $\hat{v}$ there exists a non-zero vector $r$ that satisfies Eq. 30 only if $\hat{v}$ is degenerate, we may instead show that for every matrix $\bar{B}$ in a Jordan form, a matrix $D'$, and a vector $v'$ there exists a non-zero vector $r'$ only if $v'$ is as specified in Table 1. Note however that the matrices $\bar{B}$ and $D'$ and the vectors $r'$ and $v'$ are defined over the complex field. Nevertheless, this will not affect our proof, since by proving the claim for every complex matrix $D'$ we prove in particular that every real matrix $D$ satisfies the claim.

To derive the six cases listed in Table 1, let us first list the forms that a commutator $G = [\bar{B}, F]$ of a Jordan form matrix $\bar{B}$ takes according to the number of independent eigenvectors and eigenvalues of $\bar{B}$. These forms are listed in Table 2. In our case, $F = D'$. Note that $[\bar{B}, D'] = v' r'^T \ne 0$ if and only if $G_{ij} = v'_i r'_j$ and $G \ne 0$. We next show that there exists a vector $r' \ne 0$ that satisfies $G_{ij} = v'_i r'_j$ only if $v'$ is an eigenvector of $\bar{B}$ or $v'$ is listed in Table 1.
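The change of basis used in this argument can be checked numerically. The sketch below is illustrative: the matrix `A` is an arbitrary invertible integer matrix chosen so that its inverse is also integral, not the actual Jordan basis of `B`, and all names and values are assumptions. It verifies that conjugation maps $[B, D]$ to $[\bar{B}, D']$ and maps $\hat{v} r^T$ to $v' r'^T$ with $v' = A^{-1}\hat{v}$ and $r'^T = r^T A$.

```python
# Illustrative check that a similarity transformation preserves the form
# [B, D] = v r^T. A is an arbitrary unimodular matrix, not a Jordan basis.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def commutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(3)] for i in range(3)]

def outer(v, r):
    return [[vi * rj for rj in r] for vi in v]

def conjugate(M, A, A_inv):
    """Return A^{-1} M A."""
    return matmul(A_inv, matmul(M, A))

A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
A_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]
B = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]
D = [[1, 0, 2], [1, 1, 0], [0, 2, 1]]
v = [1, 2, 3]
r = [2, -1, 4]

B_bar = conjugate(B, A, A_inv)
D_prime = conjugate(D, A, A_inv)
v_prime = [sum(A_inv[i][k] * v[k] for k in range(3)) for i in range(3)]
r_prime = [sum(r[k] * A[k][j] for k in range(3)) for j in range(3)]

# A^{-1} [B, D] A equals [B_bar, D'].
print(conjugate(commutator(B, D), A, A_inv) == commutator(B_bar, D_prime))
# A^{-1} (v r^T) A equals v' r'^T.
print(conjugate(outer(v, r), A, A_inv) == outer(v_prime, r_prime))
```

Both identities hold for any invertible `A`, which is why the analysis can be carried out in the Jordan basis without loss of generality.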
(a) $\bar{B}$ has 3 independent eigenvectors and 3 different eigenvalues. In this case $v'_i r'_i = 0$ for $1 \le i \le 3$. Since $G \ne 0$, at least one of the matrix entries is non-zero. In particular, for some $i \ne j$, $(\lambda_i - \lambda_j) d'_{ij} \ne 0$. That is, $v'_i r'_j \ne 0$, and so $r'_j \ne 0$. Since $v'_j r'_j = 0$, it follows that $v'_j = 0$. The three eigenvectors of $\bar{B}$ are of the form $(v'_1, 0, 0)^T$, $(0, v'_2, 0)^T$, and $(0, 0, v'_3)^T$. Note that $v'_i \ne 0$ for all $1 \le i \le 3$ if and only if $\hat{v}$ cannot be expressed as a linear combination of any two of the eigenvectors of $B$. If the eigenvectors of $B$ are all real, we thereby exclude from the viewing sphere exactly three great circles through all pairs of eigenvectors. If some of the eigenvectors of $B$ are not real, we exclude even fewer views from the viewing sphere.

(b) $\bar{B}$ has 3 independent eigenvectors and 2 different eigenvalues. In this case $v'_1 = 0$. We next show that if $v'_1 \ne 0$ then $v'_3 = 0$ (in this case, $v'$ is an eigenvector of $\bar{B}$). Assume that both $v'_1 \ne 0$ and $v'_3 \ne 0$. In this case $r' = 0$, since $v'_1 r'_2 = v'_1 r'_1 = v'_3 r'_3 = 0$.

(c) $\bar{B}$ has 3 independent eigenvectors and only 1 eigenvalue. This case has been handled in Section 4.2. In this case every $v'$ is an eigenvector.

(d) $\bar{B}$ has 2 independent eigenvectors and 2 different eigenvalues. In this case $v'_3 = 0$ or $v'_2 = 0$. We next show that if both $v'_2 \ne 0$ and $v'_3 \ne 0$ then $r' = 0$, contradicting our assumption. Since $v'_3 r'_3 = 0$, $r'_3 = 0$. Further, since $v'_2 r'_1 = 0$ it follows that $r'_1 = 0$. This implies that the entire first column of $G$ is zero. Therefore, $v'_1 r'_1 = d'_{21} = 0$. But now also $v'_2 r'_2 = -d'_{21} = 0$, and so $r'_2 = 0$.

(e) $\bar{B}$ has 2 independent eigenvectors and only 1 eigenvalue. In this case $v'_2 = 0$, and therefore $v'$ is an eigenvector. We next show that if $v'_2 \ne 0$ then $r' = 0$, contradicting our assumption. Since $v'_2 r'_1 = v'_2 r'_3 = 0$ it follows that $r'_1 = r'_3 = 0$. As a result the entire first and third columns of $G$ vanish. It follows that $d'_{21} = 0$ and therefore also $v'_2 r'_2 = -d'_{21} = 0$. This implies that $r'_2 = 0$.
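(a) 3 eigenvectors, 3 eigenvalues:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad G = \begin{pmatrix} 0 & (\lambda_1 - \lambda_2) f_{12} & (\lambda_1 - \lambda_3) f_{13} \\ (\lambda_2 - \lambda_1) f_{21} & 0 & (\lambda_2 - \lambda_3) f_{23} \\ (\lambda_3 - \lambda_1) f_{31} & (\lambda_3 - \lambda_2) f_{32} & 0 \end{pmatrix}$$

(b) 3 eigenvectors, 2 eigenvalues:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad G = \begin{pmatrix} 0 & 0 & (\lambda_1 - \lambda_3) f_{13} \\ 0 & 0 & (\lambda_1 - \lambda_3) f_{23} \\ (\lambda_3 - \lambda_1) f_{31} & (\lambda_3 - \lambda_1) f_{32} & 0 \end{pmatrix}$$

(c) 3 eigenvectors, 1 eigenvalue:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_1 \end{pmatrix}, \quad G = 0$$

(d) 2 eigenvectors, 2 eigenvalues:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 1 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad G = \begin{pmatrix} f_{21} & -f_{11} + f_{22} & (\lambda_1 - \lambda_3) f_{13} + f_{23} \\ 0 & -f_{21} & (\lambda_1 - \lambda_3) f_{23} \\ (\lambda_3 - \lambda_1) f_{31} & -f_{31} + (\lambda_3 - \lambda_1) f_{32} & 0 \end{pmatrix}$$

(e) 2 eigenvectors, 1 eigenvalue:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 1 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_1 \end{pmatrix}, \quad G = \begin{pmatrix} f_{21} & -f_{11} + f_{22} & f_{23} \\ 0 & -f_{21} & 0 \\ 0 & -f_{31} & 0 \end{pmatrix}$$

(f) 1 eigenvector, 1 eigenvalue:
$$\bar{B} = \begin{pmatrix} \lambda_1 & 1 & 0 \\ 0 & \lambda_1 & 1 \\ 0 & 0 & \lambda_1 \end{pmatrix}, \quad G = \begin{pmatrix} f_{21} & -f_{11} + f_{22} & -f_{12} + f_{23} \\ f_{31} & -f_{21} + f_{32} & -f_{22} + f_{33} \\ 0 & -f_{31} & -f_{32} \end{pmatrix}$$

Table 2: The shape of the commutator $G = [\bar{B}, F]$ as a function of the number of eigenvectors and eigenvalues of $\bar{B}$.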
(f) $\bar{B}$ has only 1 independent eigenvector and 1 eigenvalue. In this case $v'_3 = 0$. We next show that if $v'_3 \ne 0$ then $r' = 0$, contradicting our assumption. Since $v'_3 r'_1 = 0$ it follows that $r'_1 = 0$. The entire first column of $G$ therefore vanishes, and so $v'_2 r'_1 = d'_{31} = 0$. This implies also that $v'_3 r'_2 = -d'_{31} = 0$, and since $v'_3 \ne 0$ we obtain that $r'_2 = 0$. Thus, the entire second column of $G$ also vanishes, implying in particular that $v'_2 r'_2 = -d'_{21} + d'_{32} = 0$. Since $v'_1 r'_1 = d'_{21} = 0$, we can conclude also that $d'_{32} = 0$. But $v'_3 r'_3 = -d'_{32} = 0$, and again, since $v'_3 \ne 0$, $r'_3 = 0$. $\Box$
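The commutator shapes used in cases (a) through (f) can be reproduced by direct multiplication. The following sketch is illustrative, with made-up eigenvalues and a made-up matrix `F`. It computes $G = \bar{B}F - F\bar{B}$ for a diagonal Jordan form with three distinct eigenvalues, for a scalar matrix, and for a single Jordan block, and compares the results with closed-form expressions derived from that product.

```python
# Illustrative check of commutator shapes for three Jordan forms.
# Eigenvalues and the matrix F are arbitrary example values.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(3)] for i in range(3)]

lams = [2, 3, 5]
F = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

# Case (a): three distinct eigenvalues, G_ij = (lam_i - lam_j) * f_ij.
B_a = [[lams[i] if i == j else 0 for j in range(3)] for i in range(3)]
G_a = commutator(B_a, F)
expected_a = [[(lams[i] - lams[j]) * F[i][j] for j in range(3)] for i in range(3)]

# Case (c): a scalar matrix commutes with everything.
B_c = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
G_c = commutator(B_c, F)

# Case (f): a single 3x3 Jordan block with eigenvalue lam1.
lam1 = lams[0]
B_f = [[lam1, 1, 0], [0, lam1, 1], [0, 0, lam1]]
G_f = commutator(B_f, F)
expected_f = [[F[1][0], F[1][1] - F[0][0], F[1][2] - F[0][1]],
              [F[2][0], F[2][1] - F[1][0], F[2][2] - F[1][1]],
              [0, -F[2][0], -F[2][1]]]

print(G_a == expected_a, G_c == [[0] * 3] * 3, G_f == expected_f)
```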
Appendix B Combinations of Sub-structures

Proposition 10: Given $3 \times 3$ non-singular matrices $B$ and $C$, and given $s \in R^3$, the class $C_{B,C,s}$ is not reconstructible from any view.
Proof: We can prove this by showing that for every object $O \in C_{B,C,s}$ and every view $\hat{v} \in R^3$, there exists an object $O' \in C_{B,C,s}$ that shares the view $\hat{v}$ with $O$ (and no other view). According to Proposition 3(d), if $O'$ shares a single view $\hat{v}$ with $O$ then there exist a $3 \times 3$ non-singular matrix $D$, a vector $t \in R^3$, and a vector $\xi \in R^{3n}$ such that

$$O = DO' + \hat{v} \xi^T + t\mathbf{1}^T \qquad (38)$$

and

$$\xi \notin \mathrm{row}(O') \cup \{\mathbf{1}\}. \qquad (39)$$

Since $O$ and $O'$ are objects in the class $C_{B,C,s}$, they take the following form:

$$O = [O_1, O_2, O_3], \quad \text{where} \quad O_3 = BO_1 + CO_2 + s\mathbf{1}^T,$$
$$O' = [O_1', O_2', O_3'], \quad \text{where} \quad O_3' = BO_1' + CO_2' + s\mathbf{1}^T. \qquad (40)$$

It is therefore sufficient to show that for every object $O$ and view $\hat{v}$ there exist a matrix $D$ and vectors $t \in R^3$ and $\xi \in R^{3n}$ that satisfy Equations 38, 39, and 40. Plugging the constraints specified in Eq. 40 into Eq. 38 results in a set of equations that is specified in the following claim:
Claim 11: For every $\hat{v}$, there exist a matrix $D$ and vectors $t \in R^3$ and $\xi \in R^{3n}$ that satisfy Equations 38 and 40 if and only if they satisfy the following equation:

$$B\hat{v} \xi_1^T + C\hat{v} \xi_2^T - \hat{v} \xi_3^T = [D, B] O_1' + [D, C] O_2' + t' \mathbf{1}^T, \qquad (41)$$

where $[D, B] = DB - BD$, $[D, C] = DC - CD$, and $t' = (I - B - C) t + (D - I) s$.
It follows that to show that there exists an object $O'$ that shares the view $\hat{v}$ with $O$ it suffices to show that there exist a non-singular matrix $D$ and vectors $t \in R^3$ and $\xi \in R^{3n}$ that satisfy Eq. 41 subject to the constraint that $\xi \notin \mathrm{row}(O') \cup \{\mathbf{1}\}$.

Proof of Claim 11: Eq. 38 together with Eq. 40 imply that

$$O_1 = DO_1' + \hat{v} \xi_1^T + t\mathbf{1}^T$$
$$O_2 = DO_2' + \hat{v} \xi_2^T + t\mathbf{1}^T$$
$$O_3 = DO_3' + \hat{v} \xi_3^T + t\mathbf{1}^T, \qquad (42)$$

where $\xi^T = [\xi_1^T, \xi_2^T, \xi_3^T]$. Using Eq. 40, the third of these equations can be written in terms of the first two parts of the objects as follows:

$$BO_1 + CO_2 + s\mathbf{1}^T = D(BO_1' + CO_2' + s\mathbf{1}^T) + \hat{v} \xi_3^T + t\mathbf{1}^T. \qquad (43)$$

Replacing $O_1$ and $O_2$ by the right-hand sides of the first two equations in Eq. 42 we obtain

$$B(DO_1' + \hat{v} \xi_1^T + t\mathbf{1}^T) + C(DO_2' + \hat{v} \xi_2^T + t\mathbf{1}^T) + s\mathbf{1}^T = D(BO_1' + CO_2' + s\mathbf{1}^T) + \hat{v} \xi_3^T + t\mathbf{1}^T. \qquad (44)$$

Denoting $t' = (I - B - C) t + (D - I) s$ and rearranging, we get

$$B\hat{v} \xi_1^T + C\hat{v} \xi_2^T - \hat{v} \xi_3^T = (DB - BD) O_1' + (DC - CD) O_2' + t' \mathbf{1}^T, \qquad (45)$$

which is identical to Eq. 41. $\Box$

Note that for a fixed matrix $D$ and a vector $t$, Eq. 41 contains $3n$ linear equations in $3n$ unknowns, the $3n$ components of $\xi$. In the rest of the proof we will show how, given an object $O$ and a vector $\hat{v}$, it is possible to select $D$ and $t$ so that a solution to Eq. 41 that satisfies $\xi \notin \mathrm{row}(O') \cup \{\mathbf{1}\}$ exists. We will consider two cases according to the rank of the linear system in Eq. 41. To do so, we first show that the rank of the linear equations of Eq. 41 is either equal to $3n$, or smaller than or equal to $2n$.
Claim 12: Let

$$W = [B\hat{v}, \; C\hat{v}, \; -\hat{v}].$$

Given $D$ and a vector $t$, the rank of Eq. 41 is $kn$, where $k$ is the rank of $W$.

Proof of Claim 12: Let $\xi_{1i}$, $\xi_{2i}$ and $\xi_{3i}$ be the $i$'th components of the three vectors $\xi_1$, $\xi_2$ and $\xi_3$, respectively, and let $p'_{1i}$ and $p'_{2i}$ be the $i$'th points of $O_1'$ and $O_2'$, respectively. Eq. 41 can be written as $n$ sets of the following equations ($1 \le i \le n$):

$$[B\hat{v}, \; C\hat{v}, \; -\hat{v}] \begin{pmatrix} \xi_{1i} \\ \xi_{2i} \\ \xi_{3i} \end{pmatrix} = (DB - BD) p'_{1i} + (DC - CD) p'_{2i} + t'. \qquad (46)$$

It can be readily verified that the rank of the system given in Eq. 41 is $kn$, where $k$ is the rank of $W$. $\Box$
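The rank of $W$ decides which of the two cases below applies. The sketch that follows is an illustration under assumed example matrices `B` and `C`. It computes the rank of $W = [B\hat{v}, C\hat{v}, -\hat{v}]$ exactly with rational arithmetic and shows that the rank drops when $\hat{v}$ is an eigenvector of $B$, since the first and third columns are then parallel.

```python
# Illustrative sketch: rank of W = [Bv, Cv, -v] for two viewing directions.
# B, C and the views are example values; n is an assumed number of points.
from fractions import Fraction

def rank(M):
    """Rank of a small matrix by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def build_W(B, C, v):
    Bv, Cv = apply(B, v), apply(C, v)
    return [[Bv[i], Cv[i], -v[i]] for i in range(3)]

B = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
C = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
n = 6                                            # points per part (assumed)

k_generic = rank(build_W(B, C, [1, 2, 3]))       # view in general position
k_eigen = rank(build_W(B, C, [1, 0, 0]))         # view that is an eigenvector of B
print(k_generic * n, k_eigen * n)                # rank kn of the system in Eq. 41
```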
Below we consider two cases according to the rank of $W$ (or the rank of Eq. 41).

Case I: Suppose that $W$ is singular, that is, $\mathrm{rank}(W) \le 2$. In this case the rank of Eq. 41 is at most $2n$. We select $D = I$ and $t = 0$. This implies that $[D, B] = [D, C] = 0$, and $t' = 0$. Eq. 41 now simplifies to a set of homogeneous linear equations:

$$B\hat{v} \xi_1^T + C\hat{v} \xi_2^T - \hat{v} \xi_3^T = 0. \qquad (47)$$

To satisfy Eq. 39 we will choose $\xi$ that is perpendicular to the union of the row space of $O'$ and $\{\mathbf{1}\}$ (and so, in particular, it will not belong to this space). This requirement results in four more homogeneous equations. Thus, we obtain $3n + 4$ homogeneous equations of rank $2n + 4$ or less in $3n$ unknowns. Consequently, there will always exist a non-trivial solution for $\xi$ that satisfies both Equations 41 and 39. It follows that reconstruction is impossible from any direction $\hat{v}$ for which $\mathrm{rank}(W) < 3$.

Case II: Suppose that $\mathrm{rank}(W) = 3$. In this case for every choice of $D$ and $t$, Eq. 41 is a system of $3n$ linear equations of rank $3n$ in $3n$ unknowns, the components of $\xi$. This system has a non-trivial solution if and only if it is non-homogeneous. Furthermore, in this case the solution is unique. We therefore need to find $D$ and $t$ for which Eq. 41 is non-homogeneous and such that the solution to this equation also satisfies $\xi \notin \mathrm{row}(O') \cup \{\mathbf{1}\}$. We show this by using the following two claims:
Claim 13: Let $\mathrm{rank}(W) = 3$ and denote $E = D + \hat{v} r^T$. If there exists a non-singular matrix $D$ such that for every vector $r$ the commutator $[B, E] \ne 0$, then
1. Eq. 41 is non-homogeneous.
2. The solution, $\xi$, to Eq. 41 satisfies Eq. 39.

Claim 14: For any view $\hat{v}$ for which $\mathrm{rank}(W) = 3$ there exists a matrix $D$ that satisfies the conditions of Claim 13.

The same arguments can be made for the matrix $C$.
Proof of Claim 13:

1. We first show that when $[B, E] \neq 0$ for every vector $r$ then Eq. 41 is necessarily non-homogeneous. Eq. 41 is non-homogeneous when

$$[D, B]\,O_1' + [D, C]\,O_2' + t'\mathbf{1}^T \neq 0. \tag{48}$$

Since $\mathrm{rank}(J(O_1', O_2')) = 7$ this is true if (and only if) at least one of the following conditions holds: $[D, B] \neq 0$, $[D, C] \neq 0$, or $t' \neq 0$. Now $[D, B] \neq 0$, since for $r = 0$ we have $E = D$ and therefore $[B, D] = [B, E] \neq 0$.

2. Given $D$ such that $[B, E] \neq 0$ for every $r$, we show that any $\delta \in \mathrm{row}(O') \cup \{\mathbf{1}\}$ violates Eq. 41. Consequently, any solution to Eq. 41 satisfies the constraint $\delta \notin \mathrm{row}(O') \cup \{\mathbf{1}\}$. Assume, by way of contradiction, that $\delta \in \mathrm{row}(O') \cup \{\mathbf{1}\}$. In this case $\delta$ can be expressed as a linear combination of the rows of $O'$ and of $\mathbf{1}$, namely

$$\delta^T = r^T O' + \alpha \mathbf{1}^T \tag{49}$$

for some $r \in \mathbb{R}^3$ and $\alpha \in \mathbb{R}$. Considering each part of the object separately we obtain

$$\delta_1^T = r^T O_1' + \alpha \mathbf{1}^T, \qquad \delta_2^T = r^T O_2' + \alpha \mathbf{1}^T, \qquad \delta_3^T = r^T (B O_1' + C O_2' + s \mathbf{1}^T) + \alpha \mathbf{1}^T. \tag{50}$$

Replacing this in Eq. 41 we obtain

$$B \hat{v} (r^T O_1' + \alpha \mathbf{1}^T) + C \hat{v} (r^T O_2' + \alpha \mathbf{1}^T) - \hat{v} \left( r^T (B O_1' + C O_2' + s \mathbf{1}^T) + \alpha \mathbf{1}^T \right) = [D, B]\,O_1' + [D, C]\,O_2' + t' \mathbf{1}^T. \tag{51}$$

Moving all terms to the left-hand side and collecting the coefficients of $\mathbf{1}^T$, denote $t'' = \alpha (B + C - I)\hat{v} - \hat{v} r^T s - t'$. Rearranging, we get

$$\left( B (D + \hat{v} r^T) - (D + \hat{v} r^T) B \right) O_1' + \left( C (D + \hat{v} r^T) - (D + \hat{v} r^T) C \right) O_2' + t'' \mathbf{1}^T = 0. \tag{52}$$

Substituting $E = D + \hat{v} r^T$ we obtain

$$[B, E]\,O_1' + [C, E]\,O_2' + t'' \mathbf{1}^T = 0. \tag{53}$$

Since $\mathrm{rank}(J(O_1', O_2')) = 7$ it follows that the above equation holds only if all three of the following equations hold:

$$[B, E] = 0, \qquad [C, E] = 0, \qquad \text{and} \qquad t'' = 0. \tag{54}$$

However, by our assumption $[B, E] \neq 0$ for all $r$. It follows that Eq. 54 does not hold, contradicting our assumption that $\delta \in \mathrm{row}(O') \cup \{\mathbf{1}\}$. □
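The rearrangement leading from Eq. 51 to Eq. 53 rests on the matrix identity $B\hat{v}r^T - \hat{v}r^T B - [D, B] = [B, D + \hat{v}r^T]$, where $[X, Y] = XY - YX$. The sketch below checks this identity exactly on random integer matrices. The helper names (`matmul`, `commutator`, `rearrangement_gap`, and so on) are illustrative assumptions and do not come from the paper.

```python
import random

def matmul(X, Y):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(3)] for i in range(3)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(3)] for i in range(3)]

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    return sub(matmul(X, Y), matmul(Y, X))

def outer(v, r):
    """Outer product v r^T of two 3-vectors."""
    return [[v[i] * r[j] for j in range(3)] for i in range(3)]

def rearrangement_gap(B, D, v, r):
    """Return (B v r^T - v r^T B - [D, B]) - [B, D + v r^T]."""
    vr = outer(v, r)
    lhs = sub(sub(matmul(B, vr), matmul(vr, B)), commutator(D, B))
    E = add(D, vr)
    return sub(lhs, commutator(B, E))

def random_matrix(rng):
    return [[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)]

def random_vector(rng):
    return [rng.randint(-5, 5) for _ in range(3)]

rng = random.Random(0)
B, D = random_matrix(rng), random_matrix(rng)
v, r = random_vector(rng), random_vector(rng)
print(rearrangement_gap(B, D, v, r))  # the zero matrix, by the identity above
```

Integer entries are used so that the comparison is exact rather than subject to floating-point error. The same identity with $C$ in place of $B$ gives the second term of Eq. 53.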
Proof of Claim 14:

As in Section 4.2, we show this by first bringing $B$ to a Jordan form. Let $\bar{B} = A^{-1} B A$ and let $E' = A^{-1} E A$; then $[B, E] = 0$ if and only if $[\bar{B}, E'] = 0$. Since $E = D + \hat{v} r^T$ we also denote $D' = A^{-1} D A$, $v' = A^{-1} \hat{v}$ and $r'^T = r^T A$ (so that $E' = D' + v' r'^T$). Note that $\bar{B}$, $D'$, $v'$ and $r'$ may be complex.
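The equivalence between the two commutator conditions follows in one line from the similarity transform. As a worked step, writing $\bar{B} = A^{-1} B A$ for the Jordan form and $E' = A^{-1} E A$ (the bar notation is an assumption made here for readability):

$$[\bar{B}, E'] = A^{-1} B A\, A^{-1} E A - A^{-1} E A\, A^{-1} B A = A^{-1} (B E - E B) A = A^{-1} [B, E]\, A.$$

Since $A$ is non-singular, $[\bar{B}, E']$ vanishes exactly when $[B, E]$ does. In the same way, $E' = A^{-1} (D + \hat{v} r^T) A = A^{-1} D A + (A^{-1} \hat{v})(r^T A)$, which is the decomposition of $E'$ into a transformed matrix plus a transformed rank-one term.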
We begin by showing that if $D'$ causes $[\bar{B}, E'] \neq 0$ for all complex vectors $r'$ then there exists a real matrix $D$ such that for every real vector $r$, $[B, E] \neq 0$. Consequently, it will suffice to show that for every $v'$ there exists a complex matrix $D'$ such that for every complex $r'$, $[\bar{B}, D' + v' r'^T] \neq 0$.

Consider $D'$ that satisfies this requirement. Denote $A D' A^{-1} = D_1 + i D_2$, where $D_1$ and $D_2$ are some $3 \times 3$ real matrices, and denote $A^{-T} r' = r_1 + i r_2$, where $r_1, r_2 \in \mathbb{R}^3$ and $A^{-T}$ denotes the inverse of $A^T$. If either $D_1 = 0$ or $D_2 = 0$ then we are done. Suppose that $[B, D_1 + i D_2 + \hat{v} (r_1 + i r_2)^T] \neq 0$ for every $r_1 + i r_2$. It follows that either

$$[B, D_1 + \hat{v} r_1^T] \neq 0 \tag{55}$$

or

$$[B, D_2 + \hat{v} r_2^T] \neq 0. \tag{56}$$

If Eq. 55 is satisfied for every $r_1 \in \mathbb{R}^3$, then $D_1$ is the sought real matrix. Similarly, if Eq. 56 is satisfied for every $r_2 \in \mathbb{R}^3$, then $D_2$ is the sought real matrix. One of these cases must hold. Otherwise, Eq. 55 fails for some vector $r_1$ and Eq. 56 fails for some vector $r_2$; since the commutator is linear in its second argument, it then vanishes for the complex vector $r = r_1 + i r_2$, contradicting the assumption that $[B, D_1 + i D_2 + \hat{v} (r_1 + i r_2)^T] \neq 0$ for every $r_1 + i r_2$. Consequently, it is sufficient to show that for every $v'$ there exists a complex matrix $D'$ such that for every complex vector $r'$, $[\bar{B}, D' + v' r'^T] \neq 0$.

We now turn to showing that for every vector $v'$ for which $\mathrm{rank}(W) = 3$ there exists a non-singular matrix $D'$ such that for every $r'$, $G = [\bar{B}, E'] \neq 0$. We show this by looking at the shape of $G = [\bar{B}, E']$ for every possible form of $\bar{B}$ and deriving constraints on $D'$ that guarantee that $G \neq 0$ for any choice of $r'$. Notice that by requiring that $\mathrm{rank}(W) = 3$ we exclude those views $\hat{v}$ which are eigenvectors of $B$. Below we consider six cases according to the number of linearly independent eigenvectors and different eigenvalues of $B$ (see Table 2). Notice that $f_{ij}$ in this table corresponds here to the components of $E'$ (denoted as $e'_{ij}$), and since $E' = D' + v' r'^T$ these components are given by $e'_{ij} = d'_{ij} + v'_i r'_j$.
(a) As can be seen in Table 2, when $B$ has three different eigenvalues $G = 0$ if and only if all six off-diagonal elements of $E'$ are zero. In particular, consider the second column of $G$: $e'_{12} = e'_{32} = 0$ implies that $d'_{12} = -v'_1 r'_2$ and $d'_{32} = -v'_3 r'_2$. Therefore, given a view $\hat{v}$, if $v'_1 = 0$ we can choose any non-singular matrix $D'$ such that $d'_{12} = 1$, and if $v'_1 \neq 0$ we can choose any non-singular $D'$ such that $d'_{12} = 0$ and $d'_{32} = 1$. Note that $v'_1$ and $v'_2$ cannot vanish simultaneously since that would imply that $\hat{v}$ is an eigenvector of $B$.
(b) In this case $G = 0$ implies in particular that $d'_{13} = -v'_1 r'_3$ and $d'_{23} = -v'_2 r'_3$. If $v'_1 = 0$ we can choose $D'$ such that $d'_{13} = 1$, and if $v'_1 \neq 0$ we can choose $D'$ such that $d'_{13} = 0$ and $d'_{23} = 1$.

(c) In this case all vectors are eigenvectors of $B$, and so $W$ is necessarily singular.

(d)-(f) In these three cases, $G = 0$ implies in particular that $d'_{21} = -v'_2 r'_1$ and $d'_{31} = -v'_3 r'_1$. If $v'_2 = 0$ we can choose $D'$ such that $d'_{21} = 1$, and if $v'_2 \neq 0$ we can choose $D'$ such that $d'_{21} = 0$ and $d'_{31} = 1$.

□
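The choice of $D'$ in case (a) can be checked numerically. The sketch below assumes a diagonal Jordan form with eigenvalues 1, 2, 3 (illustrative values only), builds $D'$ from the identity matrix so that it stays non-singular, and confirms over random integer $v'$ and $r'$ that $G$ has entries $(\lambda_i - \lambda_j)\, e'_{ij}$ and is never zero. The function names are hypothetical, and only real integer vectors are sampled, whereas the proof allows complex $r'$.

```python
import random

def matmul(X, Y):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(3)] for i in range(3)]

def choose_d_prime(v):
    """Case (a) choice of D', starting from the identity (determinant 1)."""
    D = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    if v[0] == 0:
        D[0][1] = 1      # d'_12 = 1
    else:
        D[2][1] = 1      # d'_12 stays 0, d'_32 = 1
    return D

def g_matrix(eigenvalues, D, v, r):
    """Return G = [B_bar, E'] and E' = D' + v' r'^T for diagonal B_bar."""
    B_bar = [[eigenvalues[i] if i == j else 0 for j in range(3)]
             for i in range(3)]
    E = [[D[i][j] + v[i] * r[j] for j in range(3)] for i in range(3)]
    return commutator(B_bar, E), E

rng = random.Random(0)
eigenvalues = (1, 2, 3)  # three distinct eigenvalues, assumed for illustration
for _ in range(1000):
    v = [rng.randint(-3, 3) for _ in range(3)]
    r = [rng.randint(-3, 3) for _ in range(3)]
    G, E = g_matrix(eigenvalues, choose_d_prime(v), v, r)
    # With a diagonal B_bar the commutator scales each entry of E'.
    assert all(G[i][j] == (eigenvalues[i] - eigenvalues[j]) * E[i][j]
               for i in range(3) for j in range(3))
    # The chosen D' keeps e'_12 or e'_32 non-zero, so G cannot vanish.
    assert any(G[i][j] != 0 for i in range(3) for j in range(3))
print("G was non-zero in every trial")
```

The other cases in the list follow the same pattern with the entries $d'_{13}, d'_{23}$ or $d'_{21}, d'_{31}$, although their Jordan forms are not diagonal and the entrywise formula used in the first assertion no longer applies.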
Acknowledgment The authors thank Amitai Regev for his assistance in proving Proposition 10.
References
[1] Basri, R., 1996. "Paraperspective ≡ affine", International Journal of Computer Vision, 19(2): 169-180.
[2] Burns, J., Weiss, R., and Riseman, E., 1992. "The non-existence of general-case view-invariants", in Mundy, J. and Zisserman, A. (Eds), Geometric Invariance in Computer Vision, MIT Press, Cambridge.
[3] Clemens, D. and Jacobs, D., 1991. "Space and time bounds on model indexing", IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(10): 1007-1018.
[4] Fischler, M.A. and Bolles, R.C., 1981. "Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography", Com. of the A.C.M., 24(6): 381-395.
[5] Faugeras, O.D., 1992. "What can be seen in three dimensions with an uncalibrated stereo rig?", Second European Conf. on Computer Vision: 563-578.
[6] Fawcett, R., Zisserman, A., and Brady, J.M., 1994. "Extracting Structure from an Affine View of a 3D Point Set with One or Two Bilateral Symmetries", Image and Vision Computing, 12(9): 615-622.
[7] Forsyth, D., Mundy, J. L., Zisserman, A., Coelho, C., Heller, A., and Rothwell, C., 1991. "Invariant descriptors for 3-D object recognition and pose", IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(10): 971-991.
[8] Huttenlocher, D.P., and Ullman, S., 1990. "Recognizing Solid Objects by Alignment with an Image", Int. J. Computer Vision, 5(2): 195-212.
[9] Jacobs, D., 1992. "Space efficient 3D model indexing", IEEE Conf. on Computer Vision and Pattern Recognition: 439-444.
[10] Koenderink, J. and van Doorn, A., 1991. "Affine structure from motion", Journal of the Optical Society of America, 8(2): 377-385.
[11] Lamdan, Y. and Wolfson, H.J., 1988. "Geometric Hashing: A General and Efficient Model-Based Recognition Scheme", Second International Conference on Computer Vision: 238-249.
[12] Lowe, D., 1985. Perceptual Organization and Visual Recognition, The Netherlands: Kluwer Academic Publishers.
[13] Mitsumoto, H., Tamura, S., Okazaki, K., Kajimi, N., and Fukui, Y., 1992. "3-D reconstruction using mirror images based on a plane symmetry recovering method", IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(9): 941-946.
[14] Moses, Y. and Ullman, S., 1992. "Limitations of non model-based recognition schemes", Second European Conf. on Computer Vision: 820-828.
[15] Mundy, J. L. and Zisserman, A., 1994. "Repeated structures: image correspondence constraints and 3D structure recovery", in Mundy, J. L., Zisserman, A., and Forsyth, D. A. (Eds), Applications of Invariance in Computer Vision, Lecture Notes in Computer Science 825, Springer-Verlag.
[16] Poggio, T. and Vetter, T., 1992. "Recognition and structure from one 2D model view: observations on prototypes, object classes and symmetries", M.I.T., A.I. Memo No. 1347.
[17] Rothwell, C., Zisserman, A., Mundy, J. L., and Forsyth, D. A., 1992. "Efficient Model Library Access by Projectively Invariant Indexing Functions", IEEE Conf. on Computer Vision and Pattern Recognition: 109-114.
[18] Rothwell, C., Zisserman, A., Forsyth, D. A., and Mundy, J. L., 1992. "Canonical Frames for Planar Object Recognition", Proc. of 2nd European Conf. on Computer Vision: 757-772.
[19] Sparr, G., 1992. "Depth Computations from Polyhedral Images", Image and Vision Computing, 10: 683-688.
[20] Sugimoto, A., 1994. "Geometric invariant of noncoplanar lines in a single view", Proc. of Int. Conference on Pattern Recognition, Vol. 1: 190-195.
[21] Thompson, D.W., and Mundy, J.L., 1987. "Three dimensional model matching from an unconstrained viewpoint", Proc. of IEEE Int. Conf. on Robotics and Automation: 208-220.
[22] Tomasi, C., and Kanade, T., 1992. "Shape and Motion from Image Streams under Orthography: a Factorization Method", International Journal of Computer Vision, 9(2): 137-154.
[23] Ullman, S., and Basri, R., 1991. "Recognition by linear combinations of models", IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(10): 992-1006.
[24] Weiss, I., 1988. "Projective invariants of shape", DARPA Image Understanding Workshop: 1125-1134.
[25] Weinshall, D., 1993. "Model-based invariants for 3D vision", International Journal of Computer Vision, 10(1): 27-42.
[26] Weinshall, D., and Werman, M., 1997. "On view likelihood and stability", IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(2): 97-108.
[27] Werman, M., and Shashua, A., 1995. "The study of 3D-from-2D using elimination", Fifth International Conference Computer Vision: 473-479.
[28] Zisserman, A., Forsyth, D., Mundy, J., Rothwell, C., Liu, J., and Pillow, N., 1995. "3D object recognition using invariance", Artificial Intelligence, 78: 239-288.