An Efficient Recursive Factorization Method for Determining Structure from Motion

Yanhua Li and Michael J. Brooks
Department of Computer Science, The University of Adelaide, Adelaide, South Australia 5005, Australia

Abstract

A recursive method is presented for recovering 3D object shape and camera motion under orthography from an extended sequence of video images. This may be viewed as a natural extension of both the original [10] and the sequential [8] factorization methods. A critical aspect of these factorization approaches is the estimation of the so-called shape space [8], and they may in part be characterised by the manner in which this subspace is computed. If $P$ points are tracked through $F$ frames, the proposed recursive least-squares method updates the shape space with complexity $O(P)$ per frame. In contrast, the sequential factorization method updates the shape space with complexity $O(P^2)$ per frame. The original factorization method is intended to be used in batch mode using points tracked across all available frames. It effectively computes the shape space with complexity $O(FP^2)$ after $F$ frames. Unlike other methods, the recursive approach does not require the estimation or updating of a large measurement or covariance matrix. Experiments with real and synthetic image sequences confirm the recursive method's low computational complexity and accuracy, and indicate that it is well suited to real-time applications.

1 Introduction

The structure-from-motion problem (recovering object shape and camera motion from a sequence of images) is a core computer vision concern that has been studied extensively [1, 2, 6, 7, 8, 9, 10, 11, 12]. Tomasi and Kanade [10] developed an innovative factorization method, based on singular value decomposition (SVD), to recover shape and motion from extended image sequences under the assumption of orthographic projection. This method is direct in that it avoids computing depth as an intermediate step. While this approach generally produces robust and accurate results, it is not readily applicable to real-time applications because shape and motion can be determined only after points in all of the image frames have been tracked. The key SVD procedure of the method has complexity $O(FP^2)$, given $P$ feature points tracked over a sequence of $F$ image frames. Moreover, the method requires storage of a $2F \times P$ measurement matrix (its size therefore increasing with the number of frames) prior to computation of structure from motion.

Morita and Kanade [8] subsequently developed a sequential factorization method that enables shape and motion to be updated at every frame. The cost of computing the critical shape space is $O(P^2)$ per frame. Additionally, a $P \times P$ covariance matrix is updated as part of the processing of each frame.

The method presented in this work estimates the shape space within a mean square error (MSE) minimization framework. A recursive least squares (RLS) algorithm is developed for the MSE minimization which uses the coordinates of feature points at each image frame as input. Shape space is computed with complexity $O(P)$ per frame. The method proceeds without computing and storing any large matrices. The algorithm's low computational complexity and accuracy make it suitable for real-time applications.

2 Review

We first describe elements of the approach of Tomasi and Kanade [10]. Assume that $P$ points are tracked through $F$ image frames. Let $(x_{ij}, y_{ij})$ denote the coordinates of the $j$-th image point in the $i$-th frame. Assume that, for any given frame, each image point is expressed with respect to an image coordinate system having origin at the centroid of all of the points depicted in the frame. The input to the method is then the $2F \times P$ measurement matrix, $\mathbf{W}$, given by

$$
\mathbf{W} =
\begin{bmatrix}
x_{11} & \cdots & x_{1P} \\
\vdots &        & \vdots \\
x_{F1} & \cdots & x_{FP} \\
y_{11} & \cdots & y_{1P} \\
\vdots &        & \vdots \\
y_{F1} & \cdots & y_{FP}
\end{bmatrix}.
\qquad (1)
$$

Here, each column of $\mathbf{W}$ stores the image trajectory of one feature point over the whole sequence, while each row stores either the $x$- or $y$-coordinates of the $P$ points in one of the frames.

Assume that the origin of the world coordinate system resides at the object centroid. Object shape may then be represented by the $3 \times P$ matrix

$$\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_P], \qquad (2)$$

where $\mathbf{s}_j$ describes the location of an object point (corresponding to the $j$-th image point) expressed with respect to the world coordinate system. Camera rotation over the sequence of $F$ frames may be characterised by the $2F \times 3$ matrix

$$\mathbf{R} = [\mathbf{i}_1, \ldots, \mathbf{i}_F, \mathbf{j}_1, \ldots, \mathbf{j}_F]^T. \qquad (3)$$

Here, $\mathbf{i}_k$ and $\mathbf{j}_k$ are unit, orthogonal 3-vectors specifying the orientation in the world coordinate system of the $x$ and $y$ axes, respectively, of image frame $k$. It then emerges that the equation

$$\mathbf{W} = \mathbf{R}\mathbf{S} \qquad (4)$$

holds under orthographic projection. The rank theorem [10] then states that, in the absence of noise, the rank of $\mathbf{W}$ is at most 3; in the presence of noise the rank of $\mathbf{W}$ is approximately 3. The implication of this theorem is that camera rotation and object shape can be robustly recovered via a special factorising of $\mathbf{W}$.

Assume $2F \geq P$. Matrix $\mathbf{W} \in \mathbb{R}^{2F \times P}$ may be decomposed using SVD to obtain $\mathbf{W} = \mathbf{O}_1 \boldsymbol{\Sigma} \mathbf{O}_2^T$, where $\mathbf{O}_1 \in \mathbb{R}^{2F \times P}$ has orthonormal columns, $\mathbf{O}_2 \in \mathbb{R}^{P \times P}$ is an orthogonal matrix, $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_P)$, and $\sigma_1 \geq \sigma_2 \geq \sigma_3 \geq \sigma_4 \geq \cdots \geq \sigma_P > 0$. The best rank-3 estimate $\hat{\mathbf{W}}$ of $\mathbf{W}$ is then given by

$$\hat{\mathbf{W}} = \mathbf{U} \boldsymbol{\Sigma}' \mathbf{V}^T, \qquad (5)$$

where $\mathbf{U}$, $\mathbf{V}$ are the first three columns of $\mathbf{O}_1$, $\mathbf{O}_2$, respectively, and $\boldsymbol{\Sigma}' = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$. We would now like to factorise $\hat{\mathbf{W}}$ so as to obtain estimates of the shape and rotation matrices defined in eqs. (2) and (3). Setting

$$\hat{\mathbf{R}} = \mathbf{U} [\boldsymbol{\Sigma}']^{1/2}, \qquad (6)$$

$$\hat{\mathbf{S}} = [\boldsymbol{\Sigma}']^{1/2} \mathbf{V}^T, \qquad (7)$$

we then have that

$$\hat{\mathbf{W}} = \hat{\mathbf{R}} \hat{\mathbf{S}}. \qquad (8)$$

However, $\hat{\mathbf{R}}$ will in general fail to correspond to an image sequence rotation matrix composed of unit, orthogonal pairs of vectors. Accordingly, we note that there is a determinable, non-singular, $3 \times 3$ linear transformation matrix $\mathbf{A}$ such that

$$\hat{\mathbf{W}} = \hat{\mathbf{R}} \mathbf{A} \mathbf{A}^{-1} \hat{\mathbf{S}}, \qquad (9)$$

and $\mathbf{R} = \hat{\mathbf{R}} \mathbf{A}$ is rendered a proper image sequence rotation matrix (see [10] for details of the metric constraints). The sought-after shape matrix is then $\mathbf{S} = \mathbf{A}^{-1} \hat{\mathbf{S}}$.

Shape space is defined as the row space of $\mathbf{S}$. Inspection of eq. (4) shows that the row space of $\mathbf{S}$ is equivalent to the row space of $\mathbf{W}$. Given that eq. (5) also reveals that the columns of $\mathbf{V}$ span the row space of $\mathbf{W}$, we may infer that the columns of $\mathbf{V}$ provide an orthogonal basis for the shape space. The original factorization method readily obtains this basis via SVD.

Morita and Kanade [8] transformed the factorization approach from a batch method to a sequential update method. Under their scheme, shape space is updated with each incoming frame via a power iteration method [4]. Shape and motion are then immediately determinable from the estimated shape space. The method proposed in this paper also updates the shape space frame-by-frame. However, it proves to be a recursive approach that is more efficient in both computation and storage.
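The batch factorization reviewed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation of [10]: the function names `batch_shape_space` and `make_orthographic_W`, the random test data and the use of random orthonormal frames as camera orientations are all assumptions made here. The metric-constraint step that determines the $3 \times 3$ matrix of eq. (9) is not included.

```python
import numpy as np

def batch_shape_space(W):
    """Rank-3 factorization of a 2F x P measurement matrix, following eqs. (5)-(7)."""
    # Thin SVD: W = O1 @ diag(sigma) @ O2t, singular values in descending order.
    O1, sigma, O2t = np.linalg.svd(W, full_matrices=False)
    U = O1[:, :3]                   # first three left singular vectors
    V = O2t[:3, :].T                # first three right singular vectors (P x 3)
    root = np.sqrt(sigma[:3])
    R_hat = U * root                # U [Sigma']^(1/2), eq. (6)
    S_hat = root[:, None] * V.T     # [Sigma']^(1/2) V^T, eq. (7)
    return R_hat, S_hat, V

def make_orthographic_W(F=30, P=20, seed=0):
    """Hypothetical noise-free test data: random orientations applied to random points."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(-1.0, 1.0, size=(3, P))
    S -= S.mean(axis=1, keepdims=True)      # world origin at the object centroid
    i_rows, j_rows = [], []
    for _ in range(F):
        # QR of a random matrix gives an orthonormal frame; rows 0 and 1 play i_k and j_k.
        frame, _ = np.linalg.qr(rng.standard_normal((3, 3)))
        i_rows.append(frame[0])
        j_rows.append(frame[1])
    R = np.vstack(i_rows + j_rows)          # 2F x 3, ordered as in eq. (3)
    return R @ S, S                         # W = R S, eq. (4)

W, S = make_orthographic_W()
R_hat, S_hat, V = batch_shape_space(W)
```

With noise-free data the product `R_hat @ S_hat` reproduces `W` up to rounding, and the rows of `S` lie in the span of the columns of `V`, which is the shape-space property used in the rest of the paper.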

3 MSE Formulation of Shape Space

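Prior to presenting a new recursive scheme, we first formulate a cost function which attains a global minimum at the desired shape space. Letting

$$
\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kP}]^T, \qquad
\mathbf{y}_k = [y_{k1}, y_{k2}, \ldots, y_{kP}]^T, \qquad (10)
$$

we may express eq. (1) in the shorthand form

$$\mathbf{W} = [\mathbf{x}_1, \ldots, \mathbf{x}_F, \mathbf{y}_1, \ldots, \mathbf{y}_F]^T. \qquad (11)$$

Define the correlation matrices $\mathbf{C}_k \in \mathbb{R}^{P \times P}$, for $k = 1, 2, \ldots, F$, to be

$$\mathbf{C}_k = E\{\mathbf{x}_k \mathbf{x}_k^T + \mathbf{y}_k \mathbf{y}_k^T\}, \qquad (12)$$

where $E\{\cdot\}$ denotes expectation.

Let $\hat{\mathbf{x}}_k$, $\hat{\mathbf{y}}_k$ be the linear mapping of the vectors $\mathbf{x}_k$, $\mathbf{y}_k$ into the column space of a rank-3 matrix $\mathbf{Q} \in \mathbb{R}^{P \times 3}$, such that

$$
\hat{\mathbf{x}}_k = \mathbf{Q}^T \mathbf{x}_k, \qquad
\hat{\mathbf{y}}_k = \mathbf{Q}^T \mathbf{y}_k \qquad (k = 1, 2, \ldots, F). \qquad (13)
$$

The vectors $\hat{\mathbf{x}}_k$, $\hat{\mathbf{y}}_k$ can now be transformed to $P$-dimensional vectors $\tilde{\mathbf{x}}_k$, $\tilde{\mathbf{y}}_k$ by the mapping

$$
\tilde{\mathbf{x}}_k = \mathbf{Q} \hat{\mathbf{x}}_k = \mathbf{Q}\mathbf{Q}^T \mathbf{x}_k, \qquad
\tilde{\mathbf{y}}_k = \mathbf{Q} \hat{\mathbf{y}}_k = \mathbf{Q}\mathbf{Q}^T \mathbf{y}_k \qquad (k = 1, 2, \ldots, F). \qquad (14)
$$

Consider now the following MSE formulation of the distance between $\mathbf{x}_k$, $\mathbf{y}_k$ and $\tilde{\mathbf{x}}_k$, $\tilde{\mathbf{y}}_k$:

$$
\begin{aligned}
J_{MSE}(\mathbf{Q}) &= E\{\|\mathbf{x}_k - \tilde{\mathbf{x}}_k\|^2 + \|\mathbf{y}_k - \tilde{\mathbf{y}}_k\|^2\} \\
&= E\{\|\mathbf{x}_k - \mathbf{Q}\mathbf{Q}^T \mathbf{x}_k\|^2 + \|\mathbf{y}_k - \mathbf{Q}\mathbf{Q}^T \mathbf{y}_k\|^2\} \\
&= \mathrm{tr}\{\mathbf{C}_k\} - 2\,\mathrm{tr}\{\mathbf{Q}^T \mathbf{C}_k \mathbf{Q}\} + \mathrm{tr}\{\mathbf{Q}^T \mathbf{C}_k \mathbf{Q}\mathbf{Q}^T \mathbf{Q}\}, \qquad (15)
\end{aligned}
$$

where $\mathrm{tr}$ denotes trace, and $\mathbf{C}_k$ is as previously defined in eq. (12). It is shown in [13] that

• the stationary points of $J_{MSE}(\mathbf{Q})$ satisfy $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}_3$, the columns of $\mathbf{Q}$ being orthonormal.
• when the columns of $\mathbf{Q}$ are one of the orthonormal bases of the subspace spanned by the first three dominant eigenvectors of $\mathbf{C}_k$, $J_{MSE}(\mathbf{Q})$ attains the global minimum, while all the other stationary points of $J_{MSE}(\mathbf{Q})$ are saddle points.

Let

$$\mathbf{W}_k = [\mathbf{x}_1 \cdots \mathbf{x}_k \; \mathbf{y}_1 \cdots \mathbf{y}_k]^T \qquad (16)$$

represent the measurement matrix formed by the first $k$ frames. If the rank-3 SVD estimate of $\mathbf{W}_k$ is

$$\mathbf{W}_k = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^T, \qquad (17)$$

where $\mathbf{U}_k \in \mathbb{R}^{2k \times 3}$ and $\mathbf{V}_k \in \mathbb{R}^{P \times 3}$ have orthonormal columns, and $\boldsymbol{\Sigma}_k = \mathrm{diag}(\sigma_{k1}, \sigma_{k2}, \sigma_{k3})$, then

$$
\mathbf{C}_k = E\{\mathbf{x}_k \mathbf{x}_k^T + \mathbf{y}_k \mathbf{y}_k^T\}
= \frac{1}{k} \sum_{i=1}^{k} (\mathbf{x}_i \mathbf{x}_i^T + \mathbf{y}_i \mathbf{y}_i^T)
= \frac{1}{k} \mathbf{W}_k^T \mathbf{W}_k
= \mathbf{V}_k \left(\frac{1}{k} \boldsymbol{\Sigma}_k^2\right) \mathbf{V}_k^T. \qquad (18)
$$

From the above we may infer that: (1) the rank of $\mathbf{C}_k$ is at most 3 since that of $\mathbf{W}_k$ is at most 3; (2) the first three right singular vectors of $\mathbf{W}_k$ (the columns of $\mathbf{V}_k$) are the first three dominant eigenvectors of $\mathbf{C}_k$; (3) the first three dominant eigenvectors of $\mathbf{C}_k$ span the shape space since the columns of $\mathbf{V}_k$ span the shape space (as noted previously); (4) the columns of $\mathbf{Q}$ are one of the orthonormal bases of the shape space when $J_{MSE}(\mathbf{Q})$ attains the global minimum.

4 Recursive Computation of Shape Space

In this section we develop a recursive least squares scheme for estimating the shape space at frame $k$ given the shape space at frame $k-1$ and the incoming data $\mathbf{x}_k$, $\mathbf{y}_k$. Let

$$
\mathbf{u}_k = [\mathbf{x}_k \; \mathbf{y}_k], \qquad
\hat{\mathbf{u}}_k = \mathbf{Q}_{k-1}^T \mathbf{u}_k = [\mathbf{Q}_{k-1}^T \mathbf{x}_k \;\; \mathbf{Q}_{k-1}^T \mathbf{y}_k]; \qquad (19)
$$

then $\mathbf{C}_k = E\{\mathbf{u}_k \mathbf{u}_k^T\}$ and

$$J_{MSE}(\mathbf{Q}_k) = E\{\|\mathbf{u}_k - \mathbf{Q}_k \mathbf{Q}_k^T \mathbf{u}_k\|_F^2\} \qquad (20)$$

$$= \frac{1}{k} \sum_{i=1}^{k} \|\mathbf{u}_i - \mathbf{Q}_k \mathbf{Q}_k^T \mathbf{u}_i\|_F^2 \qquad (21)$$

$$= \mathrm{tr}\{\mathbf{C}_k\} - 2\,\mathrm{tr}\{\mathbf{Q}_k^T \mathbf{C}_k \mathbf{Q}_k\} + \mathrm{tr}\{\mathbf{Q}_k^T \mathbf{C}_k \mathbf{Q}_k \mathbf{Q}_k^T \mathbf{Q}_k\}.$$

$J_{MSE}(\mathbf{Q}_k)$ is a fourth-order function of the elements of $\mathbf{Q}_k$ and is unwieldy. Accordingly, we now seek a simpler cost function. Assume that $\mathbf{Q}_k^T \mathbf{u}_i$, the projections of the column vectors of $\mathbf{u}_i$ onto the column space of $\mathbf{Q}_k$, can be approximated by $\hat{\mathbf{u}}_i = \mathbf{Q}_{i-1}^T \mathbf{u}_i$, for all $1 \leq i \leq k$. Since we assume the shape space is stationary, the errors in the above projection approximation should be small. This results in a modified quadratic cost function of $\mathbf{Q}_k$ given by

$$J'(\mathbf{Q}_k) = \frac{1}{k} \sum_{i=1}^{k} \|\mathbf{u}_i - \mathbf{Q}_k \hat{\mathbf{u}}_i\|_F^2. \qquad (22)$$

$J'(\mathbf{Q}_k)$ is likely to be a good approximation for $J_{MSE}(\mathbf{Q}_k)$, and the matrix $\mathbf{Q}_k$ obtained by minimizing $J'(\mathbf{Q}_k)$ is likely to be a good estimate for the principal subspace of $\mathbf{C}_k$, i.e. the shape space. $J'(\mathbf{Q}_k)$ is minimized when

$$\mathbf{Q}_k = \mathbf{M}_k \mathbf{N}_k^{-1}, \qquad (23)$$

where

$$\mathbf{M}_k = \sum_{i=1}^{k} \mathbf{u}_i \hat{\mathbf{u}}_i^T = \mathbf{M}_{k-1} + \mathbf{u}_k \hat{\mathbf{u}}_k^T, \qquad (24)$$

$$\mathbf{N}_k = \sum_{i=1}^{k} \hat{\mathbf{u}}_i \hat{\mathbf{u}}_i^T = \mathbf{N}_{k-1} + \hat{\mathbf{u}}_k \hat{\mathbf{u}}_k^T. \qquad (25)$$

Thus we can apply the standard RLS method (see [5]) to update $\mathbf{Q}_k$ recursively. Define the $3 \times 3$ inverse matrix

$$\mathbf{P}_k = \mathbf{N}_k^{-1}. \qquad (26)$$

Applying the matrix inversion lemma [5] to $\mathbf{P}_k$, we obtain

$$\mathbf{P}_k = (\mathbf{N}_{k-1} + \hat{\mathbf{u}}_k \hat{\mathbf{u}}_k^T)^{-1} = \mathbf{P}_{k-1} - \mathbf{g}_k \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1}, \qquad (27)$$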

where

$$\mathbf{g}_k = \mathbf{P}_{k-1} \hat{\mathbf{u}}_k (\mathbf{I}_2 + \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1} \hat{\mathbf{u}}_k)^{-1} \qquad (28)$$

is the $3 \times 2$ gain matrix. Using eq. (27) it is easy to verify that

$$\mathbf{g}_k = \mathbf{P}_k \hat{\mathbf{u}}_k. \qquad (29)$$

Noting that

$$\mathbf{g}_k \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1} = (\mathbf{g}_k \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1})^T = \mathbf{P}_{k-1} \hat{\mathbf{u}}_k \mathbf{g}_k^T, \qquad (30)$$

we may use eqs. (23)-(29) to obtain

$$
\begin{aligned}
\mathbf{Q}_k = \mathbf{M}_k \mathbf{P}_k &= \mathbf{M}_{k-1} (\mathbf{P}_{k-1} - \mathbf{g}_k \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1}) + \mathbf{u}_k \mathbf{g}_k^T \\
&= \mathbf{Q}_{k-1} + (\mathbf{u}_k - \mathbf{Q}_{k-1} \hat{\mathbf{u}}_k) \mathbf{g}_k^T. \qquad (31)
\end{aligned}
$$

The RLS scheme is now given by:

Initialization:
    $\mathbf{P}_0 = \delta \mathbf{I}_3$ ($\delta$ is a small positive number)
    $\mathbf{Q}_0 \in \mathbb{R}^{P \times 3}$ with orthonormal columns

Update equations:
    $\hat{\mathbf{u}}_k = \mathbf{Q}_{k-1}^T \mathbf{u}_k$
    $\mathbf{g}_k = \mathbf{P}_{k-1} \hat{\mathbf{u}}_k (\mathbf{I}_2 + \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1} \hat{\mathbf{u}}_k)^{-1}$
    $\mathbf{P}_k = \mathbf{P}_{k-1} - \mathbf{g}_k \hat{\mathbf{u}}_k^T \mathbf{P}_{k-1}$
    $\mathbf{Q}_k = \mathbf{Q}_{k-1} + (\mathbf{u}_k - \mathbf{Q}_{k-1} \hat{\mathbf{u}}_k) \mathbf{g}_k^T$

The initial values for $\mathbf{P}_0$, $\mathbf{Q}_0$ should be set properly to ensure convergence [5]. Since the covariance matrix of $\hat{\mathbf{u}}_k$ is positive definite, $\mathbf{P}_0$ should be initialized as a symmetric positive definite matrix, for example $\delta \mathbf{I}_3$. $\mathbf{Q}_0$ should have orthonormal columns, a simple choice being the first three columns of the $P \times P$ identity matrix. Clearly, the updating of $\mathbf{Q}_k$ requires only $O(P)$ operations, while classical subspace computation algorithms like the orthogonal iteration method normally need $O(P^2)$ operations.

After the shape space $\mathbf{Q}_k$ is computed, the camera orientation vectors (up to a linear transformation) are given by

$$\hat{\mathbf{i}}_k^T = \mathbf{x}_k^T \mathbf{Q}_k, \qquad \hat{\mathbf{j}}_k^T = \mathbf{y}_k^T \mathbf{Q}_k. \qquad (32)$$

The true object shape $\mathbf{S}_k$ and camera orientation vectors $\mathbf{i}_k$, $\mathbf{j}_k$ are then computed in the same way as in the sequential factorization method. See [8] for details.

5 Experiments

5.1 Synthetic Data

We now describe the synthetic tests used to compare the performance of the recursive method with that of the original and sequential factorization methods. An object was represented by 100 random points within a cube of predetermined size. The distance of the object centroid from the camera was chosen to be 20 times the side of the cube and was kept fixed throughout the sequence. Camera rotation was specified as given in Figure 1 and the object was translated so that its centroid projected onto the principal point of each frame. A sequence of 140 images was generated by projecting the object points onto $512 \times 512$ pixel image planes with sub-pixel accuracy using a perspective camera model. The (fixed) focal length was chosen so as to yield good coverage of image points across the image planes. Gaussian noise with a standard deviation of 2 pixels was added to all points in all frames.

Shape space estimation error was defined as the subspace distance between the estimated and the true shape spaces. Figure 2 shows typical convergences for each of the three methods. The methods perform similarly, with shape space being estimated reasonably accurately within 40 frames. Shape error was defined as the root-mean-square of the distance between the recovered shape and the true shape, divided by the object size. Figure 3 again shows a typical convergence for each of the methods, with accurate estimates of shape being obtained within 40 frames. Camera rotation errors were defined as the difference between the estimated and the true values for roll, pitch and yaw, and are shown in Figure 4. The errors for each of the methods settle quickly to within 1 degree.
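As a rough illustration of the recursive update of Section 4, and of a subspace-distance error measure of the kind used in the synthetic tests, the following NumPy sketch runs the four update equations on simulated data. Several things here are assumptions rather than details from the paper: the data are noise-free and orthographic by default (the paper's tests use perspective projection with added noise), each frame uses a random camera orientation, the values of `delta`, `side`, `F` and `P` are arbitrary, the names `rls_update`, `subspace_distance` and `run_synthetic` are hypothetical, and the subspace distance is taken to be the 2-norm of the difference between the two orthogonal projectors, which is the definition given in [4].

```python
import numpy as np

def rls_update(Q, Pm, x, y):
    """One recursive shape-space update from the point coordinates of one frame."""
    u = np.column_stack((x, y))       # P x 2, u_k = [x_k  y_k]
    u_hat = Q.T @ u                   # 3 x 2 projection onto the previous estimate
    # 3 x 2 gain matrix, eq. (28); only a 2 x 2 matrix is inverted.
    g = Pm @ u_hat @ np.linalg.inv(np.eye(2) + u_hat.T @ Pm @ u_hat)
    Pm = Pm - g @ u_hat.T @ Pm        # eq. (27)
    Q = Q + (u - Q @ u_hat) @ g.T     # eq. (31), O(P) work per frame
    return Q, Pm

def subspace_distance(A, B):
    """2-norm of the difference between the projectors onto span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T, 2)

def run_synthetic(F=120, P=100, side=100.0, delta=1e-3, noise_std=0.0, seed=1):
    """Assumed test setup: random points in a cube, random orientation per frame."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(-side / 2, side / 2, size=(3, P))
    S -= S.mean(axis=1, keepdims=True)                    # centroid at the origin
    V_true = np.linalg.svd(S, full_matrices=False)[2].T   # P x 3 basis of the shape space
    Q = np.eye(P)[:, :3]                                  # first three identity columns
    Pm = delta * np.eye(3)
    errors = [subspace_distance(Q, V_true)]               # evaluation only, not O(P)
    for _ in range(F):
        frame, _ = np.linalg.qr(rng.standard_normal((3, 3)))
        x = frame[0] @ S + noise_std * rng.standard_normal(P)
        y = frame[1] @ S + noise_std * rng.standard_normal(P)
        Q, Pm = rls_update(Q, Pm, x, y)
        errors.append(subspace_distance(Q, V_true))
    return Q, np.array(errors)

Q, errors = run_synthetic()
```

Plotting `errors` against the frame index on a logarithmic axis gives a curve of the same kind as Figure 2, although the numbers are not comparable with the paper's because the simulated setup differs. Note that only `rls_update` belongs to the per-frame cost; the distance computation is there purely for evaluation.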

5.2 Real Images

In order to test the accuracy and applicability of the recursive method, a sequence of 120 real images of a grid was acquired using a Phillips CCD camera with a 12.5-mm lens. A total of 141 feature points were detected and tracked using a corner detector with subpixel accuracy. Figure 5 shows the first image of the sequence. In acquiring the sequence, the camera was rotated by hand around the scene, and the 120 frames were grabbed at a rate of 15 frames per second. Application of the recursive method yields a good 3D reconstruction, as indicated in Figures 6, 7 and 8. Coplanarity of points is well preserved.
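The paper assesses coplanarity visually through the reconstructed views. One hypothetical way to put a number on it, sketched here under the assumption that the reconstructed points are available as an $N \times 3$ array, is the root-mean-square distance of the points from their least-squares plane. The function name `plane_fit_rms` and the simulated grid data are illustrative only and are not taken from the experiments.

```python
import numpy as np

def plane_fit_rms(points):
    """RMS out-of-plane distance of N x 3 points from their least-squares plane."""
    centred = points - points.mean(axis=0)
    # The smallest singular value measures the spread along the plane normal.
    s = np.linalg.svd(centred, compute_uv=False)
    return s[-1] / np.sqrt(len(points))

# Simulated stand-in for a reconstructed planar grid (assumed data, 141 points).
rng = np.random.default_rng(0)
grid = np.column_stack([rng.uniform(0, 10, 141),
                        rng.uniform(0, 10, 141),
                        np.zeros(141)])
tilt, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # arbitrary orientation
flat = grid @ tilt.T                                   # exactly coplanar points
bumpy = flat + 0.05 * rng.standard_normal(flat.shape)  # perturbed reconstruction

rms_flat = plane_fit_rms(flat)
rms_bumpy = plane_fit_rms(bumpy)
```

Dividing the result by the extent of the object would give a scale-free figure, in the same spirit as the normalised shape error used for the synthetic tests.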


6 Conclusion

We have proposed a new recursive update scheme for estimating object shape and camera motion from a stream of images. Its key advantage over the original and sequential factorization methods is that it significantly reduces the shape space updating cost to $O(P)$, while maintaining similar accuracy and robustness. The method is designed under the assumption of orthography, although experiments are conducted with real and synthetic imagery obtained under perspective projection. Excellent results in these experiments suggest that the recursive factorization method holds promise for use in real-time vision systems.

Acknowledgments

The authors thank Darren Gawley for the use of his CATE [3] system to detect and track image feature points.

References

[1] M.J. Brooks, W. Chojnacki and L. Baumela, "Determining the Egomotion of an Uncalibrated Camera from Instantaneous Optical Flow," Journal Optical Soc. America A, 14, 10, pp. 2670-2677.
[2] R. C. Bolles, H. H. Baker, and D. H. Marimont, "Epipolar-plane Image Analysis: an Approach to Determining Structure from Motion," International Journal of Computer Vision, 1, 1, pp. 7-55, 1987.
[3] D. Gawley, Tracking Image Features in Uncalibrated Video Streams, Honours thesis, Department of Computer Science, Univ. Adelaide, Nov. 1998.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, Second Edition, The Johns Hopkins University Press, 1989.
[5] S. Haykin, Adaptive Filter Theory, Third Edition, Prentice-Hall Inc, 1996.
[6] R. Jain, R. Kasturi and B. G. Schunck, Machine Vision, MIT Press and McGraw-Hill Inc., 1995.
[7] S.J. Maybank and O.D. Faugeras, "A theory of self-calibration of a moving camera," International Journal of Computer Vision, 8, 2, pp. 123-151, 1992.
[8] T. Morita and T. Kanade, "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams," IEEE Trans. on Pattern Analysis and Machine Intelligence, 19, 8, pp. 858-867, Aug. 1997.
[9] C. J. Poelman and T. Kanade, "A Paraperspective Factorization Method for Shape and Motion Recovery," European Conference on Computer Vision, pp. 97-108, May 1994.
[10] C. Tomasi and T. Kanade, "Shape and Motion from Image Streams under Orthography: a Factorization Method," International Journal of Computer Vision, 9, 2, pp. 137-154, 1992.
[11] R. Tsai and T. Huang, "Uniqueness and Estimation of Three-dimensional Motion Parameters of Rigid Objects with Curved Surfaces," IEEE Trans. on Pattern Analysis and Machine Intelligence, 6, 1, pp. 13-27, Jan. 1984.
[12] G. Xu and Z. Zhang, Epipolar Geometry in Stereo, Motion and Object Recognition: a Unified Approach, Kluwer Academic Publishers, 1996.
[13] B. Yang, "Projection Approximation Subspace Tracking," IEEE Trans. on Signal Processing, 43, 1, pp. 95-107, Jan. 1995.

Figure 1: The defined camera rotation. (Plot of roll, pitch and yaw, in degrees, against frame number.)

Figure 2: Shape space errors. (Plot of subspace distance, on a logarithmic scale, against frame number for the recursive, sequential and original methods.)

Figure 3: Shape errors. (Plot of shape error, on a logarithmic scale, against frame number for the recursive, sequential and original methods.)

Figure 4: Rotation errors. The errors of the recursive, sequential and original methods are plotted using solid, dashed and dotted lines, respectively. (Three panels: roll, pitch and yaw error, in degrees, against frame number.)

Figure 5: The first image of the grid sequence.

Figure 6: Front view of the reconstructed grid.

Figure 7: Oblique view of the reconstructed grid.

Figure 8: Overhead view of the reconstructed grid.