Reconstruction from Image Streams: Continuous Multilinear Constraints
Anders Heyden
Dept of Mathematics, Lund University
Box 118, S-221 00 Lund, Sweden
email:
[email protected]
Abstract

This paper deals with the problem of analysing continuous streams of images of rigid point objects taken by uncalibrated cameras. This analysis also gives some new insight into the problem of estimating structure and motion from image sequences. We study the velocity case of the so-called multilinear constraints that exist for each subsequence in a sequence of images. These multilinear velocity constraints link the infinitesimal motion of the image points with the infinitesimal viewer motion. The analysis is done for both calibrated and uncalibrated cameras. Two simplifications are also presented for the uncalibrated camera case. The first uses affine reduction and kinetic depth. The second is based on a projective reduction with respect to the image of a planar patch. The main results are the generalisations of the bilinear and trilinear constraints to the continuous case, although all multilinear constraints are generalised to the continuous case. Furthermore, it is shown that in order to reconstruct the scene, the third order continuous multilinear constraints, which are generalisations of the trilinear constraints, are needed.
1 Introduction
A central problem in scene analysis is the analysis of 3D-objects from 2D-images, obtained by projective transformations. In this paper we will concentrate on the case of a continuous stream of images of rigid point configurations, with known correspondences. The objective is to calculate the shape of the object using the shapes of the images and to calculate the camera matrices, which give the camera movement. We will present a method where no camera calibration is needed, making it possible to reconstruct the object and the camera movement up to a projective transformation.

One interesting question is to analyse the multilinear constraints that exist between corresponding points in an image sequence. It is well known that corresponding points in two images fulfil a bilinear constraint, known as the epipolar constraint. This can be represented by a three by three matrix called the essential matrix in the calibrated case and the fundamental matrix in the uncalibrated case. In the continuous time case similar constraints exist. One talks about the infinitesimal epipole or the focus of expansion. This has been studied by photogrammetrists in the calibrated case and recently by Faugeras and Vieville in the uncalibrated case, cf. [8].

We will use four different settings of the problem of estimating structure and motion from image sequences and image streams; for further details see [1], [4]. The first one is the affinely reduced setting, where three corresponding points in each image are used as an affine basis. The second one is the traditional uncalibrated setting, where no particular choice of coordinates has been made. The third one is the projectively reduced setting, where three or more coplanar object points are used to reduce the number of indeterminates. The fourth one is the traditional calibrated setting, where the intrinsic parameters of the camera are known.

This work has been supported by the ESPRIT project BRA EP 6448, VIVA, and the Swedish Research Council for Engineering Sciences (TFR), project 95-64-222.

2 Preliminaries

2.1 Camera Matrices in Reduced Affine Coordinates

We assume that the camera is described by the following standard model,

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P_i \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \quad i = 1, \ldots, m, \tag{1}$$

where i denotes the image order number, x and y are the image coordinates, $P_i$ is a projection matrix, X, Y and Z are the coordinates of the object and $\lambda$ is a scale factor (different for different points and different images). Given a sequence of images, $\{Y_i\}_{i=1}^m$, represented by the
coordinates of corresponding points in each image, from this model our aim is to reconstruct the object up to projective transformations, reconstruct the camera motion, also up to projective transformations, and to obtain a canonic description of the imaging situation. Since the camera matrices $P_i$ are unknown we can multiply (1) from the left by an arbitrary nonsingular $3 \times 3$ matrix, which corresponds to choosing different projective coordinate systems in the images. This was done in [3] using reduced affine coordinates, i.e. the first three points in each image constitute an affine basis. Observe that using these coordinates in the images means that if $(x, y, z)$ are reduced affine coordinates, then $x + y + z = 1$. In the same way, we can multiply the camera matrices, $P_i$, in (1) from the right by an arbitrary nonsingular $4 \times 4$ matrix, which corresponds to choosing a different projective coordinate system in the object. The choice of reduced affine coordinates in the object in [3] was achieved in this way by letting the first focal point together with the first three object points build up a standard affine basis. This means that if $(X, Y, Z, W)$ are coordinates in the object, then $X + Y + Z = 1$, which means that the plane at infinity is described by $X + Y + Z = 0$, assuming that no point in the object is located at the plane at infinity. This also means that the first focal point is located at the plane at infinity, which means that the first camera matrix represents a parallel projection. Observe also that this choice of coordinates implies that $\lambda = 1$ for all points in the first image. This choice of coordinates is basically the same as the relative affine coordinates in [7]. The following theorem can be found in [5].
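Before the theorem, a small numerical sketch may help fix the idea of reduced affine coordinates in the image. It assumes, as one reading of the construction in [3], that the coordinates of a point are its barycentric coordinates with respect to the three basis points. The function names `det3` and `reduced_affine` and the sample pixel positions are illustrative and not taken from the paper.

```python
from fractions import Fraction as F

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

def reduced_affine(point, basis):
    """Coordinates (x, y, z) of `point` relative to three basis image points.

    Assumption: the coordinates solve x*p1 + y*p2 + z*p3 = point in
    homogeneous form (last coordinate 1), which forces x + y + z = 1.
    """
    p1, p2, p3 = [(F(u), F(v), F(1)) for u, v in basis]
    q = (F(point[0]), F(point[1]), F(1))
    d = det3(p1, p2, p3)  # non-zero when the basis points are not collinear
    # Cramer's rule: replace one column at a time by the point itself.
    return (det3(q, p2, p3) / d, det3(p1, q, p3) / d, det3(p1, p2, q) / d)

basis = [(0, 0), (4, 0), (0, 2)]       # hypothetical pixel positions
coords = reduced_affine((1, 1), basis)
print(coords, sum(coords))
```

With this choice the three basis points themselves receive the coordinates (1, 0, 0), (0, 1, 0) and (0, 0, 1), and every other point has coordinates summing to 1, in agreement with the observation above.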
Theorem 2.1. The camera matrices for a sequence, $\{Y_i\}_{i=1}^m$, of images, using reduced affine coordinates, can be uniquely represented as

$$P_i = [\, D_i \mid -D_i t_i \,], \quad i = 1, \ldots, m, \tag{2}$$

where $D_1 = I$, $t_1 = \bar 0$ and $\det D_i = 1$, $|t_2| = 1$.

Notice the similarities with the calibrated case, see also [1], where the camera matrices can be written

$$P_i = [\, R_i \mid -R_i t_i \,],$$

with $R_i$ denoting an orthogonal matrix. Now it is possible to calculate the position of the focal point, $Z_i$, of camera i as the nullspace of $P_i$, which is $(t_i, 1) = (t_{1i}, t_{2i}, t_{3i}, 1)$. Then all possible camera locations and the corresponding reconstructions can be calculated from this representation by projective transformations.

2.2 Multilinear Constraints in the Discrete Time Case

Using the camera matrices, $P_i$, in (2) together with the camera equations in (1) for the first m images for the same point and noticing that they are linear in the object coordinates $(X, Y, Z)$ and the scalar factors $\lambda_1, \ldots, \lambda_m$ gives

$$\operatorname{rank} \begin{bmatrix} D_1 & \bar t_1 & \mathbf{x}_1 & 0 & 0 & \cdots & 0 \\ D_2 & \bar t_2 & 0 & \mathbf{x}_2 & 0 & \cdots & 0 \\ D_3 & \bar t_3 & 0 & 0 & \mathbf{x}_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ D_m & \bar t_m & 0 & 0 & 0 & \cdots & \mathbf{x}_m \end{bmatrix} \le m + 3, \tag{3}$$

where $\bar t_i = -D_i t_i$, cf. (2), and $\mathbf{x}_i = (x_i, y_i, 1)$ denotes homogeneous coordinates in the i:th image. Picking out submatrices containing all three rows corresponding to two images and the corresponding nonzero columns gives the so-called bilinear constraints. In the same way, picking out two rows from each of two images and all three rows from a third image gives the so-called trilinear constraints. It turns out that the rank condition in (3) can be expressed using only the bilinear and trilinear constraints, but the bilinear constraints alone are not sufficient, see [5].

3 Problem Formulation

In this section we will study the multilinear constraints in the continuous case, where a stream of images is available. This will be done from Taylor series expansions of the camera equations and will be carried out in the affinely reduced setting. However, the method is general and can easily be translated to any of the other three settings.

Consider the equations describing the projection of a point, $\mathbf{X} = (X, Y, Z, W)$, in reduced affine coordinates to a point, $\mathbf{x}_i = (x_i, y_i, z_i)$, also in reduced affine coordinates, in image i, cf. Theorem 2.1,

$$\lambda_i \mathbf{x}_i = P_i \mathbf{X} = [\, D_i \mid \bar t_i \,] \mathbf{X}. \tag{4}$$

Here $D_i$ is a diagonal matrix with the kinetic depth components for the first three points on the diagonal, with $D_0 = I$, and $\bar t_i$ is the translational vector from camera i to camera 1, with $\bar t_1 = \bar 0$. This discrete representation in (4) corresponds to the continuous representation

$$\lambda(t) \mathbf{x}(t) = P(t) \mathbf{X} = [\, D(t) \mid \bar t(t) \,] \mathbf{X}, \tag{5}$$

where t is a continuous parameter, representing time. Here we have

$$D(0) = I, \quad \bar t(0) = \bar 0, \quad \lambda(0) = 1, \tag{6}$$

corresponding to $D_0 = I$, $\bar t_0 = \bar 0$ and the fact that all scale factors in the first image in the discrete case are equal to 1. Observe that $D(t)$ is diagonal for all t, because $D_i$ is diagonal for all i.

Making Taylor series expansions of the time dependent functions in (5), we get

$$\begin{cases} \lambda(t) = \lambda^0 + \lambda^1 t + \lambda^2 t^2 + \ldots \\ D(t) = D^0 + D^1 t + D^2 t^2 + \ldots \\ \bar t(t) = \bar t^0 + \bar t^1 t + \bar t^2 t^2 + \ldots \\ \mathbf{x}(t) = \mathbf{x}^0 + \mathbf{x}^1 t + \mathbf{x}^2 t^2 + \ldots \end{cases} \tag{7}$$

Since $D(t)$ is a diagonal matrix for all t, $D^k$ is a diagonal matrix for all k. We will use the notation

$$\begin{cases} D^k = \operatorname{diag}(d_1^k, d_2^k, d_3^k) \\ \bar t^k = [\, t_1^k \;\; t_2^k \;\; t_3^k \,]^T \\ \mathbf{x}^k = [\, x^k \;\; y^k \;\; z^k \,]^T \end{cases} \tag{8}$$

Observe that $\lambda^k = \frac{1}{k!} \frac{d^k \lambda}{dt^k}$, and similarly for the other variables. This means in particular that $\mathbf{x}^1$ has the meaning of image velocity. Using (6) we obtain

$$\lambda^0 = 1, \quad D^0 = I, \quad \bar t^0 = \bar 0. \tag{9}$$

Since $D(t)$ is undetermined up to a scalar factor, one way to enforce uniqueness is to require $\det D(t) = 1$ for all t. This condition is of course fulfilled for $D(0)$ and a Taylor series expansion of $D(t)$ in (7) gives

$$\begin{aligned} \det(D^0 + D^1 t + D^2 t^2 + \ldots) &= \det D^0 + \operatorname{tr} D^1\, t + (\operatorname{tr} D^2 + \ldots)\, t^2 + \ldots \\ &= 1 + \operatorname{tr} D^1\, t + (\operatorname{tr} D^2 + \ldots)\, t^2 + \ldots, \end{aligned} \tag{10}$$

where tr means trace. Thus we have $\operatorname{tr} D^1 = 0$. The coefficients for $t^k$ in (10) are complicated expressions in $D^i$. It can be seen that the coefficient of $t^k$ can be written as $\operatorname{tr} D^k$ plus terms involving $D^i$ for $i < k$. This indicates another way to ensure uniqueness, by requiring

$$D^0 = I, \quad \operatorname{tr} D^k = 0, \quad k \ge 1. \tag{11}$$

The price we have to pay for this simplification is that $\det D(t)$ depends on t, $\det D(t) = f(t)$, where $f(0) = 1$, $f'(0) = 0$, but in general $f^{(k)}(0) \ne 0$ for $k \ge 2$.

4 Continuous Multilinear Constraints

Inserting the Taylor series expansions of (7) into the camera equation of (5) gives

$$(1 + \lambda^1 t + \lambda^2 t^2 + \ldots)(\mathbf{x}^0 + \mathbf{x}^1 t + \mathbf{x}^2 t^2 + \ldots) = [\, I + D^1 t + D^2 t^2 + \ldots \mid \bar t^1 t + \bar t^2 t^2 + \ldots \,] \mathbf{X}. \tag{12}$$

Identifying coefficients for a general term $t^k$ on both sides gives

$$\mathbf{x}^k + \lambda^1 \mathbf{x}^{k-1} + \ldots + \lambda^{k-1} \mathbf{x}^1 + \lambda^k \mathbf{x}^0 = [\, D^k \mid \bar t^k \,] \mathbf{X} \;\Longrightarrow\; \mathbf{x}^k + \sum_{i=1}^{k} \lambda^i \mathbf{x}^{k-i} = [\, D^k \mid \bar t^k \,] \mathbf{X}. \tag{13}$$

This system of equations, for $k = 1, \ldots, n$, has a solution for $\lambda^1, \lambda^2, \ldots, \lambda^n$ and $\mathbf{X}$ if and only if

$$\operatorname{rank} \begin{bmatrix} \mathbf{x}^1 - D^1 \mathbf{x}^0 & \bar t^1 & \mathbf{x}^0 & 0 & \cdots & 0 \\ \mathbf{x}^2 - D^2 \mathbf{x}^0 & \bar t^2 & \mathbf{x}^1 & \mathbf{x}^0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}^n - D^n \mathbf{x}^0 & \bar t^n & \mathbf{x}^{n-1} & \mathbf{x}^{n-2} & \cdots & \mathbf{x}^0 \end{bmatrix} \le n + 1. \tag{14}$$

Theorem 4.1. In a stream, $Y(t)$, of images the affinely reduced image coordinates $\mathbf{x}^0$ and their derivatives, $\mathbf{x}^i$, up to order n at the same instant of time obey the n:th order continuous constraints. This means that there exists a solution to (14) for $D^1, D^2, \ldots, D^n$ and $\bar t^1, \bar t^2, \ldots, \bar t^n$.

Proof. See the discussion above.

Remark. Observe that (14) are nonlinear in $\mathbf{x}^0, \mathbf{x}^1, \ldots, \mathbf{x}^{n-1}$ and that there are $5n - 1$ independent parameters to estimate. This can be seen from the fact that we may impose the conditions $\operatorname{tr} D^i = 0$ and $|\bar t^1| = 1$.

4.1 The First Order Continuous Constraints

In a stream, $Y(t)$, of images the affinely reduced image coordinates $\mathbf{x}^0$ and the image velocities $\mathbf{x}^1$ at the same instant of time obey the first order continuous constraint

$$\det [\, \mathbf{x}^1 - D^1 \mathbf{x}^0 \;\;\; \bar t^1 \;\;\; \mathbf{x}^0 \,] = 0. \tag{15}$$

Observe that the continuous constraint is more complicated than the discrete counterpart, the bilinear constraint. One complication is that (15) is not linear in $\mathbf{x}^0$. It follows from (15) that $\bar t^1$ can only be recovered up to a scale factor, which is in agreement with the discrete case. In order to achieve uniqueness we may require that $|\bar t^1| = 1$. We have also required that $\operatorname{tr} D^1 = 0$, which can always be achieved. This can be seen from the fact that the determinant does not alter when we add a multiple of the last column to the first column. This shows that the first order continuous constraint depends on 4 parameters, exactly the same as the reduced fundamental matrix, see [3].

Expanding the determinant in (15) we get a linear expression in $t_i^1$ and $t_i^1 d_j^1$. Putting $d_3^1 = -d_1^1 - d_2^1$ gives an expression which is linear in $(b_1, b_2, b_3, b_4, b_5, b_6)$, where

$$\begin{aligned} b_1 &= t_1^1, & b_2 &= t_2^1, & b_3 &= t_3^1, \\ b_4 &= d_1^1 t_1^1 + 2 d_2^1 t_1^1, & b_5 &= d_2^1 t_2^1 + 2 d_1^1 t_2^1, & b_6 &= d_1^1 t_3^1 - d_2^1 t_3^1. \end{aligned} \tag{16}$$
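As a sanity check on (15) and (16), the following sketch generates exact synthetic first order data from the model (13) with k = 1 and verifies that each point gives one linear equation in $(b_1, \ldots, b_6)$. The sample values, the helper names and the use of Python with rational arithmetic are assumptions made for illustration. The coefficient row is obtained by expanding the determinant in (15) by hand and is not quoted from the paper.

```python
from fractions import Fraction as F

def first_order_data(P, d, t):
    """Return (x0, x1) for an object point P = (X, Y, Z, W) with X + Y + Z = 1.

    d = (d1, d2) are the free entries of D^1 (d3 = -d1 - d2, so tr D^1 = 0)
    and t is the vector t^1.
    """
    X, Y, Z, W = P
    x0 = (X, Y, Z)
    D1 = (d[0], d[1], -d[0] - d[1])
    c1 = [D1[i] * x0[i] + W * t[i] for i in range(3)]   # [D^1 | t^1] X
    lam1 = sum(c1)   # coordinates of x(t) sum to 1, so those of x^1 sum to 0
    x1 = tuple(c1[i] - lam1 * x0[i] for i in range(3))  # (13) with k = 1
    return x0, x1

def constraint_row(x0, x1):
    """Coefficients of (b1, ..., b6) in the expanded determinant (15)."""
    x, y, z = x0
    u, v, w = x1
    return (y * w - z * v, z * u - x * w, x * v - y * u, y * z, -x * z, x * y)

def b_vector(d, t):
    """The parameters b1, ..., b6 of (16)."""
    d1, d2 = d
    t1, t2, t3 = t
    return (t1, t2, t3, (d1 + 2 * d2) * t1, (d2 + 2 * d1) * t2, (d1 - d2) * t3)

d, t = (F(1, 2), F(-1, 3)), (F(1), F(2), F(-1))   # illustrative motion
points = [(F(1, 5), F(3, 10), F(1, 2), F(2)),     # X + Y + Z = 1 in each
          (F(-1, 4), F(1, 2), F(3, 4), F(-1)),
          (F(2, 3), F(2, 3), F(-1, 3), F(5, 2))]
b = b_vector(d, t)
residuals = []
for P in points:
    row = constraint_row(*first_order_data(P, d, t))
    residuals.append(sum(r * bi for r, bi in zip(row, b)))
print(residuals)   # every residual is exactly zero for consistent data
```

In an estimation setting one would stack such rows for the measured points and look for a vector in the (approximate) null space; the sketch only confirms that the true parameters satisfy every row.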
The three basis points trivially obey the linear constraint and we need 5 further points, in total 8, to be able to solve for $b_i$. When we have determined $b_i$ we can recover $d_i^1$ and $t_i^1$ linearly, because $b_1$, $b_2$ and $b_3$ give $t_i^1$, and then $b_4$, $b_5$ and $b_6$ are linear in $d_1^1$ and $d_2^1$. This is exactly the same number of points as needed in the discrete case in order to determine the reduced fundamental matrix.

Another possibility is to solve (15) with respect to $D^1$ and $\bar t^1$ using an iterative method. It seems reasonable to set $D^1 = 0$ to initialise and then we can estimate $\bar t^1$ from (15), for example by minimising

$$\sum \left( \det [\, \mathbf{x}^1 - D^1 \mathbf{x}^0 \;\;\; \bar t^1 \;\;\; \mathbf{x}^0 \,] \right)^2, \tag{17}$$

where the summation goes over all points. This minimisation problem has an explicit solution because the goal function is quadratic. Given this estimate of $\bar t^1$ we can minimise (17) again, this time with respect to $D^1$. Again we have an explicit solution, because the goal function is quadratic. These steps can be repeated until $D^1$ and $\bar t^1$ stabilise. For a robust method to estimate the coefficients in the first order expansion of the epipolar constraint, see [8].

4.2 The Second Order Continuous Constraints

In a stream, $Y(t)$, of images the affinely reduced image coordinates $\mathbf{x}^0$, the image velocities $\mathbf{x}^1$ and the image accelerations $\mathbf{x}^2$ at the same instant of time obey the second order continuous constraint

$$\operatorname{rank} \begin{bmatrix} \mathbf{x}^1 - D^1 \mathbf{x}^0 & \bar t^1 & \mathbf{x}^0 & 0 \\ \mathbf{x}^2 - D^2 \mathbf{x}^0 & \bar t^2 & \mathbf{x}^1 & \mathbf{x}^0 \end{bmatrix} \le 3. \tag{18}$$

It follows from (18) that the length of $\bar t^1$ and the length of $\bar t^2$ cannot be chosen independently. Once we have fixed the length of $\bar t^1$ the length of $\bar t^2$ is determined, which is in correspondence with the discrete case. In the same way, when we have determined $D^1$, with $\operatorname{tr} D^1 = 0$, we can determine $D^2$, with $\operatorname{tr} D^2 = 0$, because the rank does not change if we add a multiple of the last column to the first column. This shows that the second order continuous constraint depends on 9 parameters, exactly the same as the reduced trilinear tensor, see [3].

5 The Continuous Multilinear Constraints in Four Different Settings

The continuous multilinear constraints, derived above for the reduced affine setting, are not limited to this setting. They can be easily expressed in four different settings in the form

$$\operatorname{rank} \begin{bmatrix} Q^0 & \bar t^0 & \mathbf{x}^0 & 0 & 0 & \cdots & 0 \\ Q^1 & \bar t^1 & \mathbf{x}^1 & \mathbf{x}^0 & 0 & \cdots & 0 \\ Q^2 & \bar t^2 & \mathbf{x}^2 & \mathbf{x}^1 & \mathbf{x}^0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ Q^n & \bar t^n & \mathbf{x}^n & \mathbf{x}^{n-1} & \mathbf{x}^{n-2} & \cdots & \mathbf{x}^0 \end{bmatrix} \le n + 4, \tag{19}$$

where
• in the traditional uncalibrated setting, the matrices $Q^i$ are arbitrary but nonsingular matrices, $Q^0 = I$ and $Q^1$ is undetermined up to a choice of the plane at infinity (3 parameters),
• in the calibrated setting, the matrices $Q^i$ are antisymmetric matrices and $Q^0 = I$,
• in the projectively reduced setting, the matrices $Q^i$ are zero matrices,
• in the affinely reduced setting, the matrices $Q^i$ are diagonal matrices and $Q^0 = I$.

This follows from the camera equations in (1) using $P_i = [\, Q_i \mid t_i \,]$ and $Q_i = \exp(\mathcal{Q}_i)$. That is, if the matrices $Q_i$ belong to a Lie group then the matrices $\mathcal{Q}_i$ belong to the corresponding Lie algebra. Observe that in all settings the matrices $\mathcal{Q}_i$ are undetermined up to addition of an arbitrary multiple of the identity matrix, since $Q_i$ are undetermined up to scale.

5.1 Discussion

The continuous constraints can be used to estimate structure and motion from image sequences. Using only the first order constraints gives just the direction of the movement of the camera. From the first order continuous constraints it is possible to obtain $t'(t)$ and $Q'(t)$ up to unknown scale factors for every t. Having only this information it is not possible to recover the camera movement entirely.

Using the second order constraints also gives the second order derivatives, $t''(t)$ and $Q''(t)$, with a scale consistent with the first order derivatives, if $t'(t) \ne 0$. This information can be used to recover the camera movement. In fact, $t''(t)$ and $Q''(t)$ can be regarded as functions of $t(t)$, $t'(t)$, $Q(t)$, $Q'(t)$ and the image derivatives $\mathbf{x}(t)$, $\mathbf{x}'(t)$ and $\mathbf{x}''(t)$,

$$\big( Q''(t), t''(t) \big) = g\big( t(t), t'(t), Q(t), Q'(t), \mathbf{x}(t), \mathbf{x}'(t), \mathbf{x}''(t) \big).$$

It is well-known that such a differential equation can be solved, at least locally, given a set of initial conditions on $t(0)$, $t'(0)$, $Q(0)$ and $Q'(0)$. These initial conditions are determined by choosing a coordinate system, and at the same time fulfilling the second order continuous constraint at $t = 0$. Thus the full motion of the camera and the camera matrices is observable from the third order continuous constraint if $t'(t) \ne 0$. This is analogous to the discrete case, where the trilinearities must be used in order to estimate the camera movement, if only multilinear constraints between consecutive images are used.
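The rank condition (18) can be checked numerically in a similar way. The sketch below is illustrative only: it assumes a motion whose Taylor coefficients are given directly, builds $\mathbf{x}^0$, $\mathbf{x}^1$ and $\mathbf{x}^2$ from (13), and computes the rank of the $6 \times 4$ matrix in (18) by exact rational Gaussian elimination. All names and sample numbers are hypothetical.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix with rational entries, by Gaussian elimination."""
    m = [[F(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * p for a, p in zip(m[i], m[r])]
        r += 1
    return r

def taylor_coefficients(P, D1, D2, t1, t2):
    """x^0, x^1, x^2 for a point P = (X, Y, Z, W) with X + Y + Z = 1, from (13)."""
    x0, W = P[:3], P[3]
    c1 = [D1[i] * x0[i] + W * t1[i] for i in range(3)]   # [D^1 | t^1] X
    c2 = [D2[i] * x0[i] + W * t2[i] for i in range(3)]   # [D^2 | t^2] X
    lam1, lam2 = sum(c1), sum(c2)   # the coordinates of x(t) always sum to 1
    x1 = tuple(c1[i] - lam1 * x0[i] for i in range(3))
    x2 = tuple(c2[i] - lam1 * x1[i] - lam2 * x0[i] for i in range(3))
    return x0, x1, x2

def second_order_matrix(x0, x1, x2, D1, D2, t1, t2):
    """The 6 x 4 matrix of (18); D1 and D2 hold the diagonals of D^1 and D^2."""
    top = [[x1[i] - D1[i] * x0[i], t1[i], x0[i], 0] for i in range(3)]
    bottom = [[x2[i] - D2[i] * x0[i], t2[i], x1[i], x0[i]] for i in range(3)]
    return top + bottom

P = (F(1, 2), F(1, 4), F(1, 4), F(1))
D1, t1 = (1, 2, -3), (1, 0, 0)   # trace-free diagonals, illustrative values
D2, t2 = (0, 1, -1), (0, 1, 0)
x0, x1, x2 = taylor_coefficients(P, D1, D2, t1, t2)
consistent = rank(second_order_matrix(x0, x1, x2, D1, D2, t1, t2))
x2_bad = (x2[0] + F(1, 10), x2[1], x2[2])   # acceleration violating the model
violated = rank(second_order_matrix(x0, x1, x2_bad, D1, D2, t1, t2))
print(consistent, violated)   # 3 4
```

Exact arithmetic is used here so that the rank is unambiguous; with measured image derivatives one would instead inspect the smallest singular value of the matrix.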
Theorem 5.1. Given a stream of images, $Y(t)$, where the camera motion, $t(t)$, obeys the condition $t'(t) \ne 0$, the full camera motion is observable from the second order continuous constraints, but not from the first order continuous constraints.

6 Experiments

To illustrate the continuous constraints, we have used iterative methods as described above. We only consider the first order continuous constraints. An image sequence of an indoor scene has been used, see Figure 1, where one image in the sequence is shown. The whole sequence contains more than 200 images. To illustrate the applicability of the continuous constraints, we have only used 2 images. Points have been extracted using a corner detector, made available by [6], and we have used 28 points with correspondences. The affinely reduced coordinates have been calculated from the images, giving $\mathbf{x}(0)$ and $\mathbf{x}(h)$, where h denotes the time increment between the different exposures. In this case $h = 1/25$ sec. We have used $\mathbf{x}^0 = \mathbf{x}(0)$. The derivatives have been computed from image 1 and image 2 using a difference approximation

$$\mathbf{x}^1 = \frac{\mathbf{x}(h) - \mathbf{x}(0)}{h}.$$

Figure 1: One image in the sequence used in the continuous case.

Using the iterative approach and only 10 iterations, starting from $D^1 = 0$, we obtain the following solution fulfilling the first order constraints:

$$p^1 = (0.0490, -0.0543, 0.0053), \quad \bar t^1 = (0.4838, 0.3088, -0.8189). \tag{20}$$

This solution can be compared to the solution obtained in the discrete case between $\mathbf{x}(0)$ and $\mathbf{x}(h)$:

$$p_1 = (1.0021, 0.9976, 1.0003), \quad \bar t_1 = (0.5173, 0.2752, -0.8103). \tag{21}$$

Here $p_1$ should be compared to $(1, 1, 1) + h p^1 = (1.0020, 0.9978, 1.0002)$ and $\bar t_1$ to $\bar t^1$. The angle between $\bar t_1$ and $\bar t^1$ is 2.8 degrees.

7 Conclusions

The continuous counterparts to the multilinear forms are derived in all four settings, showing again the similarities between the affinely reduced, the projectively reduced, the traditional uncalibrated and the calibrated settings. It is shown that the full camera motion is observable from the second order continuous constraints. It is also illustrated in an example that the first order continuous constraints are compatible with the discrete counterpart. The example shows that the first order continuous constraint is comparable to the discrete case. However, the continuous constraints are sensitive to noise because of the differentiation. The higher order continuous constraints are even more sensitive, involving higher order derivatives. Using filter techniques to estimate the derivatives from image coordinates in more than two images could reduce the influence of noise.

Despite these drawbacks we believe that the continuous constraints are helpful to understand and deal with image sequences where the time interval between the different exposures is small.

References
[1] Åström, K., Heyden, A., Canonic Framework for Sequences of Images: Similarities between Calibrated and Uncalibrated Case, Proc. Symposium on Image Analysis, SSAB, Linköping, Sweden, 1995, pp. 25-28.
[2] Heyden, A., Åström, K., Canonic Framework for Sequences of Images: The Uncalibrated Case, Proc. Symposium on Image Analysis, SSAB, Linköping, Sweden, 1995, pp. 21-24.
[3] Heyden, A., Reconstruction from Image Sequences by means of Relative Depths, Proc. ICCV'95, IEEE Computer Society Press, 1995, pp. 1058-1063. Also to appear in IJCV, International Journal of Computer Vision.
[4] Heyden, A., Åström, K., A Canonical Framework for Sequences of Images, Proc. IEEE Workshop on Representation of Visual Scenes, 1995.
[5] Heyden, A., Geometry and Algebra of Multiple Projective Transformations, Doctoral Thesis, CODEN:LUFTD2/TFMA--95/5002--SE, ISBN 91-628-1784-1, Lund, Sweden, 1995.
[6] Lindeberg, T., Bretzner, L., personal communication.
[7] Shashua, A., Trilinearity in Visual Recognition by Alignment, ECCV'94, Lecture Notes in Computer Science, Vol 800, Ed. Jan-Olof Eklund, Springer-Verlag, 1994, pp. 479-484.
[8] Vieville, T., Faugeras, O. D., Motion analysis with a camera with unknown, and possibly varying intrinsic parameters, Proc. ICCV'95, IEEE Computer Society Press, 1995, pp. 750-756.