International Journal of Pattern Recognition and Artificial Intelligence, Vol. 19, No. 4 (2005) 585–601. © World Scientific Publishing Company

ESTIMATING APPROXIMATE AVERAGE SHAPE AND MOTION OF DEFORMING OBJECTS WITH A MONOCULAR VIEW

TAEONE KIM∗ and KI-SANG HONG†
Department of Electronic and Electrical Engineering
Pohang University of Science and Technology (POSTECH)
San 31, Hyojadong, Namgu, Pohang 790-784, Republic of Korea
∗[email protected]
†[email protected]

With a monocular view, the nonrigid recovery of the 3D motion and time-varying shapes of a deforming object may be impossible without any prior information, because ambiguous, multiple solutions exist for the motion and shapes that produce the same projection image. In this paper, as a preceding step to the nonrigid recovery of a deforming object, we develop an approach for estimating the approximate average shape and motion of the object, which reasonably resolves the ambiguity problem in nonrigid recovery. By investigating the internal structure of nonrigid objects, we introduce a novel concept called the DoN (Degree of Nonrigidity). Based on this, we propose an iterative certainty reweighted factorization method. In addition, we refine and improve the method by reformulating it in a robust manner to cope with outliers in the tracked features. Finally, we present experimental results on both synthetic data and a real video sequence.

Keywords: Nonrigid recovery; deforming objects; average shape and motion; structure from motion; monocular view.

1. Introduction

1.1. Previous work on nonrigid recovery

Nonrigid recovery refers to reconstructing the 3D motion and time-varying shapes of a nonrigid object that deforms in space over time. In past decades, many researchers in the computer vision community have studied nonrigid recovery techniques based either on a monocular view or on multiple viewpoints.13,15,16,18,23,24 Generally, FEM (Finite Element Method)-based approaches8,14,15,18 model the surface as having some type of elastic property,2 represent the deforming shapes using vibration modes, and solve the resulting equation by the FEM. They also incorporate visual cues such as optical flow into the modal framework to improve the ability to estimate motion and nonrigid shapes.9 However, these model-based methods become heavily dependent on the elastic property

†Author for correspondence.


of the surface or on the shape representation ability of the modeling primitives. These limitations make the versatility of the approaches application-dependent, and parameters or elastic properties such as stiffness must be readjusted or changed accordingly. For a survey of other related techniques, we refer the reader to Ref. 1. There are also different approaches for dense and accurate nonrigid recovery. Blanz et al.3 constructed a textured 3D morphable face model from a set of example 3D face models. Using the morphable face model, they showed that a new face model corresponding to input images can be reconstructed by automatically matching the morphable model to the images. However, a large set of face models with established correspondences is a prerequisite for constructing the initial morphable model. Vedula et al.23,24 proposed a framework for computing the dense, nonrigid scene flow of a nonrigid scene using optical flows obtained by multiple cameras at a given time and demonstrated its good performance on real videos. Recently, Bregler et al.6 proposed a method for factoring motion and time-varying shapes directly from the measurement matrix composed of tracked feature points. Assuming an affine camera (orthographic, weak-perspective), they extended the well-known rigid factorization20 to the nonrigid case by approximating the space of a deforming object using a linear combination of basis shapes, so that the rigid factorization equation is nicely converted into a more general form to cope with object deformation.a Based on this work, Torresani et al.21,22 suggested a new tracking method that tracks the point features of deforming objects accurately by exploiting a rank constraint, an extension of the subspace constraint of Irani11 to the nonrigid case. They also addressed handling the occlusion problem.
Brand et al.4,5 demonstrated interesting results for both nonrigid recovery and the tracking problem through an elaborate rederivation of the nonrigid factorization method. There are also factorization-based approaches that use information from two or more viewpoints. Torresani et al.22 showed that the nonrigid factorization formulation can be adapted to multiview cases, and Bue et al.7 implemented it and demonstrated its feasibility on stereo video sequences. Although this multiview factorization yields more accurate results than the monocular-based approach, it needs spatially synchronized feature points across the views, which to date requires manual intervention.

1.2. Shape and motion ambiguity

However, with a monocular view, the nonrigid recovery of the 3D motion and time-varying shapes of a deforming object may be impossible without further information, e.g. the initial shape used in the FEM-based approaches15,18 or multiple viewpoint information,7,23,24 because multiple solutions exist for the motion and shapes that produce the same projection image. This ambiguity is obvious if we

a For future reference, we refer to Bregler et al.'s factorization method as the nonrigid factorization method.

Fig. 1. Illustration of ambiguous motion and shapes from a monocular view.

consider Fig. 1. Let us assume that an object, composed of four darkly filled circles at time t = 0, deforms with respect to time (see Fig. 1). Note that I_0, I_1 and I'_1 denote the images at times t = 0 and t = 1, and the arrows represent the orthographic projection.20 We can see that the image I'_1 can be an alternative to I_1 at t = 1 if each darkly filled circle moves to an arbitrary position on the corresponding dotted projection ray instead of to the lightly filled one on the projection ray of I_1. Note that, although the images I_1 and I'_1 at time t = 1 have the same projection, their motions are different. As a result, it is clear that ambiguity always exists when reconstructing deforming objects using only a monocular view. Furthermore, taking a close look at Fig. 1, we observe that the ambiguity problem occurs because there is no distinction between the local displacements (or local variation) of the four circles and the rigid motion (or external motion) of the image plane or object. More generally, with a monocular view, the external motion of an object can be replaced by local variation of the object shape to generate exactly the same shape with respect to a fixed coordinate system. Therefore, we conclude that, to resolve this situation, it is necessary to discriminate between the two types of motion of a deforming object: local variation and external motion. (Note that we use the term local variation instead of local motion to emphasize that the motion of a deforming object should be defined by the external motion irrespective of local shape variation.) It is interesting to note that this ambiguity is not solved by the nonrigid factorization method,6 which will be confirmed by an experiment on a real video in a later section. The nonrigid factorization method basically
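This ambiguity is easy to reproduce numerically. The sketch below (an illustrative construction of ours, not taken from the paper) slides the points of a deformed shape along the orthographic viewing direction; the altered shape projects to exactly the same image, so image data alone cannot distinguish the two:

```python
import numpy as np

def project(points_3d):
    """Orthographic projection along the z-axis: keep (x, y), drop z."""
    return points_3d[:, :2]

rng = np.random.default_rng(0)
shape_t1 = rng.normal(size=(4, 3))        # a deformed shape at t = 1

# An alternative shape: move each point along its projection ray (z-axis).
shape_t1_alt = shape_t1.copy()
shape_t1_alt[:, 2] += rng.normal(size=4)  # arbitrary slide along the rays

img = project(shape_t1)
img_alt = project(shape_t1_alt)

# Identical images, different 3D shapes: the monocular ambiguity.
assert np.allclose(img, img_alt)
assert not np.allclose(shape_t1, shape_t1_alt)
```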


has no measure in its formulation to discriminate between the two motion types, and so it fails to resolve the ambiguity problem, yielding only one of multiple solutions. Note also that Brand et al.4,5 used a "parsimony" constraint as a regularization on the shape basis in Ref. 4, or a Gaussian prior on the deformation coefficients in Ref. 5, to avoid the problem. But it seems that the regularization is applied uniformly to the surface points of an object irrespective of the different deformation degree of each surface point, which will be addressed in this paper. As a deforming object changes its shape over time, it is natural to define the time average of the varying shapes as the average shape. Next, the deformation of a nonrigid object can be described by a combination of the two types of motion. That is, the varying shape of the object at an arbitrary time can be explained by the local variation of the average shape after it is rotated and translated rigidly. Later, we will show experimentally that the motion of the average shape is natural for human perception of the motion of a deforming object, e.g. a human face, and that it can also be an important cue for reasonably performing a nonrigid recovery, which remains future work. In this paper, as a preceding step to the nonrigid recovery of a deforming object, we develop an approach for estimating the approximate average shape and the corresponding motion of the object. To accomplish this, we investigate the internal structure of nonrigid objects and introduce a novel concept called the DoN (Degree of Nonrigidity). Briefly, the DoN represents the range spanned by each surface point of a time-varying object due to its local variation in space. Used as weights in the factorization scheme proposed in this paper, the DoNs enable us to estimate the approximate average shape and motion of deforming objects. This procedure is conceptually similar to Irani et al.,12 except that the weights are unknown.
The weights, or the DoNs, are estimated in an iterative manner. The remainder of this paper is organized as follows. Section 2 explains both the rigid and the nonrigid factorization methods and presents the ambiguity problem, which is not solved by the nonrigid factorization method. The degree of nonrigidity is introduced in Sec. 3. We propose an iterative certainty reweighted factorization method and reformulate it in a robust manner to cope with outliers in Sec. 4. Finally, we present experimental results on both synthetic data and a real video in Sec. 5 and conclude in Sec. 6.

2. Factorization Methods: Review

2.1. (Rigid) factorization method

Let us define the $2F \times P$ measurement matrix as

$$ W = \begin{bmatrix} x_{11} & \cdots & x_{1P} \\ x_{21} & \cdots & x_{2P} \\ \vdots & & \vdots \\ x_{F1} & \cdots & x_{FP} \end{bmatrix}, \quad\text{or}\quad W = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_F^T \end{bmatrix}, \tag{1} $$


where $x_{ij} = (x_{ij}, y_{ij})^T$ is the $j$th tracked feature point in the $i$th frame, out of $P$ feature points viewed over $F$ frames. If the measurement matrix $W$ is composed of feature points of a rigid scene, it is well known that the matrix can be approximated by $\hat{W}$, which decomposes concisely into two terms, motion and rigid shape:20

$$ W = \hat{W} + \varepsilon, $$

where $\varepsilon$ represents a noise term and

$$ \hat{W} = \hat{M}_{2F \times 3} \hat{S}_{3 \times P} = MG\,G^{-1}S, $$

where $\hat{M} = [R_1 \cdots R_F]^T$ and $\hat{S} = [X_1 \cdots X_P]$. Specifically, $R_i$ is a $2 \times 3$ rotation matrix, $X_j$ is a $3 \times 1$ scene point, and $G$ is a $3 \times 3$ nonsingular matrix. The pseudo motion and structure $M$ and $S$ are transformed to the Euclidean forms $\hat{M}$ and $\hat{S}$ by $G$. Note that the translation is subtracted from the measurement matrix so that the origin of the coordinate system is located at the centroid of the recovered structure.

2.2. Nonrigid factorization method

Bregler et al.6 extended the rigid factorization method to cope with tracked feature points of deforming objects. The shape at each time step is represented as a linear combination of a fixed number ($K$) of basis shapes; note that the measurement matrix is approximated by a rank-$3K$ matrix when $K$ basis shapes are used. Next, $\mathbf{w}_i$ is approximated by the reprojection of a shape as

$$ \mathbf{w}_i^T \approx \hat{\mathbf{w}}_i^T = R_i \sum_{k=1}^{K} l_{ik} S_k, \tag{2} $$

where the $S_k$ are the $K$ basis shapes of dimension $3 \times P$ and the $l_{ik}$ are the basis expansion coefficients. Rewriting Eq. (2) for all frames yields

$$ W \approx \hat{W} = \begin{bmatrix} l_{11}R_1 & l_{12}R_1 & \cdots & l_{1K}R_1 \\ l_{21}R_2 & l_{22}R_2 & \cdots & l_{2K}R_2 \\ \vdots & & & \vdots \\ l_{F1}R_F & l_{F2}R_F & \cdots & l_{FK}R_F \end{bmatrix} \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_K \end{bmatrix} = \hat{M}\hat{S}. \tag{3} $$
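As a concrete sketch of these decompositions, the rank-3 (rigid) case can be computed with a truncated SVD. The code below is a minimal illustration on noise-free synthetic data; the metric upgrade by $G$ and the rank-$3K$ nonrigid case are omitted, and it is not the authors' implementation:

```python
import numpy as np

def rank3_factorize(W):
    """Factor a centered 2F x P measurement matrix into pseudo
    motion M (2F x 3) and pseudo shape S (3 x P) via a rank-3 SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # pseudo motion
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # pseudo shape
    return M, S

# Synthetic rigid scene: F frames of 2x3 camera rows times a 3xP shape.
rng = np.random.default_rng(1)
F, P = 20, 15
S_true = rng.normal(size=(3, P))
M_true = rng.normal(size=(2 * F, 3))
W = M_true @ S_true

M, S = rank3_factorize(W)
# M @ S reproduces W (up to the 3x3 ambiguity G between M and S).
assert np.allclose(M @ S, W)
```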

They first perform rank-3K SVD factorization on the measurement matrix, and subsequently apply rank-1 factorization on a term derived by the appropriate rearrangement of the motion matrix. For details, refer to Ref. 6. We applied the nonrigid factorization method to a video sequence in Fig. 6, and show the computed shape corresponding to the first frame in Fig. 2. According to Fig. 2, we observe that the recovered face shape appears strange in terms of human perception; the area around the mouth protrudes slightly. This shows that the


Slightly upper view

Side view

Fig. 2. Reconstructed shape for the i = 1 frame displayed from two different viewpoints; we set the number of basis shapes to K = 5.

nonrigid factorization method does not seem to estimate the time-varying shapes correctly for our video. As noted in the introduction, the method has no measure in its formulation to avoid the ambiguity problem illustrated in Fig. 1. In fact, the shape shown in Fig. 2 occurs because the computed motion matrix $\hat{M}$ is affected not only by the motion of the near-rigid face area, e.g. the forehead, but also by the local variation of some parts of the face, e.g. the mouth area. As a result, the shape matrix $\hat{S}$ is also affected by the motion matrix $\hat{M}$, yielding the shape in Fig. 2. As a remedy for this problem, we introduce the concept called DoN (Degree of Nonrigidity) in the next section.

3. The Degree of Nonrigidity

Let us suppose that there is a time-varying 3D object whose average shape is known, e.g. a deforming human face. Observation of the temporal shape variation of the object shows that near-rigid surface points of the object usually have small deviations from the corresponding points on the average shape while, by contrast, heavily changing surface points have more chance to deviate severely; we call this degree of deviation of a point from its average point the degree of nonrigidity. Mathematically, if we assume that an object is composed of $P$ surface points and denote its shape at time $i$ as $X_i = [X_{i1}, \ldots, X_{iP}]$ ($i = 1, \ldots, F$), the average shape, $\bar{X} = [\bar{X}_1, \ldots, \bar{X}_P]$, is defined by

$$ \bar{X} \triangleq \frac{1}{F} \sum_{i=1}^{F} X_i, $$

and the degree of nonrigidity of the $j$th point can be defined as

$$ \mathrm{DoN}_j \triangleq \sum_{i=1}^{F} (X_{ij} - \bar{X}_j)(X_{ij} - \bar{X}_j)^T. \tag{4} $$
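Given 3D point trajectories, the average shape and the DoN of Eq. (4) are direct to compute; a small sketch (variable names are ours):

```python
import numpy as np

def average_shape_and_don(X):
    """X: (F, P, 3) array of P surface points tracked over F frames.
    Returns the average shape (P, 3) and per-point DoN (P, 3, 3)."""
    X_bar = X.mean(axis=0)                      # time average of the shapes
    dev = X - X_bar                             # (F, P, 3) deviations
    # sum_i (X_ij - X_bar_j)(X_ij - X_bar_j)^T, for each point j
    don = np.einsum('fpi,fpj->pij', dev, dev)
    return X_bar, don

rng = np.random.default_rng(2)
F, P = 50, 10
X = rng.normal(size=(F, P, 3))
X[:, 0] *= 0.01            # point 0 is near-rigid: tiny deviations

X_bar, don = average_shape_and_don(X)
# A near-rigid point has a much smaller DoN (e.g. by trace) than a nonrigid one.
assert np.trace(don[0]) < np.trace(don[1])
```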


According to the above discussion, objects in the world can be categorized into three classes: rigid, near-rigid (or near-nonrigid) and fully nonrigid. For instance, objects in the near-rigid class are composed of parts that show both small and large deviations from the average shape, while those in the fully nonrigid class mainly show large deviations. Note that a deforming human face can be classified into the near-rigid class because areas such as the forehead and nose remain almost the same as the average shape, while areas such as the upper/lower lips and cheeks tend to deviate more. Our concern lies in the near-rigid objects, for which we can do something more.

3.1. 2D projection of DoN

The projection of the $j$th point and its average point onto the image plane,

$$ x_{ij} = R_i X_{ij} \quad\text{and}\quad \bar{x}_{ij} = R_i \bar{X}_j, $$

where $R_i$ represents the $i$th rotation, yields the following reprojection error:

$$ e_{ij} = x_{ij} - \bar{x}_{ij}. \tag{5} $$

Then $C_j$, a 2D projection of $\mathrm{DoN}_j$, can be defined as

$$ \sum_{i=1}^{F} e_{ij} e_{ij}^T = \sum_{i=1}^{F} R_i (X_{ij} - \bar{X}_j)(X_{ij} - \bar{X}_j)^T R_i^T \triangleq C_j, \tag{6} $$

where $C_j$ is a $2 \times 2$ matrix. Note that $C_j$ is the covariance of the reprojection errors. At least, from Eq. (6), we can say that a small DoN yields a small covariance, though a large DoN does not necessarily mean a large covariance. The exceptional case occurs when the local variation $(X_{ij} - \bar{X}_j)$ is mainly along the viewing direction; this corresponds to a degenerate case to which no algorithm can be applied successfully without a priori information.

4. Proposed Method

In general, a simple application of the rigid factorization method (the rank-3 factorization method) to a measurement matrix composed of feature points of deforming objects does not produce the desired average shape and motion, as can be seen from Fig. 3.b The covariances (DoNs) computed by Eq. (6) using the tracked feature points and the reprojected points of the recovered rigid shape are illustrated by overlaying the corresponding covariance ellipses centered at the tracked feature

b Note that applying the rigid factorization method to the measurement matrix gives almost the same results as the nonrigid factorization method with K = 1 according to our experiment, except that the shape is scaled by the deformation coefficients $l_{i1}$. Note that the most dominant basis shape $S_1$ resulting from the SVD has been considered to be an average shape, but our experiment shows that this is not always true.


(a)

(b)

Fig. 3. (a) Overlaying the covariance (DoN) ellipses centered at 30 tracked feature points; they are computed via the rank-3 factorization method using Eq. (6). (b) A side view of the shape recovered by the rigid factorization method.

points (the ellipses are properly scaled for illustration). Note that the rigid shape looks similar to the shape produced by the nonrigid factorization method shown in Fig. 2, although they are not the same. The covariance ellipses on the nose area are large, although they should be small. Nevertheless, we can observe a tendency for points around the forehead and cheeks to have low covariances and for some around the mouth to have high ones. In fact, compared to the underlying true average shape and motion, those provided by the rank-3 factorization method usually increase the reprojection errors slightly for the near-rigid points but decrease the errors for the nonrigid ones. This is because the rank-3 factorization method tries to reduce the errors evenly over all the points of the object. Our idea is that a method similar to the factorization method in Ref. 12 will produce shape and motion that fit more closely to the feature points having low covariances. Irani et al.12 proposed a certainty-weighted factorization algorithm for handling noisy feature points with high directional uncertainty. They obtained better structure and motion estimates by assigning higher weights to points with more certainty and lower weights to those with less certainty. Note that, in our problem, the noisy points with high directional uncertainty correspond to the nonrigid points with high DoNs. However, unlike Ref. 12, the covariances are not given in advance. In the next section, we devise an iterative method that estimates not only the covariances but also the desired approximate shape and motion, simultaneously.

4.1. Iterative Certainty Reweighted Factorization (ICRF)

The desired shape and motion should produce small covariances for near-rigid points and large ones for nonrigid points. As noted, we accomplish this by adopting the certainty (or uncertainty) concept from Ref. 12. The covariance matrix $C_j$ represents the inverse of the certainty measure of the $j$th point. Our iterative certainty


reweighted factorization (ICRF) procedure is described in the following:

(1) Subtract the translation from the measurement matrix $W$; the translation is initially obtained from the average location (centroid) of the tracked points.
(2) Perform the rank-3 SVD on $W$; this yields the pseudo motion and shape, $M^{(0)}$ and $S^{(0)}$.
(3) Compute $C_j^{(0)}$ in Eq. (6) for each feature point using the error $(W - M^{(0)} S^{(0)})$.
(4) Do the following iteratively:
— Using $C_j^{(i-1)}$ as the uncertainty weight, recompute the pseudo motion and shape, $M^{(i)}$ and $S^{(i)}$, by the certainty-weighted factorization of Ref. 12. Let $r_{ij}^{(i)} \triangleq x_{ij} - m_i^{(i)T} s_j^{(i)}$. Then Irani's certainty-weighted factorization method minimizes the following cost function, with $C_j \equiv C_j^{(i-1)}$:

$$ \min_{\{M^{(i)}, S^{(i)}\}} \sum_{ij} r_{ij}^{(i)T}\, C_j^{-T} C_j^{-1}\, r_{ij}^{(i)}, $$

where $M^{(i)} = [m_1^{(i)T} \cdots m_F^{(i)T}]^T$ and $S^{(i)} = [s_1^{(i)} \cdots s_P^{(i)}]$.
— Recompute $C_j^{(i)}$ using the error $(W - M^{(i)} S^{(i)})$.
— If the error does not increase, stop the iteration; otherwise continue.
• Finally, upgrade the pseudo motion and shape, $M$ and $S$, to the Euclidean forms $\hat{M}$ and $\hat{S}$.

The proposed algorithm usually converges within several iterations in our experiments on synthetic data sets. Note that we start from the covariances obtained by the rigid factorization method to initialize the algorithm. In general, the reprojection error $(W - MS)$ increases during the iteration because, compared to the rigid factorization, the method is inclined to fit the near-rigid points more closely to the tracked features than the nonrigid points.

4.2. Robust ICRF

In real applications, outliers usually occur in tracked features. In particular, when we track feature points on a deforming surface, outliers may occur more often in the nonrigid parts than in the near-rigid parts of the surface. To remove the negative effects of the outliers on the ICRF, robustness is incorporated into the ICRF by adjusting the uncertainty weight $C_j$, as with M-estimators.10 Defining $d_j \triangleq \sqrt{\det C_j}$, the weight is recomputed as

$$ C_j = \begin{cases} C_j & d_j \le \sigma \\ \sigma^2\, C_j / (d_j)^2 & \sigma < d_j \le 3\sigma \\ 0 & d_j > 3\sigma. \end{cases} $$

The standard deviation $\sigma$ is usually set to a maximum likelihood estimate using the median:

$$ \sigma = \frac{\operatorname{med}_j\, d_j}{0.6745}. $$
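The loop structure of the ICRF can be sketched as follows. This is an illustration only: Irani's directional certainty-weighted factorization is replaced by a scalar per-point weight derived from det C_j and one sweep of weighted alternating least squares, and the robust reweighting and the Euclidean upgrade are omitted:

```python
import numpy as np

def reprojection_cov(W, M, S):
    """Per-point 2x2 covariances C_j of the reprojection errors, Eq. (6)."""
    F = W.shape[0] // 2
    E = (W - M @ S).reshape(F, 2, -1)       # residuals e_ij, shape (F, 2, P)
    return np.einsum('fip,fjp->pij', E, E)

def icrf(W, n_iter=10, eps=1e-9):
    """Simplified ICRF sketch on a centered 2F x P measurement matrix W."""
    # Steps (1)-(2): rank-3 SVD initialization of pseudo motion and shape.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    for _ in range(n_iter):
        # Steps (3)-(4): recompute per-point covariances from the residual.
        C = reprojection_cov(W, M, S)
        w = 1.0 / (np.sqrt(np.abs(np.linalg.det(C))) + eps)  # scalar certainty
        w = w / w.max()
        # Stand-in for Ref. 12: one sweep of weighted alternating least squares.
        S = np.linalg.lstsq(M, W, rcond=None)[0]
        M = (W * w) @ S.T @ np.linalg.inv((S * w) @ S.T)
    return M, S

# Sanity check on a rigid (rank-3) scene: ICRF should reproduce W exactly.
rng = np.random.default_rng(3)
F, P = 12, 8
W = rng.normal(size=(2 * F, 3)) @ rng.normal(size=(3, P))
W = W - W.mean(axis=1, keepdims=True)
M, S = icrf(W)
assert np.allclose(M @ S, W, atol=1e-6)
```

Weighting each column j of W by its certainty w_j makes the motion update fit the near-rigid (low-DoN) points more tightly, which is the intent of the reweighting.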


5. Experimental Results

To test and validate our method, we have conducted experiments on both synthetic data and a real video. Here we specify the properties of the synthetic data used for the experiments:

(1) Time-varying structures
• 30 points are generated randomly on a half sphere of radius 320.
• Each coordinate (x, y or z) of a point varies randomly in the interval [−L, L] with respect to time.
• For each point, L is determined depending on whether the point is near-rigid or nonrigid.
— The near-rigid points have L selected arbitrarily in the interval [1, 20].
— Similarly, the nonrigid points in [100, 200].
(2) A sequence composed of a total of 100 views is generated by the following procedure:
• The translation is generated randomly.
• The rotation is generated smoothly by interpolating between two randomly generated quaternion vectors. (Note that we use the unit quaternion to represent rotation.)
(3) Gaussian noise with σ = 1.0 is added independently to the projected image points.

A total of nine synthetic data sets are generated by varying the ratio of the number of near-rigid points to the number of all points from 0.1 to 0.9, and we generated 100 sequences for each set. We applied three methods to these data: rigid factorization, nonrigid factorization and ICRF. The robust ICRF method is omitted because it showed almost the same results as the ICRF method. The results are shown in Fig. 4; the left graph plots the RMS (Root Mean Square) rotation error (expressed in the unit quaternion) between the true and computed rotations, and the right graph shows the RMS shape error between the true and computed average shapes. (Note that the true average shape is given by averaging each synthetic point over time.) For the nonrigid factorization method, the computed average shape is obtained by averaging the resulting time-varying shapes.
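The generation recipe above can be sketched as follows (a minimal sketch: the quaternion helpers are ours, and the random translation is omitted so the data are effectively centered):

```python
import numpy as np

def quat_to_rot(q):
    """3x3 rotation matrix from a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions."""
    d = np.clip(np.dot(q0, q1), -1.0, 1.0)
    if d < 0:                                  # take the short arc
        q1, d = -q1, -d
    theta = np.arccos(d)
    if theta < 1e-8:
        return q0
    return (np.sin((1 - t)*theta)*q0 + np.sin(t*theta)*q1) / np.sin(theta)

rng = np.random.default_rng(4)
P, F, rigid_ratio = 30, 100, 0.5

# Points on a half sphere of radius 320.
v = rng.normal(size=(P, 3))
base = 320 * v / np.linalg.norm(v, axis=1, keepdims=True)
base[:, 2] = np.abs(base[:, 2])                # keep the upper half

# Per-point range L: near-rigid in [1, 20], nonrigid in [100, 200].
n_rigid = int(rigid_ratio * P)
L = np.concatenate([rng.uniform(1, 20, n_rigid),
                    rng.uniform(100, 200, P - n_rigid)])

# Smooth rotation between two random unit quaternions.
q0, q1 = rng.normal(size=(2, 4))
q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)

W = np.zeros((2 * F, P))                       # measurement matrix
for i in range(F):
    X = base + rng.uniform(-L, L, size=(3, P)).T      # per-axis deformation
    R = quat_to_rot(slerp(q0, q1, i / (F - 1)))[:2]   # orthographic 2x3
    W[2*i:2*i+2] = R @ X.T + rng.normal(scale=1.0, size=(2, P))  # noise
```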
The two graphs indicate that the ICRF method computes the motion and shape closer to the true ones than the other two factorization methods. We also plot the three XYZ-axis angles (or roll/pitch/yaw angles) over the frames of one sequence to compare the motion results with the ground-truth angles in Fig. 5. Due to lack of space, we only show the graphs for which the ratio of near-rigid points is set to a low value, 0.2, to emphasize the performance of the ICRF method. Note that the angles computed by the ICRF method are not only closer to the true angles but also smoother than those computed by the other two methods.

Fig. 4. The X-axis represents the ratio of near-rigid points among all points and the Y-axis the error: (a) the RMS unit quaternion error; (b) the RMS shape error.

Fig. 5. Plot of the roll/pitch/yaw angles computed by the three methods and the ground-truth angles, where the X-axis represents the frame number and the Y-axis the angle in radians: (a) roll angle; (b) pitch angle; (c) yaw angle.


We have also tested our algorithm on a real video of 200 frames that records the deforming face of a person who opens and closes his mouth while the head is rotated and translated simultaneously. Example frames are shown in Fig. 6. A total of 32 feature points are tracked using the KLT tracker.19 Although marks are drawn on the face to facilitate feature tracking, outliers exist, in particular around the mouth. Therefore, we applied the robust ICRF method instead of the ICRF. In Fig. 7, we illustrate the performance of the robust ICRF method by overlaying the changing covariances of the feature points during the iteration on the first frame. The covariances converge after only a few iterations (see also Fig. 8), producing the shape shown in Fig. 9. We observe that it approximates the true average face well; it does not show the protrusion of the mouth seen in the results in Figs. 2 and 3. Figure 8 shows that the RMS reprojection error between the tracked and reprojected points stabilizes at an error higher than that obtained by the rigid factorization method, as discussed in Sec. 4. We compare the motion results obtained by the three methods in Figs. 10 and 11; because we have no true motion with which to compare the results, we assess the performance qualitatively by displaying the motion over time. The XYZ angle variation is plotted in Fig. 10. Note that the robust ICRF method shows angle curves smoother than those computed by the other two methods. In Fig. 11, we display the motion by overlaying three orthogonal axes on the corresponding images. The three columns (left, middle and right) display the motion computed by the robust ICRF, rigid factorization, and nonrigid factorization, respectively. Note that frames 42 and 193 should have a similar pose in the video.
Although the poses of the three axes overlaid on the two frames in the left column are similar, those in the middle and right columns are not; in particular, the Z-axis directions are almost opposite to each other. This is because the motion results obtained by the two methods,

Fig. 6. Example frames of the real video, displayed in order from upper-left to lower-right; 30 feature points are tracked by the KLT tracker over 200 frames.


Fig. 7. Changing covariance ellipses during the robust ICRF iteration are overlaid on the first frame (the covariance scale is adjusted for visualization).

Fig. 8. Plot of the RMS reprojection error (W − MS); the X-axis represents the iteration number.


Slightly upper view

Side view

Fig. 9. Visualization of the approximate average shape obtained by the robust ICRF method. Compare the shapes to those in Figs. 2 and 3.

Fig. 10. Plot of the roll/pitch/yaw angles computed by the three methods, where the X-axis represents the frame number and the Y-axis the angle in radians: (a) roll angle; (b) pitch angle; (c) yaw angle.


Fig. 11. Rotation of the face represented by three orthogonal axes overlaid on the corresponding frames; the axes are aligned to the first frame, with the Z-axis orthogonal to the image plane and pointing inward.

i.e. the rigid and nonrigid factorization methods, are likely to be affected by the local variation of the face, e.g. the mouth area, as discussed in the introduction.

6. Conclusion and Future Work

In this paper, we studied the problem of obtaining the approximate average shape and motion of deforming objects using a monocular view. By investigating the internal structure of deforming objects, we have introduced the concept of the DoN (Degree of Nonrigidity). Based on this, we have proposed an iterative certainty reweighted factorization method and shown experimentally that it can be an important cue for resolving the ambiguity problem in monocular-based nonrigid recovery. For complete nonrigid recovery (not only the average shape and motion but also the time-varying shapes), we may use the motion estimate of the average shape obtained by our method as an initial motion estimate for the iterative optimization method in Ref. 22 or the nonlinear bundle adjustment in Ref. 17, hopefully producing more reasonable motion and time-varying structures of the objects. We remark that the motion and shape recovered by our method are only approximate because


nonrigidity is modeled only by the covariance. However, if the reprojection errors in Eq. (5) are Gaussian distributed, the covariances, or the DoNs, are good enough to estimate the average shape and motion accurately. In the future, we will concentrate on the problem of the complete nonrigid recovery of deforming objects; we expect our results to be useful for this problem.

References

1. J. Aggarwal, W. Liao, Q. Cai and B. Sabata, Nonrigid motion analysis: articulated and elastic motion, Comput. Vis. Imag. Underst. 70(2) (1998) 142–156.
2. R. Bajcsy, Multiresolution elastic matching, Comput. Vis. Graph. Imag. Process. 46 (1989) 1–21.
3. V. Blanz and T. Vetter, A morphable model for the synthesis of 3D faces, SIGGRAPH'99 (1999).
4. M. Brand, Morphable 3D models from video, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2001), pp. II: 456–463.
5. M. Brand and R. Bhotika, Flexible flow for 3D nonrigid tracking and shape recovery, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2001), pp. I: 315–322.
6. C. Bregler, A. Hertzmann and H. Biermann, Recovering non-rigid 3D shape from image streams, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2000), pp. II: 690–696.
7. A. Del Bue and L. Agapito, Non-rigid 3D shape recovery using stereo factorization, in Proc. Asian Conf. Computer Vision (2004).
8. C. Davatzikos, D. Shen, A. Mohamed and S. K. Kyriacou, A framework for predictive modeling of anatomical deformations, IEEE Trans. Med. Imag. 20(8) (2001) 836–843.
9. D. Decarlo and D. Metaxas, The integration of optical flow and deformable models with applications to human face shape and motion estimation, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (1996), pp. 231–238.
10. P. Huber, Robust Statistics (John Wiley and Sons, 1981).
11. M. Irani, Multi-frame optical flow estimation using subspace constraints, in Proc. Int. Conf. Computer Vision (1999), pp. 626–633.
12. M. Irani and P. Anandan, Factorization with uncertainty, in Proc. Eur. Conf. Computer Vision (2000).
13. M. Kass, A. Witkin and D. Terzopoulos, Snakes: active contour models, Int. J. Comput. Vis. 1(4) (1987) 312–331.
14. J. Martin, A. Pentland and R. Kikinis, Shape analysis of brain structures using physical and experimental modes, Technical Report TR-276, MIT (1994).
15. C. Nastar and N. Ayache, Classification of nonrigid motion in 3D images using physics-based vibration analysis, in Proc. IEEE Workshop on Biomedical Image Analysis (1994).
16. A. P. Pentland and B. Horowitz, Recovery of nonrigid motion and structure, IEEE Trans. Patt. Anal. Mach. Intell. 13(7) (1991) 730–742.
17. H. Aanæs and F. Kahl, Estimation of deformable structure and motion, in Workshop on Vision and Modelling of Dynamic Scenes, ECCV'02, Copenhagen, Denmark (2002).
18. S. Sclaroff and A. Pentland, Physically-based combinations of views: representing rigid and nonrigid motion, in Proc. IEEE Workshop on Nonrigid and Articulate Motion (1994).
19. J. Shi and C. Tomasi, Good features to track, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (1994), pp. 593–600.
20. C. Tomasi and T. Kanade, Shape and motion from image streams under orthography: a factorization method, Int. J. Comput. Vis. 9(2) (1992) 137–154.
21. L. Torresani and C. Bregler, Space-time tracking, in Proc. Eur. Conf. Computer Vision (2002), pp. I: 801.
22. L. Torresani, D. B. Yang, E. J. Alexander and C. Bregler, Tracking and modeling non-rigid objects with rank constraints, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2001), pp. I: 493–500.
23. S. Vedula, S. Baker, P. Rander, R. Collins and T. Kanade, Three-dimensional scene flow, in Proc. Int. Conf. Computer Vision (1999), pp. 722–729.
24. S. Vedula, S. Baker, S. Seitz and T. Kanade, Shape and motion carving in 6D, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (June 2000).

Taeone Kim received his B.S. and M.S. degrees in electrical and electronic engineering from POSTECH, Korea, in 1996 and 1998, respectively, where he is now in the Ph.D. program. His current research interests include augmented reality (AR), nonrigid structure and motion (SaM), rigid SaM and pattern recognition.

Ki-Sang Hong received his B.S. degree in electronic engineering from Seoul National University, Korea, in 1977, his M.S. degree in electrical and electronic engineering in 1979, and his Ph.D. in 1984 from KAIST, Korea. From 1984 to 1986, he was a researcher with the Korea Atomic Energy Research Institute and in 1986, he joined POSTECH, Korea, where he is currently a professor with the Division of Electrical and Computer Engineering. From 1988 to 1989, he was a visiting professor with the Robotics Institute at Carnegie Mellon University, Pittsburgh, Pennsylvania. His current research interests include computer vision, augmented reality, pattern recognition and synthetic aperture radar image processing.