Shape Models from Image Sequences

University of Leeds

SCHOOL OF COMPUTER STUDIES RESEARCH REPORT SERIES Report 93.37

Shape Models from Image Sequences by

X Shen & D C Hogg

Division of Artificial Intelligence October 1993

Abstract This paper describes an adaptive method for the recovery of 3-D shape models from sequences of images. A 3-D surface model, initialised to be spherical, is progressively deformed under the action of simulated external forces arising from the profile of the target object in successive images, obtained using a low-level motion segmentation scheme. Intrinsic constraints encourage the model to deform smoothly and to remain symmetrical about a plane parallel to the direction of motion. Experimental results are presented for recovery of shape models of vehicles. The resulting models may be used for several purposes, including tracking, recognition and visualisation.

1 Introduction

Shape is one of the main attributes of an object and is important in applications such as object tracking, recognition and visualisation [1, 2, 3, 4, 5]. A basic goal of computer vision is to obtain shape models for 3-D objects directly from images. Shape recovery based on a single 2-D image of an object is difficult due to the impoverished information for deducing the 3-D structure of the visible object surface and the lack of information about hidden parts of the surface. In practice, multiple images appear to be necessary, showing the object from different views. The problem is to synthesise shape information available from successive images into a single shape model. In this paper, we present an adaptive method for obtaining shape models of objects moving autonomously in natural environments (e.g. the vehicle in Fig. 1). We make two major assumptions about the objects to be modelled:

- they move on a ground plane,
- they are rigid and symmetrical about a plane parallel to the direction of motion.

These assumptions are reasonable for vehicle traffic scenes. The method centres on the use of a 3-D physically-based surface model which is progressively deformed under the action of external forces, driving the model profile towards the object profile extracted from the image. Suitable intrinsic constraints maintain a priori assumptions about the object shape. The model resulting at the end of an input sequence represents the 3-D object shape. Correct alignment of the model and object profiles in successive images is achieved by pre-processing the sequence using a motion segmentation algorithm to detect and track objects, and ultimately to infer their poses in 3-D.

Figure 1: A typical scene image from a traffic monitoring system

The model is initialised to a sphere and deformed incrementally for each image in the sequence. On each step, the model surface deforms elastically away from its current shape.

The remainder of the paper is organised as follows: Section 2 discusses related approaches to shape recovery. Section 3 discusses the low-level processing strategy used to obtain object profiles and model poses for successive frames. Section 4 describes the deformable model and its intrinsic constraints. Determination of the external forces is discussed in Section 5. Section 6 gives implementation details. Experimental results are presented in Section 7, and the conclusion is given in Section 8.

2 Related work

There is a considerable body of work on the reconstruction of the visible surfaces of objects from their images (e.g. [6, 7, 8]). Typically, parametric functions are used to represent patches of the object surface, and the parameters of these functions are estimated using features extracted from the image. Composing the different surface patches characterised by these functions into an enclosed surface or volume model is a difficult problem.

One approach is to infer the 3-D locations of scene points from the motion of point features (e.g. corners) in the image. Surfaces are laid onto these points either by triangulation (e.g. [9]) or using parametric surface functions. For example, Westphal and Nagel [10] produce polyhedral models by connecting prominent 3-D points with line segments. Connecting these prominent points so as to correctly reflect the corresponding surface characteristics of the object is hard. The resulting 3-D descriptions tend to lack smoothness due to irregular distributions of the prominent points on the object. The approach also depends crucially on the correct extraction of these points from images.

Stenstrom and Connolly [11] describe a method for producing a 3-D solid model of an object from multiple 2-D images. For each image, a 3-D bounding volume is produced by extracting the 2-D profile information. This bounding volume is intersected with the previously obtained bounding volume to gradually refine the model. The method is applied in a controlled environment so that images are obtained from predetermined known views.

Some methods have been proposed for recovering object shape from range data or other 3-D data using generic or deformable models. Solina and Bajcsy [12] used superquadrics as a generic model to recover the shape of simple objects from range data. Miller, et al [13] use a geometrically deformable model for recovering object shape from a set of 3-D data. Recently, physically based models have been proposed for modelling rigid and non-rigid objects [14, 15, 16]. By incorporating dynamics, such a model can accommodate both the object's shape and its motion; shape recovery can be achieved by applying suitable external forces to the model. At present, these methods are generally applied to situations in which range data is available [17, 18]. An exception is the work of Terzopoulos, et al [19], in which 3-D shape information is extracted from profile data using the assumption that the object is a surface of revolution. Our work extends their idea to cope with arbitrary object shape through deformation of a physically based model, using profile data obtained in multiple views and object pose information extracted through low-level preprocessing of the given input image sequence.

3 Low-level processing

3.1 Coordinate systems

A world coordinate system X is chosen so that the ground plane on which the objects are moving is given by Z = 0 (Fig. 2). Scene points are expressed in homogeneous form, so that (X, Y, Z, λ), λ ≠ 0, represents the point (X, Y, Z). The transformation from world coordinates to pixel coordinates x of the image, again expressed in homogeneous form, is characterised by a 3 × 4 calibration matrix C:

    x = C X    (1)

This matrix is assumed to be known. It incorporates the rigid transformation from world coordinates to camera coordinates, perspective projection, and an affine transformation to pixel coordinates of the image (e.g. to accommodate non-square

Figure 2: Definitions of coordinate systems

pixels and focal scaling). Non-affine intrinsic camera parameters, such as spherical distortion, are assumed to have been removed in advance. For our experiments, the calibration matrix is found automatically using statistical regularity in the projected sizes of moving objects (see Hogg, et al [20] for details). Finally, the model is expressed in a model-centred coordinate system X′.
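Equation (1) can be exercised with a small numeric sketch. The calibration matrix below is purely illustrative (in the paper, C is recovered automatically [20]); it composes hypothetical intrinsics K with a hypothetical rigid world-to-camera transform:

```python
import numpy as np

def project(C, X_world):
    """Project a 3-D world point to pixel coordinates via a 3x4
    calibration matrix C, using homogeneous coordinates (eq. 1)."""
    X_h = np.append(np.asarray(X_world, dtype=float), 1.0)  # (X, Y, Z, 1)
    x_h = C @ X_h                                           # homogeneous pixel
    return x_h[:2] / x_h[2]                                 # dehomogenise

# Hypothetical calibration: simple intrinsics times a rigid transform.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R_t = np.array([[1.0, 0.0, 0.0, 0.0],     # rigid world-to-camera transform
                [0.0, 0.0, -1.0, 2.0],
                [0.0, 1.0, 0.0, 10.0]])
C = K @ R_t

px = project(C, [0.0, 0.0, 0.0])   # a ground-plane point (Z = 0)
print(px)
```

The same matrix C is reused, fixed for the whole sequence, whenever a model or scene point has to be compared with image data.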

3.2 Object detection

Given an image sequence depicting an unknown target object in motion, our objective is to infer its shape from profiles obtained from the different views in successive images. As a first step, the object is segmented in each image by differencing the image with a background image (see Baumberg and Hogg [21] for details), and applying dilation and erosion operations to the thresholded difference image. This process returns a binary image with the foreground pixels just covering the projection of the object. The profile of the object is extracted by simply tracking around the outside of the foreground pixels to give a continuous closed curve. Fig. 3 shows some results from this processing.
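The segmentation step can be sketched as follows. This is a minimal stand-in for the pipeline described above, not the authors' implementation; the threshold value and morphology iteration counts are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def segment_object(frame, background, thresh=30):
    """Difference against a background image, threshold, then
    dilate/erode (a morphological closing) to clean up the mask.
    `thresh` is an illustrative value, not taken from the paper."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    mask = diff > thresh
    mask = binary_dilation(mask, iterations=2)   # close small gaps
    mask = binary_erosion(mask, iterations=2)    # restore the extent
    return mask

def profile(mask):
    """Boundary of the foreground region: foreground pixels with at
    least one background neighbour (a proxy for profile tracking)."""
    return mask & ~binary_erosion(mask)

# Toy example: a bright square object on a dark background.
bg = np.zeros((20, 20), dtype=np.uint8)
frame = bg.copy()
frame[5:15, 5:15] = 200
mask = segment_object(frame, bg)
print(mask.sum(), profile(mask).sum())
```

The real system orders the boundary pixels into a continuous closed curve by tracking around the region; the set-difference above only identifies which pixels lie on that curve.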

3.3 Pose establishment for the model

The method depends on being able to pose the model consistently over the object for each image in a given sequence. This is essential for correctly applying external forces to the model. The centroid of the object region in each image is used to estimate a sequence of poses for the model in 3-D. These poses are expressed in terms of the positions and orientations of the model coordinate system within world coordinates, and are represented by rigid transformations.

Figure 3: Some results from object detection. (1) the background image obtained using a median filter; (2) an object image; (3) extracted object region; (4) the region boundary overlaid on the original image

For the ith image in the sequence, the model center X_i is posed so that it projects onto the centroid x_i of the segmented object in the image, i.e. X_i lies on the line given by x_i = C X_i (Fig. 4). In general, X_i will not correspond to a fixed point in the object, but we have found, at least in our experiments, that it remains sufficiently localised within the object to enable satisfactory shape recovery. At present, the height h of the model center above the ground plane is assumed known and fixed throughout a sequence. Thus, X_i can be obtained by solving the following equations:

    x_i = C X_i
    Z_i = h    (2)

We have investigated several ways in which h may be established automatically.

Figure 4: Method for posing the model

It may also be possible to avoid locating X_i in depth explicitly through the use of a weak-perspective camera model for the deformation stage. The model coordinate system is orientated so that the X′-axis is aligned with the direction X_{i+1} − X_i, and the Z′-axis is normal to the ground plane.
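With Z_i fixed at h, equation (2) reduces to a 2 × 2 linear system in (X, Y). A sketch of that solve follows; the calibration matrix, height, and test point are hypothetical values used only for a round-trip check:

```python
import numpy as np

def pose_from_centroid(C, centroid, h):
    """Recover the model center X_i from its image centroid x_i and the
    known height h above the ground plane, by solving eq. (2):
        x_i = C X_i,  Z_i = h."""
    u, v = centroid
    c1, c2, c3 = C  # rows of the 3x4 calibration matrix
    # (c1 - u*c3).(X, Y, h, 1) = 0  and  (c2 - v*c3).(X, Y, h, 1) = 0
    r1 = c1 - u * c3
    r2 = c2 - v * c3
    A = np.array([[r1[0], r1[1]],
                  [r2[0], r2[1]]])
    b = -np.array([r1[2] * h + r1[3],
                   r2[2] * h + r2[3]])
    X, Y = np.linalg.solve(A, b)
    return np.array([X, Y, h])

# Round-trip check with a hypothetical calibration matrix.
C = np.array([[800.0, 320.0, 0.0, 3200.0],
              [0.0, 240.0, -800.0, 4000.0],
              [0.0, 1.0, 0.0, 10.0]])
X_true = np.array([1.0, 5.0, 0.5])
x_h = C @ np.append(X_true, 1.0)
centroid = x_h[:2] / x_h[2]
X_rec = pose_from_centroid(C, centroid, 0.5)
print(X_rec)
```

Intersecting the back-projected centroid ray with the plane Z = h in this way gives one pose per frame; the orientation then comes from the inter-frame displacement of these centers.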

4 Deformable model

In this section, we describe the deformable model and the intrinsic constraint forces coupled to it.

4.1 Definition of the deformable model

The physically based model is a simple closed surface represented in parametric form as r(u, v), where (u, v) ∈ [0, 1]² are the material coordinates (Fig. 5). The surface has the following boundary conditions:

    r(u, 0) = r(u, 1),    ∂r/∂v|(u,0) = ∂r/∂v|(u,1)
    r(0, v) = r(0, 0),    r(1, v) = r(1, 0)    (3)

i.e. the model is "seamed" by joining the curves v = 0 and v = 1, and has two poles. The model is progressively deformed for each image of the sequence according to the extracted profile information. Let r̂(u, v) be the model resulting from the previous frame (the model prior to processing the first frame is initialised to a sphere). r̂(u, v) represents the best estimate of object shape to date. External forces derived from

Figure 5: The deformable model

the object profile in the current image are applied to give an elastic deformation to r̂(u, v). The model being deformed in the current frame, r(u, v), is expressed as

    r(u, v) = r̂(u, v) + d(u, v)    (4)

where d(u, v) is the displacement away from the `reference' shape r̂(u, v). Provided the model is not subject to any rigid shift during the deformation, its dynamic behaviour is governed by the following Lagrangian equations of motion [15, 22]:

    μ d̈ + γ ḋ + δ_d E(d) = f + g    (5)

where μ and γ are the mass density and damping density of the model; f is the net external force acting on the model, and g is the inertial force of the model; E(d) is the deformation energy produced by the deformation of the model away from its original position. The variational derivative δ_d E(d) is thus the elastic force so produced. Since our goal is to recover the shape of the object, we are not interested in how the model is continuously deformed. We only consider the static situation, i.e. we assume the mass density and damping density are both zero. The governing equation is thus

    δ_d E(d) = f(r)    (6)

The deformation energy E(d) used here is the membrane deformation energy suggested by Terzopoulos [17]:

    E(d) = (1/2) ∫₀¹ ∫₀¹ [ w₁ ( ‖∂d/∂u‖² + ‖∂d/∂v‖² ) + w₀ ‖d‖² ] du dv    (7)

where w₀ and w₁ are weighting parameters used to control the local magnitude and variation of the deformation respectively. Taking w₀ and w₁ as constants, the variational derivative is then [23]

    δ_d E(d) = w₀ d(u, v) − w₁ ( ∂²d/∂u² + ∂²d/∂v² )    (8)

This deformation energy encourages each stepwise deformation to be smooth but does not prevent creases from developing in the surface over several frames should this be justified by the profile data.
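A minimal numerical sketch of equation (8), using central finite differences on a grid that is periodic in u (the seam) and clamped in v. The grid sizes and weights are illustrative, and the paper's mesh additionally treats the two poles specially:

```python
import numpy as np

def elastic_force(d, w0, w1, hu, hv):
    """Discrete variational derivative of the membrane energy (eq. 8):
        delta_d E = w0 * d - w1 * (d_uu + d_vv),
    with second derivatives approximated by central differences."""
    d_uu = (np.roll(d, 1, axis=0) - 2 * d + np.roll(d, -1, axis=0)) / hu**2
    d_vv = np.zeros_like(d)
    d_vv[:, 1:-1] = (d[:, :-2] - 2 * d[:, 1:-1] + d[:, 2:]) / hv**2
    return w0 * d - w1 * (d_uu + d_vv)

# A smooth sinusoidal displacement field: for such a field the elastic
# force is (approximately) a positive multiple of d itself, i.e. the
# energy resists any displacement away from the reference shape.
M, N = 32, 17
u = np.linspace(0, 1, M, endpoint=False)[:, None]
v = np.linspace(0, 1, N)[None, :]
d = np.sin(2 * np.pi * u) * np.sin(np.pi * v)
f = elastic_force(d, w0=0.1, w1=1.0, hu=1.0 / M, hv=1.0 / N)
print(f.shape)
```

Because only first derivatives appear in the energy (7), the penalty on any single step is on the *gradient* of that step's displacement; a crease can still accumulate over many frames, as the text notes.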

4.2 Intrinsic constraint forces

Intrinsic constraints reflect the general assumptions about the final model shape. They endow the model with particular shape characteristics. Regular spacing of the model's `latitudes' is maintained by a regularity constraint. A flatness constraint induces the model to be locally flat in certain parts. Finally, the model is constrained to be symmetrical about a vertical plane passing through the two poles. In order to impose these constraints conveniently, we pose the initial model in such a way that the line through its two poles is parallel to the ground plane. In the following discussion, we call this line the base axis of the model.

4.2.1 Regularity constraint

The regularity constraint force maintains a uniform deformation of the model in the direction of the base axis. The idea is to encourage sample points in the discretisation of the model to be uniformly distributed. Let r̂(u, v) and r(u, v) be the reference model and current model respectively. Let n̂ and n be the unit direction vectors of the base axes of the reference model and the current model respectively. For each point r(u, v), the intrinsic force attracts it in the direction n to a position r(u, v) + k n which satisfies the following equation:

    ( r(u, v) + k n − r(0, 0) ) · n / D = ( r̂(u, v) − r̂(0, 0) ) · n̂ / D̂    (9)

where D̂ is the distance between r̂(0, 0) and r̂(1, 0) (the poles of r̂(u, v)) and D is the distance between r(0, 0) and r(1, 0) (the poles of r(u, v)). The constraint force f_R is defined as follows:

    f_R = ε₁ k n    (10)

where the constant factor ε₁ controls the strength of the force. In the discrete situation, this intrinsic constraint force encourages the latitudes of the model to be regularly spaced along the base axis, thereby distributing sample points over the shape and avoiding local `bunching'.
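Solving equation (9) for k gives k = D (r̂ − r̂(0,0))·n̂ / D̂ − (r − r(0,0))·n, i.e. the axial correction needed to restore a point's fractional position along the base axis. A sketch, with ε₁ standing in for the (OCR-lost) strength constant and all coordinates hypothetical:

```python
import numpy as np

def regularity_force(r, r0, r1, n, rh, rh0, rh1, nh, eps1=1.0):
    """Regularity constraint force (eqs. 9-10): the point r is pulled
    along the current base-axis direction n until its fractional
    position along the axis matches that of the reference point rh."""
    D = np.linalg.norm(r1 - r0)      # pole-to-pole distance, current model
    Dh = np.linalg.norm(rh1 - rh0)   # pole-to-pole distance, reference model
    target = D * np.dot(rh - rh0, nh) / Dh   # desired axial coordinate
    k = target - np.dot(r - r0, n)           # axial correction (eq. 9)
    return eps1 * k * n                      # eq. 10

# Toy example: the current model stretched to twice the reference length.
n = nh = np.array([1.0, 0.0, 0.0])
rh0, rh1 = np.zeros(3), np.array([1.0, 0.0, 0.0])
r0, r1 = np.zeros(3), np.array([2.0, 0.0, 0.0])
rh = np.array([0.25, 0.3, 0.0])  # a quarter of the way along the reference axis
r = np.array([0.8, 0.6, 0.0])    # but 40% of the way along the current axis
fr = regularity_force(r, r0, r1, n, rh, rh0, rh1, nh)
print(fr)
```

The force acts purely along the base axis, so it redistributes latitudes without fighting the radial deformation driven by the external forces.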

Figure 6: Determination of the flatness constraint force

4.2.2 Symmetry constraint

The object is assumed to be symmetrical about a vertical plane parallel to its direction of motion. The symmetry constraint encourages the model to deform symmetrically. In order to simplify the computation of symmetrical points on the model, we pose the initial model in such a way that the points with material coordinates u = 0 and u = 1/2 lie on the symmetry plane. During shape recovery, the model follows the motion of the object and is deformed symmetrically; therefore, the symmetry plane of the object will always pass through the points with material coordinates u = 0 and u = 1/2. We implement this constraint indirectly by imposing the external forces symmetrically (see Section 5).

4.2.3 Flatness constraint

The flatness constraint force is coupled to the model in order to impose a tendency to flatness on particular parts of the model without preventing sharp corners from developing elsewhere. The force encourages the surface local to the intersection with the symmetry plane to be flat. We call these local surfaces top surfaces and bottom surfaces according to whether they are above or below the plane which passes through the two poles and is orthogonal to the symmetry plane. This force was introduced to remove a tendency for creases to develop in the surface at the intersection with the symmetry plane. Further work is required to generalise the force without discouraging desired creases from occurring in the surface.

Let r(u, v_i), a ≤ v_i ≤ b, be the points on a top/bottom surface, and let r̄(u, v_i) be the orthogonal projection of r(u, v_i) onto the base axis. Let m be a unit vector lying in the symmetry plane and perpendicular to the base axis, directed away from the base axis (Fig. 6). To encourage a local flatness in the direction normal to the symmetry plane, r(u, v_i) is subjected to the following intrinsic force:

    f_F = ε₂ l s    (11)

where

    s = ( r(u, v_i) − r̄(u, v_i) ) / ‖ r(u, v_i) − r̄(u, v_i) ‖    (12)

and l satisfies the following equation:

    ( r(u, v_i) − r̄(u, v_i) + l s ) · m = D₃

where D₃ = min_{a ≤ v_i ≤ b} | ( r(u, v_i) − r̄(u, v_i) ) · m |.

In the numerical implementation (see Section 6), two lists recording the top and bottom nodes respectively are maintained during the deformation operations. A node is considered a top/bottom node if it has a top/bottom node as a neighbour on the same latitude, and its tangent with respect to v is within a preselected range. The initial top/bottom nodes are preset to the nodes located on the symmetry plane.
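Under this reconstruction of equations (11)-(12), each point of a top/bottom surface is pushed along its radial direction s until its height along m equals the group minimum D₃. A sketch, with ε₂ standing in for the lost strength constant and the coordinates hypothetical:

```python
import numpy as np

def flatness_forces(pts, axis_pt, axis_dir, m, eps2=1.0):
    """Flatness constraint forces (eqs. 11-12) for the points of one
    top/bottom surface.  axis_pt/axis_dir define the base axis; m is a
    unit vector in the symmetry plane, perpendicular to the axis."""
    axis_dir = axis_dir / np.linalg.norm(axis_dir)
    # orthogonal projections of the points onto the base axis
    proj = [axis_pt + np.dot(p - axis_pt, axis_dir) * axis_dir for p in pts]
    radial = [p - q for p, q in zip(pts, proj)]           # r - r_bar
    D3 = min(abs(np.dot(rv, m)) for rv in radial)          # minimum height
    forces = []
    for rv in radial:
        s = rv / np.linalg.norm(rv)                        # eq. 12
        l = (D3 - np.dot(rv, m)) / np.dot(s, m)            # (rv + l s).m = D3
        forces.append(eps2 * l * s)                        # eq. 11
    return forces

# Toy ring on a top surface: base axis along x, m pointing up (+z).
m = np.array([0.0, 0.0, 1.0])
pts = [np.array([0.0, 0.0, 1.0]),
       np.array([1.0, 0.0, 1.5]),
       np.array([2.0, 0.0, 1.2])]
fs = flatness_forces(pts, np.zeros(3), np.array([1.0, 0.0, 0.0]), m)
print(np.round(fs, 3))
```

Taking the minimum height as the common target means the force only ever pulls points down towards the flattest member of the group, never pushes the surface outwards.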

5 External force

For each point on the model, we need to determine how much, and in which direction, the external force acts on it. Intuitively, the force should be proportional to the distance between the data points on the object and the corresponding points on the model. This is relatively straightforward when 3-D data is available, and corresponding 3-D model points affected by forces exerted from the data points can be determined [18, 24]. In our situation, where no explicit 3-D information is available, we choose the external forces in such a way that they draw the model profile towards the object profile.

The set P of points on the model which project onto the model profile is computed. The external forces are applied only to these points. Other points on the model are adjusted by virtue of the intrinsic forces. The external force on a point X in P is computed as follows (Fig. 7). Let X̄ be the orthogonal projection of X onto the base axis (or the model center if X is a pole). Let x and x̄ be the images of X and X̄ respectively:

    x = C X
    x̄ = C X̄    (13)

Assume x_p is the intersection of the line segment x x̄ with the object profile. Under perspective projection, there must exist a point X_p on the line segment X X̄ which

Figure 7: Method for determining the external force

projects to x_p. X_p satisfies the following:

    X_p = X̄ + (X − X̄) t
    x_p = C X_p    (14)

for some t. These seven equations are sufficient to solve for X_p. For a non-convex object, there may be more than one intersection point; in that case, we take the one farthest from x̄. The Euclidean distance from X to X_p is used to determine the external force acting on the point X, in the direction X_p − X. We define the external force as follows:

    f_E = ε₃ ‖ X_p − X ‖    (15)

where ε₃ is a constant factor which controls the strength of the force, and ‖·‖ is the standard Euclidean norm. This force is transformed into the model coordinate system before being applied to the model. In the discrete situation, x_p is determined by tracking a discrete line segment from x to x̄, finding the intersection point and projecting it onto the line x x̄.

The external forces are applied to the model symmetrically. For each point in P, we apply the symmetrical force to its corresponding point. It may happen that both symmetrical points are in P and are influenced by different external forces, due to noise in the extraction of the object profile. In this case, we simply use the force with the smaller magnitude. The net force applied to the model is now

    f = f_R + f_F + f_E    (16)
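Substituting the first line of (14) into the second makes the projection condition linear in the single scalar t, so X_p can be recovered directly. A sketch with a hypothetical calibration matrix, solving the two pixel equations for t in a least-squares sense:

```python
import numpy as np

def backproject_on_segment(C, X, Xbar, xp):
    """Solve eq. (14): find X_p = Xbar + (X - Xbar)*t on the segment
    X-Xbar whose image under the 3x4 matrix C is the profile point x_p."""
    Xh = np.append(X, 1.0)
    Xbh = np.append(Xbar, 1.0)
    a = C @ Xbh             # homogeneous image of Xbar
    b = C @ (Xh - Xbh)      # homogeneous direction term
    u, v = xp
    # Dehomogenising (a + t b) and equating to (u, v) gives:
    #   t (b0 - u b2) = u a2 - a0,   t (b1 - v b2) = v a2 - a1
    A = np.array([b[0] - u * b[2], b[1] - v * b[2]])
    rhs = np.array([u * a[2] - a[0], v * a[2] - a[1]])
    t = A @ rhs / (A @ A)   # least-squares solution for the scalar t
    return Xbar + (X - Xbar) * t

# Round-trip check with a hypothetical calibration matrix.
C = np.array([[800.0, 320.0, 0.0, 3200.0],
              [0.0, 240.0, -800.0, 4000.0],
              [0.0, 1.0, 0.0, 10.0]])
X = np.array([1.0, 5.0, 1.0])       # a model point on the profile
Xbar = np.array([1.0, 5.0, 0.0])    # its projection onto the base axis
Xp_true = Xbar + (X - Xbar) * 0.4
xh = C @ np.append(Xp_true, 1.0)
xp = xh[:2] / xh[2]
Xp_rec = backproject_on_segment(C, X, Xbar, xp)
print(Xp_rec)
```

Since X_p is constrained to the segment X X̄, the depth ambiguity of a single view is resolved by the model itself rather than by stereo or range data.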

6 Implementation

The system (6) is discretised for numerical simulation. We use the same solution method as Terzopoulos and Witkin [15]. The domain 0 ≤ u, v ≤ 1 of the material coordinates is discretised into a regular M × N mesh of nodes (in our experiments, M = 32, N = 17). Nodes are indexed by integers (m, n), with 0 ≤ m ≤ M − 1 and 0 ≤ n ≤ N. Using finite differences to approximate the second derivatives, the discrete form of equation (8) is

    δ_d E(d) ≈ −(w₁/h_m²) d(m−1, n) − (w₁/h_m²) d(m+1, n) − (w₁/h_n²) d(m, n−1) − (w₁/h_n²) d(m, n+1)
               + [ w₀ + 2 w₁ ( 1/h_m² + 1/h_n² ) ] d(m, n)    (17)

By collecting the components of d(m, n) and f(r(m, n)) into vectors, we have the following discrete approximation of equation (6):

    K d = f(r)    (18)

where

    K = diag(K_x, K_y, K_z),    d = (d_x, d_y, d_z),    f(r) = (f_x(r), f_y(r), f_z(r))

K_x = K_y = K_z is an MN × MN matrix; it is symmetric and sparse. d_x, d_y, d_z, f_x(r), f_y(r), f_z(r) are all MN-vectors. K is called the stiffness matrix.

Since the system is non-linear, we cannot find d in one step. Consequently, the following evolution equations are used:

    K dⁿ = f(rⁿ)
    rⁿ⁺¹ = rⁿ + dⁿ,    n = 1, 2, ...    (19)

At each iteration, the model r evolves to a new shape according to the computed displacement d. Taking the evolved model as the new model, the corresponding external forces are computed again, and the next deformation is performed by applying the governing equation to this new model.

The stiffness matrix K is symmetric and positive definite. Equation (18) is solved using an LU decomposition of K. As the parameters w₀ and w₁ are constants, the LU decomposition only has to be performed once during the whole shape recovery operation. Since the external forces are proportional to the distance between the object surface and the model surface, quick convergence can be achieved by increasing the parameter ε₃, with the tradeoff that the quality of the final shape model may be sacrificed.
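The assembly of K from equation (17) and its one-off factorisation can be sketched as follows. This is a simplified mesh, wrapping in m (the seamed direction) and clamping the ends in n, whereas the paper's mesh additionally treats the two poles specially; the weights are illustrative:

```python
import numpy as np
from scipy.sparse import lil_matrix, csc_matrix
from scipy.sparse.linalg import splu

def stiffness_matrix(M, N, w0, w1, hm, hn):
    """Assemble the sparse MN x MN stiffness matrix of eqs. (17)-(18)
    for one coordinate component (K_x = K_y = K_z)."""
    K = lil_matrix((M * N, M * N))
    diag = w0 + 2 * w1 * (1.0 / hm**2 + 1.0 / hn**2)
    for m in range(M):
        for n in range(N):
            i = m * N + n
            K[i, i] = diag
            K[i, ((m - 1) % M) * N + n] += -w1 / hm**2   # seam wraps in m
            K[i, ((m + 1) % M) * N + n] += -w1 / hm**2
            if n > 0:
                K[i, m * N + n - 1] += -w1 / hn**2
            if n < N - 1:
                K[i, m * N + n + 1] += -w1 / hn**2
    return csc_matrix(K)

M, N = 32, 17
K = stiffness_matrix(M, N, w0=0.1, w1=1.0, hm=1.0 / M, hn=1.0 / N)
lu = splu(K)                    # factorise once, reuse every iteration
f = np.random.default_rng(0).normal(size=M * N)  # stand-in external force
d = lu.solve(f)                 # one displacement step of eq. (19)
print(np.allclose(K @ d, f))
```

Because w₀ > 0 adds a positive diagonal to a (positive semi-definite) discrete Laplacian, K is positive definite, and the factorisation can indeed be computed once and reused across all iterations and frames.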

Figure 8: Four object images with the object at different positions and orientations (frames 4, 10, 16 and 24)

7 Experimental results

Results are presented for a sequence of 24 images depicting a car turning into a parking space. Fig. 8 shows frames 4, 10, 16 and 24. Fig. 9 shows the initial model and the intermediate states of the model after processing of the same images; the model at frame 24 is also the final model. These intermediate models are shown in the same poses as the object. Fig. 10 shows the final model from different views. Fig. 11 shows the model at frames 16 and 24 overlapping the object in the image. From these results, it can be seen that the latitudes of the model are regularly spaced due to the action of the regularity constraint force. The final model is a plausible representation of the vehicle in the scene and should be sufficiently detailed and accurate for several purposes.

Figure 9: Initial model and intermediate models corresponding to the images in Figure 8; the model at frame 24 is the final model

8 Conclusion

We have proposed a method for generating 3-D object models from 2-D images using a deformable model coupled with suitable intrinsic constraints. Under the action of simulated external forces, the model is deformed gradually towards the object shape as it follows the motion of the object. The simulated external forces are based on the object profile, which is relatively easily extracted from the image. Intrinsic constraints encourage the model to have certain desired shape characteristics. The deformation energy (7) ensures that the individual deformations at each step are smooth, but these may accumulate over several frames to allow creases to emerge. The result is a 3-D model suitable for use in a variety of applications. We are currently investigating the use of such models to support robust object tracking.

Acknowledgements

We wish to thank M. Fa, N. Efford, S. Fletcher, D. O'Brien, A. Baumberg, S. Butterfield, and D. Ranyard for their help and discussions. X. Shen gratefully acknowledges financial support from the Chinese government and the British Council.

Figure 10: Final model in different orientations

References

[1] P. J. Besl and R. C. Jain. Three-dimensional Object Recognition. ACM Computing Surveys, 17(1):75-145, 1985.

[2] R. T. Chin and C. R. Dyer. Model-based Recognition in Robot Vision. ACM Computing Surveys, 18(1):67-108, 1986.

[3] D. Hogg. Shape in Machine Vision. Image and Vision Computing, 11(6):309-316, 1993.

[4] A. D. Worrall, R. F. Marslin, G. D. Sullivan, and K. D. Baker. Model-based Tracking. In The British Machine Vision Conference, pages 311-318, Glasgow, U.K., 1991.

[5] D. G. Lowe. Robust Model-based Motion Tracking Through the Integration of Search and Estimation. International Journal of Computer Vision, 8(2):113-122, 1991.

Figure 11: Models overlapping the object in the image. (1) model at frame 16; (2) model at frame 24

[6] R. M. Bolle and B. C. Vemuri. On Three-dimensional Surface Reconstruction Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-13(1):1-13, 1991.

[7] S. S. Sinha and B. G. Schunck. A Two-Stage Algorithm for Discontinuity-Preserving Surface Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-14(1):37-55, 1992.

[8] Y. F. Wang and J. Wang. Surface Reconstruction Using Deformable Models with Interior and Boundary Constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-14(5):573-579, 1992.

[9] D. Charnley and R. Blissett. Surface Reconstruction from Outdoor Image Sequences. Image and Vision Computing, 7(1):10-16, 1989.

[10] H. Westphal and H.-H. Nagel. Toward the Derivation of Three-dimensional Descriptions from Image Sequences for Nonconvex Moving Objects. Computer Vision, Graphics and Image Processing, 34:302-320, 1986.

[11] J. R. Stenstrom and C. I. Connolly. Constructing Object Models from Multiple Images. International Journal of Computer Vision, 9(3):185-212, 1992.


[12] F. Solina and R. Bajcsy. Recovery of Parametric Models from Range Images: The Case for Superquadrics with Global Deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-12(2):131-147, 1990.

[13] J. V. Miller, D. E. Breen, W. E. Lorensen, R. M. O'Bara, and M. J. Wozny. Geometrically Deformed Models: A Method for Extracting Closed Geometric Models from Volume Data. Computer Graphics, 25(4):217-226, 1991.

[14] D. Terzopoulos, J. Platt, A. Barr, and K. Fleischer. Elastically Deformable Models. ACM Computer Graphics, 21(4):205-214, 1987.

[15] D. Terzopoulos and A. Witkin. Physically-based Models with Rigid and Deformable Components. IEEE Computer Graphics and Applications, 8(6):41-51, 1988.

[16] A. P. Pentland. Perceptual Organization and the Representation of Natural Form. Artificial Intelligence, 28:293-331, 1986.

[17] D. Terzopoulos and D. Metaxas. Dynamic 3D Models with Local and Global Deformations: Deformable Superquadrics. In Proceedings of the Third International Conference on Computer Vision (ICCV 90), pages 606-615, Osaka, Japan, 1990.

[18] A. Pentland and S. Sclaroff. Closed-form Solutions for Physically Based Shape Modelling and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-13(7):715-729, 1991.

[19] D. Terzopoulos, A. Witkin, and M. Kass. Constraints on Deformable Models: Recovering 3D Shape and Nonrigid Motion. Artificial Intelligence, 36:91-123, 1988.

[20] D. Hogg, D. Young, and L-Q. Xu. Statistical Regularity in Motion Sequences. Technical report, School of Computer Studies, The University of Leeds, 1993.

[21] A. Baumberg and D. Hogg. Learning Flexible Models from Image Sequences, 1993. Submitted to ECCV.

[22] H. Goldstein. Classical Mechanics. Addison-Wesley, 1950.

[23] R. Courant and D. Hilbert. Methods of Mathematical Physics II. Interscience, London, 1953.

[24] A. G. and C. Liang. 3-D Model-data Correspondence and Nonrigid Deformation. In CVPR'93, 1993.