IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 6, NO. 9, SEPTEMBER 1997
Closed-Form Connectivity-Preserving Solutions for Motion Compensation Using 2-D Meshes
Yucel Altunbasak and A. Murat Tekalp, Senior Member, IEEE
Abstract— Motion compensation using two-dimensional (2-D) mesh models requires computation of the parameters of a spatial transformation for each mesh element (patch). It is well known that the parameters of an affine (bilinear or perspective) mapping can be uniquely estimated from three (four) point correspondences (at the vertices of a triangular or quadrilateral mesh element). On the other hand, overdetermined solutions using more than the required minimum number of point correspondences provide increased robustness against correspondence-estimation errors; however, this necessitates special consideration to preserve mesh connectivity. This paper presents closed-form, overdetermined solutions for least squares estimation of affine motion parameters for a triangular mesh, which preserve mesh connectivity using patch-based or node-based connectivity constraints. In particular, four new algorithms are presented: patch-constrained methods using point correspondences or spatio-temporal intensity gradients, and node-constrained methods using point correspondences or spatio-temporal intensity gradients. The methods using point correspondences can be viewed as postprocessing of a dense motion field for best representation in terms of a set of irregularly spaced samples. The methods that are based on spatio-temporal intensity gradients offer closed-form solutions for direct estimation of the best node-point motion vectors (equivalently, the best transformation parameters). We show that the performance of the proposed closed-form solutions is comparable to that of the alternative search-based solutions at a fraction of the computational cost.
Index Terms—Closed-form least squares solution, connectivity constraints, motion compensation, texture mapping, 2-D mesh-based motion representation.
I. INTRODUCTION
Recent advances in low bit rate video compression have been summarized in [1] and [2] and the references therein. These techniques range from sophisticated three-dimensional (3-D) object-based methods, including those using customized wireframe models, to simpler and more general two-dimensional (2-D) object-based methods. It is generally acknowledged that 2-D object-based coding provides more flexibility and ease of implementation compared to 3-D object-based methods.

Manuscript received October 8, 1995; revised January 20, 1997. This work was supported in part by a National Science Foundation SIUCRC grant and a New York State Science and Technology Foundation grant to the Center for Electronic Imaging Systems, University of Rochester, and by a grant from Eastman Kodak Company. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Eric Dubois. The authors are with the Department of Electrical Engineering and Center for Electronic Imaging Systems, University of Rochester, Rochester, NY 14627 USA (e-mail:
[email protected];
[email protected]). Publisher Item Identifier S 1057-7149(97)06246-5.
Ideally, 2-D motion compensation should be performed using a dense motion field. However, this is not feasible with limited bit budgets, since it requires encoding of all motion vectors. International compression standards such as H.263 and MPEG 1-2 use block motion compensation to address this problem, but block motion compensation suffers from blocking artifacts at low bit rates [3]. Overlapped block motion compensation [4] was proposed to reduce blocking artifacts. A promising alternative is motion compensation using 2-D mesh models. Brusewitz [5] was among the first to propose triangle-based motion compensation, where a triangular mesh is overlaid on the image. Sullivan and Baker [6] used quadrilateral meshes for motion compensation under the name control grid interpolation. In 2-D mesh-based methods, motion compensation within each mesh element (patch) is accomplished by means of a spatial transformation (affine, bilinear, etc.) whose parameters can be computed from node-point motion vectors. The spatial transformation (motion model) to be employed within a patch is related to the geometry of the patch. Typically, affine and bilinear mappings are used with triangular and quadrilateral patches, respectively. This is because, given three (four) point correspondences at the respective node points, the parameters of an affine (bilinear) transformation can be uniquely determined by solving a system of three (four) linear equations [7]. A challenging problem in 2-D mesh-based motion compensation is estimation of the best motion vectors at the node points such that they constitute a compact representation of the entire 2-D dense motion field (while still preserving the connectivity of the mesh).
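As a minimal illustration of the three-correspondence case mentioned above, the following sketch solves the resulting linear systems directly. It assumes, for illustration only, that the affine mapping is written as x' = a1 x + a2 y + a3 and y' = a4 x + a5 y + a6; the function name, the parameter ordering, and the sample coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Return (a1..a6) with x' = a1*x + a2*y + a3 and y' = a4*x + a5*y + a6.

    The parameterization is an assumption made for this sketch.
    """
    src = np.asarray(src, dtype=float)       # (3, 2) node positions in the first frame
    dst = np.asarray(dst, dtype=float)       # (3, 2) node positions in the second frame
    M = np.column_stack([src, np.ones(3)])   # rows [x, y, 1]
    # x' and y' share the same 3x3 system matrix, so both are solved at once.
    params = np.linalg.solve(M, dst)         # columns: (a1, a2, a3) and (a4, a5, a6)
    return np.concatenate([params[:, 0], params[:, 1]])

# Hypothetical triangle and its displaced vertices.
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dst = [(1.0, 2.0), (11.5, 2.0), (1.0, 12.5)]
a = affine_from_three_points(src, dst)
```

Because both output coordinates share the same 3 × 3 matrix, three collinear points make that matrix singular and no unique solution exists.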
Estimation of the motion vectors at each node independently (e.g., by block matching or deformable block matching [8]) is usually not desirable because 1) the motion vectors do not represent the entire 2-D dense motion field, and 2) the motion vectors may cross each other (especially around small patches), destroying the connectivity of the mesh. Recently, search-based solutions to node-point motion estimation with triangular and/or quadrilateral meshes and spatial transformations, for 2-D object-based coding, have been proposed, introducing hexagonal search [9] and an optimization framework [10]. Because search procedures are generally computationally demanding, here we propose closed-form solutions to node-point motion estimation (or equivalently, estimation of the parameters of the spatial transformation) from an estimated dense motion field or spatio-temporal intensity gradients. The problem can then be posed as follows. Find the least squares estimates of the spatial transformation parameters for each patch, given the dense
motion field or spatio-temporal intensity gradients within each patch, while still preserving the connectivity of the 2-D mesh. In this paper, we derive four closed-form solutions based on two types of connectivity constraints: patch based and node based. The patch-based approach processes patches sequentially in a predetermined order. If motion vector(s) have already been assigned to one or two nodes of the current patch (while processing previous patches), they serve as constraints in the estimation of the affine parameters of the present patch. A closed-form constrained least squares solution is derived to estimate the affine parameters. If all three nodes have already been assigned motion vectors, then they are directly used to compute the affine parameters of the patch for motion compensation. In the node-based approach, we consider, one at a time, the polygon enclosing a particular node, which is composed of triangular patches. Affine parameters of all patches within the polygon are simultaneously estimated, in closed form, to minimize a sum squared error criterion under connectivity-preserving constraints. The procedure is repeated for each node. A special case of the latter approach provides a closed-form solution to the hexagonal matching procedure [9]. Both approaches may be implemented using estimated dense motion fields or spatio-temporal image gradients. The resulting node-point motion vectors constitute a compact parameterization of the dense motion field. These constrained overdetermined solutions, using more than the required minimum number of point correspondences (which could be sampled from the dense motion field at the node points), provide increased robustness against errors in point-correspondence (dense motion) estimation. The proposed methods can be used in 2-D mesh-based compression schemes, such as that described in [11].
Other applications of mesh-based motion compensation include image registration, video editing for special effects, augmented reality, and so on. Modeling choices and estimation of the required parameters are briefly reviewed in Section II. Patch-based and node-based closed-form constrained least squares solutions are derived in Sections III and IV, respectively. Experimental results to demonstrate the efficacy of the proposed methods are presented in Section V. We show that the proposed methods yield results that are comparable with those of hexagonal searching while using an order of magnitude fewer operations.
II. BACKGROUND
The proposed method requires a 2-D mesh model, a parametric motion model, and estimates of either a dense motion field or spatio-temporal image gradients. Choices to obtain these parameters are reviewed in the following. We also briefly review the hexagonal matching method and its computational complexity.

A. 2-D Mesh Model
There are several strategies for mesh design: regular mesh, hierarchical mesh, content-based mesh, and knowledge-based mesh design. Regular meshes are obtained by dividing the image area into equal-size triangular or rectangular elements, called patches [9]. A disadvantage of regular meshes is their inability to reflect the scene content; i.e., a single mesh element may contain multiple motions. To address this, hierarchical meshes have been proposed, in which patches that yield high motion compensation error are successively subdivided [12], [13]. Content-based meshes aim to match boundaries of patches with important scene features [10], [14]. If a priori information about scene content is available, a 2-D knowledge-based mesh can be computed as the projection of a 3-D wireframe model [15]. Although the methods proposed in this paper apply to any 2-D mesh composed of triangular or quadrilateral elements, the following discussion assumes meshes with triangular patches. In particular, we have used a uniform mesh, a content-based mesh, and a knowledge-based mesh. The content-based mesh is designed by selecting a fixed number of node points followed by Delaunay triangulation [14]. The knowledge-based mesh is obtained by orthographic projection of the CANDIDE model [15], which is scaled to fit the given image as described in [16].

B. Motion Modeling
In this paper, the motion (displacement or velocity) field within each patch will be represented by an affine model with six parameters per patch. Then, the parametric displacement vector at each pixel within a given patch is given by

(1)
(2)

whereas the parametric velocity vector is given by

(3)
(4)

Note that the proposed node-point motion estimation methods can be extended to other parametric motion models that are linear in the parameters. In general, the choice of the motion model is related to the geometry of the patches (triangular or quadrilateral). For example, a bilinear model should be selected with quadrilateral patches [7].

C. 2-D Dense Motion Estimation
Two-dimensional dense motion estimation methods are classified as block-based, optical-flow equation-based, pel-recursive, and Bayesian methods [3]. Block-based motion compensation has been adopted in the international standards for digital video compression, such as H.261 and MPEG 1-2. Although hierarchical block-based dense motion compensation yields high peak signal-to-noise ratio (PSNR), the corresponding dense motion field usually contains several outliers (because of the aperture problem and lack of overall smoothness constraints), and is thus not easily parameterized. Optical-flow equation-based and Bayesian methods yield smoother dense motion fields, which are more suitable for parameterization. Because Bayesian methods are time consuming, we have used optical-flow equation-based methods, such as [17], [18]. It should be noted that the value of the
Fig. 1. Illustration of hexagonal matching.
smoothness parameter is important since, if the motion field is oversmoothed, tracking of motion over small regions becomes impossible.

D. Estimation of the Intensity Gradients
Estimation of spatio-temporal intensity gradients can be accomplished by averaging finite differences or by polynomial fitting [3]. The former approach estimates the spatial and temporal partials by approximating them with the average of forward and backward finite differences. Furthermore, we can compute a local average of the average differences to eliminate the effects of observation noise [18]. Spatial and temporal presmoothing of the intensity using Gaussian kernels helps the estimation. The latter approach first approximates the intensity locally by a linear combination of some low-order polynomials in the coordinates. Once the coefficients of the polynomial are estimated, the components of the gradient (partials) can be found by simple differentiation [19]. Both approaches generally yield comparable results.

E. Hexagonal Matching
Hexagonal matching is a two-step search procedure to optimize node-point motion vectors for affine motion compensation of each patch [9]. It consists of i) block matching at a node point to find an initial motion vector; and ii) refinement of the motion vector at that node by iterative local minimization of the prediction error. In the second step, the vertices of the bounding hexagon about the node are fixed, and the motion vector at the node is perturbed within a search region as depicted in Fig. 1. At each local perturbation, the affine parameters for the six triangles within the bounding hexagon are recomputed, and the resulting affine motion compensation error within the hexagon is found. The perturbation that minimizes this motion compensation error is accepted as the refined motion vector at the node. Since the estimate at a node depends on the motion vectors at the fixed vertices, the procedure iterates over all node points several times.
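The refinement step of hexagonal matching can be sketched as an exhaustive local search. In the sketch below, `hexagon_error` is a hypothetical caller-supplied function standing in for the affine motion compensation error over the six triangles of the bounding hexagon; the function names, the search range, and the step size are illustrative assumptions rather than the settings used in [9].

```python
import itertools

def refine_node_vector(initial_mv, hexagon_error, search_range=2, step=1.0):
    """Return the perturbed motion vector (and its error) that minimizes hexagon_error.

    initial_mv: (dx, dy) from the block-matching step.
    hexagon_error: callable mapping a candidate (dx, dy) to a scalar error,
                   evaluated with the hexagon vertices held fixed.
    """
    offsets = [k * step for k in range(-search_range, search_range + 1)]
    best_mv = tuple(initial_mv)
    best_err = hexagon_error(best_mv)
    for ox, oy in itertools.product(offsets, offsets):
        cand = (initial_mv[0] + ox, initial_mv[1] + oy)
        err = hexagon_error(cand)
        if err < best_err:
            best_mv, best_err = cand, err
    return best_mv, best_err

# Toy error surface with its minimum at (1, -2); a real implementation would
# warp the six triangles and accumulate the prediction error instead.
def toy_error(mv):
    return (mv[0] - 1.0) ** 2 + (mv[1] + 2.0) ** 2

mv, err = refine_node_vector((0.0, 0.0), toy_error, search_range=2, step=1.0)
```

Every candidate requires a full error evaluation over the hexagon, which is why the cost of the search grows with both the size of the search region and the number of passes over the node points.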
Ignoring the multiplies in the computation of the affine parameters, the number of multiply and add operations per node point in hexagonal matching grows in proportion to the number of pixels within the bounding hexagon.

Fig. 2. Illustration of the patch-based connectivity constraint.
III. PATCH-BASED METHODS
Patch-based methods compute the least squares estimates of the affine parameters for each patch, and hence the node-point motion vectors, under mesh-connectivity constraints, on a patch-by-patch basis. More specifically, the patches of the mesh are first ordered in some preferred fashion. Each patch is then processed under one of the following four cases.
• Case 1: If none of the nodes have been assigned motion vectors while processing previous patches, then there are no constraints on that patch.
• Cases 2 and 3: If motion vectors have already been assigned to one or two nodes of the current patch, they serve as constraints in the estimation of the affine parameters for the present patch.
• Case 4: If all three nodes have assigned motion vectors, then the affine parameters of that patch can be determined uniquely from these vectors.
In the following, we present two closed-form patch-based methods: from a dense motion field and from spatio-temporal intensity gradients, respectively.

A. Patch-Based Method Using Dense Motion Vectors
The affine parameters for each patch are estimated sequentially by means of a least squares estimation procedure given
a set of optical flow estimates within the respective patch. The sequential nature of the procedure is best explained by means of an example. Suppose we have a mesh with four patches as depicted in Fig. 2. We denote the coordinates of node 1 at the two time instants, and define the coordinates of the remaining nodes similarly. At the first patch, none of the three node motion vectors has been previously estimated (Case 1). Then, the affine parameters for the first patch are computed by unconstrained least squares estimation, i.e., by minimizing

(5)

with respect to the six affine parameters of the first patch, where the sum is taken over all estimated flow vectors within the first patch. The solution can be obtained by solving

(6)

in the least squares sense. Note that, once the affine parameters are estimated, the motion of all points within the patch, including the nodes, can be easily expressed in terms of them. Next, we estimate the affine parameters for patch 2, under the constraint that the motion vectors for nodes 2 and 3 are already known (values computed when processing patch 1), which leads to the following constraint equations:

(7)

Since an affine transformation maps a straight line onto another straight line, the constraints (7) ensure that the connectivity of the mesh is preserved along the line joining nodes 2 and 3. In this case (Case 3), there are four constraint equations and six unknowns. Therefore, there are only two free parameters. The four dependent parameters can be expressed in terms of the free parameters as

(8)

Substituting (8) into (3) and (4), we get

(9)

Then, patch 2 can be processed by solving (9) with respect to the two free parameters in the least squares sense using point correspondences. In the case of patch 3, only the motion vector at node 4 is known (Case 2), so we have four free parameters. With a choice of four free parameters, we have

(10)

Substituting (10) into (3) and (4), we obtain

(11)

Then, patch 3 can be processed by solving (11) with respect to the four free parameters in the least squares sense using point correspondences. Next, for patch 4, observe that the motion vectors at nodes 3, 4, and 5 have already been estimated (Case 4). Then, the affine parameters for patch 4 can be computed uniquely from

(12)

The sequential constrained least-squares estimation procedure is completed when all patches are visited.

B. Patch-Based Method Using Intensity Gradients
Here, the affine parameters are estimated to minimize the error in the optical flow equation (OFE) [3] within each patch (without using a dense motion field). Substituting (3) and (4) into the OFE, the error in the OFE within a patch can be expressed as

(13)

The solution can be found by taking the partials of this error with respect to the affine parameters and setting the result equal to zero. However, not all affine parameters are independent because of the mesh-connectivity constraints. We have again four cases depending on which of the nodes have already been processed. We explain the solution for each case using the same example given in the previous section (see Fig. 2).
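The four cases of the dense-motion procedure of Section III-A can be sketched with a single least squares routine. The sketch below is not the paper's formulation in terms of free and dependent affine parameters; it assumes an equivalent re-parameterization in which the unknowns are the displacements of the three patch nodes (for a non-degenerate triangle, the node displacements and the six affine parameters determine each other). All names, including `fit_patch_nodes`, are hypothetical.

```python
import numpy as np

def barycentric_weights(tri, pts):
    """Weights w of shape (N, 3) with w @ tri == pts and rows summing to one."""
    T = np.column_stack([tri, np.ones(3)])          # rows [x_j, y_j, 1]
    P = np.column_stack([pts, np.ones(len(pts))])   # rows [x_i, y_i, 1]
    return P @ np.linalg.inv(T)

def fit_patch_nodes(tri, pts, flow, known):
    """Least squares node displacements for one triangular patch.

    tri:   (3, 2) node coordinates of the patch.
    pts:   (N, 2) pixel coordinates inside the patch.
    flow:  (N, 2) dense displacement estimates at pts.
    known: dict {local node index: (dx, dy)} fixed while processing earlier patches.
    """
    tri, pts, flow = (np.asarray(v, dtype=float) for v in (tri, pts, flow))
    W = barycentric_weights(tri, pts)   # affine interpolation weights per pixel
    nodes = np.zeros((3, 2))
    fixed = sorted(known)
    free = [j for j in range(3) if j not in known]
    for j in fixed:
        nodes[j] = known[j]
    if free:  # Cases 1, 2, 3: six, four, or two free unknowns
        residual = flow - W[:, fixed] @ nodes[fixed]
        nodes[free] = np.linalg.lstsq(W[:, free], residual, rcond=None)[0]
    return nodes  # Case 4: the constraints are returned unchanged

# Synthetic check with a noise-free affine flow (all values hypothetical).
tri = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
true_nodes = np.array([[1.0, 0.5], [2.0, -1.0], [0.0, 1.5]])
rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(3), size=20)   # random points inside the triangle
pts, flow = w @ tri, w @ true_nodes
case1 = fit_patch_nodes(tri, pts, flow, known={})
case3 = fit_patch_nodes(tri, pts, flow, known={1: true_nodes[1], 2: true_nodes[2]})
```

Fixing a node displacement removes two unknowns, which matches the counts in the text: two free parameters when two nodes are known and four when one node is known.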
At the first patch, no node-point motion vectors have been set; hence, all six affine parameters are independent (Case 1). Taking the partials of (13) with respect to the six affine parameters and setting the results equal to zero yields

(14)

Once the spatio-temporal gradients at each pixel are estimated, the affine parameters can be estimated by solving (14). Next, we estimate the parameters for patch 2, given the motion vectors for nodes 2 and 3 (Case 3). In this case, we have the constraint equations (7) and two free parameters. Substituting the dependent parameters, expressed as in (8), into (13), taking the partial derivatives with respect to the two free parameters, and setting them equal to zero results in

(15)

This equation can be solved for the two free parameters, given the spatio-temporal gradients at all pixels within the patch. Then, the four dependent parameters can be obtained as

(16)

In the case of patch 3, only one of the motion vectors, i.e., the motion vector for node 4, is known (Case 2). Then, we have four free parameters. Following the same procedure as before, we obtain

(17)

which can be solved for the four free parameters. Then, the remaining parameters are obtained from

(18)

Patch 4 falls in Case 4, where all motion vectors are predetermined. The affine parameters can then be uniquely estimated from (12) as before.
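A gradient-based counterpart can be sketched in the same spirit. The sketch below assumes the usual form of the optical flow equation, Ix vx + Iy vy + It = 0, and again uses the node velocities of the patch as unknowns instead of the free and dependent affine parameters of the text; it is meant as an illustration of constrained least squares on the OFE, and every name in it is hypothetical.

```python
import numpy as np

def fit_patch_nodes_ofe(tri, pts, Ix, Iy, It, known):
    """Node velocities minimizing the summed squared OFE residual in one patch.

    tri: (3, 2) node coordinates; pts: (N, 2) pixel coordinates in the patch.
    Ix, Iy, It: (N,) spatio-temporal gradients at pts.
    known: dict {local node index: (vx, vy)} fixed by earlier patches.
    """
    tri, pts = np.asarray(tri, float), np.asarray(pts, float)
    Ix, Iy, It = (np.asarray(g, float) for g in (Ix, Iy, It))
    T = np.column_stack([tri, np.ones(3)])
    W = np.column_stack([pts, np.ones(len(pts))]) @ np.linalg.inv(T)  # (N, 3)
    # Residual per pixel: Ix*vx + Iy*vy + It with (vx, vy) interpolated by W.
    # Unknown ordering: [vx0, vx1, vx2, vy0, vy1, vy2].
    M = np.hstack([Ix[:, None] * W, Iy[:, None] * W])
    z = np.zeros(6)
    fixed = []
    for j, (vx, vy) in known.items():
        z[j], z[j + 3] = vx, vy
        fixed += [j, j + 3]
    free = [c for c in range(6) if c not in fixed]
    if free:
        rhs = -It - M[:, fixed] @ z[fixed]
        z[free] = np.linalg.lstsq(M[:, free], rhs, rcond=None)[0]
    return np.column_stack([z[:3], z[3:]])   # (3, 2) node velocities

# Synthetic check: gradients consistent with a known affine velocity field.
tri = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
true_nodes = np.array([[0.5, 0.0], [1.0, -0.5], [-0.5, 1.0]])
rng = np.random.default_rng(1)
w = rng.dirichlet(np.ones(3), size=30)
pts, vel = w @ tri, w @ true_nodes
Ix, Iy = rng.normal(size=30), rng.normal(size=30)
It = -(Ix * vel[:, 0] + Iy * vel[:, 1])
est = fit_patch_nodes_ofe(tri, pts, Ix, Iy, It, known={0: true_nodes[0]})
```

Unlike the dense-motion case, the two velocity components cannot be solved separately here, because each OFE residual couples them through the image gradients.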
C. Order of Processing
Processing of patches is prioritized such that patches where we have the highest confidence in the estimated dense motion vectors and the highest spatial image activity are processed first. The order of processing is determined according to the following criterion function:

(19)

where the two weights are positive scalars, “DFD” denotes displaced frame difference (from dense motion), and the remaining two quantities denote the variance and number of pixels in the patch, respectively. The patch with the smallest criterion value is processed first. Prioritization avoids propagation of erroneous motion estimates.

Fig. 3. Polygon enclosing the node (x0, y0) for the node-based method.
IV. NODE-BASED METHODS
This section proposes an alternative constrained least squares node-point motion estimation strategy. Suppose several triangles meet at a particular node point (x0, y0), as shown in Fig. 3. We find the affine parameters that best fit all estimated dense displacement vectors (or minimize the error in the optical flow equation) within the bounding polygon defined by these patches, under the constraints that they define a common motion vector at the node point (x0, y0), and that the affine parameters for each pair of neighboring triangles move their common nodes (corners of the bounding polygon) to the same location. Hence, the motion vectors at the vertices of the bounding hexagon as well as the center point are variables. This principle is related to that of hexagonal matching (see Section II-E); however, there are a number of key differences: 1) we generalize hexagonal matching to polygonal matching, since in content-based meshes an arbitrary number of triangles can meet at a node point (the hexagonal matching algorithm assumes six, since it only considers a regular mesh); and 2) the hexagonal matching algorithm fixes the vertices of the matching hexagon (see Fig. 1), whereas we only constrain the motion vectors at the vertices such that two neighboring triangles move their common vertex to the same location (hexagonal matching becomes a special case whereby the motion of all vertices is set equal to zero). Finally, the fast hexagonal matching scheme (see [9]) resembles the gradient-based method presented in Section IV-B, except that 1) it fixes the vertices of the bounding hexagons for each node point, and 2) it considers simultaneous estimation of all node-point motion vectors to preserve mesh connectivity. However, the latter leads to very large matrix inversions.

In the following, we present two node-based algorithms for local connectivity-preserving (within the bounding hexagons) motion estimation: one based on a dense motion field, and another, direct method using the optical flow equation. Any inconsistency between the resulting node-point motion vectors, which may occur around occlusion boundaries, is resolved by a postprocessing step discussed in Section IV-C. We elaborate on the computational complexity of these schemes in Section IV-D.
A. Node-Based Method Using Dense Motion Vectors
Here, we assume that a dense displacement vector field has been estimated a priori.

1) Constraint Equations: Each affine parameter set should minimize the norm-squared error between the estimated nonparametric and parametric motion fields within the respective patch, subject to the following connectivity-preserving (within the bounding polygon) constraints.
1) All sets of affine parameters should yield the same motion vector at the common node (x0, y0).
2) The affine parameter sets of two neighboring patches should move their common node to the same position.

Since the estimation of the first three affine parameters is decoupled from that of the last three, expressions for 1) and 2) are needed only for the first three affine parameters, without loss of generality. These constraint equations at the node point can be expressed in vector-matrix form as

(20)

where the constraint matrix is defined by (21) and (22). Observe that we have fewer constraint equations than unknowns for the estimation of the first three affine parameters, which are related to the x component of the motion field. Thus, we designate the last components of the parameter vector as free variables, and solve for the remaining dependent variables in terms of the free variables. To this effect, we partition the constraint matrix and the parameter vector as

(23)

where one part of the parameter vector holds its first components (the dependent variables) and the other part holds the free variables. Then, (20) can be rewritten as

(24)

and we have the dependent variables in terms of the free variables as

(25)

2) Constrained LSE Solution: The parameters for the x component of the motion should best fit model (1)

(26)

for each pixel within each patch in the least squares sense, given the estimated (nonparametric) dense displacement vectors. The model equations (26) over the patches surrounding (x0, y0) can also be expressed in vector-matrix form as

(27)

where the size of the system is set by the number of observed dense motion vectors within the polygon. Partitioning the system matrix to match the partitions of the parameter vector as

(28)

we can express (27) as

(29)

Then, substituting (25) into (29) for the dependent variables, the least squares estimates of the independent (free) parameters can be obtained by solving

(30)

It follows that we can estimate the parameters for each polygon given a sufficient number of estimated motion vectors evenly distributed (at least one vector from each patch) within the polygon. The dependent parameters can then be estimated from (25).
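The constrained estimation of Section IV-A can be illustrated for the x component as follows. The sketch assumes that each patch predicts the x displacement as a1 x + a2 y + a3, builds the two kinds of constraints as rows of a matrix C with C a = 0, and eliminates them with a null-space basis obtained from the SVD. The null-space step is a substitute for the explicit free/dependent partition of (23)–(25), chosen here because it does not depend on how the parameters are ordered; function and variable names are hypothetical.

```python
import numpy as np

def polygon_constraints(center, corners):
    """Rows of C (C @ a = 0) for K patches; patch k has corners k and k+1 (cyclic)."""
    K = len(corners)

    def row(point, k, m):
        r = np.zeros(3 * K)
        h = np.array([point[0], point[1], 1.0])
        r[3 * k:3 * k + 3] = h        # prediction of patch k at the point
        r[3 * m:3 * m + 3] -= h       # minus the prediction of patch m
        return r

    rows = [row(center, k, k + 1) for k in range(K - 1)]           # common centre node
    rows += [row(corners[(k + 1) % K], k, (k + 1) % K) for k in range(K)]  # shared corners
    return np.array(rows)             # (2K - 1, 3K)

def fit_polygon(center, corners, pts, patch_of, u):
    """Constrained least squares fit of the x-displacement parameters of all patches."""
    K = len(corners)
    H = np.zeros((len(pts), 3 * K))
    for i, (p, k) in enumerate(zip(pts, patch_of)):
        H[i, 3 * k:3 * k + 3] = [p[0], p[1], 1.0]
    C = polygon_constraints(center, corners)
    _, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > 1e-10 * s[0]))
    Z = Vt[rank:].T                   # basis of all parameter vectors with C @ a = 0
    z = np.linalg.lstsq(H @ Z, u, rcond=None)[0]
    return (Z @ z).reshape(K, 3)

# Synthetic polygon with four triangles and a continuous piecewise-affine field.
center = np.array([0.0, 0.0])
corners = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0], [0.0, -4.0]])
d_center, d_corners = 1.0, np.array([0.5, -0.25, 2.0, 0.0])
rng = np.random.default_rng(1)
pts, patch_of, u = [], [], []
for k in range(4):
    m = (k + 1) % 4
    for w in rng.dirichlet(np.ones(3), size=10):
        pts.append(w @ np.array([center, corners[k], corners[m]]))
        patch_of.append(k)
        u.append(w @ np.array([d_center, d_corners[k], d_corners[m]]))
a = fit_polygon(center, corners, pts, patch_of, np.array(u))
```

In this example there are 2K − 1 = 7 independent constraints on 3K = 12 unknowns, leaving K + 1 = 5 free variables, one for the x displacement of each node of the polygon.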
B. Node-Based Method Using Intensity Gradients
In this section, we consider direct estimation of the node-point motion vectors based on spatio-temporal image gradients.

1) Constraint Equations: Here, the constraints 1) and 2) in Section IV-A1 are restated for both the x and y components of the velocity vectors, since it is not possible to decouple the estimation of the six affine parameters in this case. For example, the affine parameters of patches 1 and 2 (see Fig. 3) must move their common nodes to the same location, which can be expressed as

(31)

Likewise, the affine parameters of patches 2 and 3 must move their common nodes to the same location, and, continuing in this fashion, the same holds for each subsequent pair of neighboring patches. Finally, for the last patch and patch 1, we have only two equations.

2) Constrained LSE Solution: Here, the affine parameters are estimated to minimize the sum of the errors in the optical flow equation within the bounding polygon, given by

(32)

where the sum runs over the patches connected to the node and over the pixels in each triangle, subject to the constraints. To this effect, we define the Lagrangian

(33)

Taking the partials of the Lagrangian with respect to the affine parameters and the Lagrange multipliers, and setting the results equal to zero yields

(34)

where the parameter vector is now defined as

(35)

The matrix multiplying the parameter vector is block-diagonal

(36)

where each block is a 6 × 6 matrix, defined as

(37)

The constraint matrix is defined by (38), and the right-hand side of the constraints is given by (39). Further, the remaining vector is given by

(40)

where

(41)

and

(42)

Then, (34) can be expressed as

(43)
(44)

Writing the former expression for the parameter vector, we have

(45)

and substituting in the latter, we obtain a closed-form solution for the Lagrange multipliers, given by

(46)

Given the Lagrange multipliers, the parameter vector can be solved from (45). Finally, the motion vectors at the node point and all corner points of the bounding polygon can be estimated from the computed affine parameter set.

Fig. 4. Illustration of inconsistent motion vectors.
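The elimination described above follows the standard pattern for an equality-constrained quadratic problem. The sketch below assumes the stationarity conditions take the generic form A a + Cᵀλ = b together with C a = d, where A is symmetric positive definite; the symbols A, b, C, d and the sign and scaling conventions are assumptions of this sketch and may differ from those of (43)–(46).

```python
import numpy as np

def solve_equality_constrained(A, b, C, d):
    """Solve A a + C^T lam = b subject to C a = d, with A symmetric positive definite.

    Mirrors the two-step elimination: express a in terms of lam, substitute
    into the constraints, solve for lam, then back-substitute.
    """
    Ainv_b = np.linalg.solve(A, b)
    Ainv_Ct = np.linalg.solve(A, C.T)
    lam = np.linalg.solve(C @ Ainv_Ct, C @ Ainv_b - d)   # Lagrange multipliers
    a = Ainv_b - Ainv_Ct @ lam                           # parameter vector
    return a, lam

# Random well-conditioned example (values are illustrative only).
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 6))
A = G.T @ G                     # normal-equation style matrix
b = rng.normal(size=6)
C = rng.normal(size=(2, 6))     # two linear constraints
d = np.zeros(2)
a, lam = solve_equality_constrained(A, b, C, d)
```

Only two linear systems are factored, one in A and one in C A⁻¹ Cᵀ, which is consistent with the remark in Section IV-D that two matrix inversions are needed at each node point.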
C. Postprocessing for Mesh Connectivity
Because the employed constraints enforce connectivity only within a bounding polygon, a final postprocessing step is needed to ensure the connectivity of the mesh structure once motion estimation has been performed at all node points. The resulting node-point motion vectors are said to be consistent if the motion vectors at all node points do not result in overlapping mesh elements. The requirement for the consistency of the node-point motion vectors is illustrated in Fig. 4, where a node in one frame is connected with its neighboring nodes. The motion vector at the node must propagate it to within the polygon formed by the motion-compensated neighboring nodes in the next frame, as propagated by their motion vectors. Should this condition be violated at any node point, the estimated motion vector is replaced by a motion vector that is interpolated from those of the surrounding nodes, through the following procedure.
1) Compute the criterion function [see (19)] for each node point, where the summation of the DFD in (19) is now performed over the bounding hexagon. Order the nodes from highest to lowest. This enables resolution of conflicts at nodes with the lowest-confidence motion vectors first, in favor of those nodes with more reliable motion vectors.
2) Scan the nodes in the order determined in Step 1) to detect nodes with inconsistent motion vectors as follows. At each node,
a) Find all the nodes connected to the node, and label them.
b) Find the motion-compensated locations of the connected nodes in the next frame using their motion vectors. Form the polygon defined by these locations.
c) Motion compensate the node to find its location in the next frame. If it is inside the polygon, go to the next node in the order. Otherwise, go to Step 3).
3) Interpolate the motion of the node from its neighbors as follows:

(47)
(48)

where the interpolation uses the node-motion vector at each connected node and the distance between the node and that connected node. Alternatively, a local search may be conducted to estimate the motion vectors at such nodes. Our experiments indicate that this condition rarely occurs.

D. Computational Complexity
Node-based methods require inversion of two matrices at each node point, whose sizes depend on the number of triangles meeting at that node point (typically six to eight). This remains negligible compared to the cost of estimating the dense motion field or spatio-temporal gradients at each pixel. For example, dense motion estimation using the method of Lucas–Kanade (based on a block about each pixel) requires on the
order of a fixed number of operations, set by the block size, for each pixel within the polygon bounding the node point. While this cost is significantly smaller than that of hexagonal matching (see Section V for a comparison), it can be further reduced by estimating a subsampled motion field (which reduces the number of pixels processed). The computational cost of the postprocessing step is also negligible.

Fig. 5. (a) Regular mesh with 225 nodes. (b) Content-based mesh with 155 nodes. (c) Content-based mesh with 225 nodes. (d) Projection of the CANDIDE wireframe model.

TABLE I
AVERAGE OF PAIRWISE MOTION COMPENSATION PSNR OVER FRAMES 1–30 OF THE MISS AMERICA SEQUENCE

V. RESULTS
Experiments have been conducted on frames 1–31 of the Miss America sequence (using every third frame, i.e., 1–4–7–…–31) to compare the performance of the proposed closed-form solutions with those obtained by the hexagonal search (Hex-S.) algorithm [9] and by using independently estimated node-point motion vectors. The independently estimated motion vectors are sampled (at the node points) from a dense motion field estimated by a three-stage Lucas–Kanade (LK) estimator. At each stage, the LK algorithm is applied to a motion-compensated sequence using motion vectors from the previous stage, where the window sizes are set equal to 19 × 19, 15 × 15, and 11 × 11 in stages 1, 2, and 3, respectively. Spatio-temporal intensity gradients have been estimated using the finite difference method as described in [18], where the sequence is blurred spatially using a Gaussian point spread function with a variance of 2.5 pixels.
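A gradient estimator of the kind described above can be sketched as spatial Gaussian presmoothing followed by finite differences. The sketch is a simplified stand-in and not the exact scheme of [18]: it assumes central differences for the spatial partials (the mean of the forward and backward differences), a simple two-frame difference for the temporal partial, a kernel truncated at three standard deviations, and edge replication at the borders. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(variance):
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3.0 * np.sqrt(variance)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * variance))
    return k / k.sum()

def smooth(frame, variance):
    """Separable Gaussian blur with edge replication; output has the input shape."""
    k = gaussian_kernel(variance)
    pad = len(k) // 2
    f = np.pad(frame, pad, mode="edge")
    f = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, f)
    f = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, f)
    return f

def gradients(prev, curr, variance=2.5):
    """Return (Ix, Iy, It) estimated from two consecutive frames."""
    p = smooth(prev.astype(float), variance)
    c = smooth(curr.astype(float), variance)
    Iy, Ix = np.gradient(0.5 * (p + c))   # central differences along rows, columns
    It = c - p                            # two-frame temporal difference
    return Ix, Iy, It

# Synthetic ramp: constant spatial gradient (2, 3) and unit brightness change.
yy, xx = np.mgrid[0:32, 0:32]
prev = 2.0 * xx + 3.0 * yy
curr = prev + 1.0
Ix, Iy, It = gradients(prev, curr)
```

On a linear ramp the symmetric blur leaves interior pixels unchanged, so the estimated partials away from the borders equal the true slopes; near the borders the edge replication biases the estimates.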
Fig. 6. Candide mesh with 155 nodes.
Fig. 7. Regular mesh with 150 nodes.
Fig. 8. Regular mesh with 225 nodes.
Fig. 9. Content-based mesh with 155 nodes.
Fig. 10. Content-based mesh with 225 nodes.

Because the motion compensation results depend on the type of mesh model employed, we have tested the proposed methods using five different mesh models: two regular meshes with 150 and 225 nodes (obtained by dividing each frame into equal-size triangular patches), two content-based meshes with 155 and 225 nodes, designed for each pair of frames by using the method described in [14], and a knowledge-based mesh obtained by projecting the 3-D CANDIDE wireframe model into 2-D and fitting it to each frame as described in [14]. The projected CANDIDE mesh has 155 nodes. Four of the five meshes overlaid on the first frame of Miss America are depicted in Fig. 5(a)–(d). Each of the following six experiments was carried out using all five mesh models, which are independently designed for each frame or pair of frames. The node-point motion vectors were computed by:
1) sampling node motion vectors from the dense LK motion field (LK);
2) hexagonal-search algorithm [9] (Hex-S);
3) patch-based method with dense motion vectors (Patch-motion);
4) patch-based method with intensity gradients (Patch-gradient);
5) node-based method with dense motion vectors (Node-motion);
6) node-based method with intensity gradients (Node-gradient).

The parameters of the node ordering criterion (19) were set as and . We performed pairwise motion compensation (that is, frame 4 is compensated using original frame 1, frame 7 is compensated using original frame 4, and so on). In all cases, the motion vectors were quantized to 0.25 pixel accuracy before motion compensation. The averages of the pairwise motion compensation PSNR improvement over the ten processed frames in each case are listed in Table I. The model failure regions, which are defined as the regions where the DFD is greater than a model-failure threshold (set equal to six), have been excluded from the PSNR calculations. We have verified that the model failure regions are limited to the vicinity of the mouth and eyes, where there are covered and uncovered regions. The graphs of pairwise motion compensation PSNR improvement versus the frame index for each mesh model (with all six methods) are shown in Figs. 6–10.

A comparison of the computational complexity of the proposed closed-form solutions with hexagonal matching is shown in Table II. It is assumed that the search range for hexagonal matching is , the motion vectors are estimated at 0.25 pixel accuracy, there are pixels within the bounding hexagon, and iterations are performed. The block size for LK dense motion estimation is taken as . Inspection of Figs. 6–10 and Table II indicates that closed-form solutions for polygon-based node motion estimation perform almost as well as search-based methods [9] at a fraction of the computational cost.

TABLE II
COMPARISON OF NUMBER OF OPERATIONS PER NODE POINT

VI. CONCLUSION

We proposed four new closed-form overdetermined solutions for the estimation of the mapping parameters (equivalently node-point motion vectors) subject to mesh connectivity constraints. The methods that are based on the dense motion estimates can be viewed as postprocessing of the dense motion field for a compact representation in terms of irregularly spaced samples. The methods that are based on spatio-temporal intensity gradients offer closed-form solutions for direct estimation of the best node-point motion vectors. Close examination of the results indicates the following.
1) Postprocessing of the dense motion field improves motion compensation PSNR.
2) Node-based methods, in general, perform better than patch-based methods.
3) Estimation using a dense motion field or spatio-temporal gradients performs equally well on sequences with small interframe motion.
4) In the presence of large interframe motion and/or low texture patches, the spatio-temporal gradient-based method may exhibit numerical accuracy problems. The method based on dense point correspondences can handle both of these problems if point correspondences are computed by hierarchical block-matching or hierarchical flow estimation.
5) The performance of the methods depends on the mesh design.

To expand on the last point, the LK method is subject to a certain degree of randomness even with the same number of nodes. For example, placing 150 nodes as a 15 × 10 matrix yields 37.31 dB improvement, whereas placing them as a 10 × 15 matrix yields 36.66 dB improvement.

REFERENCES
[1] H. Li, A. Lundmark, and R. Forchheimer, “Image sequence coding at very low bitrates: A review,” IEEE Trans. Image Processing, vol. 3, pp. 589–609, Sept. 1994.
[2] K. Aizawa and T. S. Huang, “Model-based image coding: Advanced video coding techniques for very low bit-rate applications,” in Proc. IEEE, vol. 83, pp. 259–271, Feb. 1995.
[3] A. M. Tekalp, Digital Video Processing. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[4] M. Orchard and G. Sullivan, “Overlapped block motion compensation: An estimation-theoretic approach,” IEEE Trans. Image Processing, vol. 3, pp. 693–699, Sept. 1994.
[5] H. Brusewitz, “Motion compensation with triangles,” in Proc. 3rd Int. Conf. 64-kbit Coding of Moving Video, Rotterdam, The Netherlands, Sept. 1990.
[6] G. J. Sullivan and R. L. Baker, “Motion compensation for video compression using control grid interpolation,” in Proc. ICASSP’91, Toronto, Canada, 1991, pp. 2713–2716.
[7] G. Wolberg, Digital Image Warping. Los Alamitos, CA: IEEE Comput. Soc. Press, 1990.
[8] V. Seferidis and M. Ghanbari, “General approach to block matching motion estimation,” Opt. Eng., pp. 1464–1474, 1993.
[9] Y. Nakaya and H. Harashima, “Motion compensation based on spatial transformations,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 339–356, June 1994.
[10] Y. Wang and O. Lee, “Active mesh—A feature seeking and tracking image sequence representation scheme,” IEEE Trans. Image Processing, vol. 3, pp. 610–624, Sept. 1994.
[11] Y. Altunbasak, A. M. Tekalp, and G. Bozdagi, “Two-dimensional object-based coding using a content-based mesh and affine motion parameterization,” in Proc. IEEE Int. Conf. Image Processing, Washington, DC, Oct. 1995.
[12] J. Flusser, “An adaptive method for image registration,” Pattern Recognit., vol. 25, pp. 45–54, 1992.
[13] C. L. Huang and C. Y. Hsu, “A new motion compensation method for image sequence coding using hierarchical grid interpolation,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 72–85, 1994.
[14] Y. Altunbasak and A. M. Tekalp, “Content-based mesh design for 2-D object-based video coding,” in Proc. Symp. Multimedia Communications and Video Coding, New York, NY, Oct. 1995; also in Multimedia Communications and Video Coding, Y. Wang, S. Panwar, S.-P. Kim, and H. L. Bertoni, Eds. New York: Plenum, 1996.
[15] M. Rydfalk, “CANDIDE: A parametrised face,” Dept. Electr. Eng. Rep. LiTH-ISY-I-0866, Linköping Univ., Linköping, Sweden, Oct. 1987.
[16] K. Aizawa, H. Harashima, and T. Saito, “Model-based analysis-synthesis image coding (MBASIC) system for a person’s face,” Signal Proc.: Image Commun., vol. 1, pp. 139–152, 1989.
[17] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. DARPA Image Understanding Workshop, 1981, pp. 121–130.
[18] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artif. Intell., vol. 17, pp. 185–203, 1981.
[19] J. S. Lim, Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990.
Yucel Altunbasak was born in Kayseri, Turkey, in 1971. He received the B.S. degree from the Electrical Engineering Department, Bilkent University, Ankara, Turkey, in 1992 (with highest honors). He received the M.S. and Ph.D. degrees from the Electrical Engineering Department, University of Rochester, Rochester, NY, in 1993 and 1996, respectively. He joined Hewlett-Packard Laboratories, Palo Alto, CA, in July 1996. He has performed research in the areas of object-based video representation, video compression, visual communication, video indexing/annotation, multimedia applications, 2-D image/video analysis, stereo image processing, and computer vision. Dr. Altunbasak has published over 20 journal and conference publications and proposals to MPEG-4, and has two patent applications.
A. Murat Tekalp (S’80–M’82–SM’91) received the B.S. degrees in electrical engineering and in mathematics from Boğaziçi University, Istanbul, Turkey, in 1980 (with highest honors), and the M.S. and Ph.D. degrees in electrical, computer, and systems engineering from Rensselaer Polytechnic Institute (RPI), Troy, NY, in 1982 and 1984, respectively. From December 1984 to August 1987, he was a research scientist, and then a senior research scientist at Eastman Kodak Company, Rochester, NY. He joined the Electrical Engineering Department, University of Rochester, as an Assistant Professor in September 1987, where he is currently a Professor. His current research interests are in the areas of digital image and video processing, including image restoration and reconstruction, object-based image/video editing and coding, object-tracking, and image/video indexing for digital libraries. Dr. Tekalp received the NSF Research Initiation Award in 1988, and IEEE Rochester Section Awards in 1989, 1992, and 1994. He has served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING (1990–1992), IEEE TRANSACTIONS ON IMAGE PROCESSING (1994–1996), and the Journal on Multidimensional Systems and Signal Processing (1994–1996). He has been the Technical Program Chair for the IEEE Signal Processing Society MDSP Workshop (1991), Special Sessions Chair for the IEEE International Conference on Image Processing (1995), and the organizer and first Chairman of the Rochester Chapter of the IEEE Signal Processing Society. He has served as the Chair of the IEEE Rochester Section in the 1994–1995 term of office. At present, he is the Chair of the IEEE Signal Processing Society Technical Committee on Image and Multidimensional Signal Processing, and a member of the IEEE Computer Society Technical Committee on Computer Vision and Image Processing.
He is also on the editorial boards of academic journals Graphical Models and Image Processing and Visual Communications and Image Representation. He authored Digital Video Processing (Englewood Cliffs, NJ: Prentice-Hall, 1995). He is a member of Sigma Xi.