
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 6, NO. 9, SEPTEMBER 1997

In the stopband, better performance is indicated by a function that stays closest to zero. Nearest-neighbor thus has appalling performance, with the other B-splines (linear, quadratic, and cubic) getting progressively better. The approximating cubic B-spline has the best stopband performance of any function considered. The interpolating quadratic and Catmull–Rom cubic have similar stopband responses, the Catmull–Rom's being slightly better.

D. Summary of Evaluation

The best passband performance of the six functions is shown by the Catmull–Rom cubic, followed by the interpolating quadratic. The best stopband performance is shown by the approximating cubic B-spline, followed by the Catmull–Rom. The approximating cubic B-spline has, however, appalling passband performance. The interpolating quadratic has average stopband performance. The interpolating quadratic thus has a frequency response better than the linear's but not as good as the Catmull–Rom cubic's. The surprising result of the visual analysis is that the interpolating quadratic produces a visual result that is very nearly as good as the Catmull–Rom cubic's. The Catmull–Rom cubic or Mitchell and Netravali's subjectively best cubic [1] both produce slightly better visual results than this quadratic, and so would probably be used in preference in an application where quality is the overriding issue. In time-critical applications, however, the faster computation time of the quadratic over the cubic could offset the slight loss in quality that may result. Linear interpolation, while even faster, would represent a much greater loss of quality. A trial across nine two-dimensional resampling operations showed that a quadratic took 55–63% of the time of a cubic for similar visual quality, while a linear interpolation took 25–36% of the time of a cubic for significantly degraded visual quality.
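The passband/stopband comparison above can be reproduced numerically. The sketch below is not from the paper: the Catmull–Rom and approximating cubic B-spline kernels are standard textbook forms, and the piecewise form used for the interpolating quadratic is a commonly published one, assumed here rather than taken from the paper's derivation. Each kernel is sampled densely and its magnitude spectrum inspected above the Nyquist frequency:

```python
# Compare reconstruction kernels' stopband behavior via the FFT (NumPy only).
import numpy as np

def bspline3(x):
    """Approximating cubic B-spline kernel (support [-2, 2])."""
    x = np.abs(x)
    return np.where(x < 1, (4 - 6*x**2 + 3*x**3) / 6,
           np.where(x < 2, (2 - x)**3 / 6, 0.0))

def catmull_rom(x):
    """Catmull-Rom interpolating cubic kernel (support [-2, 2])."""
    x = np.abs(x)
    return np.where(x < 1, 1.5*x**3 - 2.5*x**2 + 1,
           np.where(x < 2, -0.5*x**3 + 2.5*x**2 - 4*x + 2, 0.0))

def quad_interp(x):
    """An interpolating quadratic kernel (support [-1.5, 1.5]); this
    piecewise form is an assumption, not the paper's derivation."""
    x = np.abs(x)
    return np.where(x <= 0.5, 1 - 2*x**2,
           np.where(x <= 1.5, x**2 - 2.5*x + 1.5, 0.0))

dx = 0.01
x = np.arange(-4, 4, dx)               # all kernels are zero outside [-2, 2]
for name, kern in (("B-spline", bspline3), ("Catmull-Rom", catmull_rom),
                   ("quadratic", quad_interp)):
    mag = np.abs(np.fft.rfft(kern(x))) * dx   # approximate continuous spectrum
    freq = np.fft.rfftfreq(len(x), d=dx)      # cycles per sample spacing
    print(name, "max stopband magnitude:", mag[freq > 0.5].max())
```

Running this reproduces the ordering reported above: the approximating cubic B-spline's stopband sidelobes are markedly smaller than those of the two interpolating kernels, at the cost of its poor passband.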

VI. SUMMARY

We have shown that linear-phase, piecewise quadratics do exist and can be used for image resampling. We have derived two potentially useful quadratic functions. One of these (the interpolating quadratic) has visual quality approaching that of the Catmull–Rom cubic, but it requires only 60% of the computation time of the cubic. This could make it useful in applications where speed and quality must be traded off against one another.

REFERENCES

[1] D. P. Mitchell and A. N. Netravali, “Reconstruction filters in computer graphics,” Comput. Graph., vol. 22, pp. 221–228, Aug. 1988.
[2] H. S. Hou and H. C. Andrews, “Cubic splines for image interpolation and digital filtering,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 508–517, Dec. 1978.
[3] R. G. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 1153–1160, Dec. 1981.
[4] S. K. Park and R. A. Schowengerdt, “Image reconstruction by parametric cubic convolution,” Comput. Vis., Graph. Image Processing, vol. 23, pp. 258–272, Sept. 1983.
[5] J. A. Parker, R. V. Kenyon, and D. E. Troxel, “Comparison of interpolation methods for image resampling,” IEEE Trans. Med. Imag., vol. 2, pp. 31–39, Mar. 1983.
[6] E. Maeland, “On the comparison of interpolation methods,” IEEE Trans. Med. Imag., vol. 7, no. 3, pp. 213–217, Sept. 1988.
[7] P. S. Heckbert, “Fundamentals of texture mapping and image warping,” Rep. UCB/CSD 89/516, Comput. Sci. Div., Univ. Calif., Berkeley, June 1989.
[8] G. Wolberg, Digital Image Warping. Los Alamitos, CA: IEEE Comput. Soc. Press, 1990.
[9] N. A. Dodgson, “Image resampling,” Tech. Rep. 261, Comput. Lab., Univ. Cambridge, Cambridge, U.K., Aug. 1992.
[10] W. F. Schreiber and D. E. Troxel, “Transformation between continuous and discrete representations of images: A perceptual approach,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, pp. 178–186, Mar. 1985.
[11] S. E. Reichenbach and S. K. Park, “Two-parameter cubic convolution for image reconstruction,” in Proc. SPIE, vol. 1199, pp. 833–840, 1989.
[12] R. W. Schafer and L. R. Rabiner, “A digital signal processing approach to interpolation,” Proc. IEEE, vol. 61, pp. 692–702, June 1973.
[13] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978.
[14] J. A. Brewer and D. C. Anderson, “Visual interaction with Overhauser curves and surfaces,” Comput. Graph., vol. 11, pp. 132–137, Summer 1977.
[15] E. Catmull and R. Rom, “A class of local interpolating splines,” in Computer Aided Geometric Design, R. E. Barnhill and R. F. Riesenfeld, Eds. New York: Academic, 1974, pp. 317–326.
[16] J. P. Oakley and M. J. Cunningham, “A function space model for digital image sampling and its application in image reconstruction,” Comput. Graph. Image Processing, vol. 49, pp. 171–197, Feb. 1990.
[17] M. Unser, A. Aldroubi, and M. Eden, “Fast B-spline transforms for continuous image representation and interpolation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 277–285, Mar. 1991.
[18] E. T. Whittaker, “On the functions which are represented by the expansions of the interpolation theory,” in Proc. R. Soc. Edinb., vol. 35, pt. 2, pp. 181–194, July 1915.
[19] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 37, pp. 10–21, Jan. 1949.

Simultaneous Motion Estimation and Segmentation

Michael M. Chang, A. Murat Tekalp, and M. Ibrahim Sezan

Abstract—We present a Bayesian framework that combines motion (optical flow) estimation and segmentation, based on a representation of the motion field as the sum of a parametric field and a residual field. The parameters describing the parametric component are found by a least squares procedure, given the best estimates of the motion and segmentation fields. The motion field is updated by estimating the minimum-norm residual field given the best estimate of the parametric field, under the constraint that the motion field be smooth within each segment. The segmentation field is updated to yield the minimum-norm residual field given the best estimate of the motion field, using Gibbsian priors. The solutions to the successive optimization problems are obtained using the highest confidence first (HCF) or iterated conditional modes (ICM) optimization methods. Experimental results on real video are shown.

Index Terms—Bayesian methods, motion estimation, motion segmentation, parametric motion models.

Manuscript received April 12, 1995; revised November 1, 1996. This work was supported in part by a National Science Foundation SIUCRC grant, a New York State Science and Technology Foundation grant to the Center for Electronic Imaging Systems, University of Rochester, and a grant by Eastman Kodak Company. M. M. Chang was with the Department of Electrical Engineering, University of Rochester, Rochester, NY 14627 USA. He is now with San Diego Imaging Operation, Hewlett-Packard Company, San Diego, CA 92127 USA. A. M. Tekalp is with the Department of Electrical Engineering and Center for Electronic Imaging Systems, University of Rochester, Rochester, NY 14627 USA (e-mail: [email protected]). M. I. Sezan was with the Electronic Imaging Research Labs, Eastman Kodak Company, Rochester, NY 14627 USA. He is now with Sharp Laboratories of America, Camas, WA 98607 USA. Publisher Item Identifier S 1057-7149(97)06240-4.

1057–7149/97$10.00  1997 IEEE


Fig. 1. Synthetic sequence. (a) Initial frame. (b) Search frame. (c) Ideal parametric field.

I. INTRODUCTION

Robust motion estimation and segmentation are fundamental to such applications as multiple object tracking, object-based video compression, and machine vision. Motion estimation establishes correspondence between two consecutive frames on a pixel or subpixel level. Motion segmentation refers to grouping together pixels that belong to independently moving objects in the scene. Clearly, motion estimation and segmentation are interrelated, since good motion segmentation requires good motion estimation, and vice versa. In recognition of this fact, this paper proposes a novel Bayesian approach to simultaneous motion estimation and segmentation based on modeling the two-dimensional (2-D) motion field as the sum of a parametric field and a residual field.

Almost all optical flow estimation methods combine the so-called optical flow constraint (that the intensity of a particular pixel in the image remains constant along the motion trajectory) with a proper regularization constraint. A common regularization technique is to impose a global smoothness constraint among the flow vector estimates, either in a deterministic form, as proposed by Horn and Schunck [1], or in a stochastic form implemented by Gibbs random fields. However, a global smoothness constraint ignores the motion discontinuities between independently moving objects, thus blurring the flow vectors around object boundaries. To this effect, the oriented-smoothness constraint [2] and line fields [3], [4] have been introduced for the deterministic and Bayesian methods, respectively. However, the computational complexity of motion estimation increases significantly with the inclusion of the line fields. As a simpler


alternative, Iu [5] proposed an outlier-rejection algorithm that relaxes the smoothness constraint at object boundaries. Yet another alternative is to introduce a segmentation label field, such as in [6] and [7]. Whereas a line field is defined on the dual lattice (whose sites are located between pixel sites) to model discontinuities, a segmentation field occupies the same lattice as the pixel sites and models similarity between motion vectors. However, Stiller's algorithm [6] performs segmentation using the motion vectors rather than a parametric description of them; thus, it does not handle motion segmentation in the presence of rotations and scaling well. Various approaches to motion segmentation exist in the literature, including dominant motion analysis [8]–[10], clustering [11], and Bayesian methods [12], which are based on parametric descriptions of the motion field. A Bayesian method has been proposed by Murray and Buxton [12], which uses Gibbsian priors to encourage connectivity among the segmentation labels to avoid small, isolated regions. There appear to be two major problems with all of the above methods: i) their performance is limited by the accuracy of the 2-D motion estimation, which is itself an ill-posed problem, and ii) ambiguities may arise in parameter estimation using small windows (e.g., rotation may be well approximated by translation over a small aperture). In order to overcome the aforementioned limitations of existing methods, we propose a combined motion estimation and segmentation method within a Bayesian framework, based on modeling the motion field as the sum of a parametric field and a nonparametric residual field. There are several novelties offered by our simultaneous Bayesian approach, as follows. 1) A piecewise-smooth motion vector field is obtained without introducing line fields, which results in simpler algorithms. 
The segmentation is defined on the basis of a parametric model, providing physically meaningful regions as compared to [6]. 2) The segmentation map defines the parametric part of the motion field. We impose an additional constraint on the optical flow estimation by searching for the minimum-norm residual motion field. 3) The interdependence between the optical flow estimates and the segmentation map has been reinforced iteratively within a Bayesian framework in a mutually beneficial manner. The proposed modeling and algorithm have been developed independently of Hsu et al. [13] and Stiller [6], as evidenced in [7]. Other work on simultaneous motion estimation and segmentation includes a successive refinement approach reported by Cloutier et al. [14]. Section II presents the formulation of our simultaneous approach. Issues related to the implementation of the method are discussed in Section III. Experimental results, including comparisons with existing methods, are presented in Section IV.

II. THEORY

This section first describes our motion field model, and then presents the problem formulation.

A. Motion Field Model

Let d(m, n) = [u(m, n), v(m, n)] denote a 2-D motion vector from the current frame to the search frame at the pixel site (m, n), and let u and v denote the vectors obtained by lexicographic ordering of u(m, n) and v(m, n), respectively. We assume that there are K independently moving, opaque objects, and that a segmentation label x(m, n) assigns each motion vector d(m, n) to one of the K classes. Let the motion of each object be approximated by a parametric mapping, Φ, such as an eight-parameter perspective or bilinear mapping or a six-parameter affine mapping [15]. Then, we



Fig. 2. Motion estimation by (a) global smoothness, (b) Stiller, and (c) proposed method.

have

d(m, n) = d_p(m, n) + d_r(m, n)    (1)

where d_p(m, n) denotes the parametric component of the motion vector, which clearly depends on the segmentation label x(m, n), and d_r(m, n) denotes the residual motion vector at the site (m, n). In the following, u_p, v_p and u_r, v_r will denote the lexicographic orderings of the x and y components of the parametric and residual motion fields, respectively.

B. Problem Formulation

Given two frames g_k (current frame) and g_{k−1} (search frame), we wish to compute the maximum a posteriori probability (MAP) estimate of the motion field, u and v, and the segmentation field x [composed of the lexicographic ordering of the labels x(m, n)]. Using the Bayes rule, the a posteriori probability density function (pdf) of u, v, and x given g_k and g_{k−1} can be expressed as

p(u, v, x | g_k, g_{k−1}) = p(g_k | u, v, x, g_{k−1}) p(u, v | x, g_{k−1}) p(x | g_{k−1}) / p(g_k | g_{k−1})    (2)

where the denominator is constant with respect to the unknowns. Thus, the MAP estimates are

(û, v̂, x̂) = arg max_{u, v, x} p(g_k | u, v, x, g_{k−1}) p(u, v | x, g_{k−1}) p(x | g_{k−1}).    (3)

The least squares estimates of the mapping parameters Φ for each class can be computed in closed form given the best estimates of the motion field û, v̂ and the segmentation field x̂.

The conditional pdf p(g_k | u, v, x, g_{k−1}) quantifies how well the motion and segmentation estimates fit the given frames. It is modeled by


a Gibbs distribution with the potential function

U_1(g_k | u, v, x, g_{k−1}) = Σ_{(m,n)} e²(m, n)    (4)

where

e(m, n) = |g_k(m, n) − g_{k−1}[m + u(m, n), n + v(m, n)]|    (5)

is the displaced frame difference (DFD). This probability density is maximized when the flow field minimizes the DFD function, indicating that accurate optical flow estimates are obtained.

The second term in the numerator in (2) is the conditional pdf of the displacement field given the motion segmentation and the search frame. Neglecting the dependence on the search frame g_{k−1}, we model this pdf also by a Gibbs distribution with the potential function

U_2(u, v | x) = α Σ_{(m,n)} ‖d(m, n) − d_p(m, n)‖² + β Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} ‖d(m, n) − d(i, j)‖² δ[x(m, n) − x(i, j)]    (6)

where N_{m,n} is the set of neighbors of the site (m, n). The first term in (6) calls for a minimum-norm estimate of the residual motion field u_r = u − u_p and v_r = v − v_p. That is, it aims to minimize the deviation of the motion field u and v from the parametric motion field u_p and v_p, while u and v minimize the DFD. Note that the parametric components u_p and v_p are functions of x and the mapping parameters Φ, which are in turn functions of u, v, and x. The second term in (6) is a piecewise smoothness constraint on the 2-D motion estimates, imposed only when the neighboring pixels share the same segmentation label. The scalars α and β control the emphasis of these two terms.

Neglecting the dependence on the search frame g_{k−1}, the third term represents the a priori probability of the segmentation. In order to encourage the formation of contiguous regions, it is modeled by a Gibbs distribution with the potential function

U_3(x) = γ Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} V_2[x(m, n), x(i, j)]    (7)

where γ controls the relative emphasis of this term in (2), and

V_2[x(m, n), x(i, j)] = −1 if x(m, n) = x(i, j); +1 otherwise    (8)

denotes the two-pixel clique potentials. Based on (4), (6), and (7), the a posteriori pdf (2) can be rewritten as

p(u, v, x | g_k, g_{k−1}) ∝ exp{−U_1(g_k | u, v, x, g_{k−1}) − U_2(u, v | x) − U_3(x)}.    (9)

Substituting U_1(·), U_2(·), and U_3(·) into (9), maximization of the a posteriori pdf p(u, v, x | g_k, g_{k−1}) is equivalent to minimizing the energy function

E_{u,v,x} = Σ_{(m,n)} e²(m, n) + α Σ_{(m,n)} ‖d(m, n) − d_p(m, n)‖² + β Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} ‖d(m, n) − d(i, j)‖² · δ[x(m, n) − x(i, j)] + γ Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} V_2[x(m, n), x(i, j)]    (10)

where the individual terms have been defined in (4) and (6)–(8).

Fig. 3. Segmentation estimates. (a) Murray–Buxton. (b) Wang–Adelson. (c) Stiller. (d) Proposed method.

Fig. 4. Mobile and calendar sequence. (a) Seventh frame. (b) Eighth frame.

C. Relationship with Existing Approaches

Several existing motion analysis algorithms can be formulated as special cases of the proposed framework. If we retain only the first and third terms in (10), and assume that all sites possess the same segmentation label, i.e., δ[x(m, n) − x(i, j)] = 1 for all (m, n) and (i, j), we have Bayesian motion estimation with a global smoothness constraint [15]. The motion estimation algorithm proposed by Iu [5] utilizes the same two terms, but replaces the δ(·) function by a local outlier rejection function. The algorithm proposed by Stiller [6] involves the first, third, and fourth terms in (10), and yields an estimated motion field u, v and a region label field x. However, the segmentation labels in Stiller's algorithm are used merely as


tokens to allow for a piecewise smoothness constraint on the flow field, since the second term in (10), involving the mapping parameters, is not present in [6]. On the other hand, the motion segmentation algorithm proposed by Murray and Buxton [12] employs only the second and fourth terms in (10) to model the conditional and prior pdf, respectively. Wang and Adelson [11] rely on the second term to compute the motion segmentation. However, they take the DFD of the parametric motion vectors into consideration if this term exceeds a threshold.

III. IMPLEMENTATION

The MAP estimates are obtained by minimizing the energy function (10). In general, this energy function is nonconvex due to the nature of the problem. Furthermore, the parametric motion model introduces region-based interactions, which are nonlocal in the MRF neighborhood sense. Therefore, we employ suboptimal deterministic optimization schemes, such as the iterated conditional modes (ICM) [15] and highest confidence first (HCF) [16] methods (with lower computational requirements), as opposed to asymptotically optimal stochastic optimization algorithms, such as simulated annealing and Gibbs sampling [15], which rely on MRF neighborhood interactions for convergence.
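To make the objective concrete, the following is an illustrative NumPy sketch (assumptions: dense fields stored as arrays, a 4-neighborhood with each clique counted once, nearest-integer indexing for the DFD, and a hypothetical helper `dp_of` returning the parametric motion of a class at a site; this is not the authors' code) of evaluating the energy in (10) and of one ICM sweep of the label update:

```python
import numpy as np

def energy(gk, gk1, d, dp, x, alpha, beta, gamma):
    """Energy of (10). gk, gk1: frames (H x W); d, dp: motion and parametric
    fields (H x W x 2, components ordered [u, v]); x: integer label field."""
    H, W = gk.shape
    m, n = np.indices((H, W))
    # term 1: squared displaced frame difference (DFD), eqs. (4)-(5)
    mi = np.clip(np.rint(m + d[..., 0]).astype(int), 0, H - 1)
    ni = np.clip(np.rint(n + d[..., 1]).astype(int), 0, W - 1)
    t1 = np.sum((gk - gk1[mi, ni]) ** 2)
    t2 = np.sum((d - dp) ** 2)             # term 2: minimum-norm residual
    t3 = t4 = 0.0
    for axis in (0, 1):                    # horizontal and vertical cliques
        dd = np.diff(d, axis=axis)         # d(m, n) - d(i, j)
        same = np.diff(x, axis=axis) == 0  # delta[x(m, n) - x(i, j)]
        t3 += np.sum((dd ** 2).sum(-1) * same)   # within-segment smoothness
        t4 += np.sum(np.where(same, -1.0, 1.0))  # V2 clique potentials
    return t1 + alpha * t2 + beta * t3 + gamma * t4

def icm_sweep(x, d, dp_of, K, alpha, gamma):
    """One ICM sweep of the segmentation update: at each site, keep the label
    minimizing a simplified local energy (residual-norm plus prior terms)."""
    H, W = x.shape
    for m in range(H):
        for n in range(W):
            best, best_e = x[m, n], np.inf
            for k in range(K):
                # deviation from class k's parametric motion at this site
                e = alpha * np.sum((d[m, n] - dp_of(k, m, n)) ** 2)
                # V2 clique potentials over the 4-neighborhood
                for dm, dn in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    i, j = m + dm, n + dn
                    if 0 <= i < H and 0 <= j < W:
                        e += gamma * (-1.0 if x[i, j] == k else 1.0)
                if e < best_e:
                    best, best_e = k, e
            x[m, n] = best
    return x
```

HCF would instead visit sites in order of a confidence measure rather than in raster order; dropping terms from `energy` recovers the special cases discussed above.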


A. Optimization

Direct minimization of (10) with respect to all unknowns is an exceedingly difficult problem. To this effect, we perform the minimization of (10) by iterating over the following two steps.
1) Update the motion field u, v, given the best estimate of the segmentation field x. This step involves minimization of a new energy function

E_{u,v} = Σ_{(m,n)} e²(m, n) + α Σ_{(m,n)} ‖d(m, n) − d_p(m, n)‖² + β Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} ‖d(m, n) − d(i, j)‖² · δ[x(m, n) − x(i, j)]    (11)

which contains all terms in E_{u,v,x} (10) that depend on u and v. The first term in (11) measures how well u, v agree with the two observed frames g_k and g_{k−1}. The second term enforces the minimum-norm criterion on the residual field u_r, v_r, given the best estimate of the parametric field u_p, v_p. The third term models the piecewise smoothness of the motion vectors within each region. To minimize this energy function, we employ the HCF method [16]. We have observed that HCF provides more robust results compared to other methods such as ICM.
2) Update the segmentation x, assuming the motion field u, v is given. This step involves all the terms that contain x, as well as d_p, the parameter-based flow field. Let us define the new energy function

E_x = α Σ_{(m,n)} ‖d(m, n) − d_p(m, n)‖² + β Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} ‖d(m, n) − d(i, j)‖² · δ[x(m, n) − x(i, j)] + γ Σ_{(m,n)} Σ_{(i,j)∈N_{m,n}} V_2[x(m, n), x(i, j)].    (12)

The first term again imposes the minimum-norm residual constraint; however, this time u, v are given. Because u_p and v_p are dependent on the segmentation field, this implies that the segmentation field should be updated such that the norm of the residual motion field is minimum given u and v. The second term is related to the prior probability of the configuration of the segmentation labels. The optimization of E_x is also carried out using the HCF method. The mapping parameters Φ are updated by least squares estimation at the conclusion of the segmentation step. In order to improve the mapping parameter estimation, rejection of outliers in the optical flow estimates has been implemented as in [11], where flow vectors that deviate significantly from the mapping parameters are not used in mapping parameter estimation.

Fig. 5. Motion estimation by (a) global smoothness, (b) Stiller, and (c) proposed method.

B. Parameter Determination

There are three free parameters in the proposed algorithm, α, β, and γ, which control the weight of the second, third, and fourth terms relative to the first term in (10), respectively. The determination of these parameters is a design problem. For example, if we have a scene that can be well approximated by K planar surfaces, then the resulting motion field can be well modeled by K sets of mapping parameters. In this case, the second term in (10)


Fig. 6. Segmentation estimates. (a) Murray–Buxton. (b) Wang–Adelson. (c) Stiller. (d) Proposed method.

can be emphasized by choosing α large, provided that the value of K can be correctly estimated. Otherwise, a smaller value for α may be more appropriate. We employed the following procedure for parameter selection: Given the initial estimates of the motion and segmentation fields, compute the mapping parameters for each segment. (Initialization of the algorithm will be discussed in the next section.) Then, compute the initial values of the four terms in (10), which provides an indication of the goodness of fit of the parametric models. Since the optimization is implemented in two steps as described in Section III-A, first choose α and β to equalize the contributions of the three terms in (11). Then, select the value of γ to equalize the contributions of the three terms in (12). It is generally desirable to select 1 ≤ α ≤ 5, depending on how well the motion field conforms to a piecewise-parametric model. It is important to note that the highest sensitivity is expected with variations in γ. If γ is too small, isolated regions may appear. On the other hand, if γ is too large, oversmoothing may have adverse effects on the parametric component of the motion field. One may therefore consider starting at γ = 5 and decreasing it as the iterations progress.
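The equalization heuristic described above amounts to simple ratios of the terms' initial magnitudes. A sketch (hypothetical helper function, not from the paper; the unweighted term values would come from evaluating (10) at the initial estimates and are assumed positive here):

```python
def equalize_weights(t_dfd, t_residual, t_smooth, t_prior):
    """Choose alpha, beta so the three terms of (11) contribute equally,
    then gamma so the terms of (12) balance. Inputs are the unweighted
    (positive) magnitudes of the four terms in (10) at initialization."""
    alpha = t_dfd / t_residual            # alpha * t_residual == t_dfd
    beta = t_dfd / t_smooth               # beta * t_smooth == t_dfd
    gamma = alpha * t_residual / t_prior  # gamma * t_prior matches (12) terms
    return alpha, beta, gamma
```

For instance, initial term values of 10, 2, 5, and 4 would give weights of 5, 2, and 2.5, after which the weights can still be adjusted by hand as discussed above.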


C. Initialization

The initial motion estimates are obtained by using Bayesian motion estimation with a global smoothness constraint [15], whose performance is comparable to that of the Horn–Schunck algorithm [1]. Given the initial motion field, a complete procedure for initialization of the segmentation field has been proposed in [11]. This procedure starts by dividing the image into small blocks. A set of affine parameters is estimated for each block B. The affine models in blocks with acceptably small residuals

residual = Σ_{(m,n)∈B} ‖d(m, n) − d_p(m, n)‖²    (13)

are treated as candidate models. To determine the appropriate number of seed models, and hence the regions, the affine parameters of the candidates are clustered in the six-dimensional parameter space under a distance measure. This results in K regions, each with an initial set of affine parameters.

IV. EXPERIMENTAL RESULTS

This section demonstrates the performance of the proposed algorithm and compares it with some other Bayesian approaches. The

experiments have been performed on a synthetic sequence where the motion and the region boundaries are known, and two real sequences, mobile and calendar, and salesman. Our parametric motion model is a six-parameter affine model.
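The block-wise affine initialization of Section III-C reduces to a linear least-squares fit per block. A minimal NumPy sketch (an illustration, not the authors' code; the block is given as its pixel coordinates and motion vectors):

```python
import numpy as np

def fit_affine(coords, d):
    """Fit a six-parameter affine motion model to one block and return the
    2x3 parameter matrix A and the residual of (13).
    coords: (P, 2) pixel sites (m, n); d: (P, 2) motion vectors."""
    X = np.hstack([coords, np.ones((len(coords), 1))])  # rows [m, n, 1]
    A, *_ = np.linalg.lstsq(X, d, rcond=None)           # (3, 2) solution
    residual = np.sum((d - X @ A) ** 2)                 # eq. (13)
    return A.T, residual
```

Blocks whose residual falls below a threshold would then supply candidate parameter vectors for the clustering step.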


A. Synthetic Sequence

Two frames of the synthetic sequence that we used in our examples are shown in Fig. 1(a) and (b). There are four independently moving segments in this image: the top left portion and the bottom portion follow different rotations, and the two parts of the top right portion undergo opposing translations. The ideal (parametric) motion field between these two frames is depicted in Fig. 1(c), illustrating the magnitude and direction of the flow vectors. In this example, we set α = β = 10 and γ = 5, since the motion field is clearly piecewise-parametric. When a global smoothness constraint is applied in the Bayesian estimation, shown in Fig. 2(a), the flow vectors are coherent, but the boundaries between the segments are blurred, such as the boundary between the two regions in the top right corner. The result of Stiller's algorithm is shown in Fig. 2(b), and that of our proposed algorithm is shown in Fig. 2(c) [both initialized with the estimate shown in Fig. 2(a)]. These results are rather similar, and are significantly closer to the ideal flow field in Fig. 1(c), suggesting that a constraint based on segmentation is favorable.

Among the motion segmentation methods, the Murray–Buxton algorithm obtains the MAP estimate of the segmentation through simulated annealing, with the temperature schedule given in [12] and an initial temperature of 1000. The resulting segmentation after 1000 iterations is shown in Fig. 3(a). This segmentation is not close to the ideal segmentation because i) the input motion field [Fig. 2(a)] is not perfect, and ii) the ideal temperature schedule for simulated annealing (which is impractically slow) cannot be followed. We input the same motion field to the Wang and Adelson algorithm and obtain the segmentation shown in Fig. 3(b). Clearly, this result resembles the original segmentation closely. Our proposed algorithm yields a segmentation [Fig. 3(d)] that further improves upon that of Fig. 3(b). 
The Stiller algorithm, however, tends to generate more segments, as shown in Fig. 3(c). The reason for this is that it ignores the underlying higher level motion, such as a large area of rotation.


Fig. 7. Salesman sequence. (a) Sixth frame. (b) Ninth frame.

Fig. 8. (a) Motion and (b) segmentation estimates by using the proposed method.

B. Real Video

We now turn our attention to frames seven and eight of the Moving Picture Experts Group (MPEG) test sequence mobile and calendar, shown in Fig. 4(a) and (b). There are three independently moving objects in this scene, which can be reasonably well represented by parametric models. To this effect, we have used the same parameter set as in the case of the synthetic sequence. Bayesian motion estimation based on a global smoothness constraint, shown in Fig. 5(a), is the initial estimate in our algorithm. The results of Stiller's and our proposed methods are shown in Fig. 5(b) and (c), respectively, for comparison purposes. The visible improvements are around the boundaries of the ball. The Murray–Buxton segmentation, shown in Fig. 6(a), performs better than in the previous example, while Wang and Adelson's [Fig. 6(b)] is satisfactory. The method of Stiller, shown in Fig. 6(c), fails to capture the individual objects, but rather isolates patches in the flow field that undergo smooth motion. The segmentation field generated by our proposed algorithm, shown in Fig. 6(d), provides the best separation of the objects in the scene. Finally, to illustrate the choice of parameter values with different sequences, the proposed algorithm has been applied to two frames,

sixth and ninth, of the “Salesman” sequence, shown in Fig. 7(a) and (b). Here, we observe multiple local motions of the arm (due to movements of the shirt). Hence, in order to classify the entire arm as a single segment, we need to emphasize the spatial connectivity term compared to the residual motion term in (12). To this effect, we have set K = 5 and α = β = 1. It can be seen that our proposed algorithm yields satisfactory motion and segmentation fields with these choices of the parameter values, as shown in Fig. 8(a) and (b), respectively.

V. CONCLUSION

We have introduced a framework for simultaneous Bayesian motion estimation and segmentation that is based upon modeling the motion field as the sum of a parametric and a residual field. The proposed method provides a dense motion field, a motion segmentation field, and a set of mapping parameters. The dense representation of the residual motion field has been proposed in this paper for improved motion segmentation. In object-based coding, it is possible to model a single residual motion vector per 8 × 8 or 16 × 16 block of pixels (in addition to a parametric model


per object). In other words, the number of residual motion vectors per frame can be varied from application to application. Therefore, the number of bits required for motion representation in the proposed modeling may not be any more than that of the standard block-based schemes.

REFERENCES

[1] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artif. Intell., vol. 17, pp. 185–203, 1981.
[2] H. H. Nagel and W. Enkelmann, “An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 565–593, 1986.
[3] J. Konrad and E. Dubois, “Bayesian estimation of motion vector fields,” IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 910–927, Oct. 1992.
[4] F. Heitz and P. Bouthemy, “Multimodal estimation of discontinuous optical flow using Markov random fields,” IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 1217–1232, Dec. 1993.
[5] S.-L. Iu, “Robust estimation of motion vector field with discontinuity and occlusion using local outliers rejection,” in Proc. Visual Communication and Image Processing, Boston, MA, Nov. 1993, pp. 588–599.
[6] C. Stiller, “Object-oriented video coding employing dense motion fields,” in Proc. IEEE ICASSP, Adelaide, Australia, Apr. 1994.
[7] M. M. Chang, M. I. Sezan, and A. M. Tekalp, “An algorithm for simultaneous motion estimation and scene segmentation,” in Proc. IEEE ICASSP, Adelaide, Australia, Apr. 1994.
[8] M. Hoetter and R. Thoma, “Image segmentation based on object oriented mapping parameter estimation,” Signal Process., vol. 15, pp. 315–334, 1988.
[9] N. Diehl, “Object-oriented motion estimation and segmentation in image sequences,” Signal Process.: Image Commun., vol. 3, pp. 23–56, Feb. 1991.
[10] J. R. Bergen et al., “Dynamic multiple motion computation,” in Artificial Intelligence and Computer Vision, Proc. Israeli Conf., Y. A. Feldman and A. Bruckstein, Eds. Amsterdam, The Netherlands: Elsevier, 1991, pp. 147–156.
[11] J. Y. A. Wang and E. H. Adelson, “Representing moving images with layers,” IEEE Trans. Image Processing, vol. 3, pp. 625–638, Sept. 1994.
[12] D. W. Murray and B. F. Buxton, “Scene segmentation from visual motion using global optimization,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 220–228, 1987.
[13] S. Hsu, P. Anandan, and S. Peleg, “Accurate computation of optical flow by using layered motion representation,” in Proc. Int. Conf. Pattern Recognition, 1994, pp. 743–746.
[14] L. Cloutier, A. Mitiche, and P. Bouthemy, “Segmentation and estimation of image motion by a robust method,” in Proc. Int. Conf. Image Processing, Austin, TX, Nov. 1994, pp. 805–809.
[15] A. M. Tekalp, Digital Video Processing. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[16] P. B. Chou and C. M. Brown, “The theory and practice of Bayesian image labeling,” Int. J. Comput. Vis., vol. 4, pp. 185–210, 1990.

An Algorithm for Encoding and Decoding the 3-D Hilbert Order

Xian Liu and Günther F. Schrack

Abstract—Spatial ordering is a transformation relating an n-dimensional voxel to a set of natural numbers. Interest in the Hilbert order (H order) has been high because of its desirable performance when applied to image processing. In this correspondence, an encoding/decoding algorithm for the three-dimensional (3-D) Hilbert order is presented.

I. INTRODUCTION

Many applications involve transforming high-dimensional image data into the one-dimensional (1-D) space. One of the latest examples is region-oriented color image compression [2]. This task can be carried out effectively by a transformation called spatial ordering [1]–[6], [8]–[10]. One of the well-known orders is the Hilbert order (H order), the discrete representation of Hilbert's space-filling curve [7]. A three-dimensional (3-D) H order is of resolution r if it is defined on an image with 2^r × 2^r × 2^r voxels. An instance of resolution two and the corresponding space-filling curve are depicted in Fig. 2. Interest in the H order has been high because of its desirable behavior. Little has been published, however, on algorithms for the higher-dimensional cases, owing to their complexity. A diagrammatic approach is described in [3], but no algorithm is given. An implementation is reported in [2], but it is coupled with dedicated hardware. In this correspondence, we present an algebraic encoding/decoding algorithm for the 3-D H order.

II. ENCODING THE 3-D HILBERT ORDER

In the following, the H order is to be established on the subset D = {(x, y, z) : 0 ≤ x < 2^r, 0 ≤ y < 2^r, 0 ≤ z < 2^r} of the 3-D integer space. The resolution or level of the spatial order is specified by the parameter r (r ≥ 1); small values of r correspond to low resolutions or high levels. Given the coordinates of a particular point P in the voxel system D, the corresponding value of P in the H order (h-value) is to be determined. This process is called encoding.

We will introduce a geometric primitive to conduct the encoding procedure. Consider a 3-D cube. A Hamiltonian path (H-path) is a path that passes through every vertex of the cube exactly once; a Hamiltonian cycle is a closed H-path. There are two types of H-paths: the first type can be formed by deleting an edge from a Hamiltonian cycle, and the second type cannot. In the following, the first type is chosen as the geometric primitive to construct the H order. The geometric primitive has 24 rotation states: a cube has eight vertices, and from a particular starting vertex, six distinct H-paths can be constructed, giving 48 directed paths in a cube; considering undirected paths only, 24 remain. The components of the spatial structure of the 3-D H order can be characterized by indexed H-paths called units. A unit has two features: aspect and orientation. The aspect specifies

Manuscript received February 2, 1996; revised October 31, 1996. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. A. Murat Tekalp. The authors are with the Department of Electrical Engineering, The University of British Columbia, Vancouver, B.C. V6T 1Z4, Canada (e-mail: [email protected]). Publisher Item Identifier S 1057-7149(97)06328-8.
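To make the encoding and decoding concrete, the following is a minimal sketch of a 3-D Hilbert mapping in Python. It is not the algebraic, unit-based algorithm developed in this correspondence; it uses the well-known transpose (Gray-code) formulation of the Hilbert curve purely to illustrate the input/output behavior: `encode` takes a voxel (x, y, z) with 0 ≤ x, y, z < 2^r and returns its h-value, and `decode` inverts it. The names `encode`, `decode`, and the helper functions are our own.

```python
def _axes_to_transpose(X, b):
    """In-place transform of coordinates X (each b bits) to transposed Hilbert form."""
    n = len(X)
    M = 1 << (b - 1)
    Q = M
    while Q > 1:                      # inverse undo excess work
        P = Q - 1
        for i in range(n):
            if X[i] & Q:
                X[0] ^= P             # invert low bits of X[0]
            else:                     # exchange low bits of X[0] and X[i]
                t = (X[0] ^ X[i]) & P
                X[0] ^= t
                X[i] ^= t
        Q >>= 1
    for i in range(1, n):             # Gray encode
        X[i] ^= X[i - 1]
    t = 0
    Q = M
    while Q > 1:
        if X[n - 1] & Q:
            t ^= Q - 1
        Q >>= 1
    for i in range(n):
        X[i] ^= t

def _transpose_to_axes(X, b):
    """Inverse of _axes_to_transpose."""
    n = len(X)
    N = 2 << (b - 1)
    t = X[n - 1] >> 1                 # Gray decode by H ^ (H >> 1)
    for i in range(n - 1, 0, -1):
        X[i] ^= X[i - 1]
    X[0] ^= t
    Q = 2
    while Q != N:                     # undo excess work
        P = Q - 1
        for i in range(n - 1, -1, -1):
            if X[i] & Q:
                X[0] ^= P
            else:
                t = (X[0] ^ X[i]) & P
                X[0] ^= t
                X[i] ^= t
        Q <<= 1

def encode(x, y, z, r):
    """Map voxel (x, y, z), 0 <= x, y, z < 2**r, to its h-value."""
    X = [x, y, z]
    _axes_to_transpose(X, r)
    h = 0
    for j in range(r - 1, -1, -1):    # interleave the transposed bits, MSB first
        for i in range(3):
            h = (h << 1) | ((X[i] >> j) & 1)
    return h

def decode(h, r):
    """Inverse of encode: recover the voxel (x, y, z) from an h-value."""
    X = [0, 0, 0]
    for j in range(r - 1, -1, -1):    # de-interleave the 3r index bits
        for i in range(3):
            X[i] |= ((h >> (3 * j + 2 - i)) & 1) << j
    _transpose_to_axes(X, r)
    return tuple(X)
```

At resolution r = 2 this visits all 64 voxels of the 4 × 4 × 4 grid exactly once, with consecutive h-values mapping to voxels that differ by one unit step along a single axis, which is the defining locality property of the H order.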

1057–7149/97$10.00 © 1997 IEEE