IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 1, FEBRUARY 1999

Binary Shape Coding Using Baseline-Based Method

Shi Hwa Lee, Dae-Sung Cho, Member, IEEE, Yu-Shin Cho, Se Hoon Son, Euee S. Jang, Associate Member, IEEE, Jae-Seob Shin, and Yang Seock Seo

Abstract—Here, we propose a new shape-coding algorithm called baseline-based binary shape coding (BBSC), where the outer or inner contours of an arbitrarily shaped object are represented by one-dimensional (1-D) distance data traced from the baseline, together with turning points (TP's). There are two coding modes, i.e., the intra and inter modes, as in texture coding. In the intra mode, the differential values of the neighboring 1-D distance values and the TP's corresponding to the given shape are encoded by the entropy coder. In the inter mode, object identification, global shape matching, and local contour matching are employed for motion compensation/estimation. Lossy shape coding is enabled by variable sampling in each contour segment or by allowing some predefined error when performing motion compensation. We compare the proposed method with the bitmap-based method of context-based arithmetic encoding (CAE). Simulation results show that the proposed method is better than CAE in coding efficiency for the intra mode and better in subjective quality for both intra and inter modes, although CAE has performed better than the proposed method in the inter mode.

Index Terms—Baseline, binary shape coding, block-based method, contour-based method, distance list, MPEG-4, turning point, variable sampling.

I. INTRODUCTION

Binary shape representation is one of the key functionalities provided in the MPEG-4 standard [1]–[3], which considers audio–visual coding in multimedia applications with a content-based approach. Hence, shape coding is a new feature in MPEG-4, different from other standards such as MPEG-1 and MPEG-2. In MPEG-4, several video contents (or objects) in a frame are separately encoded, and they are constructed into object layers in the bit-stream domain. In the decoder, for example, some contents out of many encoded contents are decoded and composed on one background region. Other objects that have already been stored can also be included in the composed image [2]. This operation is referred to as content-based functionality. Binary shape coding is necessary to efficiently encode and describe the arbitrary shape of content. There are two typical approaches to shape coding: the block-based and the contour-based methods. For the block-based approach, context-based arithmetic encoding (CAE) [3], [4] has been adopted as the binary shape-coding tool in the verification model in MPEG-4 [1].

Manuscript received October 31, 1997; revised April 29, 1998. This paper was presented in part at the IEEE International Conference on Image Processing, Santa Barbara, CA, October 1997. This paper was recommended by Associate Editor K. N. Ngan. The authors are with the Signal Processing Laboratory, Samsung Advanced Institute of Technology (SAIT), Suwon 440-600 Korea. Publisher Item Identifier S 1051-8215(99)01187-8.

In the binary shape

coding of MPEG-4, the smallest rectangle enclosing an entire object is divided into shape blocks of 16×16 pixels. The bitmap of each shape block is to be encoded. Block-based size conversion is performed for lossy shape coding. In the inter mode, a block-matching algorithm is used for estimating the motion vector between two shape blocks of the current and previous frames. The CAE method is well integrated into the current MPEG-4 scheme since CAE is also a block-based approach. It has the benefit of short processing time because it only incurs a delay of one macro-block unit. The block-based size conversion in CAE, however, shows visually annoying staircase effects [3], [4]. In the inter mode, some mismatch error appears at the boundary region of the reconstructed shape because of nonrigid motion. Another block-based method, the modified MMR method [3], [5], differs from CAE only in the actual encoding of the shape information; hence, the properties of MMR are very similar to those of CAE. Contour-based shape coding can be thought of as an alternative that avoids the above disadvantages. We here introduce a new binary shape-coding method called the baseline-based shape coding (BBSC) method [3], [6]. Vertex-based binary shape coding [7] is also a contour-based shape-coding method. One of the major features of BBSC is the use of one-dimensional (1-D) data, which are found to be useful for simpler data manipulation and efficient representation of the shape boundary. In the intra mode, the 1-D distance data and turning points (TP's) are encoded by an entropy coder [8]. In lossy coding, subsampling of the contour segment is performed. The shape reconstructed by bilinear interpolation of the contour segment shows only a geometric error, which is less annoying to human eyes than blocking artifacts. In the predictive coding mode, the matching of the contour segment is performed in every separate shape, differently from a block-matching-based algorithm such as CAE.
In the following section, the overall structures of several shape-coding methods that have been candidates for MPEG-4 are reviewed. Section III explains the proposed BBSC, which comprises the procedure for extracting contour data, variable sampling for lossy shape coding, shape motion estimation/compensation, and shape reconstruction methods. In Section IV, the performance of the proposed BBSC and the block-based CAE method are compared. Finally, Section V closes this paper with conclusions.

II. REVIEW OF BINARY SHAPE CODING METHODS

In MPEG-4, there were four major candidate technologies for shape coding. A macro-block-based method called CAE

1051–8215/99$10.00  1999 IEEE


Fig. 1. Contexts for intra and inter CAE (the pixel to be coded is marked with “?”). (a) The intra template and context construction. (b) The inter template and context construction.

has been selected since it showed good results, particularly in the coding efficiency of the inter mode, and it can be easily integrated into the existing block-based motion compensation and texture coding [1], [4]. Another block-based method called modified MMR differs slightly in the coding of the actual bitmap data [5]. The CAE encodes the bitmap of a shape using the context of the neighboring pixels by binary arithmetic encoding [4], [9]. For lossy coding, the coder allows size conversion of the shape block by subsampling and upsampling. Before encoding the binary pixel values, the shape information of the current macro block (16×16 size) is subsampled by a factor of 2 (8×8 block) or 4 (4×4 block) when size conversion is on. The size conversion information, called the conversion ratio (CR), is also provided in the bit stream. Based on CR, the number of coded pixels is determined. Then the decoded shape block is upsampled to the original block size. In the intra mode, a template of ten pixels is used to predict the current pixel [see Fig. 1(a)], and it defines the probability model for the arithmetic encoding. This can be viewed as a Markov source model with ten states. For a macro block with all opaque (“255”) values or all transparent (“0”) values, only the coding mode is sufficient, without encoding the bitmap information. In the inter mode, there are seven coding modes [4], [5], shown in Table I. In the case of modes 0 and 1, the shape of the current macro block is predicted from the shape macro block of the previous frame using motion compensation. Shape motion vectors are estimated by the block-matching algorithm with a full search range. In the case of modes 5 and 6, a template of nine pels is used [see Fig. 1(b)]. The context of five pels is defined from the motion-compensated shape block of the previous frame. The context of four pels is defined by neighboring pixels of the current shape block. As contour-based shape coding, a vertex-based shape-coding method was also proposed [7]. In that method, a shape consisting of sets of different contours is approximated using a polygon representation for lossy shape coding. For lossless shape coding, the polygon approximation is substituted by chain coding [10]. The contour-based methods may have advantages such as full-frame-based coding and content-based manipulation and accessing.

III. BASELINE-BASED BINARY SHAPE CODING (BBSC)

A. Overview
The overall encoding block diagram of BBSC is shown in Fig. 2(a). As shown in Fig. 2, a shape mask is considered as a set of contours (or boundaries). By processing the contour pels instead of the whole shape mask, we can simplify the coding process, although we need a filling process in the coding loop to restore the original shape mask. A contour is traced in the clockwise (or counterclockwise) direction and represented by 1-D distance data and TP's. The basic processing unit for the extracted data is called a contour segment (CS); a CS consists of 16 contour pixels. In the intra mode, distance values of the contour are subsampled before the entropy coding. In the predictive coding mode, the shape motion vector between two contour data is estimated such that the sum of differences of the two distance data is minimized. Then the residual of the motion-compensated distance values from the current input data is subsampled and encoded, as in DPCM [11]. Variably sampled distance values are input to the DPCM block and encoded by the entropy coder in the VLC block. We predict TP's in a DPCM-like manner and encode them with an entropy coder. Missing data between two subsampled distance data are reconstructed by bilinear interpolation. In the inter mode, the data are predicted from the previously reconstructed shape data. The shape reconstructed from the distance data and TP's is stored in the shape buffer, whose values are used for predicting the next shape data. The binary shape mask is reconstructed by filling the opaque (“255”) value inside the reconstructed contour data in the shape-filling block. In the decoder, shown in Fig. 2(b), the encoded bit stream is input to the inverse VLC block. In the inverse DPCM block, the subsampled distance data, TP's, and shape motion vectors are reconstructed. The shape MC block reconstructs the motion-compensated shape using the previous shape data stored in the shape buffer. Then the reconstructed data are added to the residual shape data. By filling the contour data, we obtain the bitmap of the shape in the shape-filling block.

B. Distance Data and Turning Point Extraction

To obtain the contour pixels, a contour of a shape is traversed in the clockwise (or counterclockwise) direction. In this case,


TABLE I. CODING MODES IN CAE

Fig. 2. Overall coding scheme of the baseline-based method. (a) Encoder. (b) Decoder.

the eight neighboring pixels are considered, as shown in Fig. 3(a). In order to find the next contour pixel of the current pixel, the first encountered pixel having an opaque (“255”) value is searched from the previous contour pixel in the clockwise (or counterclockwise) direction.

Before extracting the contour pixels to be encoded, the baseline is chosen according to the elongation of the shape. In Fig. 3(b), the horizontal axis is chosen as the baseline since the width of the minimal bounding rectangle enclosing the shape is longer than its height.
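The 8-neighbor tracing of Fig. 3(a) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function and constant names are mine, the mask is a plain list of lists, and which scan order counts as "clockwise" depends on the image coordinate convention.

```python
OPAQUE = 255

# The 8 neighbors in clockwise order, starting from "east" (row, col offsets).
NEIGHBORS = [(0, 1), (1, 1), (1, 0), (1, -1),
             (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def trace_contour(mask, start):
    """Return the closed contour of the object containing `start`,
    as a list of (row, col) pixels in tracing order."""
    h, w = len(mask), len(mask[0])
    contour = [start]
    prev_dir = 4  # pretend we arrived travelling east, so the scan resumes there
    cur = start
    while True:
        found = False
        # scan the 8 neighbors clockwise, beginning just after the
        # direction we came from
        for k in range(8):
            d = (prev_dir + 1 + k) % 8
            dr, dc = NEIGHBORS[d]
            r, c = cur[0] + dr, cur[1] + dc
            if 0 <= r < h and 0 <= c < w and mask[r][c] == OPAQUE:
                contour.append((r, c))
                prev_dir = (d + 4) % 8  # direction pointing back at `cur`
                cur = (r, c)
                found = True
                break
        if not found:      # isolated pixel: nothing to trace
            break
        if cur == start:   # contour closed
            contour.pop()
            break
    return contour
```

For a 2×2 opaque square the sketch visits its four pixels once each and stops when the trace returns to the start pixel.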



Fig. 4. Typical direction type of contour pixel.

Fig. 3. Contour tracing. (a) Contour tracing clockwise through the eight neighbors of the current pixel. (b) Contour tracing for turning points and distance lists.

Once the baseline is chosen, the position of a contour pixel is defined by its perpendicular distance from the baseline. The 1-D distance values are extracted in the order of contour tracing. While tracing, the x-coordinates of the visited pixels increase by 1; at some pixels, this direction changes, and from such a pixel the x-coordinate values decrease by 1. A pixel at which the tracing direction changes is called a turning point (TP). In Fig. 3(b), the tracing starts from a contour pixel and proceeds through the succeeding contour pixels. During the process, the distance values between the contour pixels and the baseline are extracted, and the pixels at which the tracing direction changes are extracted as TP's. The list of distance values is called the distance list. To reconstruct the contour data, the following should be transmitted to the decoder: 1) the starting pixel of tracing; 2) the y-coordinate distance values; and 3) the x-coordinates of the TP's (turning points). The summarized procedure for extracting distance data and TP's is as follows.

PROCEDURE: Do for all contour pixels:
1) Contour data extraction: find the previous, current, and next contour pixels.
2) Distance list and turning point extraction:
   2.1) if the contour type is Type 0, then append the distance value to the distance list; if the previous contour type is Type 1, then append the x-coordinate to the TP list; set the previous contour type to Type 0;
   2.2) else if the contour type is Type 1, then append the distance value to the distance list; if the previous contour type is Type 0, then append the x-coordinate to the TP list; set the previous contour type to Type 1;
   2.3) else if the contour type is Type 2, then if the previous contour type is Type 0: if the pixel is a distance pixel, then append its distance value to the distance list twice and append the x-coordinate to the TP list; else append the x-coordinate to the TP list; set the previous contour type to Type 1;
   2.4) else if the contour type is Type 3, then if the previous contour type is Type 1: if the pixel is a distance pixel, then append its distance value to the distance list twice and append the x-coordinate to the TP list; else append the x-coordinate to the TP list; set the previous contour type to Type 0;

As shown in Fig. 4, every contour pixel has one of four types according to the relationship between consecutive contour pixels. The previous, current, and next contour pixels are extracted [step 1)]. In the case where the contour type is Type 0 or 1, as in steps 2.1) or 2.2), the distance value of the current contour pixel is added to the distance list. If the Type 0 or 1, which has one tracing direction (left to right, or right to left), differs from the previous contour type, the x-coordinate of the current contour pixel is added to the TP list; the turning point is described by its x-directional position. In the case where the current pixel has Type 2 or 3, as in steps 2.3) or 2.4), the pixel is verified as to whether or not it is a distance pixel; its distance value is included in the distance list twice if it is a distance pixel. Since the contour pixel is projected onto the baseline in the y-axis direction, the distance list is updated under the condition that there are no already selected contour pixels in the upside or downside positions. The x-coordinate of the current contour pixel is added to the TP list. If the extracted pixel is not a distance pixel, the x-coordinate is added to the TP list without updating the distance list.

C. Variable Sampling for Lossy Shape Coding

Sampling Rate (SR): In lossless coding, all distance list and TP values are encoded. For further compression, however, a geometric distance error is allowed in each contour segment; hence, subsampling is performed in each CS. Since the complexity of the polygon drawn by the contour pixels varies, the number of selected samples is variable in each segment. The encoded contour pixels are selected in such a way that the error of the reconstructed distance values is smaller than a given quality threshold QualityTh. One CS is divided into four parts for evaluating the distance error; each part has four distance values. The quality threshold is the allowable error between two CS's in each part. If there is at least one part in which the sum of the distance errors is above QualityTh, further sampling is performed.
To make the reconstructed shape preserve subjectively good quality, the sampling rate in each segment should exceed the Nyquist rate, where the Nyquist rate is two times the maximum frequency of the subsampled distance values. Before encoding the sequence of subsampled distance values, the SR is coded to indicate the number of distance samples, where the SR is the number of selected samples per number of original samples (16 in this paper). There are five SR's [see Fig. 5(a)]. If SR is 1/16, only one distance value is coded in a CS; if SR is 1, then all distance values are coded. The process of selecting SR starts from 1/16. Whenever the evaluated error is above QualityTh, the SR becomes two times larger, and the error test is performed again. The sum of absolute difference (SAD) is used for evaluating the error during variable sampling. To reconstruct a missing sample between two coded distance values, a bilinear interpolation is performed as follows:

\hat{d}_i(j) = d(x_i) + \frac{j}{x_{i+1} - x_i}\,\bigl(d(x_{i+1}) - d(x_i)\bigr)    (1)

where d(x_i) and d(x_{i+1}) are the ith and (i+1)th coded sample values of the contour pixels at x_i and x_{i+1} (the x axis is horizontal to the baseline, and the y axis is vertical), and \hat{d}_i(j) is the interpolated distance value at the jth position in the ith section, which is surrounded by the two samples. The SAD between the original CS and the reconstructed CS at the sampling rate is defined by

\mathrm{SAD}(SR) = \sum_{i} \sum_{j} \bigl| d_i(j) - \hat{d}_i(j) \bigr|    (2)

where d_i(j) is the distance value between the original CS and the baseline at the jth position in the ith section of a CS. This process is completed when the evaluated error is below QualityTh or when SR is set to 1. In Fig. 5(b), the pixels selected during variable sampling are marked on the contour shape. After reconstructing the subsampled values, the missing samples are approximated, and the shape of Fig. 5(c) is obtained by filling.

Sampling Position Reassignment: When selecting contour data in a CS, the intervals between subsampling positions are all the same. However, if there is a large discontinuity between two particular sampling positions, further samples should be included, that is, SR is increased. Since the fixed intervals are preserved in a CS, some redundant samples are inserted despite their small distortion errors. To reduce this redundancy, the positions of the two samples nearest to an abruptly changed sample position are reassigned, as shown in Fig. 6(a). In the case of sampling with a variable interval, the x-coordinate position of each sampling pixel would have to be encoded in addition to SR and the distance lists. This can be avoided by detecting TP's: a TP is used for determining whether or not the reassignment is performed, since in general a large discontinuity exists at the position of a TP whose neighboring TP's are a long distance away.

Distance List Adjustment: Before encoding the subsampled distance lists, the distance values are adjusted for further reduction of the reconstruction error. As shown in Fig. 6(b), the adjustment is employed in the range of [−1, 1] in the y direction. The adjusted value giving the minimum reconstruction error is used as the new distance list value. The distance list value at each sampling position is updated by using the following error measure evaluated at the adjustment position

E_i(\Delta) = \sum_{j} \bigl| d_{i-1}(j) - \hat{d}_{i-1}(j;\Delta) \bigr| + \sum_{j} \bigl| d_i(j) - \hat{d}_i(j;\Delta) \bigr|    (3)

where \hat{d}_{i-1}(j;\Delta) and \hat{d}_i(j;\Delta) are the jth distance values of the approximated CS in the (i−1)th and ith sections, in which the missing samples between x_{i-1} and x_i, and between x_i and x_{i+1}, are reconstructed by bilinear interpolation using the adjusted sample value at x_i, and d_{i-1}(j) and d_i(j) are the corresponding distance values of the original contour sections. In the experiment, the updating is iterated until the number of evaluation repetitions reaches ten or until there is no further adjustment for any sample.

Fig. 5. Variable sampling of contour. (a) Variable sampling with sampling ratio (SR). (b) Contour trace with variable sampling. (c) Reconstructed shape.
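The variable-sampling loop described above (subsample each 16-pel CS, rebuild the gaps by interpolation as in (1), and test the SAD of each 4-sample part against QualityTh) can be sketched as follows. This is an illustrative sketch, not the reference implementation: the function names are mine, the five sampling rates 1/16 to 1 are inferred from the text, and keeping both segment endpoints at every rate is my simplification.

```python
CS_LEN = 16  # a contour segment holds 16 distance values

def reconstruct(cs, step):
    """Keep every `step`-th distance value (plus the final one) and fill
    the gaps by linear interpolation between the kept samples."""
    rec = list(cs)
    for i in range(0, CS_LEN - 1, step):
        j = min(i + step, CS_LEN - 1)
        for k in range(i + 1, j):
            t = (k - i) / (j - i)
            rec[k] = cs[i] + t * (cs[j] - cs[i])
    return rec

def select_sr(cs, quality_th):
    """Start from the coarsest rate and double it whenever any of the four
    4-sample parts of the CS reconstructs with a SAD above quality_th.
    Returns (step, reconstructed_cs), where SR = 1/step."""
    for step in (16, 8, 4, 2, 1):  # SR = 1/16, 1/8, 1/4, 1/2, 1
        rec = reconstruct(cs, step)
        parts_ok = all(
            sum(abs(a - b) for a, b in zip(cs[p:p + 4], rec[p:p + 4]))
            <= quality_th
            for p in range(0, CS_LEN, 4))
        if parts_ok or step == 1:
            return step, rec
```

A smooth segment (e.g., a linear ramp) passes at the coarsest rate, while a segment with a sharp step forces the loop down toward SR = 1.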

D. Intershape Coding of the Baseline-Based Method

Search of Reference Shape: For temporal prediction of a separate shape, the contour data of an object are predicted from the previous contour data by motion compensation of the contour segment. Before performing contour segment matching, the object content of the previous frame that is assumed to correspond to the current shape should be selected first. The most similar contour out of several


Fig. 6. Sample adjustment method. (a) Sampling position reassignment. (b) Distance list adjustment.

separate contours is selected as the reference contour. The methods for finding the reference contour and the global motion vector are shown in Fig. 7. One or more contours exist in the previous and current video object planes (VOP's), where a VOP means a group of separate shapes in an image. In Fig. 7(a), the contours a, b, and c are the outer contours or holes in the reference search range of the previous VOP, and d and e are the outer contours or holes in the current VOP. The possible reference contour data of e are the contours a, b, or c. The matching procedure is not the actual motion estimation, but an initial contour matching before the contour segment matching. To reduce the computational complexity, we only consider the previous contours that exist inside the reference search range of ±32 pels of the previous VOP from the start position of a current contour. For the initial contour matching of the hole data, a procedure similar to that of the outer contour data is performed. In the matching procedure, the start position of each candidate in the previous VOP should be aligned with that of the current contour. Then a motion vector from the aligned point, which gives the minimum motion-compensated error in the search range of [−8, 8], is selected for each contour. The contour giving the smallest error is set as the reference contour, and its global motion vector is also used for the contour matching later. The sum of absolute differences (SAD) between a previous reference contour and a current contour can be defined as follows:

\mathrm{SAD}(mv) = \sum_{(x,y)} \bigl| B_{t-1}\bigl((x,y) + p_0 + mv\bigr) - B_t(x,y) \bigr|    (4)

Fig. 7. Searching reference shape. (a) Searching the corresponding outer shape of e out of all of the previous shapes a, b, c. (b) Sum of absolute difference (SAD) between a previous shape and a current shape.

where B_{t-1} and B_t indicate the binary contours of a previous and a current VOP, respectively, p_0 is the initial start position of the current contour relative to that of a previous contour, and mv is the global motion vector inside the search range of [−8, 8]. The method of calculating the SAD is shown in Fig. 7(b). The SAD in (4) can be rewritten as follows:

\mathrm{SAD}(mv) = \sum_{i=0}^{L-1} \sum_{k=0}^{K_i - 1} \bigl| d_{t-1,k}(i) - d_{t,k}(i) \bigr|    (5)

where d_{t-1,k}(i) and d_{t,k}(i) indicate the kth pair of the distance lists on the ith perpendicular line passing through the baseline. Before evaluating the SAD, the contour data of the two compared shapes are composed in one spatial domain at the search position. After the alignment of the two contours, the values of the distance list of the composed contour are sorted by their magnitudes. The symbol L means the length of the baseline, and the symbol K_i represents the number of distance list pairs at the ith position of the baseline.

Motion Estimation and Compensation of Contour Segment: Once the reference contour is decided, contour segment (CS) matching is performed for local motion estimation. If the error after global motion compensation is smaller than a predefined threshold, the global skip mode is used. Since there is no


Fig. 8. Local motion estimation of contour segment. (a) Start and end point matching between two segments. (b) SAD between a reference contour segment and a current contour segment.

transmitted local motion, a significant bit reduction is possible, particularly for a shape having rigid motion. A brief description of the local contour matching is shown in Fig. 8. In Fig. 8(a), the current contour part to be estimated is aligned with the reference contour at the initial x-coordinate position defined by the global motion. The length of the current CS is fixed, for example, to 16 in the proposed method; the length of the reference CS, however, is determined by its start and end positions coinciding with those of the current CS, respectively. In the search range of [−4, 4], the motion vector resulting in the minimal estimation error is selected. As for reference contour selection and global motion estimation, the sum of absolute difference (SAD) is used. In the contour matching, the SAD is investigated in the following way:

\mathrm{SAD}(mv) = \sum_{i} \sum_{k=0}^{K_i - 1} \bigl| d_{t-1,k}(i + mv) - d_{t,k}(i) \bigr|    (6)

where d_{t-1,k}(i) and d_{t,k}(i) indicate the kth pair of the distance lists on the ith line perpendicular to the baseline. After the x-coordinate start and end positions of the two CS's are made to coincide with each other, the SAD is calculated at each search position; the distance lists are extracted both from the reference CS and the current CS and sorted by their magnitudes before evaluating the error. The symbol K_i represents the number of pairs of distance values at the ith position of the baseline. The mismatched errors of the contour segments are shown in Fig. 8(b). Before evaluating the SAD between CS's, two constraints should be satisfied in each search range. One is that the start and end positions of the reference CS should coincide with those of the current CS. The other is that the tracing directions at the start and end positions of the two CS's should be the same, respectively. The tracing direction of the current CS is known both to the encoder and the decoder through TP encoding. A contour segment that is not predicted well by the local motion is encoded in the same way as in the intra mode. If the sum of absolute DPCM values of the current CS is smaller than the error of the motion-compensated CS, the intra mode is selected; otherwise, motion compensation and residual coding are performed.
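The local matching step can be illustrated with the following sketch, assuming a simplified representation in which each contour is a dictionary mapping a baseline position to the sorted distance values on the perpendicular line at that position. The names are mine, and pairing an unmatched distance value against zero is my simplification.

```python
SEARCH_RANGE = 4  # local search range [-4, 4] along the baseline

def segment_sad(ref, cur, mv):
    """SAD in the spirit of (6): compare the kth pair of distance values on
    each perpendicular line, with the reference shifted by mv."""
    sad = 0
    for x, cur_dists in cur.items():
        ref_dists = ref.get(x + mv, [])
        for k in range(max(len(cur_dists), len(ref_dists))):
            c = cur_dists[k] if k < len(cur_dists) else 0
            r = ref_dists[k] if k < len(ref_dists) else 0
            sad += abs(c - r)
    return sad

def match_segment(ref, cur):
    """Return (best_mv, best_sad) over the search range [-4, 4]."""
    best_mv = min(range(-SEARCH_RANGE, SEARCH_RANGE + 1),
                  key=lambda mv: segment_sad(ref, cur, mv))
    return best_mv, segment_sad(ref, cur, best_mv)
```

When the current segment is an exact translate of part of the reference contour, the matching offset yields a SAD of zero.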

E. Encoded Symbols

In BBSC, 2-D contour data are converted into 1-D data. This gives simpler operations for shape reconstruction, and compression is achieved by reducing the correlation between neighboring distance values of the contour pixels. The encoded symbols of the 1-D data and some information for reconstructing the shape data are summarized in Table II.


TABLE II. ENCODED SYMBOL LISTS

At first, the baseline direction is encoded. If the y-coordinate distance difference of the leftmost (rightmost) contour pixel pair is larger than that of the rightmost (leftmost) contour pixel pair, the contour is traced in the clockwise (counterclockwise) direction from the leftmost (rightmost) position, to decrease the encoded bits of the distance values. There may exist one or more objects in a VOP, and there may also exist several holes in a shape, so the numbers of contours and holes should be transmitted to the decoder. The location of each separate contour is represented by the x-coordinate difference value from the start position of the baseline or from the start position of a previously encoded separate contour. To represent the contour data of a shape in the predictive coding scheme, the global contour motion, the contour segment motion, the number of TP's, the x-coordinates of the TP's, the sampling rate, and the y-coordinate distance values are encoded for each separate contour.

F. Shape Reconstruction Using Contour Filling

Using the reconstructed 1-D distance data and TP's, a binary shape can be reconstructed by filling the inside of the contour data. The decoded distance data can be mapped onto a closed boundary of a contour as shown in Fig. 9(a), where the data are numbered in contour tracing order. For a convex object, a shape can be formed by assigning the opaque “255” value between the two y-coordinate data of each pair of contour pixels located at the same x coordinate. For a shape containing some concave contour segments, however, the distance data of the contour pixels on the same x-coordinate position of the baseline should be grouped together, as depicted in Fig. 9(b). Then the distance values at the same position are sorted by their magnitudes and grouped into several pairs, as shown in Fig. 9(c). Contour filling is performed inside each pair of distance data. The shape in Fig. 9(a) is reconstructed from the 1-D distance values.

IV. EXPERIMENTAL RESULTS

Test sequences “Weather,” “Children-Kids,” and “Logo” were used for evaluating the proposed BBSC method and the conventional CAE method [3], [4]. These sequences were also employed for the core experiment on MPEG-4 shape coding. The “Weather” sequence with the QCIF format (176×144) and the “Kids” and “Logo” sequences with the SIF format (352×240) were coded at 10 frames/s. Each shape datum has two mask values for separating the object and the background: the opaque value “1” for image objects and the transparent value “0” for the background, which can be extracted by the blue screen method from the original texture image. The lossless coding results for each shape at the thirtieth frame are shown in Fig. 10, where the luminance data of the texture are overlapped with the binary shape. “Weather” has the shape of the head and shoulders of a human, which is common in teleconference and videophone applications. The “Kids” sequence is useful for checking how efficient each method is at coding a shape having fast motion. The “Logo” sequence shows the shape of moving characters. Simulations were conducted in both the intra and inter modes.


Fig. 9. Contour-filling method in the baseline-based method. (a) Structure of decoded distance lists. (b) Distance lists on the same position of the baseline in the order of contour tracing. (c) Distance lists sorted by magnitude.
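The contour-filling method of Fig. 9 can be sketched as follows. The (x, y) contour representation and the function name are mine; a real decoder works from the decoded distance lists, where TP's duplicate distance values so that every baseline position holds an even number of entries.

```python
OPAQUE = 255

def fill_shape(width, height, contour):
    """contour: iterable of (x, y) contour pixels, x measured along the
    baseline. The distance values sharing a baseline position are sorted by
    magnitude and paired, and each pair's span is filled with the opaque
    value, as in Fig. 9(b) and (c)."""
    mask = [[0] * width for _ in range(height)]
    columns = {}
    for x, y in contour:
        columns.setdefault(x, []).append(y)
    for x, ys in columns.items():
        ys.sort()
        for k in range(0, len(ys) - 1, 2):  # pair 1st/2nd, 3rd/4th, ...
            for y in range(ys[k], ys[k + 1] + 1):
                mask[y][x] = OPAQUE
    return mask
```

For the six perimeter pixels of a 2×3 rectangle, each baseline position carries exactly one pair of distance values, and filling between them restores the full rectangle.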

In the intra mode, we can see from the lossless results of Fig. 10 that the proposed method requires fewer bits per VOP than the CAE method. In a contour-based algorithm such as BBSC, the overhead bits for shape are fewer than those of a block-based method such as CAE. This is due to the fact that if one shape object is divided into several blocks of fixed size, overhead bits indicating the block coding mode and the size conversion information are needed for each block. The proposed method encodes only the contour pixels of the shape, whereas the CAE method encodes all of the pixels inside the shape. In the case of the “Logo” sequence, however, the length of the traced contour data relative to the area inside the shape is larger than for the other sequences. Hence, the intra coding result of BBSC is slightly worse than that of CAE for the “Logo” sequence. In the inter mode, CAE yields fewer bits than BBSC. In the bitmap-based method, the mismatch error between two shape blocks is small because a 16×16 shape block has a high probability of having a pattern similar to another shape block. Instead of block matching, contour matching is performed in BBSC. In the test sequences, the variation of the y-coordinate distance data between two VOP's is larger than that of the bitmap pattern, so the intra mode is selected for most contour segments (CS's). BBSC hence gives lower compression than CAE, particularly in the inter mode.

Fig. 10. Lossless result at the thirtieth frame of each sequence with texture. (a) Weather (baseline method: intra bits = 494, inter bits = 474; VM(CAE) method: intra bits = 531, inter bits = 325). (b) Kids (baseline method: intra bits = 1843, inter bits = 2078; VM(CAE) method: intra bits = 2085, inter bits = 1730). (c) Logo (baseline method: intra bits = 3229, inter bits = 1436; VM(CAE) method: intra bits = 3166, inter bits = 826).
For lossy coding, we should consider the tradeoff between bit rate and the accuracy of shape representation. For evaluating the distortion error of a reconstructed shape, we use the following distortion criteria [12]:
• Dn: the number of error pels relative to the object size, i.e., the total number of pels inside the original shape.
• Dp: the peak deviation (maximum real deviation).
For Dn, the total number of samples whose values have changed and the total number of nonzero pels of the original

Fig. 11. Graph of Weather. (a) Dp versus bits (intra). (b) Dn versus bits (intra). (c) Dp versus bits (inter). (d) Dn versus bits (inter).

shape have to be calculated for each VOP. The first value is divided by the second one and averaged over all VOP's in a sequence. The brief source code for Dn is described as follows:

where VopHeight and VopWidth are the height and width of the VOP, and the two mask values compared at each pel are those of the original shape and the reconstructed shape, respectively. Dn is obtained by dividing the number of mismatched pels, errorCount, by the number of nonzero pels of the original shape, originalCount. The graphs of average bits versus Dp or Dn for each method are shown in Figs. 11–13. In the intra mode, the average numbers of bits per Dp and per Dn are smaller in the proposed method than in CAE. However, in the inter mode, CAE gives fewer bits than the proposed method. The binary shapes reconstructed for lossy coding, by interpolating subsampled distance values in the BBSC method and by block-based size conversion in CAE, are shown in Figs. 14–16,

where the original texture values are overlaid inside the shape. In the case of CAE, the block-based size conversion error makes the reconstructed shape show annoying blocking artifacts, particularly in the intra mode. In the proposed method, the sampling on the contour is performed with the Nyquist rate considered on each CS, and the lossy coding yields a geometric reconstruction error that is less annoying to human eyes than the blocking effects. In the inter mode, the CAE method performs better in objective quality in terms of Dp and Dn. However, random distortions appear on the boundary of the reconstructed shape; we call this the irregular boundary effect. It comes from the boundary mismatch between two neighboring motion-compensated shape blocks, and it is particularly large in sequences with fast motion such as the "Kids" sequence (see Fig. 15). From the viewpoint of hardware implementation, the block-based CAE method has some advantages over the contour-based BBSC method: CAE incurs only a small delay of one macroblock unit and needs only a small memory buffer. Although the contour-based method shows better shape quality, its hardware complexity needs further optimization, and efficient hardware integration should be studied further. In the case of CAE, a problem remains in the computational complexity of binary arithmetic encoding, which should be investigated in the future.

Fig. 12. Graph of Kids. (a) Dp versus bits (intra). (b) Dn versus bits (intra). (c) Dp versus bits (inter). (d) Dn versus bits (inter).

Fig. 13. Graph of Logo. (a) Dp versus bits (intra). (b) Dn versus bits (intra). (c) Dp versus bits (inter). (d) Dn versus bits (inter).


Fig. 14. Results of lossy shape coding at the thirtieth frame of the Weather sequence. (a) VM7 (CAE); intra, AlphaTh = 128 (Dp = 2.1376, Dn = 0.0499, bits = 218). (b) Baseline; intra, QualityTh = 16 (Dp = 3.0830, Dn = 0.0291, bits = 219). (c) VM7 (CAE); inter, AlphaTh = 96 (Dp = 2.3072, Dn = 0.0330, bits = 101). (d) Baseline; inter, QualityTh = 16 (Dp = 2.4047, Dn = 0.0334, bits = 113).

Fig. 15. Results of lossy shape coding at the thirtieth frame of the Kids sequence. (a) VM7 (CAE); intra, AlphaTh = 128 (Dp = 2.0925, Dn = 0.0742, bits = 825). (b) Baseline; intra, QualityTh = 10 (Dp = 2.0009, Dn = 0.0371, bits = 846). (c) VM7 (CAE); inter, AlphaTh = 128 (Dp = 3.6434, Dn = 0.0751, bits = 583). (d) Baseline; inter, QualityTh = 18 (Dp = 2.9669, Dn = 0.0586, bits = 586).

Fig. 16. Results of lossy shape coding at the thirtieth frame of the Logo sequence. (a) VM7 (CAE); intra, AlphaTh = 96 (Dp = 1.4997, Dn = 0.2067, bits = 1799). (b) Baseline; intra, QualityTh = 12 (Dp = 1.5299, Dn = 0.1371, bits = 1820). (c) VM7 (CAE); inter, AlphaTh = 85 (Dp = 1.4756, Dn = 0.1844, bits = 298). (d) Baseline; inter, QualityTh = 16 (Dp = 1.7327, Dn = 0.1960, bits = 481).

V. CONCLUSION

The BBSC method is proposed for binary shape coding with better objective and subjective quality, particularly in the intra coding mode, as shown by the experimental results. The method exhibits a smaller blocking effect on the reconstructed shape. On the other hand, the block-based CAE method gives more efficient results, particularly in the inter mode; however, it presents defects such as the irregular boundary effect as well as the blocking effect. In BBSC, the contour of objects is considered directly, and the main features of the shape are preserved even when the degree of loss is high. The geometric error of the shape, in which the x-coordinate distance values are reconstructed by bilinear interpolation, is less annoying to human eyes. For applications using high-quality content, the proposed method therefore seems more efficient than the block-based method. One problem in the contour-based method remains to be optimized, namely, computational processing time and hardware implementation. From the viewpoint of hardware implementation, the block-based CAE fits well with the existing block-based motion and texture coding schemes: it incurs only a delay of one block unit, whereas the proposed method needs a processing delay of one frame unit. Since there is still redundancy in the contour-based method, the processing time and hardware implementation of BBSC should be investigated further in the future.

Further research will focus on the development of descriptions of arbitrarily shaped objects for content-based accessibility, in addition to hardware optimization. In the future, an efficient method will be needed for searching contents in a large database containing all types of object images, texture, text, and speech. This may be a requirement for future work of the MPEG group.

REFERENCES

[1] Video Group, "MPEG-4 video verification model version 7.0," ISO/IEC JTC1/SC29/WG11 MPEG96/N1642, Bristol, England, Apr. 1997.
[2] T. Sikora, "The MPEG-4 video standard verification model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 19–31, Feb. 1997.
[3] Shape Coding Ad-Hoc Group, "Core experiments on MPEG-4 video shape coding," ISO/IEC JTC1/SC29/WG11 N1326, Chicago, IL, Oct. 1996.
[4] N. Brady, F. Bossen, and N. Murphy, "Context-based arithmetic encoding of 2D shape sequences," Special Session on Shape Coding, ICIP 97, Santa Barbara, CA, 1997.
[5] N. Yamaguchi, T. Ida, and T. Watanabe, "A binary shape coding method using modified MMR," Special Session on Shape Coding, ICIP 97, Santa Barbara, CA, 1997.
[6] S.-H. Lee, D.-S. Cho, Y.-S. Cho, S.-H. Son, E. S. Jang, and J.-S. Shin, "Binary shape coding using 1-D distance values from baseline," Special Session on Shape Coding, ICIP 97, Santa Barbara, CA, 1997.
[7] K. J. O'Connell, "Object-adaptive vertex-based shape coding method," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 251–255, Feb. 1997.


[8] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, pp. 520–540, June 1987.
[9] J. Ostermann, E. S. Jang, J.-S. Shin, and T. Chen, "Coding of arbitrarily shaped video objects in MPEG-4," Special Session on Shape Coding, ICIP 97, Santa Barbara, CA, 1997.
[10] M. Eden and M. Kocher, "On the performance of a contour coding algorithm in the context of image coding. Part I: Contour segment coding," Signal Processing, vol. 8, pp. 381–386, July 1985.
[11] M. Rabbani and P. W. Jones, Digital Image Compression Techniques. SPIE, Int. Soc. Opt. Eng., 1991.
[12] Video Subgroup/Ad Hoc Group on Shape Coding/J. Ostermann, "Core experiments on MPEG-4 video shape coding," ISO/IEC JTC1/SC29/WG11 N1584, Mar. 1997.

Shi Hwa Lee was born in Wonju, Korea, on November 17, 1965. He received the B.S. and M.S. degrees in computer science from Yonsei University, Seoul, Korea, in 1990 and 1992, respectively. He joined the Samsung Advanced Institute of Technology (SAIT) as a Member of Research Staff in the Signal Processing Laboratory in 1992. He is currently pursuing the Ph.D. degree at KAIST, Daejon, Korea. His research interests include image coding and computer vision.

Dae-Sung Cho (M'97) was born in Seoul, Korea, on September 5, 1971. He received the B.S. and M.S. degrees, both in electronic engineering, from Sogang University, Seoul, Korea, in 1994 and 1996, respectively. Currently, he is a Member of Research Staff at the Signal Processing Laboratory, Samsung Advanced Institute of Technology, Kiheung, Korea. His current research interests are image coding, image analysis, and image representation.

Yu-Shin Cho was born in Seoul, Korea, on March 24, 1970. He received the B.S. and M.S. degrees, both in computer science, from Sogang University, Seoul, Korea, in 1992 and 1994, respectively. Currently, he is a Member of Research Staff at the Human Computer Interface (HCI) Laboratory, Samsung Advanced Institute of Technology, Kiheung, Korea. His research interests are in computer vision and video coding.

Se Hoon Son received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 1993 and 1995, respectively. In 1995, he joined the Samsung Advanced Institute of Technology as a Member of Research Staff. He has worked in the fields of signal processing and image coding, especially on the MPEG-4 standard. His research interests are object-based visual communication and scalable coding schemes.

Euee S. Jang (S'93–A'96) received the B.Eng. degree in computer engineering from Chonbuk National University, Chonju, Korea, in 1991. He received the M.S.E.E. and Ph.D. degrees in electrical and computer engineering from the State University of New York at Buffalo in 1994 and 1996, respectively. In 1995, he served as a Research Associate at the Army Research Laboratory, Adelphi, MD, researching FLIR and SAR image compression. He is currently a Senior Researcher at the Samsung Advanced Institute of Technology, Korea. He also serves as a Project Editor in ISO/IEC JTC1/SC29/WG11 for MPEG-4. His current research areas are image/video coding, 2D/3D shape coding and representation, and standard development (MPEG-4 and JPEG 2000).

Jae-Seob Shin was born in Andong, Korea, in April 1963. He received the B.S. and M.S. degrees in electronic engineering from Sogang University, Seoul, Korea, in 1985 and 1987, respectively. In 1987, he joined the Digital Signal Processing Laboratory of the Samsung Advanced Institute of Technology, Kiheung, Korea, where he conducted research in image processing and video coding. During that period, he was a member of the team that developed a desktop image scanner, a high-speed alphanumeric character recognition system, and a JPEG built-in board for the IBM-PC. He also chaired the Version Management ad hoc group of MPEG from November 1997 to March 1998. At present, he is a Principal Investigator in the MPEG-4 audio-visual coding team at SAIT. His current research interests are in the areas of image processing, video coding, and computer vision. His work in these areas has resulted in more than 20 patents and patent applications.

Yang Seock Seo received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1973 and 1975, respectively. He received the Ph.D. degree from Pennsylvania State University in 1990. He is currently a Director of the Signal Processing Laboratory of the Samsung Advanced Institute of Technology (SAIT), Kiheung, Korea. His current research interests include color processing, object-based image and audio signal compression related to international standardization (MPEG, JPEG), color vision, and image understanding.
