Very Low Bit-Rate Video Coding using an Extended Arbitrary-Shaped Pattern Selection Algorithm
Manoranjan Paul, Manzur Murshed, and Laurence Dooley
Gippsland School of Computing and IT, Monash University, Churchill Vic 3842, Australia
E-mail: {Manoranjan.Paul,Manzur.Murshed,Laurence.Dooley}@infotech.monash.edu.au
Abstract
Very low bit-rate video coding using patterns to represent moving regions in macroblocks exhibits good potential for improved coding efficiency. Recently, an Arbitrary Shaped Pattern Selection (ASPS) algorithm was presented that used a dynamically extracted set of same-sized patterns based on actual video content. This algorithm, however, like other pattern matching algorithms, failed to capture a large number of active-region macroblocks (RMBs) that cover partial moving objects in a video sequence. As the size of the moving object may vary, superior coding performance is achievable by using dynamically extracted patterns of different sizes instead of a single fixed size. This paper proposes an Extended Arbitrary Shaped Pattern Selection (EASPS) algorithm that uses two different user-selected pattern sizes for very low bit-rate coding. Experimental results show that EASPS exhibits significantly improved performance compared with other pattern matching algorithms and with the low bit-rate video coding standard H.263.
1. Introduction
Reducing the transmission bit-rate while concomitantly retaining image quality continues to be a challenge for efficient very low bit-rate video compression standards, such as H.263 [3]. These standards are, however, unable to encode moving objects within a 16×16 pixel macroblock (MB) during motion estimation (ME), resulting in all 256 residual error values being transmitted for motion compensation (MC) regardless of whether there are moving objects. One solution is to sub-divide the MB and apply ME and MC to each sub-block. With a sufficient number of blocks, the shape of a moving object can be accurately represented, but this has a high processing expenditure [1]. The MPEG-4 [2] standard first introduced the concept of content-based coding, by dividing video frames into separate segments comprising a background and one or more moving objects. To address the limitations of [1], Wong et al. [9] exploited the idea of partitioning the MBs via a segmentation process that again avoided handling the exact shape of moving objects, so that popular MB-based motion estimation techniques could be applied. [9] classified each MB into three distinct categories: 1) Static MB (SMB): MBs that contain little or no motion; 2) Active MB (AMB): MBs which contain moving object(s) with little static background; and 3) Active-Region MB (RMB): MBs that contain both static background and part(s) of moving object(s). SMBs and AMBs are treated in exactly the same way as in H.263. For RMB coding, it was assumed that the moving parts of an object might be represented by one of the eight predefined patterns P1–P8 in Figure 1. An MB is classified as RMB if, using some similarity measure, the part of a moving object in an MB is well covered by a particular pattern. The RMB can then be coded using the 64 pixels of that pattern, with the remaining 192 pixels being skipped as static background.
Figure 1: The pattern codebook of 32 regular-shaped, 64-pixel patterns, defined in 16×16 blocks, where the white region represents 1 (motion) and the black region represents 0 (no motion).

Other pattern matching algorithms have been reported [4]-[6], while Paul et al. [7] proposed a new MB classification definition that outperformed [9]. Figure 1 shows the complete 32-pattern codebook, where each 64-pixel pattern is regular (bounded by straight lines), clustered (the pixels are connected), and boundary-adjoined. All these pattern-matching algorithms approximate the actual shape of an object by assuming that the moving regions of an RMB are similar to one or more of the patterns in the codebook (Figure 1). A key problem in using predefined patterns is finding the optimal number for coding. An arbitrarily shaped region-based MC algorithm may offer a better solution. Yokoyama et al. [10] applied image segmentation techniques to acquire arbitrarily shaped regions, though its performance was not guaranteed because it used a large number of thresholds. Paul et al. [8] proposed an arbitrary shaped pattern selection (ASPS) algorithm, whose performance was better than the best predefined pattern-matching algorithm. Table I shows the percentage of each MB type generated by the ASPS algorithm. While SMBs contribute nothing to the bit rate, RMBs contribute nearly 25% and AMBs 75% of the overall bit rate, so any attempt to reduce the number of AMBs leads to better compression. In this paper, an Extended Arbitrary Shaped Pattern Selection (EASPS) algorithm is proposed which divides the AMB into two types of MB: a large RMB (LRMB) and a traditional AMB. The previously defined RMB [7] is redefined as a small RMB (SRMB). Experimental results confirm that the overall performance of the new EASPS algorithm with four types of MB is superior not only to H.263 but also to other contemporary pattern matching algorithms for very low bit-rate video coding.

Table I: Percentage of different MB types generated by the ASPS algorithm
Video sequence      SMB (%)   RMB (%)   AMB (%)
Miss America        63        21        16
Suzie               29        23        48
Mother&Daughter     43        28        29
Carphone            22        27        51
Foreman             14        27        59
Salesman            57        31        12
Claire              77        15        8
This paper is organized as follows. The video coding strategy using the EASPS algorithm is described in Section 2, simulation results are analysed in Section 3, and conclusions are presented in Section 4.

2. Low Bit-Rate Video Coding Using EASPS
Prior to video coding, two pattern codebooks (PCs) have to be constructed. The EASPS algorithm operates in two phases. In the first phase, the PCs are formulated on the basis of the actual video content, while in the second phase, the coding is undertaken using these content-dependent PCs.

2.1 Two Sets of PCs Generation
Let Ck(x,y) and Rk(x,y) denote the kth MB of the current and reference frames respectively of a video sequence, each frame being of size W pixels × H lines, where 0 ≤ x, y ≤ 15 and 0 ≤ k < W/16 × H/16. The moving region Mk(x,y) of the kth MB in the current frame is obtained as follows:

$$M_k(x,y) = T\left(\left|C_k(x,y) \bullet B - R_k(x,y) \bullet B\right|\right) \qquad (1)$$

where B is a 3×3 unit matrix for the morphological closing operation • [9], which is applied to reduce noise, and the thresholding function T(v) = 1 if v > 2 and 0 otherwise. Let ∑Mk denote the number of '1's (moving pixels) in Mk. If 8 ≤ ∑Mk < δS, where δS is a threshold on the number of moving pixels [9], then the kth MB is defined as a candidate small RMB (CSRMB), while if δS ≤ ∑Mk < δL, where δL > δS is a second such threshold, then it is defined as a candidate large RMB (CLRMB).

Algorithm ASPG(λ, µ, χ)
Parameters: λ = number of patterns; µ = number of moving pixels in each pattern; χ = set of all CRMBs to be considered with patterns of µ moving pixels.
Return: Pattern codebook of P1, P2, …, Pλ, each having µ moving pixels.
Step 1: Classify χ into λ classes C1, C2, …, Cλ using any clustering method, such as Fuzzy C-means (FCM), by the gravitational centres of χ.
Step 2: For i = 1, 2, …, λ
  Step 2.1: Calculate a temporary array Ti of 256×3 integers as follows:
      $T_i(x \times 16 + y, 0) = \sum_{j=1}^{|C_i|} C_{i,j}(x,y)$;  $T_i(x \times 16 + y, 1) = x$;  $T_i(x \times 16 + y, 2) = y$;
      where $C_{i,j}$ is the jth CRMB in class Ci and 0 ≤ x, y ≤ 15.
  Step 2.2: Calculate the rank $\{l_0, \ldots, l_{255}\}$ on Ti such that $T_i(l_j, 0) \ge T_i(l_{j+1}, 0)$ for 0 ≤ j < 255.
  Step 2.3: Set Pi(x,y) = 0 for 0 ≤ x, y ≤ 15.
  Step 2.4: For j = 0, 1, …, µ − 1, set $P_i(T_i(l_j, 1), T_i(l_j, 2)) = 1$.
Figure 2: The ASPG algorithm.
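As an illustration of Step 2 of the ASPG algorithm in Figure 2, the following Python sketch builds one pattern from a class of candidate masks that is assumed to have been clustered already (Step 1 is not implemented here). The names `rect_mask`, `gravitational_centre` and `extract_pattern` are hypothetical, masks are indexed as `mask[x][y]`, and ties between equally populated positions are broken in raster order, which is an assumption because Figure 2 does not state a tie-breaking rule. The gravitational centre function follows the GC definition of Equation (2) and is included only because Step 1 clusters on it.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 16x16 binary matrix; 1 marks a moving pixel


def rect_mask(x0: int, x1: int, y0: int, y1: int, size: int = 16) -> Mask:
    """Toy helper (hypothetical): 1s where x0 <= x < x1 and y0 <= y < y1."""
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0 for y in range(size)]
            for x in range(size)]


def gravitational_centre(mask: Mask) -> Tuple[float, float]:
    """GC of a binary mask: mean x and mean y over its moving pixels."""
    total = sum(map(sum, mask))
    if total == 0:
        raise ValueError("GC is undefined for an all-zero mask")
    gx = sum(x * v for x, row in enumerate(mask) for v in row) / total
    gy = sum(y * v for row in mask for y, v in enumerate(row)) / total
    return gx, gy


def extract_pattern(cluster: List[Mask], mu: int, size: int = 16) -> Mask:
    """Step 2 of ASPG for one class: keep the mu most populated positions."""
    # Step 2.1: one (count, x, y) row per pixel position, in raster order.
    table = [(sum(mask[x][y] for mask in cluster), x, y)
             for x in range(size) for y in range(size)]
    # Step 2.2: rank by descending count. sorted() is stable, so ties keep
    # raster order (assumed rule, not specified in Figure 2).
    ranked = sorted(table, key=lambda row: -row[0])
    # Steps 2.3 and 2.4: all-zero pattern, then set the first mu positions.
    pattern = [[0] * size for _ in range(size)]
    for _, x, y in ranked[:mu]:
        pattern[x][y] = 1
    return pattern


if __name__ == "__main__":
    block = rect_mask(0, 8, 0, 8)   # 64 moving pixels in the top-left 8x8
    noisy = rect_mask(0, 8, 0, 8)
    noisy[15][15] = 1               # one stray moving pixel
    pattern = extract_pattern([block, noisy, block], mu=64)
    print(gravitational_centre(block))
    print(pattern == block)
```

In this toy class the stray pixel is moving in only one of the three masks, so it ranks below all 64 positions of the common block and is excluded from the µ = 64 pattern.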
The Arbitrary Shaped Pattern Generation (ASPG) algorithm detailed in Figure 2 then generates a pattern set PC1 = (p1, …, pλ1) of user-defined size λ1 for SRMBs using CSRMBs, and a pattern set PC2 = (P1, …, Pλ2) of user-defined size λ2 for LRMBs using CLRMBs. Any clustering method, such as FCM, can be used in the ASPG algorithm. The clustering method classifies all CSRMBs (or CLRMBs) into λ1 (or λ2) classes using the gravitational centre (GC), G(A), which for a 16×16 binary matrix A is given by:
$$G(A) = \left( \frac{\sum_{x=0}^{15}\sum_{y=0}^{15} x\,A(x,y)}{\sum_{x=0}^{15}\sum_{y=0}^{15} A(x,y)},\ \frac{\sum_{x=0}^{15}\sum_{y=0}^{15} y\,A(x,y)}{\sum_{x=0}^{15}\sum_{y=0}^{15} A(x,y)} \right). \qquad (2)$$

By using FCM, those CSRMBs (or CLRMBs) with smaller inter-GC distances are placed in the same class. The ASPG algorithm then adds all the corresponding '1's of those CSRMBs (or CLRMBs) in the same class to provide the most populated moving region. To create the µ-most populated moving region as a pattern, only the first µ pixel positions are assigned '1', with the rest assigned '0'.

2.2 Coding
After obtaining the p1, …, pλ1 and P1, …, Pλ2 patterns from a sequence using ASPG, each MB is classified as SMB, SRMB, LRMB, or AMB using the MB classification rules detailed in Table II. For the best-matched pattern, the following similarity measure is used:

$$D_{k,n} = \frac{1}{256}\sum_{x=0}^{15}\sum_{y=0}^{15} \left| M_k(x,y) - P_n(x,y) \right| \qquad (3)$$

where 1 ≤ n ≤ λ. From the similarity measure Dk,n, EASPS selects only one pattern from each PC, namely the one for which the following condition is satisfied:

$$P_i = \arg\min_{\forall P_i \in PC} \left( D_{k,i} \mid D_{k,i} < T_S \right) \qquad (4)$$

where TS is a similarity threshold. For the SRMB and LRMB, two different pattern sets are used when calculating Dk,n. To select the best pattern for the SRMB and for the LRMB, the corresponding TS values are also different.

Note that each CSRMB (or CLRMB) is compared against all patterns of the corresponding PC to maximize the image quality, as sometimes the pattern extracted from a cluster is unable to capture all the moving regions of the CSRMBs (or CLRMBs) in that particular cluster. There is always a possibility of misclassifying a CSRMB (or CLRMB) as an AMB if only the pattern extracted from a cluster is used to match against the CSRMBs (or CLRMBs) of the same cluster. Since every SMB and the static region of each SRMB and LRMB are considered as having zero motion, they are omitted from coding and transmission, as they can be obtained from the reference frame. For each AMB, as well as the moving region of each SRMB and LRMB, motion vectors and residual errors are calculated using conventional block-based processing, with the obvious difference that the blocks for the moving regions of SRMBs and LRMBs take the shape of the best-match pattern rather than being square.

To process the SRMB and LRMB, a motion vector is calculated from only the µ moving pixels of the best-match pattern. To avoid computing more 8×8 DCT blocks than the µ residual error values per SRMB and LRMB require, these µ values are rearranged into 8×8 blocks. This avoids unnecessary DCT block transmission: for example, only one DCT block needs to be transmitted for µ = 64 and only two for µ = 128. A similar inverse procedure is performed during decoding.

Table II: Rules for classifying MB types
MB Type    Conditions
SMB        ∑Mk < 8.
SRMB       8 ≤ ∑Mk < δS, and a pattern pi is found by (4).
LRMB       δS ≤ ∑Mk < δL, or a CSRMB not classified due to the threshold for SRMB, and a pattern Pi is found by (4).
AMB        Otherwise.

3. Simulation Results
The EASPS algorithm, along with a number of other low bit-rate coding algorithms, has been tested on a large number of standard and non-standard video sequences in the QCIF digital video format.
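The MB types whose percentages are reported in Figure 3 result from applying the rules of Table II to each MB. As a hedged illustration, the Python sketch below classifies a single moving-region mask using the similarity measure (3) and the selection rule (4). The function names and the `rect_mask` helper are hypothetical, the default thresholds are the values quoted for the experiments (δS = 128, δL = 192, TS = 0.25 for SRMB and 0.40 for LRMB), and the two one-pattern codebooks are toy data rather than patterns extracted from video.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # 16x16 binary matrix; 1 marks a moving pixel


def rect_mask(x0: int, x1: int, y0: int, y1: int, size: int = 16) -> Mask:
    """Toy helper (hypothetical): 1s where x0 <= x < x1 and y0 <= y < y1."""
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0 for y in range(size)]
            for x in range(size)]


def similarity(mask: Mask, pattern: Mask) -> float:
    """Equation (3): mean absolute difference over the 256 positions."""
    diff = sum(abs(m - p)
               for m_row, p_row in zip(mask, pattern)
               for m, p in zip(m_row, p_row))
    return diff / 256.0


def best_pattern(mask: Mask, codebook: List[Mask], t_s: float) -> Optional[int]:
    """Equation (4): index of the closest pattern if its D is below t_s."""
    best = min(((similarity(mask, p), i) for i, p in enumerate(codebook)),
               default=None)
    if best is None or best[0] >= t_s:
        return None
    return best[1]


def classify_mb(mask: Mask, pc_small: List[Mask], pc_large: List[Mask],
                delta_s: int = 128, delta_l: int = 192,
                ts_small: float = 0.25,
                ts_large: float = 0.40) -> Tuple[str, Optional[int]]:
    """Table II: return the MB type and the index of the selected pattern."""
    moving = sum(map(sum, mask))
    if moving < 8:
        return "SMB", None
    if moving < delta_s:
        index = best_pattern(mask, pc_small, ts_small)
        if index is not None:
            return "SRMB", index
        # A CSRMB rejected by the SRMB threshold is retried as an LRMB.
    if moving < delta_l:
        index = best_pattern(mask, pc_large, ts_large)
        if index is not None:
            return "LRMB", index
    return "AMB", None


if __name__ == "__main__":
    small = [rect_mask(0, 8, 0, 8)]    # one 64-pixel pattern
    large = [rect_mask(0, 8, 0, 16)]   # one 128-pixel pattern
    for name, mb in [("static", rect_mask(0, 1, 0, 3)),
                     ("small block", rect_mask(0, 8, 0, 8)),
                     ("shifted strip", rect_mask(0, 8, 1, 16)),
                     ("full motion", rect_mask(0, 16, 0, 16))]:
        print(name, classify_mb(mb, small, large))
```

The "shifted strip" case has 120 moving pixels, so it is a CSRMB, but its distance to the small pattern is 72/256 ≈ 0.28, above the SRMB threshold, so it is retried against the large codebook and accepted there with a distance of 8/256.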
Figure 3: Percentage of MB types (SMB, small RMB, large RMB and AMB) generated by the EASPS algorithm for different standard video sequences.

In this paper, experimental results are presented using the first 100 frames of seven standard gray-scale video test sequences. Full-search, half-pel accuracy motion estimation and the H.263 recommended default variable length coding were employed to obtain the encoding results for the new EASPS approach, as well as for ASPS, Fixed-8, and H.263. The EASPS algorithm used λ1 = 8, λ2 = 4, δS = 128, δL = 192, TS = 0.25 for SRMB, TS = 0.40 for LRMB, µ = 64 for SRMB and µ = 128 for LRMB, while for ASPS the parameter selection was λ = 8, δS = 128 and TS = 0.25.
Table III: PSNR values for standard sequences using the H.263, Fixed-8, ASPS, and EASPS algorithms.
Video sequence      Bit rate (Kbps)   PSNR (dB)
                                      H.263    Fixed-8   ASPS     EASPS
Miss America        18                34.41    34.60     35.60    35.82
Suzie               39                29.01    29.09     30.10    30.79
Mother&Daughter     29                28.16    28.36     29.31    29.59
Carphone            44                28.81    29.00     29.41    29.58
Foreman             52                26.41    26.29     26.62    26.94
Salesman            22                28.50    28.74     29.93    30.04
Claire              12                32.30    32.46     33.67    34.04
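The gain of EASPS over each baseline can be read from Table III as a per-sequence PSNR difference. The short Python sketch below does that arithmetic on the table values; the dictionary layout and the function name `easps_gain` are assumptions made for illustration, and the figures are copied from Table III rather than recomputed from video.

```python
# Table III rows: sequence -> (bit rate in Kbps, then PSNR in dB for
# H.263, Fixed-8, ASPS and EASPS).
TABLE_III = {
    "Miss America":    (18, 34.41, 34.60, 35.60, 35.82),
    "Suzie":           (39, 29.01, 29.09, 30.10, 30.79),
    "Mother&Daughter": (29, 28.16, 28.36, 29.31, 29.59),
    "Carphone":        (44, 28.81, 29.00, 29.41, 29.58),
    "Foreman":         (52, 26.41, 26.29, 26.62, 26.94),
    "Salesman":        (22, 28.50, 28.74, 29.93, 30.04),
    "Claire":          (12, 32.30, 32.46, 33.67, 34.04),
}

# Column index of each baseline inside a table row.
BASELINES = {"H.263": 1, "Fixed-8": 2, "ASPS": 3}
EASPS_COLUMN = 4


def easps_gain(sequence: str, baseline: str) -> float:
    """PSNR gain (dB) of EASPS over a baseline, rounded to 2 decimals."""
    row = TABLE_III[sequence]
    return round(row[EASPS_COLUMN] - row[BASELINES[baseline]], 2)


if __name__ == "__main__":
    for name in TABLE_III:
        gains = {b: easps_gain(name, b) for b in BASELINES}
        print(f"{name:16s} {gains}")
```

On these values the gain over ASPS ranges from 0.11 dB (Salesman) to 0.69 dB (Suzie), and EASPS is ahead of all three baselines for every sequence in the table.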
Figure 3 shows that a significant number of AMBs are now classified as LRMBs, with the proportion of LRMBs varying from 2% to 25%, which supports the motivation behind the EASPS algorithm.

Figure 4: The comparison of coding performance between EASPS and ASPS for the Miss America sequence (PSNR in dB against bit rate in Kbps, with one curve each for EASPS and ASPS).
Table III confirms that EASPS provides superior results compared to the H.263 standard and the Fixed-8 and ASPS algorithms for low bit-rate video coding. For example, the EASPS algorithm performs better than ASPS [8] by improving the peak signal-to-noise ratio (PSNR) by 0.7 dB for the smooth-motion Suzie sequence. ASPS has proven to be a better algorithm than all pattern-matching algorithms which use a predefined pattern set. This confirms the premise that an extended arbitrary shaped pattern-matching algorithm offers superior performance. The coding performance comparison curve in Figure 4 shows that the EASPS algorithm is better than the ASPS algorithm for very low bit-rate video coding, though a diminishing trend is observed as the bit rate increases. At higher bit rates, the performance of the EASPS algorithm may actually become worse than other techniques for some video sequences, because of the additional overhead due to pattern identification.

4. Conclusions
In this paper, a novel Extended Arbitrary Shaped Pattern Selection (EASPS) algorithm has been developed using two extracted pattern sets derived from actual video content to approximate the shape of moving objects in a macroblock. By exploiting the arbitrariness of video objects, it has proven to be a better pattern-matching algorithm than contemporary predefined and arbitrary shaped pattern-based algorithms. Experimental results showed that the EASPS algorithm provided superior results for all the video sequences tested, compared with the ASPS and Fixed-8 algorithms and the low bit-rate video coding standard H.263.

References
[1] Fukuhara, T., K. Asai, and T. Murakami, “VLBR video coding with block partitioning and adaptive selection of two time-differential frame memories,” IEEE Trans. Circuits Syst. Video Tech., 7, 212–220, 1997.
[2] ISO/IEC N4030, MPEG-4 International Standard, 2001.
[3] ITU-T Recommendation H.263, “Video coding for low bit rate communication,” Version 2, 1998.
[4] Paul, M., M. Murshed, and L. Dooley, “A Low Bit-Rate Video-Coding Algorithm Based Upon Variable Pattern Selection,” Proc. of 6th Int. Conf. on Signal Processing (ICSP-02), Beijing, Vol-2, 933–936, 2002.
[5] Paul, M., M. Murshed, and L. Dooley, “A new real-time pattern selection algorithm for very low bit-rate video coding focusing on moving regions,” Proc. of IEEE Int. Con. of Acoustics, Speech, and Signal Proc. (ICASSP-03), Hong Kong, Vol-3, III_397-III_400, 2003.
[6] Paul, M., M. Murshed, and L. Dooley, “A Real Time Generic Variable Pattern Selection Algorithm for VLBR Video Coding,” IEEE Int. Con. on Image Proc. (ICIP-03), Spain, 2003.
[7] Paul, M., M. Murshed, and L. Dooley, “Impact of Macroblock Classification on LBR Video Coding Focusing on Moving Region,” Proc. of Int. Conf. of Com. and IT (ICCIT-02), Dhaka, Bangladesh, 465–470, 2002.
[8] Paul, M., M. Murshed, and L. Dooley, “An Arbitrary Shaped Pattern Selection Algorithm for Very Low Bit-Rate Video Coding Focusing on Moving Regions,” Proc. of 4th IEEE Pacific-Rim Int. Con. on Multimedia (PCM-03), Singapore, 2003.
[9] Wong, K.-W., K.-M. Lam, and W.-C. Siu, “An Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions,” IEEE Trans. Circuits Syst. Video Tech., 11(10), 1128–1134, 2001.
[10] Yokoyama, Y., Y. Miyamoto, and M. Ohta, “Very Low Bit-Rate Video Coding Using Arbitrarily Shaped Region-Based Motion Compensation,” IEEE Trans. Circuits Syst. Video Tech., 5(6), 500–507, 1995.