A Block-Based Video-Coding Algorithm Focusing on ...

A Variable Pattern Selection Algorithm with Improved Pattern Selection Technique for Low Bit-Rate Video-Coding Focusing on Moving Objects

Manoranjan Paul, Manzur Murshed, and Laurence Dooley
Gippsland School of Computing and Info. Tech., Monash University, Churchill Vic 3842, Australia
E-mail: {Manoranjan.Paul,Manzur.Murshed,Laurence.Dooley}@infotech.monash.edu.au

Abstract: Research into pattern representation of moving regions in block-based motion estimation and compensation in video sequences has focused mainly upon using a fixed number of regular shaped patterns. Recently we presented the Variable Pattern Selection (VPS) algorithm, which selects a preset number of best-matched patterns from a pattern codebook of regular shaped patterns. The pattern elimination technique in the selection process of the VPS algorithm, however, was found to be computationally expensive, especially when the preset number is low. In this paper, the concept is extended to develop the Extended Variable Pattern Selection (EVPS) algorithm, in which the pattern elimination technique is replaced with a fast solution. The complexity analysis confirms that this algorithm can be as much as 8.5 times faster than the VPS algorithm. In order to take advantage of this computational speed-up in eliminating patterns, the pattern codebook size of the EVPS algorithm has also been increased to 32.

Keywords: Motion estimation and compensation, low bit-rate video coding, moving region detection, pattern matching.

1. Introduction

Researchers in very low bit-rate digital video coding are often faced with the daunting challenge of meeting two diametrically conflicting requirements: reducing the transmission bit-rate while concomitantly retaining image quality. Computational complexity and the system bandwidth limitations of a communication medium mean these two factors tend to be inversely proportional. H.26X [6][7][8] and MPEG-X [3][4][5] are some of the well-known contemporary standards for video compression. The H.26X standards, for example, are widely used in video-telephony and video-conferencing applications, where very low bit-rates are required to accommodate public switched telephone networks (PSTN).

Many of the video coding standards tend to employ block-based techniques because of their implementation simplicity and also because they generally provide good results when the bandwidth requirement is relaxed, e.g., in MPEG-1/2. This is, however, not the case with low bit-rate block-based video coding such as in H.263. The shape of a moving object is generally arbitrary and may not necessarily be aligned with the hypothetical grid structure created by the fixed-sized, non-overlapping rectangular blocks, termed macroblocks (MB) in the coding standards. The typical size of an MB is 16×16 pixels, which leads to a large number of blocks, some of which will contain only static background, some moving objects, and some a combination of the two.

Figure 1: 32 regular shaped 64-pixel patterns (P1 to P32), defined in 16×16 blocks, where the shaded region represents 1’s and the white region represents 0’s. [Pattern images not reproduced.]

Figure 2: Percentage increase in RMBs when the number of patterns is increased from 4 to 8, 8 to 16, 16 to 24 and 24 to 32. [Chart not reproduced; vertical axis: increase in RMBs, 0% to 20%; sequences: Miss America, Carphone, Foreman, Tennis, Salesman, Football.]


In [12], macroblocks were classified into the following three mutually exclusive classes: i) Static MB (SMB): blocks that contain little or no motion; ii) Active MB (AMB): blocks that contain moving object(s) with little static background; iii) Active-Region MB (RMB): blocks that contain both static background and some part(s) of moving object(s). Treating AMBs and RMBs alike, as is done in H.263/H.263+, leads to coding inefficiencies [1]. Reducing the block size could improve this efficiency, but only at the cost of additional information to be transmitted because of the increase in the number of blocks [11]. Both [1] and [12] successfully addressed this issue by segmenting each RMB into two regions using a fixed number of predefined regular patterns, with 128 pixels and 64 pixels respectively. Once the segmentation process was complete, motion estimation/compensation was performed only on the moving regions. Each SMB was skipped for transmission (since it contained no motion and could be copied from the reference frame) and each AMB was treated exactly as defined in the H.263 standard, using motion estimation and compensation techniques. Skipping the SMBs and restricting the RMBs to a prescribed set of patterns led to improved coding efficiency. Using eight instead of four patterns improved the peak signal to noise ratio (PSNR) by up to 0.6 dB [12]. It also meant that a better classification of the RMBs was achieved, thus contributing to a higher compression ratio, even after compensating for the larger codebook required for pattern coding.

In [10] we extended this concept by proposing a new Variable Pattern Selection (VPS) algorithm, using 24 regular shaped patterns. The VPS algorithm selects a preset number of the best-matched patterns from a 24-pattern codebook by eliminating the least frequent pattern per iteration. This strategy is computationally expensive, as 24 − λ iterations are required to select λ best-matched patterns.

In this paper, an Extended VPS (EVPS) algorithm is proposed with an improved pattern elimination strategy. In the EVPS algorithm, the best-match pattern selection process is still based upon the matching frequency of each pattern. However, unlike the VPS algorithm, the EVPS algorithm can eliminate more than one pattern per iteration, under the assumption that the λ best-matched patterns lie in the first ρ (≥ λ) most frequent patterns such that their cumulative frequency is greater than or equal to 75%. Empirically it has been observed that only ⌈16/λ⌉ + 1 iterations are required to select λ best-matched patterns, where λ ∈ {4, 8, 16, 24}. This improved elimination strategy translates into lower computational time. It is shown in Section 3.2 that the EVPS algorithm can be as much as 8.5 times faster than the VPS algorithm. The pattern frequency values are used to code pattern identifier numbers using a variable length coding technique, e.g., Huffman or arithmetic coding. This leads to an improvement in coding efficiency. Similar to the VPS algorithm, the EVPS algorithm naturally exhibits both better compression and PSNR performance compared with using fixed patterns as presented in [12].

The rest of the paper is organized as follows. In Section 2, low bit-rate video-coding algorithms focusing on moving regions using fixed patterns are explained. The EVPS algorithm, which uses λ best-matched patterns from a pattern codebook of 32 patterns, is developed in Section 3. Section 4 concludes the paper.

2. Low Bit-Rate Video Coding Using Fixed Patterns

2.1. Moving Region Detection

The basis of this technique is to let the first eight patterns P1–P8 in Figure 1 approximate the moving region. Let $C_k(x, y)$ and $R_k(x, y)$, 0 ≤ x, y ≤ 15, denote the kth block of the current and the reference frames respectively. The moving region $M_k(x, y)$ in the kth block of the current frame is obtained as follows:

$$M_k(x, y) = T\big(\,|C_k(x, y) \bullet B - R_k(x, y) \bullet B|\,\big) \qquad (1)$$

where B, a square pattern of size 3×3, is the structuring element of the morphological closing operations [2][9], |v| returns the absolute value of v, T(v) returns 1 if v > 2 or 0 otherwise, and 0 ≤ x, y ≤ 15.

Each block is then classified as an SMB, AMB, or RMB according to the following rules. For the kth block, if $M_k(x, y)$ has fewer than eight 1’s then the block is classified as an SMB; else the block is divided into four sub-blocks and, if none of these sub-blocks contains all 0’s, the block is classified as an AMB. Otherwise, the block is considered a candidate RMB. Each of these candidate RMBs is then matched against all eight prescribed patterns and the best-match pattern is obtained by minimizing the following expression:

$$D_{k,n} = \frac{1}{256} \sum_{x=0}^{15} \sum_{y=0}^{15} |M_k(x, y) - P_n(x, y)| \qquad (2)$$

where 1 ≤ n ≤ 32. A candidate RMB is classified as an RMB if $\min_n(D_{k,n}) < 0.25$; otherwise it is an AMB.
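The classification rules and Eq. (2) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the binary mask $M_k$ has already been computed by Eq. (1), it assumes the four sub-blocks are the 8×8 quadrants of the macroblock, and the two patterns used in the example are hypothetical stand-ins rather than the actual patterns of Figure 1.

```python
def make_mask(cells):
    """Build a 16x16 binary mask (list of rows) from a predicate on (x, y)."""
    return [[1 if cells(x, y) else 0 for x in range(16)] for y in range(16)]

def count_ones(mask):
    return sum(sum(row) for row in mask)

def sub_blocks(mask):
    """Split a 16x16 mask into four 8x8 quadrants (assumed sub-block layout)."""
    return [[row[c:c + 8] for row in mask[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]

def distance(mask, pattern):
    """D_{k,n} of Eq. (2): sum of absolute differences divided by 256."""
    return sum(abs(m - p)
               for mask_row, pattern_row in zip(mask, pattern)
               for m, p in zip(mask_row, pattern_row)) / 256

def classify(mask, patterns):
    """Return (class, best pattern index); the index is None unless class is RMB."""
    if count_ones(mask) < 8:
        return "SMB", None
    if all(count_ones(block) > 0 for block in sub_blocks(mask)):
        return "AMB", None
    # Candidate RMB: pick the pattern that minimizes D_{k,n}.
    dists = [distance(mask, p) for p in patterns]
    best = min(range(len(patterns)), key=dists.__getitem__)
    if dists[best] < 0.25:
        return "RMB", best
    return "AMB", None

# Hypothetical 64-pixel patterns, used only for illustration.
LEFT_STRIP = make_mask(lambda x, y: x < 4)            # 4 columns x 16 rows
TOP_LEFT = make_mask(lambda x, y: x < 8 and y < 8)    # one 8x8 quadrant
PATTERNS = [LEFT_STRIP, TOP_LEFT]

print(classify(make_mask(lambda x, y: False), PATTERNS))  # no motion
print(classify(make_mask(lambda x, y: True), PATTERNS))   # motion everywhere
print(classify(LEFT_STRIP, PATTERNS))                     # matches pattern 0
```

A mask that coincides with a pattern gives D = 0, while a mask equal to two diagonal quadrants falls back to AMB because its smallest distance is exactly 0.25, which does not satisfy the strict inequality.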


2.2. Motion Estimation and Compensation

Since each SMB and the static region of each RMB are considered to have no motion, they can be skipped from coding and transmission, as they can be obtained from the reference frame. For each AMB, as well as the moving region of each RMB, the motion vector and residual errors are calculated using conventional block-based methods, with the obvious difference that the block used for the moving region of an RMB has the shape of the best-match pattern rather than being square.

2.3. Encoding and Decoding

Having dealt with the coding/decoding of SMBs and AMBs, the issue of how to process the RMBs is now discussed. A motion vector is calculated from only the 64 moving pixels of the best-match pattern. To avoid multiple 8×8 blocks of DCT calculations for only 64 residual error values per RMB, these 64 values are rearranged into a single 8×8 block. Similarly, an inverse rearrangement is performed during decoding.
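The rearrangement of Section 2.3 can be illustrated with the following sketch. The text does not state the order in which the 64 values are placed into the 8×8 block, so a raster scan (row by row) is assumed here; the function names and the strip-shaped pattern are hypothetical.

```python
def pack_residuals(residual, pattern):
    """Gather the residuals under the pattern's 1-pixels into an 8x8 block.

    Assumption: pixels are visited in raster order (row by row).
    """
    values = [residual[y][x]
              for y in range(16) for x in range(16) if pattern[y][x]]
    if len(values) != 64:
        raise ValueError("pattern must cover exactly 64 pixels")
    return [values[i:i + 8] for i in range(0, 64, 8)]

def unpack_residuals(block8, pattern):
    """Inverse rearrangement: scatter the 8x8 block back to 16x16.

    Pixels outside the pattern (the static region) are set to 0.
    """
    flat = iter(v for row in block8 for v in row)
    return [[next(flat) if pattern[y][x] else 0 for x in range(16)]
            for y in range(16)]

# Hypothetical 64-pixel pattern: the four leftmost columns of the macroblock.
pattern = [[1 if x < 4 else 0 for x in range(16)] for y in range(16)]
residual = [[y * 16 + x for x in range(16)] for y in range(16)]

packed = pack_residuals(residual, pattern)
restored = unpack_residuals(packed, pattern)
print(packed[0])
```

Because both functions traverse the pattern in the same order, unpacking a packed block restores every residual inside the pattern, which is the only property the decoder relies on in this sketch.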

3. Low Bit-Rate Video Coding Using Variable Patterns

As stated in the introduction, the motivation for using more than eight patterns originates from the observation in [12] that using eight patterns instead of four not only improves PSNR, but also classifies more blocks as RMBs, which contributes towards higher compression even after compensating for the larger codebook size (one extra bit per RMB). However, Figure 2 shows, for a selection of popular video sequences, that the increase in RMBs becomes insignificant once the number of patterns is greater than 24. Another interesting observation is made in Figure 3, which shows that the maximal-frequency pattern sequence is not fixed for all types of video data. For example, the most frequent eight patterns for the “Miss America” video sequence are 5, 6, 7, 8, 4, 3, 16, and 1 (in order), whereas those for the “Foreman” video sequence are 6, 1, 14, 8, 7, 12, 5, and 11 (in order). This confirms that the first eight patterns in Figure 1, which were used in [12], do not constitute the optimal set.

3.1. The EVPS Algorithm

The Extended Variable Pattern Selection (EVPS) algorithm attempts to select an optimal set of λ best-matched patterns from the 32-pattern codebook presented in Figure 1, where λ ∈ {4, 8, 16, 24}. However, selecting such an optimal set is not straightforward. In finding the optimal set of eight patterns, for example, it is not sufficient to simply select the patterns with the highest frequencies as given in Figure 3. All the RMBs that were initially matched against a pattern outside the selected set must be treated as candidate RMBs and matched again against the selected patterns. Some of these candidate RMBs may no longer be classified as RMBs, and the frequencies of the patterns may also change. In some cases, this change may lead to a different ordering in the optimal pattern set.

The VPS algorithm [10] selects the optimal set by eliminating the least frequent pattern per iteration. The EVPS algorithm improves this computationally expensive elimination process by allowing more than one pattern to be removed per iteration. This elimination strategy is based on the empirical observation that the λ best-matched patterns lie in the first ρ (≥ λ) most frequent patterns such that their cumulative frequency is greater than or equal to 75%. The full algorithm is formalized in Figure 5. It is interesting to note that, because more than one candidate pattern may be eliminated per iteration in the EVPS algorithm, there always exists a possibility (even if negligible) that the λ best-matched patterns selected by the EVPS algorithm will not be identical to the selection made by the VPS algorithm.

Figure 3: Frequency of optimal eight patterns for selective video sequences. [Chart not reproduced; horizontal axis: Pattern No (1 to 32); vertical axis: Frequency; sequences: Miss America, Carphone, Foreman, Tennis, Salesman, Football.]

Figure 4: Frequency of optimal sixteen patterns for selective video sequences. [Chart not reproduced; horizontal axis: Pattern No (1 to 32); vertical axis: Frequency; sequences: Miss America, Carphone, Foreman, Tennis, Salesman, Football.]


Algorithm EVPS(λ)
Parameter: λ = Size of the optimal pattern set.
Return: Ω = The optimal pattern set.
Step 1: Ω = {P1, P2, …, P32};
Step 2: Calculate the frequency of each pattern in Ω;
Step 3: if |Ω| = λ then exit;
Step 4: Sort Ω to $(P_{i_1}, P_{i_2}, \ldots, P_{i_{|\Omega|}})$ such that $Freq(P_{i_j}) \ge Freq(P_{i_{j+1}})$, for all j < |Ω|;
Step 5: Find the minimum value of ρ ≥ λ such that $\sum_{j=1}^{\rho} Freq(P_{i_j}) \ge 0.75$;
Step 6: Ω = Ω − {$P_{i_j}$ : ∀ j > ρ};
Step 7: Go to Step 2;

Figure 5: The EVPS algorithm.

Algorithm VPS(λ)
Parameter: λ = Size of the optimal pattern set.
Return: Ω = The optimal pattern set.
Step 1: Ω = {P1, P2, …, P24};
Step 2: Calculate the frequency of each pattern in Ω;
Step 3: if |Ω| = λ then exit;
Step 4: Find the pattern Pi ∈ Ω such that its frequency is the minimum;
Step 5: Ω = Ω – {Pi};
Step 6: Go to Step 2;

Figure 6: The VPS algorithm.
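To make the difference between the two elimination loops concrete, the following sketch runs both on a small synthetic data set. It is an illustration under stated assumptions, not the authors' implementation: `dist[k][n]` is assumed to hold the precomputed distance of candidate RMB k to pattern n, frequencies are taken relative to the number of matched RMBs, the toy distances are invented, and the guard that always drops at least one pattern is an addition that is not part of Figure 5.

```python
def frequencies(dist, omega, threshold=0.25):
    """Step 2: relative matching frequency of each pattern in omega."""
    counts = {n: 0 for n in omega}
    for row in dist:
        best = min(omega, key=lambda n: row[n])
        if row[best] < threshold:      # otherwise the block is not an RMB
            counts[best] += 1
    total = sum(counts.values()) or 1
    return {n: c / total for n, c in counts.items()}

def vps(dist, codebook, lam):
    """Drop the least frequent pattern per pass; returns (omega, passes)."""
    omega, passes = list(codebook), 0
    while True:
        freq = frequencies(dist, omega)
        passes += 1                    # counts executions of Step 2
        if len(omega) <= lam:
            return omega, passes
        omega.remove(min(omega, key=freq.__getitem__))

def evps(dist, codebook, lam, coverage=0.75):
    """Keep the first rho most frequent patterns per pass (Figure 5)."""
    omega, passes = list(codebook), 0
    while True:
        freq = frequencies(dist, omega)
        passes += 1
        if len(omega) <= lam:
            return omega, passes
        ranked = sorted(omega, key=freq.__getitem__, reverse=True)
        rho, cumulative = 0, 0.0
        for n in ranked:               # Step 5: smallest rho >= lam
            rho += 1
            cumulative += freq[n]
            if rho >= lam and cumulative >= coverage:
                break
        if rho == len(ranked):         # added guard: always make progress
            rho -= 1
        omega = ranked[:rho]           # Step 6

def toy_distances(preferred_counts):
    """Invented data: each RMB is close to one pattern, farther from the rest."""
    size = len(preferred_counts)
    rows = []
    for favourite, count in enumerate(preferred_counts):
        row = [0.05 if n == favourite else 0.10 + 0.01 * n for n in range(size)]
        rows.extend([row] * count)
    return rows

dist = toy_distances([40, 30, 20, 5, 3, 2])
print(vps(dist, range(6), 2))
print(evps(dist, range(6), 2))
```

On this toy input both loops select the same two patterns, but VPS evaluates the frequencies five times while EVPS needs three passes. The pass counter includes the final frequency evaluation that precedes the exit test in Step 3.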

The approach of selecting suitable patterns based on their relative frequencies has an additional benefit in coding. The frequency information can now be used to code pattern identifier numbers using a variable length code, such as Huffman or arithmetic coding. Experimental results show that the optimal eight (sixteen) patterns for six standard test video sequences can be Huffman coded using on average only 2.69 (3.56) bits, instead of using a fixed 3 (4) bits.

3.2. Comparative Time Complexity Analysis of the EVPS Algorithm

For comparison, the VPS algorithm is described in Figure 6. Although the VPS and EVPS algorithms use pattern codebooks of different sizes, for comparison we assume they both use the same sized codebook of α patterns. The VPS algorithm requires α − λ iterations to select λ best-matched patterns, whereas it has been empirically observed on a large number of standard and non-standard test video sequences that the EVPS algorithm requires only ⌈α/(2λ)⌉ + 1 iterations to select λ best-matched patterns, where λ ∈ {4, 8, 16, 24} and α = 32.

Consider that both the VPS and EVPS algorithms are applied to a video sequence with β RMBs in total. Let the pattern codebook at iteration i be denoted by Ωi. Now consider iteration i. Step 2 of both algorithms requires β|Ωi| evaluations of $D_{k,n}$ values while keeping track of the minimum. Therefore, this step can be done in O(β|Ωi|) time. Step 4 of the VPS algorithm needs to find the pattern with minimum frequency in the current pattern set Ωi, and this can be done in O(|Ωi|) time. However, Step 4 of the EVPS algorithm requires sorting, which can be done in O(|Ωi| log |Ωi|) time. Step 5 of the EVPS algorithm requires scanning the sorted pattern set, which can be done in O(|Ωi|) time. Considering the α − λ iterations, the VPS algorithm can be completed in the following time:

$$\sum_{i=1}^{\alpha-\lambda+1} O(\beta|\Omega_i|) + \sum_{i=1}^{\alpha-\lambda} O(|\Omega_i|) = O\!\left(\frac{\beta(\alpha+\lambda)(\alpha-\lambda+1)}{2}\right)$$

where β >> α.

Now, consider that there will be at most ψ = ⌈α/(2λ)⌉ + 1 iterations by the EVPS algorithm. Obviously, |Ω1| = α and |Ωψ| = λ. Although it has been empirically observed that the value of |Ωi| decreases logarithmically, for the sake of simplicity the decrease is assumed to be linear. Let

$$|\Omega_i| = \lambda + \frac{\alpha-\lambda}{\psi-1}(\psi - i), \quad \text{for all } i = 1, 2, \ldots, \psi.$$

Without loss of generality, this assumption will overestimate the complexity of the EVPS algorithm. Using such a high estimate in comparing the EVPS algorithm against the VPS algorithm also guarantees that the strength of the EVPS algorithm is never unfairly overestimated. Considering the ψ iterations, the EVPS algorithm can then be completed in the following time:

$$\sum_{i=1}^{\psi} O(\beta|\Omega_i|) + \sum_{i=1}^{\psi-1} O(|\Omega_i| \log |\Omega_i|) = O\!\left(\frac{\psi\beta(\alpha+\lambda)}{2}\right)$$

where β >> α. It can therefore be concluded that the EVPS algorithm is at least

$$\frac{\alpha-\lambda+1}{\psi} = \frac{\alpha-\lambda+1}{\lceil \alpha/(2\lambda) \rceil + 1}$$

times faster than the VPS algorithm. For α = 32 and λ = 4, 8, 16, and 24, the EVPS algorithm is respectively at least 5.8, 8.33, 8.5, and 4.5 times faster than the VPS algorithm.
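The quoted speed-up figures follow directly from the ratio above and can be checked numerically. The short sketch below evaluates ψ and the bound for α = 32; the function names are illustrative only, and the last helper simply confirms the closed form used for the sum of the VPS codebook sizes.

```python
from math import ceil

def evps_iterations(alpha, lam):
    """psi = ceil(alpha / (2 * lambda)) + 1, the empirical EVPS iteration count."""
    return ceil(alpha / (2 * lam)) + 1

def speedup_bound(alpha, lam):
    """Lower bound on the VPS/EVPS time ratio: (alpha - lambda + 1) / psi."""
    return (alpha - lam + 1) / evps_iterations(alpha, lam)

def vps_codebook_sum(alpha, lam):
    """Sum of |Omega_i| over the alpha - lambda + 1 evaluations of Step 2 in VPS."""
    return sum(range(lam, alpha + 1))

for lam in (4, 8, 16, 24):
    psi = evps_iterations(32, lam)
    print(f"lambda={lam:2d}  psi={psi}  speed-up >= {speedup_bound(32, lam):.2f}")
```

The ceiling matters for λ = 24: α/(2λ) is below one, so ψ = 2 and the bound is 9/2 = 4.5, in agreement with the value stated in the text.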

4. Conclusions

Pattern representation of moving regions in block-based video motion estimation and compensation has received considerable attention recently. In general, the pattern representation problem is equivalent to selecting λ best-matched patterns from a pattern codebook of α patterns. [1] used α = λ = 4 and [12] used α = λ = 8, i.e., both of these works used only a fixed number of regular shaped patterns for all video sequences. In [10] we introduced the Variable Pattern Selection (VPS) algorithm with α = 24 and λ = 16. The pattern elimination technique in the selection process of the VPS algorithm, however, has been found to be computationally expensive, especially when λ