Signal Processing: Image Communication 27 (2012) 1–15


Hybrid motion estimation scheme for secondary SP-frame coding using inter-frame correlation and FMO

Ki-Kit Lai, Yui-Lam Chan*, Chang-Hong Fu, Wan-Chi Siu

Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Article history: Received 4 January 2011; accepted 28 August 2011; available online 10 September 2011.

Abstract

To cope with the considerable size of secondary SP-frames, quantized-transform domain motion estimation has recently been shown to be appropriate for the coding of secondary SP-frames in H.264/AVC. Nevertheless, its computational complexity is tremendous, and there are still situations in which pixel-domain motion estimation performs better. Neither technique is therefore implemented on its own in secondary SP-frame coding. In this paper, a hybrid scheme is proposed to combine the two existing motion estimation techniques effectively. The combination is based on a new measurement of inter-frame correlation using the bit-counts of the macroblocks in SP-frames, so that the hybrid scheme applies quantized-transform domain motion estimation to macroblocks with weaker inter-frame correlation and pixel-domain motion estimation otherwise. With the further help of the explicit mode of Flexible Macroblock Ordering (FMO), the proposed hybrid scheme classifies MBs into two slice groups by examining the domain used in motion estimation prior to coding motion vectors in a secondary SP-frame. The slice structure of a secondary SP-frame using the explicit FMO mode is flexible and can be changed during the encoding of each new frame. Simulation results show that our proposed scheme substantially outperforms the quantized-transform domain motion estimation scheme. As a consequence, the size of secondary SP-frames can be reduced remarkably, with a significant reduction in computation. © 2011 Elsevier B.V. All rights reserved.

Keywords: SP-frame; Bitstream switching; FMO; QDCT domain; Motion estimation

1. Introduction

The SP-frame [1,2] is a new picture type of the H.264/AVC Extended Profile that can be perfectly reconstructed using different reference frames. This feature enables seamless video streaming in heterogeneous networks where bit rate adaptation is required. In this scenario, multiple bitstreams encoded at different bit rates are stored in a server to cope with network bandwidth variation. Switching among the multiple bitstreams can be accomplished by inserting SP-frames into the bitstreams.

* Corresponding author. Tel.: +852 27666213; fax: +852 23628439. E-mail address: [email protected] (Y.-L. Chan).

0923-5965/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.image.2011.08.003

It is well known that I-frames can also be used for this purpose because no temporal prediction is involved. However, periodic I-frame insertion sacrifices coding efficiency in order to support rapid switching among multiple bitstreams. SP-frames thus offer an attractive alternative for seamless bitstream switching. The SP-frame scheme in H.264/AVC is composed of primary and secondary SP-frames. Both exploit temporal redundancy with predictive coding, but they use different reference frames. Even though different reference frames are used, they still allow an identical reconstruction. This property allows SP-frames to replace I-frames for drift-free switching among multiple compressed bitstreams. Besides, the coding efficiency of a primary SP-frame is much better than that of an I-frame and only slightly worse than that of a P-frame [2–4]. Nevertheless, extra storage for


secondary SP-frames is inevitably required. In [3–5], investigations were conducted to evaluate the trade-off between the coding efficiency of primary SP-frames and the storage cost of secondary SP-frames for multiple bit rate video applications. It was found that a primary SP-frame with high quality results in a significantly high storage requirement for the corresponding secondary SP-frame. Tan et al. [5] revealed that the bulky size of a secondary SP-frame can be reduced by avoiding the reuse of the coding modes and motion vectors of the primary SP-frame during secondary SP-frame coding. A method was thus proposed to improve the coding efficiency by choosing reference pictures correctly. When switching is performed from bitstream 1 (B1) to bitstream 2 (B2) as shown in Fig. 1, the motion vectors and modes are calculated between the reconstructed reference frame in B1 (P1,t-1) and the reconstructed target frame in B2 being switched to (SP2,t). However, this coding arrangement reduces the size of a secondary SP-frame in multiple bit rate switching scenarios by only about 2%. Meanwhile, Lai et al. [6–8] identified the problem of using traditional pixel-domain motion estimation and compensation when secondary SP-frames are employed in bitstream switching. An analysis in [6–8] shows that the coding mechanism of secondary SP-frames does not pave the way for entropy coding: macroblocks in secondary SP-frames undergo the transformation and quantization processes before the residue is calculated, so many non-zero coefficients must be encoded without the further quantization applied in a conventional encoder. Quantized-transform (QDCT) domain motion estimation and compensation processes, adopted in secondary SP-frame coding, were therefore proposed and can compensate for this demerit of their pixel-domain counterparts.
Nonetheless, the computational complexity of the QDCT-domain technique is tremendous, and there are still some exceptional cases in which its coding performance is not as good as that of the pixel-domain technique. A hybrid motion estimation scheme for secondary SP-frame coding based on inter-frame correlation is proposed in this paper. With the help of Flexible Macroblock Ordering in H.264/AVC, the hybrid scheme effectively combines the two existing techniques, operated in different domains, to further reduce the size of secondary SP-frames.
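The entropy-coding mismatch described above can be made concrete with a small, deliberately adversarial toy example; the arrays, the 1-D layout and the step size are invented for illustration, not taken from the paper. Subtracting after quantization leaves far more non-zero coefficients than quantizing the residue:

```python
import numpy as np

# Toy "transform coefficients" of a current block and a very close
# prediction of it (hypothetical values).
cur = np.arange(16) * 8 + 3        # each value sits just below a bin edge
pred = cur + 2                     # prediction error of only 2 per coefficient

q = 8  # quantization step

# P-frame style: subtract first, then quantize the residue once.
pixel_style = np.round((cur - pred) / q)

# Secondary SP-frame style: quantize both sides first with the same step,
# then subtract in the quantized-transform (QDCT) domain; no further
# quantization is allowed after this point.
qdct_style = np.round(cur / q) - np.round(pred / q)

print(int((pixel_style != 0).sum()), int((qdct_style != 0).sum()))  # prints "0 16"
```

Every coefficient of the current block was placed just below a quantizer decision boundary, so the small prediction error flips every quantized value, while the same error quantizes to zero in the pixel-domain path.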

B1

Time

2. Review of SP-frame coding An example of bitstream switching using SP-frames is shown in Fig. 1. In this figure, an image sequence is encoded into two bitstreams (B1 and B2) with different bit rates using two quantization parameters, Qp1 and Qp2, respectively. Two primary SP-frames—SP1,t and SP2,t—are placed at frame t (switching point) within each bitstream. In addition, a secondary SP-frame (SP12,t) is produced, which has the same reconstructed values as SP2,t even though different reference frames are used. When switching is needed from B1 to B2 at frame t, a secondary SP-frame (SP12,t ) instead of SP2,t is transmitted. The decoder can obtain seamlessly as normally SP2,t decoded at frame t. Consequently, it can continually decode frame tþ1 from B2 seamlessly. The block diagram of a primary SP-frame encoder is depicted in the upper part of Fig. 2. In comparison with a P-frame encoder, the only difference of a primary SP-frame encoder is the extra quantization/dequantization steps with a quantization level Qs applied to the transform coefficients of the primary SP-frame (SP2,t in Fig. 1) [1,2]. Interested readers are encouraged to read [1,2]. These extra steps ensure that the quantized-transform coefficients of SP2,t are divisible by Qs, which is used in the encoding process of the secondary SP-frame, SP12,t, as shown in the middle part of Fig. 2. For encoding SP12,t, the prediction of the reference frame is firstly transformed and quantized using Qs before generating the residue.

mv1 mode1

P1,t-2

SP 1,t

P1,t-1

mv12 mode12

B2

The rest of the paper is organized as follows. Section 2 gives a brief description of the conventional SP-frame coding process and the QDCT-domain motion estimation algorithm. An in-depth study of the problem on applying both motion estimation techniques into a secondary SP-frame encoder is provided in Section 3. Analysis of relationship between primary and secondary SP-frames is also covered in this section. After the detailed investigation, a novel hybrid scheme with the support of Flexible Macroblock Ordering (FMO) is then proposed. In Section 4, experimental results are shown, which are focused on the comparison between our proposed scheme and the QDCT-domain scheme employed in the secondary SP-frame encoder. Finally, the conclusions of the paper are presented in Section 5.

P2,t-2

P 2,t-1

P1,t+1

P1,t+2

SP 2,t

P2,t+1

P2,t+2

t

t+1

t+2

S P12,t

mv2 mode2

t-2

t-1

Fig. 1. Switching bitstream from B1 to B2 using SP-frames.
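The divisibility property described in this section (both the switch-target coefficients and the prediction are quantized with the same Qs before the residue is formed) can be sketched with a toy scalar quantizer; the coefficient values and Qs below are made up for illustration:

```python
import numpy as np

def quantize_to_multiple(coeffs, q):
    """Toy scalar quantizer returning the nearest multiple of q."""
    return (np.round(coeffs / q) * q).astype(int)

# Hypothetical transform coefficients of the switch-target frame (SP2,t)
# and of the motion-compensated prediction from the other bitstream (P1,t-1).
target_coeffs = np.array([37, -12, 5, 0])
prediction_coeffs = np.array([35, -9, 3, 1])

Qs = 4
# Encoder: both sides are quantized with the same Qs *before* the residue
# is formed, so the residue connects two multiples of Qs and no further
# quantization is needed.
residue = (quantize_to_multiple(target_coeffs, Qs)
           - quantize_to_multiple(prediction_coeffs, Qs))

# Decoder: the prediction quantized with the same Qs, plus the residue,
# recovers the quantized target exactly, whichever bitstream we came from.
reconstructed = quantize_to_multiple(prediction_coeffs, Qs) + residue
assert np.array_equal(reconstructed, quantize_to_multiple(target_coeffs, Qs))
```

Because both operands of the subtraction are already multiples of Qs, the reconstruction is exact; this is the mechanism that makes secondary SP-frames drift-free.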



Fig. 2. Simplified encoding block diagram of primary and secondary SP-frames [1,2].

Both the prediction and the quantized-transform coefficients of SP2,t in the lower part of Fig. 2 are thus synchronized to Qs. From this point, there is no further quantization, which means that the decoder can perfectly reconstruct the quantized-transform coefficients of SP2,t.

Producing secondary SP-frames also involves motion estimation, carried out independently in all modes and submodes of H.264/AVC [9–11], by minimizing the Lagrangian cost function Jmotion:

Jmotion(mv12, λmotion) = SAD(Bc, Br) + λmotion · Rmotion(mv12 − pmv12)   (1)

where mv12 is the motion vector, pmv12 is the motion vector predictor, λmotion is the Lagrangian multiplier for motion estimation, Rmotion(mv12 − pmv12) is the estimated number of bits for coding mv12, and SAD is the sum of absolute differences between the current block Bc and its reference block Br [10,11]. After the motion estimation for each mode, a rate-distortion (RD) optimization technique is used to obtain the best mode, and its general equation is given by

Jmode(Bc, Brec, mode12, λmode) = SSD(Bc, Brec, mode12) + λmode · Rmode(Bc, Brec, mode12)   (2)

where λmode is the Lagrangian multiplier for mode decision, mode12 is one of the candidate modes during motion estimation, SSD is the sum of squared differences between Bc and its reconstructed block Brec, and Rmode(Bc, Brec, mode12) represents the number of coding bits associated with the chosen mode. To compute Jmode, forward and inverse integer transforms and variable length coding are performed.

However, there is a deficiency in using the conventional motion estimation and compensation processes, which operate in the pixel domain, for secondary SP-frames. In H.264/AVC [12], the motion estimation process in P-frames and secondary SP-frames is the same. For P-frame coding, the best match in the reference frame is subtracted from the current macroblock to form a residual macroblock. The transformation and quantization processes are then performed on the residue, after which most of the coefficients become zero. This property paves the way for entropy coding. Secondary SP-frame coding, on the other hand, first applies the transformation and quantization processes to the macroblock in SP2,t and to its best match in the reference frame, SP1,t-1, and then subtracts them in the quantized-transform (QDCT) domain. In this case, their quantized-transform coefficients are only close, not equal, so many non-zero coefficients are generated, especially for a small Qs. Since there is no further quantization from this point, these non-zero coefficients spread into entropy coding, which inflates the size of a secondary SP-frame. To reduce residues for better entropy coding, quantized-transform domain motion estimation (QDCT-ME) is therefore used instead of its pixel-domain counterpart [6–8].

Fig. 3. Simplified secondary SP-frame encoder using QDCT-ME.

Fig. 3 shows the SP-frame encoder implemented with QDCT-ME. Instead of operating in the pixel domain, QDCT-ME aims to minimize residues in the QDCT domain. Therefore, the modified Lagrangian cost function of QDCT-ME, J'motion, is based on the sum of absolute differences of quantized-transform coefficients and the estimated rate. This cost function is expressed as

J'motion(mv12, λmotion) = k · SAQTD(Bc, Br) + λmotion · Rmotion(mv12 − pmv12)   (3)

where SAQTD(Bc, Br) denotes the sum of absolute differences between the quantized-transform coefficients of the current block Bc and those of its reference block Br, defined as

SAQTD(Bc, Br) = Σ |Qs[T(Bc)] − Qs[T(Br)]|   (4)

The factor k in (3) is a weighting factor that compensates for the energy loss of SAQTD(Bc, Br): because SAQTD(Bc, Br) operates in the QDCT domain, the extra quantization means that energy is no longer preserved. With this arrangement, the motion estimation is consistent with the residue generation of the secondary SP-frame encoder.

In summary, pixel-domain motion estimation minimizes absolute differences computed on pixels, while QDCT-domain motion estimation minimizes absolute differences computed on transformed and quantized coefficients. QDCT-ME is capable of finding motion vectors that yield more zero-valued quantized-transform coefficients, which benefits the entropy coding of a secondary SP-frame and provides a remarkable size reduction [6–8].

3. Proposed hybrid secondary SP-frame coding

3.1. Impact of using QDCT-ME on secondary SP-frame coding

In [7], it was found that some secondary SP-frames generated by the QDCT-domain motion estimation technique require more bits than the pixel-domain technique, as illustrated in Fig. 4(a) and (b), where the test video streams used for simulation are "Riverbed" and "Shuttlestart". They were encoded into two bitstreams with Qp equal to 20 and 28, and Qs was set to Qp−6, i.e. 14 and 22, respectively. Switching then took place from the bitstream with Qp = 20 to the bitstream with Qp = 28. This phenomenon can be explained by the distribution of quantized-transform coefficients in some 4 × 4 blocks where all pixels are similar or have the same value.


Fig. 4. Frame-by-frame comparisons of size reduction of secondary SP-frames in percentage achieved by QDCT-ME over pixel-domain ME in (a) Riverbed and (b) Shuttlestart.

In these blocks, the quantized-transform coefficients are mainly zeros except for the DC coefficient. If this type of QDCT-domain block is matched over a predetermined search area on the quantized-transform coefficients of the reference frame, all the SAQTD values in (4) are very similar and the motion estimation process becomes very sensitive to noise. In this case, the candidate with the smallest SAQTD value cannot be presumed to be the best motion vector, and QDCT-ME then introduces more bit-counts. Another drawback of QDCT-ME is the surge in computational complexity. In QDCT-ME, each current block Bc is transformed and quantized to Qs[T(Bc)]. A search window centered on the current block position is set in the reference frame. QDCT-ME starts by transforming and quantizing a block, Qs[T(Br)], in the top right-hand corner of the search window. After obtaining Qs[T(Bc)] and Qs[T(Br)], SAQTD(Bc,Br) can be calculated for this candidate. The next candidate is the block shifted by one pixel in the horizontal direction, which must also be transformed and quantized to compute its SAQTD(Bc,Br). This procedure continues for all possible candidates within the search window; that is, every candidate in the search window must be transformed and quantized into the QDCT domain. Consequently, SAQTD is computationally very intensive, although it achieves higher coding efficiency than SAD.

3.2. MB-based hybrid motion estimation scheme
The aforementioned drawbacks motivate us to adopt a hybrid approach that reduces the computational complexity and further improves the coding efficiency of QDCT-ME. In the proposed hybrid scheme, a selection mechanism based on the inter-frame correlation between the current frame and the reference frame chooses between the pixel-domain and QDCT-domain techniques. One straightforward approach is to perform pixel-domain motion estimation and QDCT-domain motion estimation separately. This frame-based approach compares the sizes of the secondary SP-frames generated by both estimation techniques and keeps the motion vectors giving the smaller bit-count. However, the spatial characteristics within a frame are not considered. For instance, a 4 × 4 block in a homogeneous area causes more quantized-transform coefficients to be zero, in which case QDCT-ME is no longer suitable; choosing spatial-domain ME might instead give a lower-energy residual after motion compensation. In general, spatial-domain ME is appropriate for homogeneous areas of a frame while QDCT-ME is beneficial in detailed areas. Taking this into consideration, the scheme proposed in this paper operates at the macroblock (MB) level. Similar to the frame-based approach, the two motion estimation techniques, operated in the pixel and QDCT domains, are carried out independently for each MB. Two bit-counts are then obtained for the MB, and the motion vector associated with the smaller one is considered the best. However, this brute-force approach increases the encoding time drastically. The results in [7] revealed that the improvement in the coding efficiency of a secondary SP-frame from QDCT-ME relies heavily on the degree of inter-frame correlation between the MB in SP2,t and its motion-compensated MB in P1,t-1, as shown in Fig. 1; this correlation is denoted by corrSSPMB,t. The weaker the correlation, the better the coding efficiency of QDCT-ME.

Therefore, corrSSPMB,t is a good measure for determining the proper domain for motion estimation when coding an MB in a secondary SP-frame. In other words, a smaller value of corrSSPMB,t favors QDCT-ME; otherwise, pixel-domain motion estimation is appropriate, which reduces the required computational complexity while maintaining sufficient coding efficiency for MBs in homogeneous areas. The number of bits required for encoding the MB in SP12,t could be used directly as a measure of corrSSPMB,t, but this bit-count cannot be obtained prior to secondary SP-frame coding. Fig. 5(a) and (b) shows the bit-counts of all MBs in two rows of SP12,t and SP1,t in the 82nd frame of "Riverbed" and the 69th frame of "Shuttlestart". Note that the MB of the primary SP-frame is already available when its corresponding MB in the secondary SP-frame is encoded. From this figure, it can easily be seen that the general trends of the two curves are very similar. This is because the current frames used for encoding SP12,t and SP1,t are the frames at time t from B2 and B1, respectively: they contain the same video content, and the only discrepancy is the quantization parameter. It implies that corrSSPMB,t is reasonably approximated by the bit-count of the co-located MB in the primary SP-frame.

Fig. 5. Bit-counts for all macroblocks in two rows of a secondary SP-frame and its corresponding primary SP-frame in (a) Riverbed and (b) Shuttlestart.

The bit-count of the MB in the primary SP-frame at time t, SP1,t, is denoted by bit-countPSPMB. It is then used to select the proper domain for motion estimation in secondary SP-frame coding with the proposed hybrid scheme. When bit-countPSPMB is larger than a predefined threshold TH, corrSSPMB,t is low and QDCT-ME is carried out for secondary SP-frame coding in order to offer higher coding efficiency. In contrast, pixel-domain motion estimation is good enough when bit-countPSPMB ≤ TH, which relieves the computational burden of secondary SP-frame coding while providing better coding efficiency for MBs within homogeneous areas. In general, the more complex the spatial or temporal activity in a frame, the more of its MBs are encoded using QDCT-ME. To determine TH, the arithmetic mean over a frame of M × N MBs is used to capture the central tendency of the bits in these MBs. TH can then be formulated as

TH = (1 / (M · N)) · Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} bit-countPSPMB(i, j)   (5)

where bit-countPSPMB(i,j) is the bit-count needed to encode the MB at the ith row and jth column of the primary SP-frame.

3.3. Utilization of FMO in the proposed hybrid scheme

The proposed hybrid scheme produces motion vectors from two kinds of domains: the pixel domain and the QDCT domain. Unsystematically mixing these two kinds of motion vectors might increase the bit-count required by a secondary SP-frame. This is due to the adoption of motion vector prediction, in which motion vectors are coded differentially with respect to a predictor derived from previously coded blocks. This predictor is computed from the motion vectors of the adjacent blocks, which tend to be highly correlated. Specifically, the motion vector predictor is formed as the median of the motion vectors of the three adjacent blocks on the left, top, and top-right (or top-left if the top-right is not available). However, since the proposed hybrid scheme allows different domains to be used for motion estimation in secondary SP-frame coding, the motion vector of an MB obtained by ME in one domain contributes to the motion vector predictor of the adjacent MB, no matter which domain that MB uses [13,14]. Motion vectors obtained from different domains are spatially less correlated, resulting in an increase in the bit-counts of secondary SP-frames.

It is noted that, in H.264/AVC video, the slice is a basic structure of a picture, and each picture can be subdivided into one or more slices, each defining a sequence of MBs. Slices segment a picture into partitions that are coded independently of each other. In the typical picture setting of an H.264/AVC encoded bitstream, only one slice is used per picture. In other words, two neighboring MBs in a slice remain dependent on each other even when they use different motion estimation techniques, so mixing two kinds of motion vectors in a frame is bad for coding efficiency. This means that one slice per frame is not very suitable for the proposed hybrid scheme. The problem could be solved by assigning only one MB per slice, yet this is impractical because of the excessive bits required by the slice headers. In the proposed hybrid scheme, an additional consideration is therefore made, based on a new MB classification policy that utilizes Flexible Macroblock Ordering (FMO) [15]. FMO is one of the most striking error-resilience tools supported by H.264/AVC. It specifies a pattern that assigns the MBs of a picture to one or several slice groups, providing a more flexible way of grouping macroblocks. Each MB can be assigned to a slice group through a Macroblock-to-slice Allocation map (MBAmap) [15–17]. In this map, each MB is identified by a number that indicates which slice group the MB belongs to. The MBAmap can be updated every frame using the Picture Parameter Set (PPS). By this mechanism, FMO can divide a picture into different patterns of MBs. There are seven types of slice group maps for FMO in the standard [9]. Fig. 6 depicts the six predefined slice group maps, from type 0 to type 5: interleaved, dispersed, foreground and background, box-out, raster scan, and wipe. In addition, there is an explicit mode (type 6), which is the most flexible FMO type: it allows users to define their own MBAmap so that the MBs of a picture can be assigned to any slice group in any order.

Fig. 6. FMO map types.

In the proposed hybrid scheme, motion estimation is carried out either in the pixel domain or in the QDCT domain. According to the domain selection in Section 3.2, there are two slice groups in each secondary SP-frame, and MBs are classified by examining the bit-counts of the primary SP-frame prior to the motion estimation and motion vector coding of the secondary SP-frame. The six predefined FMO slice group maps in Fig. 6 are not flexible enough for this classification, since none of the predefined patterns can fit all frames. Instead, the fully flexible macroblock ordering of the explicit mode is used in the proposed scheme. By doing so, each secondary SP-frame changes its MB classification dynamically throughout the entire video sequence. This dynamic formation of slice groups in every secondary SP-frame is driven by the bit-counts of the MBs in the primary SP-frame. Each MB in the corresponding secondary SP-frame is classified as either pixel-domain or QDCT-domain. Specifically, an identification number is given to each MB: MBs designated for pixel-domain ME are assigned to slice group 0, while MBs designated for QDCT-ME are assigned to slice group 1. After that, an MBAmap is established; this map partitions the secondary SP-frame into the two slice groups.

Fig. 7. Rate-distortion performance for switching-down scenario in (a) Riverbed, (b) Duckstakeoff, (c) Shuttlestart, (d) Crew, (e) Mobisode1, and (f) Mobisode2.


During encoding, only MBs in the same slice group depend on each other. This means that the motion vector predictors, which are motion vectors computed through motion estimation in the same domain, come from the same slice group only. On the whole, FMO partitions a secondary SP-frame into two slice groups according to the MBAmap, which records the proper motion estimation domain for each MB.

4. Results

A large amount of experimental work has been conducted to evaluate the performance of the proposed hybrid scheme for bitstream switching using SP-frames.

scheme for bitstream switching using SP-frames. Results in terms of both coding efficiency and computational complexity were compared with those obtained using the pixel-domain and QDCT-domain schemes. Let us denote them as pixel-ME [12] and QDCT-ME [7]. All schemes were implemented based on the H.264/AVC reference software (JM version 11.0) [12] for secondary SP-frame coding. Six test sequences, ‘‘Riverbed’’ (1280  720 pixels), ‘‘Duckstakeoff’’ (1280  720 pixels), ‘‘Shuttlestart’’ (1280  720 pixels), ‘‘Crew’’ (1280  720 pixels), ‘‘Mobisode1’’ (832  480 pixels) and ‘‘Mobisode2’’ (832  480 pixels), were used for performance comparison. In each sequence, 100 frames were encoded with two QP to generate two bitstreams with

8

14

6

10

size reduction (%)

size reduction (%)

12 8 6 4 2 0 Hybrid

0

Hybrid

QDCT-ME [7]

-4 0

10

20

30

40 50 60 frame number

70

80

0

90

10

20

30

40 50 60 frame number

70

80

90

70

80

90

70

80

90

50

4

40 size reduction (%)

3 size reduction (%)

2

QDCT-ME [7]

-4

2 1 0 -1 -2

Hybrid

30 20 10 0 -10 -20

QDCT-ME [7]

Hybrid

QDCT-ME [7]

-30

-3 0

10

20

30

40 50 60 frame number

70

80

0

90

30

10

20

30

40 50 60 frame number

60 50 size reduction (%)

20 size reduction (%)

4

-2

-2

10 0 -10 -20 Hybrid

-30

9

0

10

20

30

30 20 10 0 -10 Hybrid

-20

QDCT-ME [7]

40 50 60 frame number

40

QDCT-ME [7]

-30 70

80

90

0

10

20

30

40 50 60 frame number

Fig. 8. Frame-by-frame size reduction of secondary SP-frames in percentage achieved by the proposed hybrid scheme and QDCT-ME [7] over pixel-ME [12] for the switching-down scenario in (a) Riverbed, (b) Duckstakeoff, (c) Shuttlestart, (d) Crew, (e) Mobisode1, and (f) Mobisode2.

10

K.-K. Lai et al. / Signal Processing: Image Communication 27 (2012) 1–15

conditions recommended in [18]. For the quantized-transform motion estimation, the scaling factor k of Eq. (3) was set to 3, which was found by experimental observations [7]. To make the comparison impartial, both schemes employed a full search motion estimation algorithm. We aim at evaluating the coding efficiency of secondary SP-frames based on our proposed hybrid scheme. To have a comprehensive and fair comparison among the proposed hybrid scheme, QDCT-ME and pixel-ME, we did exhaustive simulation on all possible secondary SPframes. Fig. 7 shows the rate-distortion curves of secondary SP-frames using pixel-ME, QDCT-ME, and the hybrid scheme for switching-down scenario. It is noted that the PSNR values of secondary SP-frames encoded by three

7

4

6

3 size reduction (%)

size reduction (%)

different bit rates. Switching between two bitstreams in both directions was then performed. For the high bit rate bitstream, the quantization parameter QP was fixed at 20. On the other hand, QP was varied from 24 to 32 with a step size of 4 for coding the low bit rate bitstream. According to the optimal setting in [2], QS was set to QP–6, i.e. from 18 to 26. Only the first frames of the bitstreams were encoded as I-frames, and switching frames were encoded in turn as SPframes while all the rest non-switching frames were encoded as P-frames. In our experiments, the extended profile with CAVLC entropy encoding was used to configure the encoder. R-D optimization was enabled. For the motion estimation process, a search range of 64 was set for both P-frames and SP-frames. It is based on the encoding

Fig. 9. Frame-by-frame size reduction of secondary SP-frames in percentage achieved by the proposed hybrid scheme and QDCT-ME [7] over pixel-ME [12] for the switching-up scenario in (a) Riverbed, (b) Duckstakeoff, (c) Shuttlestart, (d) Crew, (e) Mobisode1, and (f) Mobisode2.


different schemes are identical for the same Qp. This is because the reconstructed secondary SP-frame is exactly the same as its corresponding primary SP-frame, no matter which motion estimation approach is used in secondary SP-frame coding. From Fig. 7, it is obvious that the proposed hybrid scheme can maintain the same quality of secondary SP-frames at lower bitrates. For simplicity but without loss of generality, the following simulations focus on the bit-counts of secondary SP-frames when the different schemes are adopted. Figs. 8(a)-(f) and 9(a)-(f) show a frame-by-frame comparison of the size reduction in secondary SP-frames for the different sequences in both switching directions. Fig. 8(a)-(f) covers high-to-low bit rate switching (Qp = 20 to 28) while Fig. 9(a)-(f) depicts low-to-high bit rate switching (Qp = 28 to 20). In these figures, the values on the Y-axis give the size reduction of secondary SP-frames, in percent, achieved by our proposed hybrid scheme and the QDCT-ME scheme over the pixel-ME scheme. A positive value indicates that the tested scheme generates a smaller bit-count than pixel-ME, whereas a negative value indicates that it requires a larger bit-count than pixel-ME. In the high-to-low bit rate


switching (switching-down) scenario shown in Fig. 8, the proposed hybrid scheme can substantially reduce the size of secondary SP-frames: by about 7.5%, 2.1%, 2%, 6.5%, 6.3% and 2.4% on average, and by up to 13.5%, 7%, 3.5%, 43%, 27% and 52%, in "Riverbed", "Duckstakeoff", "Shuttlestart", "Crew", "Mobisode1" and "Mobisode2", respectively, compared with the traditional pixel-ME scheme. Similarly, in the low-to-high bit rate switching (switching-up) case illustrated in Fig. 9, the size reduction of secondary SP-frames using the hybrid scheme is also very significant: about 4.4%, 1.0%, 3.6%, 4.9%, 4.7% and 5.7% on average, and up to 6.5%, 3.5%, 18%, 23%, 24% and 35%, respectively, for the same sequences. From Figs. 8 and 9, it can be seen that our hybrid scheme outperforms QDCT-ME in all secondary SP-frames for all sequences, even though QDCT-ME itself achieves a remarkable size reduction over pixel-ME. The significant improvement of the proposed scheme is due to the flexibility of performing motion estimation and compensation in both the pixel and QDCT domains. The process of selecting an appropriate domain for motion estimation avoids the case in which an MB has similar pixel values and motion estimation operated in the

Table 1
Average size of secondary SP-frames (kbits) with different Qp for the switching-down scenario. The numbers in brackets represent the savings of the various schemes as compared with pixel-ME.

Qp  Scheme         Riverbed        Duckstakeoff    Shuttlestart   Crew           Mobisode1      Mobisode2
24  pixel-ME [12]  1778            2313            382            992            458            209
    QDCT-ME [7]    1696 (-4.617%)  2285 (-1.218%)  375 (-1.790%)  953 (-3.939%)  444 (-2.964%)  196 (-6.069%)
    Hybrid         1656 (-6.836%)  2269 (-1.893%)  374 (-2.053%)  933 (-5.888%)  435 (-5.096%)  195 (-6.880%)
28  pixel-ME [12]  1256            1774            269            687            343            154
    QDCT-ME [7]    1206 (-3.951%)  1755 (-1.111%)  266 (-1.098%)  669 (-2.558%)  335 (-2.325%)  151 (-2.245%)
    Hybrid         1163 (-7.395%)  1738 (-2.051%)  263 (-1.925%)  631 (-8.069%)  320 (-6.645%)  144 (-6.390%)
32  pixel-ME [12]  840             1284            184            467            239            113
    QDCT-ME [7]    835 (-0.620%)   1279 (-0.440%)  183 (-0.397%)  451 (-3.334%)  232 (-2.824%)  110 (-3.005%)
    Hybrid         795 (-5.365%)   1268 (-1.260%)  181 (-1.595%)  447 (-4.184%)  231 (-3.217%)  106 (-6.270%)
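The bracketed savings in Tables 1 and 2 are simply the relative size change against pixel-ME. A minimal sketch of that arithmetic follows; the function name is ours, and the sample values are the Riverbed, Qp = 24 entries of Table 1 (the result matches the tabulated -6.836% only up to the rounding of the kbit values):

```python
# Savings relative to pixel-ME, as reported in brackets in Tables 1 and 2:
# saving(%) = (size_scheme - size_pixel) / size_pixel * 100

def saving_percent(size_scheme_kbits, size_pixel_kbits):
    """Negative result: the scheme produced a smaller secondary SP-frame."""
    return (size_scheme_kbits - size_pixel_kbits) / size_pixel_kbits * 100.0

# Riverbed, Qp = 24, switching-down: pixel-ME 1778 kbits, hybrid 1656 kbits
print(round(saving_percent(1656, 1778), 2))  # -6.86
```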

Table 2
Average size of secondary SP-frames (kbits) with different Qp for the switching-up scenario. The numbers in brackets represent the savings of the various schemes as compared with pixel-ME.

Qp  Scheme         Riverbed        Duckstakeoff    Shuttlestart   Crew            Mobisode1      Mobisode2
24  pixel-ME [12]  2433            3007            666            1473            642            330
    QDCT-ME [7]    2376 (-2.329%)  2983 (-0.788%)  653 (-1.990%)  1431 (-2.834%)  636 (-0.952%)  320 (-3.206%)
    Hybrid         2314 (-4.894%)  2971 (-1.172%)  650 (-2.421%)  1416 (-3.871%)  616 (-3.993%)  318 (-3.725%)
28  pixel-ME [12]  2376            3046            721            1502            657            362
    QDCT-ME [7]    2308 (-2.846%)  3028 (-0.585%)  707 (-1.876%)  1485 (-1.123%)  652 (-0.648%)  351 (-3.006%)
    Hybrid         2273 (-4.345%)  3017 (-0.939%)  696 (-3.502%)  1423 (-5.244%)  630 (-4.079%)  337 (-6.906%)
32  pixel-ME [12]  2362            3126            793            1629            692            406
    QDCT-ME [7]    2302 (-2.551%)  3111 (-0.493%)  760 (-4.164%)  1564 (-3.997%)  679 (-1.882%)  382 (-5.895%)
    Hybrid         2266 (-4.042%)  3106 (-0.652%)  774 (-2.375%)  1495 (-8.204%)  672 (-2.846%)  373 (-8.212%)


pixel domain generates fewer bits than QDCT-ME, as explained in Section 3.1. Evidence of this phenomenon also appears in Figs. 8 and 9, where a number of secondary SP-frames of the test sequences generated by QDCT-ME require more bits than pixel-ME. The proposed hybrid algorithm effectively combines the two existing techniques at the MB level during the encoding of each SP-frame. The combination is facilitated by the flexible explicit mode in FMO and is controlled by the bit-count of the corresponding MB in the primary SP-frame. With this arrangement, the overwhelming reductions in Figs. 8 and 9 achieved by the proposed hybrid scheme highlight the importance of using a proper domain at the MB level for motion estimation in secondary SP-frame coding. In the same vein, Tables 1 and 2 show the average size of secondary SP-frames with different Qp for the switching-down and switching-up scenarios, respectively. Besides, Figs. 10 and 11(a)-(f) show the average percentage reduction in size of secondary SP-frames. Note that the

quantization parameters were varied from 24 to 32 with a step size of 4, and the switching processes were from and to Qp = 20. Tables 1 and 2 illustrate that even though the average size of secondary SP-frames using QDCT-ME is smaller than that using pixel-ME, the secondary SP-frames encoded by the proposed hybrid scheme require the smallest bit-counts. It is obvious from Figs. 10(a)-(f) and 11(a)-(f) that, with the proposed hybrid scheme, the size of secondary SP-frames can be remarkably reduced for various quantization parameters. Moreover, it is interesting to take a closer look at a frame of "Shuttlestart", as depicted in Fig. 12. This frame contains a space shuttle launching with a large amount of smoke. The sequence includes scenes with challenging motion, which suit QDCT-ME. On the other hand, the MBs located in the sky, as highlighted in Fig. 12, perform poorly under QDCT-ME, since the pixels in this area have similar values. The results in Figs. 10 and 11 show that the hybrid scheme makes good use of both pixel-ME and QDCT-ME.
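The MB-level selection behind Fig. 12 can be sketched as follows. This is an illustrative sketch only: the function name, the slice-group ids and the threshold value are our assumptions. The underlying rule follows the scheme described above: MBs whose co-located MB in the primary SP-frame has a high bit-count (weak inter-frame correlation) go to QDCT-ME, the rest stay in the pixel domain, and the two classes form the explicit FMO slice-group map.

```python
# Illustrative sketch (names and threshold are assumptions, not the paper's code):
# classify each macroblock by the bit-count of its co-located MB in the primary
# SP-frame, and build an explicit FMO slice-group map so that motion vectors
# from the two domains are never mixed within a slice.

PIXEL_ME, QDCT_ME = 0, 1  # slice-group ids (assumed convention)

def build_slice_group_map(primary_mb_bits, threshold):
    """High bit-count => weak inter-frame correlation => QDCT-domain ME."""
    return [QDCT_ME if bits > threshold else PIXEL_ME for bits in primary_mb_bits]

# Hypothetical bit-counts for six MBs and an assumed threshold of 64 bits:
print(build_slice_group_map([12, 8, 150, 200, 30, 90], 64))  # [0, 0, 1, 1, 0, 1]
```

In the actual encoder, each of the two slice groups would then be coded with its own motion estimation path, so that H.264/AVC motion vector prediction never crosses domains.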

Fig. 10. Size reduction of secondary SP-frames in percentage achieved by the proposed hybrid scheme and QDCT-ME [7] over pixel-ME [12] with different Qp for the switching-down scenario in (a) Riverbed, (b) Duckstakeoff, (c) Shuttlestart, (d) Crew, (e) Mobisode1, and (f) Mobisode2.

Fig. 11. Size reduction of secondary SP-frames in percentage achieved by the proposed hybrid scheme and QDCT-ME [7] over pixel-ME [12] with different Qp for the switching-up scenario in (a) Riverbed, (b) Duckstakeoff, (c) Shuttlestart, (d) Crew, (e) Mobisode1, and (f) Mobisode2.

Fig. 12. Different areas using QDCT-ME [7] and pixel-ME [12].

To compare the computational complexity required by the various schemes, the average encoding time of secondary SP-frames with different Qp was measured and tabulated in Tables 3 and 4 for the switching-down and switching-up scenarios, respectively. All the simulations were carried out on a PC with an Intel Core2 Quad Q9450 CPU at 2.66 GHz and 12 GB of memory. We also show the savings of the proposed hybrid scheme over the QDCT-ME scheme in Tables 3 and 4. ΔTime in these tables represents the percentage change of the average encoding time of the hybrid scheme over QDCT-ME, and it is calculated as follows:

ΔTime (%) = (Time_Hybrid - Time_QDCT-ME) / Time_QDCT-ME x 100    (6)

where Time_QDCT-ME and Time_Hybrid denote the encoding time used by the QDCT-ME scheme and the proposed hybrid scheme, respectively. Owing to the selection of an appropriate domain for motion estimation in secondary SP-frame coding, not all the blocks need to be encoded using QDCT-ME. As a result, it can be easily seen that the proposed hybrid scheme can substantially reduce the computational


Table 3
Average time usage (s) of secondary SP-frames with different Qp for the switching-down scenario.

Qp  Scheme         Riverbed  Duckstakeoff  Shuttlestart  Crew      Mobisode1  Mobisode2
24  pixel-ME [12]  359       338           156           287       151        104
    QDCT-ME [7]    1945      1874          413           1218      574        293
    Hybrid         1008      1191          200           627       264        113
    ΔTime (%)      -48.1678  -36.4491      -51.5083      -48.5305  -53.9551   -61.3450
28  pixel-ME [12]  325       300           109           240       125        76
    QDCT-ME [7]    1456      1386          143           737       382        158
    Hybrid         700       794           95            333       172        59
    ΔTime (%)      -51.8894  -42.6974      -33.5481      -54.8163  -54.9783   -62.3454
32  pixel-ME [12]  268       250           40            183       98         58
    QDCT-ME [7]    849       807           80            290       191        69
    Hybrid         373       443           67            136       109        47
    ΔTime (%)      -56.0815  -45.0815      -15.9146      -53.2507  -42.9160   -32.3204

Table 4
Average time usage (s) of secondary SP-frames with different Qp for the switching-up scenario.

Qp  Scheme         Riverbed  Duckstakeoff  Shuttlestart  Crew      Mobisode1  Mobisode2
24  pixel-ME [12]  377       365           210           327       230        128
    QDCT-ME [7]    2209      2114          923           1741      2279       531
    Hybrid         1300      1438          448           1018      631        203
    ΔTime (%)      -41.1509  -31.9572      -51.4419      -41.5473  -72.3058   -61.7542
28  pixel-ME [12]  375       368           225           335       167        135
    QDCT-ME [7]    2205      2129          1036          1776      761        600
    Hybrid         1287      1447          487           1027      384        219
    ΔTime (%)      -41.6392  -32.0224      -53.0239      -42.1835  -49.5274   -63.4790
32  pixel-ME [12]  381       381           239           346       150        149
    QDCT-ME [7]    2205      2211          1152          1840      722        595
    Hybrid         1059      1606          622           1130      477        284
    ΔTime (%)      -51.9970  -27.3594      -46.0020      -38.5817  -33.8820   -52.2755

complexity of QDCT-ME: by 56%, 45%, 51%, 55%, 55% and 62% for the switching-down scenario, and by 52%, 32%, 53%, 42%, 72% and 63% for the switching-up scenario, in "Riverbed", "Duckstakeoff", "Shuttlestart", "Crew", "Mobisode1" and "Mobisode2", respectively, as shown in Tables 3 and 4. Not only can encoding time be saved; the proposed hybrid scheme can also greatly reduce the bit-counts of secondary SP-frames.

5. Conclusion

In this paper, an adaptive approach for motion estimation in coding H.264/AVC secondary SP-frames has been proposed. The proposed hybrid scheme combines motion estimation techniques that operate in two different domains: the QDCT domain and the pixel domain. QDCT-domain motion estimation in secondary SP-frame coding offers higher coding efficiency for macroblocks with complex spatial or temporal activities, but it increases encoding time. Pixel-domain motion estimation is more suitable for macroblocks in homogeneous areas, and requires less encoding time. A new measurement of inter-frame correlation based on the bit-counts of the macroblocks in primary SP-frames has been proposed to determine the combination of the two techniques. The hybrid scheme adaptively selects QDCT- or pixel-domain motion estimation at the macroblock level, which has the advantage of utilizing both domains in coding secondary SP-frames. Owing to the adoption of motion vector prediction in H.264/AVC, unsystematically mixing motion vectors obtained from the two different domains might increase the bit-count required by a secondary SP-frame. Making use of the explicit mode in FMO to define the patterns of the slice groups according to the bit-counts of primary

SP-frames is another contribution of this paper. Experimental results showed that the proposed measure for determining the proper domain for motion estimation in secondary SP-frame coding is effective. The hybrid algorithm significantly reduces the bit-counts of secondary SP-frames. Additionally, the computational complexity is reduced tremendously compared with the QDCT-ME scheme. With the proliferation of HDTV and multimedia applications, a new video standard called High Efficiency Video Coding (HEVC) is currently under development for coding video from QVGA (320 x 240) up to 1080p and Ultra HDTV (7680 x 4320) [19]. It aims at providing high coding gain, with 50% bitrate savings over H.264/AVC at the same video quality, probably at the expense of increased computational complexity. It is expected that bitstream switching will remain an important issue in entertainment-quality video services. Consequently, seamless switching with SP-frames has already been adopted in the newest reference software. Our proposed technique could also be extended to HEVC with its new coding tools, for instance, more choices of coding modes, larger DCT sizes and, as expected, the larger search windows required for the ultra-HDTV resolutions supported by HEVC. Computational reduction of QDCT-ME is of great importance when SP-frames are used in HEVC. It is anticipated that both the encoding complexity and the bit-counts of secondary SP-frames will be reduced with the help of the proposed technique.

Acknowledgments The work described in this paper is partially supported by the Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic


University and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (PolyU 5122/08E). Ki-Kit Lai acknowledges the research studentships provided by the University.

References

[1] R. Kurceren, M. Karczewicz, Synchronization-predictive coding for video compression: the SP frames design for JVT/H.26L, in: Proceedings of the IEEE International Conference on Image Processing, vol. 2, USA, September 2002, pp. 497-500.
[2] M. Karczewicz, R. Kurceren, The SP- and SI-frames design for H.264, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (2003) 637-644.
[3] C.P. Chang, C.W. Lin, R-D optimized quantization of H.264 SP-frames for bitstream switching under storage constraints, in: Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, Kobe, Japan, May 2005, pp. 1241-1235.
[4] E. Setton, B. Girod, Rate-distortion analysis and streaming of SP and SI frames, IEEE Transactions on Circuits and Systems for Video Technology 16 (6) (2006) 733-743.
[5] W.T. Tan, B. Shen, Methods to improve coding efficiency of SP frames, in: Proceedings of the IEEE International Conference on Image Processing, Atlanta, USA, October 2006, pp. 1361-1364.
[6] K.K. Lai, Y.L. Chan, C.H. Fu, W.C. Siu, A quantized transform-domain motion estimation technique for H.264 secondary SP-frames, in: Proceedings of the Pacific Rim Conference on Multimedia, Hong Kong, October 2007, pp. 138-147.
[7] K.K. Lai, Y.L. Chan, W.C. Siu, Quantized transform-domain motion estimation for SP-frame coding in viewpoint switching of multiview video, IEEE Transactions on Circuits and Systems for Video Technology 20 (3) (2010) 365-381.
[8] K.K. Lai, Y.L. Chan, C.H. Fu, W.C. Siu, Viewpoint switching in multiview videos using SP-frames, in: Proceedings of the IEEE International Conference on Image Processing, San Diego, USA, October 2008, pp. 1776-1779.
[9] Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, ITU-T Recommendation H.264: Advanced video coding for generic audiovisual services, 2005.
[10] G. Sullivan, T. Wiegand, Video compression—from concepts to the H.264/AVC video coding standard, Proceedings of the IEEE 93 (1) (2005) 18-31.
[11] D. Marpe, T. Wiegand, G.J. Sullivan, The H.264/MPEG-4 advanced video coding standard and its applications, IEEE Communications Magazine 44 (8) (2006) 134-143.
[12] K. Suhring, H.264 Reference Software JM11.0, 2006. Available: <http://iphone.hhi.de/suehring/tml/>.
[13] J. Yeh, M. Vetterli, M. Khansari, Motion compensation of motion vectors, in: Proceedings of the IEEE International Conference on Image Processing, Washington DC, USA, October 1998, pp. 574-577.
[14] G. Laroche, J. Jung, B. Pesquet-Popescu, RD optimized coding for motion vector predictor selection, IEEE Transactions on Circuits and Systems for Video Technology 18 (9) (2008) 1247-1257.
[15] S. Wenger, H.264/AVC over IP, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (2003) 645-656.
[16] H. Chen, Z. Han, R. Hu, R. Ruan, Adaptive FMO selection strategy for error resilient H.264 coding, in: Proceedings of the International Conference on Audio, Language and Image Processing, Shanghai, China, July 2008, pp. 868-872.
[17] S.K. Im, A.J. Pearmain, Error resilient video coding with priority data classification using H.264 flexible macroblock ordering, IET Image Processing 1 (2) (2007) 197-204.
[18] Recommended Simulation Common Conditions for Coding Efficiency Experiments, Revision 4, ITU-T SG16/Q6 VCEG Doc. VCEG-AJ10, San Diego, USA, July 2008.
[19] WD2: Working Draft 2 of High-Efficiency Video Coding, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Doc. JCTVC-D503, Daegu, Korea, January 2011.