RATE-DISTORTION OPTIMAL VIDEO TRANSPORT OVER IP WITH BIT ERRORS

Oztan Harmanci

A. Murat Tekalp

University of Rochester, Rochester, NY

Koc University, Istanbul, Turkey

ABSTRACT

In this paper we propose a method for video delivery over bit error channels. In particular, we propose a rate-distortion optimal method for slicing and unequal error protection (UEP) of packets over bit error channels. The proposed method performs a full frame-based search using a novel dynamic programming approach to determine the optimal slicing configuration in a practically short time. We also propose a rate and distortion estimation technique that decreases the time needed to evaluate the objective function for a slice configuration. The proposed method can perform rate-distortion optimized UEP for use over forward error correction (FEC) capable channels. We show that the proposed method successfully exploits the local dynamics of a video frame and performs more than 1 dB better than common methods.

1. INTRODUCTION

The protocol stacks employed in current IP networks, which were conceived for data and/or voice delivery, do not deliver packets with bit errors. The link and physical layers usually fragment IP packets, and the fragments that are received with bit errors are discarded. Any packet with missing fragments is treated as completely lost. Therefore, these networks are typically modeled as packet-loss channels, where bit errors are transformed into packet losses. On the other hand, it is generally agreed that video transport can benefit from the delivery of packets that contain bit errors. Hence, new protocols that allow erroneous packet delivery have been proposed. For example, UDP-Lite [1] allows delivery of packets with bit errors to the application. This paper addresses rate-distortion optimal video transport over IP networks in which packets with bit errors are also delivered.

In video transport over lossy channels, some degree of error resilience can be attained by using such tools as slicing and resynchronization markers.
A resynchronization marker (RM) is a special uniquely decodable codeword that typically marks the beginning of a new slice; that is, each slice is prefixed by an RM and encoded such that the decoder can achieve resynchronization at the start of each slice even if some previous slices are lost. It is generally desirable, and hence dictated by the standards, that each slice is independently decodable. Therefore, frequent slicing increases error resiliency by confining the impact of error in a shorter slice. However, there is a trade-off between error-resilience and compression efficiency: i) In order to allow independent decodability, prediction from macroblocks that are outside the slice is disabled thus reducing encoding efficiency, ii) in H.264 context-based adaptive variable-length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC) [2], a new slice resets the context thus causing further efficiency loss, and, iii) each slice begins with a header, which can consume a significant amount of the bandwidth at low bit rates. There is little published research on the rate-distortion (RD) optimization of the number and composition of slicing within a frame.

1-4244-0481-9/06/$20.00 ©2006 IEEE

The mainstream research uses experimental selection of the slice size. In these works the slice size is fixed either to a number of macroblocks (MBs) or to a number of bits [3]. An RD optimal approach has been presented in [4], where a "begin a slice" decision is made while encoding a MB. However, the loss in encoding efficiency of the following MBs is not considered at the time of decision making. Other slicing methods determine the slice locations as inferred from another optimization problem, such as [5], which infers slice locations based on FEC decisions for each MB.

Furthermore, due to the low signal-to-noise ratios of wireless channels, forward error correction (FEC) is generally used. If the application layer can control the FEC rates, it has been shown that protecting certain parts of the bitstream more strongly than others yields superior results in video communications compared to equal protection. This is known as unequal error protection (UEP). In this paper we propose a novel slicing method combined with spatial UEP for video transport over bit error channels in a low delay communication scenario.

2. COMMUNICATION MODEL

We first review the video transport model and protocols used in this work. We then introduce a zero-error propagation method in order to reliably estimate the decoder distortion at the encoder side, as needed for rate-distortion optimization.

2.1. Video Transport Model

The transport model assumes that erroneous packets and fragments are delivered by the network and link layers, respectively. The link layer can apply FEC, possibly at a different rate for each IP packet, as requested by the application. If the link layer allows application layer FEC control, the proposed algorithm can optimize the FEC rate, too. Otherwise (i.e., with link layer controlled FEC) our method only optimizes the slicing.

The application may generate multiple packets for each frame such that each packet contains the slices that will be assigned the same FEC rate. The slices in a packet do not have to be consecutive, but are placed according to their desired FEC rates. This is valid, since slice headers contain the ordering information. Each packet is sent through the link layer with the FEC code rate that is selected by the application layer. Overhead bits are added to each packet to represent the RTP/UDP/IP protocol header overhead. The decoder only decodes the slices whose fragments are all received error free. When the decoder detects an error, it recovers by seeking to the next resynchronization marker.

Within this model there are three optimization problems: (i) determining the number of slices, their locations and sizes, (ii) determining the number of packets, by optimizing the protocol header trade-off, and (iii) determining the unequal error protection rate for each slice.
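The packetization rule of the transport model (one packet per FEC rate, plus a fixed header overhead) can be sketched as follows. This is only an illustration: the function name, the tuple layout and the 40-bit header value are assumptions made for the example, not values from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def packetize_by_fec_rate(slices, header_bits):
    """Group slices that share a FEC code rate into one packet each.

    slices: list of (slice_index, payload_bits, code_rate) tuples.
    Returns {code_rate: (slice_indices, packet_bits_before_fec)}.
    """
    groups = defaultdict(list)
    for index, bits, rate in slices:
        groups[rate].append((index, bits))

    packets = {}
    for rate, members in groups.items():
        indices = [index for index, _ in members]
        # One protocol header is paid per packet, not per slice.
        total_bits = header_bits + sum(bits for _, bits in members)
        packets[rate] = (indices, total_bits)
    return packets


# Hypothetical frame with three slices; slices 0 and 2 share a code rate,
# so they travel in the same packet even though they are not consecutive.
slices = [
    (0, 800, Fraction(1, 2)),
    (1, 400, Fraction(3, 4)),
    (2, 600, Fraction(1, 2)),
]
packets = packetize_by_fec_rate(slices, header_bits=40)
```

Grouping by rate is what creates the header trade-off mentioned in problem (ii): every additional distinct FEC rate in a frame costs one more packet header.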

ICIP 2006

The proposed objective function is

\[ J(S_k) = D(S_k) + \lambda R(S_k) \tag{2} \]

where \(\lambda\) is chosen as proposed in [9], and \(D(S_k)\) and \(R(S_k)\) are the expected distortion and bitrate of the frame under slicing configuration \(S_k\), respectively. Then,

\[ D(S_k) = \sum_{i=1}^{M} D_i(S_k) \tag{3} \]

where \(M\) is the number of MBs in the frame and \(D_i(S_k)\), the expected distortion of the \(i\)th MB, is calculated using Eqn. 1. The bitrate of the frame is calculated by combining the bitrates of the slices while taking into account the protocol overhead and the FEC code rate of each slice in \(S_k\).

Fig. 1. The 5 possible scan line order dependencies (types I to V) used during encoding of a MB, based on resynchronization marker (RM) location.







2.2. Encoder-Side Decoder Distortion Estimation





We use a zero-error propagation method for distortion estimation. The proposed method combines error tracking [6] and NEWPRED [7] by using a reliable feedback channel and a known decoder concealment method. The encoder receives channel feedback about the actual packet losses experienced at the decoder with a delay equal to the round trip time (RTT) of the network. We can then simulate the actual decoder at the encoder side, using the known error concealment method to conceal the effect of the lost slices with a delay of RTT seconds. Zero-error propagation is therefore achieved by forcing the encoder, while encoding the current frame, to select a reference frame from those that have already been processed with feedback information (that is, frames RTT seconds in the past). Once zero-error propagation is achieved, the encoder estimates the decoder distortion of the \(i\)th MB by

\[ D_i = p_i D_i^{c} + (1 - p_i) D_i^{r} \tag{1} \]

where \(p_i\) is the loss probability, \(D_i^{c}\) is the concealment distortion and \(D_i^{r}\) is the distortion upon successful reception and decoding of MB \(i\). The proposed slicing algorithm does not depend on the distortion estimation method, and other methods can also be used.

In the bitrate calculation, \(r_j\) is the bitrate of the \(j\)th slice and \(b_i\) is the bitrate of MB \(i\). The loss probability \(p_i\) of the \(i\)th MB follows from the slice it belongs to: a MB is lost if any of the fragments of its slice is in error. Let \(s_k(i)\) denote the index of the slice in which MB \(i\) resides when configuration \(S_k\) is used. Then \(p_i\) is given by

\[ p_i = 1 - (1 - p_f)^{\lceil r_{s_k(i)} / L \rceil} \tag{7} \]

where \(p_f\) is the loss probability of a fragment and \(L\) is the length of a fragment in bits. \(L\) is the actual load of the fragment and does not include the parity bits. Due to FEC, \(p_f\) and \(L\) depend on the FEC amount at the link layer. We assume that these quantities are fixed for a frame duration, which is 100 msec for 10 Hz video.

There are 2 major problems: (i) the loss probability of a MB depends on MBs that have not been encoded yet, hence it is impossible to estimate without encoding the following MBs; (ii) due to predictive coding, the coding decisions, and hence the bitrate, of the \(i\)th MB depend on the encoding of all of the MBs that came before it in scan line order. These problems can be solved by encoding the whole frame with every candidate configuration, but this is clearly too costly to be practical. To overcome this problem, we assume that the dependency between MBs becomes negligible beyond the immediate neighborhood. This neighborhood is shown in Fig. 1 based on slice boundaries. We classify the dependency types in 3 groups: no dependency (type I), single dependency (type II), and multiple dependencies (types III, IV and V).

We then propose the following rate and distortion estimation method. Before encoding a frame, encode 3 frames with special slicing configurations such that the dependencies for types I, II and V presented in Fig. 1 are estimated. The three resulting frames are then used to estimate the distortion and bitrate of each MB for any configuration (Eqn. 8).
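The loss and distortion relations described in the text can be sketched numerically as follows. This is a minimal illustration under stated assumptions: bit errors are taken to be independent, FEC correction is ignored when deriving the fragment loss probability, and the number of fragments of a slice is assumed to be the slice size divided by the fragment payload, rounded up. All function names are hypothetical.

```python
import math


def fragment_loss_probability(bit_error_rate, fragment_bits):
    """Probability that at least one bit of a fragment is in error.

    Assumes independent bit errors and no FEC correction.
    """
    return 1.0 - (1.0 - bit_error_rate) ** fragment_bits


def slice_loss_probability(slice_bits, fragment_payload_bits, p_fragment):
    """A slice (and every MB in it) is lost if any of its fragments is in error."""
    n_fragments = math.ceil(slice_bits / fragment_payload_bits)
    return 1.0 - (1.0 - p_fragment) ** n_fragments


def expected_mb_distortion(p_loss, d_conceal, d_received):
    """Loss-weighted mix of concealment distortion and reception distortion."""
    return p_loss * d_conceal + (1.0 - p_loss) * d_received


# Hypothetical numbers: a longer slice spans more fragments, so its MBs
# are more likely to be lost and their expected distortion is higher.
p_short = slice_loss_probability(500, 500, 0.1)
p_long = slice_loss_probability(2000, 500, 0.1)
d_short = expected_mb_distortion(p_short, d_conceal=100.0, d_received=20.0)
d_long = expected_mb_distortion(p_long, d_conceal=100.0, d_received=20.0)
```

The example makes the slicing trade-off visible: shorter slices lower the loss probability of each MB, while the surrounding text explains that they also cost bitrate through headers and lost prediction.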






3. RATE DISTORTION OPTIMAL SLICING OVER BIT ERROR CHANNELS






In this section, we address the problem of determining the location and number of resynchronization markers in a frame of video in a rate-distortion optimal manner under the proposed transport model. Let \(S_k = \{l_1, l_2, \ldots, l_{N_k}\}\) denote the \(k\)th frame slicing configuration, where \(l_j\) denotes the length of the \(j\)th slice in MBs and \(N_k\) denotes the number of slices in the \(k\)th configuration, such that \(l_j \geq 1\) and \(\sum_{j=1}^{N_k} l_j = M\), with \(M\) the number of MBs in the frame. Also let \(m_j\) denote the index of the first MB in the \(j\)th slice, such that \(m_j = 1 + \sum_{t<j} l_t\). There are \(2^{M-1}\) valid slice configurations. A thorough slicing algorithm should compute an objective function \(J(S_k)\) for all possible slice configurations and select the configuration that minimizes it. Clearly, the number of valid slice configurations is too large for an exhaustive search that computes and compares all of them. We first propose a method to efficiently estimate the objective function for a given configuration, then we introduce the dynamic programming method that performs this search efficiently and in real time.
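To make the size of the search space concrete, the sketch below enumerates every slicing configuration of a small frame. It assumes, as the text suggests, that a slice is a contiguous run of MBs in scan order, so a configuration is fixed by choosing which of the \(M-1\) inner MB boundaries carry a resynchronization marker. The function name is hypothetical.

```python
from itertools import combinations


def slice_configurations(num_mbs):
    """Yield every slicing configuration as a tuple of slice lengths (in MBs)."""
    for n_cuts in range(num_mbs):
        # Choose which inner boundaries start a new slice.
        for cuts in combinations(range(1, num_mbs), n_cuts):
            bounds = (0, *cuts, num_mbs)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


configs = list(slice_configurations(4))
# A 4-MB frame has 2**3 = 8 configurations, from (4,) to (1, 1, 1, 1).
qcif_count = 2 ** (99 - 1)  # count for 99 MBs, computed without enumeration
```

For a 99-MB QCIF frame the count is \(2^{98}\), roughly \(3.2 \times 10^{29}\), which is why enumeration is only usable here for toy sizes.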


3.1. Estimating the Objective Function for a Configuration








We use the Lagrangian formulation of RD optimization [8], in which the distortion is minimized subject to a bitrate constraint. Hence we propose a Lagrangian objective function (Eqn. 2).
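A Lagrangian comparison of candidate configurations can be sketched as below. The candidate names and their distortion and bitrate values are invented for illustration, and the multiplier values are arbitrary; the paper selects the multiplier as proposed in [9].

```python
def lagrangian_cost(distortion, rate_bits, lam):
    """Lagrangian RD cost: expected distortion plus lam times the bitrate."""
    return distortion + lam * rate_bits


def best_configuration(candidates, lam):
    """candidates: dict mapping a name to (expected distortion, bitrate in bits)."""
    return min(candidates, key=lambda name: lagrangian_cost(*candidates[name], lam))


# Hypothetical frame-level numbers: more slices lower the expected
# distortion but cost more bits (headers, lost prediction).
candidates = {
    "1 slice": (900.0, 10000),
    "3 slices": (500.0, 10600),
    "9 slices": (420.0, 12400),
}
choice = best_configuration(candidates, lam=0.5)
```

With these numbers the middle option wins at `lam=0.5`; a multiplier of 0 favors the most slices and a large multiplier favors a single slice, which is the trade-off the objective function is meant to balance.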


4. Prune nodes: For every spawned node, search the tree for an unbranched and unpruned node with an equal last entry. If the objective function of the spawned node is lower than that of the matching node, the matching node is pruned; otherwise the spawned node is pruned.

Obviously, for initialization, we insert the top node and skip Steps (1) and (2). The number of candidates in Step (3) is the total number of code rates allowed at the link layer. If the link layer does not let the application layer control the code rate, there is only a single code rate and Step (3) is skipped.

Pruning in Step (4) is based on the independence and additivity properties of the objective function. Consider comparing two nodes with the same last entry, \(S_a = \{a_1, \ldots, a_p, x\}\) and \(S_b = \{b_1, \ldots, b_q, x\}\). The optimal sub-partitioning of the last entry \(x\), denoted \(X^*\), is the same for both \(S_a\) and \(S_b\). This is true because the best configuration that can be reached starting from \(S_a\) is \(\{a_1, \ldots, a_p, X^*\}\), and for \(S_b\) it is \(\{b_1, \ldots, b_q, X^*\}\). Therefore, if \(J(S_a) < J(S_b)\), then because of the independence and additivity properties it must be true that \(J(\{a_1, \ldots, a_p, X^*\}) < J(\{b_1, \ldots, b_q, X^*\})\) (or vice versa). Sub-partitioning the last entry is equivalent to iteratively spawning nodes from the configuration (i.e., creating a sub-tree that uses the node as the parent node). Therefore, there is no point in keeping the worse performer and its children nodes in the tree. Figure 2 shows an exemplary tree.

The exhaustive method evaluates the objective function for every valid configuration; the proposed method makes considerably fewer evaluations. Combined with the simplified objective function estimation method from the previous section, we perform this optimization in real time.
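The independence and additivity properties also admit a textbook prefix dynamic program, sketched below. This is not the authors' tree-building procedure; it is a standard formulation that relies on the same two properties, shown only to illustrate why pruning by "same remaining MBs" is safe. The per-slice cost function is a toy stand-in (a fixed header cost plus a penalty that grows with slice length), and all names are hypothetical.

```python
def optimal_slicing(num_mbs, slice_cost):
    """Return (minimum total cost, slice lengths) for MBs 0..num_mbs-1.

    slice_cost(start, end) is the additive cost of one slice covering
    MBs start..end-1, assumed independent of all other slices.
    """
    # best[e]: minimum cost of slicing the first e MBs.
    best = [0.0] + [float("inf")] * num_mbs
    last_start = [0] * (num_mbs + 1)
    for end in range(1, num_mbs + 1):
        for start in range(end):
            cost = best[start] + slice_cost(start, end)
            if cost < best[end]:
                best[end], last_start[end] = cost, start

    # Walk back through the stored slice starts to recover the lengths.
    lengths, end = [], num_mbs
    while end > 0:
        lengths.append(end - last_start[end])
        end = last_start[end]
    return best[num_mbs], lengths[::-1]


def toy_slice_cost(start, end):
    """Toy cost: 10 for the slice header plus a loss penalty quadratic in length."""
    return 10 + (end - start) ** 2


total, lengths = optimal_slicing(6, toy_slice_cost)
```

In this sketch the cost function is evaluated once per (start, end) pair, that is \(M(M+1)/2\) times for \(M\) MBs, instead of once per configuration. That figure describes this sketch only and is not a claim about the evaluation count of the paper's algorithm.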

Fig. 2. A fictitious slice configuration tree and pruning with 5 levels. The nodes over gray background are pruned, hence their objective functions are not calculated.

Hence, the three encoded frames serve as a look-up table for the rate and distortion of MBs. We test the accuracy of the proposed estimation technique of Eqn. 8 by selecting 300 slices of random length and random location, and measuring the average error of the distortion estimate and of the bitrate estimate. The resulting accuracy indicates that our assumptions are valid.

3.2. Search via Dynamic Programming

Exhaustive search is computationally infeasible because, even when there is no channel coding optimization, it requires determining and comparing the cost functions of \(2^{M-1}\) configurations, where \(M\) is the number of MBs in the frame; this is \(2^{98}\) for a QCIF frame. However, we make the following observations: (i) the slices are decodable independently of each other, and (ii) the objective function grows additively with each additional slice (i.e., with each entry in a configuration). We call these the independence and additivity properties of the objective function. Based on these properties we propose a method that uses a tree of slice configurations. The top node of the tree is the single-slice configuration; it is at the 1st level, meaning that it is composed of only 1 slice. The 2nd level is composed of 2 slices, the 3rd level of 3, and so on. A node at a given level branches to new nodes at the next level by splitting its last entry into two entries. All new nodes are added to the tree as the children of the node that spawned them. The following dynamic tree building algorithm is applied iteratively until termination.

4. RESULTS

We used the H.264 baseline profile for our tests, although the proposed system can be applied to any other standard. The round trip time is set to 300 milliseconds. We used 100 frames of the standard sequences Foreman, Carphone and News (high, intermediate and low motion complexity, respectively) at QCIF resolution and 10 Hz frame rate. The link layer is simulated using a heavily modified version of the software provided by [10]. We implemented Reed-Solomon channel coding at 3 rates, and the uncoded case is included as no FEC. The channel bitrate is set to 128 kbps and there are no retransmissions. We study the instantaneous uniform bit error rate regime over a range of BERs. We assume that Robust Header Compression is available and set the protocol header overhead accordingly. Experiments are repeated 20 times and the average sequence pSNR is calculated.


4.1. Results for Link Layer Controlled FEC

In this set of experiments we fixed the link layer FEC rate and compared the proposed adaptive slicing algorithm of Section 3 to the fixed MB per slice method. Figure 3 shows the resulting average pSNR under various channel bit error rates. The figure compares the proposed system with various fixed MB per slice options, in particular 5, 11, 33 and 99 MBs per slice. The proposed algorithm operates close to the fixed MB per slice methods at lower error rates. At higher loss rates the proposed algorithm performs better than the best of the fixed MB per slice options for News and Carphone. The performance loss at lower error rates can be attributed to the fact that the proposed algorithm does not evaluate the slice performance exactly but only uses an estimate; at low loss rates this estimation inaccuracy causes a slightly lower performance.

1. Find the unpruned and unbranched node that has the biggest last entry. If no such node exists, terminate.

2. Spawn the children of this node at the next level by splitting its last entry into two entries.

3. For all spawned nodes, determine the optimal FEC rate for the newly generated slices (only the last 2 slices).
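Step 3 can be illustrated with a per-slice rate choice that minimizes a Lagrangian cost. The sketch below is an assumption-laden illustration rather than the paper's Eqns. 9 and 10: the loss probability at each code rate is taken as given, the transmitted size is assumed to be the slice size divided by the code rate, and the code rates and numbers are invented.

```python
def best_code_rate(slice_bits, d_conceal, d_received, lam, loss_prob_for_rate):
    """Pick the FEC code rate minimizing expected distortion + lam * sent bits.

    loss_prob_for_rate: dict mapping code rate -> slice loss probability
    at that rate (assumed to be supplied by a channel model).
    """
    def cost(rate):
        p = loss_prob_for_rate[rate]
        distortion = p * d_conceal + (1.0 - p) * d_received
        # A code rate below 1 inflates the number of transmitted bits.
        return distortion + lam * slice_bits / rate

    return min(loss_prob_for_rate, key=cost)


# Hypothetical loss probabilities: stronger protection (lower code rate)
# lowers the loss probability but sends more bits.
loss_table = {1.0: 0.30, 0.75: 0.10, 0.5: 0.02}
rate = best_code_rate(1000, d_conceal=2000.0, d_received=100.0,
                      lam=0.5, loss_prob_for_rate=loss_table)
```

Because only the last two slices of a spawned node are new, a choice of this kind needs to be made for those two slices only; the rates of the earlier slices are inherited from the parent node.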




[Figure 3: three panels (News, Carphone, Foreman) plotting average pSNR against BER for the proposed method and for 5, 11, 33 and 99 MBs per slice.]

Fig. 3. Comparison of proposed adaptive slicing and fixed MB per slice method. We see that the proposed method successfully works at the convex hull of the fixed MB per slice configurations consistently under various BERs.

[Figure 4: three panels (News, Carphone, Foreman) plotting average pSNR against BER for the proposed method and for LLC: 1/4, LLC: 1/2 and LLC: 3/4.]

Fig. 4. Comparison of the link layer controlled (LLC) channel coding and the proposed UEP method.

4.2. Results for Application Layer Controlled FEC

In this set of experiments we compare link layer controlled FEC with application layer controlled FEC using the proposed UEP optimization method. As stated earlier, we do not perform FEC at the application layer; we only tell the link layer at what rate it should apply FEC. We used four FEC rates. The results are shown in Fig. 4. We see that, especially for the News sequence at high loss rates, the proposed system is up to 1 dB better. For the Carphone sequence the maximum difference is 0.5 dB, with a constant 0.3 dB for higher loss rates. For the Foreman sequence the gain peaks at 0.4 dB and stays constant at 0.2 dB for higher BER. These results agree with the expectations, that is, the proposed UEP method operates effectively compared to equal protection when the dynamics of a video differ locally within a frame. The high gain for News supports this. The low motion parts of the video are effectively less protected and the bits conserved by doing so are utilized for higher impact regions. On the other hand, the Foreman sequence presents similar characteristics within each frame: there is constant camera and actor motion, therefore UEP does not benefit much.

Overall, controllable code rates effectively enrich the encoder's decision space by including the FEC rates in the optimization framework. The encoder makes better decisions for the lossy channel when this is combined with the known channel model.

5. CONCLUSIONS

In this paper we propose a method for slicing over bit error channels. The proposed dynamic programming and performance estimation techniques significantly reduce the search time to practical ranges. The proposed method can be used for unequal error protection over FEC capable channels. Although the presented experiments used fixed channel models, the proposed algorithms rely only on the current BER of the channel and are therefore suitable for dynamic channels. Our experiments show that, with the proposed algorithm, we have about 0.7 dB gain from adaptive slicing for link layer controlled FEC and 1.0 dB gain from unequal error protection for application layer controllable FEC.

6. REFERENCES

[1] Lars-Ake Larzon, Mikael Degermark, and Stephen Pink, "UDP Lite for real time multimedia applications," HP Labs Technical Report, HPL-IRI-1999-001, 1999.

[2] D. Marpe, G. Blattermann, G. Heising, and T. Wiegand, "Video compression using context-based adaptive arithmetic coding," International Conference on Image Processing, Oct. 2001.

[3] Iain E. G. Richardson and Martyn J. Riley, "Varying slice size to improve error tolerance of MPEG video," Proceedings of SPIE, Vol. 2668, pp. 365-371, 1996.

[4] Guy Cote, Shahram Shirani, and Faouzi Kossentini, "Optimal mode selection and synchronization for robust video communications over error-prone networks," IEEE JSAC, June 2000.

[5] Enrico Masala, Hua Yang, Kenneth Rose, and Juan Carlos De Martin, "Rate-distortion optimized slicing, packetization and coding for error resilient video transmission," in Proc. IEEE 2004 Data Compression Conference, 2004, pp. 182-191.

[6] E. Steinbach, N. Farber, and B. Girod, "Standard compatible extension of H.263 for robust video transmission in mobile environments," IEEE CSVT, Dec. 1997.

[7] ITU-T SG15/WP15/1 LBC-95-033 Telenor RD, "An error resilience method based on backchannel signalling and FEC," 1996.

[8] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Communications Magazine, November 1998.

[9] Thomas Wiegand, Heiko Schwarz, Anthony Joch, Faouzi Kossentini, and Gary J. Sullivan, "Rate-constrained coder control and comparison of video coding standards," IEEE CSVT, July 2003.

[10] G. Roth, R. Sjöberg, G. Liebl, T. Stockhammer, V. Varsa, and M. Karczewicz, "Common test conditions for RTP/IP over 3GPP/3GPP2," ITU-T SG16 Doc. VCEG-M77, 2001.
