Scalable Internet Video Streaming With Unequal

Packet Video Workshop 99, April 1999, New York

Scalable Internet Video Streaming With Unequal Error Protection

K. Stuhlmüller, M. Link and B. Girod
Telecommunications Laboratory, University Erlangen-Nuremberg, Erlangen, Germany
E-mail: fstuhl,mlink,[email protected]

U. Horn
Ericsson Eurolab, Research Department, Herzogenrath, Germany
E-mail: [email protected]

Abstract

We present a transmission scheme for Internet video streaming that provides an acceptable video quality over a wide range of connection qualities. The proposed system consists of a scalable video coder which uses a fully standard compatible H.263 coder in its base layer. The scalable video coder is combined with unequal error protection using Reed-Solomon codes applied across packets. A theoretical framework which relates packet loss of an unequally protected layered video stream to picture quality at the decoder is presented. Simulation results show that, with our approach, the picture quality of a streamed video degrades gracefully as the packet loss probability of an Internet connection increases.

1 Introduction

The demand for Internet video streaming services has grown rapidly over the past few years. Since the Internet supports real-time services only in a best-effort manner, it remains a challenging task to design high-quality video streaming systems that can cope with the Internet's unpredictable and varying network conditions. The quality degradation of an Internet video transmission compared to a perfect transmission is mainly determined by the packet loss behavior observed at the video decoder. Congested routers are one typical source of packet losses. Because streaming imposes a real-time constraint on the transmission system, packets arriving too late at the receiver are another source of packet loss. Internet real-time transmission at high packet rates is often characterized by bursty packet loss, i.e., if one packet is lost it is very likely that consecutive packets will also be lost. For low to medium packet rates, independent packet loss has been reported [1].

In this paper we use the well-known two-state Markov model to approximate the bursty behavior [2]. This simple and tractable model can be fully described by two parameters, $P_B$ for the average packet loss probability and $L_B$ for the average burst length. During our transmission experiments we measured values of $P_B$ between 0 and 0.7 and values of $L_B$ between 2 and 50 packets. In particular, low values of $P_B$ occurred along with high values of $L_B$ and vice versa. For a constant channel data rate we observed that packet rate could be traded against packet size as long as a rate of roughly 400 packets/s was not exceeded. At higher packet rates packet losses increased quite rapidly. While we use a burst packet loss model in the following, the presented transmission scheme is not limited to burst channels. On the contrary, it is even more efficient for independent packet losses.
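To make the role of the two channel parameters concrete, the following minimal sketch simulates a two-state Markov loss channel. It assumes the common Gilbert form of the model, in which every packet sent in the "bad" state is lost and none is lost in the "good" state; the transition probabilities are not given in this section, so that mapping, as well as the function names `gilbert_params`, `simulate_losses` and `mean_burst_length`, are illustrative assumptions rather than the authors' implementation.

```python
import random


def gilbert_params(p_b, l_b):
    """Map (average loss probability P_B, average burst length L_B) to
    transition probabilities of a two-state chain.

    Assumption: all packets in the bad state are lost, so
    L_B = 1/q and P_B = p / (p + q). Requires 0 <= p_b < 1 and l_b >= 1.
    """
    q = 1.0 / l_b                  # bad -> good
    p = q * p_b / (1.0 - p_b)      # good -> bad
    return p, q


def simulate_losses(num_packets, p_b, l_b, seed=0):
    """Return a list of booleans, True meaning the packet was lost."""
    rng = random.Random(seed)
    p, q = gilbert_params(p_b, l_b)
    bad = rng.random() < p_b       # start in the stationary distribution
    losses = []
    for _ in range(num_packets):
        losses.append(bad)
        if bad:
            bad = rng.random() >= q    # stay bad with probability 1 - q
        else:
            bad = rng.random() < p     # turn bad with probability p
    return losses


def mean_burst_length(losses):
    """Average length of runs of consecutive lost packets."""
    bursts, current = [], 0
    for lost in losses:
        if lost:
            current += 1
        elif current:
            bursts.append(current)
            current = 0
    if current:
        bursts.append(current)
    return sum(bursts) / len(bursts) if bursts else 0.0


trace = simulate_losses(100_000, p_b=0.15, l_b=5)
print("loss rate:", sum(trace) / len(trace))
print("mean burst length:", mean_burst_length(trace))
```

With a long enough trace, the empirical loss rate and mean burst length should settle near the requested $P_B$ and $L_B$, which is a quick sanity check before using such a channel in transmission experiments.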
Although appropriate error concealment techniques at the decoder can reduce the negative impact of packet loss, an error, once it becomes visible, tends to propagate in time because of the motion-compensated predictive coding employed in all existing video coding standards [3]. Interframe error propagation can be reduced at the cost of an increased overall bit rate if Intra-coded information is sent more frequently [4]. More sophisticated error-resilient coding methods can be used if a feedback channel is available [5, 6, 7]. The robustness against packet losses, especially in cases where a feedback channel is not available, can be increased if the encoder protects the packets with an appropriate forward error correction (FEC) scheme. FEC-based protection of packet video data has previously been proposed in the PET system [8, 9] for MPEG-1 Internet video transmission and also for lost cell recovery in ATM-based video transmission [10, 11, 12]. The IETF currently discusses payload formats for generic FEC [13] and for Reed-Solomon FEC [14]. Both approaches propose to send media and FEC packets as separate streams, each using its own sequence number space, to allow backward compatibility with receiver applications which do not support any of the proposed FEC schemes.

In this article we outline a video transmission scheme which combines a scalable video coder with unequal error protection (UEP) across packets. The scalable video coder produces a bitstream that is decodable at different bit rates. It allows computation-time and memory-limited decoding on less powerful hardware platforms [15], and it can substantially improve the quality and acceptance of Internet video services [16]. Scalable video coding has been proposed previously to address the problem of transmitting video over an unknown channel. Besides transmission over heterogeneous networks like the Internet [17, 18, 19], digital TV broadcasting is another example where scalability can achieve graceful degradation of the picture quality depending on the available bandwidth for error-free data transmission [20]. A more detailed description of the video transmission scheme outlined here is given in [2].

[Figure 1: GOP diagram. Enhancement layer (CIF, 30 fps): frames I P P P P P P P. Base layer (QCIF, 15 fps): frames I P P P. Arrows labeled "Spatial prediction".]

Figure 1: The Group Of Pictures (GOP) structure of the scalable coder used throughout this article. It contains a base layer at QCIF resolution complemented by a CIF enhancement layer. This GOP comprises eight frames, i.e., in each layer an Intra-frame is transmitted every 266 ms. Spatial and temporal redundancies are exploited by applying spatial and/or temporal prediction as denoted by vertical and horizontal arrows.

2 Scalable Video Coding with Unequal Error Protection

For a given Internet connection, a video transmission scheme using FEC could be designed to guarantee a certain maximum block error rate (BER), which in this case would be the probability of a visible error due to residual packet loss after channel decoding. In order to make efficient use of the available resources, a scheme like this must assume stable channel conditions that, in a multicast scenario, are identical for all receivers, as well as identical resources to perform the channel and video decoding of the received packets. Unfortunately, none of these assumptions holds for the Internet: channel conditions are both time- and user-dependent, and users have different network access and computational resources. Therefore scalable video schemes were proposed [15, 16] which produce bitstreams decodable at different bit rates, requiring different computational power.

The Internet video transmission scheme investigated in this paper is based on a two-layer scalable video coder which encodes the spatio-temporal resolution pyramid shown in Fig. 1. The QCIF (176 x 144 pels) base layer (BL) has a temporal resolution of 15 frames per second (fps), whereas the CIF (352 x 288 pels) enhancement layer (EL) is encoded at 30 fps. Within each layer an Intra-frame is sent every 266 ms. The base layer is encoded by a fully standard-compatible H.263 coder. The enhancement layer is encoded by a motion-compensated hybrid coder where the usual DCT has been replaced by lattice vector quantization to give an improved coding efficiency. A more detailed description of this approach can be found in [21, 2]. Note that the base layer bitstream can be used to display the video sequence at reduced spatial and temporal resolution. When the enhancement layer bitstream and sufficient computing power are additionally available, the sequence can be displayed at full resolution.

The base layer shall have stronger protection against packet loss so that it is decodable even if no enhancement layer data are available. Therefore, we combine the scalable video coder with an unequal error protection (UEP) scheme based on Reed-Solomon codes as outlined in Fig. 2.

[Figure 2: block diagram. Video -> Scalable Source Coder (layers E and B) -> Reed-Solomon Encoder (unequal error protection) -> Internet Channel Model -> Reed-Solomon Decoder (erasure decoding) -> Source Decoder -> Display.]

Figure 2: Outline of the video transmission scheme. The scalable source coder encodes the input video in two layers, a base layer B carrying the most important and an enhancement layer E carrying the less important image information. Before the layers are transmitted they are packetized and protected against packet loss according to their importance by the FEC method described in section 2.2. Packet losses are simulated by a Markov model. At the receiver an erasure decoder reconstructs for each layer as many packets as possible. The source decoder reacts to missing packets with appropriate concealment techniques.

[Figure 3: BOP layout diagram. Labels: info field (k_I), layer 0 (k_0, S_0), layer 1 (k_1, S_1), number of packets n, packet size, channel coding redundancy.]

Figure 3: Block of packets (BOP) layout for transmitting a layered video stream consisting of two layers. The video stream belonging to layer $l$ is distributed across $k_l$ packets and occupies $S_l$ bytes in each packet. The shaded areas are filled with channel coding redundancy by computing Reed-Solomon codewords column-wise.

2.1 Packetization of Layered Video Streams with Unequal Error Protection

As mentioned in the introduction, the IETF currently discusses payload formats for generic FEC [13] and for Reed-Solomon FEC [14]. Both approaches propose to transmit media packets and FEC packets in separate streams. However, for the transmission scheme investigated in this paper it is more advantageous to transmit all information in one stream to avoid a situation where the enhancement layer can be reconstructed while the base layer cannot. We multiplex the bit streams of both layers into one block of packets (BOP) as shown in Fig. 3. The bits belonging to layer $l$ are distributed over a number $k_l$ of different packets. The remaining packets in the BOP are filled with channel coding redundancy which is computed based on Reed-Solomon codes as described in Section 2.2. The number of packets $n$ in the BOP is fixed but the packet size may vary. By allowing one group of pictures (GOP) to be encoded with more than one BOP it is possible to limit the maximal packet size to 1024 bytes. In the experiments presented in this paper it was always possible to pack one complete GOP into one BOP.

The dimensions of the fields inside each BOP, $(k_l, S_l)$, are transmitted as side information in the info field in each BOP. We can transmit this info field efficiently by protecting it at least as strongly as the base layer. The BOP packets themselves are transmitted as RTP [22] packets. For recovering the information at the decoder we need to know the BOP number, which is transmitted using 1 byte in the RTP payload header. The position of the packet in the BOP can be derived from the RTP sequence number.
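The BOP dimensioning described above can be sketched in a few lines. The example assumes that layer $l$ with $B_l$ bytes is spread evenly over its $k_l$ packets, so that $S_l = \lceil B_l / k_l \rceil$, and that a GOP which does not fit is split evenly over several BOPs. The even split, the byte counts in the usage lines and the helper names `bop_widths` and `bops_needed` are hypothetical and only illustrate the layout of Fig. 3.

```python
import math


def bop_widths(layer_bytes, ks, info_bytes=0, max_packet_size=1024):
    """Per-layer field widths S_l (bytes per packet) for one BOP.

    layer_bytes[l] is the payload of layer l, ks[l] the number of packets
    carrying it. Returns None if a packet would exceed max_packet_size.
    """
    widths = [math.ceil(b / k) for b, k in zip(layer_bytes, ks)]
    if info_bytes + sum(widths) > max_packet_size:
        return None
    return widths


def bops_needed(layer_bytes, ks, info_bytes=0, max_packet_size=1024):
    """Smallest number of BOPs such that every packet stays within the limit.

    Assumption: each layer's bytes are divided evenly among the BOPs.
    """
    if info_bytes + len(ks) > max_packet_size:
        raise ValueError("info field leaves no room for layer data")
    count = 1
    while True:
        share = [math.ceil(b / count) for b in layer_bytes]
        if bop_widths(share, ks, info_bytes, max_packet_size) is not None:
            return count
        count += 1


# Hypothetical GOP: 7000 bytes of layer 0 on k_0 = 96 packets,
# 2600 bytes of layer 1 on k_1 = 65 packets.
print(bop_widths([7000, 2600], [96, 65]))
print(bops_needed([200000, 50000], [96, 65]))
```

Each packet then carries the info field followed by $S_0$ and $S_1$ bytes, and the field widths are exactly the side information that has to be signalled per BOP.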


2.2 Packet Loss Protection Based on Reed-Solomon Codes

The idea of forward error correction (FEC) across packets is to transmit additional redundant packets which can be used at the receiver to reconstruct lost packets. To compensate for the increased channel data rate introduced by FEC, the video data rate must be decreased accordingly, which results in an initially decreased picture quality at the sender. However, because lost packets can be reconstructed at the receiver, the video decoder experiences a lower effective packet loss rate than without FEC and therefore the overall picture quality increases.

Our FEC scheme uses Reed-Solomon (RS) codes across packets. RS codes are perfectly suited for error protection against packet loss because they are the only known non-trivial maximum distance separable codes, i.e., there exist no other codes that can reconstruct erased symbols from a smaller fraction of received code symbols [23]. An RS($n$, $k$) code of length $n$ and dimension $k$ encodes $k$ information symbols containing $m$ bits each into a codeword of $n$ such symbols. The codeword length $n$ is restricted by $n \le 2^m - 1$. We chose $m = 8$, i.e., our symbols are bytes of 8 bits and $n \le 255$. The RS code is applied across packets, filling the shaded area in Fig. 3 with channel coding redundancy. Unequal error protection is achieved by varying $k$. At the receiver, all the information of a layer can be reconstructed from any subset of at least $k_l$ correctly received packets of this BOP using erasure decoding [24]. This is possible because each sent packet is marked with its packet number, and therefore the exact positions of lost packets in the BOP are known to the receiver. The decoding process can start as soon as any $k$ packets of a BOP have been received. Thus, our FEC scheme requires a receiver buffer which can hold at least $k_0$ packets.
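The erasure-recovery property (any $k$ received symbols of a codeword suffice) can be illustrated with a small polynomial-evaluation code. Note the substitution: to keep the example short it works over the prime field GF(257) with Lagrange interpolation instead of GF($2^8$) as used in this paper, so parity symbols may take the value 256 and do not fit into a byte. It is a sketch of the principle for one column of a BOP, not the authors' codec, and the names `rs_encode` and `rs_decode` are hypothetical.

```python
P = 257  # prime modulus (assumption; the paper uses GF(2^8) with m = 8)


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (position, symbol) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_encode(data, n):
    """Systematic encoding: positions 0..k-1 hold the data symbols,
    positions k..n-1 hold parity symbols."""
    k = len(data)
    if not k <= n <= P:
        raise ValueError("need k <= n <= 257")
    points = list(enumerate(data))
    return list(data) + [_interpolate_at(points, x) for x in range(k, n)]


def rs_decode(received, k):
    """Recover the k data symbols from a dict {position: symbol}
    containing at least k correctly received symbols."""
    if len(received) < k:
        raise ValueError("fewer than k symbols received")
    points = sorted(received.items())[:k]
    return [_interpolate_at(points, x) for x in range(k)]


column = [72, 101, 108, 108, 111]          # one byte from each of k = 5 packets
codeword = rs_encode(column, n=8)          # spread over n = 8 packets
survivors = {i: codeword[i] for i in (1, 3, 5, 6, 7)}  # packets 0, 2, 4 lost
print(rs_decode(survivors, k=5) == column)
```

Because the receiver knows which packet positions are missing, decoding reduces to interpolation through the surviving positions, which is why erasures are cheaper to correct than errors at unknown positions.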

3 Calculation of the Decoded PSNR

For designing the transmission scheme it is advantageous to know the probability that a BOP cannot be reconstructed by the erasure decoder as a function of the channel and the RS code parameters. In the case of an RS($n$, $k$) code, this is the probability that more than $n - k$ packets of a BOP are lost. This probability can be calculated using the block error density function $P(m, n)$, the probability of $m$ lost packets within a block of $n$ packets. It is a simple binomial distribution in the case of a memoryless channel with given packet loss rate. To model the packet losses on the Internet we used a two-state Markov model [2, 21]. Comparisons between measured and calculated block error density functions $P(m, n)$ for various transmission experiments [2, 21] show that the Markov model describes the packet channel's burst loss behavior at high packet rates very well and can be used for determining the overall picture quality at the decoder.

In the following we use the block error density function to estimate the resulting average peak signal-to-noise ratio (PSNR) of a video transmission where a scalable source codec is combined with unequal error protection. Let us assume that layer $l$ is 'switched off' as soon as residual packet loss after RS decoding is detected in this layer. In a scheme with $L$ layers, $l = 0, \ldots, L-1$, each protected with an RS($n$, $k_l$) code, layer $l$ is the highest resolution layer displayed if the number of lost packets $N_L$ in the BOP satisfies $n - k_{l-1} < N_L \le n - k_l$, i.e., some data of layer $l-1$ is missing but not of layer $l$. Using the block error density function $P(m, n)$, the probability $p_l$ that layer $l$ is displayed can be computed as

$$p_l = \sum_{m = n - k_{l-1} + 1}^{n - k_l} P(m, n) \qquad (1)$$

if we set $k_{-1} = n + 1$. By setting $k_L = 0$, (1) yields the probability $p_L$ that even the BL (layer $L-1$) is affected by packet loss. Let $\mathrm{PSNR}_l$ be the average PSNR when only layer $l$ is displayed over the whole time period encoded in one BOP. Then the average PSNR resulting from the switch-off strategy described above is

$$\mathrm{PSNR} = \sum_{l=0}^{L} \mathrm{PSNR}_l \cdot p_l. \qquad (2)$$

$\mathrm{PSNR}_L$ is the PSNR if there are residual packet losses in the BL, which is of course never switched off. $\mathrm{PSNR}_L$ depends very much on the duration of the gap. We approximated it by measuring the PSNR between one decoded QCIF frame and all original frames of the test sequence. Note that all $\mathrm{PSNR}_l$ can be measured at the encoder in advance. If a layer is not completely switched off for the rest of a GOP and error concealment is applied, the average PSNR should be better than the one resulting from (2). This effect can be seen in Fig. 4, where we compare measured and calculated values for PSNR as a function of the average packet loss rate $P_B$. The average PSNR with concealment is around 1 dB better than calculated. The PSNR estimation (2) is used to adjust the UEP parameters by balancing the available redundancy between the layers in order to accomplish the desired graceful degradation when channel quality decreases.

[Figure 4: plot of PSNR (20 to 32 dB) versus packet loss rate (0 to 0.4); curves: calculation, measurement.]

Figure 4: Measured and calculated average PSNR for Silent Voice as a function of the average packet loss rate $P_B$. The average burst length is $L_B = 5$ in this example. The scalable coder with 2 layers combined with UEP is used. For the BL an RS(100, 65) and for the EL an RS(100, 96) code is employed. The measured layer PSNRs are $\mathrm{PSNR}_0 = 30.3$ dB (CIF), $\mathrm{PSNR}_1 = 26.6$ dB (QCIF), and $\mathrm{PSNR}_2 = 21$ dB.

Test sequence: Silent Voice
Experiment        Source rate [kbps]   Error protection                       Channel rate [kbps]
Scalable coder    290                  QCIF: RS(100, 65), CIF: RS(100, 96)    340
H.263 prot.       290                  RS(100, 85)                            340
H.263 unprot.     340                  -                                      340

Test sequence: Foreman
Experiment        Source rate [kbps]   Error protection                       Channel rate [kbps]
Scalable coder    410                  QCIF: RS(100, 62), CIF: RS(100, 96)    480
H.263 prot.       410                  RS(100, 85)                            480
H.263 unprot.     480                  -                                      480

Table 1: This table summarizes the details of the simulation experiments we carried out for two test sequences (Silent Voice and Foreman). In each of the listed experiments we measured the resulting PSNR as a function of the packet loss probability. The corresponding graphs are presented in Fig. 5.
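Equations (1) and (2) are straightforward to evaluate numerically once $P(m, n)$ is available. The sketch below provides the binomial density for the memoryless case named in the text and, as an assumption, a dynamic-programming evaluation of $P(m, n)$ for a two-state Markov channel in which every packet in the bad state is lost (with bad-to-good probability $1/L_B$). The function names are hypothetical; the code parameters and layer PSNRs in the usage lines are those quoted in the caption of Fig. 4.

```python
from math import comb


def binomial_density(n, loss_rate):
    """P(m, n) for a memoryless channel, m = 0..n."""
    return [comb(n, m) * loss_rate**m * (1 - loss_rate)**(n - m)
            for m in range(n + 1)]


def gilbert_density(n, p_b, l_b):
    """P(m, n) for a two-state Markov channel started in its stationary
    distribution. Assumption: packets are lost exactly in the bad state.
    Requires 0 <= p_b < 1 and l_b >= 1."""
    q = 1.0 / l_b                  # bad -> good
    p = q * p_b / (1.0 - p_b)      # good -> bad
    good = [0.0] * (n + 1)         # good[m]: in good state, m losses so far
    bad = [0.0] * (n + 1)          # bad[m]: in bad state, m losses so far
    good[0], bad[1] = 1.0 - p_b, p_b
    for _ in range(n - 1):
        new_good = [0.0] * (n + 1)
        new_bad = [0.0] * (n + 1)
        for m in range(n):
            new_good[m] += good[m] * (1 - p) + bad[m] * q
            new_bad[m + 1] += good[m] * p + bad[m] * (1 - q)
        good, bad = new_good, new_bad
    return [g + b for g, b in zip(good, bad)]


def layer_probabilities(density, ks):
    """Eq. (1). ks = [k_0, ..., k_{L-1}] in decreasing order;
    returns [p_0, ..., p_L] using k_{-1} = n + 1 and k_L = 0."""
    n = len(density) - 1
    bounds = [n + 1] + list(ks) + [0]
    return [sum(density[n - bounds[i] + 1: n - bounds[i + 1] + 1])
            for i in range(len(bounds) - 1)]


def average_psnr(probabilities, psnrs):
    """Eq. (2): probability-weighted sum of the per-layer PSNR values."""
    return sum(p * v for p, v in zip(probabilities, psnrs))


ks = [96, 65]                      # EL: RS(100, 96), BL: RS(100, 65)
psnrs = [30.3, 26.6, 21.0]         # PSNR_0, PSNR_1, PSNR_2 in dB (Fig. 4)
for p_b in (0.05, 0.15, 0.30):
    probs = layer_probabilities(gilbert_density(100, p_b, 5), ks)
    print(p_b, round(average_psnr(probs, psnrs), 2))
```

Sweeping the $k_l$ values in such a calculation is one way to explore how a fixed redundancy budget could be balanced between the layers before running full transmission experiments.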

4 Simulation Results

In this section we present simulation results obtained with the transmission scheme presented in section 2. We compare the performance of the proposed scheme with a single-layer H.263 coder which is combined with equal error protection (EEP) and also with an H.263 coder which uses no error protection at all. The input sequences for the experiments in this section are the CIF test sequences Silent Voice and Foreman encoded at 30 fps. We allow 15 % redundancy for error protection. Table 1 gives an overview of the bit rates and the error protection parameters we used in our experiments.

Fig. 5 shows the measured average PSNR over 5 channel realizations as a function of the packet loss rate $P_B$. It can be seen that the proposed transmission scheme, which combines a scalable coder with unequal error protection, achieves graceful degradation. In the error-free case unprotected H.263 coding is of course the best choice since more bits are spent for video source coding. However, the quality drops rapidly, resulting in very annoying distortions even if only a few packets are lost. With the protected single-layer H.263 coder, the picture quality breaks down rapidly once a certain maximum packet loss rate ($P_B \approx 0.2$) is exceeded. However, as discussed below, the subjective quality has already degraded significantly at this point. In contrast, the scalable scheme still receives the base layer almost error-free if parts of the enhancement layer are lost. This results in a reduced spatial and temporal resolution, but the annoying effects observed in the single-layer transmissions are significantly reduced.
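The rates in Table 1 can be cross-checked with a little arithmetic. The sketch below assumes the BOP layout of Fig. 3, neglects the info field and packet headers, and treats the field widths $S_0$ and $S_1$ as free parameters; it does not claim that the authors chose the widths this way, and the helper names are hypothetical.

```python
def code_rate(n, k):
    """Fraction of an RS(n, k) codeword that carries information."""
    return k / n


def uep_overall_rate(n, ks, widths):
    """Fraction of BOP payload bytes carrying source data:
    sum_l k_l * S_l divided by n * sum_l S_l (info field neglected)."""
    return sum(k * s for k, s in zip(ks, widths)) / (n * sum(widths))


def width_ratio_for_rate(n, k0, k1, target_rate):
    """Ratio S_1 / S_0 at which a two-layer BOP reaches target_rate.

    From k0*S0 + k1*S1 = target_rate * n * (S0 + S1).
    """
    return (k0 - target_rate * n) / (target_rate * n - k1)


# Equal error protection: RS(100, 85) on a 340 kbps channel.
print(340 * code_rate(100, 85))            # source rate in kbps

# Unequal error protection with RS(100, 96) and RS(100, 65):
ratio = width_ratio_for_rate(100, 96, 65, 0.85)
print(ratio, uep_overall_rate(100, [96, 65], [1.0, ratio]))
```

Under these assumptions the EEP code rate of 0.85 reproduces the roughly 15 % redundancy stated above, and the UEP scheme reaches the same overall rate when the strongly protected layer occupies a correspondingly narrower field in each packet.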

[Figure 5: two plots of PSNR [dB] versus packet loss rate (0 to 0.4), one per test sequence; curves: Pyra UEP, H.263 EEP, H.263 uncoded.]

Figure 5: Average PSNR for Silent Voice (left) and Foreman (right). H.263 without error protection, H.263 with EEP RS(100, 85), and Pyramid Coder with 2 layers and UEP, QCIF: RS(100, 65) (Silent Voice), RS(100, 64) (Foreman), CIF: RS(100, 96). Average burst length $L_B = 5$.

Since every 8th frame is coded in Intra mode (see Fig. 1), error propagation due to motion-compensated prediction is limited unless the Intra frame is lost, too. Of course, error propagation can be reduced further by increasing the number of Intra macroblocks in the Inter-coded frames. This works for both H.263 and the scalable codec. For the scalable codec, Intra coding is even more efficient in the EL since the base layer can be used for prediction in this case. However, coding more macroblocks in Intra mode decreases the encoded video quality if the rate is kept constant, so there is a trade-off for the amount of Intra coding. An analytical solution for this trade-off can be found in [4]. The optimal amount of Intra coding depends on the loss rate. However, we assume that we do not know the loss rate in advance and instead want a system with graceful degradation over a broad range of loss rates.

Fig. 6 compares, for the test sequence Foreman and a single channel realization with $P_B = 0.15$, the PSNR curve obtained with the scalable coder combined with UEP with the one obtained with the single-layer H.263 coder combined with EEP. Evidently the scalable coder switches to the base layer very often since the enhancement layer is not protected strongly enough for this channel. In contrast, for H.263 the PSNR stays at the CIF quality level most of the time but drops dramatically from time to time, resulting in severely distorted images until the next Intra-frame is received. Fig. 7 shows typical examples of the subjective picture quality obtained with the scalable coder and with the single-layer H.263 coder, either unprotected or with EEP.

5 Conclusions

In this paper we presented a transmission scheme for Internet video streaming which degrades gracefully as the packet loss probability of an Internet connection increases. The transmission scheme consists of a scalable video coder which uses H.263 in its base layer, combined with unequal error protection using Reed-Solomon codes applied across packets. Experimental results show that with our approach the picture quality of a streamed video degrades more gracefully than with a single-layer H.263 coder, with or without equal error protection. With only a slight decrease in video quality for the error-free case we can still maintain an acceptable quality for transmission over a severely lossy channel. The proposed scheme is especially useful for Internet multicast applications where different users are subject to different channel qualities and no feedback channel can be employed.

[Figure 6: plot of PSNR (14 to 34 dB) versus frame number (0 to 300); curves: Pyra UEP, H.263 EEP.]

Figure 6: PSNR for Foreman. Pyramid Coder with 2 layers and UEP RS(100, 64), RS(100, 96) and H.263 with EEP RS(100, 85). One channel realization with packet loss rate $P_B = 0.15$ and average burst length $L_B = 5$.

Figure 7: One decoded image of H.263 with EEP (left) and the Pyramid Coder (right).

References

[1] J.-C. Bolot, "Characterizing end-to-end packet delay and loss in the internet," Journal of High-Speed Networks 2, pp. 305-323, Dec. 1993.
[2] U. Horn, K. W. Stuhlmüller, M. Link, and B. Girod, "Robust internet video transmission based on scalable coding and unequal error protection," Image Com., 1999. Special Issue on Real-time Video over the Internet, accepted for publication.
[3] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: A review," Proc. of the IEEE 86, pp. 974-997, May 1998.
[4] N. Färber, K. W. Stuhlmüller, and B. Girod, "Analysis of error propagation in hybrid video coding with application to error resilience," in Int. Conf. on Image Processing, (Kobe, Japan), Oct. 1999. submitted.
[5] N. Färber, E. Steinbach, and B. Girod, "Robust H.263 compatible video transmission over wireless channels," in Proc. PCS, pp. 575-578, (Melbourne), Mar. 1996.
[6] E. Steinbach, N. Färber, and B. Girod, "Standard compatible extension of H.263 for robust video transmission in mobile environments," IEEE Trans. Circuits and Sys. for Video Tech. 7, pp. 872-881, Dec. 1997.
[7] N. Färber and B. Girod, "Feedback-based error control for mobile video transmission," in Proc. IEEE, Special Issue on Video for Mobile Multimedia, accepted for publication (Oct. 1999).
[8] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan, "Priority encoding transmission," IEEE Trans. on Inf. Theory 42, pp. 1737-1747, Nov. 1996.
[9] C. Leicher, "Hierarchical encoding of MPEG sequences using priority encoding transmission (PET)," Tech. Rep. TR-94-058, International Computer Science Institute, Berkeley, Nov. 1994.
[10] S. H. Lee, P. J. Lee, and R. Ansari, "Cell loss detection and recovery in variable rate video," in Proc. 3rd Int. Packet Video Works., (Morriston), Mar. 1990.
[11] E. W. Biersack, "Performance evaluation of forward error correction in an ATM environment," IEEE J. on Sel. Areas in Com. 11, pp. 631-640, May 1993.
[12] V. Parthsarathy, J. W. Modestino, and K. S. Vastola, "Design of a transport coding scheme for high-quality video over ATM networks," IEEE Trans. Circ. Syst. Video Techn. 7, pp. 358-376, Apr. 1997.
[13] J. Rosenberg and H. Schulzrinne, "An RTP payload format for generic forward error correction." http://www.ietf.org, IETF Internet Draft, Nov. 1998.
[14] J. Rosenberg and H. Schulzrinne, "An RTP payload format for Reed Solomon Codes." http://www.ietf.org, IETF Internet Draft, Nov. 1998.
[15] B. Girod, "Scalable video for multimedia systems," Computers & Graphics 17(3), pp. 269-276, 1993.
[16] U. Horn and B. Girod, "Scalable video transmission for the Internet," Computer Network and ISDN Systems 29, pp. 1833-1842, Nov. 1997.
[17] N. Chadda, G. Wall, and B. Schmidt, "An end-to-end software-only scalable video delivery system," in Proc. NOSSDAV'95, Apr. 1995.
[18] D. Hoffman and M. Speer, "Hierarchical video distribution over Internet-style networks," in Proc. ICIP'96, vol. I, pp. 5-8, (Lausanne), Sep. 1996.
[19] W. Tan, E. Chang, and A. Zakhor, "Real time software implementation of scalable video codec," in ICIP'96, vol. I, pp. 17-20, (Lausanne), Sep. 1996.
[20] U. Horn, B. Girod, and B. Belzer, "Scalable video coding for multimedia applications and robust transmission over wireless channels," in 7th Int. Workshop on Packet Video, pp. 43-48, (Brisbane), Mar. 1996.
[21] B. Girod, K. W. Stuhlmüller, M. Link, and U. Horn, "Packet loss resilient internet video streaming," in Proc. Symp. on Visual Comm. and Image Proc., SPIE, (San Jose, California), Jan. 1999.
[22] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications." IETF RFC 1889, Jan. 1996.
[23] R. E. Blahut, Theory and Practice of Error Control Codes, Addison Wesley, Reading, MA, 1983.
[24] A. J. McAuley, "Reliable broadband communication using a burst erasure correcting code," in Proc. ACM SIGCOMM, pp. 297-306, (Philadelphia), Sep. 1990.