COMPARISON BETWEEN MULTIPLE DESCRIPTION AND SINGLE DESCRIPTION VIDEO CODING WITH FORWARD ERROR CORRECTION R. Bernardini, M. Durigon, R. Rinaldo∗
A. Vitali∗
Università degli Studi di Udine - Italy Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica Via delle Scienze 208, Udine, e-mail:
[email protected]
ST Microelectronics Via C. Olivetti n. 2, 20041, Agrate Brianza - Italy email:
[email protected]
ABSTRACT

Video streaming over packet-switched best-effort networks is a challenging topic, due to low latency, scalability and fault tolerance requirements. Many techniques can be used to deal with delay, loss and the time-varying nature of best-effort networks. In this paper we compare two techniques to improve the performance of video streaming, i.e., a Multiple Description (MD) scheme based on spatial polyphase downsampling, and a Single Description (SD) scheme where robustness to packet loss is increased using Forward Error Correcting (FEC) codes. We consider both a single channel scenario and a multiple channel (or multi-path) scenario. We span a large set of channel conditions, to consider the high packet loss probabilities common in wireless communication systems. An H264/AVC video coder with advanced error concealment capabilities is used. Experimental results show that MD can be competitive in practical scenarios, with more flexibility and less complexity than the SD+FEC scheme.

1. INTRODUCTION

Many solutions have been proposed to deal with video streaming over best-effort packet-switched networks. From the channel coding perspective, Forward Error Correcting (FEC) codes have been proposed for this purpose, at the expense of increased bit rate [1]. FEC adds specialized inter-packet redundancy that enables data recovery up to a loss threshold. When the packet loss ratio exceeds this threshold, redundancy packets become unusable. The amount of redundancy can be adjusted dynamically, but adaptation is problematic when network conditions change quickly, or when there is no return channel to communicate feedback to the sender. Combining FEC with interleaving may help combat loss variability, but the added delay can make interleaving unacceptable for media streaming. From a source coding perspective, Multiple Description coding techniques [2] are designed to increase packet loss robustness by creating several descriptions of the original source.
All descriptions have the same importance, and the quality of the reconstructed signal increases with the number of correctly received descriptions. In an MD system, all correctly received packets are useful to improve the quality of the reconstructed video. Recently, streaming over multiple paths has also emerged as an approach to help overcome the problems of best-effort networks by exploiting path diversity [5]. Many architectures exist for achieving path diversity between single or multiple senders and a single receiver.

∗ This work has been supported under project FIRB-PRIMO (Reconfigurable Platforms for Wideband Wireless Communications) of the Italian Ministry of University and Research, MIUR.

In [3], an MD scheme where descriptions are generated by separately coding the odd and even frames of the original sequence is considered and compared with an optimized layered video coding scheme with feedback from the receiver. It is shown there that the layered scheme performs better than the MD scheme. In this paper we consider an MD video coding system which generates two descriptions from the spatially downsampled polyphase components of the original frames [4]. We compare the above-mentioned coding systems (SD+FEC and MD) in different channel conditions, both in the single and in the multi-path scenarios, and up to the high packet loss probabilities that are common in wireless networks. We show that, with no feedback and no source adaptation procedures, MD provides very good overall performance and ease of implementation.

2. OVERVIEW OF THE SYSTEMS

The considered MD scheme increases robustness to packet loss by creating different descriptions from the original video stream via spatial polyphase downsampling [4]. A simple two-description scheme (MD2) can be obtained by processing the input video sequence frame by frame, and subsampling each frame of the sequence by a factor of two along the columns. The two descriptions are then processed by independent H264/AVC coders [7], [8]. Packets produced by the coders can be interleaved and sent over the channel (single path scenario), or directly sent over different paths (multiple path scenario). On the decoder side there are two synchronised H264/AVC decoders that simultaneously process the two subframes corresponding to one original frame. A restoring block performs error concealment if some packets are lost, and applies spatial polyphase downsampling to the recovered full-size frame in order to copy each concealed description into the corresponding frame buffer. This prevents error propagation from reference frames due to interframe coding.
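As an illustration only, the polyphase split of the MD2 scheme and the restoring step can be sketched in a few lines of Python. The sketch assumes that subsampling along the columns means that even-indexed and odd-indexed pixel columns form the two descriptions. It uses a plain two-neighbour average as a stand-in for the interpolators of [4]. The function names (polyphase_split, polyphase_merge, conceal_from_even) are hypothetical, and frames are plain lists of rows rather than decoder buffers.

```python
# Illustrative sketch, not the authors' implementation.
# Assumption: description 0 holds the even pixel columns, description 1 the odd ones.

def polyphase_split(frame):
    """Split a frame (list of rows) into its even-column and odd-column parts."""
    desc0 = [row[0::2] for row in frame]
    desc1 = [row[1::2] for row in frame]
    return desc0, desc1


def polyphase_merge(desc0, desc1):
    """Re-interleave two correctly received descriptions into a full-size frame."""
    frame = []
    for r0, r1 in zip(desc0, desc1):
        row = [0] * (len(r0) + len(r1))
        row[0::2] = r0
        row[1::2] = r1
        frame.append(row)
    return frame


def conceal_from_even(desc0, width):
    """Rebuild a full-size frame when only the even-column description arrived.

    Assumption: each missing odd column is the integer average of its two
    horizontal neighbours; the last column is copied from its left neighbour
    when it has no right neighbour.
    """
    frame = []
    for r0 in desc0:
        row = [0] * width
        row[0::2] = r0
        for x in range(1, width, 2):
            left = row[x - 1]
            right = row[x + 1] if x + 1 < width else left
            row[x] = (left + right) // 2
        frame.append(row)
    return frame


frame = [[10, 20, 30, 40], [50, 60, 70, 80]]
desc0, desc1 = polyphase_split(frame)
concealed = conceal_from_even(desc0, width=4)
# Downsampling the concealed frame again gives the estimate of the lost description.
estimated_desc1 = polyphase_split(concealed)[1]
print(concealed)
print(estimated_desc1)
```

In this sketch, estimated_desc1 plays the role of the concealed description that the restoring block would copy into the frame buffer of the decoder whose packets were lost.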
Interpolation from the correctly received description is the basic concealment strategy. Both linear interpolators and non-linear (edge-preserving) interpolators can be used, as presented in [4]. The intrinsic error concealment capabilities of the H264/AVC decoder are used only when both descriptions are lost and no interpolation can be employed.

This scheme is compared with that proposed in [1], which creates an SD stream protected with FEC. Each group of k data packets is replaced with a group of n packets obtained via an (n, k) Reed-Solomon code. Systematic codes are used in order to speed up packet recovery when no losses occur. The decoding process in the presence of packet erasures is performed using a slightly modified version of the algorithm presented in [9]. In particular, the matrix inversion required by this algorithm is computed once for the whole packet length, which makes the procedure computationally feasible. To simplify the code construction, constant 500-byte data packets are used.

In the presence of packet erasures, the decoding process of the SD+FEC stream may be completed with a delay that depends on k. In the single channel scenario, by properly interleaving the MD streams, we can improve the robustness of the MD scheme to burst errors and compare the two schemes under the same delay constraints. In particular, k data packets of the SD+FEC stream cover F frames of the original video sequence. If k1 and k2 packets cover the same number F of frames in the corresponding descriptions, the MD streams are sent on the single channel by interleaving k1 and k2 packets, respectively from the first and the second description (Figure 1). In the multi-path scenario, packets from the SD+FEC stream are sent over different paths according to the algorithm presented in [1] for identical channels. In particular, packets are evenly distributed along the two paths. For the MD scheme, the two channels carry one description each.

[Figure 1: SD+FEC stream: k data packets (F frames) followed by n−k redundancy packets. MD stream, 2 descriptions: k1 data packets (F frames, desc 1) followed by k2 data packets (F frames, desc 2).]
Fig. 1. Single path scenario: SD+FEC and MD interleaved stream.

3. NETWORK MODEL

The adopted network model is a simple two-state discrete-time Markov chain, as depicted in Figure 2.

[Figure 2: states g and b; transition from g to b with probability p, from b to g with probability q; self-loops with probability 1−p on g and 1−q on b.]
Fig. 2. Two-state Markov chain used for the channel model.

The chain state at transmission slot n is denoted by $X_n$, where $X_n \in \{g, b\}$, and it is characterized by the transition probabilities p and q from the good to the bad state and vice versa, respectively. We assume that a packet transmitted at slot n is successfully received if $X_n = g$, and is lost otherwise. The stationary probability of packet loss is $\pi_b = \frac{p}{p+q}$, while the stationary probability of correctly received packets is $\pi_g = \frac{q}{p+q}$. The congestion period during which packets are lost has an average length of 1/q, in number of slots.

4. EXPERIMENTAL RESULTS

In this section we compare the performance of the two proposed schemes in both the single path and the multi-path scenarios, for different channel conditions. Both the SD and MD sequences are compressed using the H264/AVC video coder, with GOP IBBPBBPBBPBBPBBI and slices of fixed 500-byte size (one slice per packet). Compression is performed at constant Quantization Parameter (QP), i.e., no rate control is employed. In case of unrecoverable errors, the adopted decoder has improved error concealment capabilities with respect to the original standard, mainly because it applies interframe concealment techniques on intra-coded information, and detects scene changes. We consider 8 seconds of the CIF sequence Foreman at 30 fps, and 4.5 seconds of the CIF sequence Missa at 30 fps. The first sequence has high motion content, whereas the second one is more static and its characteristics are typical of videoconferences.

In the first experiment, we simulate a single sender and receiver by sending all the video packets over a single route in which packet losses are modelled by the two-state Markov chain described above. For the SD+FEC stream, the simulation parameters are very similar to those considered in [1]. In particular, for the Foreman sequence, we have

Rate = 850 kbps, 765 kbps actual video + 85 kbps FEC redundancy
packet size = 500 bytes
n = 100 total number of packets in FEC block
k = 90 number of data packets in FEC block.

For the Missa sequence, the simulation parameters are

Rate = 815 kbps, 725 kbps actual video + 90 kbps FEC redundancy
packet size = 500 bytes
n = 100 total number of packets in FEC block
k = 90 number of data packets in FEC block.

With k = 90, a single FEC block approximately covers one Group of Pictures (GOP). As stated before, to have the same maximum decoding delay, the MD stream is obtained by sending sequences of 15 frames (one GOP) from the two descriptions (see Fig. 1). The MD stream has an aggregate rate of about 850 kbps and 815 kbps for the Foreman and the Missa sequence, respectively. The MD concealment is implemented using a bilinear interpolator [4]. We consider a channel with packet loss probabilities of 0.01, 0.05, 0.1, 0.15 and 0.2, and with an average burst length of 10 packets. Figure 3 shows the PSNR of the reconstructed sequence as a function of the packet loss probability (Ploss) for the Foreman sequence. Results are averages of 25 independent transmission trials, and each loss pattern is applied to both the SD+FEC and MD streams. In terms of average PSNR, for Ploss ≥ 0.1 the MD2 scheme performs better than the SD+FEC scheme.

[Figure 3: Foreman, mean burst length = 10. PSNR (dB) versus Ploss; curves: SD+FEC (100,90) and MD2.]
Fig. 3. PSNR of the Foreman reconstructed sequence, with average burst length = 10.
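The channel conditions of this experiment can be mapped onto the two-state Markov model as follows: a mean burst length of L slots gives q = 1/L, and a target loss probability Ploss gives p = q · Ploss / (1 − Ploss), which follows from the stationary loss probability p/(p+q). The Python sketch below, a minimal illustration rather than the simulator used for the reported results, generates a loss trace and counts how often an (n, k) block suffers more than n−k erasures, which is the condition under which a Reed-Solomon erasure code can no longer recover the block. The function names and the trace length are assumptions made for the example.

```python
import random

# Illustrative sketch; function names and trace length are hypothetical.


def gilbert_params(p_loss, mean_burst):
    """Return (p, q) giving stationary loss p_loss and mean burst length 1/q."""
    q = 1.0 / mean_burst
    p = q * p_loss / (1.0 - p_loss)
    return p, q


def simulate_losses(p, q, n_slots, rng):
    """Return a list of booleans, True where the packet sent in that slot is lost."""
    bad = rng.random() < p / (p + q)   # start from the stationary distribution
    lost = []
    for _ in range(n_slots):
        lost.append(bad)
        if bad:
            bad = rng.random() >= q    # stay in the bad state with probability 1-q
        else:
            bad = rng.random() < p     # move to the bad state with probability p
    return lost


def block_failure_rate(lost, n, k):
    """Fraction of consecutive n-packet blocks with more than n-k erasures."""
    blocks = [lost[i:i + n] for i in range(0, len(lost) - n + 1, n)]
    failed = sum(1 for block in blocks if sum(block) > n - k)
    return failed / len(blocks)


rng = random.Random(0)
p, q = gilbert_params(p_loss=0.1, mean_burst=10)
lost = simulate_losses(p, q, n_slots=200_000, rng=rng)
print("empirical loss rate:", sum(lost) / len(lost))
print("fraction of (100,90) blocks with more than 10 erasures:",
      block_failure_rate(lost, n=100, k=90))
```

The sketch treats every packet of a block as sent in consecutive slots of a single path, so it does not model the interleaved MD stream or the two-path case.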
[Figure 6: Missa, mean burst length = 10. PSNR (dB) versus Ploss; curves: SD+FEC (100,90) and MD2.]
Fig. 6. PSNR of the Missa reconstructed sequence, with mean burst length = 10.

[Figure 4: Foreman, Ploss = 0.1, mean burst length = 10. Cumulative distribution versus PSNR (dB); curves: SD+FEC (100,90) and MD2.]
Fig. 4. Cumulative Distribution Function for the PSNR of the Foreman reconstructed sequence.
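A curve such as the one in Fig. 4 is an empirical distribution of per-frame PSNR: for each threshold, it gives the fraction of reconstructed frames whose PSNR falls below that threshold. The following Python sketch shows one way to compute it. It assumes 8-bit samples (peak value 255) and frames stored as lists of rows; the function names are hypothetical, and the PSNR list is a toy input chosen for the example, not measured data.

```python
import math

# Illustrative sketch; names are hypothetical and toy_psnr is NOT measured data.


def psnr(reference, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    squared_error = 0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for a, b in zip(ref_row, rec_row):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf                      # identical frames
    return 10.0 * math.log10(peak ** 2 * count / squared_error)


def fraction_below(psnr_values, threshold_db):
    """Empirical probability that a frame has PSNR strictly below the threshold."""
    return sum(1 for v in psnr_values if v < threshold_db) / len(psnr_values)


def cdf_points(psnr_values, thresholds):
    """Sample the empirical cumulative distribution at the given thresholds."""
    return [(t, fraction_below(psnr_values, t)) for t in thresholds]


toy_psnr = [36.0, 35.5, 28.0, 34.0, 29.5, 36.5, 35.0, 33.0]
for threshold, fraction in cdf_points(toy_psnr, [25, 30, 35, 40]):
    print(threshold, fraction)
```

Evaluating fraction_below at 30 dB corresponds to reading the curve at the 30 dB abscissa.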
Besides average PSNR, a measure of the perceived quality of the reconstructed sequence can be obtained by considering the cumulative distribution of PSNR, i.e., the function describing the probability that the PSNR of a reconstructed frame is below a specified threshold. The cumulative distribution function for the Foreman sequence is depicted in Figure 4, for Ploss = 0.1 and an average burst length of 10 packets. The percentage of reconstructed frames with PSNR < 30 dB (poor quality) is about 27% for the SD+FEC scheme, and only about 5% for the MD2 scheme. Figure 5 shows the PSNR for every frame of the Foreman sequence, for a single channel simulation. It confirms that the variance of the PSNR for the MD2 reconstructed sequence is smaller than that of the SD+FEC scheme. The visual quality of MD2 reconstructed sequences is nearly constant, whereas it varies rapidly and with annoying artifacts for the SD+FEC scheme. Similar conclusions can be drawn for the Missa sequence, according to Figures 6, 7 and 8.

[Figure 5: Foreman, Ploss = 0.1, mean burst length = 10. PSNR (dB) versus frame number; curves: SD+FEC (100,90) and MD2.]
Fig. 5. PSNR for a single channel simulation (Foreman sequence).

[Figure 7: Missa, Ploss = 0.1, mean burst length = 10. Cumulative distribution versus PSNR (dB); curves: SD+FEC (100,90) and MD2.]
Fig. 7. Cumulative Distribution Function for the PSNR of the Missa reconstructed sequence.

In another experiment, we send video packets along two independent and identical routes. Each channel receives packets at half the full video rate. According to the Markov model, the average burst length observed in each independent path is halved due to the reduced rate. A comparison between MD2 and SD+FEC in the two-path scenario and in the single-path scenario for the two sequences is given in Figures 9 and 10. The SD+FEC scheme has a slight PSNR improvement at low Ploss, where the shorter bursts on each individual path allow for enhanced recovery capabilities using FEC codes. For higher packet loss probabilities this advantage vanishes. The MD2 scheme in the multi-path scenario has a PSNR very close to that of the single-path system.

The PSNR gap observed in Figure 3 at Ploss = 0 depends on the specific video sequence. This gap is especially high for the Foreman video sequence due to its high frequency content, but disappears for the Missa sequence (see Fig. 6). For the Foreman sequence, separately coding the descriptions obtained by subsampling the original full-size video sequence is particularly inefficient. For a fair comparison, a third set of simulations is performed on the Foreman sequence, in which the amount of FEC redundancy in the SD+FEC scheme is increased so as to obtain the same video quality as the MD2 scheme in the absence of packet erasures. For the SD+FEC scheme the new simulation parameters become

Rate = 850 kbps, 510 kbps actual video + 340 kbps FEC redundancy
packet size = 500 bytes
n = 100 total number of packets in FEC block
k = 60 number of data packets in FEC block.

Figure 11 shows the average PSNR in the single- and two-channel scenarios for a mean burst length of 20 packets. In the single channel scenario the two schemes have similar performance in terms of PSNR. As before, we observe a sudden quality drop when the FEC protection fails. In the two-path scenario, SD+FEC with increased redundancy obtains a significant improvement in terms of mean PSNR. This is due to the reduction of the mean burst length observed in each individual channel. However, if the average burst length increases, the MD2 scheme again becomes more robust to packet erasures. This can be especially convenient in a multi-path
scenario with on-off channels, where a channel can get completely congested for a long time.

[Figure 8: Missa, Ploss = 0.1, mean burst length = 10. PSNR (dB) versus frame number; curves: SD+FEC (100,90) and MD2.]
Fig. 8. PSNR for a single channel simulation (Missa sequence).

[Figure 9: Foreman, mean burst length = 10. PSNR (dB) versus Ploss; curves: SD+FEC (100,90) single path, SD+FEC (100,90) two paths, MD2 single path, MD2 two paths.]
Fig. 9. PSNR of the Foreman reconstructed sequence, in the single and multi-path scenario, with mean burst length=10.

[Figure 10: Missa, mean burst length = 10. PSNR (dB) versus Ploss; curves: SD+FEC (100,90) single path, SD+FEC (100,90) two paths, MD2 single path, MD2 two paths.]
Fig. 10. PSNR of the Missa reconstructed sequence, in the single and multi-path scenario, with mean burst length=10.

[Figure 11: Foreman, mean burst length = 20. PSNR (dB) versus Ploss; curves: SD+FEC (100,60) single path, SD+FEC (100,60) two paths, MD2 single path, MD2 two paths.]
Fig. 11. PSNR of the reconstructed sequence, in the single and multi-path scenario, with mean burst length=20.

5. CONCLUSIONS

Results presented in Section 4 compare SD+FEC and MD at the same aggregate rate. For video sequences with a reduced PSNR gap between the SD and the MD schemes at Ploss = 0, the MD scheme is preferable in terms of objective and subjective quality. For video sequences with high motion content (e.g., Foreman), the intrinsic inefficiency of MD coding can make the SD+FEC scheme preferable. In particular, the use of multiple paths improves the performance of the SD+FEC system due to the reduction of the mean burst length observed in each single channel. The MD scheme is especially robust to long bursts of erasures, and in the presence of on-off channels. It is well suited to the multi-path environment, without the need for ad-hoc coding/decoding algorithms. The concealment implemented is very simple and computationally less complex than the recovery algorithm needed by FEC schemes. In channels with high packet loss probability (for example wireless channels), on-off multi-path channels, and channels with long bursts of errors, MD schemes seem to be a valid alternative to more complex SD+FEC schemes.

6. REFERENCES

[1] T. Nguyen, A. Zakhor, “Distributed Video Streaming with Forward Error Correction”, Packet Video Wksp., April 2002.
[2] V. K. Goyal, “Multiple Description Coding: Compression Meets the Network”, IEEE Signal Proc. Mag., Sept. 2001, pp. 74-93.
[3] J. Chakareski, S. Han, B. Girod, “Layered Coding vs. Multiple Description for Video Streaming over Multiple Paths”, Proceedings of the eleventh ACM international conference on Multimedia, Berkeley, CA, USA, 2003, pp. 422-431.
[4] R. Bernardini, M. Durigon, R. Rinaldo, L. Celetto, A. Vitali, “Polyphase Spatial Subsampling Multiple Description Coding of Video Streams with H264”, IEEE International Conference on Image Processing, October 2004.
[5] J. G. Apostolopoulos, M. D. Trott, “Path Diversity for Enhanced Media Streaming”, IEEE Communications Magazine, August 2004, pp. 80-87.
[6] J. Apostolopoulos, T. Wong, W. Tan, S. Wee, “On Multiple Description Streaming with Content Delivery Networks”, IEEE Infocom, 2002.
[7] Joint Video Team of ITU-T and ISO/IEC JTC 1, ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, March 2003.
[8] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, “Overview of the H.264 / AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
[9] L. Rizzo, “Effective Erasure Codes for Reliable Computer Communication Protocols”, Computer Communication Review, April 1997.