Proceedings of 2010 IEEE 18th International Packet Video Workshop

December 13-14, 2010, Hong Kong

FORWARD ERROR PROTECTION FOR LOW-DELAY PACKET VIDEO

Zhi Li¹, Ashish Khisti² and Bernd Girod¹

¹ Dept. of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
² Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, ON, M5S 3G4, Canada
{leeoz, bgirod}@stanford.edu, [email protected]

ABSTRACT

We study different forward error correction (FEC) codes for packet video streaming over erasure channels with strict delay constraints. Our study includes traditional maximum distance separable (MDS) codes and streaming burst erasure codes with optimal delay performance. We develop a continuous-time model to calculate the burst error correction capabilities of these codes under a delay constraint. Our analysis also incorporates Systematic Lossy Error Protection (SLEP), which achieves stronger error protection in exchange for a slight drop in video quality when error correction is needed. We provide simulation results for transmitting H.264/AVC encoded video over a bursty packet erasure channel and show that the combination of streaming erasure codes and SLEP greatly outperforms conventional MDS FEC for video streaming with a tight delay constraint.

are designed specifically for low-delay media streaming applications. In particular, these codes require that the encoder have access to all information symbols within a block before a codeword is generated. Likewise, the decoder needs to wait for all the codeword symbols before it can retrieve the source symbols.

In contrast, the streaming burst erasure codes proposed in [4–6] are designed specifically for low-delay streaming applications. The encoder produces channel packets as it receives the stream of source packets. The decoder recovers the source stream with a fixed delay as it receives channel packets. Fundamental tradeoffs between the receiver delay and burst-erasure correction capabilities are established.

In this work, we compare the performance of these delay-optimized codes with traditional MDS codes. In our analysis, we fix the source and channel data rates, impose an end-to-end delay constraint for each source packet and select the best possible codes in each class that meet these constraints. Burst-erasure correction capabilities of each of these codes are then compared in a common framework. Interestingly, we observe that even among MDS codes, the performance depends on how the parity packets are combined with the source packets in the output stream. Furthermore, except in cases where the channel redundancy is either too small or too high, the streaming burst erasure codes have a superior burst erasure correction capability compared to the MDS codes.

We further combine streaming erasure codes with a cross-layer mechanism called Systematic Lossy Error Protection (SLEP). This scheme, introduced in [7, 8], reduces the FEC overhead by generating parity packets from a coarsely quantized video representation.
While some portions of the video frames reconstructed using these parity check packets will be of lower quality compared to portions that do not use these packets, the resulting parity checks are smaller in size compared to a transmitted video packet, allowing a stronger error protection at the same bitrate overhead. As shown in Fig. 1, the SLEP-streaming erasure codes system presents a framework allowing tradeoff among delay T , redundancy α, correctable erasure length B and reconstructed video quality. This paper is structured as follows. Section 2 reviews the streaming burst erasure codes and the SLEP architecture whereas Section 3 introduces our framework for comparing different designs. In Section 4, we present the schemes dis-

Index Terms— Forward error protection, low-delay, streaming erasure codes, SLEP

1. INTRODUCTION

Application-layer forward error correction (FEC) can be used to increase the robustness of Internet Protocol Television (IPTV), which streams live television programs over digital subscriber lines (DSL). The source video packets along with FEC packets are delivered over a source-specific multicast (SSM) session. Transmission of video over DSL links involves a unique set of challenges. The DSL links introduce impulse (bursty) noise due to AC power switches, motors or lightning strikes that can have a devastating effect on the video quality. FEC codes (combined with retransmissions) provide a natural solution in this scenario [2, 9].

Another scenario involves broadcasting video streams to mobile devices. Instead of unicasting a video stream to each mobile device, it is far more bandwidth-efficient to broadcast a single stream to a large number of devices. Recent standards for mobile broadcasting such as DVB-H (digital video broadcasting for handhelds) use coding techniques in the upper layers to combat multipath fading on the wireless channel [1].

Some popular choices of FEC codes in the upper layers include maximum distance separable (MDS) codes [1] and digital fountain codes [3]. However, neither of these codes

978-1-4244-9520-7/10/$26.00 ©2010 IEEE


PV 2010


for some sequence of functions $f_t : \mathcal{S}^t \to \mathcal{X}$. The channel introduces a single erasure burst of length Bs at some arbitrary time, i.e., for some j ≥ 0,

$$y[t] = \begin{cases} \star, & t \in [j,\, j + B_s - 1] \\ x[t], & \text{otherwise} \end{cases} \qquad (2)$$


where ⋆ denotes packet erasure. A decoder with delay Ts ≥ 0 outputs the source packet s[t] at time t + Ts, i.e., there exists a sequence of decoding functions $g_t : \mathcal{Y}^{t+T_s} \to \mathcal{S}$ such that

$$\hat{s}[t] = g_t\left(y[0], \ldots, y[t], \ldots, y[t + T_s]\right) \quad \text{and} \quad \Pr\left(\hat{s}[t] \neq s[t]\right) = 0, \quad \forall t \ge 0. \qquad (3)$$

The maximum attainable rate such that there exists a sequence of encoding and decoding functions that satisfy (3) is the streaming capacity C. As established in [6],

$$C = \begin{cases} \dfrac{T_s}{T_s + B_s}, & T_s \ge B_s \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$
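As a quick numerical illustration of (4), the following Python sketch evaluates the streaming capacity for a given delay and burst length, both measured in packets. The function name `streaming_capacity` and the use of exact fractions are our own choices for illustration and do not come from [6]. For a burst of Bs = 2 packets and a delay of Ts = 3 packets it returns 3/5.

```python
from fractions import Fraction


def streaming_capacity(Ts: int, Bs: int) -> Fraction:
    """Evaluate the streaming capacity C of (4).

    Ts: decoding delay in packets (Ts >= 0).
    Bs: erasure burst length in packets (Bs >= 1 in this sketch).
    """
    if Ts < 0 or Bs < 1:
        raise ValueError("this sketch expects Ts >= 0 and Bs >= 1")
    if Ts >= Bs:
        return Fraction(Ts, Ts + Bs)
    # A burst longer than the allowed delay cannot be corrected in time.
    return Fraction(0)


if __name__ == "__main__":
    for Ts, Bs in [(3, 2), (2, 2), (1, 2)]:
        print(f"Ts={Ts}, Bs={Bs}: C = {streaming_capacity(Ts, Bs)}")
```

Exact rational arithmetic is used only so that the rates can be compared without rounding.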


Fig. 1: Design space when SLEP is combined with streaming burst erasure codes. The system involves a tradeoff among delay (T), correctable erasure burst length (B), redundancy (α) and video reconstruction quality (approximately proportional to the SLEP stream compression factor β).

Symbol   Represents
n        Total number of symbols in a coding block
k        Number of source symbols in a coding block
T        Maximum delay of decoding a packet
B        Minimum duration of correctable erasure burst
α        Fraction of bitrate used in FEC (i.e., protection overhead)
β        SLEP stream compression factor
Ts       Delay in number of symbols
Bs       Erasure burst length in number of symbols
R        Video source stream bitrate
δ        Source packet size

The authors also propose a class of streaming erasure codes that achieve the capacity. An example of this construction, which corrects a burst of Bs = 2 packets with a delay of Ts = 3 packets, is provided in Fig. 2. In this construction, a stream of incoming source packets s[i], each consisting of 3 symbols, is mapped to a stream of outgoing channel packets, each consisting of 5 symbols. The construction involves splitting each source packet s[i] into three sub-packets s0[i], s1[i] and s2[i] and then appending two parity-check sub-packets to yield the channel packet

$$x[i] = \left(s_0[i],\ s_1[i],\ s_2[i],\ s_0[i-3] \oplus s_2[i-1],\ s_1[i-3] \oplus s_2[i-2]\right). \qquad (5)$$
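The construction in (5) can be exercised with a short Python sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: each sub-packet is modeled as a single integer, sub-packets with a negative time index are taken to be zero, and the names `encode` and `decode` are hypothetical. The decoder simply solves each received parity equation whenever exactly one of its two terms is still unknown.

```python
from typing import Dict, List, Optional, Tuple

Source = Tuple[int, int, int]             # (s0[i], s1[i], s2[i])
Channel = Tuple[int, int, int, int, int]  # source sub-packets plus two parities


def encode(source: List[Source]) -> List[Channel]:
    """Apply (5); sub-packets before time 0 are assumed to be 0."""
    def sub(i: int, m: int) -> int:
        return source[i][m] if i >= 0 else 0

    return [
        (s[0], s[1], s[2],
         sub(i - 3, 0) ^ sub(i - 1, 2),   # s0[i-3] xor s2[i-1]
         sub(i - 3, 1) ^ sub(i - 2, 2))   # s1[i-3] xor s2[i-2]
        for i, s in enumerate(source)
    ]


def decode(received: List[Optional[Channel]]):
    """Recover source packets; None marks an erased channel packet.

    Returns (recovered, ready_at), where ready_at[i] is the time at which
    all three sub-packets of s[i] became known.
    """
    # known[(m, i)] holds the value of sub-packet s_m[i].
    known: Dict[Tuple[int, int], int] = {
        (m, i): 0 for m in range(3) for i in range(-3, 0)
    }
    ready_at: Dict[int, int] = {}
    for t, y in enumerate(received):
        if y is not None:
            for m in range(3):
                known[(m, t)] = y[m]
            # Each parity is the XOR of two sub-packets: solve for the
            # missing one when exactly one of the two is known.
            for parity, a, b in ((y[3], (0, t - 3), (2, t - 1)),
                                 (y[4], (1, t - 3), (2, t - 2))):
                if a in known and b not in known:
                    known[b] = parity ^ known[a]
                elif b in known and a not in known:
                    known[a] = parity ^ known[b]
        for i in range(t + 1):
            if i not in ready_at and all((m, i) in known for m in range(3)):
                ready_at[i] = t
    recovered = [
        tuple(known[(m, i)] for m in range(3)) if i in ready_at else None
        for i in range(len(received))
    ]
    return recovered, ready_at


# Example: erase channel packets 4 and 5 (a burst of Bs = 2 packets).
source = [(3 * i + 1, 5 * i + 2, 7 * i + 3) for i in range(10)]
channel = encode(source)
received = [None if t in (4, 5) else x for t, x in enumerate(channel)]
recovered, ready_at = decode(received)
```

In this example the two erased packets become available three packet times after they were sent, which matches the delay Ts = 3 stated for the construction.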

Table 1: Symbols used in this paper.

cussed in this paper (both FEC mechanisms and SLEP) and analyze their performance. In Section 5, we provide simulation results for transmitting H.264/AVC encoded video using different schemes over a bursty channel and compare these with our analytical results. The symbols used throughout this paper are summarized in Table 1.

2. PRELIMINARIES

The decoding block is formed diagonally, with the property that it can recover from an erasure burst of length Bs = 2 within a delay of Ts = 3 packets. This construction naturally generalizes to arbitrary values of Bs and Ts, provided the packet length is sufficiently large. The decoding complexity is comparable to Reed-Solomon decoding of a sub-block.

Two commonly used mechanisms for recovering from packet losses are forward error correction (FEC) and retransmissions. In this paper, we focus on FEC mechanisms where the sender injects controlled redundancy in the source stream to enable recovery in the presence of packet erasures. MDS codes are an important class of codes commonly utilized to correct erasures. A C(n, k) MDS block code maps k information symbols (i.e., packets) into n ≥ k symbols of a codeword and is able to correct up to n − k erasures in the transmitted codeword. To apply MDS codes to an incoming source stream, the encoder buffers blocks of k packets and then produces n packets for each block. A different class of codes, streaming erasure codes, is introduced in the sequel.
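To make the block-coding behavior concrete, the sketch below uses the simplest MDS code, the single-parity-check code C(k + 1, k), which corrects exactly n − k = 1 erasure. This is a stand-in chosen for brevity; it is not the Reed-Solomon code that a practical system would use for larger n − k, and the function names `spc_encode` and `spc_decode` are hypothetical. Packets are modeled as integers.

```python
from functools import reduce
from typing import List, Optional


def xor_all(symbols) -> int:
    """XOR of an iterable of integer symbols."""
    return reduce(lambda a, b: a ^ b, symbols, 0)


def spc_encode(block: List[int]) -> List[int]:
    """Systematic C(k+1, k) encoding: append the XOR of the k source symbols."""
    return block + [xor_all(block)]


def spc_decode(codeword: List[Optional[int]]) -> List[int]:
    """Return the k source symbols; None marks an erased symbol.

    Recovery is possible only when at most n - k = 1 symbol is erased.
    """
    erased = [i for i, c in enumerate(codeword) if c is None]
    if len(erased) > 1:
        raise ValueError("more than n - k = 1 erasures: block not recoverable")
    symbols = list(codeword)
    if erased:
        # The XOR of all n symbols is 0, so the missing symbol equals
        # the XOR of the n - 1 received ones.
        symbols[erased[0]] = xor_all(c for c in codeword if c is not None)
    return symbols[:-1]


if __name__ == "__main__":
    sent = spc_encode([1, 2, 3])
    sent[1] = None  # erase one packet
    print(spc_decode(sent))
```

Note that the decoder needs the whole block before it can repair the erasure, which is the source of the delay discussed in this paper.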

2.2. Systematic Lossy Error Protection (SLEP)

Systematic Lossy Error Protection (SLEP) is a cross-layer video error protection technique proposed in [7, 8] for achieving a graceful tradeoff among the video reconstruction quality, FEC overhead and error protection strength. A schematic description of SLEP is shown in Fig. 3. A SLEP encoder first generates a coarsely quantized representation of the video (named the redundant slice) and then produces parity packets from this representation via Reed-Solomon encoding. In this paper, we denote the ratio of the size of the parity check packets to the size of the original video packets by β, where β ≤ 1 denotes the compression factor. Thus the size of each parity check packet is given by β × {size of source packet}. Conventional FEC corresponds to the special case β = 1. The sender transmits the original video packets along with the SLEP parity packets. At the decoder, when parity packets are used to reconstruct a portion of the video, the resulting

2.1. Delay-optimal streaming erasure codes

Streaming erasure codes have been introduced in [4–6]. The encoder receives a stream of source packets {s[t]}t≥0, where each packet is over an alphabet S. It produces a stream of channel packets {x[t]}t≥0. The channel packet at time t depends on the source packets s[0], s[1], . . . , s[t], i.e.,

$$x[t] = f_t\left(s[0], \ldots, s[t]\right), \qquad (1)$$


x[i−2]             x[i−1]             x[i]               x[i+1]             x[i+2]
s0[i−2]            s0[i−1]            s0[i]              s0[i+1]            s0[i+2]
s1[i−2]            s1[i−1]            s1[i]              s1[i+1]            s1[i+2]
s2[i−2]            s2[i−1]            s2[i]              s2[i+1]            s2[i+2]
s0[i−5] ⊕ s2[i−3]  s0[i−4] ⊕ s2[i−2]  s0[i−3] ⊕ s2[i−1]  s0[i−2] ⊕ s2[i]    s0[i−1] ⊕ s2[i+1]
s1[i−5] ⊕ s2[i−4]  s1[i−4] ⊕ s2[i−3]  s1[i−3] ⊕ s2[i−2]  s1[i−2] ⊕ s2[i−1]  s1[i−1] ⊕ s2[i]

Fig. 2: Streaming erasure code for Bs = 2 and Ts = 3. The source packet s[i] is split into three sub-packets (s0[i], s1[i], s2[i]) of equal size. The channel packet x[i] is obtained by appending two parity-check sub-packets as shown above. Thus x[i] = (s0[i], s1[i], s2[i], s0[i − 3] ⊕ s2[i − 1], s1[i − 3] ⊕ s2[i − 2]).

The channel introduces an erasure burst of length B seconds. Outside the erasure interval B, the receiver observes the coded stream without any delay or error, i.e., y[t] = x[t] when t ∉ B. Within this interval, however, the receiver observes y[t] = ⋆, where ⋆ denotes erasure.

Another key parameter that we consider in our analysis is the end-to-end delay T. Each packet is required to be reconstructed at the decoder within time T after it enters the encoder. We deterministically characterize the feasible values of (B, T) for various coding schemes. A good scheme is characterized by a large B/T ratio.

The main distinction from the discrete-time model (2) is that we fix the time duration B of the burst rather than the number of packets (or symbols) Bs that are erased. This choice allows a fair comparison between the different coding schemes that we study. Like the discrete-time model, our channel model assumes sequential arrival of packets¹ and does not introduce any further delays (e.g., propagation delay). This approach allows us to focus on the delays introduced in channel encoding/decoding and characterize the tradeoff between (B, T) for various coding schemes.
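The difference between a burst measured in seconds and one measured in packets can be illustrated with a small Python sketch. It assumes, purely for illustration, that packets of equal duration are sent back to back with packet i occupying the interval [i·d, (i + 1)·d), and that any packet overlapping the burst is lost; the function name `packets_hit` is hypothetical.

```python
import math
from fractions import Fraction


def packets_hit(start: Fraction, B: Fraction, packet_duration: Fraction) -> int:
    """Count packets overlapping the erasure interval [start, start + B).

    Packet i is assumed to occupy [i * d, (i + 1) * d), d = packet_duration.
    """
    if B <= 0:
        return 0
    first = math.floor(start / packet_duration)
    last = math.ceil((start + B) / packet_duration) - 1
    return last - first + 1


if __name__ == "__main__":
    B = Fraction(2)  # a 2-second burst
    # 1 Kb packets at 1 Kbps last 1 s; at 4/3 Kbps they last 3/4 s.
    print(packets_hit(Fraction(0), B, Fraction(1)))      # aligned burst
    print(packets_hit(Fraction(1, 2), B, Fraction(1)))   # misaligned burst
    print(packets_hit(Fraction(0), B, Fraction(3, 4)))   # shorter packets
```

The same burst duration therefore erases a different number of packets depending on the packet duration on the channel and on how the burst is aligned with packet boundaries.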

[Fig. 3 labels: (a) Primary Slice; Redundant Slice; Filler; k; SLEP Parity Slice; Helper Info; n − k. (b) Received Primary Slice; Redundant Slice; Erasure; "Decode and display in lieu of lost portion of video frames"; SLEP Parity Slice; Helper Info.]

Fig. 3: SLEP parity packet generation (a) and decoding (b).

quality for these frames will be slightly lower than the original stream. But since it can correct more errors (although imperfectly), SLEP provides substantial improvements in the received video quality when faced with a range of degraded channels.

4. ANALYSIS OF FEC MECHANISMS

We analyze different FEC mechanisms, combined with SLEP. Our first two schemes are both based on MDS codes but differ in the manner in which the parity packets are combined with the source packets. The third scheme is based on the streaming erasure codes. For all these schemes, we measure the performance in terms of the minimum correctable erasure burst duration B for a given protection overhead α, SLEP compression factor β and peak delay T. Whenever convenient, we restrict our analysis to the case when α/β < 1. This choice ensures that the number of parity check packets is sufficiently small, as would be desirable in a realistic implementation.

Throughout the analysis, we make the simplifying assumption that a burst always begins at the starting point of a packet. While this assumption makes our analysis elegant, it does underestimate the number of packets a burst could corrupt, because if the burst is mis-aligned with the packets, the partially corrupted packets will typically also be thrown away. Nevertheless, its impact vanishes asymptotically as the coding

3. CONTINUOUS-TIME MODEL

The model that we study is a continuous-time counterpart of the discrete-time model in (2). A live stream {s[i]}i≥0 of incoming source packets at the rate R Kbps is input into an encoder. Each source packet is of size δ Kb, and thus source packet s[i] is observed during the time interval t ∈ [iδ/R, (i + 1)δ/R). The incoming stream of packets is mapped to a coded stream {x[i]} whose rate is (1 + α)R Kbps. We call α > 0 the protection overhead as it characterizes the excess channel bandwidth available for error protection.

In producing the coded packet stream, the encoder selects a particular error correction code to produce parity checks from the video source packets. We study the performance with both MDS codes and streaming erasure codes. Further, for the case of MDS codes, we study two different approaches for combining parity-check packets with video packets. In addition, we also incorporate the SLEP scheme in our analysis, in which case the size of each parity packet is β times the size of each source packet, where β ≤ 1.

1 On the Internet, this assumption sometimes breaks down, e.g. when the packets are routed through different paths. In this case, extra buffering time is required to accommodate this non-ideal case.


[Fig. 4 timeline, 0 to 8 s: source stream s[0], . . . , s[8] at 1 Kbps; channel stream carrying source packets s[0], . . . , s[8] at 1 Kbps and parity packets p[0], . . . , p[3] at 1/3 Kbps.]

To characterize the symbol delay incurred, recall that for the MDS code the decoder is required to wait until all the packets within the block have been transmitted. This leads to a duration of

$$T = \frac{2k\delta}{R}, \qquad k = 1, 2, 3, \ldots \qquad (7)$$

To compute the burst duration, recall that the above scheme introduces a total of αk/β parity check packets, so by the MDS property it can recover from a maximum of Bs = αk/β packet erasures. Let us for now assume that the first αk/β packets in a block are erased, where α/β < 1. As we explain below, this assumption in fact provides the worst-case analysis. This yields the burst duration

$$B = \frac{\alpha k}{\beta} \times \delta \times \frac{1}{R} = \frac{\alpha k \delta}{\beta R}, \qquad k = 1, 2, 3, \ldots \qquad (8)$$

Relating B to T, we establish (6).

We finally argue that our assumption that the first αk/β packets in a given block are erased provides us with the worst-case scenario. In particular, note that the maximum delay occurs when the first packet in a block is erased. When α/β ≤ 1, each source packet spans a shorter duration than each parity packet; thus the more source packets the burst spans, the shorter the correctable burst. Hence the burst that begins at the beginning of the block spans the shortest duration. The proposed burst therefore simultaneously minimizes B and maximizes T in (6), yielding the smallest value of B/T.
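The relation between (7), (8) and the bound in (6) can be checked numerically. The following Python sketch evaluates T and B for a given block size and confirms that B = αT/(2β); the function name `bandwidth_splitting_tradeoff` is hypothetical, and exact fractions are used to avoid rounding. The example parameters are those of Fig. 4 (δ = 1 Kb, R = 1 Kbps, α = 1/3, β = 1/2, k = 3).

```python
from fractions import Fraction


def bandwidth_splitting_tradeoff(k: int, delta: Fraction, R: Fraction,
                                 alpha: Fraction, beta: Fraction):
    """Return (T, B) in seconds for MDS codes with bandwidth splitting."""
    T = 2 * k * delta / R               # delay, equation (7)
    B = alpha * k * delta / (beta * R)  # correctable burst, equation (8)
    return T, B


if __name__ == "__main__":
    delta, R = Fraction(1), Fraction(1)
    alpha, beta = Fraction(1, 3), Fraction(1, 2)
    T, B = bandwidth_splitting_tradeoff(3, delta, R, alpha, beta)
    print(f"T = {T} s, B = {B} s, alpha*T/(2*beta) = {alpha * T / (2 * beta)} s")
```

For these parameters the sketch gives T = 6 s and B = 2 s, so the burst of two source packets at the start of the block is exactly the one covered by the two parity packets.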

Fig. 4: An example of the bandwidth splitting scheme with parity packet compression using SLEP. The source packets are grouped into blocks of length 3 and a C(5, 3) MDS code is applied to produce 3 source packets of 1 Kb each and 2 parity packets of 1/2 Kb each (i.e., compression factor β = 1/2). The source packets are transmitted at a rate of 1 Kbps and the parity packets are transmitted at a rate of 1/3 Kbps. The packet p[0] is transmitted right after s[2]. The first block of packets in the source and channel stream is shaded. The transmission of p[1] is completed by time t = 6 s, just after the second group of source packets is transmitted.

block size becomes large. We will include the mis-alignment influence in the experiments in Section 5.

4.1. MDS Codes with Bandwidth Splitting

In this setting, the full bandwidth is split into two portions, used for transmitting the source and parity packets, respectively². The incoming source packets are grouped into blocks of k packets. A systematic C((1 + α/β)k, k) MDS code is applied to this block of k packets to produce αk/β parity packets. Each of the source packets is transmitted at a rate R. This requires a total duration of kδ/R. After all the k packets of a given block have been transmitted, the parity check packets of this block are transmitted in the remaining portion of the available bandwidth at a rate of αR. Each parity packet is of size βδ Kb with a compression factor β. The transmission of the parity packets requires another kδ/R seconds, since αk/β parity packets of βδ Kb each are sent at αR Kbps.

Fig. 4 shows a numerical example of the bandwidth splitting scheme with δ = 1 Kb, R = 1 Kbps, αR = 1/3 Kbps and β = 1/2. In this example the source stream is divided into blocks of 3 packets and a C(5, 3) MDS code is applied across each block. We transmit source packets at 1 Kbps, so that each group requires 3 seconds for transmission. The two parity packets from each group are transmitted at a rate of 1/3 Kbps immediately after the block and their transmission also requires 3 seconds.
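The block parameters and transmission times of the bandwidth splitting scheme follow directly from k, δ, R, α and β. The Python sketch below computes them; the function name `splitting_schedule` and the returned dictionary keys are hypothetical, and the sketch assumes that αk/β is an integer so that a whole number of parity packets is produced.

```python
from fractions import Fraction


def splitting_schedule(k: int, delta: Fraction, R: Fraction,
                       alpha: Fraction, beta: Fraction) -> dict:
    """Block parameters for MDS codes with bandwidth splitting."""
    parity_count = alpha * k / beta
    if parity_count.denominator != 1:
        raise ValueError("alpha * k / beta must be an integer in this sketch")
    parity_size = beta * delta  # Kb per parity packet
    return {
        "n": k + int(parity_count),            # codeword length
        "parity_packets": int(parity_count),
        "source_time": k * delta / R,          # seconds at rate R
        "parity_packet_time": parity_size / (alpha * R),
        "parity_time": parity_count * parity_size / (alpha * R),
    }


if __name__ == "__main__":
    # Parameters of the numerical example: delta = 1 Kb, R = 1 Kbps,
    # alpha = 1/3, beta = 1/2, blocks of k = 3 source packets.
    schedule = splitting_schedule(3, Fraction(1), Fraction(1),
                                  Fraction(1, 3), Fraction(1, 2))
    for name, value in schedule.items():
        print(f"{name}: {value}")
```

With the example parameters this yields a C(5, 3) code whose source and parity packets each take 3 seconds to transmit per block, in agreement with the numbers quoted above.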

4.2. MDS Codes without Bandwidth Splitting

A different approach to combining parity check packets with the source packets is considered in this section. In particular, both the source packets and parity packets are transmitted using the full bandwidth. The source packet stream is grouped into blocks of k packets. A systematic C((1 + α/β)k, k) MDS code is applied to this block of k packets to produce αk/β parity packets. The source packet is of size δ Kb; the parity packet is of size βδ Kb, generated with SLEP. Each of the packets is transmitted at a rate of (1 + α)R Kbps. Thus it takes a total of kδ/R seconds to transmit each block consisting of k source packets and αk/β parity packets. Note that, since each of the source packets is transmitted at a higher rate than its generating rate, some initial buffering is needed at the encoder, where source packets of the current block are stored. This introduces an initial delay ΔT at the encoder.

Fig. 5 shows an example. The source stream is divided into blocks of 3 packets and a C(5, 3) MDS code is applied across each block to generate 5 packets. Each source packet is 1 Kb and each parity packet is 1/2 Kb, all transmitted over the channel at a rate of 4/3 Kbps. Thus the transmission time for each block is 3 seconds. In order to ensure that no source packet appears in the channel before it appears in the source stream, an initial delay of ΔT = 3/4 seconds at the encoder is needed.
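The timing of this scheme can be reproduced with a short Python sketch. The block transmission time follows from the text; the rule used for the initial delay is our own reading of the requirement above, namely that ΔT is the smallest delay for which no source packet finishes transmission on the channel before it has fully arrived at the encoder. Under that assumption the sketch reproduces ΔT = 3/4 s for the example. The function name `full_bandwidth_timing` is hypothetical.

```python
from fractions import Fraction


def full_bandwidth_timing(k: int, delta: Fraction, R: Fraction,
                          alpha: Fraction, beta: Fraction):
    """Return (block_time, initial_delay) in seconds."""
    channel_rate = (1 + alpha) * R
    parity_count = alpha * k / beta
    block_bits = k * delta + parity_count * beta * delta
    block_time = block_bits / channel_rate

    # Assumption (ours): source packet i has fully arrived at (i + 1) * delta / R
    # and, with initial delay dT, finishes on the channel at
    # dT + (i + 1) * delta / channel_rate. Choose the smallest dT for which
    # the channel never runs ahead of the source within a block.
    initial_delay = max(
        (i + 1) * delta / R - (i + 1) * delta / channel_rate
        for i in range(k)
    )
    return block_time, initial_delay


if __name__ == "__main__":
    block_time, initial_delay = full_bandwidth_timing(
        3, Fraction(1), Fraction(1), Fraction(1, 3), Fraction(1, 2))
    print(f"block time = {block_time} s, initial delay = {initial_delay} s")
```

The binding constraint is the last source packet of the block, which is why the required buffering grows with the block size k.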

Analysis

The performance tradeoff between the burst-erasure correction and delay for this scheme is stated below.

Proposition 1 The MDS code with bandwidth splitting is guaranteed to correct an erasure burst of length B seconds with an end-to-end delay of T seconds provided

$$B \le \frac{\alpha T}{2\beta}, \qquad (6)$$

where we assume that α/β ≤ 1 and the feasible values of T are given by (7).

2 Bandwidth splitting is primarily a conceptual tool. It corresponds to the case where the source and parity packets are sent independently.


[Fig. 5 timeline: source stream s[0], s[1], . . . at 1 Kbps; channel stream at 4/3 Kbps carrying each block of three source packets together with its two parity packets, e.g., s[0], s[1], s[2], p[0], p[1].]

The associated delay  is  i δ k − 1+α R + ΔT T = ΔT


0≤i