About Priority Encoding Transmission
Stéphane Boucheron, Mohammad Reza Salamatian
Abstract
Recently, Albanese et al. introduced Priority Encoding Transmission (PET) for sending hierarchically organized messages over lossy packet-based computer networks [1]. In a PET system, each symbol in the message is assigned a priority which determines the minimal number of codeword symbols that is required to recover that symbol. This note revisits the PET approach using tools from Network Information Theory. We first outline that Priority Encoding Transmission is intimately related to the Broadcast Erasure Channel with Degraded Message Set. Using the information spectrum approach, we provide an informational characterization of the capacity region of general Broadcast Channels with Degraded Message Set. We show that the PET inequality has an information-theoretical counterpart: the inequality defining the capacity region of the broadcast erasure channel with degraded message sets. Hence the PET approach, which consists in time-sharing and interleaving classical erasure-resilient codes, achieves the capacity region of this channel. Moreover, we show that the PET approach may achieve the sphere packing exponents. Finally we observe that on some simple non-stationary broadcast channels, time-sharing may be outperformed. The impact of memory on the optimality of the PET approach remains elusive.
Keywords: Erasure-resilient codes, broadcast channels, coding exponents, Priority Encoding Transmission, information spectrum.
LRI, CNRS UMR 8623, Bât 490, Université Paris-Sud, 91405 Orsay, France. E-mail: {bouchero, [email protected]. This research was partially supported by CNET under grant 96 1-B 212, and Esprit Working Group RAND II.
I. Introduction
The quality of packet voice and image on the Internet has been mediocre due, in part, to congestion-induced packet losses. From the end-user viewpoint, the Internet can actually be modeled as an erasure channel acting over the large input alphabet formed by IP packets. On an erasure channel, each input symbol is either faithfully transmitted or erased, independently of its value. The output alphabet contains the input alphabet plus a special symbol denoting erasure. Although retransmission upon request (ARQ) has traditionally been the way to turn computer networks into reliable channels, the delay requirements of multimedia applications eliminate the possibility of retransmission and renew the interest in forward error correction (FEC) [2]. Potential users of FEC also have to take into account that standard multimedia compression techniques [4] introduce a hierarchical structure in the information source. The Priority Encoding Transmission (PET) approach has been motivated by the search for robust multicasting of digital video sequences conforming to the MPEG standard [1]. In a first approximation, the different sorts of frames constituting a GOP (Group Of Pictures in the MPEG methodology [4]) are assumed to form independent sources (this would be true if compression were perfect). When using PET, the
source information is protected in such a way that even receivers undergoing high loss rates can reconstruct essential parts of the source flow (for example I-frames), while receivers undergoing lower loss rates can reconstruct most of the flow. This note outlines the connection between the pragmatically motivated PET approach and the broadcast channel with degraded message set described in multi-user information theory [5]. In such a broadcast channel, one transmitter tries to send $k$ independent messages $m_0, \ldots, m_{k-1}$ (corresponding to different priority levels) to $k$ receivers (enjoying different reception conditions). The messages are multiplexed by a channel encoder into a sequence of input symbols $x = f(m_0, \ldots, m_{k-1})$. The $i$th receiver ($0 \le i < k$) gets a corrupted version $y_i$ of the input $x$, and tries to reconstruct messages $m_0, \ldots, m_i$. At least from an intuitive viewpoint, the task faced by Priority Encoding Transmission and the task of coping with broadcast channels are similar.
To pursue the comparison, let us recall more precisely the PET code description. A PET code over alphabet $\mathcal{X}$, with message length $m$, codelength $n$ and non-decreasing priority function $\rho$ is a pair of mappings $f$ (encoder) from $\mathcal{X}^m$ to $\mathcal{X}^n$ and $\phi$ (decoder) from $(\mathcal{X} \cup \{e\})^n$ to $\mathcal{X}^m \cup \{\text{reject}\}$ such that if $w'$ is obtained from $f(w)$ by erasing at most $n - \rho(i)$ symbols, then $\phi(w')$ coincides with $w$ at least on the first $i$ symbols. The values in the range of a priority function are called the priority levels. In the sequel, the $i$th level of the priority function is denoted by $\rho_i$ for $0 \le i < k$. The rate (resp. normalized rate) of a code represents the number of information bits (resp. symbols) per codeword symbol. Formally, the rate $R$ (resp. normalized rate $\tilde{R}$) of a code of length $n$ with $M$ codewords over alphabet $\mathcal{X}$ is $\frac{1}{n} \log M$ (resp. $\frac{1}{n} \log_{|\mathcal{X}|} M$), all logarithms being taken in base 2.
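The definitions above can be made concrete with a minimal Python sketch. The function names and the encoding of the priority function as a sorted list are illustrative assumptions, not notation taken from [1]. The sketch computes the rate and the normalized rate of a code, and the length of the message prefix that the PET definition guarantees when a given number of codeword symbols survive.

```python
import bisect
import math


def rate(num_codewords, n):
    """Rate in bits per codeword symbol: (1/n) * log2(M)."""
    return math.log2(num_codewords) / n


def normalized_rate(num_codewords, n, alphabet_size):
    """Normalized rate in message symbols per codeword symbol: (1/n) * log_|X|(M)."""
    return math.log2(num_codewords) / math.log2(alphabet_size) / n


def guaranteed_prefix(rho, received):
    """Length of the message prefix guaranteed by the PET definition.

    `rho` is a hypothetical list encoding of a non-decreasing priority
    function: rho[j] is the number of surviving codeword symbols needed to
    recover message symbol j + 1. If `received` symbols survive, the first
    i message symbols are guaranteed as soon as rho(i) <= received, so the
    answer is the number of entries of `rho` not exceeding `received`.
    """
    return bisect.bisect_right(rho, received)


# Example: 8**3 codewords of length 7 over an alphabet of size 8.
example_rate = rate(8 ** 3, 7)                   # 9/7 bits per symbol
example_norm = normalized_rate(8 ** 3, 7, 8)     # 3/7 symbols per symbol
example_prefix = guaranteed_prefix([2, 2, 4, 4, 4, 6], 4)
```

With the priority list `[2, 2, 4, 4, 4, 6]`, a receiver that gets four codeword symbols is guaranteed the first five message symbols, while a receiver that gets a single symbol is guaranteed none.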
Following [1], a tuple of normalized rates $(\tilde{R}_0, \ldots, \tilde{R}_{k-1})$ is achieved by a PET system if and only if there are exactly $n \tilde{R}_i$ symbols from the message that are protected at level $i$. Though PET was not presented in a Shannon-theoretical perspective, the following relevant inequality is proved in [1, theorems 3.3, 5.4]. For any PET system:
\[ \sum_{i=0}^{k-1} \frac{n \tilde{R}_i}{\rho_i} \le 1. \tag{1} \]
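As a small numerical check of inequality (1), the following Python sketch evaluates its left-hand side exactly with rational arithmetic. The function names and the representation of a PET system by two lists (the number of message symbols protected at each level, and the corresponding priority levels) are assumptions made for illustration.

```python
from fractions import Fraction


def pet_sum(symbols_per_level, levels):
    """Left-hand side of (1): sum over levels of (n * R_i) / rho_i.

    symbols_per_level[i] plays the role of n * R_i (number of message
    symbols protected at level i); levels[i] plays the role of rho_i.
    """
    return sum(Fraction(s, rho) for s, rho in zip(symbols_per_level, levels))


def satisfies_pet_inequality(symbols_per_level, levels):
    """True when the tuple is compatible with inequality (1)."""
    return pet_sum(symbols_per_level, levels) <= 1


# Two levels: 2 symbols recoverable from any 4 codeword symbols,
# 4 symbols recoverable from any 8 codeword symbols.
tight = pet_sum([2, 4], [4, 8])          # 2/4 + 4/8, meets (1) with equality
too_greedy = satisfies_pet_inequality([3, 4], [4, 8])
```

With a single level the check reduces to requiring at most $\rho_0$ message symbols, which is the Singleton-type constraint $\tilde{R}_0 \le \rho_0 / n$.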
Inequality (1) puts combinatorial limits on finite sets of fixed-length words. When restricted for example to the case of one priority level, it coincides with the Singleton bound: $\tilde{R}_0 \le \rho_0 / n$. Note that the latter bound cannot be achieved by codes of arbitrary length (cf. for example theorem 6.1 in [1]). On a given alphabet, arbitrarily long codes meeting the PET inequality with equality do not exist and thus cannot be used to achieve arbitrarily reliable transmission on lossy broadcast channels. Moreover, the PET approach advocates a very simple method to design broadcast codes. Packets are divided into slots, each slot size defining an alphabet. For each priority level, apply a good point-to-point erasure-resilient code over the small alphabet corresponding to the slot size and then interleave the resulting codewords in the packets. This is a version of the engineering approach called time-sharing in [6]. This approach is known to be non-optimal on many broadcast channels.
Assessing the PET approach from an information-theoretical viewpoint amounts to first checking whether the capacity region of broadcast erasure channels with degraded message sets can be exhausted using Priority Encoding Transmission. In the affirmative, the second natural question is: does error probability decline as fast as possible when PET is used? We first use the information spectrum approach [7] to provide an informational characterization of the capacity region of general broadcast channels with degraded message set (DMS) in section II. This characterization illustrates the relevance to general broadcast channels with memory of superposition codes as introduced in [6]. Then we show in section III that the PET approach achieves the capacity region of memoryless broadcast erasure channels with DMS and moreover, that it may achieve the sphere packing exponent of this channel. The last section shows that on some simple non-stationary broadcast erasure channels, time-sharing may be outperformed.
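The time-sharing construction described above (one erasure-resilient code per priority level, interleaved across packets) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of [1]: the per-level code is a Reed-Solomon-style polynomial evaluation code over the prime field with 257 elements, each packet carries exactly one slot per level, and all function names are hypothetical. The number of packets must not exceed 256 so that the evaluation points stay distinct.

```python
P = 257  # prime modulus: slot symbols are integers in [0, 257) (illustrative choice)


def encode_level(message, n):
    """Evaluate the polynomial with coefficients `message` at x = 1..n (mod P)."""
    return [sum(c * pow(x, d, P) for d, c in enumerate(message)) % P
            for x in range(1, n + 1)]


def decode_level(slots, k):
    """Recover k coefficients from any k surviving slots (None marks an erasure)."""
    points = [(x, y) for x, y in enumerate(slots, start=1) if y is not None][:k]
    if len(points) < k:
        return None  # too many erasures for this priority level
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1  # Lagrange basis numerator, low degree first
        for j, (xj, _) in enumerate(points):
            if j != i:
                # multiply the numerator by (X - xj)
                basis = [(a - xj * b) % P for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P  # division via Fermat inverse
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs


def pet_encode(messages, n):
    """One codeword per level, interleaved: packet j carries slot j of each codeword."""
    codewords = [encode_level(m, n) for m in messages]
    return [[cw[j] for cw in codewords] for j in range(n)]


def pet_decode(packets, level_sizes):
    """Decode each level independently; a lost packet is represented by None."""
    decoded = []
    for i, k in enumerate(level_sizes):
        slots = [None if p is None else p[i] for p in packets]
        decoded.append(decode_level(slots, k))
    return decoded


messages = [[10, 20], [1, 2, 3, 4]]          # level 0 needs 2 packets, level 1 needs 4
packets = pet_encode(messages, 6)
heavy_loss = [None, None, None, None] + packets[4:]
light_loss = [None] + packets[1:3] + [None] + packets[4:]
```

A receiver seeing `heavy_loss` (two packets out of six) recovers only the high-priority message, while a receiver seeing `light_loss` (four packets out of six) recovers both.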
II. Broadcast channel with degraded message set

A. General definitions

In the sequel $\mathbf{X}, \mathbf{Y}$ denote families of input processes and their corresponding output through a channel $\mathbf{W}$. For each $n$, $X^n$ denotes a random variable over $\mathcal{X}^n$ and $Y^n$ is distributed over $\mathcal{Y}^n$ according to $W^n(\cdot \mid X^n)$. $X^n(i), Y^n(i)$ denote the $i$th elements of $X^n, Y^n$.

The entropy of a random variable $X$, $H(X)$, is defined by $H(X) = -\sum_i \Pr\{X = i\} \log \Pr\{X = i\}$. The mutual information between $X$ and $Y$ is defined by:
\[ I(X; Y) = \sum_{x,y} \Pr\{X = x \wedge Y = y\} \log \frac{\Pr\{X = x \wedge Y = y\}}{\Pr\{X = x\} \Pr\{Y = y\}}. \]

The information spectrum approach has been proposed recently to handle systems with general dependencies [7]. It relies on the notion of liminf in probability: $\operatorname*{p-lim\,inf}_{n \to \infty} X_n$ is defined as the supremum of $\alpha$ such that $\limsup_{n \to \infty} \Pr\{X_n < \alpha\} = 0$. Given two random processes $\mathbf{X}$ and $\mathbf{Y}$:
\[ I(\mathbf{X}; \mathbf{Y}) = \operatorname*{p-lim\,inf}_{n \to \infty} \frac{1}{n} \log \frac{\Pr\{X^n = x^n \wedge Y^n = y^n\}}{\Pr\{X^n = x^n\} \Pr\{Y^n = y^n\}}. \]

[...] different levels of transmission reliability) will be indexed using boldface indices $\mathbf{i}, \ldots, \mathbf{k}$.

Definition 1: A $k$-ary broadcast channel $\mathbf{W}$ consists in a sequence of joint probability transitions $W^n(Y_0^n \ldots Y_{k-1}^n \mid X^n)$ from $\mathcal{X}^n$ towards $\mathcal{Y}^{nk}$. The marginal probability transitions $W_i(Y_i^n \mid X^n)$ are called the component channels. A degraded message set can be transmitted at rate $(R_0, \ldots, R_{k-1})$ over $\mathbf{W}$ with error probability $\epsilon$ if and only if there exists a family of (broadcast) codes $f^n(m_0, \ldots, m_{k-1}) \mapsto x$ and $\phi_i^n(y_i) = (\hat{m}_0, \ldots, \hat{m}_i)$, such that for almost all block-lengths $n$, $\liminf \frac{\log |M_i|}{n} \ge R_i$ and for all $i < k$, the error probability experienced by the $i$th receiver satisfies:
\[ \limsup_{n} e_i(f^n, \phi_i^n) \le \epsilon, \qquad e_i(f^n, \phi_i^n) = \max_{m_0, \ldots, m_{k-1}} W_i^n\{\phi_i^n(y_i) \ne (m_0, \ldots, m_i) \mid f^n(m_0, \ldots, m_{k-1})\}. \]
The rate-tuple $(R_i)$ is then said to be $\epsilon$-achievable over $\mathbf{W}$. A broadcast code with blocklength $n$, rates $(R_i)$ and error rate smaller than $\epsilon$ for all receivers on channel $\mathbf{W}$ is called an $(n, R_0, \ldots, R_{k-1}, \epsilon)$-code over $\mathbf{W}$. A tuple of rates is achievable if it is $\epsilon$-achievable for all $\epsilon > 0$.

Remarks.
1) From the definition, it is immediate that the set of achievable (resp. $\epsilon$-achievable) rates is closed.
2) For broadcast channels, achievable rates do not depend on whether we consider average (over codewords) or worst-case error probability (cf. [5], where no assumption on channel memory is made). In the sequel, we will adopt the most convenient viewpoint depending on the situation.

The memoryless broadcast channel with degraded message set has received a single-letter characterization [8], [5, theorems III.4.1 & 3]. The information spectrum approach allows one to give an informational (though not computational) characterization of operationally defined achievable rates over general broadcast channels.

Let us now define the class $\mathcal{S}$ of families of input processes $\mathbf{U}_0, \ldots, \mathbf{U}_{k-2}, \mathbf{U}_{k-1}$ with $\mathbf{U}_{k-1} = \mathbf{X}$ and corresponding output families $(\mathbf{Y}_i)_{i=0}^{k-1}$ as follows. For every block length $n$, $U_i^n$ is independent from $U_{i+2}^n, \ldots, X^n, Y^n$ conditionally on $U_{i+1}^n$, and $Y^n$ is distributed according to $W^n(\cdot \mid X^n)$. Let us insist on the fact that the variables $U_i^n$ ($i < k-1$) do not necessarily live in an $n$-dimensional product space. Those families of input processes are general since no consistency constraint between processes with different indices $n$ and $m$ is imposed. The structure of input families is inspired from the superposition codes construction [6].

[...]
\[ 0 \le R_i \le I(\mathbf{U}_i; \mathbf{Y}_i \mid \mathbf{U}_{i-1, \ldots, 0}) \quad \text{for } i < k, \]
\[ 0 \le \sum_{j \le i} R_j \le I(\mathbf{U}_i; \mathbf{Y}_i) \quad \text{for } i < k. \]
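The entropy and mutual information recalled in the general definitions of this section can be evaluated numerically. The Python sketch below is an illustration with hypothetical function names; it applies them to a single use of a memoryless erasure channel with uniform input over $q$ symbols and erasure probability `loss`, for which the mutual information is known to equal $(1 - \text{loss}) \log q$.

```python
import math


def entropy(dist):
    """H(X) = -sum_i Pr{X = i} log2 Pr{X = i}; `dist` maps values to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal of X
        py[y] = py.get(y, 0.0) + p   # marginal of Y
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def erasure_joint(q, loss):
    """Joint law of (input, output): uniform input over q symbols, erasure symbol 'e'."""
    joint = {}
    for x in range(q):
        joint[(x, x)] = (1 - loss) / q   # symbol received intact
        joint[(x, "e")] = loss / q       # symbol erased
    return joint


uniform_entropy = entropy({x: 0.25 for x in range(4)})
erasure_information = mutual_information(erasure_joint(4, 0.25))
```

For $q = 4$ and an erasure probability of $0.25$ the computed mutual information is $0.75 \times 2 = 1.5$ bits, up to floating-point rounding.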
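[...] for any $y_0$:
\[ \sum_{u_0 : (u_0, y_0) \in T_1^n} \Pr\{u_0\} \le \frac{e^{-n\gamma}}{M_0}. \tag{13} \]
Plugging (13) into (12) gives:
\[ e_0 \le \Pr\{T_1^{n,c}\} + e^{-n\gamma}. \tag{14} \]
A similar argument is developed for the decoding error probability over $W_1^n$.
\[ e_1 \le \Pr\{T_2^{n,c} \text{ or } T_3^{n,c}\} + M_1 M_0 \sum_{u_1, x_{1,1}, y_1} \Pr\{u_1, x_{1,1}, y_1\} \sum_{u_0, x_0 : (x_0, y_1) \in T_2^n} \Pr\{u_0, x_0\} + M_1 \sum_{u_1, x_{1,1}, y_1} \Pr\{u_1, x_{1,1}, y_1\} \sum_{x_0 : (u_1, x_0, y_1) \in T_3^n} \Pr\{x_0 \mid u_1\}. \tag{15} \]
But by definition of $T_2^n$ and $T_3^n$, for any $y_1$ and for any $u_0, x_0$: if $(x_0, y_1) \in T_2^n$, then $W_1^n(y_1 \mid x_0) > e^{n\gamma} M_0 M_1 \Pr\{y_1\}$, and if $(u_1, x_0, y_1) \in T_3^n$, then $W_1^n(y_1 \mid x_0) > e^{n\gamma} M_1 \Pr\{y_1 \mid u_1\}$; hence
\[ \sum_{u_0, x_0 : (x_0, y_1) \in T_2^n} \Pr\{u_0, x_0\} \le \frac{e^{-n\gamma}}{M_0 M_1} \tag{16} \]
and
\[ \sum_{x_0 : (u_1, x_0, y_1) \in T_3^n} \Pr\{x_0 \mid u_1\} \le \frac{e^{-n\gamma}}{M_1}. \tag{17} \]
Plugging (16, 17) into (15):
\[ e_1 \le \Pr\{T_2^{n,c} \text{ or } T_3^{n,c}\} + 2 e^{-n\gamma}. \]
This terminates the proof of lemma 1.

[...]
\[ \sum_{y_0} \sum_{i=1}^{M_0} \Pr\{(u_i, y_0) \in L_1^n \text{ and } \phi_0^n(y_0) = i\} + \sum_{y_1} \sum_{i=1, j=1}^{M_0, M_1} \Pr\{(u_i, x_j, y_1) \in L_2^n \text{ and } \phi_1^n(y_1) = (i, j)\} + \sum_{y_1} \sum_{i=1, j=1}^{M_0, M_1} \Pr\{(u_i, x_j, y_1) \in L_3^n \text{ and } \phi_1^n(y_1) = (i, j)\}. \]
Now, using first the definition of $L_1^n$, then the fact that each $y_0$ belongs to at most one decoding set:
\[ \sum_{y_0} \sum_{i=1}^{M_0} \Pr\{(u_i, y_0) \in L_1^n \text{ and } \phi_0^n(y_0) = i\} = \sum_{i=1}^{M_0} \sum_{y_0 : \phi_0^n(y_0) = i} \Pr\{(u_i, y_0) \in L_1^n\} \le \sum_{i=1}^{M_0} \sum_{y_0 : \phi_0^n(y_0) = i} \Pr\{y_0\} e^{-n\gamma} \le \sum_{y_0} \Pr\{y_0\} e^{-n\gamma}. \]
Similar arguments are applied to the second and third summand, completing the proof of the lemma.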
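The bounds obtained in the proof of lemma 1 have the form "probability of the atypical set plus an exponentially small term". As a hedged illustration, the Python sketch below evaluates the point-to-point analogue of such a bound (Feinstein's lemma) on a single-user memoryless erasure channel; it is not the broadcast bound of the lemma itself. The symbol $\gamma$ for the slack parameter, the function names, and the choice of i.i.d. uniform inputs are assumptions. Rates are expressed in nats so that the term $e^{-n\gamma}$ can be used directly.

```python
import math


def binomial_cdf(k, n, p):
    """Pr{Bin(n, p) <= k}, computed by direct summation."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(min(k, n) + 1))


def feinstein_bound(n, q, loss, rate_nats, gamma):
    """Pr{(1/n) i(X^n; Y^n) <= R + gamma} + exp(-n * gamma), capped at 1.

    For i.i.d. uniform inputs over q symbols on an erasure channel, the
    information density equals (number of unerased symbols) * ln(q), so the
    atypical event is a lower tail of a Binomial(n, 1 - loss) variable.
    """
    max_survivors = math.floor(n * (rate_nats + gamma) / math.log(q))
    atypical = binomial_cdf(max_survivors, n, 1 - loss)
    return min(1.0, atypical + math.exp(-n * gamma))


q, loss, gamma = 4, 0.25, 0.1
below_capacity = 0.5 * math.log(q)   # capacity is (1 - loss) * ln(q) nats
above_capacity = 0.9 * math.log(q)

short_block = feinstein_bound(50, q, loss, below_capacity, gamma)
long_block = feinstein_bound(400, q, loss, below_capacity, gamma)
hopeless = feinstein_bound(50, q, loss, above_capacity, gamma)
```

Below capacity the bound shrinks as the block length grows, whereas above capacity the atypical-set probability stays close to one and the bound is uninformative.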