Joint Source-Channel Coding for Deep Space Image Transmission using Rateless Codes

Ozgun Y. Bursalioglu, Giuseppe Caire
Ming Hsieh Department of Electrical Engineering, University of Southern California
Email: {bursalio, caire}@usc.edu

Dariush Divsalar
Jet Propulsion Laboratory, California Institute of Technology
Email: [email protected]
Abstract—A new coding scheme for image transmission over a noisy channel is proposed. Similar to standard image compression, the scheme includes a linear transform followed by embedded scalar quantization. Joint source-channel coding is implemented by optimizing the rate allocation across the source subbands, treated as the components of a parallel source model. The quantized transform coefficients are linearly mapped into channel symbols, using systematic linear encoders of appropriate rate. This fixed-to-fixed length "linear index coding" approach avoids the use of an explicit entropy coding stage (e.g., arithmetic or Huffman coding), which is typically non-robust to post-decoding residual errors. Linear codes over GF(4) are particularly suited for this application, since they are matched to the alphabet of the quantization indices of the dead-zone embedded quantizers used in the scheme, and to the QPSK modulation used on the deep-space communication channel. Therefore, we optimize a family of systematic Raptor codes over GF(4), which allow for a continuum of coding rates, in order to adapt to the quantized source entropy rate (which may differ from image to image) and to the channel capacity. Comparisons are provided with respect to the concatenation of state-of-the-art image coding and channel coding schemes used by the Jet Propulsion Laboratory (JPL) for the Mars Exploration Rover (MER) Mission.
I. INTRODUCTION

In conventional image transmission over noisy channels, the source compression and channel coding stages are designed and applied separately. Image coding is usually implemented by a linear transformation (transform coding), followed by quantization of the transform coefficients, and by entropy coding of the resulting redundant discrete source formed by the sequence of quantization indices. Due to the catastrophic behavior of standard entropy encoding/decoding schemes, even if only a few bits are in error after the channel decoder, the decompressed source transform coefficients are dramatically corrupted, resulting in a substantially useless reconstructed image. In order to prevent this catastrophic error propagation, the source is partitioned into blocks, such that errors are confined within blocks, at the cost of some extra redundancy. In order to preserve integrity, which is a strict requirement in deep-space exploration scientific missions, blocks affected by errors are requested for retransmission, at the cost of significant extra delay and power expenditure. Furthermore, due to the typically sharp waterfall BER behavior of powerful modern codes, when channel conditions change due to different atmospheric conditions or antenna misalignment, the resulting post-decoding BER may rapidly degrade, producing a sequence of highly corrupted images that need retransmission.

In this paper we consider a scheme for Joint Source-Channel Coding (JSCC) that avoids conventional entropy coding. This JSCC scheme [1], [2] consists of a wavelet transform, a scalar embedded quantizer, and linear encoding of the quantization indices. These three components are examined in Sections III, IV and V. Before getting into the details of the transform, quantizer and linear code design, Section II introduces the notation used throughout the paper and defines the relevant system optimization problem for JSCC, based on the concatenation of embedded quantization and channel coding in general, for practical quantization defined in terms of an operational rate-distortion function and practical coding defined in terms of a certain overhead from capacity.

This research was carried out in part at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA. The work by the University of Southern California and JPL was funded through the NASA/JPL/DRDF/SURP Program.

II. SYSTEM SETUP

For the problem at hand, the deep-space transmission channel is represented by the discrete-time complex baseband equivalent model

y_i = sqrt(Es) x_i + z_i,   (1)

where y_i ∈ C, x_i is a QPSK symbol, and z_i ∼ CN(0, N0) is a complex circularly symmetric AWGN sample.

Consider a "parallel" source formed by s independent components. A block of source symbols of length K is denoted by S ∈ R^{s×K}, where the i-th row of S, denoted by S^(i) = (S_1^(i), ..., S_K^(i)), corresponds to a source component of the parallel source. An (s × K)-to-N source-channel code for transmitting the source block S over the channel (1) is formed by an encoding function that maps S into a codeword of N QPSK symbols, and by a decoding function that maps the sequence of channel outputs (y_1, ..., y_N) into the reconstructed source block Ŝ.
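As a numerical sanity check on the channel model (1), the QPSK/AWGN capacity C used in the rate allocation below can be estimated by Monte Carlo. The following is a minimal sketch (Python with numpy; the function name, sample size and the unit-energy constellation are our own choices, not from the paper):

```python
import numpy as np

def qpsk_awgn_capacity(snr_db, n=200_000, seed=0):
    """Monte Carlo estimate of I(X; Y) in bits/channel symbol for
    y = sqrt(Es) x + z with uniform QPSK input and z ~ CN(0, 1)."""
    rng = np.random.default_rng(seed)
    es = 10.0 ** (snr_db / 10.0)                       # Es with N0 = 1
    const = np.exp(1j * np.pi * (2 * np.arange(4) + 1) / 4)   # unit energy
    idx = rng.integers(0, 4, n)
    x = const[idx]
    z = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
    y = np.sqrt(es) * x + z
    # log p(y | x') up to a common additive constant (N0 = 1)
    logp = -np.abs(y[:, None] - np.sqrt(es) * const[None, :]) ** 2
    m = logp.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
    # I(X;Y) = log2(4) - E[log2( sum_x' p(y|x') / p(y|x_sent) )]
    return 2.0 - np.mean(log_sum - logp[np.arange(n), idx]) / np.log(2.0)
```

At high SNR the estimate approaches 2 bits/symbol (the QPSK limit), and it degrades gracefully toward 0 as the SNR drops, which is the regime swept over the deep-space capacity grid later in the paper.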
We consider a Weighted MSE (WMSE) distortion measure defined as follows. Let the MSE for the i-th source component be given by d_i = (1/K) E[ ||S^(i) − Ŝ^(i)||^2 ]. Then, the WMSE distortion at the decoder is given by

D = (1/s) ∑_{i=1}^s v_i d_i,   (2)

where {v_i ≥ 0 : i = 1, ..., s} are weights that depend on the specific application. In our case, these coefficients correspond to the weights of a bi-orthogonal wavelet transform, as explained in Section III. Let r_i(·) denote the R-D function of the i-th source component with respect to the MSE distortion. Then the R-D function of S with respect to the WMSE distortion is given by

R(D) = min (1/s) ∑_{i=1}^s r_i(d_i),  subject to (1/s) ∑_{i=1}^s v_i d_i = D,   (3)

where the optimization is with respect to {d_i ≥ 0 : i = 1, ..., s}, corresponding to the individual MSE distortions of the source components. For example, for parallel Gaussian sources and equal weights (v_i = 1 for all i), (3) yields the well-known "reverse waterfilling" formula (see [3, Theorem 10.3.3]). For a family of successive refinement codes with R-D functions {r_i(d) : i = 1, ..., s}, assumed to be convex and non-increasing [4], and identically zero for d > σ_i^2 := (1/K) E[ ||S^(i)||^2 ], the operational R-D function of the parallel source S is also given by (3). Therefore, in the following, R(D) is used to denote the actual operational R-D function of some specific, possibly suboptimal, successive refinement code. For a source with a total number of samples equal to Ks, encoded and transmitted over the channel using N channel uses, we define b = N/(Ks) as the number of channel-coded symbols per pixel (spp). Hence, b is a measure of the system bandwidth efficiency. Obviously, the minimum distortion D that can be achieved at channel capacity C and bandwidth expansion b is given by D = R^{-1}(bC).
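For parallel Gaussian sources with equal weights, the solution of (3) is the classical reverse waterfilling allocation d_i = min(λ, σ_i^2). A minimal numerical sketch (Python with numpy; the bisection routine and its name are our own, not from the paper):

```python
import numpy as np

def reverse_waterfill(var, R, iters=100):
    """Equal-weight reverse waterfilling for parallel Gaussian sources:
    d_i = min(lam, var_i), with the water level lam found by bisection so
    that the average rate (1/s) * sum_i 0.5*log2(var_i/d_i) equals R."""
    var = np.asarray(var, dtype=float)
    lo, hi = 1e-12, float(var.max())
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        d = np.minimum(lam, var)
        rate = float(np.mean(0.5 * np.log2(var / d)))
        if rate > R:
            lo = lam      # spending too much rate -> raise the water level
        else:
            hi = lam
    return np.minimum(lam, var)
```

For example, with variances (4, 1, 0.25) and R = 1 bit/sample the water level settles at λ = 0.25, so all components are reconstructed at the same distortion; at lower rates the smallest-variance components are left at their variance (zero rate).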
Figure 1. Partitioning of an image into 10 subbands and 64 source components, for W = 3.
The subband LL0 (which coincides with the first source component of the parallel source model) consists roughly of a decimated version of the original image. As explained in [7], in order to obtain better compression in the transform domain, a Discrete Cosine Transform (DCT) is applied to subband LL0 so that its energy is "packed" into very few, very high-valued coefficients (note the nonzero density points indicated by arrows in Fig. 2-b). These high-valued coefficients are separately transmitted as part of the header and are not considered here. The remaining coefficients show sample statistics similar to those of the other subbands (see Fig. 2-c).
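The energy-packing effect of the DCT on a smooth, LL0-like signal can be illustrated in a few lines. The sketch below (Python with numpy; the orthonormal DCT-II construction and the toy test signal are our own illustration, not JPL's actual pipeline):

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II analysis matrix (n x n), built explicitly to
    avoid external dependencies."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    M = np.cos(np.pi * k * (2 * x + 1) / (2 * n)) * np.sqrt(2.0 / n)
    M[0] /= np.sqrt(2.0)     # DC row normalization
    return M

# Energy compaction on a smooth (LL0-like) signal: almost all of the
# energy lands in a handful of low-frequency coefficients.
n = 64
sig = np.linspace(0.0, 1.0, n) + 0.1 * np.sin(2 * np.pi * np.arange(n) / n)
coef = dct2_matrix(n) @ sig
energy = np.sort(coef ** 2)[::-1]
frac = energy[:4].sum() / energy.sum()   # fraction held by the top 4 coefficients
```

For such a signal the top few coefficients carry well over 90% of the energy, which is exactly why the largest LL0 coefficients can be shipped in the header and the residue treated like the other subbands.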
III. WAVELET TRANSFORM
The image is decomposed into a set of "parallel" source components by a Discrete Wavelet Transform (DWT). Here, we use the same DWT as JPEG2000 [5]. With W levels of DWT, the transformed image is partitioned into 3W + 1 "subbands". A subband decomposition example is given in Fig. 1 for W = 3. This produces 3W + 1 = 10 subbands, indicated in the figure by LL0, HL1, LH1, HH1, HL2, LH2, HH2, HL3, LH3, HH3, respectively. The subbands have different lengths, all multiples of the LL0 subband length. For simplicity, here we partition the DWT into source components of the same length, all equal to the length of the LL0 subband. This yields s = 2^{2W} source component blocks of length K = K^2/s, where K × K indicates the size of the original image. Since the DWT is a bi-orthogonal transform, the MSE distortion in the pixel domain is not equal to the MSE distortion in the wavelet domain. In our case, for W = 3, the weight of a source component block in subband w ∈ {1, ..., 10} is given by the w-th coefficient of the vector

[l^6, l^5 h, l^5 h, l^4 h^2, l^3 h^3, l^3 h^3, l^2 h^2, l h, l h, h^2],

where, for the particular DWT considered in this work (namely, the CDF 9/7 wavelet [6] used by JPEG2000 for lossy compression), we have l = 1.96 and h = 2.08.
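The bookkeeping above is easy to sanity-check numerically. A short sketch (Python; the image side length K_img = 512 is a hypothetical example value, and the variable names are ours):

```python
# Parallel-source bookkeeping for W = 3 levels of the CDF 9/7 DWT,
# using the synthesis gains quoted in the text (l = 1.96, h = 2.08).
l, h = 1.96, 2.08
W = 3
n_subbands = 3 * W + 1                 # 10 subbands: LL0, HL1, ..., HH3
s = 2 ** (2 * W)                       # 64 equal-length source components
K_img = 512                            # hypothetical image side length
K = K_img ** 2 // s                    # samples per source component
# WMSE weights v_w in the subband order LL0, HL1, LH1, HH1, ..., HH3:
v = [l**6, l**5 * h, l**5 * h, l**4 * h**2, l**3 * h**3, l**3 * h**3,
     l**2 * h**2, l * h, l * h, h**2]
```

Note that the LL0 weight l^6 is roughly 57, i.e., errors in the coarsest subband are weighted almost two orders of magnitude more heavily than errors in the finest ones, which drives the rate allocation of Section II.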
Figure 2. a: Histogram of 1st source component, b: Histogram of 1st source component after DCT transform, c: Zoomed in version of b, d-j: Histograms of source components 2 − 7, respectively.
IV. QUANTIZER

The simplest form of quantization employed by JPEG2000 is a special uniform scalar quantizer in which the central cell is twice as wide as the other cells, at every resolution level. Quantizers of this type are called "dead-zone" quantizers. In Fig. 3, an embedded dead-zone quantizer is shown for 3 different resolution levels. The number of cells at any level p is given by 2^{p+1} − 1. We indicate the cell partition at every level by the symbols {0, 1, 2}, as shown in Fig. 3. The scalar quantization function is denoted by Q : R → {0, 1, 2}^P, where 2^{P+1} − 1 is the number of quantization regions at the highest level of refinement.
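The embedded indexing can be made concrete with a small sketch (Python with numpy; the function name, the step-size convention and the clipping rule are our own assumptions, chosen so that level p has 2^{p+1} − 1 cells with a double-width dead zone, as in the text):

```python
import numpy as np

def deadzone_planes(x, delta, P):
    """Embedded dead-zone quantization of x into P ternary symbol-planes.
    Level p uses step 2**(P - p) * delta and index i = sign(x)*floor(|x|/step),
    clipped to |i| <= 2**p - 1: this gives 2**(p+1) - 1 cells per level, with
    the center (dead-zone) cell twice as wide as the others."""
    x = np.asarray(x, dtype=float)
    planes, prev = [], np.zeros(x.shape, dtype=int)
    for p in range(1, P + 1):
        step = 2 ** (P - p) * delta
        idx = np.sign(x).astype(int) * (np.abs(x) // step).astype(int)
        idx = np.clip(idx, -(2 ** p - 1), 2 ** p - 1)
        diff = idx - 2 * prev
        # The dead zone splits three ways (-1, 0, +1 -> symbols 0, 1, 2);
        # every other cell splits two ways (symbols 0 or 1).
        planes.append(np.where(prev == 0, diff + 1, np.abs(diff)))
        prev = idx
    return planes
```

For instance, with delta = 1 and P = 2, the sample x = 2.5 produces symbol 2 in the first plane (positive branch out of the dead zone) and symbol 0 in the refinement plane, while x = −0.5 stays in the dead zone (symbol 1) at both levels.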
Figure 3. Quantization cell indexing for an embedded dead-zone quantizer with p = 1, 2, 3.

Let U^(i) = Q(S^(i)) denote the sequence of ternary quantization indices, formatted as a P × K array. The p-th row of U^(i), denoted by U_(p,:)^(i), is referred to as the p-th "symbol-plane". Without loss of generality, we let U_(1,:)^(i), ..., U_(P,:)^(i) denote the symbol-planes in decreasing order of significance. A refinement level p consists of all symbol-planes from 1 to p. The quantization distortion for the i-th source component at refinement level p is denoted by D_{Q,i}(p).

The quantizer output U^(i) can be considered as a discrete memoryless source, with entropy rate H^(i) = (1/K) H(U^(i)) (in bits/source symbol). Using the chain rule of entropy [3], this can be decomposed as H^(i) = ∑_{p=1}^P H_p^(i), with

H_p^(i) = (1/K) H( U_(p,:)^(i) | U_(1,:)^(i), ..., U_(p−1,:)^(i) ),  p = 1, ..., P.   (4)

Then, the set of R-D points achievable by the concatenation of the embedded scalar quantizer using 0, 1, ..., P quantization levels¹ and an entropy encoder is given by

( ∑_{j=1}^p H_j^(i) , D_{Q,i}(p) ),  p = 0, ..., P,   (5)

¹ Notice: 0 quantization levels indicates that the whole source component is reconstructed at its mean value.

where, by definition, D_{Q,i}(0) = σ_i^2. Using time-sharing, any point in the convex hull of the above achievable points is also achievable. Finally, the operational R-D curve r_i(d) of the scalar quantizer is given by the lower convex envelope of the convex hull of the points in (5). It is easy to see that r_i(d) is a piecewise linear function. Therefore, the resulting function r_i(d) is convex and decreasing on the domain D_{Q,i}(P) ≤ d ≤ σ_i^2. Fig. 4 shows, qualitatively, the typical shape of the functions r_i(d). As seen from Fig. 4, it is possible to represent r_i(d) as the pointwise maximum of the lines joining consecutive points in the set given in (5). Hence, we can write

r_i(d) = max_{p=1,...,P} { a_{i,p} d + b_{i,p} },   (6)

Figure 4. Piecewise linear operational R-D function for the i-th source corresponding to a set of discrete R-D points.

where the coefficients a_{i,p} and b_{i,p} are obtained from (5) (details are trivial, and omitted for the sake of brevity). Using (6) in (3), we obtain the operational R-D function of the parallel source as the solution of a linear program. Introducing the auxiliary variables γ_i, the minimum WMSE distortion at capacity C and bandwidth expansion b is given by the Minimum Weighted Total Distortion (MWTD) problem:

minimize    (1/s) ∑_{i=1}^s v_i d_i
subject to  (1/s) ∑_{i=1}^s γ_i ≤ bC,
            D_{Q,i}(P) ≤ d_i ≤ σ_i^2,  ∀ i,
            γ_i ≥ a_{i,p} d_i + b_{i,p},  ∀ i, p.   (7)
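Because each r_i(d) is convex and piecewise linear, (7) can also be solved greedily: spend rate on the (source, plane) segments in order of decreasing weighted-distortion reduction per unit rate, time-sharing on the marginal segment. The sketch below takes this route instead of calling a generic LP solver (Python; the function name and the toy numbers in the test are our own):

```python
def mwtd_allocate(Hc, DQ, v, budget):
    """Greedy solution of the MWTD problem (7) for convex piecewise-linear
    R-D curves.  Hc[i][p]: cumulative rate of source i after p planes
    (Hc[i][0] = 0); DQ[i][p]: MSE after p planes (DQ[i][0] = sigma_i^2);
    v: WMSE weights; budget: total rate s*b*C.
    Returns (planes taken per source, achieved WMSE)."""
    s = len(Hc)
    segs = []
    for i in range(s):
        for p in range(1, len(Hc[i])):
            dR = Hc[i][p] - Hc[i][p - 1]
            dWD = v[i] * (DQ[i][p - 1] - DQ[i][p])
            segs.append((dWD / dR, i, dR, dWD))
    segs.sort(key=lambda t: -t[0])        # steepest distortion drop first
    wmse = sum(v[i] * DQ[i][0] for i in range(s))
    levels = [0.0] * s
    for eff, i, dR, dWD in segs:
        if budget <= 0:
            break
        take = min(1.0, budget / dR)      # fractional take = time sharing
        budget -= take * dR
        wmse -= take * dWD
        levels[i] += take
    return levels, wmse / s
```

Convexity guarantees that, within each source, segment efficiencies are non-increasing in p, so the global sort never takes plane p before plane p − 1; this is the classical bit-allocation argument behind the equivalence with the LP.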
This is a linear program in {d_i} and {γ_i}, which can be solved by standard optimization tools.

Note that the MWTD problem (7) is derived assuming a separate source and channel coding scheme and a capacity-achieving channel code. On the other hand, in the proposed JSCC scheme, each refinement level (symbol-plane) of each source component is encoded separately with some practical code. For example, consider the p-th plane of the i-th source component, whose conditional entropy is given by H_p^(i), and let n_p^(i) denote the number of encoded channel symbols for this plane. The sum ∑_{i=1}^s ∑_{p=1}^P n_p^(i) yields the overall coding block length N (channel symbols). Consistent with the definition of the Raptor code overhead for channel coding applications [8], we define the overhead θ_p^(i) for JSCC as the factor relating n_p^(i) to its ideal value K H_p^(i)/C, as follows:

n_p^(i) = K H_p^(i) (1 + θ_p^(i)) / C.   (8)

As shown in [1], the MWTD problem for JSCC with overheads θ_p^(i) and entropies H_p^(i) takes on the same form as (7), where the coefficients {a_{i,p}, b_{i,p}} correspond to the modified R-D points

( ∑_{j=1}^p H_j^(i) (1 + θ_j^(i)) , D_{Q,i}(p) ),  p = 0, ..., P,   (9)

instead of (5). For given code families and block lengths, the overhead factors θ_p^(i) are experimentally determined, and used in the system design according to the modified optimization problem (7). Fortunately, the overhead factors are only sensitive to the entropy of the plane H_p^(i) and to the coding block length (which is an a priori decided system parameter). Hence, instead of finding θ_p^(i) for each i and p, one can simply find the overheads for different entropy values on a sufficiently fine grid.

To get an idea about the range of entropy values resulting from deep-space image quantization, here we report the conditional entropies of the first source component of an image from the Mars Exploration Rover:²

{H_1^(1), ..., H_8^(1)} = {0.0562, 0.0825, 0.2147, 0.4453, 0.8639, 1.1872, 1.1917, 1.1118}.   (10)
These values span the range of entropy values for all MER images we used in this work, and can be considered as "typical" for this application.

V. CODE DESIGN

In this section we discuss the code design for a single discrete source of entropy H, corresponding to a symbol-plane, and the QPSK/AWGN channel with capacity C. For notational simplicity, source and plane indices are omitted. As discussed in Sec. I, in the proposed JSCC scheme symbol-planes are directly mapped to channel symbols. Since symbol-planes are nonbinary, i.e., over {0, 1, 2}, and we consider QPSK modulation, we use linear codes over GF(4). This is particularly well suited to this problem for the following reason. Assume that a block of K symbols is encoded by a systematic encoder of rate K/(K + n). Only the n parity symbols are effectively sent through the channel. At the receiver, the decoder uses the a priori nonuniform source probability for the systematic symbols and the posterior symbol-by-symbol probability (given the channel outputs y_1, ..., y_n) for the parity symbols. It turns out that if the physical channel is "matched", such that the transition probability is symmetric (in the sense defined by [9]) with respect to the sum operation in GF(4), then the source-channel decoding problem is completely equivalent to decoding the all-zero codeword of the same systematic code, transmitted over a "virtual" two-block symmetric channel (see Fig. 5), where the first K symbols go through an additive noise channel over GF(4) whose noise sequence realization is exactly equal to the source block, and the remaining n parity symbols go through the physical channel. This equivalence holds also for the Belief Propagation iterative decoder (see [2] for details). Notice that the additive noise channel over GF(4) has capacity 2 − H bit/symbol, where H is the source entropy. It turns out that if we represent the symbols of GF(4) using the additive vector-space representation of the field over GF(2), i.e., as the binary pairs (0,0), (0,1), (1,0), (1,1), and map the "bits" onto the QPSK symbols using Gray mapping, then the symmetry of the physical channel under the GF(4) sum holds. Therefore, systematic linear coding over GF(4) is particularly suited to "linear index coding" of the symbol-planes produced by the embedded dead-zone quantizer and to the QPSK/AWGN deep-space communication channel.

² Specifically, image "1F178787358EFF5927P1219L0M1".
Figure 5. Block memoryless channel with two symmetric channel components.
Given this equivalence, it turns out that the optimization of a linear systematic code ensemble for the source-channel coding problem reduces to the more familiar optimization for a channel coding problem, where the channel is composite and consists of two blocks: one discrete symmetric quaternary DMC with capacity 2 − H, and one quaternary-input continuous-output symmetric channel induced by (1) and by the Gray mapping, with capacity 0 ≤ C ≤ 2 (depending on the SNR := 10 log10(Es/N0)). Different symbol-planes may have different entropies, and therefore may result in codes with different overhead values under various SNR conditions. The non-universality of Raptor codes is shown in [8], using the fact that the stability condition on the fraction of degree-2 output nodes depends on the channel parameter for the Binary Symmetric and Binary-Input AWGN channels. Following the derivation in [8], we established a stability condition on the degree-2 output nodes which is a function of both H and C. Hence, the code ensembles must be optimized for each pair of (H, C) values. In this paper, we consider the optimization of Raptor codes over GF(4) for various (H, C) pairs chosen on a fine grid spanning the typical range of source symbol-plane entropies (see (10)) and the typical range of deep-space channel capacities [10]. In order to optimize the Raptor code ensembles, we consider an EXIT chart and linear programming method inspired by [8] and [9], extended to handle the two-block memoryless channel of Fig. 5.
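The "matched" symmetry condition can be verified directly for the Gray-mapped QPSK constellation: in the additive vector-space representation of GF(4) over GF(2), field addition is bitwise XOR of the 2-bit labels, and symmetry requires that adding any fixed field element displaces every constellation point by the same distance. A small check (Python with numpy; the labels and helper name are ours):

```python
import numpy as np

# GF(4) symbols as 2-bit labels; field addition = bitwise XOR of labels.
# Gray mapping of the two bits onto unit-energy QPSK:
psi = {0b00: (1 + 1j), 0b01: (-1 + 1j), 0b11: (-1 - 1j), 0b10: (1 - 1j)}
psi = {a: p / np.sqrt(2) for a, p in psi.items()}

def displacement_set(a):
    """Distances |psi(x) - psi(x + a)| over all x.  Symmetry under the
    GF(4) sum requires a single value for every field element a."""
    return {round(abs(psi[x] - psi[x ^ a]), 12) for x in range(4)}

for a in range(4):
    assert len(displacement_set(a)) == 1   # symmetry holds for Gray mapping
```

With a natural (non-Gray) labeling this check fails, which is precisely why the Gray mapping is singled out in the text.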
A. EXIT CHART ANALYSIS

Figure 6. Raptor code with LDPC outer code.

EXIT chart analysis for the JSCC scheme using Raptor codes can be found in [2] for the binary case. The derivations of [2] can be easily modified to the case of nonbinary codes under the "matched" symmetry condition stated above. Using a Gaussian approximation as in [9], the conditional distribution of each message L is ∼ N(µ1, Σ_µ), where [Σ_µ]_{i,j} = 2µ for i = j and [Σ_µ]_{i,j} = µ for i ≠ j. Letting V be the code variable corresponding to the edge message L, we define the mutual information function

J(µ) := I(V; L) = 1 − E[ log4( 1 + ∑_{i=1}^3 e^{−L_i} ) ].

We use base-4 logarithms for mutual information calculations; hence, in these sections H and C are in units of two bits per source symbol or per channel symbol, respectively. We first introduce the necessary notation and then write the EXIT equations for our problem. A Raptor code is formed by the concatenation of a precode, here implemented by a high-rate regular LDPC code, and an "LT" code, which is a low-density generator matrix code with a special generator matrix degree distribution [8]. For the Tanner graph of the LT code, we define the input nodes and the output nodes. For the Tanner graph of the LDPC code, we define the variable nodes and the check nodes (see Fig. 6).

EXIT charts can be seen as a multidimensional dynamical system. The EXIT "state variables" are x, y, X and Y, defined as follows (see Fig. 6): x denotes the average mutual information between a randomly chosen input node symbol and a message sent on a left-bound adjacent edge (from input to output nodes); y denotes the average mutual information between a randomly chosen input node symbol and a message sent on a right-bound adjacent edge (from output to input nodes); X denotes the average mutual information between a randomly chosen variable node symbol and a message sent on a right-bound edge (from variable to check nodes); Y denotes the average mutual information between a randomly chosen variable node symbol and a message sent on a left-bound edge (from check to variable nodes).

The various degree distributions for the Tanner graph in Fig. 6 are defined as follows:

• For the LDPC code, we let λ(x) = ∑_i λ_i x^{i−1} and ρ(x) = ∑_j ρ_j x^{j−1} denote the generating functions of the edge-centric left and right degree distributions, and we let Λ(x) = ∑_i Λ_i x^i = ∫_0^x λ(u)du / ∫_0^1 λ(u)du denote the node-centric left degree distribution.

• For the LT code, we let ι(x) = ∑_i ι_i x^{i−1} denote the edge-centric degree distribution of the input nodes, and we let ω(x) = ∑_j ω_j x^{j−1} denote the edge-centric degree distribution of the output nodes. The node-centric degree distribution of the output nodes is given by Ω(x) = ∑_j Ω_j x^j = ∫_0^x ω(u)du / ∫_0^1 ω(u)du.

• For the concatenation of the LT code with the LDPC code, we also have the node-centric degree distribution of the LT input nodes, given by ג(x) = ∑_i ג_i x^i = ∫_0^x ι(u)du / ∫_0^1 ι(u)du. Note that, for a large number of nodes, we have the approximation ג(x) ≈ ∑_n (α^n e^{−α}/n!) x^n = e^{α(x−1)}, where α = ∑_i ג_i i is the average node degree of the input nodes [8]. Hence, ι(x) is approximated by the coefficients

ι_i = α^{i−1} e^{−α} / (i − 1)!.   (11)

Then, the LT EXIT equations are given by:

x = ∑_k ∑_i Λ_k ι_i J( (i − 1) J^{-1}(y) + k J^{-1}(Y) ),   (12)

y = 1 − ∑_j ω_j { γ J( (j − 1) J^{-1}(1 − x) + J^{-1}(H) ) + (1 − γ) J( (j − 1) J^{-1}(1 − x) + J^{-1}(1 − C) ) },   (13)

where γ = K/(K + n). Eq. (13) follows from the fact that a random edge (o, v) is connected with probability γ to a source node (i.e., to the channel with capacity 1 − H symbols per channel use), and with probability 1 − γ to a parity node (i.e., to the channel with capacity C). It is also easy to see that γ = r_lt r_ldpc, where r_lt and r_ldpc denote the rate of the LT code and of the LDPC precode, respectively. The LDPC EXIT equations are given by:

X = ∑_k ∑_i λ_k ג_i J( (k − 1) J^{-1}(Y) + i J^{-1}(y) ),   (14)

Y = 1 − ∑_ℓ ρ_ℓ J( (ℓ − 1) J^{-1}(1 − X) ).   (15)

Eqs. (12)–(15) form the state equations of the global EXIT chart of the concatenated LT–LDPC graph with parameters H, C and γ, and degree sequences ω, ι, ρ and λ. Let µ_j denote the mean of the LLR of a source variable node connected to a check node of degree j, given by

µ_j = J^{-1}( 1 − J( j J^{-1}(1 − x) ) ) + J^{-1}(1 − H).

Then, the average symbol error rate (SER) of the source symbols can be approximated by

P_e = ∑_j Ω_j [ 1 − Q_3( −sqrt(µ_j / 2) ) ].   (16)

For simplicity, we fix the LDPC code to be a regular (2, 100) code (r_ldpc = 0.98). For this LDPC code, we find the mutual information threshold Y_0 (using (14) and (15)) such that, when y = Y_0 is the input, the LDPC EXIT converges to Y = 1. The value of Y_0 depends on the LT input degree distribution ι(x), which in turn depends on α via (11). Fig. 7 shows Y_0 as a function of α. On the other hand, as seen from (14) and (15), Y_0 does not depend on H or C. Hence, for a fixed α, we can use the same Y_0(α) threshold for all (H, C) pairs.
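The quantities above can be evaluated numerically. The sketch below (Python with numpy; the Monte Carlo tabulation of J, its interpolated inverse, the decoupling Y = 0, the truncation of (11) and all numeric parameters are our own assumptions, not the paper's implementation) builds J(µ) under the 4-ary Gaussian approximation and iterates the LT equations (12)–(13) for a given output degree profile:

```python
import math
import numpy as np

# J(mu) = 1 - E[log4(1 + sum_{i=1}^3 exp(-L_i))], L ~ N(mu*1, Sigma_mu),
# tabulated by Monte Carlo and inverted by interpolation.
_MU = np.linspace(0.0, 60.0, 241)

def _J_mc(mu, n=20_000, seed=1):
    if mu == 0.0:
        return 0.0
    rng = np.random.default_rng(seed)
    g0 = rng.normal(size=(n, 1))     # common part  -> off-diagonal mu
    g = rng.normal(size=(n, 3))      # independent part -> diagonal 2mu
    L = mu + np.sqrt(mu) * (g + g0)
    return 1.0 - np.log1p(np.exp(-L).sum(axis=1)).mean() / np.log(4.0)

_TAB = np.maximum.accumulate([_J_mc(m) for m in _MU])  # enforce monotonicity
J = lambda mu: float(np.interp(mu, _MU, _TAB))
Jinv = lambda y: float(np.interp(y, _TAB, _MU))

def lt_exit_converges(omega, alpha, H, C, gamma, Y0=0.9, iters=200, imax=30):
    """Iterate the LT EXIT equations (12)-(13) with Y = 0 (LDPC feedback
    disregarded, as in the text) and report whether y reaches Y0."""
    iota = np.array([alpha**(i - 1) * math.exp(-alpha) / math.factorial(i - 1)
                     for i in range(1, imax + 1)])   # eq. (11), truncated
    iota /= iota.sum()
    y = 0.0
    for _ in range(iters):
        x = sum(iota[i - 1] * J((i - 1) * Jinv(y)) for i in range(1, imax + 1))
        y = 1.0 - sum(
            w * (gamma * J((j - 1) * Jinv(1.0 - x) + Jinv(H))
                 + (1.0 - gamma) * J((j - 1) * Jinv(1.0 - x) + Jinv(1.0 - C)))
            for j, w in enumerate(omega, start=1))
        if y >= Y0:
            return True
    return False
```

For a low-entropy plane on a good channel (e.g., H = 0.05, C = 0.9 in base-4 units, γ near its ideal value C/(C + H)) the recursion clears a threshold of Y_0 = 0.9 easily, while for H = 0.5, C = 0.1 it stalls, illustrating why the ensembles must be re-optimized per (H, C) pair.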
Next, we use (12) and (13) to eliminate x and write y recursively. Note that the recursion for y depends on the input Y coming from the LDPC graph. For simplicity, we decouple the systems of equations (14)–(15) and (12)–(13) by fixing the LDPC code and the target mutual information Y_0(α), and by disregarding the feedback from the LDPC to the LT part in the BP decoder. Therefore, we let Y = 0 in (12). The recursion function f_j^{H,C,γ,α}(y) for a degree-j output node is given in (17) on the top of the next page. At this point, the LT EXIT recursion converges to the target Y_0 if

y < 1 − ∑_j ω_j f_j^{H,C,γ,α}(y),  ∀ y ∈ [0, Y_0(α)].   (18)

Figure 7. LDPC mutual information threshold Y_0 vs. α.

B. LT Degree Optimization

The objective of this optimization consists of maximizing γ (the code rate) for a given (H, C) pair, where the optimization is with respect to {ω_j} and α. Since the LDPC code is fixed, γ is just a function of r_lt = 1 / (α ∑_j ω_j / j). In order to linearize the constraints in {ω_j}, we approximate γ in (18) by its ideal value C/(C + H), arguing that for a good code design γ should be close to C/(C + H). The resulting optimization problem is given by:

min_α min_{ω_j}  α ∑_j ω_j / j
s.t.  ∑_j ω_j = 1,  ω_j ≥ 0,
      y_i < 1 − ∑_j ω_j f_j^{H,C,C/(C+H),α}(y_i),  ∀ y_i ∈ [0, Y_0(α)].   (19)

For fixed α, the inner optimization problem in (19) is a linear program in terms of {ω_j}. The outer maximization of the coding rate r_lt can be obtained through an educated search over the value of α, for each pair (H, C). With some more effort, it is also possible to write the stability condition for the degree-2 output nodes, which reads Ω_2 ≥ g(H, C, γ) for some complicated function g(·), omitted here for brevity. Differently from the classical memoryless stationary channel case, here γ appears in the condition in a non-linear manner. Therefore, instead of including the stability condition as a constraint in the optimization, we simply check that the result of the optimization satisfies the stability condition (otherwise, it is discarded).

VI. RESULTS