Robust and Efficient Video Communication Based on Combined Source- and Channel Coding
Arild Fuldseth and Tor A. Ramstad
Department of Telecommunications, Norwegian University of Science and Technology (NTNU), N-7034 Trondheim, Norway
e-mail:ffuldseth,
[email protected]
ABSTRACT
A video communication system based on combined source- and channel coding is proposed. The proposed system uses a subband video coder structure in combination with power-constrained channel-optimized vector quantization (PCCOVQ) to achieve efficient and robust transmission of the video signal over a channel with multidimensional PAM signaling. The side information is protected by using a sufficiently large minimum distance of the PAM signal constellation. This ensures error free transmission for the channel-optimized case, and graceful degradation for channel mismatch situations. For the main information, however, PCCOVQ with direct mapping from the source parameter space to the channel (modulation) space is used. A video communication system based on the H.263 video coder is used as a reference. The proposed system compares favorably to the reference system, and offers graceful degradation for channel mismatch situations.
1. INTRODUCTION
Combined source and channel coding systems have proven to achieve good performance for communication over noisy channels. Such communication systems can be divided into two main categories: robust source-optimized systems and channel-optimized systems. In robust source-optimized systems, the source coder is optimized for a noise free channel, taking only the statistics of the source into account. Robustness to channel errors for these systems is achieved by mapping the source coder parameters to the channel symbols so as to minimize the influence of channel errors at the receiver side. This technique is usually referred to as index assignment, and various methods have been proposed in the literature [1, 2, 3, 4]. In channel-optimized systems, the source coder is optimized for a particular channel quality, taking both the source and the channel statistics into account. For these systems, it is assumed that the encoder and/or the decoder have knowledge about the channel quality. Examples of such systems are channel-optimized quantization [1, 5, 6, 7], and soft decision detection [8, 9].
In this paper, we describe a video communication system based on combined source- and channel coding. The proposed system is an improvement of the video communication system described in [10, 11], where a robust source-optimized subband video coder with index assignment to a multidimensional pulse amplitude modulation (PAM) signaling alphabet was proposed. The purpose of this work is to design a channel-optimized video coder using the same basic video coding scheme and the same signaling alphabet as in [10, 11]. The proposed system is compared to a traditional video communication system based on the H.263 video coder. Due to the variable length coding of the source coder parameters in H.263, the traditional system is more efficient in terms of bits per pixel (bpp). For a given channel symbol rate and channel signal-to-noise ratio (CSNR), however, the proposed system outperforms the traditional reference system.
2. SYSTEM DESCRIPTIONS
In this section, the reference system based on the H.263 video coder and the proposed system based on a subband video coder are described in detail.
2.1. Reference System
The reference system is illustrated in Figure 1. The H.263 video coder is implemented as in [12] without any of the optional coding methods, but with the possibility for the motion vectors at the picture boundaries to point outside the picture. In addition to the basic mode described above, the H.263 coder is also implemented in an error resilient mode. In this mode, error detection based on the group of blocks (GOB) header as well as on unused codewords is used. In addition, error concealment is implemented by repeating information from the previous frame or from the previous GOB whenever an error is detected. In order to reduce the propagation of errors through subsequent frames, 3 intra-blocks are introduced systematically in each frame at the encoder side. This technique increases the robustness to errors at the cost of a small increase in the bit rate compared to the basic mode. The output bit stream is mapped to the PAM signal constellation by traditional Gray coding. Note that since the H.263 coder uses variable length coding, it is not possible to take the significance or the meaning of the individual bits into account when mapping to the channel symbols. Note also that by assuming additive white channel noise, a PAM symbol can be considered as the real or imaginary part of a quadrature amplitude modulation (QAM) symbol. This might be useful for practical implementations.
[Figure 1: block diagram. Video frames -> H.263 encoder -> bit stream -> bit map -> PAM symbols.]
Figure 1. Reference system (transmitter only).
[Figure 2: block diagram of the transmitter, with blocks labeled analysis filterbank, classification, VQ 1-3, Map 1-3, merge classes, class 0, lossless coding, bit map, MUX, synthesis filterbank, motion compensation, motion estimation, and frame delay. The side information paths are labeled classification table, scale factors, and motion vectors; the output is labeled PAM symbols.]
Figure 2. Proposed system (transmitter only).
2.2. Proposed System
The proposed system, illustrated in Figure 2, uses a filter bank with 4 × 4 subbands for signal decomposition, and temporal DPCM coding with in-band motion estimation and compensation as described in [13]. The subband signals are coded in blocks of 4 × 4 adjacent samples from the same subband. Each block is classified into one of four classes according to its mean squared value. The subband samples belonging to classes 1-3 are normalized and quantized using vector quantizers (VQs) of three different rates, with the highest rate for the class corresponding to the highest mean squared value. The blocks having the smallest mean squared value (class 0) are not transmitted. The VQ output indices are mapped directly to the amplitude levels of the multidimensional PAM channel symbols by the index maps. For the proposed system, the choices of source vector dimension (L), codebook size (N), and channel space dimension (K) for each of the three classes are listed in Table 1. Note that a K-dimensional PAM symbol can be composed of K consecutive one-dimensional symbols in time (or equivalently, K/2 consecutive QAM symbols). As an example, consider class 1. The subband samples belonging to this class are quantized using a 2-dimensional VQ of size 256 (VQ1). The 256 codebook vectors from a 2-dimensional source space are mapped directly to 256 points in a 1-dimensional channel space (256-PAM) by MAP1. In [10], the vector quantizers were optimized for a noise free channel, while the index maps were optimized using simulated annealing to reduce the effects of channel errors. In this work the vector quantizers and the index maps are optimized simultaneously using power-constrained channel-optimized vector quantization (PCCOVQ) [7].
The side information parameters, consisting of the block classification table, the scale factors, and the motion vectors, are treated differently from the main information. The classification table and the motion vectors are efficiently coded with a lossless method referred to as hierarchical enumeration [14], while the quantizer scale factors are quantized and coded by a 7-bit fixed length code. The resulting bit streams are mapped directly to the PAM symbols in the same manner as for the reference system. Note also that the minimum distance of the PAM signal constellation for the side information and the main information might be different. In addition, leaky prediction is implemented by introducing a gain factor less than 1 in the DPCM feedback loop as shown in Figure 2. By introducing leaky prediction, the propagation of channel errors to subsequent frames is reduced at the cost of reduced prediction gain.
Class   L   N     K
1       2   256   1
2       1   256   1
3       1   256   2
Table 1. Source vector dimension (L), codebook size (N), and channel space dimension (K) for each class.
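The trade-off behind leaky prediction can be illustrated with a scalar toy model. The sketch below is not the subband coder of Figure 2: it assumes a one-dimensional DPCM loop with no quantizer and no motion compensation, and the names `alpha`, `dpcm_encode`, `dpcm_decode`, and `error_propagation` are hypothetical. It only shows how a single corrupted residual decays at the decoder when the feedback gain is below 1.

```python
def dpcm_encode(signal, alpha):
    """Toy DPCM encoder: the residual is the sample minus a leaky prediction.

    alpha is the (assumed) gain factor in the feedback loop; alpha = 1.0
    gives ordinary, non-leaky prediction.
    """
    residuals = []
    prev = 0.0  # previous reconstructed sample
    for x in signal:
        e = x - alpha * prev
        residuals.append(e)
        prev = alpha * prev + e  # equals x here because nothing is quantized
    return residuals


def dpcm_decode(residuals, alpha):
    """Toy DPCM decoder mirroring the encoder's feedback loop."""
    out = []
    prev = 0.0
    for e in residuals:
        prev = alpha * prev + e
        out.append(prev)
    return out


def error_propagation(signal, alpha, error_index, error_value):
    """Decoder deviation caused by corrupting one transmitted residual."""
    residuals = dpcm_encode(signal, alpha)
    clean = dpcm_decode(residuals, alpha)
    corrupted = list(residuals)
    corrupted[error_index] += error_value  # a single channel error
    noisy = dpcm_decode(corrupted, alpha)
    return [n - c for n, c in zip(noisy, clean)]


if __name__ == "__main__":
    signal = [1.0] * 10
    for alpha in (1.0, 0.9):
        deviation = error_propagation(signal, alpha, 2, 1.0)
        print(alpha, [round(d, 3) for d in deviation])
```

In this model the deviation is multiplied by the gain factor at every step after the error, so it persists indefinitely for a gain of 1 and decays geometrically for any gain below 1. The cost, as stated above, is a weaker prediction and therefore a larger residual to transmit.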
3. PCCOVQ
The vector quantizers VQ1-VQ3 and the corresponding index maps MAP1-MAP3 are designed using PCCOVQ [7]. For PCCOVQ, the optimization problem can be formulated as follows. Let $\mathbf{x}$, $\mathbf{y}$, and $\hat{\mathbf{x}}$ be the L-dimensional source vector, the K-dimensional vector of PAM symbols, and the L-dimensional reconstructed vector, respectively. Now, the design goal is to minimize
$$E\left[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\right] \tag{1}$$
subject to
$$E\left[\|\mathbf{y}\|^2\right] \le K S_{\max} \tag{2}$$
where $S_{\max}$ is the maximum available power, and where the expectations are taken over the source and channel statistics. Assuming maximum likelihood (ML) signal detection at the receiver side, PCCOVQ can be optimized by Lagrange multiplier techniques as described in [7].
For source-optimized block-based classification (or bit allocation), the operational rate-distortion functions of the various quantizers are used to classify each block according to its mean squared value. For the channel-optimized system a similar classification scheme is used. The operational rate-distortion functions, however, are calculated taking both the source and the channel statistics into account.
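To make the two expectations in (1) and (2) concrete, the following sketch estimates them empirically for a scalar toy case (L = K = 1) over an AWGN channel with ML detection. It is an illustration under stated assumptions, not the design algorithm of [7]: the encoder here is a plain nearest-neighbour rule in source space, and the function names, the multiplier `lam`, and the combined cost are hypothetical ways of expressing a Lagrangian trade-off between distortion and power.

```python
import random


def evaluate_design(source_codebook, channel_points, samples, noise_std, rng):
    """Estimate E[(x - x_hat)^2] and E[y^2] for a scalar codebook and index map.

    source_codebook[i] is the reconstruction value for index i and
    channel_points[i] is the PAM amplitude transmitted for index i.
    """
    total_distortion = 0.0
    total_power = 0.0
    indices = range(len(source_codebook))
    for x in samples:
        # Simplified encoder (assumption): nearest source codevector.
        i = min(indices, key=lambda j: (x - source_codebook[j]) ** 2)
        y = channel_points[i]
        total_power += y * y
        # AWGN channel.
        r = y + rng.gauss(0.0, noise_std)
        # ML detection for AWGN reduces to the nearest channel point.
        k = min(indices, key=lambda j: (r - channel_points[j]) ** 2)
        total_distortion += (x - source_codebook[k]) ** 2
    n = len(samples)
    return total_distortion / n, total_power / n


def lagrangian_cost(distortion, power, lam):
    """Unconstrained cost trading distortion (1) against transmit power (2)."""
    return distortion + lam * power


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    codebook = [-1.5, -0.5, 0.5, 1.5]
    for scale in (0.5, 1.0, 2.0):
        points = [scale * c for c in codebook]
        d, p = evaluate_design(codebook, points, samples, 0.3, rng)
        print(scale, round(d, 4), round(p, 4), round(lagrangian_cost(d, p, 0.05), 4))
```

Scaling the channel points up lowers the distortion caused by channel errors but raises the transmitted power, which is the tension the power constraint in (2) controls.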
4. EXPERIMENTS
For both transmission systems, the PAM channel symbols were transmitted over an additive white Gaussian noise (AWGN) channel with maximum likelihood detection at the receiver side. For the error resilient reference system, the simulations were conducted by assuming error free transmission of the picture header, ensuring correct resynchronization of the bit pattern at the beginning of each frame. Furthermore, the H.263 coder was implemented without rate control, using the average bit rate of the entire sequence as a reference. For the proposed system, the PCCOVQ was trained assuming a Laplacian distribution of the normalized subband samples. Furthermore, it was found that a leaky prediction coefficient equal to 0.9 gave good results for the CSNR values of interest, and this value is used in all experiments. The two transmission systems were simulated using 100 luminance frames of the 'Foreman' QCIF sequence at a frame rate of 10 Hz. All experiments were conducted by averaging over 20 separate simulations of the entire video sequence using different random noise sequences. Furthermore, all comparisons between the two systems were performed using the same channel symbol rate.
Two different experiments were conducted. In the first experiment, the channel-optimized system was evaluated assuming that the actual CSNR was the same as the CSNR for which the system was designed. For this experiment, the side information symbols of the proposed system and the symbols of the reference system were transmitted without errors. This was achieved by using a minimum distance of the PAM signal constellation corresponding to a symbol error rate of 10^-9. In Figure 3, the performance is evaluated in terms of peak signal-to-noise ratio (PSNR) vs. CSNR. The reference system was evaluated both for 2-PAM and for 4-PAM. Similarly, the proposed system was evaluated by using 2-PAM and 4-PAM for the side information. Note that to achieve a symbol error rate of 10^-9, a higher CSNR value is required for 4-PAM compared to 2-PAM. Note also that for the proposed system, the side information was transmitted using a fixed energy (minimum distance) per channel symbol independent of the CSNR. Thus, at low CSNR values, the transmitted energy is dominated by the portion of energy used for the side information. As can be seen from the figure, the channel-optimized system offers a significant gain compared to the reference system.
In the second experiment, channel mismatch was investigated. In this case, the actual CSNR is different from the design CSNR. Consequently, channel errors occurred both for the side information of the proposed system and for the reference system. The proposed system was designed for CSNR = 21.4 dB, and the minimum distance of the PAM constellation for the side information was chosen sufficiently large to ensure graceful degradation whenever the actual CSNR falls below the design CSNR of 21.4 dB. The results are shown in Figure 4. From the figure, we observe a graceful degradation of the proposed system when compared to the reference system.
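The remark in the first experiment that 4-PAM needs a higher CSNR than 2-PAM for the same symbol error rate can be checked with the textbook error-rate expression for M-PAM on an AWGN channel. The sketch below is a minimal illustration under assumptions that this section does not state: equally likely, equally spaced symbols, and CSNR defined as average symbol energy divided by noise variance. The function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_symbol_error_rate(m, csnr_db):
    """Symbol error rate of equally spaced M-PAM with ML detection in AWGN.

    Assumes CSNR = (average symbol energy) / (noise variance), so that the
    minimum distance d satisfies d / (2 * sigma) = sqrt(3 * CSNR / (M^2 - 1)).
    """
    csnr = 10.0 ** (csnr_db / 10.0)
    argument = math.sqrt(3.0 * csnr / (m * m - 1))
    return 2.0 * (1.0 - 1.0 / m) * q_function(argument)


def required_csnr_db(m, target_ser, lo=0.0, hi=60.0):
    """Smallest CSNR (dB) in [lo, hi] meeting target_ser, found by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if pam_symbol_error_rate(m, mid) > target_ser:
            lo = mid  # error rate still too high, need more CSNR
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for m in (2, 4):
        print(m, round(required_csnr_db(m, 1e-9), 2))
```

Under these assumptions the required CSNR grows with the constellation size because, for a fixed average energy, the minimum distance shrinks as more amplitude levels are packed in.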
[Figure 3: plot of PSNR versus CSNR.]
Figure 3. PSNR vs. CSNR, 0.187 PAM symbols/pixel. Reference system with 2-PAM (o), Reference system with 4-PAM (*), Proposed system with 2-PAM for side information: (solid line), Proposed system with 4-PAM for side information: (dashed line).
5. CONCLUSIONS
A video communication system based on combined source and channel coding has been proposed. The channel-optimized system compares favorably to the reference system using either 2-PAM or 4-PAM. For channel mismatch situations, the proposed system degrades far more gracefully than the reference system, making it particularly useful for video transmission over wireless channels such as in mobile and broadcasting applications.
[Figure 4: plot of PSNR versus CSNR.]
Figure 4. PSNR vs. CSNR for channel mismatch, 0.200 symbols/pixel. Reference system: (solid line), Proposed system: (dashed line), design CSNR = 21.4 dB.
REFERENCES
[1] N. Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, pp. 799–809, July 1990.
[2] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, pp. 2147–2158, Dec. 1990.
[3] P. Knagenhjelm, "How good is your index assignment?," in Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), (Minneapolis, Minnesota, USA), pp. II:423–426, 1993.
[4] A. Fuldseth and J. M. Lervik, "Combined source and channel coding for channels with a power constraint and multilevel signaling," in Proc. Nordic Signal Processing Symposium (NORSIG-94), (Ålesund, Norway), pp. 38–42, NORSIG, May 1994.
[5] A. Kurtenbach and P. Wintz, "Quantizing for noisy channels," IEEE Trans. Commun., vol. COM-17, pp. 291–302, Apr. 1969.
[6] H. Kumazawa, M. Kasahara, and T. Namekawa, "A construction of vector quantizers for noisy channel," Electron. and Eng. in Japan, vol. 67-B, pp. 39–47, 1984.
[7] A. Fuldseth and T. A. Ramstad, "Bandwidth compression for continuous amplitude channels based on vector approximation to a continuous subset of the source signal space," in Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), vol. IV, pp. 3093–3096, 1997.
[8] F.-H. Liu, P. Ho, and V. Cuperman, "Joint source and channel coding using a non-linear receiver," in Proc. IEEE ICC'93, (Geneva, Switzerland), pp. 1502–1507, June 1993.
[9] M. Skoglund and P. Hedlin, "Vector quantization over a noisy channel using soft decision decoding," in Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), vol. V, (Adelaide, Australia), pp. 605–608, Apr. 1994.
[10] A. Fuldseth and T. A. Ramstad, "Combined video coding and multilevel modulation," in Proc. Int. Conf. on Image Processing (ICIP), vol. I, (Lausanne, Switzerland), pp. 941–944, Sept. 1996.
[11] A. Fuldseth and T. A. Ramstad, "Robust subband video coding with leaky prediction," in Seventh IEEE Digital Signal Processing Workshop, (Loen, Norway), pp. 57–60, Sept. 1996.
[12] Telecommunication Standardization Sector, Study Group 15, Working Party 15/1, Expert's Group on Very Low Bitrate Videophone, Video Codec Test Model, TMN5, Jan. 1995. Source: Telenor Research.
[13] A. Fuldseth and T. A. Ramstad, "Subband video coding with smooth motion compensation," in Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), (Atlanta, USA), pp. 2331–2334, May 1996.
[14] A. Fuldseth, I. Balasingham, and T. A. Ramstad, "Efficient coding of the classification table in low bit rate subband image coding by use of hierarchical enumeration," in Proc. Int. Conf. on Image Processing (ICIP), 1997. Submitted.