FIXED-PACKET-LENGTH TRANSCODING FOR ERROR RESILIENT VIDEO TRANSMISSION OVER WCDMA RADIO LINKS

M. G. Martini, M. Mazzotti and M. Chiani
D.E.I.S., University of Bologna, V.le Risorgimento 2, 40136 Bologna, ITALY

ABSTRACT
Robust video transmission over wireless channels is a critical problem, due to channel impairments and to their effect on the compressed bitstream. In order to improve error resilience and, consequently, the received video quality, a novel unequal error protection (UEP) technique (fixed-packet-length transcoding assisted UEP, FPT-UEP) is proposed in the paper. The proposed scheme both increases the error resilience and enables the application of unequal error protection to coded sources with variable length packets and partitions. The technique is applied to the case of MPEG-4 video transmission over a WCDMA radio link. Simulation results show that a significant improvement in terms of PSNR and of subjective quality can be achieved with the proposed scheme compared to equal error protection.

1. INTRODUCTION
Video transmission will be the main application characterizing third generation and beyond mobile communication systems. It is thus of critical importance to ensure good quality video services. The provision of error resilience to video sources will play a main role in the achievement of this target, together with the exploitation of joint source and channel coding techniques. In the case of wireless video transmission, where the channel imposes tight constraints in terms of bandwidth and delay and a residual redundancy is always present in coded video sources, the hypotheses of Shannon's separation theorem [1] are not fulfilled and a joint source and channel coding approach is advisable.
In particular, the residual redundancy of the source may be exploited by tailoring the channel coding scheme to the different sensitivity of the different bitstream portions. Unequal error protection has been used for audio transmission, as in GSM and adaptive multi rate (AMR) [4, 5], with convolutional or turbo codes, for progressive image transmission [6] and for subband coded audio and video transmission, as some kinds of sources lend themselves to being partitioned into groups of bits of different sensitivity. Unequal error protection has also been proposed for block based (DCT) coded video sources, as in [7], where UEP is exploited for H.263 video transmission. In this case the encoder was modified to output the bit classification information to the channel coder. In [11, 12] MPEG-4 video transmission through unequal error protection is addressed, neglecting the transmission of side information about the lengths of the packet partitions (which differ from packet to packet): ideal knowledge of the bitstream structure is assumed at the input of the channel encoder and of the channel decoder.

Proportional Unequal Error Protection (P-UEP), proposed in [13], is a technique that allows the protection applied over a compressed video bitstream to be differentiated also in the case of sources with dynamically varying partition lengths: each packet is unequally protected according to fixed percentage values chosen for the partition lengths. A significant advantage of this technique is the possibility of performing unequal error protection without transmitting additional information about the size of the partitions in each packet. The mere knowledge of where each packet starts makes it possible to determine the partition lengths, which statistically correspond to the classes of different importance of the bits.

In this paper we propose an alternative solution: a transcoding technique which allows the construction of packets of a predetermined fixed length, starting from the variable length packets provided, e.g., by MPEG-4. This technique statistically allows the partitions within each packet to be protected in accordance with their different significance. We call it Fixed-Packet-length Transcoding assisted UEP (FPT-UEP); it basically relies on a proportional decomposition scheme similar to that described in [13].
It may be useful in any case of packet transmission over a network where fixed-length packets are required. Some additional error resilience tools are applied in the paper. The performance of the proposed technique is evaluated here over a vehicular channel, considering the WCDMA uplink characteristics.

The remainder of this paper is organized as follows. In section 2, after a short description of the MPEG-4 bitstream structure, the proposed transcoding based unequal error protection scheme is introduced and described in detail for the case of MPEG-4 video transmission. In section 3, after a short description of the WCDMA standard, the experimental environment considered is described. Performance results of MPEG-4 video transmission over the UMTS/FDD vehicular environment are presented in section 4.

2. FIXED-PACKET-LENGTH-TRANSCODING ASSISTED UEP
As briefly described in the introduction, proportional unequal error protection is a useful technique to avoid transmitting side information in the presence of a source bitstream with variable packet and partition lengths. With the application of this P-UEP scheme, one of the most critical aspects is the detection of Start Codes (SC), which identify the beginning of video packets. In fact, if the identification of one of these synchronization points fails, an incorrect value is obtained for the length of the received packet and, as a direct consequence, the sizes of the differently protected partitions within the packet are also wrongly calculated. In [14], a technique to make start code detection sufficiently robust has been proposed: the original MPEG-4 start codes are substituted with other sequences that are more reliably detectable in a particularly noisy environment, such as a wireless channel. Here an alternative technique, mainly based on the reorganization of the bitstream into packets of a predetermined fixed length, is proposed. This technique may be useful to increase the bitstream error resilience and to enable a direct application of unequal error protection to any source bitstream composed of packets and partitions of different lengths. In order to focus on a specific application, the technique is described in the following with reference to the MPEG-4 video coding standard.

2.1. The MPEG-4 bitstream structure
Thanks to the Re-synchronization and Data Partitioning tools provided by the MPEG-4 standard, the MPEG-4 bitstream is logically organized in a hierarchical structure. The elementary packets transmitted are called video packets (VP) and they contain the coded description of the various blocks and macro-blocks of pixels into which each frame is decomposed. A proper number of VPs forms a video object plane (VOP) that, for sequences containing a single video object (VO), coincides with a frame. Each VOP of the sequence is coded either in intra mode (I frame), i.e. without any reference to previous images, or in inter mode, i.e. differentially predicted from the previous frame (P frame) or from a combination of the previous and the following frames (B frame). A group of VOPs, in turn, constitutes a group of video object planes (GOV). Finally, at a logical upper level, there are the video object layers (VOL) and the video objects (VO). In fig. 1 the logical structure of the MPEG-4 bitstream is schematically depicted: for the first intra-coded and inter-coded packets, a simplified scheme of how the data are structured within the packet is also reported.

2.2. FPT-UEP
The proposed technique provides fixed-length packets to the lower protocol layers, and makes it possible to reconstruct uniquely, from the corrupted version of these packets, the corresponding MPEG-4 compliant packets to be fed to the MPEG-4 decoder. The proposed FPT-UEP technique, as shown in fig. 2, is structured in four steps:
1. SC substitution;
2. appending of stuffing bits;
3. partitioning of the packet and insertion in different queues;
4. assembling of the fixed-length packet.
In the following, these steps are described in more detail.

The original MPEG-4 start codes are substituted with other codewords more robust to channel errors. In particular, the new start codes are chosen so as to avoid their emulation in the bitstream even if some bits have been corrupted during transmission. The resulting length of the packet is denoted by $l$. The lengths of the substituted start codes are listed in table 1. A sequence of $s$ stuffing bits with value '1' is appended at the end of the packet, so that

$$\hat{l} = l + s = mW \qquad (1)$$

where $\hat{l}$ is the length of the "stuffed" packet, $m$ is an integer and $W$ is an appropriate number of bits, whose meaning will become clear below. In other words, the stuffing bits make the length of the packet a multiple of $W$ bits.
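The stuffing rule of eq. (1) can be illustrated with a short Python sketch. The function name `stuff_packet`, the list-of-bits representation and the example values (a 1000-bit packet, W = 16) are assumptions made here for illustration only; the paper does not prescribe an implementation.

```python
def stuff_packet(bits, W):
    """Append '1' bits so that the packet length becomes a multiple of W (eq. 1).

    Returns the stuffed packet and the number s of stuffing bits added.
    """
    if W <= 0:
        raise ValueError("W must be a positive number of bits")
    s = (-len(bits)) % W          # smallest s >= 0 such that (l + s) % W == 0
    return bits + [1] * s, s


# Hypothetical example: a 1000-bit packet and W = 16.
packet = [0] * 1000
stuffed, s = stuff_packet(packet, 16)
print(len(stuffed), s)            # 1008 8  (1008 = 63 * 16, so m = 63)
```

A packet whose length is already a multiple of W receives no stuffing bits (s = 0).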

[Figure 1 diagram: hierarchical layout of the MPEG-4 bitstream. VO SC and VO header; VOL SC and VOL header; GOV SC and GOV header; VOP SC for an I frame followed by I-video packets separated by Resync VP SCs; VOP SC for a P frame followed by P-video packets separated by Resync VP SCs; END CODE. Detail of an I-video packet: header, partition 1 (DC info), DC marker, partition 2 (AC info). Detail of a P-video packet: header, partition 1 (motion info), motion marker, partition 2 (DC/AC info).]

Fig. 1. MPEG-4 bitstream logical structure.

The packet is decomposed into N partitions in a proportional way according to the coefficients

$$P_i = \frac{w_i}{W} \qquad (2)$$

with $w_i \in \{1, 2, \ldots, W\}$, and under the constraint

$$\sum_{i=1}^{N} P_i = 1 \qquad (3)$$

or, equivalently,

$$\sum_{i=1}^{N} w_i = W. \qquad (4)$$
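As a small worked example of eqs. (2)-(4), the sketch below derives integer weights from partition fractions. It uses the percentages adopted later in this paper (6.25%, 25% and 68.75%); the choice of the smallest compatible W, and the function name `weights_from_fractions`, are assumptions made here, since the paper does not state the value of W it used.

```python
from fractions import Fraction
from math import lcm


def weights_from_fractions(fractions):
    """Return (W, [w_i]) for the smallest W making every w_i = P_i * W an integer."""
    if sum(fractions) != 1:
        raise ValueError("partition fractions must sum to 1 (eq. 3)")
    W = lcm(*(p.denominator for p in fractions))
    return W, [int(p * W) for p in fractions]


# Partition fractions P_1, P_2, P_3 written exactly.
P = [Fraction("0.0625"), Fraction("0.25"), Fraction("0.6875")]
W, w = weights_from_fractions(P)
print(W, w)   # 16 [1, 4, 11]
```

With these values the weights sum to W = 16, as required by eq. (4); any multiple of 16 would satisfy the constraints equally well.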

The bits belonging to the N different partitions are then inserted in N distinct queues, served according to FIFO rules. From the queues, the bits are taken in the correct proportions in order to build a packet of the desired fixed length. The overall size L of the fixed-length packet may be any multiple of W, i.e.

$$L = q \cdot W \qquad (5)$$

where q is a positive integer. The packet is thus composed of N distinct parts, each containing the bits from the corresponding queue. The ith part has length $L_i = P_i \cdot L$, so that

$$L = \sum_{i=1}^{N} L_i = \sum_{i=1}^{N} P_i \cdot L. \qquad (6)$$

The fixed-length packet is then channel coded and, if needed, punctured. As a result, the transmitted bitstream is a regular sequence of fields of different importance, whose lengths are now fixed. In this way, the decoding process can be performed with exact knowledge of the partition lengths; if punctured codes are used, the de-puncturing process is always performed correctly, since the knowledge of L and of the coefficients $P_i$ is sufficient to determine exactly the portions of the bitstream to which the different de-puncturing matrices apply.
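The queueing and assembly steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each stuffed packet is split into N consecutive parts of lengths $m \cdot w_i$, and the names `split_into_partitions` and `FixedLengthPacketizer` are hypothetical.

```python
from collections import deque


def split_into_partitions(stuffed, weights):
    """Split a stuffed packet of length m*W into consecutive parts of length m*w_i."""
    W = sum(weights)
    if len(stuffed) % W:
        raise ValueError("packet length must be a multiple of W")
    m = len(stuffed) // W
    parts, start = [], 0
    for w in weights:
        parts.append(stuffed[start:start + m * w])
        start += m * w
    return parts


class FixedLengthPacketizer:
    """N FIFO queues feeding fixed-length packets of L = q*W bits (eqs. 5-6)."""

    def __init__(self, weights, q):
        self.weights = list(weights)
        self.q = q
        self.queues = [deque() for _ in self.weights]

    def push(self, stuffed):
        """Insert the partitions of one stuffed packet into their queues."""
        parts = split_into_partitions(stuffed, self.weights)
        for queue, part in zip(self.queues, parts):
            queue.extend(part)

    def pop_packets(self):
        """Return every fixed-length packet that can be built right now."""
        packets = []
        while all(len(queue) >= self.q * w
                  for queue, w in zip(self.queues, self.weights)):
            packet = []
            for queue, w in zip(self.queues, self.weights):
                # Part i holds L_i = q * w_i bits taken from queue i.
                packet.extend(queue.popleft() for _ in range(self.q * w))
            packets.append(packet)
        return packets


# Hypothetical example: weights (1, 4, 11), so W = 16, and q = 2, so L = 32.
packetizer = FixedLengthPacketizer([1, 4, 11], q=2)
packetizer.push([0] * 48)                 # one stuffed packet with m = 3
packets = packetizer.pop_packets()
print(len(packets), len(packets[0]))      # 1 32
```

Because every stuffed packet adds bits to the queues in the same proportions in which the fixed-length packets remove them, the queues drain together; the 16 bits left over in the example wait for the next variable-length packet.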

At the receiver side, an inverse algorithm allows the reconstruction of the original packets. N memory buffers are required. For i = 1, ..., N, $L_i$ bits are taken from each decoded packet and inserted in the ith buffer. Then start code detection is performed. Thanks to the technique described above, all the start codes belong to the first partition. As a consequence, their search can be limited to the first buffer: in other words, the processing window is formed by $P_1 \cdot L$ bits for every L bits received. The search is performed through hard correlation between copies of each SC and the decoded bits contained in the first buffer. The output of the correlator is then compared with a threshold, initially set equal to the SC length, so that the process is equivalent to searching for the SCs by identity. It follows that, if one or more bits have been corrupted during transmission, the SC is not immediately detected. In this case, the search can be repeated after properly lowering the threshold. The convenience of substituting the original SCs with others that are less easily emulated by the MPEG bitstream is now evident. The re-synchronization tool included in MPEG-4 makes it possible to define a sort of mean value for the length of the video packets. Thanks to this property, it is always possible to determine a maximum size for the variable-length packets (i.e. a size they never exceed), which is never far from the mean value specified during the source coding process. Calling $\hat{l}_{\max}$ the maximum length after the SC substitution and the stuffing process, we know that a SC has surely been missed when $\hat{l}_{\max} \cdot P_1$ bits have been processed in the first buffer without detecting any start code. In this case, the threshold is lowered by a fixed amount and the search is repeated over the last $\hat{l}_{\max} \cdot P_1$ bits.
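The threshold-lowering search just described can be sketched in Python. The sketch is only illustrative: the 8-bit pattern, the function names and the `max_errors` cap (added here so that the loop always terminates) are assumptions, since the paper gives only the lengths of the substituted start codes, not their values.

```python
def hard_correlation(window, pattern):
    """Number of positions in which the two bit sequences agree."""
    return sum(a == b for a, b in zip(window, pattern))


def find_start_code(buffer, pattern, step=1, max_errors=2):
    """Search `buffer` for `pattern`, lowering the threshold after each failed pass.

    Returns (position, tolerated_errors) for the first window whose hard
    correlation reaches the current threshold, or None if nothing is found
    with at most `max_errors` disagreements.
    """
    n = len(pattern)
    threshold = n                          # first pass: exact match only
    while n - threshold <= max_errors:
        for pos in range(len(buffer) - n + 1):
            if hard_correlation(buffer[pos:pos + n], pattern) >= threshold:
                return pos, n - threshold
        threshold -= step                  # admit `step` more errors and retry
    return None


# Hypothetical 8-bit start code embedded in an all-zero window.
sc = [1, 0, 1, 1, 0, 0, 1, 0]
window = [0] * 5 + sc + [0] * 5
print(find_start_code(window, sc))         # (5, 0)

window[6] ^= 1                             # one channel error inside the SC
print(find_start_code(window, sc))         # (5, 1)
```

In the scheme of the paper, `buffer` would be the last $\hat{l}_{\max} \cdot P_1$ bits of the first buffer; a realistic pattern would be one of the 32- or 40-bit substituted start codes.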
This process is repeated until a SC has been detected; with hard correlation, this is equivalent to admitting a number of errors in the searched sequence that increases by 1 at each step. The delay introduced by this iterative process is not particularly high, because of the limited search window. From the knowledge of the position of the SCs, we can determine the size of the first partition of the VPs. We may thus compute

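[Figure 2 diagram: the MPEG-4 coder outputs variable length packets; after SC substitution and stuffing, each stuffed variable length packet (a multiple of W bits, starting with a substituted SC) is split into parts of lengths l1, l2 and l3, which feed Queue 1, Queue 2 and Queue 3; L1, L2 and L3 bits are then taken from the three queues to form the fixed length packet of L bits, which is passed to channel coding and puncturing.]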

Fig. 2. Fixed-length-packet transcoding (N=3).

the overall length of the packet, according to

$$\hat{l} = \frac{l_1}{P_1} \qquad (7)$$

and the lengths of the other partitions, contained in the different buffers:

$$l_i = P_i \cdot \hat{l} = P_i \cdot \frac{l_1}{P_1}. \qquad (8)$$

With the knowledge of these lengths, an equivalent MPEG-4 packet may be reconstructed: after the original start codes have been restored and the stuffing bits added at the transmitter side have been removed, the bitstream obtained is MPEG-4 compatible. We observe that the stuffing and the corresponding de-stuffing (at the receiver side) are made possible by the characteristics of the compression standard considered. All the added bits, in fact, have to be recognizable at the receiver in order to be eliminated. MPEG-4 variable-length packets are always a multiple of 8 bits long: the stuffing sequences listed in table 2 are appended, according to the number of bits required to reach that multiple. As evident from the table, a direct consequence is that the last byte of the packet always contains a 0, so it is easily recognizable during the inverse process of elimination of the stuffing 1's. In practice, de-stuffing consists in truncating the packet to the last byte containing a 0.

Table 1. SC substitution performed.
SC type        Original SC length (bits)   Substituted SC length (bits)
GOV SC         32                          32
VOP SC         32                          32
VP SC          17                          32
ENDCODE SC     32                          40

Table 2. MPEG-4 stuffing bits.
Number of required bits   MPEG-4 stuffing sequence
1                         0
2                         01
3                         011
4                         0111
5                         01111
6                         011111
7                         0111111
8                         01111111
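The receiver-side length computation of eqs. (7)-(8) and the de-stuffing rule can be sketched as below. The function names are hypothetical, and `destuff` assumes that the original start codes have already been restored, so that the packet is byte-aligned from its first bit; the weights (1, 4, 11) are the illustrative values used for W = 16.

```python
from fractions import Fraction


def partition_lengths(l1, weights):
    """Lengths l_i of all partitions of a stuffed packet, given l_1 (eqs. 7-8)."""
    W = sum(weights)
    P = [Fraction(w, W) for w in weights]
    total = l1 / P[0]                     # eq. (7): stuffed packet length
    if total.denominator != 1:
        raise ValueError("l1 is not compatible with the coefficient P_1")
    return [int(p * total) for p in P]    # eq. (8)


def destuff(bits):
    """Truncate the packet to the last byte that contains a 0."""
    if 0 not in bits:
        raise ValueError("no 0 bit found: not a valid stuffed packet")
    last_zero = len(bits) - 1 - bits[::-1].index(0)
    end = (last_zero // 8 + 1) * 8        # end of the byte holding that 0
    if end > len(bits):
        raise ValueError("last 0 lies in an incomplete trailing byte")
    return bits[:end]


print(partition_lengths(3, [1, 4, 11]))   # [3, 12, 33]

# A 24-bit packet whose last byte ends with the Table 2 sequence 0111,
# followed by 8 stuffing 1's added by the transcoder.
original = [1, 0, 0, 1, 1, 0, 1, 0,
            0, 1, 1, 0, 1, 1, 0, 1,
            1, 0, 1, 0, 0, 1, 1, 1]
print(destuff(original + [1] * 8) == original)   # True
```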

3. EXPERIMENTAL ENVIRONMENT
As anticipated above, the proposed technique has been applied to the transmission of MPEG-4 video over a UMTS-like physical link. A short outline of the UTRA/FDD uplink and a description of the simulation set-up are given in the following.

3.1. The WCDMA physical link
We evaluated the performance of MPEG-4 video transmission with the proposed scheme over the UMTS uplink. The considered air interface is UTRA-FDD, which is based on a W-CDMA [16] scheme with a basic chip rate of 3.84 Mcps, corresponding to a bandwidth of approximately 5 MHz. Two channels, namely the Dedicated Physical Data Channel (DPDCH), where data information is transmitted, and the Dedicated Physical Control Channel (DPCCH), for control information, are considered separately. The transmitted information is organized in frames of length Tf = 10 ms (38400 chips), each divided into Ns = 15 slots. In the uplink, DPDCH and DPCCH are simultaneously transmitted in each

[Figure 3 block diagram. Transmit chain: video input, MPEG-4 encoder, FPT, UEP and interleaving, modulation, spreading and scrambling, power amplifier. Channel: reverse link propagation channel with AWGN; forward link propagation channel for the power control commands. Receive chain: RAKE receiver with de-spreading/de-scrambling branches 1 to N, SIR estimation and power control command generation, de-interleaving and UEP decoding, inverse FPT, MPEG-4 decoder, video output.]

Fig. 3. The transmission scheme considered - WCDMA uplink.

slot after spreading as, respectively, the in-phase and quadrature components of a QPSK transmitted symbol. User terminals transmit data packets to the base station on the DPDCH. Scrambling is then performed on the complex symbols. The reference transmission scheme is depicted in fig. 3. We considered a spreading code of length 64, providing a gross data rate (after channel coding level operations) of 60 kbps. Forward error correction through rate 1/3 convolutional or turbo coding is adopted in the UTRA-FDD standard. Here a different coding scheme, described in the following paragraph, is considered. The wideband channel has been modelled using a tapped delay line model, with six uncorrelated Rayleigh processes. The vehicular A channel with a mobile speed of 30 km/h is considered. A signal to noise ratio (SNR) based closed-loop power control (CLPC) has been applied, with one command per slot (10 ms/15) and a step size of 1 dB. At the receiver side, a Rake receiver with 3 fingers and maximum ratio combining has been considered. No antenna diversity is exploited. The parameters considered are summarized in table 3.

3.2. Simulation set-up
Simulations have been carried out considering the "Carphone" test sequence at QCIF resolution. We coded the sequence according to the MPEG-4 standard in a single scalability layer and considered rectangular objects; a VOP is thus coincident with a frame.

Table 3. WCDMA uplink parameters.
Parameter               Value
Channel bandwidth       5 MHz
DPDCH + DPCCH           Spreading factor = 64, QPSK
Channelization codes    OVSF
Scrambling code         38400-chip long code
Frame length            10 ms (15 slots)
PC command rate         1500 Hz
PC step size            1 dB
MS speed                30 km/h
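The figures in table 3 are mutually consistent, as the short arithmetic check below shows. It assumes one channel bit per DPDCH symbol (the DPDCH occupies one of the two branches of the QPSK symbol, as described above); the constant names are chosen here for illustration.

```python
CHIP_RATE = 3_840_000        # chips per second (3.84 Mcps)
FRAME_MS = 10                # frame length Tf in milliseconds
SLOTS_PER_FRAME = 15         # Ns
SPREADING_FACTOR = 64        # chips per DPDCH symbol

chips_per_frame = CHIP_RATE * FRAME_MS // 1000
chips_per_slot = chips_per_frame // SLOTS_PER_FRAME
pc_command_rate_hz = SLOTS_PER_FRAME * 1000 // FRAME_MS   # one command per slot
dpdch_bit_rate = CHIP_RATE // SPREADING_FACTOR            # assumes 1 bit per symbol

print(chips_per_frame)       # 38400
print(chips_per_slot)        # 2560
print(pc_command_rate_hz)    # 1500
print(dpdch_bit_rate)        # 60000  (60 kbps gross rate)
```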

Some of the error resilience tools provided by the MPEG-4 standard have been exploited: video packets, data partitioning and HEC have been used, but simple VLC coding has been adopted instead of reversible VLC. Furthermore, the technique proposed in [15] has been applied in order to allow error detection in headers. Some other error resilience techniques developed by the authors, made possible by header error detection, have been applied in all cases (both for UEP and EEP), although they are not described here for brevity. We transmitted the Carphone sequence in QCIF format, at 10 frames per second and at a bit rate of 35 kbps. GOVs composed of an I frame followed by nine P frames have been considered. An MPEG-4 packet length of about 1000 bits has been used in the simulations.

The MPEG-4 coded bitstream was then channel coded according to the proposed scheme. In particular, the header partition is protected with a rate 1/3 code; the first data partition (DC partition for I frames, motion partition for P frames) is protected with a rate 2/3 code for P frames and a rate 2/5 code for I frames; the second partition (AC partition for I frames and texture partition for P frames) with a rate 8/9 code for P frames and a rate 1/2 code for I frames. The average channel coding rate is thus about 8/13. Different channel coding rates have been used for I and P frames, in order to ensure the availability of a correct reference frame for the predictive coding of P frames. We considered a more unbalanced scheme among the partitions of P frames, due to the more evident difference in sensitivity to errors among the partitions of P packets. RCPC codes obtained from a mother code of rate 1/5 have been considered, with memory equal to 5 (32 states). The generator matrix considered is thus:

$$G = \begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 \end{pmatrix}.$$

The puncturing matrices have been obtained from those in [2, 3]. The percentage lengths of the "new" partitions have been evaluated through simulation. In order to simplify the implementation and to reduce the information needed at the receiver, we decided to use the same percentages for I and P frames, although a more accurate scheme could use different percentages for I and P frames. We take the first partition (start codes and header) to be 6.25% of the packet length, the first data partition 25% of the packet length and the second data partition 68.75% of the packet length. For the first data partition we chose a value higher than the average one evaluated by simulation, in order to make sure that the highly sensitive DC/motion markers receive the stronger protection.

Coded packets are transmitted over the channel described above, characterized by Es/N0, the average signal to noise ratio (SNR) per transmitted symbol. In all the cases considered, coded packets are channel decoded through a soft input Viterbi decoder and re-organized to generate the bitstream fed to the MPEG-4 decoder. An MPEG-4 decoder derived from the MoMuSys one [9], with improved error robustness (similarly to [10]), has been used. Performance is evaluated in terms of peak signal to noise ratio (PSNR) and compared to the case of equal error protection (EEP) with channel code rate equal to the average rate of the considered UEP scheme.

4. RESULTS
Fig. 4 shows the results obtained in terms of PSNR, for an average Es/N0 of 2 dB.
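As a side check on the average coding rate of about 8/13 that is used for both the UEP and the EEP curves, the effective rate per frame type can be computed from the partition percentages and code rates given in section 3.2. The sketch assumes that the percentages apply exactly to every packet; the share of source bits carried by I frames is not stated in the paper, so the value printed last is only the share implied by these numbers, and all names are illustrative.

```python
from fractions import Fraction as F

# Partition shares of each packet: 6.25 %, 25 % and 68.75 %.
SHARES = [F(1, 16), F(1, 4), F(11, 16)]
# Code rates per partition (header, first data partition, second data partition).
RATES = {
    "I": [F(1, 3), F(2, 5), F(1, 2)],
    "P": [F(1, 3), F(2, 3), F(8, 9)],
}


def effective_rate(rates, shares=SHARES):
    """Overall code rate of a packet whose partitions use different rates."""
    coded_bits_per_info_bit = sum(s / r for s, r in zip(shares, rates))
    return 1 / coded_bits_per_info_bit


rate_I = effective_rate(RATES["I"])
rate_P = effective_rate(RATES["P"])


def implied_I_share(target=F(8, 13)):
    """Fraction of source bits in I frames that would give the target average rate."""
    exp_I, exp_P = 1 / rate_I, 1 / rate_P
    return (1 / target - exp_P) / (exp_I - exp_P)


print(rate_I, float(rate_I))              # 16/35, about 0.457
print(rate_P, float(rate_P))              # 128/171, about 0.749
print(implied_I_share())                  # 37/109, i.e. roughly 34 % of the bits
```

Under these assumptions, an overall rate of exactly 8/13 would correspond to about one third of the source bits belonging to I frames, which is plausible for GOVs made of one I frame and nine P frames.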

Fig. 4. Comparison between FPT-UEP and EEP. PSNR vs. frame number - Carphone sequence - UMTS physical link, Es/N0 = 2 dB.

We may observe that the proposed technique remarkably outperforms the equal error protection scheme with the same total bit rate. About 4 dB in terms of PSNR are gained on average with the proposed scheme, after the first GOV. The observed quality of I frames is much higher than the quality of P frames, since I frames are more protected than P frames in order to have a correct reference frame for the subsequent P frames. We may also observe that, even if P frames are much less protected than in the case of equal error protection, for most channel conditions the PSNR of P frames is higher than in the case of EEP. This is due both to the fact that a more correct reference frame is available and to the fact that inside P frames the rate is unbalanced among partitions, providing a better perceived quality. Fig. 5 shows the visual results obtained for one realization of the channel, for an average Es/N0 = 2 dB. Frame no. 20 of the "Carphone" sequence is shown. On the left, the frame received over the WCDMA-like uplink described, in the case of equal error protection with channel coding rate Rc = 8/13, is shown; in the center, the same frame received in the case of unequal error protection with the proposed scheme, with the same average channel coding rate, is reported. The same frame received without noise is reported on the right. Fig. 6 shows frame no. 32 of the "Carphone" sequence. Also in this

case, the frames reported are those received with equal error protection and unequal error protection for Es/N0 = 2 dB and in the case of reception without noise. It may be observed that the quality of the I frame is very high in the case of FPT-UEP: I frames are more protected in this case. As anticipated, the quality of the P frame is also much higher than in the case of equal error protection, even if it is less protected. In the case of the reference frames on the right, the distortion is only due to source coding.

5. CONCLUSIONS
A novel unequal error protection (UEP) technique (fixed-packet-length transcoding assisted UEP, FPT-UEP) for wireless video transmission has been proposed in the paper, considering the case of MPEG-4 video transmission. The proposed scheme both increases the error resilience and enables the application of unequal error protection to coded sources with variable length packets and partitions. The technique has been applied to the case of MPEG-4 video transmission over a WCDMA radio link. Simulation results have shown that a significant improvement in terms of PSNR and of subjective quality can be achieved in different channels with the proposed scheme compared to equal error protection.

6. REFERENCES

[1] C. E. Shannon, "A Mathematical Theory of Communication", The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October 1948.

[2] J. Hagenauer, "Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and their Applications", IEEE Trans. Commun., vol. 36, no. 4, pp. 389-400, April 1988.

[3] L. Diana, J. M. Kahn, "Rate-adaptive modulation techniques for infrared wireless communications", IEEE ICC 1999.

[4] F. Burkert, G. Caire, J. Hagenauer, T. Hindelang, G. Lechner, "Turbo decoding with Unequal Error Protection applied to GSM speech coding", Proc. Globecom 1996.

[5] W. Xu, S. Heinen, M. Adrat, P. Vary, T. Hindelang, M. Schmautz, J. Hagenauer, "An Adaptive Multirate Speech Codec Proposed for the GSM", Third ITG Conference on Source and Channel Coding, Munchen, Jan. 2000.

[6] A. A. Alatan, Minyi Zhao, and A. N. Akansu, "UEP of SPIHT encoded image bit-streams", IEEE Journal on Selected Areas in Communications, vol. 18(6), pp. 814-818, June 2000.

[7] C. W. Yap, K. T. Tan and K. N. Ngan, "Error Resilient Video Coding over DS-CDMA Channels", IEEE Int. Workshop on Intelligent Signal Processing and Communication Systems, Melbourne, Nov. 1999.

[8] MPEG-4 Video Group, "Overview of the MPEG-4 Standard", ISO/IEC JTC1/SC29/WG11 N3444, Geneva, May-June 2000. (http://www.cselt.it/mpeg/standards/mpeg4/mpeg-4.htm)

[9] MoMuSys project website: http://www.cordis.lu/infowin/acts/rus/projects/ac098.htm

[10] S. Valente, C. Dufour, F. Groliere, D. Snook, "An efficient error concealment implementation for MPEG-4 video streams", IEEE Transactions on Consumer Electronics, No. 3, August 2001.

[11] M. Budagavi, W. Rabiner Heinzelman, J. Webb, R. Talluri, "Wireless MPEG-4 Video Communication on DSP Chips", IEEE Signal Processing Magazine, January 2000.

[12] M. G. Martini, M. Chiani, "Wireless Transmission of MPEG-4 Video: Performance Evaluation of Unequal Error Protection over a Block Fading Channel", IEEE Veh. Technol. Conf. (VTC 2001), Rhodes, May 2001.

[13] M. G. Martini, M. Chiani, "Proportional Unequal Error Protection for MPEG-4 video transmission", Proc. IEEE ICC 2001, Helsinki, June 2001.

[14] M. G. Martini, M. Chiani, "Robust Transmission of MPEG-4 Video: Start Codes Substitution and Length Field Insertion Assisted Unequal Error Protection", Picture Coding Symposium - PCS 2001, Seoul, April 2001.

[15] M. G. Martini, M. Chiani, "Joint Source-Channel Error Detection with Standard Compatibility for Wireless Video Transmission", Proc. IEEE WCNC 2002, Orlando, Florida, March 2002.

[16] H. Holma, A. Toskala (eds.), "WCDMA for UMTS", Wiley, 2001.

Fig. 5. Carphone frame no. 20 (I), received with average Es/N0 = 2 dB. On the left: EEP with avg. rate 8/13, PSNR = 15.17 dB; in the center: the FPT-UEP case with the same avg. rate, PSNR = 28.17 dB; on the right: the frame in the ideal noiseless case, PSNR = 32.16 dB.

Fig. 6. Carphone frame no. 32 (P), received with average Es/N0 = 2 dB. On the left: EEP, PSNR = 16.65 dB; in the center: FPT-UEP, PSNR = 23.03 dB; on the right: the frame received in the ideal case of no noise, PSNR = 31.92 dB.
