H. A. Karim et al.: Scalable Multiple Description Video Coding for Stereoscopic 3D
Scalable Multiple Description Video Coding for Stereoscopic 3D
H. A. Karim, C. T. E. R. Hewage, S. Worrall and A. M. Kondoz
Abstract - Scalable multiple description video coding (MDC) provides adaptability to bandwidth variations and receiving device characteristics, and can improve error robustness in multimedia networks. In this paper, a scalable MDC scheme is proposed for stereoscopic 3D video. Scalable MDC has previously been applied to 2D video but not to 3D video. The proposed algorithm enhances the error resilience of the base layer of the scalable video coding (SVC) standard using even and odd frame based MDC. The performance of the algorithm is evaluated in error free and error prone environments. Simulation results show improved performance using the proposed scalable MDC at high error rates compared to the original SVC. To improve the rate distortion performance, down-sampling of the depth information is proposed for SVC and scalable MDC. The proposed method reduces the overall bit rates and consequently: 1) improves their rate distortion, particularly at lower bit rates in error free channels; and 2) improves their performance in error prone channels.
Index Terms - 3D video, scalable multiple description video coding, down-sampling and up-sampling.
This work was supported in part by VISNET II, a European Network of Excellence (http://www.visnet-noe.org), funded under the European Commission IST FP6 programme. H. A. Karim is with the University of Surrey, Guildford, GU2 7XH UK, on leave from the Multimedia University, Cyberjaya, Malaysia. C. T. E. R. Hewage, S. Worrall and A. M. Kondoz are with the University of Surrey, Guildford, GU2 7XH UK.
Contributed Paper. Manuscript received February 14, 2008.
I. INTRODUCTION
Modern video transmission and storage systems are typically characterized by a wide range of access network connection qualities and end terminal capabilities. Scalable MDC has been proposed to tackle the scalability issues and at the same time provide error resilience for video transmission. 3D scalable MDC video can provide significant benefits for videoconferencing in terms of: 1) a more immersive communication experience due to 3D; 2) adaptability to different terminal types using scalability; and 3) improved robustness to packet losses using MDC. Example applications include virtual collaboration, mobile television and internet video streaming. 3D video allows users to feel the presence of the persons they are communicating with or be truly immersed in the event they are watching. 3D video is being used in many applications and research has been carried out on 3D-TV and 3D audio and video communication, with the aim of simple
provision of 3D television contents to users and of exploring the potential for 3D video communication [1] [2]. Over the years, many manufacturers have developed auto-stereoscopic 3D displays, which allow multiple users to view 3D content at the same time without 3D glasses [3]. 3D mobile phones are also being built, such as in [4], allowing communication in 3D.
Scalable video coding produces a number of hierarchical descriptions that provide flexibility in terms of adaptation to user and network requirements. However, in error prone environments, the loss of a lower layer prevents all higher layers from being decoded, even if the higher layers are correctly received, which means that a significant amount of correctly received data must be discarded in certain channel conditions. Multiple description coding (MDC) is an error resilient technique that addresses this problem by dividing a source into two or more correlated layers. A high-quality reconstruction is available when the received layers are combined and decoded, while a lower, but still acceptable, quality reconstruction is achievable if only one of the layers is received. MDC is an effective way to combat burst packet losses in wireless and Internet networks, and is a promising approach for video applications where retransmission is unacceptable [5]. Hence, with 3D scalable MDC, users can experience flexibility in terms of the terminals used and at the same time have a 3D visual communication system that is robust to packet losses.
There are many MDC methods in the literature. One of the most popular is multiple state video coding (MSVC), proposed in [6]. This method splits the input video into sequences of even and odd frames, each being coded as an independent description. In this paper, the MSVC technique is used to produce the scalable MDC for stereoscopic 3D video.
Other MDC types are potentially more efficient, but MSVC is computationally simple, and standard compliant bit streams can be produced. It also introduces no mismatch when only one of the descriptions is received because the decoder uses the same prediction signal as the encoder for each generated description. In this paper, an overview of the 2D video plus depth representation for 3D video is described in Section II. A brief review of scalable multiple description video coding algorithms is presented in Section III, followed by the proposed scalable multiple description coding (MDC) for stereoscopic 3D video. Reduced resolution depth compression for scalable MDC is then proposed in Section IV. The performance of the algorithms is investigated through extensive simulation in error free and error prone channels in Section V. The paper is finally concluded in Section VI.
0098 3063/08/$20.00 © 2008 IEEE
Authorized licensed use limited to: University of Surrey. Downloaded on November 6, 2008 at 07:15 from IEEE Xplore. Restrictions apply.
IEEE Transactions on Consumer Electronics, Vol. 54, No. 2, MAY 2008
II. 3D STEREOSCOPIC VIDEO FROM 2D VIDEO PLUS DEPTH
Stereoscopic video is the simplest form of 3D video and can be integrated into communication applications using existing video technologies. Stereoscopic video renders a different view for each eye, which facilitates depth perception of the 3D scene. An alternative form of stereoscopic video is described in [7], which uses 2D video from an ordinary 2D video camera and depth information from a laser range or depth camera, as shown in the example in Fig. 1. The depth information indicates how far each pixel is from the camera. A rendering algorithm, such as the one in [7], is used to create two views for the user, one for each eye, creating the 3D impression. The two rendered views (for the left and right eye) can then be played using a Stereoscopic Player to produce 3D stereoscopic video [8], as shown in Fig. 2, and can be viewed using 3D viewing glasses or an auto-stereoscopic display.
Fig. 1. (a) 2D video and (b) its depth information
Fig. 2. A CIF 3D Stereoscopic sequence played in Stereoscopic Movie Player
One of the advantages of this stereoscopic format is that the associated depth information contains a lot of low frequency components [9]. Therefore, it can be compressed much more efficiently using standard video codecs such as MPEG4 and H.264 [10] to below 10-20% of the basic color video bit rate without significantly affecting the quality. The amount of depth needed can also be controlled at the receiver. The 2D+depth image sequences are available at [2]. The 2D video plus depth format was also recently standardized in ISO/IEC 23002-3 (MPEG-C part 3) [11] [12]. An example of a product in the market that uses the 2D video plus depth format for television is described in [13]. An example of a 3DTV system using the 2D plus depth format is described in [14].
III. 3D SCALABLE MULTIPLE DESCRIPTION CODING
A. Review of scalable MDC
Scalable video coding has been developed as an extension for most of the video coding standards, including H.264/AVC. The scalable extension of H.264/AVC, simply known as SVC, is a video coding standard that supports spatial, temporal and quality scalability [15]. Scalability of SVC is provided at the bit stream level, which means bit streams for a reduced spatial and/or temporal resolution can be obtained by discarding NAL units that are not required from a global SVC bit stream.
Scalable MDC has been proposed to improve the error resilience of the transmitted video over unreliable networks and at the same time provide adaptation to bandwidth variations and receiving device characteristics [16]. Scalable MDC approaches can be divided into two categories. The first approach begins with an MDC coder, and the MDC descriptions are then made scalable [17], [18]. The second approach begins with a scalable video coder and the layers are then mapped into MDC descriptions [19]-[24]. In [18] for example, a single MDC stream is split into base and enhancement layers. In [21], a non-standardized scalable wavelet encoder is used to produce MDC streams. The number of descriptions, rate of each individual description, and redundancy level in each description can be varied using post encoding. Another example of a non-standardized wavelet scalable MDC encoder is presented in [25]. It is based on a motion-compensated scheme derived from Haar multi-resolution analysis, which produces temporally correlated descriptions. The embedded MDC proposed in [26] uses embedded multiple description scalar quantizers and a combination of motion compensated temporal filtering and a wavelet transform. In this approach the transmission of motion data is assumed to occur without errors. Scalable MDC based on SVC is proposed in [24]. Two descriptions are generated for each enhancement layer of an SVC coded stream by embedding in each description only half of the motion and texture information of the original coded stream with minimum redundancy. However, noticeable artifacts are observed in an ideal MDC channel due to motion vector approximation from neighboring blocks.
All of the scalable MDC schemes described above have been applied to 2D video, but so far have not been applied to 3D video. Non-scalable MDC schemes were proposed for 3D stereoscopic left and right views in [27] using spatial scaling and multi-state coding.
B. Proposed 3D scalable MDC
The proposed 3D scalable MDC in this paper begins with a standard SVC coder. The error resilience of the base layer of SVC is then enhanced using temporal MDC. The temporal MDC of the base layer is produced by using the MSVC approach as described in [6], which separates the even and odd frames of a sequence into two MDC streams as shown in Fig. 3.
[Figure: the even frames 0, 2, 4, 6 form one stream; the odd frames 0, 1, 3, 5, 7 form the other stream.]
Fig. 3. Contents of stream 1 and 2 at the frame level
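The even and odd frame separation used by MSVC can be illustrated with a minimal Python sketch. The function names split_descriptions and merge_descriptions are hypothetical and are not taken from [6] or from any codec software, and integer frame indices stand in for pictures. Fig. 3 also lists frame 0 at the head of the odd stream; that detail is not modelled here.

```python
def split_descriptions(frames):
    """Split a frame sequence into an even-indexed and an odd-indexed description."""
    even = frames[0::2]   # frames 0, 2, 4, ...
    odd = frames[1::2]    # frames 1, 3, 5, ...
    return even, odd


def merge_descriptions(even, odd):
    """Interleave two decoded descriptions back into display order."""
    merged = []
    for k in range(max(len(even), len(odd))):
        if k < len(even):
            merged.append(even[k])
        if k < len(odd):
            merged.append(odd[k])
    return merged


frames = list(range(8))                 # frame indices 0..7 stand in for pictures
even, odd = split_descriptions(frames)
print(even)                             # [0, 2, 4, 6]
print(odd)                              # [1, 3, 5, 7]
print(merge_descriptions(even, odd) == frames)   # True
```

Each description would then be passed to the encoder as an independent sequence, so prediction inside one description never refers to frames of the other.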
Fig. 4 shows the general block diagram of the proposed scalable MDC encoder and decoder for stereoscopic 3D video. The depth data, which will be combined with the texture data using a depth image based rendering (DIBR) technique [7] to produce left and right views, is placed in the enhancement layer. This type of stereoscopic video coding configuration has better coding efficiency than other configurations of stereoscopic video, such as left and right coding and interlaced coding [28].
[Figure: the depth (enhancement layer) and color (base layer) inputs enter the H.264/SVC-MDC encoder, which outputs stream 1 and stream 2; the H.264/SVC-MDC decoder recovers depth and color, and DIBR produces the 3D stereoscopic video.]
Fig. 4. Proposed scalable MDC
SVC can produce scalable layers that can be exploited for MDC. It is possible to switch off the interlayer prediction in SVC [15]. This allows the creation of a multiple description scenario among the SVC layers. Hence, for scalable MDC simulation, the even and odd frames are separated before the encoding process. Next, the even texture frames are coded in the base layer (layer 0) and the odd texture frames are coded in the enhancement layer (layer 1). With the interlayer prediction switched off, it can be assumed that layer 1 is also a base layer for scalable MDC. The even depth frames are coded in the enhancement layer (layer 2) and the odd depth frames are coded in the enhancement layer (layer 3). Table I shows the description of the layers that can be produced by the scalable MDC for a CIF size image sequence with only one spatial resolution. An example with spatial scalability is given in Section V-B. A single standard compliant bit stream is produced from the above configuration. At the decoder, a bit stream extractor is used to extract the even and odd bit streams of the texture and the depth. Each bit stream can then be decoded by the standard SVC decoder. Finally, the decoded even and odd frames are merged together to produce a full resolution decoded sequence.
TABLE I
DESCRIPTION OF LAYERS FOR THE SCALABLE MDC ENCODER
Layer  Resolution  Description
0      352x288     Base layer, even - Texture
1      352x288     Base layer, odd - Texture
2      352x288     Enhancement layer, even - Depth
3      352x288     Enhancement layer, odd - Depth
For both texture and depth, if both the even and odd streams are received, the decoder can reconstruct the coded sequence at full temporal resolution. If only one stream is received, the decoder can still decode the received stream at half the original temporal resolution. Since the even frames are predicted from previous even frames (independent from odd frames), there will be no mismatch if one of the streams is lost at the decoder. Additionally, in the case where one stream is received, the decoder can decode at full resolution by interpolating between the received frames or by repeating the frame.
[Figure: even frames 0, 2, 4, 6, ... (errors propagate); odd frames 0, 1, 3, 5, 7, ...; displayed frames 0, 1, 2, 3, 5, 7, 9, 11 (reduced frame rate).]
Fig. 5. Example of an error in one of the MDC streams
If an error occurs, for example, in one of the frames in the even MDC stream, the error will propagate to the following even frames, as shown in Fig. 5. The MDC decoder can then switch to the odd stream and display the frames at a reduced frame rate. Alternatively, the MDC decoder can interpolate between the received frames or repeat frames to achieve full temporal resolution.
In this paper, frame interpolation is performed using:
$$ I_{ip}(i,j) = \frac{I_{prev}(i,j) + I_{fut}(i,j)}{2} \qquad (1) $$
where $I_{ip}(i,j)$ is the frame to be interpolated at location (i,j), $I_{prev}(i,j)$ is the previous frame and $I_{fut}(i,j)$ is the future frame. The performance of this proposed method is investigated in Section V as SVC-MDC-Org.
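As an illustration of equation (1), the following Python sketch rebuilds a full frame rate sequence from a single received description. The names interpolate_frame and fill_lost_description are hypothetical, frames are plain lists of pixel rows, and the result is left as a floating point mean because the rounding to integer pixel values is not specified in the text. Repeating the only available neighbour at the sequence boundary is also an assumption.

```python
def interpolate_frame(prev_frame, fut_frame):
    """Pixel-wise mean of two equally sized frames, as in equation (1)."""
    return [
        [(p + f) / 2 for p, f in zip(prev_row, fut_row)]
        for prev_row, fut_row in zip(prev_frame, fut_frame)
    ]


def fill_lost_description(received, num_frames, first_index):
    """Rebuild a full-rate sequence when only one description is available.

    received    -- decoded frames of the surviving description
    num_frames  -- number of frames in the original sequence
    first_index -- display index of received[0] (0 for even, 1 for odd)
    """
    output = [None] * num_frames
    for k, frame in enumerate(received):
        output[first_index + 2 * k] = frame
    for n in range(num_frames):
        if output[n] is not None:
            continue
        prev_frame = output[n - 1] if n > 0 else None
        fut_frame = output[n + 1] if n + 1 < num_frames else None
        if prev_frame is not None and fut_frame is not None:
            output[n] = interpolate_frame(prev_frame, fut_frame)
        else:
            # Only one neighbour exists at a sequence boundary: repeat it.
            output[n] = prev_frame if prev_frame is not None else fut_frame
    return output


even = [[[10, 20]], [[30, 40]]]          # frames 0 and 2, each a 1x2 picture
full = fill_lost_description(even, num_frames=4, first_index=0)
print(full[1])                           # [[20.0, 30.0]]
print(full[3])                           # [[30, 40]]
```

Frame 1 is the mean of frames 0 and 2, while frame 3 has no future neighbour and is therefore a repeat of frame 2.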
IV. SCALABLE MULTIPLE DESCRIPTION CODING WITH REDUCED RESOLUTION DEPTH COMPRESSION
This section investigates the compression of depth information at a reduced resolution for low bit rate applications. The depth video frames are down-sampled prior to encoding and up-sampled after decoding. The use of reduced resolution for image compression has been investigated in [29] with the aim of improving subjective and objective performance at low bit rates. The application of reduced resolution for depth image sequences is investigated in [30]. Reduced resolution for 2D video is also used, for example, in mobile applications with the aim of reducing bandwidth and providing portability.
A simple way to down-sample an image by a factor of two is by using sub-sampling. If f(i,j) is the pixel value of an image at location (i,j), then the down-sampled image is
$$ f_d\left(\frac{i}{2}, \frac{j}{2}\right) = f(i,j), \quad i = 0, 2, 4, \ldots, \mathrm{XSIZE}, \; j = 0, 2, 4, \ldots, \mathrm{YSIZE} \qquad (2) $$
XSIZE is the vertical size and YSIZE is the horizontal size of the image to be down-sampled. It should be noted that the three neighboring pixels of f(i,j), which are f(i,j+1), f(i+1,j) and f(i+1,j+1), can also be used. In this paper, the pixel f(i,j) and its three neighboring pixel values are averaged as below to produce an image down-sampled by a factor of two:
$$ f_d\left(\frac{i}{2}, \frac{j}{2}\right) = \frac{f(i,j) + f(i,j+1) + f(i+1,j) + f(i+1,j+1)}{4}, \quad i = 0, 2, 4, \ldots, \mathrm{XSIZE}-1, \; j = 0, 2, 4, \ldots, \mathrm{YSIZE}-1 \qquad (3) $$
Equation (3) is repeated for every frame in the 720x576 depth image sequence, resulting in a 360x288 depth image sequence. The latter sequence has to be cropped to 352x288 (CIF resolution) to make it suitable for SVC encoder operation. During the up-sampling from 352x288 to 720x576, the depth video quality is slightly affected due to pixel copying at the edge to match the cropped columns, thus slightly reducing the PSNR of the up-sampled depth information. However, the overall bit rate is reduced due to down-sampling of the depth information.
The down-sampling and up-sampling (DSUS) algorithm can be applied to the enhancement layer of SVC which is used to code the depth information. The block diagram of the proposed method for the encoder is shown in Fig. 6. At the decoder, the coded depth information from the enhancement layer is up-sampled back to its original resolution. The depth information is down-sampled from 720x576 resolution to CIF (352x288) resolution. SVC spatial scalability does not allow an enhancement layer to be used to send video at lower resolution than the base layer.
[Figure: the depth is down-sampled and fed to the enhancement layer, and the texture is fed to the base layer, of the SVC encoder.]
Fig. 6. Block diagram of H.264/SVC encoder with DSUS
Therefore, a three layer configuration is used in the simulation, as shown in Table II. For both SVC without DSUS (SVC-Org) and SVC with DSUS (SVC-DSUS), the base layer is used to send the texture information at CIF resolution. Note that for SVC-DSUS, enhancement layer 1 is used to send the depth at reduced resolution. Enhancement layer 2 for both SVC-Org and SVC-DSUS is used to send the texture information at full resolution.
TABLE II
CONFIGURATION USED FOR SVC-ORG AND SVC-DSUS
Encoder   Layer  Spatial Resolution  Type
SVC-Org   2 (e)  720x576             Texture
SVC-Org   1 (e)  720x576             Depth
SVC-Org   0 (b)  352x288             Texture
SVC-DSUS  2 (e)  720x576             Texture
SVC-DSUS  1 (e)  352x288             Depth
SVC-DSUS  0 (b)  352x288             Texture
e: enhancement layer, b: base layer
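The depth down-sampling of equation (3), the crop to CIF width and the restoration to the original size described in this section can be sketched in Python as follows. This is an illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the crop is assumed to be symmetric, the up-sampling is assumed to be simple pixel repetition, and the cropped columns are assumed to be restored before up-sampling. The text specifies only that edge pixels are copied to match the cropped columns.

```python
def downsample_by_two(image):
    """Average each 2x2 block, following equation (3).

    image is a list of pixel rows; both dimensions are assumed to be even.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[i][j] + image[i][j + 1]
             + image[i + 1][j] + image[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


def crop_columns(image, width):
    """Keep the central `width` columns (a symmetric crop is an assumption)."""
    offset = (len(image[0]) - width) // 2
    return [row[offset:offset + width] for row in image]


def pad_columns(image, width):
    """Restore the cropped columns by copying the nearest edge pixel."""
    missing = width - len(image[0])
    left = missing // 2
    right = missing - left
    return [[row[0]] * left + row + [row[-1]] * right for row in image]


def upsample_by_two(image):
    """Pixel repetition in both directions (the filter choice is an assumption)."""
    output = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        output.append(wide)
        output.append(list(wide))
    return output


# Toy example: 12 columns by 4 rows stands in for 720x576,
# and 4 columns stands in for the 352 column CIF width.
depth = [[r * 12 + c for c in range(12)] for r in range(4)]
small = crop_columns(downsample_by_two(depth), 4)
restored = upsample_by_two(pad_columns(small, 6))
print(len(restored[0]), len(restored))   # 12 4
```

For the sequences used here the same calls would take a 720x576 depth frame to 360x288, crop it to 352x288 for encoding, and return a 720x576 frame after decoding.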
The DSUS technique can also be applied to scalable MDC to improve scalable MDC performance. Hence, H.264/SVC-MDC without DSUS is known as SVC-MDC-Org and H.264/SVC-MDC with DSUS is known as SVC-MDC-DSUS. A six layer configuration, as shown in Table III, is used in the simulation for SVC-MDC-Org and SVC-MDC-DSUS. The SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS performances are investigated in Section V.
TABLE III
CONFIGURATION OF LAYERS FOR SVC-MDC-ORG AND SVC-MDC-DSUS
Encoder       Layer  Size     Type     Stream
SVC-MDC-Org   5 (e)  720x576  Texture  Odd
SVC-MDC-Org   4 (e)  720x576  Texture  Even
SVC-MDC-Org   3 (e)  720x576  Depth    Odd
SVC-MDC-Org   2 (e)  720x576  Depth    Even
SVC-MDC-Org   1 (b)  352x288  Texture  Odd
SVC-MDC-Org   0 (b)  352x288  Texture  Even
SVC-MDC-DSUS  5 (e)  720x576  Texture  Odd
SVC-MDC-DSUS  4 (e)  720x576  Texture  Even
SVC-MDC-DSUS  3 (e)  352x288  Depth    Odd
SVC-MDC-DSUS  2 (e)  352x288  Depth    Even
SVC-MDC-DSUS  1 (b)  352x288  Texture  Odd
SVC-MDC-DSUS  0 (b)  352x288  Texture  Even
e: enhancement layer, b: base layer
V. SIMULATION RESULTS AND DISCUSSION
A. Error free
The rate distortion performances of SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS are compared in an error free environment. SVC-Org is the original JSVM software [31]. The distortion is measured using the PSNR of the left and right views. An original left-and-right view sequence is produced from the original 2D texture and its associated original depth information, using the DIBR technique described in Section II. Similarly, a compressed left-and-right view sequence is obtained from the 2D and depth data reconstructed at the decoder. The left and right view PSNR is obtained by comparing the original left-and-right view data with the left-and-right view data output from the decoder. The Orbi and Interview sequences, with a resolution of 720x576 and 125 frames, are used in the error free comparison. Fixed quantization parameters are employed to obtain the bit rates. The bit rates shown in the figures are in kbit/s. I-frames are inserted every 45 frames and only P frames are used between the I-frames. The left-and-right view PSNRs for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS are plotted over the range of bit rates shown in Fig. 7 and Fig. 8. The configuration shown in Table II is used for SVC-Org and SVC-DSUS, and the configuration shown in Table III is used for SVC-MDC-Org and SVC-MDC-DSUS.
It can be seen in Fig. 7 and Fig. 8 that SVC-MDC-Org is less efficient than SVC-Org in terms of compression efficiency. The SVC-MDC-Org PSNR is about 1-2 dB lower than that of SVC-Org for the same bit rate. This is due to the larger temporal difference between the frames used for prediction in the MDC scheme. The larger prediction distance usually incurs a larger residual, and requires that larger motion vectors be coded. It can also be seen from Fig. 7 that the compression efficiency of SVC-DSUS is better than SVC-Org for the Orbi video sequence, when bit rates are less than 1300 kbit/s. SVC-MDC-DSUS performs better than SVC-MDC-Org up to 1800 kbit/s, and has similar compression efficiency to SVC-Org at bit rates less than 400 kbit/s. This is because, for a given QP, SVC-MDC-DSUS has a lower overall bit rate than SVC-MDC-Org due to the use of down-sampling for the depth information. Similar performance is obtained for the Interview sequence in Fig. 8. The coding efficiency of SVC-DSUS is improved compared to SVC-Org for the Orbi image sequence for bit rates up to 1200 kbit/s. SVC-MDC-DSUS performs better than SVC-MDC-Org until 1700 kbit/s, and has similar compression efficiency to SVC-Org at bit rates less than 800 kbit/s, due to the use of depth down-sampling.
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 7. Rate distortion for the Orbi image sequence
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 8. Rate distortion for the Interview image sequence
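The distortion measure used in this section can be illustrated with the standard PSNR definition for 8-bit pictures, PSNR = 10 log10(255^2 / MSE). The Python sketch below is hypothetical: the function names are illustrative, and combining the two views by taking the mean of the left view and right view PSNR is an assumption, since the text does not state how the plotted average is formed.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two equally sized pictures (lists of rows)."""
    total = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(reference, decoded, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical pictures."""
    error = mse(reference, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


def stereo_psnr(ref_left, dec_left, ref_right, dec_right):
    """Mean of the left and right view PSNR (the averaging rule is an assumption)."""
    return (psnr(ref_left, dec_left) + psnr(ref_right, dec_right)) / 2


reference = [[100, 100], [100, 100]]
decoded = [[100, 100], [100, 110]]       # one pixel off by 10, so MSE = 25
print(round(psnr(reference, decoded), 2))   # 34.15
```

In the setting of this paper, the reference views would be rendered by DIBR from the original texture and depth, and the decoded views from the reconstructed texture and depth.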
B. Scalable performance
Table IV shows the configuration for the scalable MDC scheme as an example to achieve spatial scalability. All the layers are contained in a single bit stream. Layer 0 and Layer 1 are the MDC layers (base layer). The user can select the required bit stream, according to their terminal requirement, using an extractor application. For example, two video resolutions, QCIF and CIF, can be decoded from the bit stream. If a stereoscopic 3D terminal is available, the user can also decode the depth layer and use the DIBR technique to achieve stereoscopic 3D video.
TABLE IV
DESCRIPTION OF LAYERS
Layer  Resolution  Description
0      176x144     Base layer, even - Texture
1      176x144     Base layer, odd - Texture
2      176x144     Enhancement layer, even - Depth
3      176x144     Enhancement layer, odd - Depth
4      352x288     Enhancement layer, even - Texture
5      352x288     Enhancement layer, odd - Texture
6      352x288     Enhancement layer, even - Depth
7      352x288     Enhancement layer, odd - Depth
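The layer selection described above can be sketched as a simple filter over the rows of Table IV. The Python code below is a hypothetical illustration and does not reproduce the interface of the JSVM extractor. It only picks the rows that match a terminal's resolution and 2D or 3D capability; whether a real extractor must also retain lower layers as references is not modelled.

```python
# Rows of Table IV: (layer, width, height, role, stream, content)
TABLE_IV = [
    (0, 176, 144, "base", "even", "texture"),
    (1, 176, 144, "base", "odd", "texture"),
    (2, 176, 144, "enhancement", "even", "depth"),
    (3, 176, 144, "enhancement", "odd", "depth"),
    (4, 352, 288, "enhancement", "even", "texture"),
    (5, 352, 288, "enhancement", "odd", "texture"),
    (6, 352, 288, "enhancement", "even", "depth"),
    (7, 352, 288, "enhancement", "odd", "depth"),
]


def select_layers(width, height, stereoscopic, streams=("even", "odd")):
    """Return the Table IV layer numbers matching a terminal's needs.

    width, height -- display resolution requested by the terminal
    stereoscopic  -- True if the terminal can render 3D (depth is then kept)
    streams       -- descriptions that were actually received
    """
    wanted = {"texture", "depth"} if stereoscopic else {"texture"}
    return [
        layer
        for layer, w, h, _role, stream, content in TABLE_IV
        if (w, h) == (width, height) and content in wanted and stream in streams
    ]


print(select_layers(176, 144, stereoscopic=False))                  # [0, 1]
print(select_layers(352, 288, stereoscopic=True))                   # [4, 5, 6, 7]
print(select_layers(352, 288, stereoscopic=True, streams=("odd",))) # [5, 7]
```

The last call corresponds to a CIF stereoscopic terminal that has received only the odd description.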
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 9. Rate distortion for Orbi at 3% packet loss
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 10. Rate distortion for Orbi at 5% packet loss
C. Error prone performance
In this section, the performance of SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS is evaluated in an error prone environment. The compressed 3D video is transmitted over a simulated internet channel [32]. The internet channel simulator in [32] has four internet packet loss error patterns, namely 3%, 5%, 10% and 20%. In the simulation, the loss of one packet is assumed to mean the loss of one video frame. Frame copy error concealment is used for SVC-Org and SVC-DSUS. SVC-MDC-Org and SVC-MDC-DSUS use frame interpolation in case of error. The Orbi and Interview image sequences are used in the simulation. An I-frame is inserted every 45 frames in single description coding (SDC) and every 45 frames in each MDC stream.
Fig. 9-Fig. 11 show the rate distortion performance of the Orbi sequence for packet losses of 3%, 5% and 10% respectively. Fig. 12-Fig. 14 show the rate distortion performance of the Interview sequence for packet losses of 3%, 5% and 10% respectively. In these figures, the decoded left-and-right view PSNR for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS is plotted. From Fig. 9 to Fig. 11, it can be seen that most of the time, SVC with MDC has better performance compared to standard SVC, particularly for packet losses of 5% and above. As an example, for the Orbi sequence, in the high bit rate range, SVC-MDC-DSUS has better performance than the other schemes at all packet loss rates. This is due to the properties of the MDC scheme, which is able to effectively mitigate the effects of errors. SVC-MDC-DSUS also has better error free coding efficiency compared to SVC-MDC-Org in the low bit rate range. The worst performance is given by SVC-Org at high and low bit rates for the Orbi sequence. Sometimes, SVC-DSUS performs better than the others with 3% packet loss in the low bit rate range, but it performs comparably to SVC-MDC-DSUS.
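The frame copy concealment used for SVC-Org and SVC-DSUS can be sketched as below, under the simulation assumption stated above that one lost packet means one lost frame. The function names are hypothetical and the loss pattern is passed in explicitly rather than read from the error pattern files of [32]. The sketch replaces only the lost frames themselves; the propagation of errors through later predicted frames in a real decoder is not modelled.

```python
def conceal_frame_copy(frames, lost):
    """Replace every lost frame with the most recently displayed frame.

    frames -- decoded frames in display order
    lost   -- set of frame indices whose packet was dropped
    """
    output = []
    for n, frame in enumerate(frames):
        if n in lost:
            # Copy the previous displayed frame; None if nothing precedes it.
            output.append(output[-1] if output else None)
        else:
            output.append(frame)
    return output


def loss_rate(num_frames, lost):
    """Percentage of frames lost under the one packet per frame assumption."""
    return 100 * len(lost) / num_frames


frames = ["f0", "f1", "f2", "f3", "f4"]
lost = {2, 3}
print(conceal_frame_copy(frames, lost))   # ['f0', 'f1', 'f1', 'f1', 'f4']
print(loss_rate(len(frames), lost))       # 40.0
```

With two descriptions, a frame lost in one stream still has received neighbours in the other stream, which is what allows the MDC schemes to interpolate instead of repeating a frame.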
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 11. Rate distortion for Orbi at 10% packet loss
For the Interview sequence, SVC-MDC-DSUS performs better than the others. SVC-DSUS performs worst in the high bit rate range, probably due to the up-sampling distortion in the depth image and its inability to recover quickly from errors. In the low bit rate range, SVC-MDC-DSUS performs better than or comparably to SVC-DSUS most of the time. Similar to the Orbi sequence results, the worst performance is given by SVC-Org most of the time in the low bit rate range.
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 12. Rate distortion for Interview at 3% packet loss
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 13. Rate distortion for Interview at 5% packet loss
[Plot: average PSNR (dB) versus bit rate (kbit/s) for SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS.]
Fig. 14. Rate distortion for Interview at 10% packet loss
VI. CONCLUSION
A scalable MDC scheme for stereoscopic 3D video based on even and odd frames is proposed in this paper. The proposed algorithm generates two descriptions for the base layer of SVC based on even and odd frame separation, which reduces its coding efficiency. The proposed scheme can also achieve scalability through the layered coding of SVC. To improve rate distortion, the DSUS algorithm is applied to the enhancement layer of SVC, which is used to code the depth information. The application of DSUS results in an improvement in the rate distortion performance of SVC and scalable MDC, particularly in the low bit rate range. This is due to the reduction in depth bit rates. Although there is a risk of up-sampling distortion, simulation results show that the up-sampling distortion only reduces the coding efficiency in the high bit rate range. Finally, this paper presents the performance of SVC-Org, SVC-DSUS, SVC-MDC-Org and SVC-MDC-DSUS for stereoscopic 3D video in an error prone Internet environment. Simulation results show that most of the time, SVC-MDC algorithms (original or DSUS) perform better in error prone environments than SVC-DSUS and SVC-Org, particularly in the presence of high channel error rates.
ACKNOWLEDGMENT
The work presented was developed within VISNET II, a European Network of Excellence (http://www.visnet-noe.org), funded under the European Commission IST FP6 program.
REFERENCES
[1] A. Redert, M. O. d. Beeck, C. Fehn, W. Ijsselsteijn, M. Pollefeys, L. V. Gool, E. Ofek, I. Sexton and P. Surman, “ATTEST: Advanced three-dimensional television system technologies,” in Proc. of the 1st Int. Symp. on 3D Data Processing and Transmission (3DPVT’02), Padova, Italy, June 2002.
[2] http://www.3dtv-research.org/
[3] A. Sullivan, “3-deep,” IEEE Spectrum, vol. 42, no. 4, pp. 22-27, April 2005.
[4] https://www.ddd.com/
[5] Y. Wang, A. R. Reibman and S. Lin, “Multiple description coding for video communications,” IEEE Proc., vol. 93, no. 1, pp. 57-70, January 2005.
[6] J. G. Apostolopoulos, “Error-resilient video compression via multiple state streams,” in Proc. of Int. Workshop on Very Low Bit rate Video Coding, VLBV99, Kyoto, Japan, October 1999.
[7] C. Fehn, “A 3d-tv approach using depth-image-based rendering (dibr),” in Proc. of VIIP2003, Spain, September 2003.
[8] http://www.3dtv.at/
[9] C. Fehn, “Depth-image-based rendering (dibr), compression and transmission for a new approach on 3d-tv,” in Proc. of Stereoscopic Displays and Application, CA, USA, January 2004.
[10] C. Fehn, K. Schuur, P. Kauff, and A. Smolic, “Coding results for EE4 in MPEG 3DAV,” ISO/IEC JTC1/SC29/WG11 MPEG02/M9561, Pattaya, March 2003.
[11] A. Bourge, J. Gobert and F. Bruls, “MPEG-C part 3: Enabling the introduction of video plus depth contents,” in Proc. of IEEE Workshop on Content Generation and Coding for 3D-television, Eindhoven, The Netherlands, June 2006.
[12] A. Bourge and C. Fehn, “Auxiliary video data representation,” ISO/IEC JTC 1/SC 29/WG 11/N8038, Montreux, Switzerland, April 2006.
[13] http://www.business-sites.philips.com/3dsolutions/about/Index.html
[14] C. Fehn, K. Hopf and B. Quante, “Key technologies for an advanced 3D-TV system,” in Proc. of SPIE Three-Dimensional TV, Video and Display III, Philadelphia, PA, USA, Volume 5599, pp. 66-80, October 2004.
[15] H. Schwarz, D. Marpe and T. Wiegand, “Overview of the scalable H.264/MPEG4-AVC extension,” in Proc. of Int. Conf. on Image Processing (ICIP 2006), Atlanta, GA, USA, Oct. 8-11 2006.
[16] M. v. d. Schaar and D. S. Turaga, “Multiple description scalable coding using wavelet-based motion compensated temporal filtering,” in Proc. of Int. Conf. in Image Processing (ICIP 2003), pp. III-489-III-492, 14-17 Sept. 2003.
[17] N. Franchi, M. Fumagalli, R. Lancini and S. Tubaro, “Multiple description video coding for scalable and robust transmission over ip,” IEEE Trans. Circ. and Sys. for Video Tech., vol. 15, no. 3, pp. 321-334, March 2005.
[18] P. A. Chou, H. J. Wang and V. N. Padmanabhan, “Layered multiple description coding,” in Proc. Packet Video Workshop, April 2003.
[19] H. Wang and A. Ortega, “Robust video communication by combining scalability and multiple description coding techniques,” in Proc. of the SPIE, Vol. 5022, pp. 111-124, 2003.
[20] L. P. Kondi, “A rate-distortion optimal hybrid scalable/multiple-description video codec,” IEEE Trans. Circ. and Sys. for Video Tech., vol. 15, no. 7, pp. 921-927, July 2005.
[21] E. Akyol, A. M. Tekalp and M. R. Civanlar, “Scalable multiple description video coding with flexible number of descriptions,” in Proc. of Int. Conf. in Image Processing (ICIP 2005), pp. III-712-III-715, 11-14 Sept. 2005.
[22] C-Ming Chen, Y-C. Chen and C-Min Chen, “Multiple description motion compensation video coding for MPEG-4 FGS over lossy packet networks,” in Proc. of IEEE Int. Conf. on Mult. and Expo (ICME 2004), 2004.
[23] M. Yu, X. Yu, G. Jiang, R. Wang, F. Xiao and Y-d. Kim, “New multiple description layered coding method for video communication,” in Proc. of Int. Conf. on Parallel and Distributed Computing, Applications and Tech. (PDCAT05), 2005.
[24] H. Mansour, P. Nasiopoulos and V. Leung, “An efficient multiple description coding scheme for the scalable extension of H.264/AVC (SVC),” in Proc. of IEEE Int. Symposium on Signal Processing and Information Tech., 2006.
[25] C. Tillier, T. Petrisor and P-Popescu, “A motion-compensated over complete temporal decomposition for multiple description scalable video coding,” EURASIP Journal on Image and Video Processing, vol. 2007, Article ID 31319.
[26] F. Verdicchio, A. Munteanu, A. L. Gavrilescu, J. Cornelis and P. Schelkens, “Embedded multiple description coding of video,” in Proc. of IEEE Int. Conf. on Image Processing (ICIP 2006), Vol. 15, No. 10, pp. 3114-3130, Oct. 2006.
[27] A. Norkin, A. Aksay, C. Bilen, G. Bozdagi Akar, A. Gotchev and J. Astola, “Schemes for multiple description coding of stereoscopic video,” in Proc. LNCS, Mult. Content, Representation and Security, vol. 4105, pp. 730-737, Istanbul, Turkey, Sept. 2006.
[28] C. T. E. R. Hewage, H. A. Karim, S. Worrall, S. Dogan and A. M. Kondoz, “Comparison of Stereo Video Coding Support in MPEG-4 MAC, H.264/AVC and H.264/SVC,” in Proc. of VIE2007, London, July 2007.
[29] A. M. Bruckstein, M. Elad, and R. Kimmel, “Down-scaling for better transform compression,” IEEE Trans. Image Processing, vol. 12, no. 12, pp. 1132-1144, September 2003.
[30] H. A. Karim, A. H. Sadka and A. M. Kondoz, “Reduced resolution depth compression for 3D video over wireless networks,” IEE 2nd Int. Workshop on Signal Processing for Wireless Comm. (SPWC2004), London, UK, June 2004, pp. 96-99.
[31] JSVM 8.7 reference software from CVS server, garcon.ient.rwth-aachen.de/cvs/jvt
[32] S. Wenger, “Error patterns for Internet experiments,” ITU Telecommunications Standardization Sector, Doc. Q15-I-16r1, Oct. 1999.
H. A. Karim received the B.Eng. degree in Electronics with Communications from the University of Wales Swansea, UK, in 1998. He was an intern at Bell Labs, Lucent Technologies, USA, from October 1999 to March 2000. He received his M.Eng. degree from Multimedia University, Malaysia, in 2003. He worked as a Tutor and then as a Lecturer at the university from 1998 to 2003. He is currently on study leave for his Ph.D. degree at ILAB, Centre for Communication Systems Research (CCSR), Faculty of Electronics and Physical Sciences, University of Surrey, UK. His research interests include telemetry, 2D/3D image/video coding and transmission, error resilience and multiple description video coding.
Chaminda T. E. R. Hewage received the B.Sc. (Hons.) degree in Electrical and Information Engineering from the University of Ruhuna, Galle, Sri Lanka. During 2004-2005, he worked as a Telecommunication Engineer in the field of Data Communication. He is currently studying for his Ph.D. at the University of Surrey, Guildford, UK. His research aims at providing QoS support for 3D video communication applications. He works on the VISNET II Network of Excellence (NoE) of the IST FP6 programme in the area of robust immersive media communications.
S. Worrall received the M.Eng. degree in Electronic Engineering from the University of Nottingham in 1998. He joined the Centre for Communication Systems Research (CCSR) at the University of Surrey in 1998 as a research student, and obtained his Ph.D. in 2001. From 2001 to 2003, he continued to work within CCSR as a research fellow, and was appointed as a lecturer in multimedia communications in 2003. His research interests include error robust video coding, multiple description coding, video compression, transmission of video over wireless networks, and multi-view video coding.
A. M. Kondoz received the B.Sc. (Hons.) degree in engineering, the M.Sc. degree in telematics, and the Ph.D. in communication in 1983, 1984, and 1986, respectively. He became a Lecturer in 1988, a Reader in 1995, and then in 1996, a Professor in Multimedia Communication Systems and deputy director of the Centre for Communication Systems Research (CCSR), University of Surrey, Guildford, U.K. He has over 250 publications, including a book on low bit-rate speech coding and several book chapters. He has graduated more than 40 Ph.D. students in the areas of speech/image and signal processing and wireless multimedia communications, and has been a consultant for major wireless media terminal developers and manufacturers. Prof. Kondoz has been awarded several prizes, the most significant of which are the Royal Television Society's Communications Innovation Award and the IEE Benefactors Premium Award. He has been on the Refereeing College for EPSRC and on the Canadian Research Councils. He is a member of the IEEE and the IET.