Error Resilient Video Coding Techniques Using Spare Pictures

2 downloads 0 Views 422KB Size Report
The mechanism for signaling the spare pictures was adopted to H.263 version 3 and ITU-T H.264 | MPEG-4. AVC as Supplemental Enhancement Information [1], ...
ERROR RESILIENT VIDEO CODING TECHNIQUES USING SPARE PICTURES Dong Tian1, Miska M. Hannuksela2, Ye-Kui Wang3 and Moncef Gabbouj4 1

[email protected], [email protected], [email protected], [email protected] 1,3 Tampere International Center for Signal Processing (TICSP), Tampere, Finland 2 Nokia Mobile Software, Tampere, Finland 4 Tampere University of Technology, Finland ABSTRACT

In error prone environments not all the picture data can be received correctly during video transmission. Error propagation introduced by employing predictive coding may degrade the sequence quality severely. A novel method, called spare pictures, is proposed to indicate the similarity between a reference picture and other pictures. With the help of the signaled spare pictures information, receivers may avoid unnecessary picture freezing, feedback and complex error concealment, which are normally done as a response to missing picture data. Furthermore, spare pictures may also help 1) to improve the quality of some decoded pictures by replacing the reference pictures by better ones, and 2) to improve error concealment by concealing a lost block as a corresponding block in one of the spare pictures. An efficient coding method for the spare macroblock maps is also proposed. The mechanism for signaling the spare pictures was adopted to H.263 version 3 and ITU-T H.264 | MPEG-4 AVC as Supplemental Enhancement Information [1], [2].

two or more correlated bitstreams so that a high-quality reconstruction can be obtained from all the bitstreams together, while a lower, but still acceptable, quality reconstruction is guaranteed if only one bitstream is received [3]. Video redundancy coding (VRC) generates several independent bitstreams by using independent prediction loops [4]. For example, an even frame is predicted from the previous even frame, and an odd frame from the previous odd frame. Reference picture selection (RPS) [5], etc. can be utilized if a feedback channel is available. Systematic description of error resilience video coding can be found in [6]. In this paper, we propose an alternative method, spare pictures, to mitigate the effect of error propagation introduced by employing predictive coding. Section 2 presents the idea of spare pictures and how to signal the spare pictures information. Section 3 discusses more details in the processing of the codec. Section 4 gives the simulation results and section 5 concludes the paper. 2. SPARE PICTURES 2.1. The principle of spare pictures

1. INTRODUCTION In video transmission via error prone channels, errors are likely to be found in the received picture data. When a picture is lost or corrupted so severely that the concealment result is not acceptable, the receiver typically pauses video playback and waits for the next INTRA picture to restart decoding and playback. If possible, the receiver also requests the transmitter for an INTRA picture update. In some applications, e.g., in multicast video streaming, the transmitter cannot react to INTRA update requests, but rather the transmitter encodes an INTRA picture relatively frequently, such as every few seconds, to enable new clients to join the multicast session and to enable recovery from transmission errors. Consequently, receivers may have to pause video playback for a relatively long time after a lost picture, and users typically find this behavior annoying. There are numerous ways to decrease the probability of such transmission errors that would force the decoder to pause playback. Multiple description coding will produce

The idea behind spare pictures is based on the fact that sometimes another picture (or a part of another picture) resembles the actual motion compensation reference picture (or part of the picture) so well that it could be used as a reference instead of the actual one when the actual one is lost. The actual reference picture is called target picture herein. In other words, the quality degradation caused by the alternative reference picture is so small that the quality is still considered acceptable. Thus, it would be beneficial to signal which coded pictures are similar entirely or partly. For example, we coded Hall at 15 pictures per second with constant quantization parameter of 10 with JM 2.0 [7]. Figure 1 illustrates the beginning of the coded example sequence. Figure 3 shows the second P frame (P4) of the sequence, and the image on the left is errorfree. Then, we discarded the bitstream of the first P frame of the sequence (P2) and decoded the sequence. In other words, the image on the right was decoded using the frame preceding the deleted frame as a reference as illustrated in Figure 2. Practically, you cannot see the

I0

P2

P4

Figure 1. Example of coding pattern: P2 is predicted from I0 and P4 is predicted from P2

I0

P2

P4

Figure 2. Usage of a spare reference picture: Use I0 instead of P2 to recover P4 when P2 is lost

Decoded P4 using the original reference Decoded P4 using the spare picture Figure 3. Example of spare reference pictures

Spare macroblock map Frame 74 in Hall Figure 4. Example of spare pictures map difference, and therefore the encoder could have signaled that I0 can be used as a spare picture for P2. Figure 4 shows a more complicated example where frame 74 in Hall can be used as a spare picture of frame 75 except for a scattered area. In this case, a bi-level map is used to represent the spare macroblock information. In Figure 4, the picture on the left is the spare macroblock map between frames 74 and 75. The similar areas which can be used as spare macroblocks are shown in black. The spare macroblock map indicates to the decoder which parts of frame 74 can be used to replace corresponding parts of frame 75, in case the latter is a reference frame that is lost or corrupted. 2.2. Signaling of spare pictures We propose signaling of entire and partial spare pictures with the Supplemental Enhancement Information (SEI) mechanism in JVT. If the actual reference block is lost or seriously damaged, decoders may use an entire picture or partial spare pictures instead.

Frame 75 in Hall Typically the spare picture SEI message contains the resemblance between a certain target picture and couple of other pictures. When delivered over packet network, the spare picture SEI message shall not be packetized together with the target picture in order to avoid the spare picture SEI message to be lost together with the target picture. In other words, spare picture SEI messages should be conveyed independently or together with the slices or data partitions belonging to pictures other than the target picture. The indication of entire picture being spare picture is straightforward. If only parts of the picture resemble the target picture, the bi-level map of such areas need be signaled. To code the spare macroblock maps, note the fact that the neighboring spare macroblock maps resemble each other. Therefore we encode the first spare macroblock map directly, which is between the target picture and the first spare picture. For the other spare macroblock maps, the current spare macroblock map is first processed by applying an exclusive or operation on the previous spare macroblock map. Last, the macroblock

map will be arranged into 1-demensional signal so that the number of consecutive spare macroblocks can be encoded using exponential-Golomb codes. Thus the coded macroblock map need be scanned in a certain scan order. The counter-clockwise box-out proves to be an efficient scan order in our simulations. 3. PROCESSING IN THE CODEC To determine whether a candidate reconstructed macroblock can be a spare macroblock of the current reconstructed macroblock in a target picture during encoding, the average pixel difference in luminance is used as the criterion to evaluate the similarity between the target macroblock and the candidate macroblock. If the average pixel difference is less than a certain threshold, it will infer that the candidate macroblock can be a spare macroblock. The empirical average pixel difference threshold is 6. If the number of the macroblocks in a candidate picture that can be used as spare macroblock is large enough, the encoder will assume that the entire picture can serve as a spare picture to reduce the overhead in signaling spare picture information. The empirical percentage threshold is selected as 92%. During decoding, if the spare pictures information is received for a picture that is a reference picture and is partly or entirely lost during transmission, the lost parts of the picture should be replaced with correctly reconstructed spare reference macroblocks or picture, if available. 4. SIMULATIONS This section presents the simulations of using spare pictures information to improve quality of the displayed sequences in error prone environments. We also present the performance of encoding spare macroblock maps using different methods in terms of coding efficiency. 4.1. Evaluation method Delivery of video over Internet is likely to be facilitated by the real-time transport protocol (RTP). Some RTP packets may be lost during transmission. The lossy bitstream is fed to the decoder implemented based on JM 2.0 [7]. Three types of output sequences (displayed pictures) will be created according to different decoders. Decoder 1: Reconstruct the sequence normally without freezing or spare pictures. This case corresponds to a “careless”decoder that displays any decoded picture regardless whether the picture is corrupted or not. Decoder 2: Reconstruct the sequence normally with freezing only. If a picture is correctly reconstructed, output it normally. Otherwise, repeat the latest displayed picture until the next INTRA picture. This case

corresponds to a “cautious” decoder that freezes the output of pictures if there are any errors in the received data stream. Decoder 3: Reconstruct the sequence with help of spare pictures. If a picture is correctly reconstructed, output it normally. If a reference picture of the current picture is missing but the reference picture has a correctly reconstructed spare picture, decode and output the current picture using the alternative reference picture. Otherwise, repeat the latest displayed picture, that is, freeze the output of reconstructed pictures. To assess these different decoders, we shall compute the following quantities: • the number of displayed pictures (NumDispPics), • the number of correctly received pictures (NumCorRecPics), and • the number of purposely repeated pictures (NumPurRepPics) for each decoder, and • the number of rescued pictures using spare pictures (NumResPics) for the proposed technique. NumResPics indicates how many pictures would have been frozen in decoder 2, which were rescued using spare pictures in the proposed decoder. 4.2. Simulation method and conditions The simulations are performed based on JM 2.0, and the detailed simulation conditions are as shown below, which follow closely the common conditions in [8]. Sequences, Intra Coding Period and Error Concealment Simulations are performed using three QCIF sequences Hall at 64 kbps and 15 fps, Akiyo, and News at 144 kbps and 15 fps, with periodical INTRA coded frames inserted to stop error propagation. The INTRA refresh period is one second. Except for the first frame, if an INTRA coded frame contains losses, INTER (spatial-temporal) error concealment is used instead of INTRA (spatial) error concealment. It is considered that the shot change signaling proposed in [9] is used to choose an appropriate error concealment method for INTRA pictures. Bitrate Calculation As stated in the common conditions specified in [8], coding parameters such as quantization parameter are chosen to make the resulting bitrate as close as possible to but not larger than the channel bitrate, taking into account the 40 bytes of IP/UDP/RTP headers per packet. Packet Loss Simulation We assume that the packet containing the parameter set is conveyed reliably (possibly out-of-band during the session setup). At least one packet of the first frame should be

Frame #89

Frame #90 Decoder 1 Decoder 2 Decoder 3 Figure 5. Snapshots of Hall @ packet loss rate 5%, 16 frames before frame 90 are frozen by Decoder 2 and rescued by Decoder 3 operation is applied first as explained in section received to avoid decoder crash. To meet the requirement, 2.2. the first packet of the first frame is assumed to be always The ratio of the average data amount of the encoded received regardless of the corresponding error pattern. The maps to that of the original bi-level maps (one bit per simulations are run with packet loss rates being selected as macroblock), is given in terms of percentages for each 3% and 5%. sequence in Table 4. As can be seen from the results, the most efficient 4.3. Simulation results way is the differential encoding method combined with counter-clockwise box-out scan order. The simulation results are shown in Table 1 - 3. As can be seen from the results, spare pictures information can 1) prevent displaying some corrupted frames that 5. CONCLUSIONS would be displayed by a “careless” decoder 1, In this paper, we introduced an alternative method, spare and pictures, to improve the quality of the received sequences, 2) help the decoder to rescue some pictures with which can help the receiver to recover pictures referring to acceptable quality that would have been frozen a lost picture, and prevent unnecessary picture freezing, by a “cautious”decoder 2. feedback and complex error concealment. An efficient Both “careless” decoder 1 and “cautious” decoder 2 coding method for the spare macroblock map is also will cause annoying effects: unacceptable picture quality proposed. Simulation results show that the solution is or slow frame rate (see Figure 5: Snapshots of Hall), efficient in combating packet losses in video transmission respectively. Furthermore, from the results of Akiyo, we over Internet. The mechanism for signaling the spare can see that no frames were frozen in decoder 3. By pictures was adopted to H.263 version 3 and ITU-T H.264 observing the output sequences, no rescued frames can be | MPEG-4 AVC as Supplemental Enhancement identified by human eyes. In sequences which have more Information. motion, such as in News, fewer frames can be rescued. 4.4. Simulations for encoding spare macroblock maps We encode two spare macroblock maps for all the frames. Direct encoding and differential encoding are compared under different scan modes: 1) Direct encode using raster/counter-clockwise box-out scan order, namely DirRaster/DirBox, that is, no exclusive-or operation between spare macroblock maps is applied, and, 2) Differential encode using raster/counterclockwise box-out scan order, namely DiffRaster/DiffBox, that is, when encoding the second spare macroblock map, an exclusive or

6. REFERENCES [1] Thomas Wiegand, “Editor's Proposed Draft Text Modifications for Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Geneva modifications draft 37”, document JVT-E146d37, JVT of ISO/IEC MPEG & ITU-T VCEG, October 2002 [2] Dong Tian, Ye-Kui Wang and Miska M. Hannuksela, “Spare Pictures”, document JVT-D100, JVT of ISO/IEC MPEG & ITU-T VCEG, July 2002 [3] Vivek K Goyal, “Multiple Description Coding: Compression Meets the Network”, IEEE Signal Processing Magazine, Volume: 18 Issue: 5 , Sept. 2001

[4] Stephan Wenger, “Video Redundancy Coding in H.263+”, Workshop on Audio-Visual Services for packet networks (aka Packet Video Workshop), 1997 [5] Yi J. Liang, Markus Flierl and Bernd Girod, “Low-Latency Video Transmission over Lossy Packet Networks Using RateDistortion Optimized Reference Picture Selection”, IEEE ICIP 2002 [6] Yao Wang, Stephan Wenger, Jiangtao Wen and Aggelos K. Katsaggelos, “Error Resilient Video Coding Techniques”, IEEE Signal Processing Magazine, July 2000

[7] JVT, “JVT reference software 2.0”, available ftp://ftp.imtc-files.org/jvt-experts/reference_software

at

[8] Stephan Wenger, “Common Conditions for wire-line, low delay IP/UDP/RTP packet loss resilient testing”, document VCEG-N79r1, ITU-T Q.6/SG16 VCEG [9] Ye-Kui Wang and Miska M. Hannuksela, “Signaling of Shot Changes”, document JVT-D099, JVT of ISO/IEC MPEG & ITU-T VCEG, July 2002

Table 1. Hall@64kbps + 15fps, result bitrate=62.6kbps, QP=16, totally 150 coded frames 3% packet loss rate, NumDispPics = 150, NumCorRecPics = 97 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 53 0 NumResPics NA NA 53 5% packet loss rate, NumDispPics = 150, NumCorRecPics = 77 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 73 24 NumResPics NA NA 49 Table 2. Akiyo@144kbps + 15fps, result bitrate=140.31kbps, QP=5, totally 150 coded frames 3% packet loss rate, NumDispPics = 150, NumCorRecPics = 98 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 52 0 NumResPics NA NA 52 5% packet loss rate, NumDispPics = 150, NumCorRecPics = 33 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 117 56 NumResPics NA NA 61 Table 3. News@144kbps + 15fps, result bitrate=132.66kbps, QP=12, totally 150 coded frames 3% packet loss rate, NumDispPics = 150, NumCorRecPics = 103 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 47 34 NumResPics NA NA 13 5% packet loss rate, NumDispPics = 150, NumCorRecPics = 51 Number of Frames Decoder 1 Decoder 2 Decoder 3 NumPurRepPics 0 99 99 NumResPics NA NA 0 Table 4. Comparison between different methods for encoding spare macroblock maps DirRaster DirBox DiffRaster DiffBox Foreman@144kbps + 15fps, result bitrate = 142.96kbps, QP = 15, totally 200 coded frames 101 % 98 % 84 % 82 % Akiyo@144kbps + 15fps, result bitrate = 140.31kbps, QP = 5, totally 150 coded frames 31 % 24 % 27 % 22 % News@144kbps + 15fps, result bitrate = 132.66kbps, QP = 12, totally 150 coded frames 52 % 43 % 43 % 38 %