High Performance Video Codec with Error Concealment

Chiman Kwan, Edward Shi, and Yool-Bin Um*

Applied Research LLC, 9605 Medical Center Dr., Rockville, MD 20850, USA
[email protected]

Abstract: This paper summarizes the development of a high performance video codec with error concealment. The key idea is a systematic search for similar blocks in the neighborhood of the damaged blocks. The proposed scheme has been integrated with the H264 reference codec known as JM, and extensive experiments have clearly demonstrated the performance of the new approach.

1. Introduction

Videos have been widely used in many applications such as traffic monitoring, spacecraft operations, robotics, machine tool operations, and security surveillance [1]-[10]. Video compression is a well-developed area [11]-[14]. In recent years, videos have been widely used in mobile applications, security monitoring, and surveillance operations. For some of the above applications, compressed video data are sent in packets over a communication channel, which can be wired or wireless. Due to channel interference, bits can be corrupted during transmission. A corrupted bit may cause the loss of the whole packet, which is extremely costly in video distribution because a lost packet can affect many subsequent frames in a video. In many non-real-time applications, the receiver can ask the sender to retransmit a lost packet repeatedly until a correct packet is received. This is the case for the TCP/IP protocol [15]. However, in real-time video applications such as video broadcasting, video conferencing, video chatting, and video streaming, packet retransmission would introduce excessive delay and inefficient bandwidth usage.

Existing video error concealment algorithms [16]-[18] use neighboring macroblocks (MBs) to conceal damaged MBs in I frames and the motion vectors of neighboring MBs to conceal errors in P frames. These algorithms work to a certain extent in scenarios with a low percentage of packet loss. It is well known that in real-time broadcasting and streaming applications, videos are normally encoded in the so-called "baseline mode," with fewer I frames and more P frames, in order to reduce latency and save bandwidth. As a result, error propagation becomes very serious in the later frames of a group of pictures (GOP), and the resulting video quality can be very poor. Recently, there has been some research on error concealment based on sparsity-based search [16][19]-[22]. However, these methods are slow and hence not suitable for real-time applications.

In this paper, we propose a new, high performance approach to video compression that is robust to packet losses. First, the locations of the damaged MBs are determined. This can be done by extracting information from the decoded headers. Second, a reference data cube is built using one or two earlier frames. Some blocks of the current frame can also be inserted into the reference cube. Third, each damaged MB is divided into four quadrants. Each quadrant is repaired by searching for similar blocks in the reference data cube, and the closest ones are ranked. Fourth, error concealment is performed by picking the closest matching block in the reference data cube to replace the damaged quadrant. This process is repeated for all the damaged MBs. Finally, the repaired MBs are sent to the decoded picture buffer for display and future prediction. The proposed scheme has been integrated with the JM reference codec, and extensive experiments have demonstrated the performance of the proposed algorithm.

This paper is organized as follows. Section 2 describes the technical details. Section 3 summarizes our experimental results. Finally, concluding remarks are given in Section 4.

2. Proposed approach

2.1 Architecture


Fig. 1. Proposed error concealment architecture to deal with packet loss.

Since the compressed bit stream is usually carried in data packets, channel interference may introduce bit errors in the packets. In video broadcasting and streaming, TCP/IP is not suitable, as it may introduce excessive delay. Instead, RTP [23] or RTSP [24] is normally used, where no packet retransmission is done. In the H264 standard, there are some built-in mechanisms to cope with packet losses. In the encoder, users can choose error resilience tools such as flexible macroblock ordering (FMO) to make the bit stream more tolerant to packet losses. In the decoder, H264 uses neighboring MBs to conceal errors in I frames and neighboring motion vectors to deal with losses in P frames. Despite these methods, the video quality is still not acceptable when there is packet loss during transmission. As shown in Fig. 1, we propose to conceal errors in videos by performing additional processing at the decoder outputs.

Some basic notions need to be explained first. A video consists of many frames. Each frame is divided into slices, and within each slice there are numerous macroblocks (MBs). Our error concealment algorithm carries out the following steps:

1. For each frame, determine the damaged MB locations.
2. Build a reference data cube using the previous one or two decoded frames; part of the current frame can also be used to build the reference data cube.
3. Re-rank the sub-blocks: for each damaged MB, divide it into 4 quadrants (sub-blocks). Form a sparse block in which 3 sub-blocks contain known pixels from the neighbors and only 1 sub-block contains unknown pixels. Rank all the sparse blocks and determine the highest ranked sparse block.
4. Concealment:
4.1 Patch selection: determine the location of the sub-block in the current frame to be concealed.
4.2 For the sparse block containing the highest ranked sub-block, search for the best patch in the reference data cube.
4.3 Compute matching scores and find the best few patches. Replace the sub-block with the best match.
4.4 Optional: build a matrix or tensor from the best matched patches; the low rank part of the matrix or tensor completion replaces the sub-block containing unknown pixels. This step takes more computation and hence is optional.
5. Move to the next sub-block until all missing sub-blocks are concealed.

2.2 Baseline Version of Our Proposed System

As shown in Fig. 2, our algorithm consists of the following key modules: damaged MB location determination (dotted line rectangle); ref_frcube generation (mt_ftcube); re-ranking of sub-blocks; and concealment (patch selection, which selects the sparse block, and scoring and matching, which select the best patch).

Step 1: Damaged MB location determination

Each MB (16x16) has header information, and corrupted MBs can be determined by collecting information from the headers of all the uncorrupted MBs in a frame.

Step 2: ref_frcube generation

A user can select the number of past reference frames. For each frame, the ref_frcube is built by vectorizing every possible 16x16 block and stacking the resulting vectors into a big matrix. The video frames consist of Y, U, and V components. U and V are upsampled to the same dimensions as Y. Consequently, there are 3 vectors (Y, U, V) in each block.

Fig. 2. Baseline flow chart with smart adjacent search.


Each column in the ref_frcube has 768 elements. In our experiments, we normally use the immediate past frame in the decoded picture buffer (DPB) as the reference frame. There is also an option in our design to use the undamaged blocks of the current frame to expand the ref_frcube. This helps in certain situations where it is hard to find similar patches in the previous frame. The ref_frcube creation takes time, but the cube can be built by multiple workers in parallel, as shown in Fig. 3. Each worker focuses on one quadrant of the frame, and the speed improves linearly with the number of workers.
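To make the data layout concrete, the following is a minimal C sketch of how such a cube could be assembled for a single frame. The plane layout, the sampling step, and the function and variable names are illustrative assumptions rather than our actual implementation (which builds the cube per quadrant with multiple workers, as in Fig. 3).

    #include <stdlib.h>

    #define BLK    16                   /* macroblock / block size          */
    #define VECLEN (3 * BLK * BLK)      /* 768 elements per column: Y, U, V */

    /* Build a reference "cube": each column is one vectorized 16x16 block.
     * y, u, v are full-resolution planes (U and V already upsampled to the
     * Y dimensions).  'step' controls how densely blocks are sampled
     * (step = 1 gives every possible block).  Returns a VECLEN x num_cols
     * matrix in column-major order.  Illustrative sketch only.             */
    unsigned char *build_ref_frcube(const unsigned char *y,
                                    const unsigned char *u,
                                    const unsigned char *v,
                                    int width, int height, int step,
                                    int *num_cols)
    {
        int nx = (width  - BLK) / step + 1;
        int ny = (height - BLK) / step + 1;
        unsigned char *cube = malloc((size_t)VECLEN * nx * ny);
        int col = 0;

        if (!cube)
            return NULL;
        for (int by = 0; by + BLK <= height; by += step) {
            for (int bx = 0; bx + BLK <= width; bx += step, col++) {
                unsigned char *dst = cube + (size_t)col * VECLEN;
                for (int r = 0; r < BLK; r++) {
                    for (int c = 0; c < BLK; c++) {
                        int src = (by + r) * width + (bx + c);
                        int off = r * BLK + c;
                        dst[off]               = y[src];   /* Y samples   */
                        dst[off +     BLK*BLK] = u[src];   /* upsampled U */
                        dst[off + 2 * BLK*BLK] = v[src];   /* upsampled V */
                    }
                }
            }
        }
        *num_cols = col;
        return cube;
    }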

Fig. 3. Creation of the reference data cube using multiple workers (threads).

Step 3: Re-rank sub-blocks

The next target location to patch is selected according to which sub-blocks have the most "good neighbors." Sub-block locations with 3 neighboring sub-blocks that were properly decoded have the highest rank, while sub-blocks whose neighbors have themselves been concealed receive a lower rank. A properly decoded neighbor contributes +10 to the rank of a given sub-block, so the maximum rank is 30 (3 properly decoded neighbors). After concealment, the patched sub-block location is given a score of (maximum neighbor score - 1); in this case, 10 - 1 = 9. If the next sub-block is adjacent to the one just patched, that neighbor contributes +9 to the rank instead of +10. If a sub-block has 3 concealed neighbors, it has a rank of 27 (9+9+9) and yields a score of 8 after concealment, because the maximum neighbor score is only 9 in that case. This strategy ensures that we are looking for patches and scoring them using the most accurate references, rather than using concealed patches to search for subsequent patches.
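To illustrate the rule, the sketch below shows one way a helper like calc_rank (called by the rerank_subblks pseudocode given next) could compute the rank of a missing sub-block. The 2-D score map, the orientation/neighbor geometry, and the name calc_rank_sketch are assumptions for illustration only.

    /* Scores kept in a per-sub-block map 'scr' (msr x msc):
     *   10    - properly decoded sub-block
     *   1..9  - concealed sub-block (one less than the best neighbor
     *           score available when it was concealed)
     *   0     - still-missing sub-block (assumed)
     * For the missing sub-block at (r, c) we try the four possible
     * sparse-block orientations (missing quadrant in each corner) and
     * return the best total neighbor score and which orientation wins.  */
    static void calc_rank_sketch(const int *scr, int msr, int msc,
                                 int r, int c, int *mxrank, int *mxrank_idx)
    {
        /* three neighbors used by each orientation (assumed geometry):
           0: missing block bottom-right -> up, left, up-left
           1: bottom-left  -> up, right, up-right
           2: top-right    -> down, left, down-left
           3: top-left     -> down, right, down-right                    */
        static const int nbr[4][3][2] = {
            { {-1, 0}, { 0,-1}, {-1,-1} },
            { {-1, 0}, { 0, 1}, {-1, 1} },
            { { 1, 0}, { 0,-1}, { 1,-1} },
            { { 1, 0}, { 0, 1}, { 1, 1} },
        };

        *mxrank = -1;
        *mxrank_idx = 0;
        for (int o = 0; o < 4; o++) {
            int rank = 0;
            for (int k = 0; k < 3; k++) {
                int rr = r + nbr[o][k][0];
                int cc = c + nbr[o][k][1];
                if (rr >= 0 && rr < msr && cc >= 0 && cc < msc)
                    rank += scr[rr * msc + cc];     /* 10, 9, ..., or 0 */
            }
            if (rank > *mxrank) {
                *mxrank = rank;
                *mxrank_idx = o;
            }
        }
    }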


Pseudocode for rerank_subblks is shown below (cleaned up; is_processed() denotes the check of a block's processed flag, and calc_rank() is the ranking helper).

rerank_subblks
Inputs:
  miss_subblks - list of missing sub-blocks; includes processed and unprocessed blocks
  miss_scr     - miss score matrix
  sblk_sz      - sub-block size along one dimension
  misslen      - number of missing sub-blocks
  msr          - number of rows in miss_scr
  msc          - number of columns in miss_scr
Output:
  rnk          - will contain the rank of each sub-block in the list
Description: Re-ranking assigns higher ranks to sub-blocks with more good neighbors and lower ranks to those with fewer. Blocks already processed are given a rank of -1.

    int *rerank_subblks(int **miss_subblks, int sblk_sz, int **miss_scr,
                        int misslen, int msr, int msc, int *rnk)
    {
        for (int ii = 0; ii < misslen; ii++) {
            if (is_processed(miss_subblks[ii])) {      /* processed flag check */
                rnk[ii] = -1;
            } else {
                int mxrank, mxrank_idx;
                /* calc_rank gives a sub-result of patch selection */
                calc_rank(miss_subblks[ii], sblk_sz, miss_scr, msr, msc,
                          &mxrank, &mxrank_idx);
                rnk[ii] = mxrank;
            }
        }
        return rnk;
    }

Step 4: Concealment

A typical concealment process can be described as follows. In the current corrupted frame $f$, consider one damaged image block $b_i$ of size $L \times L$ (say $L = 16$). By dividing $b_i$ into 4 smaller sub-blocks, $b_i = \cup_{j=1}^{4} p_{ij}$, we recover each of these sub-blocks step by step and then complete the block $b_i$. The first missing sub-block $p_{i1}$ is grouped with three quarters of its upper-left neighborhood to form a new patch of the same size $L \times L$. This new patch is called a sparse block. The partial information from the spatial neighborhood and the adjacent frames is exploited to fill in $p_{i1}$. After $p_{i1}$ is completed, it can be considered known information, and the same technique used to complete $p_{i1}$ can be applied to recover $p_{i2}$.
Then $p_{i3}$ and finally $p_{i4}$ are recovered in succession, just as was done for $p_{i1}$ and $p_{i2}$. The four steps to complete one missing block are illustrated in Fig. 4.
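A compact way to express this recovery order is a loop over the four quadrants. The sketch below is illustrative only; conceal_quadrant is a hypothetical stand-in for the sparse-block formation, patch search, and replacement steps described in the following subsections.

    /* Hypothetical helper: forms the sparse block for one quadrant, searches
     * the reference cube, and copies the best match in (assumed elsewhere). */
    void conceal_quadrant(unsigned char *frame, int width, int qx, int qy,
                          int sub, const unsigned char *ref_cube, int ref_cols);

    /* Conceal one damaged 16x16 macroblock b_i by recovering its four
     * sub-blocks p_i1..p_i4 in order.  Each recovered sub-block is then
     * treated as known information for the next one.  Sketch only.        */
    void conceal_macroblock(unsigned char *frame, int width,
                            int mb_x, int mb_y,
                            const unsigned char *ref_cube, int ref_cols)
    {
        const int SUB = 8;                  /* sub-block (quadrant) size */
        const int off[4][2] = { {0, 0}, {SUB, 0}, {0, SUB}, {SUB, SUB} };

        for (int j = 0; j < 4; j++) {
            int qx = mb_x + off[j][0];
            int qy = mb_y + off[j][1];
            /* sparse block from the three known neighbors, patch search,
               and replacement of the unknown quadrant                     */
            conceal_quadrant(frame, width, qx, qy, SUB, ref_cube, ref_cols);
            /* after this call the quadrant is considered known            */
        }
    }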

Fig. 4. Illustration of the concealment steps for an MB.

Now we consider how one patch, called $y_{ij}$, with three quarters of known information and one quarter occupied by the unknown piece $p_{ij}$, can be concealed. Taking this patch as a reference, we search within a given number of neighboring frames for the patches that are most similar to $y_{ij}$. Only the three quarters of known information in the patch $y_{ij}$ are used in the mean squared error (MSE) comparison that identifies candidate patches. The concealment consists of several steps, which are described below.

Patch selection

An example illustrating the patch selection process is shown in Fig. 5. Based on the outputs of the re-rank function, the sparse block with the highest rank is selected.

Fig. 5. Patch selection.

Patch search in the ref_cube

In our current implementation of patch search, shown in Fig. 7, we first create a sub-ref-cube in the neighborhood of the sparse block to be concealed. We then take the 3 known sub-blocks of the sparse block and compare them with the corresponding sub-blocks of candidate patches/blocks in the sub-ref-cube.
A score is computed by summing the residuals over the 3 known sub-blocks. The scores are then ranked, and we either choose the patch with the lowest score or take the average of the several patches with the lowest scores. More sophisticated algorithms based on matrix or tensor completion can also be used.
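As a hedged illustration of this scoring step, the sketch below computes the sum of squared residuals over the three known quadrants and picks the candidate with the lowest score. The luma-only comparison, the row-major block layout, and the function names are simplifying assumptions, not the exact implementation.

    #include <limits.h>

    #define BLK 16
    #define SUB 8

    /* Matching score between a sparse block and one candidate block: the sum
     * of squared residuals over the three known quadrants only (the unknown
     * quadrant index 'missing' is skipped).  Both blocks are 16x16 luma
     * patches stored row-major; luma-only is a simplifying assumption.      */
    long match_score(const unsigned char *sparse, const unsigned char *cand,
                     int missing)
    {
        const int off[4][2] = { {0, 0}, {SUB, 0}, {0, SUB}, {SUB, SUB} };
        long score = 0;

        for (int q = 0; q < 4; q++) {
            if (q == missing)               /* unknown quadrant: not compared */
                continue;
            for (int r = 0; r < SUB; r++) {
                for (int c = 0; c < SUB; c++) {
                    int idx = (off[q][1] + r) * BLK + (off[q][0] + c);
                    int d = (int)sparse[idx] - (int)cand[idx];
                    score += (long)d * d;
                }
            }
        }
        return score;
    }

    /* Scan the candidate blocks in the sub-ref-cube and return the index of
     * the one with the lowest score.  Averaging the few best candidates, or
     * matrix/tensor completion, are the refinements mentioned in the text.  */
    int best_candidate(const unsigned char *sparse, const unsigned char *cube,
                       int num_cands, int missing)
    {
        long best = LONG_MAX;
        int best_idx = -1;
        for (int k = 0; k < num_cands; k++) {
            long s = match_score(sparse, cube + (long)k * BLK * BLK, missing);
            if (s < best) { best = s; best_idx = k; }
        }
        return best_idx;
    }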

Fig. 7. Illustration of patch search in the neighborhood of the previous frame. The sparse block is the green vector.

In practice, videos are coded in packets, with each packet containing a number of macroblocks (MBs). The MBs in each packet may be in interleaved or consecutive format. See Fig. 6 for the consecutive format. In this scenario, we conceal from both ends, top left and bottom right, and the order is determined by the re-rank program mentioned earlier.


Fig. 6. A lost packet contains a number of damaged MBs, which are in sequential order.

Eventually, the concealment meets somewhere in the middle.

2.3 Integration of our concealment in JM

JM is the reference software for H264 [18]. We integrated our video error concealment software (baseline version) into JM. We also implemented fast scoring, fast search, and parallel processing using multiple CPUs and GPUs. Details can be found in our pending patent [25].
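As one possible illustration of the CPU-side parallelism (in the spirit of Fig. 3, where each worker handles one quadrant of the frame), the sketch below uses standard POSIX threads. The job structure, worker body, and function names are assumptions and do not reflect the actual implementation described in [25].

    #include <pthread.h>

    /* One worker builds the portion of the reference cube for one quadrant
     * of the frame (cf. Fig. 3).  The fields and names are illustrative.   */
    typedef struct {
        const unsigned char *y, *u, *v;   /* full planes                     */
        int x0, y0, w, h;                 /* quadrant origin and size        */
        unsigned char *out;               /* this worker's slice of the cube */
        int cols;                         /* columns produced                */
    } cube_job;

    static void *cube_worker(void *arg)
    {
        cube_job *job = (cube_job *)arg;
        /* ... vectorize every 16x16 block inside the quadrant into job->out,
           as in the single-threaded sketch shown earlier (elided) ...       */
        (void)job;
        return NULL;
    }

    /* Launch one worker per quadrant and wait for all of them. */
    static void build_cube_parallel(cube_job jobs[4])
    {
        pthread_t tid[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&tid[i], NULL, cube_worker, &jobs[i]);
        for (int i = 0; i < 4; i++)
            pthread_join(tid[i], NULL);
    }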

3. Experimental Results

In this section, we applied our video error concealment approach to UAV videos. The theory has been briefly described in an earlier section, so we focus on presenting the results. The video size is 480x320 with 60 frames, and the frame rate is 30 frames per second. We introduced 5% packet losses randomly. Each packet contains 20 macroblocks (MBs); by definition, an MB is a 16x16 block.

Case 1: Video encoded with no FMO; two error concealment algorithms. From Fig. 8, it can be seen that our results are better than those of JM by several dB. One can also see from Fig. 9 that the JM residuals have more artifacts than ours.

Case 2: Video encoded with FMO; two error concealment algorithms. From Fig. 10, it can be seen that FMO indeed improved the PSNR of the video by 2-3 dB compared to the no-FMO case (Fig. 8). Also, as can be seen from Fig. 10, our algorithm performed better in most of the frames. From Fig. 11, one can see that the JM residuals have more artifacts than ours.
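For reference, the sketch below is a minimal implementation of the standard 8-bit PSNR computation underlying comparisons such as those in Figs. 8 and 10. Whether the reported curves are computed on the luma plane only or on all components is not stated in the text, so the sketch takes a single plane.

    #include <math.h>

    /* Standard PSNR between an original and a concealed 8-bit plane. */
    double psnr(const unsigned char *orig, const unsigned char *recon, int npix)
    {
        double mse = 0.0;
        for (int i = 0; i < npix; i++) {
            double d = (double)orig[i] - (double)recon[i];
            mse += d * d;
        }
        mse /= npix;
        if (mse == 0.0)
            return INFINITY;            /* identical frames */
        return 10.0 * log10(255.0 * 255.0 / mse);
    }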

Fig. 8: Comparison of PSNR between our method and H264 JM reference software. No FMO case.


Fig. 9: Frame 41, no FMO. (a) Original frame 41; (b) missing pixel locations; (c) our results, no FMO; (d) difference between our results and the original; (e) JM results; (f) difference between the JM results and the original. Since the PSNRs of the concealed images are very high, it is hard to visually see the differences between the methods. The residual for JM has more artifacts.

Fig. 10: Comparison of PSNR between our method and H264 JM reference software. FMO case.

Fig. 11: Frame 50 with FMO. (a) Original with FMO; (b) missing pixel locations; (c) our results with FMO; (d) difference between our results and the original; (e) JM result with FMO; (f) difference between the JM results and the original. Since the PSNRs of the concealed images are very high, it is hard to visually see the differences between the methods.

4. Conclusions

In this paper, we presented a new approach to video compression that is robust to heavy packet losses. Most importantly, it does not incur additional bandwidth usage, since the error concealment is done entirely at the receiving end. The approach has been implemented and integrated into the H264 JM reference codec, and the experimental results demonstrate that the proposed method performs much better than the JM codec.

Acknowledgment This research was supported by NAVSEA under contract # N00024‐15‐P‐4514. This document has been approved for public release. Distribution is unlimited.

References

[1] J. Zhou and C. Kwan, "Anomaly Detection in Low Quality Traffic Monitoring Videos Using Optical Flow," SPIE Defense + Security Conference, Orlando, FL, April 2018.
[2] C. Kwan, J. Zhou, Z. Wang, and B. Li, "Efficient Anomaly Detection Algorithms for Summarizing Low Quality Videos," SPIE Defense + Security Conference, Orlando, FL, April 2018.
[3] C. Kwan, J. Yin, and J. Zhou, "The Development of a Video Browsing and Video Summary Review Tool," SPIE Defense + Security Conference, Orlando, FL, April 2018.
[4] X. Li, C. Kwan, G. Mei, and B. Li, "A Generic Approach to Object Matching and Tracking," Proc. Third International Conference on Image Analysis and Recognition, Lecture Notes in Computer Science, Vol. 4141, pp. 839-849, 2006.
[5] J. Zhou and C. Kwan, "Tracking of Multiple Pixel Targets Using Multiple Cameras," 15th International Symposium on Neural Networks, 2018.
[6] C. Kwan and K. S. Yeung, "Robust Adaptive Control of Revolute Flexible-Joint Manipulators Using Sliding Techniques," Systems and Control Letters, Vol. 20, No. 4, pp. 279-288, 1993.
[7] C. Kwan, "Robust Adaptive Force/Motion Control of Constrained Robots," IEE Proceedings Pt. D, Vol. 143, No. 1, pp. 103-109, 1996.
[8] C. Kwan, H. Xu, and F. L. Lewis, "Robust Spacecraft Attitude Control Using Adaptive Fuzzy Logic," International Journal of Systems Science, Vol. 31, No. 10, pp. 1217-1225, 2000.


[9] J. Dohner, C. Kwan, and M. Ruggelbrugge, "Active Chatter Suppression in an Octahedral Hexapod Milling Machine: A Design Study," SPIE Smart Materials & Structures Conference, Vol. 2721, 1996.
[10] C. Kwan, B. Chou, and L. M. Kwan, "A Comparative Study of Conventional and Deep Learning Target Tracking Algorithms for Low Quality Videos," 15th International Symposium on Neural Networks, 2018.
[11] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1997.
[12] C. Kwan, B. Li, R. Xu, T. Tran, and T. Nguyen, "Very Low-Bit-Rate Video Compression Using Wavelets," Wavelet Applications VIII, Proc. SPIE, Vol. 4391, pp. 176-180, 2001.
[13] X264, http://www.videolan.org/developers/x264.html
[14] X265, https://www.videolan.org/developers/x265.html
[15] TCP/IP, http://searchnetworking.techtarget.com/definition/TCP-IP
[16] D. Nguyen, M. Dao, and T. D. Tran, "Video Error Concealment Using Sparse Recovery and Local Dictionaries," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2011.
[17] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, "Spatial and Temporal Error Concealment Techniques for Video Transmission Over Noisy Channels," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 7, July 2006.
[18] JM, http://iphome.hhi.de/suehring/tml/
[19] J. Zhou, C. Kwan, and B. Ayhan, "A High Performance Missing Pixel Reconstruction Algorithm for Hyperspectral Images," 2nd International Conference on Applied and Theoretical Information Systems Research, Taipei, Taiwan, December 27-29, 2012.
[20] J. Zhou and C. Kwan, "High Performance Image Completion Using Sparsity Based Algorithms," SPIE Commercial + Scientific Sensing and Imaging Conference, Orlando, FL, April 2018.
[21] J. Zhou, B. Ayhan, C. Kwan, and T. Tran, "ATR Performance Improvement Using Images with Corrupted or Missing Pixels," SPIE Defense + Security Conference, Orlando, FL, April 2018.
[22] J. Zhou and C. Kwan, "Missing Link Prediction in Social Networks," 15th International Symposium on Neural Networks, 2018.
[23] RTP, https://tools.ietf.org/html/rfc3550
[24] RTSP, https://tools.ietf.org/html/rfc2326
[25] C. Kwan, "Method and System for High Performance Video Signal Enhancement," non-provisional patent application # 15259447, filed on September 8, 2016.

