IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 16, NO. 4, APRIL 2007
Lossless Video Sequence Compression Using Adaptive Prediction

Ying Li and Khalid Sayood
Abstract—We present an adaptive lossless video compression algorithm based on predictive coding. The proposed algorithm exploits temporal, spatial, and spectral redundancies in a backward adaptive fashion with extremely little side information. The computational complexity is further reduced by using a caching strategy. We also study the relationship between the operational domain for the coder (wavelet or spatial) and the amount of temporal and spatial redundancy in the sequence being encoded. Experimental results show that the proposed scheme provides significant improvements in compression efficiency.

Index Terms—Integer wavelet transform (IWT), lossless video coding, pixel prediction.
I. INTRODUCTION
Due to its importance in multimedia applications, most research on video compression has centered on lossy compression, where the focus is on achieving a good tradeoff between reconstructed quality and compression ratio. Current lossy video compression standards provide substantial compression efficiency at the cost of minimal degradation in quality. Historically, significantly less attention has been paid to the development of lossless video compression algorithms. Lossless video compression is important for applications in which the video quality cannot tolerate any degradation, such as video archiving and the compression of medical and satellite video. Recently, there has been increasing interest in developing lossless video compression techniques [1]–[9]. Memon and Sayood [1] presented a hybrid compression approach that adaptively selects spatial, spectral, or temporal prediction based on their performance, and employed a 3-D version of the lossless JPEG predictor to incorporate temporal prediction. The authors also investigated an adaptive scheme that shifts between temporal prediction based on block motion compensation and spectral prediction, which used the best predictor from one spectral plane on another. In [7], Zhang et al. applied an adaptive combination of a temporal predictor based on block motion compensation and a spatial predictor as described in CALIC [12].
Manuscript received April 27, 2006; revised September 18, 2006. This work was supported in part by a grant from the NASA Goddard Space Flight Center. The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Poggi. The authors are with the Department of Electrical Engineering, University of Nebraska, Lincoln, NE 68588-0551 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2006.891336
Brunello et al. [5] introduced a novel temporal prediction technique based on block motion compensation and an optimal 3-D linear prediction algorithm. In their scheme, guided by motion information, the pixel to be encoded is predicted by a linear combination of neighboring pixels in the current and reference frames, and the optimal linear prediction coefficients are obtained adaptively. Note that the above approaches are based on block motion compensation, which requires the transmission of motion vectors and may therefore significantly decrease the compression efficiency. Yang and Faryar [2] proposed a pixel-based prediction scheme in which intraframe or interframe coding is chosen based on temporal or spatial variations; these variations are calculated from the past reconstructed pixels in the current or reference frame, and the encoder selects the mode with less variation. Carotti et al. [4] presented a combination of the MED spatial predictor described in JPEG-LS [11] and a pixel-based temporal predictor formed from the pixels in the same neighborhood in the previous two frames. In [4], a fixed weighting is used to combine the spatial and temporal predictions; later, in [3], the authors improved the scheme by replacing the fixed weight with an optimal adaptive weight that minimizes the energy of the prediction residual. Note that [2]–[4] used pixel-based predictors, and the temporal prediction in these schemes did not utilize motion information. Gong et al. [8] proposed a wavelet-based lossless video coding algorithm. Noting that motion compensation in the wavelet domain can be very inefficient for high motion video sequences due to the shift-variant property of the wavelet transform [8], their scheme switches between two operational modes based on the amount of motion activity between adjacent frames: differential coding of wavelet coefficients without motion compensation is applied directly to low motion sequences; otherwise, block motion compensation is first performed in the spatial domain, and the wavelet coefficients of the prediction residuals are then entropy coded and transmitted. Park et al. [9] presented an adaptive lossless compression algorithm for color video sequences using a wavelet transform. A novel temporal prediction technique was proposed in this work. The basic idea is similar to that of block motion estimation, but without the requirement of transmitting motion vectors. This is implemented by searching for the best match of a target window in the reference frame, where the target window is composed of the three neighboring blocks of the block to be encoded. The temporal predictor is the block neighboring the matched target window in the reference frame. The encoder adaptively selects the best predictor among the candidates
based on their performances, and transmits to the decoder the prediction residuals from the selected mode as well as an indicator of the selected mode.

In this paper, we take the temporal predictor in [9] as a reference. We propose an enhanced pixel predictor that exploits the motion information among adjacent frames using extremely little side information. We also study the relationship between the operational domain and the amount of temporal and spatial redundancy in the sequence to be encoded. The computational complexity can be reduced by caching previous computational results or by using refined target windows and refined search ranges. As opposed to [9], the proposed predictor uses a backward adaptive prediction mode selector instead of using extra bits to indicate the prediction mode. Meanwhile, to maximize the prediction performance, the proposed scheme adaptively selects the predictor from all candidates based on previous prediction accuracy.

This paper is organized as follows. In Section II, we look at the gain in compression efficiency from the omission of motion vectors in [9]. Section III presents the proposed lossless coding scheme. Section IV gives experimental results with some discussion, and, finally, conclusions are drawn in Section V.

II. OVERVIEW

As stated earlier, Park et al. [9] presented a temporal predictor that does not require the transmission of motion vectors, eliminating considerable overhead. The evaluation of this algorithm showed that it provides better compression than other state-of-the-art algorithms, e.g., JPEG-LS [11] and CALIC [12], making it one of the best techniques in lossless video compression. This algorithm is referred to as PARDEL hereafter.

To better understand the gain in compression efficiency from the reduction of side information, consider a straightforward approach, denoted WITH-MV, which is a lossless video coding algorithm that utilizes block motion compensation. WITH-MV follows the same process as PARDEL except that it uses a block motion compensation technique with explicitly transmitted motion vectors. In WITH-MV, the side information for each block consists of the prediction mode indicator as well as a motion vector if temporal prediction is chosen; the compression efficiency of WITH-MV is therefore degraded by the increased rate for the side information. Our goal is to examine the tradeoff between the additional overhead due to the motion vectors and the reduction in prediction accuracy that results when motion vectors are not used. Our study shows that PARDEL outperforms WITH-MV because of the reduction in side information. Generally speaking, WITH-MV has slightly better prediction accuracy than PARDEL, because PARDEL locates the predictor according to the best matched target window rather than the block to be encoded; however, PARDEL has a much lower overhead than WITH-MV. As a result, PARDEL provides better overall compression efficiency. For example, for the RGB video sequence “Claire” with a block size of 2 × 2 pixels, the overhead occupies 3.67 bits per pixel (bpp) for WITH-MV and only 0.80 bpp for PARDEL; consequently, the compression result with PARDEL (6.69 bpp) is much
smaller than that of WITH-MV (7.74 bpp). This observation also holds for the other video sequences and block sizes in our study, revealing that considerable compression efficiency can be gained from the removal of side information. Note that this comparison is only qualitative, intended to show the tradeoff between the size of the side information and the prediction accuracy; in practice, an appropriate block size is usually chosen to balance this tradeoff. The starting point of our work is therefore to further reduce the side information while still maintaining high prediction accuracy. More specifically, we introduce a scheme with improved temporal prediction performance that uses only a fixed amount of side information, independent of the size of the original video sequence. Note that PARDEL still uses some side information to indicate the selected prediction mode for each block.

III. ADAPTIVE PIXEL-BASED PREDICTION CODING

It is well known that the choice of block size used in motion compensation is always a tradeoff between prediction accuracy and the amount of side information. Smaller and more numerous blocks can provide smaller prediction residuals but result in more side information and more motion vectors; a large block size means fewer bits of side information but lower prediction efficiency and, hence, a larger prediction residual. Even in the scheme proposed in [9], which requires no transmission of motion vectors, a similar tradeoff exists between the prediction accuracy and the amount of side information. With this tradeoff in mind, we introduce a new lossless coding scheme with two key points: it adopts a pixel-based motion compensation concept in order to minimize the prediction residual, and it transmits only a fixed amount of side information, which is extremely small compared to the size of the original video sequence.

In this section, we present the proposed adaptive pixel-based prediction scheme, which exploits the redundancies in the wavelet domain or in the spatial domain with a fixed amount of side information. As shown in Fig. 1, the proposed scheme consists of the following main steps: preprocessing, adaptive symbol prediction, adaptive prediction mode selection, and context-based arithmetic coding of the prediction residuals. Preprocessing determines the operational domain of a video sequence according to its temporal and spatial redundancies; the candidate operational domains are the wavelet domain and the spatial domain. In the adaptive symbol prediction step, three separate predictors are utilized to reduce the spatial and temporal redundancies: a novel temporal predictor, a spatial predictor, and a direct predictor. In the adaptive prediction mode selection step, a predictor is adaptively selected from the three candidates based on causal (previously available) prediction information to maximize the prediction performance. Context-based arithmetic coding follows.

A. Preprocessing

The wavelet transform has been used extensively for compression due to its excellent energy compaction characteristics. In the proposed scheme, an integer wavelet transform (IWT), which maps integer pixels to integer coefficients, is used, making it suitable for lossless applications.
Fig. 1. Block diagram of the proposed algorithm.
Gong et al. [8] pointed out that motion compensation in the wavelet domain might be inefficient due to the shift-variant nature of the wavelet transform. Therefore, it might be unwise to process all kinds of video sequences in the spatial domain alone or in the wavelet domain alone. We present a method to determine the operational domain of a video sequence in two steps, according to its temporal and spatial redundancies.

In the first step, the amount of temporal redundancy is estimated by the interframe correlation coefficients of the test video sequence. Let $p_t(x,y)$ represent the pixel to be encoded, located at $(x,y)$ in frame $t$; then the interframe correlation coefficient between frames $t$ and $t-1$ can be calculated by [17]

$$\rho_{t,t-1} = \frac{\sum_{x}\sum_{y} \big(p_t(x,y) - \bar{p}_t\big)\big(p_{t-1}(x,y) - \bar{p}_{t-1}\big)}{\sqrt{\sum_{x}\sum_{y} \big(p_t(x,y) - \bar{p}_t\big)^2 \; \sum_{x}\sum_{y} \big(p_{t-1}(x,y) - \bar{p}_{t-1}\big)^2}} \qquad (1)$$

where $\bar{p}_t$ is the average value of the pixels in frame $t$ and $\bar{p}_{t-1}$ is the average value of the pixels in frame $t-1$. If the average of the interframe correlation coefficients is smaller than a predefined threshold, the sequence is likely to be a high motion sequence. In this case, motion compensation in the wavelet domain would be inefficient, and it is better to operate on the sequence in the spatial domain. The threshold is set to 0.9 in our experiments. Fig. 2 illustrates the interframe correlation coefficients of some test sequences from the 0th to the 39th frame. The “Football (720 × 480)” and “Football (352 × 240)” sequences have relatively small interframe correlation coefficients; therefore, these two sequences are processed in the spatial domain. Sequences with larger interframe correlation coefficients are processed in the wavelet domain.
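This first preprocessing test is straightforward to state in code. Below is a minimal sketch, assuming grayscale frames stored as 2-D NumPy arrays; the function names are ours, and the 0.9 threshold follows the text.

```python
import numpy as np

def interframe_correlation(frame_t, frame_prev):
    """Pearson correlation coefficient between two frames, as in (1)."""
    a = frame_t.astype(np.float64) - frame_t.mean()
    b = frame_prev.astype(np.float64) - frame_prev.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def choose_domain(frames, threshold=0.9):
    """Operate in the spatial domain for high-motion (low-correlation)
    sequences, otherwise in the wavelet domain."""
    rhos = [interframe_correlation(frames[i], frames[i - 1])
            for i in range(1, len(frames))]
    return "spatial" if np.mean(rhos) < threshold else "wavelet"
```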
In the second step, we determine the suitable IWT for the test sequence by estimating its spatial redundancy. As discussed in [8] and [13], no single transform offers the best lossless and lossy compression performance for all types of images together with low computational complexity. Here, we use two common examples of the IWT, the S and 5/3 transforms, chosen for their simplicity and effectiveness, respectively. The S transform is the integer version of the Haar transform [13], which has the lowest computational complexity among all transforms, while the 5/3 transform performs reasonably well for both lossy and lossless compression [13]. The forward S transform equations are
$$h[n] = x[2n+1] - x[2n], \qquad l[n] = x[2n] + \left\lfloor \frac{h[n]}{2} \right\rfloor \qquad (2)$$

The forward 5/3 transform equations are
$$h[n] = x[2n+1] - \left\lfloor \frac{x[2n] + x[2n+2]}{2} \right\rfloor, \qquad l[n] = x[2n] + \left\lfloor \frac{h[n-1] + h[n] + 2}{4} \right\rfloor \qquad (3)$$

where $x$ is the input signal, $h$ is the high-frequency subband signal, and $l$ is the low-frequency subband signal.
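For concreteness, here is a sketch of the two forward transforms as 1-D lifting steps on an even-length signal, following (2) and (3); the simple boundary mirroring used here is our assumption, since the extension scheme is not specified in the text.

```python
import numpy as np

def s_transform_1d(x):
    """Forward S transform (integer Haar), as in (2); len(x) must be even."""
    x = np.asarray(x, dtype=np.int64)
    h = x[1::2] - x[0::2]          # high-frequency (detail) subband
    l = x[0::2] + (h >> 1)         # low-frequency subband; >> 1 is floor(h/2)
    return l, h

def int53_transform_1d(x):
    """Forward integer 5/3 transform, as in (3), with mirrored boundaries."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])    # mirror at the right edge
    h = odd - ((even + even_next) >> 1)          # predict step
    h_prev = np.insert(h[:-1], 0, h[0])          # mirror at the left edge
    l = even + ((h_prev + h + 2) >> 2)           # update step
    return l, h
```

Applying the 1-D transform along the rows and then the columns of a frame gives the one-level 2-D IWT used after the color transform.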
There are many approaches to estimating the amount of spatial redundancy, such as correlation coefficients, the magnitudes of the high-frequency wavelet coefficients, and the magnitudes of the high-frequency discrete cosine transform (DCT) coefficients [15]. Here, we adopt a simple and fast approach: calculating the energy of the high-frequency DCT coefficients. In Fig. 3, the sum of the amplitudes of the four highest frequency coefficients is displayed for several test sequences. Large amplitudes of the high-frequency coefficients imply that adjacent pixels vary greatly within a frame; the 5/3 wavelet transform is then used because it exploits the spatial redundancy more effectively. For the sequence “Mobile,” the average of the sum of the high-frequency coefficient amplitudes is 10.56, so the 5/3 wavelet transform is suggested. Small amplitudes of the high-frequency coefficients mean that adjacent pixels are very close to each other; the S wavelet transform is then used because of its simplicity. For example, the sequences “Claire” and “Miss America,” whose averages of the sums of the high-frequency amplitudes are less than a predefined threshold, are transformed by the S transform. In our experiments, we set this threshold to 9. Another, similar approach to approximating the intraframe redundancy is to observe the amplitudes of the high-frequency coefficients after the IWT. For example, Fig. 4 illustrates the average of the amplitudes of the 16 highest frequency coefficients after a three-level IWT; these 16 coefficients are located at the lower-right corner of each frame. The same conclusion can be drawn as from Fig. 3. The threshold is set to 3 here.

Table I shows the compression results for various RGB test sequences processed in the wavelet domain and in the spatial domain. As expected, for sequences with small temporal redundancy, such as “Football (720 × 480)” and “Football (352 × 240),” prediction in the spatial domain provides better results, while for sequences with large temporal redundancy, prediction in the wavelet domain provides better compression. As observed in Fig. 3, the sequence “Mobile” has relatively small spatial redundancy, and the 5/3 transform achieves the best performance for it, as shown in Table I; this is because the 5/3 transform is more effective, though more complex, than the S transform.

The proposed method based on interframe and intraframe redundancy is suitable for RGB sequences, yet it does not always work for YUV sequences. For YUV video sequences, we find that operating in the spatial domain always provides the best compression performance. Classifying a given video sequence by operational domain and transform type is a hard task, and the optimum approach remains an open question. In the proposed algorithm, we use the above method to determine the operational domain of a video sequence if it is in the RGB color space, and the encoder uses a flag to indicate the selected operational mode among the spatial domain, the S transform, and the 5/3 transform. If a video sequence is in the YUV color space, we process it in the spatial domain without any transformation.
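A sketch of the DCT-based test from the second preprocessing step follows, using SciPy's 2-D DCT. Which four coefficients constitute the "highest frequency" set is not spelled out in the text; taking the 2 × 2 bottom-right corner of the coefficient array is our reading, and the threshold of 9 follows the text.

```python
import numpy as np
from scipy.fft import dctn   # 2-D DCT-II

def high_freq_energy(frame):
    """Sum of the amplitudes of the four highest-frequency DCT coefficients
    (taken here as the 2 x 2 bottom-right corner of the coefficient array)."""
    c = dctn(frame.astype(np.float64), norm="ortho")
    return np.abs(c[-2:, -2:]).sum()

def choose_iwt(frames, threshold=9.0):
    """Pick the 5/3 transform for spatially busy sequences, else the
    cheaper S transform."""
    avg = np.mean([high_freq_energy(f) for f in frames])
    return "5/3" if avg >= threshold else "S"
```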
Once the operational domain of a video sequence is determined, it is transformed into the YUV color space, if it is in the RGB color space, to reduce color redundancy. This is implemented by applying the reversible color transform [9], [16] to all frames directly. For a video sequence to be processed in the wavelet domain, a one-level IWT is applied to all frames after the color transform, and the encoder uses a flag to indicate the selected transformation.

B. Spatial Prediction

To reduce the spatial redundancy, a prediction is computed from the neighboring symbols in the same frame as the symbol to be encoded (we use the term symbol because it can be a pixel in the spatial domain or a wavelet coefficient in the wavelet domain). In the proposed scheme, we use a simple but robust spatial predictor, the median edge detector (MED) used in JPEG-LS [11]. MED estimates the symbol to be encoded from the values of three previously encoded neighboring symbols. We use $p_t(x,y)$ to represent the symbol to be encoded, located at $(x,y)$ in frame $t$. Its spatial predicted value is denoted $\hat{p}_S(x,y)$, which is given in [11] as

$$\hat{p}_S(x,y) = \begin{cases} \min(a,b) & \text{if } c \ge \max(a,b) \\ \max(a,b) & \text{if } c \le \min(a,b) \\ a + b - c & \text{otherwise} \end{cases} \qquad (4)$$

where $a = p_t(x-1,y)$, $b = p_t(x,y-1)$, and $c = p_t(x-1,y-1)$. Thus, the spatial prediction residual is

$$e_S(x,y) = p_t(x,y) - \hat{p}_S(x,y). \qquad (5)$$
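The MED predictor of (4) is only a few lines. The sketch below assumes integer symbol values and mirrors the JPEG-LS definition.

```python
def med_predict(a, b, c):
    """Median edge detector (MED) of JPEG-LS, as in (4):
    a = west neighbor, b = north neighbor, c = northwest neighbor."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c
```

The residual (5) is then `p_t(x, y) - med_predict(a, b, c)`.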
C. Temporal Prediction

Inspired by the temporal prediction scheme presented in [9], we introduce a novel adaptive pixel-based predictor, based on the symbols in the reference frame, to reduce temporal redundancy. The proposed temporal predictor is effective and accurate, and does not require the transmission of any extra side information.

Let $p_t(x,y)$ be the symbol to be encoded. The proposed temporal predictor aims to find the best matched symbol in the reference frame $t-1$, which is denoted as the temporal predictor $\hat{p}_T(x,y)$. Instead of exploiting the motion activity of $p_t(x,y)$ between adjacent frames directly, the predictor investigates the motion activity of the target window of $p_t(x,y)$ between frame $t$ and frame $t-1$ within a search range, as illustrated in Fig. 5, where the target window is composed of the upper-left neighboring symbols of $p_t(x,y)$. The temporal predictor of symbol $p_t(x,y)$ searches for and locates the best matched target window in frame $t-1$, i.e., the one that achieves the minimum cumulative absolute difference (CAD) within the search range, where

$$\mathrm{CAD}(d_x, d_y) = \sum_{(i,j) \in W} \left| p_t(x+i,\, y+j) - p_{t-1}(x+i+d_x,\, y+j+d_y) \right| \qquad (6)$$

where $W$ denotes the target window, $p_t$ and $p_{t-1}$ denote the symbol values of the current frame and the reference frame $t-1$, respectively, and a motion vector $(d_x, d_y)$ is determined over the search range
Fig. 2. Interframe correlation coefficients of the test RGB sequences.
Fig. 3. Sum of the amplitudes of the four highest frequency coefficients after DCT for the RGB test sequences.
to minimize the CAD. Similar to block motion compensation techniques, the best motion vector for the target window, i.e., the one achieving the minimum CAD, is determined by

$$(d_x^*, d_y^*) = \arg\min_{(d_x, d_y) \in SR} \mathrm{CAD}(d_x, d_y) \qquad (7)$$

where $SR$ is the search range and $(d_x^*, d_y^*)$ indicates the motion displacement of the target window. Then, the temporal predictor of $p_t(x,y)$ can be obtained by

$$\hat{p}_T(x,y) = p_{t-1}(x + d_x^*,\, y + d_y^*) \qquad (8)$$
Fig. 4. Average of the amplitudes of the 16 highest frequency coefficients after three-level S IWT on the RGB test sequences.
TABLE I COMPRESSION RESULTS IN AVERAGE BITS PER PIXEL FOR VARIOUS RGB SEQUENCES IN DIFFERENT DOMAINS
and the temporal prediction residual is

$$e_T(x,y) = p_t(x,y) - \hat{p}_T(x,y). \qquad (9)$$

Since the motion activity of the target window between frame $t$ and frame $t-1$ can be perfectly reconstructed at the decoder, there is no requirement for the transmission of the motion vector to the decoder. Note that, unlike [9], we are looking for a match of a window surrounding the symbol to be encoded rather than a block of symbols; as such, we expect better prediction accuracy, and our results bear out this expectation.
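To make the search concrete, the sketch below implements (6)-(8) for one symbol, assuming 2-D NumPy arrays for the current and reference frames. The exact target window shape and search range are adjustable parameters of the scheme; the four-neighbor window and range used here are illustrative choices of ours, and a symbol whose window falls outside the frame falls back to the co-located reference symbol.

```python
import numpy as np

# Causal target window: upper-left neighbors of (x, y), as (dx, dy) offsets.
TARGET_WINDOW = [(-1, 0), (-1, -1), (0, -1), (1, -1)]

def temporal_predict(cur, ref, x, y, search_range=4):
    """Pixel-based temporal prediction per (6)-(8): find the displacement
    whose target window in the reference frame best matches the causal
    target window of (x, y), then predict by the displaced symbol."""
    h, w = cur.shape
    best_cad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= x + dx < w and 0 <= y + dy < h):
                continue                      # displaced symbol must exist
            cad = 0
            for ox, oy in TARGET_WINDOW:
                cx, cy = x + ox, y + oy       # window symbol, current frame
                rx, ry = cx + dx, cy + dy     # its match, reference frame
                if not (0 <= cx < w and 0 <= cy < h and
                        0 <= rx < w and 0 <= ry < h):
                    cad = None                # window leaves the frame
                    break
                cad += abs(int(cur[cy, cx]) - int(ref[ry, rx]))
            if cad is not None and (best_cad is None or cad < best_cad):
                best_cad, best_mv = cad, (dx, dy)
    dx, dy = best_mv                          # (0, 0) fallback near borders
    return ref[y + dy, x + dx]                # temporal predictor, per (8)
```

The decoder runs the identical search over its already decoded data, which is why no motion vector needs to be transmitted.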
Test results show that the proposed temporal predictor achieves excellent performance. However, it needs to search and match the target windows in frame $t-1$ for every symbol in frame $t$ independently. This is very time consuming and places an undue computational burden on real applications. In our implementation, this problem is addressed by caching previous computation results. An important, though simple, observation is that the target windows of two neighboring pixels overlap significantly; therefore, there is no need to repeat the computation for pixels in the overlapped portion. In addition, we propose a refined approach that can further greatly reduce
Fig. 5. Adaptive pixel-based temporal prediction of symbol $p_t(x, y)$.
the time for prediction, but might cause a slight decrease in compression ratio; the refined approach should therefore be used in applications with tighter time constraints. The basic idea of the refined approach is that the motion activities of neighboring symbols within a frame are different but highly correlated, since they usually characterize very similar motion structures. Therefore, the motion information of a symbol can be approximated by the motion information of the neighboring symbols in the same frame and then refined over a relatively small search range with a relatively small target window.

The initial motion vector $(d_{x0}, d_{y0})$ of the current symbol $p_t(x,y)$ is approximated by the motion activity of its upper-left neighboring symbols in the same frame. It is the average of the motion vectors of the four past neighboring symbols

$$(d_{x0}, d_{y0}) = \mathrm{round}\!\left( \frac{\mathbf{v}(x-1,y) + \mathbf{v}(x-1,y-1) + \mathbf{v}(x,y-1) + \mathbf{v}(x+1,y-1)}{4} \right) \qquad (10)$$

where $\mathbf{v}(x-1,y)$ is the motion vector of symbol $p_t(x-1,y)$, which can be obtained by (7), and similarly for the other three neighbors. The computation only uses past information, which is available at both the encoder and the decoder. The motion trajectory of symbol $p_t(x,y)$ is then refined by finding the best match of a small target window within a small search range; the procedure is the same as in the previous discussion. The final motion estimate of $p_t(x,y)$ relies tightly on the initial motion vector; thus, an unsound initial motion vector will have a negative impact on the final prediction performance. Another control parameter, the pixel group size, is introduced in order to limit that negative impact. Within each pixel group, the initial motion vector of each symbol is acquired according to (10) and then refined over the smaller range. For symbols that do not have enough neighbors to apply (10), the motion vectors are acquired directly with the method discussed before, as shown in Fig. 5. When the number of encoded symbols reaches the pixel group size, the procedure is re-initialized. The motivation for the refined scheme is to exploit the inherent connection among the motion information of neighboring symbols to significantly reduce the searching and matching time. Note that compression efficiency is the primary design objective of this paper; the refined approach only provides a potential way to balance the tradeoff between compression efficiency and computational overhead. Therefore, all experiments in this work are conducted with the caching approach, which considerably reduces the searching and matching time.

Since video data are nonstationary and the characteristics of different video sequences vary greatly, it is impossible to find a set of fixed parameter values that works well for all sequences. Therefore, in the proposed algorithm, the target window size, search range, and pixel group size are adjustable parameters used to improve the compression performance.
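A sketch of the initial motion vector computation (10) used by the refined approach follows. Which four causal neighbors are averaged is our assumption (west, northwest, north, and northeast), since the subscripts did not survive extraction; `mv_field` is a hypothetical map from already-encoded positions to their motion vectors.

```python
def initial_motion_vector(x, y, mv_field):
    """Initial motion vector of symbol (x, y) per (10): the rounded average
    of the motion vectors of four causal neighbors.  mv_field maps
    (x, y) -> (dx, dy) for already-encoded symbols.  The neighbor set
    (west, northwest, north, northeast) is an assumption."""
    neighbors = [mv_field[(x - 1, y)], mv_field[(x - 1, y - 1)],
                 mv_field[(x, y - 1)], mv_field[(x + 1, y - 1)]]
    dx = round(sum(v[0] for v in neighbors) / 4)
    dy = round(sum(v[1] for v in neighbors) / 4)
    return dx, dy
```

The refined search then reruns the window matching of (6)-(7) around this initial vector, with a smaller search range and a smaller target window.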
D. Direct Mode

If a video sequence is to be processed in the wavelet domain, we use another prediction mode, similar in concept to the direct sending mode described in [9]. Because of the energy compaction property of the IWT, the wavelet coefficients in the high-frequency subbands (LH, HL, HH) usually have small amplitudes, which may be smaller than the amplitudes of the spatial and temporal prediction residuals. Therefore, in this case the wavelet coefficients are encoded and transmitted directly; the residual of this mode is simply the coefficient itself, denoted $e_D(x,y)$.

Fig. 6. Template of the backward adaptive prediction mode selector of symbol $p_t(x, y)$.

E. Adaptive Backward Prediction Mode Selection

As stated before, our scheme contains two key points: one is the use of pixel-based prediction (that is, prediction is performed per pixel rather than per block) to remove as much temporal redundancy as possible, as described in Section III-C; the other is the extremely low side information transmission, which is discussed in this section. This is accomplished by utilizing a simple but effective backward adaptive prediction mode selector. The scheme adaptively selects the predictor among the three candidates (spatial, temporal, or direct) based on previous prediction accuracy. The adaptive selection is based on the sums of the amplitudes of the prediction residuals of the past neighboring pixels, as illustrated in Fig. 6. Suppose we want to determine the prediction mode of $p_t(x,y)$. We calculate

$$S_S = \sum_{(i,j) \in \mathcal{N}} |e_S(i,j)| \qquad (11)$$

which represents the sum of the amplitudes of the spatial prediction residuals of the past neighboring symbols $\mathcal{N}$ in Fig. 6,

$$S_T = \sum_{(i,j) \in \mathcal{N}} |e_T(i,j)| \qquad (12)$$

which represents the corresponding sum of the temporal prediction residuals, and

$$S_D = \sum_{(i,j) \in \mathcal{N}} |e_D(i,j)| \qquad (13)$$

which represents the corresponding sum of the direct prediction residuals. The final prediction mode is indicated by the mode with the minimal value among $S_S$, $S_T$, and $S_D$, which is

$$\text{mode} = \arg\min \{ S_S, S_T, S_D \}. \qquad (14)$$
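The selector of (11)-(14) can be sketched as below. Both encoder and decoder are assumed to keep, for every already-encoded position, the residual each candidate predictor would have produced, so that the sums can be formed identically on both sides; the data layout here is our invention.

```python
def select_mode(residual_history, neighbors):
    """Backward-adaptive mode selection per (11)-(14).

    residual_history: dict mapping mode name ('spatial', 'temporal',
    'direct') to a dict of past residuals keyed by position (x, y).
    neighbors: positions forming the causal template of Fig. 6."""
    sums = {mode: sum(abs(res[pos]) for pos in neighbors if pos in res)
            for mode, res in residual_history.items()}
    return min(sums, key=sums.get)            # mode with the smallest sum
```

At decode time the same sums are formed from decoded residuals, so the selected mode never needs to be transmitted.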
For example, if $S_T$ is the smallest of the three values for symbol $p_t(x,y)$, then temporal prediction is selected as the prediction mode and the prediction residual is $e_T(x,y)$, as discussed in Section III-C. The selection of the prediction mode uses only past information; hence, it has the advantage of not requiring the transmission of any extra side information. This approach not only reduces the size of the compressed data by removing the extra bits used to represent the prediction mode, but also achieves high prediction efficiency by adaptively selecting the predictor that performed best for the neighboring pixels. For an entire video sequence, the side information that needs to be transmitted includes the following items:
• transformation type (S, 5/3, or spatial); this applies only to RGB video sequences;
• the size of the target window;
• the size of the search range;
• frame width and height;
• original color space (RGB or YUV).
As can be seen from the above list, the side information occupies a constant number of bits; that is, its size is independent of the size of the video sequence.

F. Context Modelling

Context modelling is used for efficient coding of the prediction residuals. With suitable context models, a prediction residual can be encoded by switching between different probability models according to the already encoded neighboring symbols of the symbol to be encoded. In the proposed scheme, two causal context models are used: one for intraframe symbols and one for interframe symbols, as illustrated in Fig. 7. In the intraframe mode, a nine-symbol context is built for the current symbol $p_t(x,y)$. In the interframe mode, temporal redundancy is exploited by also using symbols from the same neighborhood in frame $t-1$, forming a corresponding block as shown in Fig. 7, and a nine-symbol context is built for symbol $p_t(x,y)$. The prediction residuals under the selected mode are encoded with one of the corresponding context models. The context template is obtained by (15), where $e$ is the prediction residual of the corresponding symbol shown in Fig. 7. Each context is quantized into 16 regions; the quantizer thresholds are determined experimentally by observing the histogram of the prediction residuals.
Fig. 7. Templates for the context of symbol $p_t(x, y)$ to be coded in the intraframe mode or interframe mode.
TABLE II RESULTS: DATA RATE (BITS/PIXEL) FOR YUV SEQUENCES
The quantization can be optimized for different applications. The prediction residuals are encoded within the selected context by an adaptive arithmetic coder, as discussed in [15].
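The following sketch illustrates only the general mechanism of context selection; the actual combining rule (15) and the 16 experimentally chosen thresholds are not recoverable from the text, so the activity measure and bin boundaries below are placeholders.

```python
import bisect

# Hypothetical bin boundaries -- the paper's 16 experimentally chosen
# quantizer thresholds did not survive extraction; placeholders only.
THRESHOLDS = [1, 2, 3, 4, 6, 8, 11, 15, 20, 30, 45, 70, 110, 180, 300]

def context_index(neighbor_residuals):
    """Map the activity of the causal residual neighborhood (Fig. 7)
    to one of 16 conditioning classes for the arithmetic coder."""
    activity = sum(abs(e) for e in neighbor_residuals)
    return bisect.bisect_right(THRESHOLDS, activity)   # index in 0..15
```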
IV. RESULTS

We tested the proposed scheme using standard color video sequences. As discussed before, all experiments were conducted with the caching approach, and the adjustable parameters were chosen to obtain the best performance.
TABLE III RESULTS: DATA RATE (BITS/PIXEL) FOR RGB SEQUENCES
TABLE IV EXPERIMENTAL PARAMETERS FOR YUV SEQUENCES
To evaluate the proposed scheme, we consider the lossless image coding algorithms JPEG-LS [11] and CALIC [12] as references. The results of PARDEL [9], [10] are also used as a reference; note that, to the best of our knowledge, the results of PARDEL are the best available in the literature. Tables II and III show the performance (in bits per pixel) of the proposed scheme compared with JPEG-LS, CALIC, and PARDEL on various YUV and RGB sequences, and verify that our approach achieves a significant performance enhancement with respect to JPEG-LS, CALIC, and PARDEL in terms of bit rate. The average bit rate reduction from PARDEL to the proposed scheme is 8.6% on the YUV sequences and 3.1% on the RGB sequences. The detailed experimental parameters are listed in Tables IV and V. Table II shows the compression results for various YUV sequences; the proposed scheme performs better than JPEG-LS, CALIC, and PARDEL for all test video sequences.
For example, the bit rate reduction from JPEG-LS to the proposed scheme is 10.2% for “Foreman (352 × 288),” and the corresponding reduction from CALIC is 7.4%. Moreover, the bit rate reduction from JPEG-LS to the proposed scheme is 21.0% for “Foreman (176 × 144),” and the corresponding reduction from CALIC is 18.3%. This shows that, compared with JPEG-LS and CALIC, the proposed scheme works better on sequences with smaller frame sizes. This may be because JPEG-LS and CALIC, being image compression algorithms, cannot exploit the large amount of temporal redundancy present in the smaller frame size sequences. On the other hand, when compared with PARDEL, the proposed scheme works better for larger frame sizes: the bit rate reduction from PARDEL to the proposed scheme is 3.0% for “Foreman (176 × 144)” but 4.8% for “Foreman (352 × 288).” This is because the proposed temporal predictor works accurately and in a more stable fashion for both high temporal correlation
TABLE V EXPERIMENTAL PARAMETERS FOR RGB SEQUENCES
and low temporal correlation sequences, while PARDEL gives better performance only for sequences with high temporal correlation.

From Table III, it can be seen that the proposed scheme outperforms JPEG-LS by up to 12.3%, CALIC by up to 24.6%, and PARDEL by up to 4.2% in terms of bit rate for the high motion sequence “Football (720 × 480).” Our implementation of PARDEL produced the same results as in [9] and [10] except for the RGB “Football (720 × 480)” sequence; this is probably due to a difference in the number of frames used. The “Mobile (720 × 576)” sequence has very low spatial correlation, as shown in Fig. 3; for it, CALIC shows better performance than JPEG-LS by utilizing a more efficient spatial prediction technique. The bit rate reduction from PARDEL to the proposed scheme (1.9%) is relatively small here, because the proposed temporal predictor uses spatial information to some extent, as described in Section III-C, which is less effective for sequences with low spatial correlation.

V. CONCLUSION

We presented a new scheme for lossless coding of video sequences. It exploits spectral, spatial, and temporal redundancies and adaptively selects the best predictor out of a set of predictors without transmitting any side information to indicate the selection. Moreover, it employs a backward pixel-based temporal predictor that does not use motion vectors, providing state-of-the-art prediction performance with moderate complexity. Results show that the proposed algorithm has superior performance compared to other algorithms currently available in the literature.

REFERENCES
[1] N. D. Memon and K. Sayood, “Lossless compression of video sequences,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1340–1345, Oct. 1996.
[2] K. H. Yang and A. F. Faryar, “A context-based predictive coder for lossless and near-lossless compression of video,” in Proc. Int. Conf. Image Processing, Sep. 2000, vol. 1, pp. 144–147.
[3] E. S. G. Carotti, J. C. De Martin, and A. R. Meo, “Low-complexity lossless video coding via spatio-temporal prediction,” in Proc. Int. Conf. Image Processing, Sep. 2003, vol. 2, pp. 197–200.
[4] ——, “Backward-adaptive lossless compression of video sequences,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Apr. 2002, pp. 3417–3420.
[5] D. Brunello, G. Calvagno, G. A. Mian, and R. Rinaldo, “Lossless compression of video using temporal information,” IEEE Trans. Image Process., vol. 12, no. 2, pp. 132–139, Feb. 2003.
[6] B. Martins and S. Forchhammer, “Lossless compression of video using motion compensation,” in Proc. IEEE Data Compression Conf., 1998, pp. 560–589.
[7] Z. Ming-Feng, H. Jia, and Z. Li-Ming, “Lossless video compression using combination of temporal and spatial prediction,” in Proc. IEEE Int. Conf. Neural Networks Signal Processing, Dec. 2003, pp. 1193–1196.
[8] Y. Gong, S. Pullalarevu, and S. Sheikh, “A wavelet-based lossless video coding scheme,” in Proc. Int. Conf. Signal Processing, 2004, pp. 1123–1126.
[9] S.-G. Park, E. J. Delp, and H. Yu, “Adaptive lossless video compression using an integer wavelet transform,” in Proc. Int. Conf. Image Processing, 2004, pp. 2251–2254.
[10] S.-G. Park, “Adaptive lossless video compression,” Ph.D. dissertation, School Elect. Comput. Eng., Purdue Univ., West Lafayette, IN, Dec. 2003.
[11] JPEG-LS, Lossless and Near-Lossless Coding of Continuous Tone Still Images, ISO/IEC JTC1/SC 29/WG 1, Jul. 1997.
[12] N. Memon and X. Wu, “Context-based, adaptive, lossless image coding,” IEEE Trans. Commun., vol. 45, no. 4, pp. 437–444, Apr. 1997.
[13] A. R. Calderbank, I. Daubechies, W. Sweldens, and B. L. Yeo, “Wavelet transforms that map integers to integers,” Appl. Comput. Harmon. Anal., vol. 5, no. 3, pp. 332–369, Jul. 1998.
[14] ——, “Lossless image compression using integer to integer wavelet transforms,” in Proc. IEEE Int. Conf. Image Processing, Oct. 1997, vol. 1, pp. 596–599.
[15] K. Sayood, Introduction to Data Compression, 2nd ed. San Mateo, CA: Morgan Kaufmann, 2000.
[16] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Norwell, MA: Kluwer, 2002.
[17] M. G. Bulmer, Principles of Statistics. New York: Dover, 1979.

Ying Li was born in Xi’an, China, in 1977. She received the B.S. degree in computer engineering from Xi’an University of Architecture and Technology, China, in 1997, and the M.S. degree in computer science from Xidian University, China, in 2001. She is currently pursuing the M.S. degree in electrical engineering at the University of Nebraska, Lincoln. Her research interests are video and image coding.

Khalid Sayood received the B.S. and M.S. degrees in electrical engineering from the University of Rochester, Rochester, NY, in 1977 and 1979, respectively, and the Ph.D. degree in electrical engineering from Texas A&M University, College Station, in 1982. He joined the University of Nebraska, Lincoln, in 1982, where he is currently the Henson Professor of Engineering. From 1995 to 1996, he served as the founding Head of the Computer Vision and Image Processing Group, Turkish National Research Council Informatics Institute. He spent the 1996 to 1997 academic year as a Visiting Professor at Bogazici University, Turkey. He is the author of Introduction to Data Compression, 3rd ed. (Morgan Kaufmann, 2005) and Understanding Circuits: Learning Problem Solving Using Circuit Analysis (Morgan Claypool, 2005), and Editor of the Handbook of Lossless Compression (Academic, 2002). His research interests include data compression, joint source-channel coding, and bioinformatics.