Header for SPIE use

Embedded to lossless coding of motion compensated prediction residuals in lossless video coding

G. C. K. Abhayaratne¹ and D. M. Monro²

Signal and Image Processing Group, Department of Electronic and Electrical Engineering, University of Bath, Bath, BA2 7AY, United Kingdom.

ABSTRACT

Lossless video coding is useful in applications where no loss of information or visual quality can be tolerated. In embedded to lossless coding, an encoded video stream can be decoded at any bit rate up to the lossless bit rate, which is useful in numerous applications. The research presented in this paper concerns lossless video coding that uses motion compensated prediction to eliminate temporal redundancy. We specifically address the problem of embedded to lossless coding of the motion compensated prediction residuals. Because the statistical properties of the residuals differ from those of still images, the best wavelet bases for still images do not perform well on residuals. Moreover, since residuals contain a higher proportion of high frequency information in high motion regions, a single fixed transform for the entire frame is not very efficient. We introduce a spatially adaptive wavelet transform, which takes the local frame statistics into account before choosing the wavelet basis. This technique gave the best performance for most of the test frames.

Keywords: Lossless Video Coding, Lossless Residual Coding, Embedded to Lossless Coding, Adaptive Wavelets, Nonlinear Wavelets

1. INTRODUCTION

Digital video sequences in uncompressed formats require excessive storage capacity and huge transmission bandwidth. Compression is therefore required for efficient storage and transmission of digital video. However, coding at high compression ratios causes a loss of visual quality relative to the original sequence. In many applications, no loss of either visual quality or pixel values can be tolerated. Examples are studio-quality digital video archives, inter-studio transmission, and coding of medical and astronomical sequences, in which exact pixel recovery is required. Lossless coding is also useful in studio applications to prevent quantisation effects from accumulating over the repeated encoding and decoding performed during program production. Although lossless image coding has been studied extensively in the recent literature, lossless coding of digital video has been addressed in only a few publications. In [1], Memon and Sayood investigated lossless video coding by extending the 2-D prediction-based methods used in lossless image coding to 3-D, thus treating groups of frames in the temporal domain. In [2], Oami and Ohta presented a lossless video coding technique compatible with MPEG-II [3], which modifies the Discrete Cosine Transform (DCT) coefficients with a 'lossless quantisation' process. In both cases, the losslessly coded bit streams can be decoded only at the lossless level. Embedded coding, in which all encodings of the same source at lower bit rates are embedded in the bit stream for the target bit rate, has become a widely chosen option in still image coding [4-6]. With embedded lossless coding, a master copy of the source can be compressed losslessly, and the source can then be retrieved at any desired bit rate from that losslessly compressed master copy.
This feature is also achieved in the lossless video coding process presented in this paper, so that the losslessly coded bit stream can also be decoded at near-lossless or visually lossless bit rates. This is useful for inter-studio transmission over a variable bandwidth channel and for quick previewing of losslessly coded video.

For the research presented in this paper, we considered a lossless video coding scheme with block matching based motion compensated prediction, as used in MPEG-II. In MPEG-II, each frame in a sequence is either an intra (I type) or a non-intra (P or B type) frame. Since intra frames are coded without reference to previously coded frames, they can be treated as still images. Non-intra frames are coded with reference to previously coded frames, so they consist of motion compensated prediction residuals, whose statistical properties differ from those of still images (intra frames). The main objective of this paper is to investigate embedded to lossless coding of the motion compensated prediction residuals in non-intra frames.

The rest of the paper is organised as follows. Section 2 briefly explains the background to embedded coding of video. Section 3 compares the statistical characteristics of residuals and the corresponding intra frames. Section 4 discusses wavelet transforms on residuals. Section 5 presents a novel spatially adaptive wavelet transform, which is also a non-linear sub band coding technique. Results and discussion are given in section 6, and section 7 summarises the paper.

¹ Correspondence: Email: [email protected]
² Correspondence: Email: [email protected]; WWW: http://dmsun4.bath.ac.uk; Fax: +44 1225 826073

2. EMBEDDED VIDEO CODING

In video coding, the embedded feature can be implemented either within individual frames or across a group of pictures (GOP). In this paper we consider frame-wise embedding, in which individual frames are coded separately. Since I type frames are treated as still images, they are coded by applying integer wavelet transforms, which produce integer coefficients [7], followed by an embedded scanning scheme and context-based adaptive arithmetic coding, as in the ELIC algorithm [6]. A similar approach can be used for embedded to lossless coding of the residual frames obtained from P and B type frames. However, because intra and non-intra frames differ in their statistical properties, the transforms, the scanning order of the coefficients and the context modelling in the codec should take those differences into account. The research presented in this paper concentrates mainly on wavelet transform techniques that can be applied to residuals for embedded to lossless coding.
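To make the embedded property concrete, here is a generic bit-plane sketch. It is not the ELIC scanning scheme [6], whose details are not given here. It assumes integer transform coefficients are represented by their signs plus magnitude bit planes sent from most to least significant; truncating the stream after any plane gives a coarser approximation, and receiving every plane recovers the coefficients exactly. The function names are hypothetical.

```python
def to_bitplanes(coeffs):
    """Split integer coefficients into signs and magnitude bit planes, MSB plane first."""
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    n_planes = max(max(mags).bit_length(), 1)
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def from_bitplanes(signs, planes, n_received):
    """Rebuild coefficients from only the first n_received planes of the stream."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for k in range(n_received):
        bit_pos = n_planes - 1 - k  # planes were sent MSB first
        for j, bit in enumerate(planes[k]):
            mags[j] |= bit << bit_pos
    return [-m if s else m for s, m in zip(signs, mags)]


coeffs = [5, -3, 0, 12, -7]  # e.g. integer wavelet coefficients
signs, planes = to_bitplanes(coeffs)
for n in range(len(planes) + 1):
    print(n, from_bitplanes(signs, planes, n))
```

Real embedded coders such as EZW [4] and SPIHT [5] interleave significance, sign and refinement information and entropy-code it; this sketch only shows why a truncated stream still decodes and why the complete stream is lossless.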

3. INTRA AND NON-INTRA FRAMES

The statistical characteristics of intra and non-intra frames differ significantly, so careful investigation of these properties is vital for efficient encoding. Figure 1 illustrates and compares the important statistical properties using a non-intra frame and the corresponding intra frame from the Mobile sequence. The histogram of a residual frame follows a zero-centred, two-sided geometric distribution with a mean close to zero (Figure 1B). The shape of the distribution depends on the motion content of the frame: frames with low motion give a high, narrow peak at zero and short tails, while frames with high motion produce longer tails. This shape suggests that the residual values are already decorrelated to an extent that depends on the motion level of the frame. Figures 1C and 1D show the variance-normalised autocorrelation over a pixel neighbourhood of −4 to 4 in both the x and y directions for the intra and non-intra frames respectively. Intra frame pixels show high inter-pixel correlation, whereas non-intra frame pixels show low inter-pixel correlation. Intra frames have an exponentially decreasing power spectrum in both directions, and the normalised magnitudes of their high frequency components are smaller than those of non-intra frames (Figures 1E and 1F). The presence of large amplitude high frequency components in residuals is also seen in their larger ac DCT coefficients (Figure 1H). From these properties, we conclude that residuals are fairly decorrelated frames with large magnitude high frequency components. This degree of decorrelation may be adequate for efficient lossless coding, but for an embedded coder, ordering the data according to their importance is vital. Therefore, the wavelet filters should be chosen taking the characteristics of the data into account.
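The variance-normalised autocorrelation of Figures 1C and 1D can be computed as sketched below. The exact normalisation used for the figures is not stated, so this assumes the common definition: remove the frame mean, average the product of each pixel with its (dy, dx)-displaced neighbour over the overlapping region, and divide by the variance. The two test frames are synthetic stand-ins, not frames from Mobile.

```python
import numpy as np


def normalised_autocorrelation(frame, max_lag=4):
    """Variance-normalised autocorrelation for displacements -max_lag..max_lag in y and x.

    r[max_lag + dy, max_lag + dx] is the mean of x[n] * x[n + (dy, dx)] over the
    overlapping region divided by the variance, where x is the mean-removed frame,
    so the zero-displacement value is 1. A constant frame (zero variance) is not handled.
    """
    x = np.asarray(frame, dtype=np.float64)
    x = x - x.mean()
    var = np.mean(x * x)
    h, w = x.shape
    size = 2 * max_lag + 1
    r = np.zeros((size, size))
    for a, dy in enumerate(range(-max_lag, max_lag + 1)):
        for b, dx in enumerate(range(-max_lag, max_lag + 1)):
            y0, y1 = max(0, dy), min(h, h + dy)
            x0, x1 = max(0, dx), min(w, w + dx)
            shifted = x[y0:y1, x0:x1]                   # x[n + (dy, dx)]
            base = x[y0 - dy:y1 - dy, x0 - dx:x1 - dx]  # x[n]
            r[a, b] = np.mean(base * shifted) / var
    return r


rng = np.random.default_rng(0)
smooth = np.add.outer(np.arange(64), np.arange(64))  # smooth, intra-like stand-in
noisy = rng.integers(-20, 21, size=(64, 64))         # decorrelated, residual-like stand-in
print(normalised_autocorrelation(smooth)[4, 5], normalised_autocorrelation(noisy)[4, 5])
```

On these stand-ins the smooth frame has a one-pixel correlation close to 1, while the noise-like frame is close to 0, mirroring the intra versus non-intra contrast described above.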

[Figure 1 plots not reproduced. Panels: histograms for intra and non-intra (frequency vs. magnitude values); normalised autocorrelation for intra and non-intra; normalised magnitude spectrum, row-wise and column-wise, intra vs. non-intra (against frequency points); 16×16 DCT ac coefficients for intra and non-intra (magnitude).]

Figure 1: Intra and non-intra characteristics. A) Histogram of intra, B) histogram of non-intra, C) autocorrelation of intra, D) autocorrelation of non-intra, E) & F) column- and row-wise power spectra, G) & H) intra and non-intra 16×16 DCT (ac) coefficients.

Since the reference frames used for forward and backward motion compensated prediction are coded losslessly, the residuals in lossless video coding do not contain the quantisation noise that lossy coding of the reference frames introduces in lossy video coding. Instead, these residuals consist mainly of noise from the motion compensated prediction process and the natural noise in the individual frames, whose level is increased by the subtraction used to form the residuals.
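A one-line calculation shows why the subtraction raises the noise level. It assumes, as a simplification not stated in the paper, that the noise n_t in the current frame and n_{t-1} in the motion compensated reference are independent, zero mean and of equal variance σ², and considers single-reference (P type) prediction:

$$\operatorname{Var}(n_t - n_{t-1}) = \operatorname{Var}(n_t) + \operatorname{Var}(n_{t-1}) = 2\sigma^2$$

Under these assumptions, the noise standard deviation in the residual is about √2 times that of either frame.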

4. WAVELET TRANSFORMS ON RESIDUALS

Although the DCT has been the preferred transform for residuals in MPEG-II [3] and H.263 [8], the introduction of wavelet transforms for image coding has been followed by some published research on using wavelets for non-intra frames [9, 10]. In this paper, we consider integer wavelet transforms implemented by lifting. In lifting [7], the input signal is subsampled into two subsets: the even indexed samples (S) and the odd indexed samples (D). A predictor (P) then predicts the D samples from the S samples, where the number of S samples used is determined by the required number of vanishing moments of the wavelet. The S samples are then updated (U) with the D samples. The outputs of P and U are rounded, so that the integer nature of the input is preserved (Figure 2).

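[Figure 2 diagram not reproduced: the input X is split by factor-2 subsampling into S and D branches, which are connected through the predict (P) and update (U) steps and summing nodes.]

Figure 2: Lifting block diagram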
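The lifting steps can be sketched for the (1,1) S transform and the (2,2) transform using the integer formulas of Calderbank et al. [7]. Symmetric extension at the signal edges and an even input length are assumptions of this sketch; the function names are hypothetical. Because each step adds a rounded function of the other subset, running the steps backwards with the same rounding restores the input exactly, which is what makes lossless coding possible.

```python
def s_forward(x):
    """(1,1) S transform: d = odd - even, s = even + floor(d / 2)."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]
    s = [e + (dd >> 1) for e, dd in zip(even, d)]  # >> 1 is floor(d / 2) for Python ints
    return s, d


def s_inverse(s, d):
    even = [sv - (dd >> 1) for sv, dd in zip(s, d)]
    odd = [dd + e for dd, e in zip(d, even)]
    return [v for pair in zip(even, odd) for v in pair]


def lift22_forward(x):
    """(2,2) integer lifting; x must have even length. Edges use symmetric extension."""
    s, d = list(x[0::2]), list(x[1::2])
    n = len(d)
    for i in range(n):  # predict: D from its two S neighbours
        right = s[i + 1] if i + 1 < n else s[i]
        d[i] -= (s[i] + right + 1) >> 1  # floor((s_i + s_{i+1}) / 2 + 1/2)
    for i in range(n):  # update: S from its two D neighbours
        left = d[i - 1] if i > 0 else d[i]
        s[i] += (left + d[i] + 2) >> 2  # floor((d_{i-1} + d_i) / 4 + 1/2)
    return s, d


def lift22_inverse(s, d):
    s, d = list(s), list(d)
    n = len(d)
    for i in range(n):  # undo the update first
        left = d[i - 1] if i > 0 else d[i]
        s[i] -= (left + d[i] + 2) >> 2
    for i in range(n):  # then undo the predict using the restored S samples
        right = s[i + 1] if i + 1 < n else s[i]
        d[i] += (s[i] + right + 1) >> 1
    return [v for pair in zip(s, d) for v in pair]


x = [12, 15, 11, 9, 30, 31, 28, 0]
print(s_forward(x))  # ([13, 10, 30, 14], [3, -2, 1, -28])
assert s_inverse(*s_forward(x)) == x
assert lift22_inverse(*lift22_forward(x)) == x
```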

In lossless image coding, it is well known that wavelet transforms with more vanishing moments (such as (4,4), (4,2) and (2+2,2)) outperform those with fewer vanishing moments, since still images consist mainly of smooth regions and a few edges. Such wavelets use a larger neighbourhood of pixels in the predict and update lifting steps. Since residuals contain a higher proportion of high frequency information, wavelets with fewer vanishing moments, namely (0,0) (the lazy wavelet), (1,1) (the S transform) and (2,2), were chosen for our initial experiments on residuals. We used residuals from the Claire (talking head), Mobile (object and camera motion), Unicycle (moving text) and Kiel harbour (zoom) sequences, in greyscale (Y component only, 8 bits per pixel) at CIF size.

4.1. Sub band entropy and energy distributions

Wavelet transforms are applied separately in the horizontal and vertical directions, giving four sub bands. When a still image is wavelet transformed, most of its energy is concentrated in the low-low (LL) sub band, which normally contains a subsampled version of the original image with statistics similar to those of the original. The LL sub band also has a higher entropy than the other three sub bands. These characteristics motivate iterating the wavelet transform on the LL sub band for four or five levels in still image coding. In wavelet transformed residuals, on the other hand, the LL sub band has neither the highest entropy nor the highest energy. This is illustrated for 40 non-intra frames from the Mobile sequence in Figure 3, where Figures 3C and 3D show the average entropy and energy per frame for each of the four sub bands for different wavelet bases. All four sub bands have comparable energy and entropy, owing to the larger proportion of high frequency information in the residuals. As a result, further decomposition of the LL sub band does not improve the total weighted entropy, so we applied the wavelet transforms for only one scale, giving only four sub bands.
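The sub band statistics can be reproduced in outline as follows. This sketch assumes a separable single-level S transform, energy as the sum of squared coefficients, zero-order entropy from coefficient frequencies, and a weighted entropy that weights each sub band by its share of the coefficients; the paper does not spell out these definitions, and the frames are synthetic stand-ins rather than real residuals.

```python
import numpy as np


def s_rows(a):
    """One level of the integer S transform along each row: (low, high)."""
    even, odd = a[:, 0::2], a[:, 1::2]
    high = odd - even
    low = even + high // 2  # floor division keeps the transform integer
    return low, high


def one_level_subbands(frame):
    """Separable single-level transform; first letter = horizontal band, second = vertical."""
    lo, hi = s_rows(frame)                # horizontal filtering
    ll, lh = (b.T for b in s_rows(lo.T))  # vertical filtering of the horizontal low band
    hl, hh = (b.T for b in s_rows(hi.T))  # vertical filtering of the horizontal high band
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def zero_order_entropy(band):
    """Zero-order (memoryless) entropy in bits per coefficient."""
    _, counts = np.unique(band, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def _percent(values):
    total = sum(values.values())
    return {k: 100.0 * v / total for k, v in values.items()}


def subband_report(frame):
    """Return (energy % per band, entropy % per band, size-weighted entropy in bpp)."""
    bands = one_level_subbands(np.asarray(frame, dtype=np.int64))
    energy = {k: float(np.sum(b.astype(np.float64) ** 2)) for k, b in bands.items()}
    entropy = {k: zero_order_entropy(b) for k, b in bands.items()}
    n_total = sum(b.size for b in bands.values())
    weighted = sum(entropy[k] * bands[k].size / n_total for k in bands)
    return _percent(energy), _percent(entropy), weighted


rng = np.random.default_rng(0)
residual_like = rng.integers(-20, 21, size=(64, 64))  # synthetic stand-in, not a real residual
smooth = np.add.outer(np.arange(64), np.arange(64))   # synthetic smooth, intra-like frame
for name, f in [("residual-like", residual_like), ("smooth", smooth)]:
    e_pct, h_pct, hw = subband_report(f)
    print(name, {k: round(v, 1) for k, v in e_pct.items()}, round(hw, 3))
```

For the smooth frame almost all of the energy lands in LL, whereas for the noise-like frame LL holds only a small share, which is the qualitative contrast Figure 3 reports for real residuals.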

[Figure 3 plots not reproduced. Panels A and B: sub band entropy and energy distributions (% of total) for LL, HL, LH and HH against frame number (0 to 40) for Mobile using (2,2). Panels C and D: average sub band entropy and energy (% of total) for Mobile for the (0,0), (1,1) and (2,2) wavelet transforms.]

Figure 3: Sub band characteristics for residual frames. A) & B) Entropy and energy (%) for the Mobile sequence using (2,2); C) & D) average sub band entropy and energy (% per frame) for different wavelets for the Mobile sequence.

[Figure 4 plots not reproduced. Entropy (bpp) against frame number (0 to 20) for Mobile and Kiel, comparing no transform, (0,0), (1,1), (2,2) and (4,4).]

Figure 4: Entropy values for residuals from Mobile and Kiel sequences for different wavelets

4.2. The best wavelet basis

Figure 4 shows the zero order weighted entropy, in bits per pixel (bpp), of the residuals from the Mobile and Kiel sequences for each transform applied with a single level of decomposition. Figure 5 and Table 1 show the mean weighted entropy (bpp) per frame for the different wavelets applied to each sequence. Applying wavelet transforms to residuals gives only a slight advantage over not using a transform. The S transform and the (2,2) transform give the lowest entropy for Claire, Kiel and Unicycle, while for Mobile the (0,0) transform is marginally best. No single transform is best for all the sequences: which one performs best for a given sequence depends on the extent to which its residuals are already decorrelated. As expected, wavelets with fewer vanishing moments performed well.

[Figure 5 chart not reproduced: average entropy (bpp) per frame for Claire, Mobile, Kiel and Unicycle, comparing no transform, (0,0), (1,1), (2,2) and (4,4).]

Figure 5: Average BPP per frame for different wavelets

Sequence    No transform   (0,0)   (1,1)   (2,2)   (4,4)
CLAIRE      2.176          2.174   2.204   2.160   2.208
MOBILE      4.529          4.524   4.531   4.594   4.625
KIEL        4.502          4.499   4.428   4.439   4.452
UNICYCLE    4.499          4.494   4.383   4.326   4.338

Table 1: Average BPP per frame for different wavelets

5. SPATIALLY ADAPTIVE WAVELET TRANSFORMS

As shown above, the four wavelet transforms considered perform differently on each of the sequences. Here we propose a method for choosing the P and U functions in the lifting process (Figure 2) adaptively, depending on the local statistics of the residuals. A similar approach, adaptively switching between different predictors based on the local edginess of the image, has been presented for still image coding [11]. Any given frame contains regions with different amounts of motion: regions with high motion give more decorrelated residuals, while regions with low motion give less decorrelated residuals. In this work, wavelets with at most four vanishing moments were considered.

5.1. Choosing the Predictor

Based on the initial results, four predictors can be considered, namely (x,0), (x,1), (x,2) and (x,4), where x is the number of vanishing moments introduced in the update (U) step. To choose the most suitable predictor, normalised autocorrelation factors at a single pixel displacement are computed over neighbourhoods of different sizes. To predict the ith D sample, D_i, two normalised autocorrelation factors, C2_i and C4_i, computed over neighbourhoods of two and four of the corresponding S samples respectively, are used as the decision metric. C2_i and C4_i are calculated by

$$C2_i = \frac{S_i \times S_{i+1}}{(S_i^2 + S_{i+1}^2)/2} \qquad (1)$$

$$C4_i = \frac{(S_{i-1} \times S_i + S_i \times S_{i+1} + S_{i+1} \times S_{i+2})/3}{(S_{i-1}^2 + S_i^2 + S_{i+1}^2 + S_{i+2}^2)/4} \qquad (2)$$

The best predictor is then chosen from the values of C2_i and C4_i as follows:

If (C4_i > C2_i) and (C4_i > 0): perform the (x,4) predictor
Else if (C2_i > T1): perform the (x,2) predictor
Else if (C2_i > T2): perform the (x,1) predictor
Else: perform the (x,0) predictor, i.e. no prediction        (3)
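Rule (3) can be sketched as an adaptive predict step using equations (1) and (2) and the thresholds T1 = 0 and T2 = −0.1 reported in this section. The paper names the predictor types but does not give their formulas here, so the sketch assumes the integer interpolating predictors of Calderbank et al. [7]: (x,1) predicts D_i from S_i, (x,2) from the rounded mean of S_i and S_{i+1}, and (x,4) from the four-tap 9/16, −1/16 interpolator over S_{i−1}..S_{i+2}. Edge replication at the boundaries and a zero factor for an all-zero window are further assumptions, and the helper names are hypothetical.

```python
T1, T2 = 0.0, -0.1  # thresholds given in the text


def _at(seq, k):
    """Sample at index k, clamped to the valid range (edge replication: an assumption)."""
    return seq[min(max(k, 0), len(seq) - 1)]


def _norm_corr(products, squares):
    """Mean of products over mean of squares; taken as 0 for an all-zero window (assumption)."""
    mean_sq = sum(squares) / len(squares)
    return 0.0 if mean_sq == 0 else (sum(products) / len(products)) / mean_sq


def choose_predictor(s, i):
    """Rule (3): return 4, 2, 1 or 0 for the (x,4), (x,2), (x,1) or (x,0) predictor."""
    a, b, c, e = _at(s, i - 1), s[i], _at(s, i + 1), _at(s, i + 2)
    c2 = _norm_corr([b * c], [b * b, c * c])                              # eq. (1)
    c4 = _norm_corr([a * b, b * c, c * e], [a * a, b * b, c * c, e * e])  # eq. (2)
    if c4 > c2 and c4 > 0:
        return 4
    if c2 > T1:
        return 2
    if c2 > T2:
        return 1
    return 0


def prediction(s, i, order):
    """Hypothetical integer predictors for D_i (interpolating family, cf. [7])."""
    a, b, c, e = _at(s, i - 1), s[i], _at(s, i + 1), _at(s, i + 2)
    if order == 4:
        return (9 * (b + c) - (a + e) + 8) >> 4  # floor(9/16 (b+c) - 1/16 (a+e) + 1/2)
    if order == 2:
        return (b + c + 1) >> 1  # floor((b+c)/2 + 1/2)
    if order == 1:
        return b  # S-transform style: D_i predicted from S_i
    return 0  # lazy wavelet: no prediction


def adaptive_predict(s, d):
    return [di - prediction(s, i, choose_predictor(s, i)) for i, di in enumerate(d)]


def adaptive_predict_inverse(s, d_hat):
    # The decision depends only on S, so a decoder holding S can repeat it.
    return [dh + prediction(s, i, choose_predictor(s, i)) for i, dh in enumerate(d_hat)]


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5, 8]
s, d = x[0::2], x[1::2]
d_hat = adaptive_predict(s, d)
print([choose_predictor(s, i) for i in range(len(s))], d_hat)
assert adaptive_predict_inverse(s, d_hat) == d
```

Since the choice uses only S samples, no side information is needed in this sketch: after undoing the update step, a decoder recovers S and can re-derive every predictor decision.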

where T1 and T2 are thresholds chosen from suitable correlation factors. With respect to C2_i, T1 and T2 identify three regions: positive, low negative and high negative correlation. When two neighbouring pixels in S are negatively correlated, an (x,2) type prediction is not suitable; the (x,1) predictor or the (x,0) predictor (i.e. no prediction) is more appropriate, so T1 is defined as 0. The best value for T2 was determined experimentally as −0.1, which divides the negative region into low and high regions; a normalised correlation factor of −0.1 is regarded as a low correlation. This adaptive selection of predictors gives a lower entropy in the HH sub bands than any of the fixed predictors alone.

5.2. Choosing the Updater

The main objective of the update step is to transfer the original image mean to the low pass sub bands, which is more important when the transform is iterated on the LL sub band of the current level. An adaptive update for S_i can be determined in the same way by considering the corresponding samples in D. It was found that after the predict step, D_{i−1} and D_i are decorrelated to a greater extent, and a higher degree of decorrelation in the D samples allows more samples to be used for updating. Therefore, either the (2,x) or the (4,x) update can be used, chosen by the normalised correlation factors CC2_i and CC4_i below.

$$CC2_i = \frac{D_{i-1} \times D_i}{(D_{i-1}^2 + D_i^2)/2} \qquad (4)$$

$$CC4_i = \frac{(D_{i-2} \times D_{i-1} + D_{i-1} \times D_i + D_i \times D_{i+1})/3}{(D_{i-2}^2 + D_{i-1}^2 + D_i^2 + D_{i+1}^2)/4} \qquad (5)$$

If CC4_i is smaller than CC2_i, the (4,x) update is used; otherwise, the (2,x) update is used. With this adaptive selection technique, any wavelet basis can be applied within a frame, depending on the local statistics.
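The updater choice can be sketched in the same way from equations (4) and (5). The update formulas are assumptions taken from the integer (2,2) and (4,4) transforms of Calderbank et al. [7], whose supports (D_{i−1}, D_i and D_{i−2}..D_{i+1}) match the neighbourhoods of CC2_i and CC4_i; edge replication, the zero factor for an all-zero window, and all function names are likewise hypothetical.

```python
def _at(seq, k):
    """Sample at index k, clamped to the valid range (edge replication: an assumption)."""
    return seq[min(max(k, 0), len(seq) - 1)]


def _norm_corr(products, squares):
    """Mean of products over mean of squares; taken as 0 for an all-zero window (assumption)."""
    mean_sq = sum(squares) / len(squares)
    return 0.0 if mean_sq == 0 else (sum(products) / len(products)) / mean_sq


def choose_updater(d, i):
    """Return 4 for the (4,x) update if CC4_i < CC2_i, else 2 for the (2,x) update."""
    a, b, c, e = _at(d, i - 2), _at(d, i - 1), d[i], _at(d, i + 1)
    cc2 = _norm_corr([b * c], [b * b, c * c])                              # eq. (4)
    cc4 = _norm_corr([a * b, b * c, c * e], [a * a, b * b, c * c, e * e])  # eq. (5)
    return 4 if cc4 < cc2 else 2


def update(d, i, order):
    """Hypothetical integer update terms for S_i (cf. the (4,4) and (2,2) transforms in [7])."""
    a, b, c, e = _at(d, i - 2), _at(d, i - 1), d[i], _at(d, i + 1)
    if order == 4:
        return (9 * (b + c) - (a + e) + 16) >> 5  # floor(9/32 (b+c) - 1/32 (a+e) + 1/2)
    return (b + c + 2) >> 2  # floor((b+c)/4 + 1/2)


def adaptive_update(s, d):
    return [si + update(d, i, choose_updater(d, i)) for i, si in enumerate(s)]


def adaptive_update_inverse(s_up, d):
    # The choice uses only the (already predicted) D samples, which the decoder has.
    return [su - update(d, i, choose_updater(d, i)) for i, su in enumerate(s_up)]


s = [13, 10, 30, 14]
d = [3, -2, 1, -28]
s_up = adaptive_update(s, d)
print([choose_updater(d, i) for i in range(len(d))], s_up)
assert adaptive_update_inverse(s_up, d) == s
```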

6. RESULTS AND DISCUSSION

The performance of the proposed spatially adaptive wavelet technique was compared with the other transforms ((0,0), S, (2,2) and (4,4)) and with no transform for the test sequences described above. Figure 6 compares the results for 40 non-intra frames with the no transform case. Table 2 lists the average bit rate per frame for each sequence (Y component only) for all of these methods.

Sequence    No transform   (0,0)   (1,1)   (2,2)   (4,4)   Adaptive wavelets
CLAIRE      2.176          2.174   2.204   2.160   2.208   2.179
MOBILE      4.529          4.524   4.531   4.594   4.625   4.572
KIEL        4.502          4.499   4.428   4.439   4.452   4.418
UNICYCLE    4.499          4.494   4.383   4.326   4.338   4.347
Average     3.927          3.923   3.886   3.880   3.906   3.879

Table 2: Average Entropy (BPP) per frame
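The Average row of Table 2, the per-sequence best method and the average saving of the adaptive scheme over each alternative can be recomputed from the per-sequence values. The sketch below uses only numbers copied from the table; the variable names are illustrative.

```python
methods = ["No transform", "(0,0)", "(1,1)", "(2,2)", "(4,4)", "Adaptive"]
table2 = {  # average entropy (bpp) per frame, copied from Table 2
    "Claire": [2.176, 2.174, 2.204, 2.160, 2.208, 2.179],
    "Mobile": [4.529, 4.524, 4.531, 4.594, 4.625, 4.572],
    "Kiel": [4.502, 4.499, 4.428, 4.439, 4.452, 4.418],
    "Unicycle": [4.499, 4.494, 4.383, 4.326, 4.338, 4.347],
}

# Unweighted mean over the four sequences, as in the Average row.
averages = {m: sum(row[k] for row in table2.values()) / len(table2) for k, m in enumerate(methods)}
# Method with the lowest average entropy for each sequence.
best = {seq: methods[row.index(min(row))] for seq, row in table2.items()}

for m in methods:
    saving = averages[m] - averages["Adaptive"]
    print(f"{m:>12}: average {averages[m]:.4f} bpp, adaptive saving {saving:+.4f} bpp")
print(best)
```

On these averages the adaptive scheme is best overall and best for Kiel, while the fixed (2,2) transform is best for Claire and Unicycle and (0,0) for Mobile.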

The spatially adaptive wavelets produced the best result for most of the frames. On average, the adaptive method gave the lowest entropy for Kiel, where a specific type of motion (zooming) is present, and the lowest average over all four sequences (3.879 bpp); for Unicycle, however, the fixed (2,2) and (4,4) transforms were slightly better on average (Table 2). Averaged over the sequences, the adaptive scheme saved about 0.05 bpp compared with using no transform, and between 0.001 bpp (against (2,2)) and 0.044 bpp (against (0,0)) compared with the fixed transforms. Adaptive wavelets also performed well on residuals with more power, such as those of P type frames and frames with greater motion. Finally, given the slight improvement over coding without a transform, transforming the residual frames might be considered unnecessary for non-embedded lossless coding. For embedded lossless coding, however, the adaptive wavelet transform presented here can be used to organise the residual values in order of their importance. Taking the orthogonality of the transform into account, these coefficients can then be coded with an embedded lossless codec to produce an embedded bit stream, so that they can also be decoded at lower bit rates from the lossless bit stream.

[Figure 6 plots not reproduced. Entropy (bpp) against frame number for Claire, Mobile, Kiel and Unicycle, comparing no transform with adaptive wavelets.]

Figure 6: Comparison of adaptive wavelets with no transform technique

7. SUMMARY

Residuals arising from the motion compensated prediction used to code non-intra frames possess statistical characteristics different from those of the corresponding intra frames. Residuals contain mainly high frequency components, which are already decorrelated to a certain extent, so wavelets with fewer vanishing moments are suitable for them. Our experiments showed that iterative decomposition of the LL sub band, as in still image coding, is not necessary in residual coding, because all four sub bands have comparable energy and entropy. Further, the best wavelet basis for each frame depended on the motion content of that frame. We therefore presented an adaptive wavelet transform, which is also a non-linear sub band coding technique. With the spatially adaptive wavelet transform, a suitable predictor and updater can be selected according to the local residual statistics, and these adaptive wavelets gave the best results for most of the test frames. Given the small improvement in bit rate, we conclude that wavelet transforms are of limited use in non-embedded lossless coding, as the decorrelation already achieved by motion compensated prediction is adequate. For embedded lossless coding, however, the wavelet transform techniques presented above can be used.

8. ACKNOWLEDGEMENTS

The sponsorship of Tandberg Television Ltd. for G. C. K. Abhayaratne is gratefully acknowledged.

9. REFERENCES

1. N. D. Memon and K. Sayood, “Lossless compression of video sequences”, IEEE Trans. on Communications, Vol. 44, No. 4, pp. 1340-1345, 1996.
2. R. Oami and M. Ohta, “Efficient lossless video coding compatible with MPEG-II”, Proc. IEEE International Conference on Communications, Vol. 2, pp. 901-905, IEEE, 1998.
3. J. L. Mitchell, W. B. Pennebaker, C. E. Fogg and D. J. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, 1996.
4. J. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”, IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp. 3445-3462, 1993.
5. A. Said and W. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, pp. 243-250, 1996.
6. G. C. K. Abhayaratne and D. M. Monro, “Embedded to lossless image coding (ELIC)”, Proc. IEEE Nordic Signal Processing Symposium (NORSIG), pp. 255-258, Kolmarden, Sweden, 2000.
7. A. R. Calderbank, I. Daubechies, W. Sweldens and B. Yeo, “Lossless image compression using integer to integer wavelet transforms”, Proc. International Conference on Image Processing, Vol. 1, pp. 596-599, IEEE, Santa Barbara, CA, 1997.
8. ITU-T Rec. H.263, “Video coding for low bit rate communication”, 1996.
9. S. A. Martucci, I. Sodagar, T. Chiang and Y.-Q. Zhang, “A zerotree wavelet video coder”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, pp. 109-118, 1997.
10. D. Blasiak and W.-Y. Chan, “Efficient wavelet coding of motion compensated prediction residuals”, Proc. IEEE International Conference on Image Processing, Vol. 2, pp. 287-290, IEEE, Chicago, IL, USA, 1998.
11. R. Claypoole, G. Davis, W. Sweldens and R. Baraniuk, “Nonlinear wavelet transforms for image coding using lifting”, Proc. 31st Asilomar Conference on Signals, Systems & Computers, Vol. 1, pp. 662-667, IEEE Comp. Soc., Los Alamitos, CA, USA, 1998.
