Lossy/Lossless Region-of-Interest Image Coding Based on Set Partitioning in Hierarchical Trees


Eiji Atsumi¹ and Nariman Farvardin²
¹Information Technology R&D Center, Mitsubishi Electric Corp., Japan
²Department of Electrical Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742
E-mail: [email protected], [email protected]

Abstract. We have incorporated a Region-of-Interest (RoI) coding functionality into Said and Pearlman's SPIHT coding with integer transforms. By placing a higher emphasis on the transform coefficients pertaining to the RoI, the RoI is coded with higher fidelity than the rest of the image in the earlier stages of progressive reconstruction; thus the "important" part of the image is reconstructed more quickly than the rest of the image. By allowing encoding or transmission to be terminated early, this method significantly saves transmission time and storage space in situations where the RoI needs to be coded losslessly and the rest of the image only visually losslessly (lossy). In our model, the RoI can be flexibly specified either at the beginning or in the middle of the encoding process (either on the original image or on the full- or low-resolution image reconstructed by the decoder), through interaction with the user at the transmitting or the receiving end. Also, the speed with which the quality of the RoI improves in progressive decoding can be flexibly specified by the user at either end. The proposed method is especially advantageous in applications where the image is browsed interactively, e.g., telemedicine.

1. Introduction

In still image compression, not only is it essential for the coder to provide good rate-distortion performance, but a number of other requirements are becoming increasingly important. Examples of such requirements are: (i) the ability to provide scalability by fidelity and resolution through embedded coding, (ii) the ability to provide lossy and lossless compression within the same encoding algorithm, and (iii) the ability to give higher priority to a Region-of-Interest (RoI), so that the RoI can be reconstructed with higher fidelity (possibly losslessly) than the rest of the image. Clearly, it would be desirable to incorporate the above-mentioned features in an image coding system without incurring too heavy a cost in terms of increased computational complexity or reduced overall rate-distortion performance. It is important to note that the combination of these three features makes it possible to reconstruct the RoI (with a desired level of fidelity, possibly losslessly) faster than the rest of the image, i.e., with fewer transmitted bits than when the entire image is treated with the same priority. In an image communication system, such as transmission over the Internet or telemedicine applications, this enables the user to terminate transmission as soon as the RoI is reconstructed with acceptable quality, thus saving bandwidth (or time) and computational cost.

Said and Pearlman have developed a simple and efficient embedded image coding system, based on the Set Partitioning In Hierarchical Trees (SPIHT) concept, which provides scalability by fidelity and resolution [1]. They have extended their studies to a system that uses integer-coefficient wavelet transforms, providing lossy and lossless compression within the same embedded bit stream [2]. Frajka et al. [3] and Nister and Christopoulos [4] have exploited SPIHT to realize the RoI capability. In [3], the RoI is reconstructed with higher fidelity according to a given priority, but losslessness under a given quantization scheme may be compromised. In [4], losslessness of the RoI is preserved, but the priority given to the RoI cannot be flexibly specified.

In this work, we build upon the lossy/lossless SPIHT algorithm [2] and, through a very simple idea that adds minimal complexity, incorporate priority-controllable RoI coding without compromising other desirable features such as scalability by fidelity and resolution and lossless reconstruction of the overall image. In what follows, we describe the basic structure of the proposed RoI coding methodology [5] in Section 2 and present a modified SPIHT algorithm incorporating RoI coding in Section 3. Section 4 presents numerical results on the system's rate-distortion performance, and Section 5 gives concluding remarks.

0-8186-8821-1/98 $10.00 © 1998 IEEE

2. RoI Coding

2.1. Modification of information ordering

Our proposed RoI coding is based on the SPIHT algorithm [1], which generates an embedded bit stream through N+1 stages of successive quantization. In the absence of an RoI, let the encoder's output bit sequence of each stage be denoted by $s_0, s_1, \ldots, s_N$. Roughly speaking, these bit sequences are organized so that $s_0$ consists of the most important bits in terms of reducing the overall mean squared error (MSE), $s_1$ consists of the next most important bits in terms of reducing the MSE, and so on. In our proposed system, as soon as the RoI is identified, the order in which the encoder outputs are transmitted is modified to place more emphasis on the RoI, i.e., the embedded bit stream is generated in such an order that, in progressive decoding of the image, the RoI is refined earlier than the rest of the image. If we denote the portions of each stage's bit sequence pertaining to the RoI by $s_0(\mathrm{RoI}), s_1(\mathrm{RoI}), \ldots$ and the portions for the rest of the image by $s_0(\mathrm{rest}), s_1(\mathrm{rest}), \ldots$, a simple example of the information ordering with RoI coding would be

$$s_0(\mathrm{RoI}), s_1(\mathrm{RoI}), \ldots, s_N(\mathrm{RoI}), s_0(\mathrm{rest}), s_1(\mathrm{rest}), \ldots, s_N(\mathrm{rest}),$$

whereas the ordering without RoI coding would be

$$s_0(\mathrm{RoI}+\mathrm{rest}), s_1(\mathrm{RoI}+\mathrm{rest}), \ldots, s_N(\mathrm{RoI}+\mathrm{rest}).$$

To achieve our objective of placing a higher emphasis on the RoI, as soon as the RoI is identified in the image domain (either before encoding has begun or during the encoding process, i.e., either on the original image or on the full- or lower-resolution images under progressive decoding), only the integer-valued wavelet transform (WT) coefficients that correspond to the RoI (called RoI coefficients) are scaled up through a fixed number of left bit-shifts in each subband (each left shift corresponds to scaling up by a factor of two). As a result, under the successive quantization used in SPIHT, the RoI coefficients are encoded earlier than they would have been had they not been scaled up. Clearly, the larger the number of left shifts, the higher the emphasis placed on the RoI coefficients and the more noticeable the speed-up of the RoI reconstruction. Therefore, in practice, the decoder can not only select the RoI but also dictate the speed with which the RoI should be reconstructed (or, equivalently, the amount of additional emphasis the RoI should receive vis-à-vis the rest of the image).

2.2. On-line/off-line encoding

The RoI can be identified either (i) before the encoding process has begun or (ii) during the encoding process. In the first case, it is the user at the transmitting end who decides which areas of the image constitute the RoI (i.e., which parts of the image should be considered important). In this case, the decoder does not take part in defining the RoI, and hence encoding can be done off-line. In the second case, the encoder initially places no priority on any part of the image and transmits an embedded bit stream to the receiver, as does Said and Pearlman's SPIHT algorithm [1], [2]. At the receiving end, the user views the sequence of refined images reconstructed from the incoming stream of bits (or packets), i.e., images under progressive reconstruction. At some point, when he/she has received enough information, corresponding to a certain stage of successive quantization or "significance threshold" in the SPIHT algorithm, he/she identifies the RoI (the parts of the image he/she considers more important for the rest of the transmission). The coordinates of the RoI and its priority are then fed back to the encoder, and the encoding algorithm is modified to give higher priority to the RoI for the rest of the encoding process. In this case, the behavior of the encoder is influenced by the decoder's selection of the RoI. Therefore, encoding needs to be done on-line, i.e., interactively.

2.3. Controllable reconstruction speed

To describe the proposed modification of the SPIHT algorithm, let us use the index n to denote the n-th stage of successive quantization, n = 0, 1, ..., N (in the SPIHT coding algorithm, stage n corresponds to the combination of the sorting pass and the refinement pass for the significance threshold $2^n$). Therefore, it takes N+1 stages to encode the entire image losslessly. As shown in Fig. 1, suppose that after P stages of the encoding algorithm (P = 0, ..., N) have been completed and the resulting encoder output has been transmitted, the encoder or the decoder identifies the RoI, and the RoI coefficients are left-shifted by S bits (S = 0, 1, ..., N+1-P). P = 0 is a legitimate case, corresponding to the RoI being specified at the beginning of encoding.

Large values of S result in a speedy lossless reconstruction of the RoI; smaller values of S give a smaller speed-up but a better reconstruction of the rest of the image (better overall rate-distortion performance) at the point where the RoI is losslessly reconstructed. This trade-off can be controlled by the user (through S) depending on the importance he/she places on the RoI relative to the rest of the image.
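The effect of the left bit-shifts described in this section can be seen on a handful of integer coefficients. The Python sketch below is a simplification under stated assumptions: it takes the RoI to be specified before encoding starts (P = 0), applies only the SPIHT significance test (a coefficient is significant at threshold $2^n$ when its magnitude is at least $2^n$), and ignores set partitioning entirely. It therefore shows only when each coefficient first becomes significant and at which threshold its last coded bit is sent. The function names and sample values are hypothetical.

```python
from typing import List, Tuple


def significance_plane(magnitude: int) -> int:
    """Largest n with magnitude >= 2**n, i.e. the threshold index at which the
    coefficient first becomes significant; -1 for a zero coefficient."""
    return magnitude.bit_length() - 1


def roi_schedule(coeffs: List[int], roi: List[bool], shift: int) -> List[Tuple[int, int]]:
    """Return (n_sig, n_done) per coefficient (hypothetical helper).

    n_sig  -- threshold index at which the coefficient first becomes significant
    n_done -- threshold index after which all of its nonzero bit-planes are sent

    RoI magnitudes are left-shifted by `shift` bits, so their `shift` lowest
    bit-planes are known to be zero and need not be coded: the last coded
    bit-plane of an RoI coefficient is n = shift instead of n = 0.
    """
    schedule = []
    for c, in_roi in zip(coeffs, roi):
        mag = abs(c) << shift if in_roi else abs(c)
        n_done = shift if in_roi else 0
        schedule.append((significance_plane(mag), n_done))
    return schedule


# Hypothetical example: N = 6, since the largest magnitude, 90, is below 2**7.
coeffs = [90, -5, 40, 3]
roi = [False, True, False, True]
print(roi_schedule(coeffs, roi, 0))  # no emphasis on the RoI
print(roi_schedule(coeffs, roi, 3))  # RoI coefficients shifted by S = 3
```

Without the shift, the RoI coefficient -5 becomes significant three thresholds after the coefficient 40; with S = 3 it becomes significant at the same threshold as 40, and both RoI coefficients have all their bits sent once the threshold index reaches n = 3 rather than n = 0.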

3. Proposed Coding Algorithm

The modified SPIHT encoding algorithm, which incorporates RoI coding, works as follows.

1) Calculate the wavelet transform of the image and scale up the WT coefficients so that the transform is approximately unitary, without increasing each coefficient's actual bit depth. (The transform does not have to be an integer-coefficient transform. When a non-integer transform is used, the RoI and the overall image are reconstructed up to the highest quality attainable with the given quantizer, instead of losslessly.)
2) Start encoding the WT coefficients as in [1] (n = N).
3) As soon as the shape of the RoI is available to the encoder (the shape is given to the encoder at the beginning of the encoding process, or after the decoder has reconstructed a rough replica of the image at the receiving end), interrupt Step 2) (e.g., in the middle of the P-th stage, as shown in Fig. 1) and perform the following:
3.1) Identify all the WT coefficients (RoI coefficients) that are necessary and sufficient for the lossless reconstruction of the RoI.
3.2) Specify the relative "importance", S, of the RoI compared to the rest of the image: very important, somewhat important, etc.
3.3) Encode the RoI coefficients from the beginning of the (P+1)-st stage, i.e., significance threshold n = N - P, as follows (both steps are shown in Fig. 1): scale up all the RoI coefficients by S left bit-shifts, and increase the significance threshold to n = N + S - P. Then exclusively encode the RoI coefficients for S consecutive stages (when S = N + 1 - P, i.e., the highest relative importance in this case, every bit of the RoI coefficients is fully encoded by the end of this exclusive process):

a) Modified Sorting Pass:
a.1) for each entry (i,j) in the LIP do:
    ## if (i,j) belongs to the RoI coefficients do:
        a.1.1) output $S_n(i,j)$;
        a.1.2) if $S_n(i,j) = 1$, then move (i,j) to the LSP and output the sign of $c_{i,j}$;
a.2) for each entry (i,j) in the LIS do:
    a.2.1) if the entry is of type A, then
        - output $S_n(D(i,j))$;
        - if $S_n(D(i,j)) = 1$, then
            - for each $(k,l) \in O(i,j)$ do:
                ## if (k,l) belongs to the RoI coefficients do:
                    - output $S_n(k,l)$;
                    - if $S_n(k,l) = 1$, then add (k,l) to the LSP and output the sign of $c_{k,l}$;
                    - if $S_n(k,l) = 0$, then add (k,l) to the end of the LIP;
            - if $L(i,j) \neq 0$, then move (i,j) to the end of the LIS as an entry of type B, and go to Step a.2.2); otherwise, remove entry (i,j) from the LIS;
    a.2.2) if the entry is of type B, then (the same as in [1]);
b) Modified Refinement Pass: for each entry (i,j) in the LSP, except those included in the last sorting pass (i.e., with the same n),
    ## if (i,j) belongs to the RoI coefficients do:
        - output the n-th most significant bit of $|c_{i,j}|$;
c) Quantization-Step Update: decrement n by 1 and go to Step a) if n > N - P.

Steps marked with ## are the modifications to the SPIHT algorithm [1] that incorporate the RoI capability. The remaining steps and the notation are the same as in [1].

4) Resume encoding the overall WT coefficients as in [1] until the last stage, where the significance threshold is n = 0. The RoI is losslessly reconstructed at an earlier stage, corresponding to n = S, i.e., S stages earlier than the lossless reconstruction of the overall image.

The decoding algorithm of the modified SPIHT can be deduced from the encoding algorithm in the same manner as the decoding algorithm of the SPIHT algorithm [1], except that the RoI coefficients are scaled down by S right bit-shifts. Recall that in the encoding process, some WT coefficients are scaled up in order to approximate a unitary transform; therefore, a few of the least significant bits of these coefficients are known to be zero. The encoding and decoding algorithms should skip these all-zero bits in the coding process.

[Figure 1: bit-plane diagram; recoverable labels: "RoI coeffs are not scaled", "RoI coeffs are scaled up", "overall WT", N, N+1-P]
Figure 1. RoI coding: reconstruction speed control. RoI coding starts at stage P and the RoI coefficients are left-shifted by S bits; the maximum significance threshold is N.
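The stage schedule produced by steps 1) to 4) can be tabulated without modelling the LIP, LIS, and LSP. The Python sketch below is an illustration under stated assumptions: it counts stages, not bits; it ignores the zero least significant bits introduced by the unitary scaling of step 1); and it takes the exclusive RoI phase of step 3.3) to be exactly S stages. For each stage it records the threshold index n, whether all coefficients or only RoI coefficients are coded, and which bit-plane of the unshifted RoI coefficients is carried. The names `stage_plan` and `completion_stages` are hypothetical.

```python
from typing import List, Tuple

Stage = Tuple[int, str, int]  # (threshold index n, scope, unshifted RoI bit-plane or -1)


def stage_plan(N: int, P: int, S: int) -> List[Stage]:
    """Hypothetical tabulation of the modified SPIHT stages for given N, P, S."""
    if not (0 <= P <= N and 0 <= S <= N + 1 - P):
        raise ValueError("need 0 <= P <= N and 0 <= S <= N + 1 - P")
    plan: List[Stage] = []
    # Steps 1)-2): the first P stages treat the whole image alike (n = N, ..., N-P+1).
    for n in range(N, N - P, -1):
        plan.append((n, "all", n))
    # Step 3.3): after S left shifts, unshifted RoI plane m sits at n = m + S.
    # The threshold is raised to n = N+S-P and S stages code only RoI coefficients.
    for n in range(N + S - P, N - P, -1):
        plan.append((n, "roi", n - S))
    # Step 4): resume with all coefficients from n = N-P down to 0. RoI planes
    # below the shift (n < S) are known to be zero and carry no RoI bits.
    for n in range(N - P, -1, -1):
        plan.append((n, "all", n - S if n >= S else -1))
    return plan


def completion_stages(plan: List[Stage]) -> Tuple[int, int]:
    """1-based stage indices after which the RoI and the whole image are lossless."""
    roi_done = next(i for i, (_, _, m) in enumerate(plan, start=1) if m == 0)
    return roi_done, len(plan)


# N = 13 and P = 7 as in the example of Section 4.
for S in (7, 4, 2):
    print(S, completion_stages(stage_plan(13, 7, S)))
```

In stage terms the RoI always finishes after N + 1 stages, and the whole image finishes S stages later, matching step 4). What S changes is how many of those first N + 1 stages also carry bits of the rest of the image (N + 1 - S of them), which is consistent with the faster RoI reconstruction for larger S reported in Section 4.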


4. Results

Numerical results are obtained to evaluate the proposed RoI coding methodology. The integer-coefficient transform used in this simulation is the so-called 2-10 or TT transform [6]. Five levels of discrete wavelet decomposition are employed, amounting to 16 subbands. Each subband is appropriately scaled, as in [2], to approximate a unitary transformation, which leads to a maximum significance threshold N of at most 13 bits for 8-bit grey-level images. The test image used to collect the following results is the 512×512 Girl. In this example, the RoI is specified as the 125×125 square region containing the face of the girl, as marked in Figs. 2 and 3.

Fig. 4 shows that, for a given P, the larger the left bit-shift S, the faster the lossless reconstruction of the RoI (measured by the number of bits per pixel the decoder needs to reconstruct the RoI losslessly) and the slower the PSNR improvement of the whole image in progressive decoding. In this example, RoI coding starts at 0.083 bpp (P = 7), where the SPIHT algorithm achieves an overall PSNR of 28.77 dB (Fig. 2). If the highest priority is placed on the RoI (i.e., S = 7), lossless reconstruction of the RoI is achieved at an overall bit rate of 0.454 bpp, with an overall PSNR of 29.35 dB (Fig. 3). A modest priority (e.g., S = 4) achieves lossless reconstruction of the RoI at 1.115 bpp with an overall PSNR of 39.32 dB, while a low priority (e.g., S = 2) achieves it at 3.052 bpp with a PSNR of 48.17 dB. For S = 0 (no priority), or when no RoI is specified, lossless reconstruction of the entire image is achieved at 4.318 bpp. These results show that, in this example, the bit rate needed for lossless reconstruction of the RoI can be reduced by as much as an order of magnitude (4.318/0.454 ≈ 9.5) compared to the rate needed for lossless reconstruction of the whole image. Also, a lossless RoI with a "visually lossless" rest of the image can be achieved at a bit rate about 4 times smaller than that needed for lossless reconstruction of the overall image (4.318/1.115 ≈ 3.9).

Fig. 5 illustrates that the value of S plays a more important role than the value of P in determining the rate-distortion performance of the overall image after the RoI is losslessly reconstructed. This figure shows that, regardless of the value of P, as long as the same left-shift value S is used (S = 5 in this example), the rate-distortion performance of the whole image converges to approximately the same point when the RoI is losslessly reconstructed. Although a smaller value of P (i.e., an earlier start of RoI coding) gives a slightly faster lossless reconstruction of the RoI (P = 5, 7, and 8 give a lossless RoI at 0.702, 0.732, and 0.773 bpp, respectively), the overall PSNR at which the RoI is losslessly reconstructed is almost the same (about 35.7 dB). These results show that the bit rate and the PSNR at which the RoI is losslessly reconstructed are controlled by the value of S rather than P. However, the value of P determines the range of choices of S: the earlier RoI coding starts, the wider the range of values of S, allowing more flexibility in controlling the RoI reconstruction speed.

Fig. 6 shows the rate-distortion penalty associated with the inclusion of RoI coding into Said and Pearlman's SPIHT algorithm (S&P). All the coding schemes are identical to S&P up to the bit rate at which RoI coding starts. Once RoI coding starts, the larger the value of S, the more encoding stages are focused on the RoI coefficients. As a result, reconstruction of the RoI is sped up, and the PSNR of the whole image is compromised during the exclusive RoI reconstruction. The larger the value of S, the greater the PSNR loss (with S = 7, encoding does not move on to the overall WT coefficients until the RoI is losslessly reconstructed). If S is small, fewer stages are focused on the RoI and the coding process quickly returns to the overall WT coefficients. Thus, for small S, the overall rate-distortion performance is closer to that of S&P. Lossless reconstruction of the whole image is achieved at a slightly higher bit rate than with S&P, and the increase is generally larger for larger S (S = 7, 4, and 2 with P = 7 give lossless reconstruction of the whole image at 4.393, 4.362, and 4.340 bpp, respectively, compared with 4.318 bpp for S&P's SPIHT [1]). All the above results show that the proposed RoI coding methodology provides a simple and flexible mechanism for an embedded, lossy-to-lossless (RoI lossless to whole-image lossless) image coding system.
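The ratios quoted in this section follow directly from the reported bit rates. The short Python check below recomputes, for each shift S with P = 7, how many times fewer bits the RoI needs to become lossless than the whole image needs under S&P, and the relative rate overhead for whole-image lossless reconstruction. The numbers are those reported for the Girl image; the variable and function names are ours, chosen for illustration.

```python
# Bit rates (bpp) reported for the 512x512 Girl image with P = 7.
SP_LOSSLESS = 4.318                              # whole image lossless, S&P's SPIHT (S = 0)
ROI_LOSSLESS = {7: 0.454, 4: 1.115, 2: 3.052}    # RoI lossless, keyed by shift S
WHOLE_LOSSLESS = {7: 4.393, 4: 4.362, 2: 4.340}  # whole image lossless, keyed by shift S


def speedup(s: int) -> float:
    """Ratio of the S&P whole-image lossless rate to the RoI lossless rate."""
    return SP_LOSSLESS / ROI_LOSSLESS[s]


def overhead_percent(s: int) -> float:
    """Extra rate, in percent, for whole-image lossless coding with RoI coding."""
    return 100.0 * (WHOLE_LOSSLESS[s] - SP_LOSSLESS) / SP_LOSSLESS


for s in (7, 4, 2):
    print(f"S={s}: RoI speed-up {speedup(s):.2f}x, lossless overhead {overhead_percent(s):.2f}%")
```

This gives speed-ups of roughly 9.5, 3.9, and 1.4 for S = 7, 4, and 2, against whole-image lossless overheads of roughly 1.7%, 1.0%, and 0.5%, which makes the trade-off controlled by S explicit.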

5. Conclusions

In this paper, we have incorporated a Region-of-Interest (RoI) coding functionality that modifies the information ordering of SPIHT with integer transforms so as to place a higher emphasis on the RoI. This is achieved without compromising the algorithm's scalability by fidelity and resolution, or the lossless reconstruction of the RoI and the whole image. The greatest advantage of the proposed RoI coding is that it provides a quicker view of the "important" part of the image in progressive transmission. This image coding system mitigates a bottleneck that arises in Internet browsing and telemedicine applications by reducing transmission time.


6. References

[1] Said and Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Trans. CSVT, Vol. 6, pp. 243-250, June 1996.

[2] Said and Pearlman, "An Image Multiresolution Representation for Lossless and Lossy Compression," IEEE Trans. IP, Vol. 5, pp. 1303-1310, Sept. 1996.

[3] Frajka, Sherwood, and Zeger, "Progressive Image Coding with Spatially Variable Resolution," IEEE ICIP-97.

[4] Nister and Christopoulos, "Progressive lossy to lossless coding with a Region of Interest using the Two-Ten integer wavelet," ISO/IEC JTC1/SC29/WG1 N744, March 1998.

[5] Atsumi and Farvardin, "Lossy/lossless Region-of-Interest coding," ISO/IEC JTC1/SC29/WG1 N792, March 1998.

[6] Boliek and Zandi, "CREW Image Compression Standard," ISO/IEC JTC1/SC29/WG1 N616, Oct. 1997.

Fig. 2. The RoI (125×125 face of the girl) is identified in the middle of encoding (P = 7). PSNR of the whole image: 28.77 dB; bit rate: 0.083 bpp.

Fig. 3. Lossless reconstruction of the RoI in Fig. 2 with P = 7 and S = 7. PSNR of the whole image: 29.35 dB; bit rate: 0.454 bpp.

[Figure 4 plot; horizontal axis: overall bit-rate (bpp), 0.0 to 1.0]
Fig. 4. Reconstruction speed of the RoI with different values of S.

[Figure 5 plot, S = 5; horizontal axis: overall bit-rate (bpp), 0.0 to 1.0]
Fig. 5. Reconstruction speed of the RoI when RoI coding starts at different stages of successive quantization.

[Figure 6 plot; horizontal axis: overall bit-rate (bpp), 0.0 to 2.0]
Fig. 6. Rate-distortion performance of Said and Pearlman's SPIHT algorithm and its modified version for RoI coding.