Optimization and Implementation of the Integer Wavelet Transform for Image Coding

Marco Grangetto, Enrico Magli, Member, IEEE, Maurizio Martina, and Gabriella Olmo, Member, IEEE
Abstract—This paper deals with the design and implementation of an image transform coding algorithm based on the integer wavelet transform (IWT). First of all, criteria are proposed for the selection of optimal factorizations of the wavelet filter polyphase matrix to be employed within the lifting scheme. The obtained results lead to IWT implementations with very satisfactory lossless and lossy compression performance. Then, the effects of the finite precision representation of the lifting coefficients on the compression performance are analyzed, showing that, in most cases, a very small number of bits can be employed for the mantissa while keeping the performance degradation very limited. Stemming from these results, a VLSI architecture is proposed for the IWT implementation, capable of achieving very high frame rates with moderate gate complexity.

Index Terms—Factorization, finite precision filters, image compression, integer wavelet transform, lifting architecture, lifting scheme.
I. INTRODUCTION
The compression of still images is a crucial task in a growing number of applications, not only general purpose ones, such as videoconferencing, HDTV, and Internet browsing, but also applications related to multi/hyperspectral or SAR images for remote sensing [1]–[3]. These applications impose severe requirements on the compression algorithm employed, in terms of rate-distortion performance. For example, in the field of remote sensing, the very high resolution of the acquired images, along with the need for transmission over bandlimited channels, often imposes lossy compression even for applications that are critical from the point of view of data interpretation [1], [4]; clearly, in this case, the distortion must be carefully controlled. As a consequence, many research efforts have recently been spent toward the definition of a universal compression algorithm [3], suitable for heterogeneous images. The policy of transparent encoding, independent of the image origin and characteristics, perfectly fits the concept of information distribution over integrated service networks; on the other hand, it poses new challenges to the compression algorithms, which must be able to yield adaptive tradeoffs between rate and distortion, possibly up to the lossless level. The availability of a comprehensive theoretical framework [5]–[8] greatly helps in finding suitable solutions to this issue.
The discrete wavelet transform (DWT) is widely acknowledged to feature excellent decorrelation properties (see [9] and [10] and references therein). The DWT exhibits excellent lossy compression performance, and has been selected for the new JPEG 2000 standard [3]. In fact, followed by efficient encoders which exploit the intra- and interband residual correlation [11]–[17], and by suitable rate-distortion optimization techniques [7], [8], the DWT makes it possible to form finely scalable bitstreams [3], [12]. On the other hand, DWT-based encoders cannot achieve genuine lossless compression, due to the limited precision of computer arithmetic. Some papers in the recent literature [18], reporting lossless compression performance of DWT-based encoders for medical images, actually refer to quasi-lossless compression, i.e., peak signal-to-noise ratio (PSNR) values as high as 60 dB. These methods employ a real valued transform, and devote a very large number of bits to the representation of both the filter coefficients and the partial multiply-accumulate results; therefore, in the case of hardware implementation, the quasi-lossless performance is achieved at the expense of noticeable complexity.

A more efficient approach to lossless compression is the use of integer transforms, such as the integer wavelet transform (IWT) [19]–[26] or the integer discrete cosine transform [27], [28]. The transform coefficients are exactly represented by finite precision numbers, and this allows for truly lossless encoding. The lossless and lossy image compression performance of the IWT has already been reported in the literature [21], [29]–[31]; it is recognized that, in the case of lossy compression of general purpose images at rates below 0.5 bpp, the DWT outperforms the IWT by about 0.5 dB in PSNR. This is the price to be paid in order to achieve the lossy/lossless option.

Recently, the lifting scheme (LS) [32] has been introduced for the efficient computation of the DWT. Its main advantage with respect to the classical filter bank structure lies in its better computational efficiency and in the fact that it enables a new method for filter design. Moreover, the IWT can be computed starting from any real valued wavelet filter by means of a straightforward modification of the LS [21]. Therefore, the LS represents a natural choice for the implementation of encoders with progressive lossy-to-lossless compression capabilities, providing a common core for computing both the DWT and the IWT.

Although the LS has been widely studied in the literature, some topics relevant from both the theoretical and the implementation points of view have not yet been satisfactorily investigated. In particular, it is well known [32] that the factorization of the polyphase matrix is not unique. In this paper, we address
the issue of finding optimal factorizations of the polyphase matrix, comparing various selection criteria, and achieving very efficient IWT implementations. Moreover, we evaluate the effect of the finite precision representation of the lifting coefficients, showing that in most cases a reduced number of bits can be employed for the mantissa without relevant performance degradation. Based on these results, we propose an architecture for the dedicated VLSI implementation of the IWT.

The paper is organized as follows. In Section II, we detail the problems related to the IWT design and implementation. In Section III, the LS and the issues concerning the factorization of the polyphase matrix are reviewed, the novel methods proposed for finding the best factorizations are described, and results are given. In Section IV, the effect of finite precision representation is analyzed, whereas in Section V, a novel architecture for the hardware implementation of the IWT is proposed. Finally, in Section VI, conclusions are drawn and further research developments are outlined.
II. MOTIVATION

A. Filter Factorization

Unlike the DWT, due to the nonlinear operations involved, the IWT performance is heavily dependent on the chosen factorization of the polyphase matrix of the wavelet filter, for both lossless and lossy compression [30], [31], [33], [36]. This raises the need for criteria for the selection of the factorization which achieves the best results on the recovered images, given a suitable error measure.

The trivial solution is to try all possible factorizations, seeking the one which yields the best result, according to a selected error measure (e.g., PSNR), over a set of test images. This approach has some major drawbacks. First of all, the IWT compression performance depends on the input images [30], [36], making the result of the search too dependent on the data set; moreover, the exhaustive search would be computationally burdensome due to the complexity of the error metric. More efficient solutions could be devised, based on simple, input-independent metrics. In [36], the ratio between the lifting coefficients of maximum and minimum magnitude has been suggested as a possible criterion, even if no specific factorization has been proposed based on this choice. In [33], another criterion has been introduced, based on a measure of minimum nonlinearity of the iterated graphic functions. The technique, which is computationally tractable and independent of the input data, only relies on the characteristics of the wavelet filter, and leads to IWT implementations featuring excellent performance. In Section III, the minimum nonlinearity criterion is recalled and validated by comparisons with other suitable criteria, i.e., the minimization of the number of lifting steps and the achievement of the closest-to-one normalization constant.

B. Finite Precision Representation

An issue of considerable practical interest is the robustness of the wavelet transform, implemented by means of the LS, with respect to numerical errors in the representation of both the
lifting coefficients [i.e., the coefficients of the LS filters, whose zeta transforms are denoted in the following as $s_i(z)$ and $t_i(z)$] and the partial results. These considerations can drive specific hardware design, leading to implementations with an optimized tradeoff between performance and complexity/area. Moreover, the robustness of the lifting coefficients can affect the selection of the wavelet filters, as the performance of the most popular wavelets proposed in the literature [3], [34] can heavily depend on the coefficient representation. As a matter of fact, when the effects of finite precision representation are taken into account, the best ranked filters can be outperformed by other ones, which exhibit features not usually considered. For example, preliminary results reported in [37], and mainly focused on the DWT, show that filters whose factorization exhibits rational coefficients are more robust with respect to numerical errors; hence, they should be preferred if hardware efficiency must be taken into account (see [38] for the filter bank case). The robustness of the filter coefficients employed for the IWT implementation is investigated in Section IV.

C. Architecture

A VLSI implementation of a complete progressive lossy-to-lossless image encoder represents a very challenging issue; not only typical consumer electronics, e.g., digital cameras, but also all those applications characterized by high rate and variable rate-distortion requirements could benefit from high efficiency, low cost hardware implementations. In order to make wavelet-based applications feasible at high frame rates, it is necessary to explore specialized hardware realizations for the computation of the IWT, which can be very demanding in terms of hardware resources (such as on-chip area). Many VLSI architectures have been proposed in the literature for the filter bank scheme implementation [18], [39]–[43], all based on multiply and accumulate (MAC) units and on memory modules to store partial results. However, to the best of our knowledge, no architecture has been proposed yet for the LS implementation. In Section V, an optimized VLSI architecture for the IWT implementation via the LS is proposed, designed on the basis of the results of the theoretical and simulative study described in Sections III and IV, and leading to an optimal allocation of hardware resources.

III. LIFTING SCHEME AND THE FACTORIZATION OF THE POLYPHASE MATRIX
In its basic form, the LS computes the DWT by means of a sequence of primal and dual lifting steps, as described in the following and reported in the block diagram of Fig. 1; the inverse transform is achieved by performing the same steps in reversed order [32].

The polyphase representation of a discrete-time filter $h(z)$ is defined as

$$h(z) = h_e(z^2) + z^{-1} h_o(z^2) \qquad (1)$$

where $h_e(z)$ and $h_o(z)$ are respectively obtained from the even and odd coefficients of $h(n)$, and $h(z)$ denotes
Fig. 1. Block diagram of the DWT using the LS.
the zeta transform of the filter impulse response $h(n)$. The synthesis filters $h(z)$ and $g(z)$ (lowpass and highpass, respectively) can thus be expressed through their polyphase matrix

$$P(z) = \begin{bmatrix} h_e(z) & g_e(z) \\ h_o(z) & g_o(z) \end{bmatrix}$$
A dual polyphase matrix $\tilde{P}(z)$ can be analogously defined for the analysis filters. The filters $h_e(z)$, $h_o(z)$, $g_e(z)$, and $g_o(z)$, along with their analysis counterparts, are Laurent polynomials. As the set of all Laurent polynomials exhibits a commutative ring structure, within which polynomial division with remainder is possible, long division between two Laurent polynomials is not a unique operation [32]. The Euclidean algorithm [32] can be used to decompose $P(z)$ and $\tilde{P}(z)$ as

$$P(z) = \prod_{i=1}^{m} \begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}$$
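For concreteness, the following minimal Python sketch (not part of the original paper) shows how one such factorization translates into in-place predict/update lifting steps, using the well-known lifting decomposition of the LeGall (5, 3) filter; the periodic boundary handling is an illustrative assumption, and the final scaling by $K$ is omitted.

```python
import numpy as np

def lifting_53_forward(x):
    """One-level (5, 3) DWT via lifting: a predict step followed by an update step.

    Illustrative sketch with periodic (circular) boundary handling; scaling omitted.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict (dual lifting): detail = odd - 1/2 * (left even + right even)
    odd -= 0.5 * (even + np.roll(even, -1))
    # Update (primal lifting): approximation = even + 1/4 * (left detail + right detail)
    even += 0.25 * (np.roll(odd, 1) + odd)
    return even, odd  # lowpass and highpass subbands

def lifting_53_inverse(even, odd):
    """Inverse transform: undo the lifting steps in reversed order."""
    even = even - 0.25 * (np.roll(odd, 1) + odd)
    odd = odd + 0.5 * (even + np.roll(even, -1))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

if __name__ == "__main__":
    sig = np.random.default_rng(0).integers(0, 256, size=64).astype(float)
    lo, hi = lifting_53_forward(sig)
    assert np.allclose(lifting_53_inverse(lo, hi), sig)  # perfect reconstruction
```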
As this factorization is not unique, several pairs of $s_i(z)$ and $t_i(z)$ filters are admissible; however, in the case of DWT implementation, all possible choices are equivalent. As for computational complexity, it has been shown in [32] that using the LS instead of the standard filter bank algorithm can asymptotically lead to a relative speed-up of up to 100% for long filters. For example, the LS-based implementation of the DB(9, 7) filter [10] is about 60% faster, and that of the (13, 7) filter [46] about 70% faster, than their filter bank counterparts.

The IWT can be very simply obtained by rounding off the outputs of the $s_i(z)$ and $t_i(z)$ filters right before adding or subtracting them [21]; the rounding operation introduces a nonlinearity in each filter operation. As a consequence, in the IWT the choice of the factorization impacts on both lossless and lossy compression, making the transition from the DWT to the IWT not straightforward.

A. Criteria for the Choice of the Best Factorization

Method I—Minimally Nonlinear Iterated Graphic Functions: Let us define the $n$-times iterated graphic function [44]

$$\psi_n(x) = 2^{n/2}\, g_n(k) \quad \text{for } \frac{k}{2^n} \le x < \frac{k+1}{2^n}$$

where $g_n(k)$ is the impulse response of the $n$-times iterated synthesis bandpass filter. It is known that $\psi_n(x)$ converges, as $n \to \infty$, to the synthesis wavelet $\psi(x)$, and, as the recovered
signal is obtained as a weighted sum of scaled and translated versions of $\psi(x)$, the lossy performance of the DWT is related to the regularity degree of the functions $\psi_n(x)$ [10].

The iterated functions associated to the IWT have been experimentally verified to be less regular than the corresponding DWT ones, and different factorizations exhibit rather different levels of regularity [30]. Although the behavior of the IWT is not theoretically predictable due to the nonlinearity involved, the regularity of the integer iterated graphic functions has been experimentally verified to affect the IWT performance, and therefore represents a sound criterion for the selection of proper factorizations.

It is worth noticing that the integer iterated graphic functions depend on the input signal [30], and are very irregular for a unit input impulse ($A = 1$), whereas they tend to become smooth, and equal to their DWT counterparts, for amplified impulses with amplitude $A \gg 1$. So, if we define $\psi_n^{(j)}(x, A)$ as the $n$-times graphic function associated to the $j$th factorization, achieved by iterating the response to an amplified input impulse of amplitude $A$, a simple measure of nonlinearity of the $j$th implementation can be defined as the following norm, which accumulates over the amplitudes $A = 1, \ldots, A_{\max}$ the distance between the amplitude-normalized integer graphic function and its real valued counterpart:

$$\epsilon_j = \sum_{A=1}^{A_{\max}} \left\| \frac{\psi_n^{(j)}(x, A)}{A} - \psi_n(x) \right\|$$
The interpretation is that if the $j$th factorization needs a high value of $A$ in order to achieve a good smoothness of the $n$-times iterated impulse response, then $\epsilon_j$ will assume a high value; therefore, this norm can be effectively used to discard those factorizations which lead to highly nonlinear implementations, and eventually to seek the minimally nonlinear one. In [33], the usefulness of this method has already been tested on a set of popular wavelet filters.

Besides being independent of the IWT input signal, the error norm $\epsilon_j$ is computationally inexpensive, thus leading to a tractable algorithm for factorization optimization. On the other hand, it must be noticed that this norm is not additive on a tree structure where the root node contains the quotients of all the possible ways in which the first polynomial division can be performed, and the $k$-level nodes contain all possible quotients of the subsequent divisions; in other terms, it is not possible to associate partial error metrics based on the $\epsilon_j$ norm to each of the parent–child transitions. Although this prevents the use of dynamic programming techniques, we emphasize the fact that the low complexity of the evaluation of $\epsilon_j$ makes the exhaustive search tractable in any practical case.

Method II—Minimum Number of Lifting Steps: A simple criterion, driven by completely different considerations, is to select factorizations characterized by the minimum number of lifting steps. An intuitive justification of this method is that a reduced number of lifting steps implies a reduced number of rounding operations, resulting in a less nonlinear transform. Obviously, a small number of lifting steps has direct implications for both software and hardware implementations in terms of speed, memory, and chip area.

Method III—Closest-to-One Normalization Constant: Another relevant issue able to affect the factorization choice is the final scaling operation [45] performed within
TABLE I BEST FACTORIZATIONS OF THE POLYPHASE MATRIX FOR THE (9, 7) AND (6, 10) FILTERS
the LS framework (see Fig. 1). In order to perform an integer reversible transform, the scaling factor $K$ can be implemented by performing three extra lifting steps, or the normalization can simply be omitted [21], thus reducing the overall complexity; in this latter case, as Calderbank et al. pointed out [21], it is preferable that $K$ be as close to 1 as possible, exploiting the nonuniqueness of the lifting factorization. Consequently, the search for factorizations with scaling factor close to one turns out to be another important method to be investigated. The effect of normalization removal on compression performance will be discussed in Section III-B.

It is worth noticing that, whereas Method I always yields a unique solution, as it is based on the real valued metric $\epsilon_j$, this is not the case for Methods II and III, which can provide multiple solutions. In case this happens, it is necessary to resort to further discrimination criteria. In the following, the regularity criterion has been employed as a second-level decision for both Methods II and III unless otherwise stated.

B. Results

In Table I, the best factorizations obtained respectively with Methods I and II are reported for the popular (9, 7) [10] and (6, 10) [34] filters, along with the value of the error metric $\epsilon_j$ for the best factorization. The PSNR values achieved in the compression at 0.5 bpp of the standard Lena image, using the IWT followed by the state-of-the-art SPIHT encoder [12], [19], are also reported. This algorithm has been selected for its excellent performance, tractable computational complexity, and for its capability of yielding progressive transmission; in
our simulations, we have employed binary uncoded SPIHT, i.e., followed by neither Huffman nor arithmetic coding. The simulations have been performed with suitable values of $n$ and $A_{\max}$; a discussion on the proper choice of these parameters can be found in [30] and [31].

It can be noticed that Method I leads to the factorizations with the best IWT lossy compression performance. As for the (9, 7) filter, the factorization yielded by Method II has a performance 0.1 dB inferior to the former. Both solutions yield the same value of $K$, which has turned out to be the closest possible to unity. Actually, for the (9, 7) filter, Method III selects either factorization depending on whether the minimum nonlinearity or the minimum number of steps is chosen to discriminate among equivalent solutions. In other cases [e.g., for the (6, 10) filter] the three methods provide the same solution. Factorizations selected by Method I for filters other than the (9, 7) can be found in [33], and are omitted here for conciseness.

Another remarkable property, which can be deduced from the results in Table I, is that the minimally nonlinear factorizations of the DB(9, 7) filter have $s_i(z)$ and $t_i(z)$ filters which exhibit linear phase at each step. This result holds in general for factorizations obtained with Method I; this validates the suitability of the proposed method, and agrees with Sweldens' considerations about symmetry-preserving filters [32]. Another notable result is that the best factorizations are generally those whose filter coefficients have a low dynamic range; as a rule of thumb, this is related to a balanced distribution of the nonlinear effects, and has been suggested in [36] as a possible error metric.

From the obtained results, it turns out that none of the achieved factorizations has the normalization constant close
TABLE II IWT PERFORMANCE WITH THE K NORMALIZATION PERFORMED OR OMITTED
to one. As having $K$ equal to one would avoid the need for any normalization, thus very positively impacting on the algorithm complexity, it is worth investigating the effect, on the SPIHT encoder, of omitting this normalization even if $K$ is different from 1. In Table II, this effect is shown for a set of standard filters, for both the lossy and the lossless case. For lossy compression, the impairment in PSNR performance is reported for the image Lena encoded at 0.5 bpp. It can be seen that, even for the factorizations selected by Method III ($K$ closest to 1), omitting the normalization leads to a significant performance degradation, up to 5 dB. This is due to the fact that the transform is no longer unitary, and the information content of each coefficient is no longer directly related to its magnitude; this is particularly harmful for encoders with rate allocation based on bitplanes, such as SPIHT. On the other hand, as shown in the last two columns of Table II, the lossless performance has turned out to be almost independent of the normalization being performed or omitted. Even if it is possible to circumvent the effect of $K$ by renormalizing after the complete subband decomposition, in general this could preclude the progressive lossy-to-lossless option, as the coefficients after normalization would not be integer any more.

The results in Table II are obtained using the classical SPIHT algorithm, and are meant to measure the performance degradation due to omitting the normalization. A modification of the rate allocation in SPIHT could be devised, which accounts for the subband dynamic range, thus reducing the performance gap; however, this amounts to designing a different encoder, which goes beyond the scope of this paper.

Finally, in Table III, the performance of the proposed factorizations, achieved by Method I and implemented with the $K$ normalization, is compared for the IWT and DWT, on a number of standard filters and test images; the best results for each image are emphasized in boldface. It can be noticed that the best factorizations exhibit IWT performance which is very close to the DWT one, especially at low bit rates. It is worth mentioning that this gap increases as the bit rate grows; this is due to the fact that, in this case, the nonlinear effects of the integer transform are dominant with respect to the quantization error. Finally, it can be seen that, in the DWT case, the (9, 7) filter nearly always achieves the best performance, as well known from [34]. Not surprisingly, the IWT filter ranking is different, and depends on the linearity of the employed factorization, according to the measure defined for Method I. It turns out that the best filters for IWT implementation are the (13, 7) [46] and the (6, 10), which exhibit a very low value of $\epsilon_j$.
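The reversibility that distinguishes the IWT from the DWT throughout this section comes entirely from the rounding performed inside each lifting step, which the inverse transform undoes exactly. As a side illustration (not from the paper), the following Python sketch applies a reversible (5, 3) integer lifting in the style of [21] and checks that the inverse recovers the input exactly; the periodic boundary handling and the specific filter choice are assumptions made for brevity.

```python
import numpy as np

def iwt53_forward(x):
    """Reversible (5, 3) integer lifting: rounding (floor) inside each lifting step."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict: subtract the floored average of the neighbouring even samples
    odd -= (even + np.roll(even, -1)) >> 1
    # Update: add back a rounded quarter of the neighbouring details
    even += (np.roll(odd, 1) + odd + 2) >> 2
    return even, odd  # integer lowpass and highpass subbands

def iwt53_inverse(even, odd):
    """Undo the rounded lifting steps in reversed order (exact integer inverse)."""
    even = even - ((np.roll(odd, 1) + odd + 2) >> 2)
    odd = odd + ((even + np.roll(even, -1)) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

if __name__ == "__main__":
    row = np.random.default_rng(1).integers(0, 256, size=64)
    lo, hi = iwt53_forward(row)
    assert np.array_equal(iwt53_inverse(lo, hi), row)  # truly lossless round trip
```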
IV. EFFECT OF FINITE COMPUTER PRECISION

In this section, the performance of the IWT implemented via the LS is evaluated in the presence of numerical errors due to finite precision representation, as this strongly impacts on hardware design. Unlike the DWT, where the need for high precision representation of the partial transform results¹ raises problems of storage and complexity, in the IWT these results are integer numbers. Therefore, no storage problems occur, whereas complexity is limited to the convolution kernel, implemented through MAC units. As a consequence, it is important to evaluate the precision to be devoted to these units in order to achieve performance satisfactorily close to that of the floating point implementation. To this end, a selection of the best filters proposed in the literature has been employed, namely the Villasenor ones [34] and some filters recently proposed for the forthcoming JPEG 2000 standard. The optimal factorizations, in the sense of Method I presented in Section III-A, have been used. It is worth noticing that, in principle, one could attempt to embody the effect of finite precision arithmetic directly in the search for the optimal fixed-point factorization. However, the convergence of the iterated graphic function computed on the quantized filters is by no means guaranteed. In practice, we have found that the joint approach yields the same results as the separable one, so that the most linear factorizations are also the best ones in fixed-point arithmetic.

The bit assignment is as follows. The partial transform results are represented with a number of bits which is kept fixed to a value dictated by the need of accommodating the overall dynamic range of the transform coefficients; therefore, it depends on both the input signal representation (typically 8 bpp) and the number of decomposition levels. For 256 gray level natural images, a typical value is 13 bits for a five level decomposition (and fewer for a two level one). Also, we have found that the optimal lifting coefficients for the considered filters can be represented using 2 bits for the integer part. Their mantissa is represented on a number of bits $m$, which turns out to be the most critical and significant parameter, as it heavily affects hardware design, and strongly impacts on both performance and complexity. Given this bit assignment, the precision of each MAC unit must accommodate the sum of the partial-result and coefficient word lengths.

In Table IV, the rate-distortion performance is reported for the IWT with floating point precision ($m = \infty$) and with fixed point precision, using the same coding scheme as in Section III-B. The lossless compression results are reported in terms of achieved bit rate, compared with the first order Shannon source entropy [9]. Several comments are possible; first of all, the value of $m$ required in order to achieve performance very close to the floating point one is the same for both the lossless and the lossy case. As for filters with nonrational coefficients [i.e., the (9, 7) and (6, 10) filters], it can be seen that a high value of $m$ is necessary in order to guarantee satisfactory performance (typically 8–10 bits). On the other hand, the required value of $m$ depends on the degree of linearity of the best factorization; for instance, the (6, 10) filter, which is more linear than the (9, 7) one [33], requires fewer bits in order to converge to the floating point performance. It can also be noticed that, for lossless compression, the achieved rate can fall below the Shannon source entropy. This is due to the fact that the source cannot be considered as uncorrelated, so the first order entropy does not represent a lower bound to the rate any more.

¹In the following, the transform coefficients after each level of decomposition will be referred to as partial transform results, whereas the temporary convolution addenda will be referred to as partial convolution results.
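As an illustration of the bit assignment just described, the following Python sketch (with a simple round-to-nearest rule and an example coefficient value, both assumptions not taken from the paper) shows how a lifting coefficient could be mapped to a fixed-point representation with 2 integer bits and an m-bit mantissa.

```python
def quantize_coefficient(c, m, int_bits=2):
    """Round a lifting coefficient to a fixed-point value with an m-bit mantissa.

    The result is representable with a sign, int_bits integer bits, and m
    fractional bits; saturation is applied if the magnitude is too large.
    """
    step = 2.0 ** (-m)                  # quantization step of the mantissa
    q = round(c / step) * step          # round-to-nearest on the fractional grid
    limit = 2 ** int_bits - step        # largest representable magnitude
    return max(-limit, min(limit, q))

# Example: the (9, 7) lifting coefficients are irrational, so their fixed-point
# versions change noticeably for small m (the value below is only illustrative).
alpha = -1.586134342
for m in (4, 8, 12):
    print(m, quantize_coefficient(alpha, m))
```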
TABLE III LOSSY PERFORMANCE OF THE BEST IWT FACTORIZATION (METHOD I) COMPARED WITH THE DWT, FOR A SET OF TEST IMAGES
Moreover, it has turned out that rational filters [i.e., the (5, 3), (2, 6), and (13, 7) ones], which also have rational lifting coefficients (except for the $K$ scaling), can be efficiently represented with a limited number of bits devoted to the mantissa, according to the results in [47] for the filter bank scheme. Such filters are therefore of great interest for hardware implementations where complexity constraints must be taken into account.

In conclusion, the reported analysis on the robustness of the LS to numerical errors has shown that efficient implementations of the IWT are indeed possible also in fixed point arithmetic. Moreover, the analysis has revealed that filters whose factorization coefficients are rational numbers require fewer bits; in hardware implementations, this can improve the data processing rate thanks to the reduced number of operations required by each multiplier. Furthermore, the reduced number of bits per filter coefficient positively impacts on both the chip area and the latency of such multipliers.

V. ARCHITECTURE FOR THE IWT IMPLEMENTATION BY MEANS OF THE LS
In this section, the basic concepts related to an architecture for VLSI implementation of the IWT by means of the LS are described. For a more detailed description and discussion about
the memory allocation and comparisons with a similar architecture implementing the filter bank scheme, the interested reader is referred to [48].

The LS has a modular structure that is well suited for VLSI implementation, and the kernel of the lifting processor is clearly represented by the single lifting step. In Fig. 2, the proposed scheme of the primary lifting step is represented; its major characteristic is that it employs a series of MAC units in a parallel configuration, with the advantage of a reduced latency. The filter coefficients enter the shift register, and the results of the multiplications with the samples $a^i$ are stored in the accumulators, each MAC unit progressively building one filtered output sample.
After a number of MAC operations equal to the filter length, the multiplexer can begin to pick up the filtered samples, moving from the first to the last MAC (on the right in the scheme). The highest throughput is obtained if the number of MAC units is equal to the filter length; in this case, the current sample can be read a single time from memory and all its contributions to the filtering operation are computed in one clock period.
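The following purely behavioral Python sketch (an assumption-laden illustration, not a description of the exact datapath of Fig. 2) shows one classical way to obtain the property just stated: each input sample is fetched once, all of its products are accumulated in the same clock period, and one filtered sample is completed per clock after the pipeline fills.

```python
import numpy as np

def parallel_mac_fir(samples, coeffs):
    """Behavioral model of a bank of MAC units computing an FIR filter.

    Each incoming sample is broadcast to all MAC units in one 'clock period';
    every unit adds its own coefficient-times-sample product to the partial sum
    forwarded by its neighbour (transposed-form arrangement).
    """
    partial = [0.0] * len(coeffs)  # one accumulator per MAC unit
    out = []
    for x in samples:              # the sample is read from memory only once
        partial = [c * x + (partial[k + 1] if k + 1 < len(partial) else 0.0)
                   for k, c in enumerate(coeffs)]
        out.append(partial[0])     # one filtered sample per clock period
    return out

# Sanity check against a direct convolution (zero initial state, same length)
sig = np.arange(10.0)
taps = [0.25, 0.5, 0.25]
assert np.allclose(parallel_mac_fir(sig, taps), np.convolve(sig, taps)[:len(sig)])
```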
TABLE IV IWT PERFORMANCE FOR IMAGE LENNA VS. m, COMPARED WITH THE FLOATING POINT IWT (m = ∞)
Fig. 2. Block diagram of the primary lifting step; $d^i$ and $a^i$ are the $i$-level odd (high frequency) and even (low frequency) samples; $h = s$ represents the lifting filter.
The scheme of the lifting step is completed by the rounding operation and by the subtraction between the sample $d^i$, suitably delayed, and the filtered coefficient. In Fig. 3, the complete scheme of the lifting processor is reported; the scaling operation is performed with three extra lifting steps. This scheme has been designed to accommodate all the most popular wavelet filters, whose factorizations are reported in [33]. The proposed block diagram realizes the one-dimensional IWT, and a simple control unit can reuse it in order to perform a separable two-dimensional IWT, transforming columns and rows of an image separately, as sketched below.
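A minimal sketch of this row/column reuse (the function name iwt1d and its interface are hypothetical, not part of the proposed architecture) is the following:

```python
import numpy as np

def iwt2d_separable(img, iwt1d):
    """Apply a 1-D IWT to every row, then to every column of the result.

    `iwt1d` is any function mapping a 1-D integer array to a 1-D array of the
    same length (e.g., lowpass samples followed by highpass samples).
    """
    rows_done = np.apply_along_axis(iwt1d, axis=1, arr=img)
    return np.apply_along_axis(iwt1d, axis=0, arr=rows_done)
```

In hardware, the single 1-D lifting processor of Fig. 3 plays the role of `iwt1d`, with the control unit sequencing the row pass and then the column pass.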
Fig. 3. Proposed scheme of the lifting scheme processor.
The main features of the proposed architecture are the low latency and the optimal memory access, as any input sample is read a single time and can be replaced by the transformed sample without allocating memory for intermediate results (see [48] for details).

The MAC precision has been optimized according to the results presented in Section IV. If irrational filters are employed,
a large number of bits must be allocated for both the filter representation and the MAC units; in the case of the (9, 7) or (6, 10) filters, at least 11 bits must be allocated to the representation of the filter coefficients (2 bits for modulus and sign and 9 bits for the mantissa); therefore, the MAC units must be able to perform high precision multiplications, limiting the speed of the system and increasing the chip area. However, according to the preceding section, there is a particular class of rational filters, i.e., the (13, 7), (5, 3), and (2, 6), whose wavelet transform can be computed with a lower number of bits; employing these filters, the LS processor speed and chip area can be improved by resorting to faster MACs or even by substituting the MAC units with simple shift-and-accumulate operations.

The complexity of the proposed architecture has been estimated at around 54 000 equivalent gates; this leads to an area of about 3 mm² for a 0.35 μm technology. The architecture is able to achieve a frame rate of 25 frames/s, with a maximum input frequency of 4.5 MHz, for a 512 × 512 pixel 8-bit grayscale image.

VI. CONCLUSIONS

In this paper, some issues related to the design and implementation of an image transform coding algorithm based on the IWT are addressed. In particular, methods for the selection of the best factorization of wavelet filters within the LS framework are proposed and compared; the methods are based on the search for the minimally nonlinear iterated graphic functions associated to integer transforms, on the minimization of the number of lifting steps, and on the achievement of the closest-to-one normalization constant. Results are reported for a number of filters and test images, and for both lossless and lossy compression, showing that the obtained IWT implementations achieve compression performance very close to that of the real valued DWT. A second aspect addressed is the evaluation of the effects of the finite precision representation of the lifting coefficients and the partial results; the analysis has revealed that a very small number of bits can be devoted to the mantissa with an acceptable performance degradation. Finally, a VLSI architecture has been proposed for the IWT implementation, and preliminary evaluations of the hardware complexity and frame rate are reported.

ACKNOWLEDGMENT

The authors would like to thank Prof. G. Masera of the VLSI Lab, Politecnico di Torino, for his help in the definition of the LS architecture and for his useful suggestions about the manuscript.

REFERENCES

[1] M. Datcu and G. Schwartz, "Advanced image compression: Specific topics for space applications," in Proc. 6th Int. Workshop Digital Signal Processing Techniques for Space Applications, Noordwijk, The Netherlands, Sept. 1998.
[2] JPEG Image Compression Standard, docs. IS 10918-1, IS 10918-2, IS 10918-3.
[3] ISO/IEC FCD 15444-1 V1.0. (1999, Dec.). [Online]. Available: www.jpeg.org/public/fcd15444-1.pdf
[4] B. Fiethe, P. Rüffer, and F. Gliem, "Image processing for Rosetta Osiris," in Proc. 6th Int. Workshop Digital Signal Processing Techniques for Space Applications, Noordwijk, The Netherlands, Sept. 1998.
[5] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[6] R. C. Gonzales and P. Wintz, Digital Image Processing, 2nd ed. Reading, MA: Addison-Wesley, 1987.
[7] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[8] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Mag., vol. 15, pp. 23–50, Nov. 1998.
[9] M. Vetterli and J. Kovačević, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[10] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, vol. 1, pp. 205–220, Apr. 1992.
[11] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.
[12] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243–250, June 1996.
[13] P. J. Sementilli, A. Bilgin, J. H. Kasner, and M. W. Marcellin, "Wavelet TCQ: Submission to JPEG-2000," Proc. SPIE, July 1998.
[14] M. J. Weinberger, G. Seroussi, and G. Sapiro, "LOCO-I: A low complexity, context-based lossless image compression system," in Proc. 1996 Data Compression Conf.
[15] X. Wu and N. Memon, "Context-based, adaptive, lossless image coding," IEEE Trans. Commun., vol. 45, pp. 437–444, Apr. 1997.
[16] A. Munteanu, J. Cornelis, G. Van der Auwera, and P. Cristea, "A wavelet-based lossless compression scheme with progressive transmission capabilities," Int. J. Imag. Syst. Technol., vol. 10, pp. 76–85, 1999.
[17] ——, "Wavelet image compression—The QuadTree coding approach," IEEE Trans. Inform. Technol. Biomed., vol. 3, pp. 176–185, Sept. 1999.
[18] C. Chakrabarti, M. Viswanath, and R. M. Owens, "Architectures for wavelet transforms: A survey," J. VLSI Signal Process., vol. 14, pp. 171–192, Dec. 1996.
[19] A. Said and W. A. Pearlman, "An image multiresolution representation for lossless and lossy compression," IEEE Trans. Image Processing, vol. 5, pp. 1303–1310, Sept. 1996.
[20] A. Zandi, J. D. Allen, E. L. Schwartz, and M. Boliek, "CREW: Compression with reversible embedded wavelets," in Proc. 1995 Data Compression Conf.
[21] R. C. Calderbank, I. Daubechies, W. Sweldens, and B. Yeo, "Wavelet transforms that map integers to integers," Appl. Comput. Harmon. Anal., vol. 5, no. 3, pp. 332–369, 1998.
[22] R. Claypoole, G. Davis, W. Sweldens, and R. Baraniuk, "Nonlinear wavelet transforms for image coding via lifting," IEEE Trans. Image Processing, submitted for publication.
[23] A. Bilgin, P. Sementilli, F. Sheng, and M. Marcellin, "Scalable image coding using reversible integer wavelet transforms," IEEE Trans. Image Processing, vol. 9, pp. 1972–1977, Nov. 2000.
[24] C. Lin, B. Zhang, and Y. Zheng, "Packed integer wavelet transform constructed by lifting scheme," in Proc. ICASSP 1999.
[25] S. Dewitte and J. Cornelis, "Lossless integer wavelet transform," IEEE Signal Processing Lett., vol. 4, pp. 158–160, June 1997.
[26] R. C. Calderbank, I. Daubechies, W. Sweldens, and B. Yeo, "Lossless image compression using integer to integer wavelet transforms," in Proc. ICIP 1997.
[27] W. Philips, "The lossless DCT for combined lossy/lossless image coding," in Proc. ICIP 1998.
[28] Integer Cosine Transform for Image Compression, TMO Progr. Rep. TDA PR 42-105, May 15, 1991.
[29] F. Sheng, A. Bilgin, P. J. Sementilli, and M. W. Marcellin, "Lossy and lossless image compression using reversible integer wavelet transforms," in Proc. ICIP 1998.
[30] M. Grangetto, E. Magli, and G. Olmo, "Lifting integer wavelets toward linearity," in Proc. 33rd Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, 1999.
[31] ——, "Efficient common-core lossless and lossy image coder based on integer wavelets," Signal Process., vol. 81, no. 2, pp. 403–408, Feb. 2001.
[32] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Applicat., vol. 4, no. 3, pp. 247–269, 1998.
[33] M. Grangetto, E. Magli, and G. Olmo, "Minimally nonlinear integer wavelets for image coding," in Proc. ICASSP 2000.
[34] J. D. Villasenor, B. Belzer, and J. Liao, "Wavelet filter evaluation for image compression," IEEE Trans. Image Processing, vol. 4, pp. 1053–1060, Aug. 1995.
[35] M. D. Adams and F. Kossentini, "Reversible integer-to-integer wavelet transforms for image compression: Performance evaluation and analysis," IEEE Trans. Image Processing, vol. 9, pp. 1010–1024, June 2000.
[36] M. D. Adams, "Reversible wavelet transforms and their application to embedded image compression," M.A.Sc. thesis, Univ. Victoria, Victoria, BC, Canada, 1998. [Online]. Available: http://www.ece.ubc.ca/~mdadams
[37] M. Grangetto, E. Magli, and G. Olmo, "Finite precision wavelets for image coding: Lossy and lossless compression performance evaluation," in Proc. ICIP 2000.
[38] D. B. H. Tay, "Rationalizing the coefficients of popular biorthogonal wavelet filters," IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 998–1005, Sept. 2000.
[39] H. Y. H. Chuang and L. Chen, "VLSI architecture for fast 2D discrete orthonormal wavelet transform," J. VLSI Signal Process., vol. 10, pp. 225–236, Aug. 1995.
[40] M. Vishwanath, R. M. Owens, and M. J. Irwin, "VLSI architectures for the discrete wavelet transform," IEEE Trans. Circuits Syst. II, vol. 42, pp. 305–316, May 1995.
[41] A. Grzeszczak, M. K. Mandal, S. Panchanathan, and T. Yeap, "VLSI implementation of discrete wavelet transform," IEEE Trans. VLSI Syst., vol. 4, pp. 421–433, Dec. 1996.
[42] C. Yu, C. Hsieh, and S. Chen, "Design and implementation of a highly efficient VLSI architecture for discrete wavelet transform," in Proc. Custom Integrated Circuits Conf., 1997.
[43] J. C. Limqueco and M. A. Bayoumi, "A VLSI architecture for separable 2D discrete wavelet transforms," J. VLSI Signal Process., vol. 18, pp. 125–140, 1998.
[44] O. Rioul and M. Vetterli, "Wavelets and signal processing," IEEE Signal Processing Mag., vol. 8, no. 4, pp. 14–38, Oct. 1991.
[45] W. Sweldens, "The lifting scheme: A construction of second generation wavelets," SIAM J. Math. Anal., vol. 29, no. 2, pp. 511–546, 1997.
[46] G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley, MA: Wellesley-Cambridge Press, 1996.
[47] A. M. Rassau, S.-I. Chae, and K. Eshraghian, "Simple binary wavelet filters for efficient VLSI implementation," Electron. Lett., vol. 35, no. 7, pp. 555–557, Apr. 1999.
[48] M. Martina, G. Masera, G. Piccinini, and M. Zamboni, "A VLSI architecture for IWT (Integer Wavelet Transform)," in Proc. 43rd Midwest Symp. Circuits Systems, Lansing, MI, Aug. 2000.
Marco Grangetto received the electrical engineering degree (summa cum laude) from the Politecnico di Torino, Turin, Italy, in 1999, where he is currently pursuing the Ph.D. degree. His research interests are in the field of digital signal processing and multimedia communications. In particular, he is working on the development of efficient, low complexity lossy and lossless image encoders based on wavelet transforms. He is also investigating the design of reliable multimedia delivery systems for tetherless lossy packet networking. Mr. Grangetto was awarded the premio optime by the Unione Industriale di Torino in 2000 and a Fulbright Grant in 2001 for a research period at the Center for Wireless Communications (CWC), University of California at San Diego.
Enrico Magli (S’98–M’01) received the degree in electronics engineering in 1997 and the Ph.D. degree in electrical and communications engineering in 2001, both from the Politecnico di Torino, Turin, Italy. He is currently a Postdoctoral Researcher at the same university. His research interests are in the field of robust wireless communications, compression of remote sensing images, superresolution imaging, and pattern detection and recognition. In particular, he is involved in the study of compression and detection algorithms for aerial and satellite images, and of signal processing techniques for environmental surveillance from unmanned aerial vehicles. From March to August 2000, he was a Visiting Researcher at the Signal Processing Laboratory of the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
Maurizio Martina received the electrical engineering degree in March 2000 from the Politecnico di Torino, Turin, Italy, where he is currently pursuing the Ph.D. degree. His activity is in the field of design and development of VLSI circuits for telecommunication and multimedia applications. In particular, he is involved in SOC design and technology research activities.
Gabriella Olmo (S'89–M'91) received the Laurea degree (cum laude) and the Ph.D. degree in electronic engineering from the Politecnico di Torino, Turin, Italy, in 1986 and 1992, respectively. From 1986 to 1988, she was a Researcher with the Centro Studi e Laboratori in Telecomunicazioni (CSELT), Turin, working on network management, nonhierarchical models, and dynamic routing. Since 1991, she has been an Assistant Professor with the Politecnico di Torino, where she is a member of the Telecommunications Group and of the Image Processing Lab. Her main recent interests are in the field of wavelets, remote sensing, image and video coding, resilient multimedia transmission, joint source-channel coding, and stratospheric platforms. She has participated in several national and international research programs under contract with Inmarsat, ESA (European Space Agency), ASI (Italian Space Agency), and the European Community. She has coauthored more than 80 papers in international technical journals and conference proceedings.