
Image Compression Concepts Overview

Alan Wee-Chung Liew, Griffith University, Australia
Ngai-Fong Law, The Hong Kong Polytechnic University, Hong Kong

INTRODUCTION


Image compression aims to produce a new image representation that can be stored and transmitted efficiently. It is a core technology for multimedia processing and has played a key enabling role in many commercial products, such as digital cameras and camcorders. It facilitates visual data transmission over the Internet, contributed to the advent of digital broadcast systems, and makes storage on VCD and DVD possible. Despite a continuing increase in capacity, efficient transmission and storage of images still present a major challenge in all these systems. Consequently, fast and efficient compression algorithms are in great demand.

The basic principle of image compression is to remove redundancy in the image representation. For example, simple graphic images such as icons and line drawings can be represented more efficiently by coding the differences among neighboring pixels, as these differences have lower entropy than the original pixel values (Shannon, 1948). Such techniques are referred to as lossless compression: they exploit statistical redundancy in an image to provide a concise representation from which the original image can be reconstructed perfectly. However, statistical compression techniques alone cannot provide a high compression ratio. To improve compressibility, lossy compression is often used, in which visually important image features are preserved while some fine details are removed or represented imperfectly. This type of compression is often used for natural images, where the loss of some detail is generally unnoticeable to viewers.

This article deals with image compression. Specifically, it is concerned with the compression of natural color images, because they constitute the most important class of digital images. First, the basic principle and methodology of natural image compression is described. Then, several major natural image compression standards, namely JPEG, JPEG-LS, and JPEG 2000, are discussed.
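As a rough numerical illustration of the entropy argument above, the following sketch compares the empirical entropy of raw pixel values with that of horizontal neighbor differences. The synthetic correlated image used here is only a stand-in for a real grayscale photograph.

```python
import numpy as np

def entropy_bits(values):
    """Empirical Shannon entropy (bits per symbol) of an integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical 8-bit grayscale image; in practice one would load a real photograph.
rng = np.random.default_rng(0)
drift = np.cumsum(rng.integers(-3, 4, size=(256, 256)), axis=1)   # correlated rows
image = np.clip(128 + drift, 0, 255).astype(np.int16)

diffs = np.diff(image, axis=1)            # horizontal neighbor differences
print("entropy of pixels     :", entropy_bits(image))
print("entropy of differences:", entropy_bits(diffs))
# For correlated (natural-like) images the difference signal has lower entropy,
# which is exactly what lossless predictive coders exploit.
```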

BACKGROUND

A common characteristic of most images is that neighboring pixels are correlated and thus carry redundant information. The main goal of image compression is to reduce or remove this redundancy. In general, two types of redundancy can be identified (Gonzalez & Woods, 2002):

•	Spatial redundancy: This refers to the correlation between neighboring pixels. It is the only redundancy in grayscale images.
•	Spectral redundancy: This refers to the correlation between different color planes or spectral bands. It occurs in color or multispectral images and exists together with spatial redundancy.

Image compression techniques aim at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible. The compression is lossless if the redundancy reduction does not result in any loss of information from the original image. Besides redundancy, an image may also contain visually irrelevant information, that is, information that is not perceived by human observers. Irrelevancy reduction aims at removing information in the image that is not noticeable to the Human Visual System (HVS). In general, some form of information loss is incurred when irrelevancy reduction is performed (Xiao, Wu, Wei, & Bao, 2005).

A number of standards have been established over the years for natural image compression. JPEG is the most common image file format found in existing Internet and multimedia systems (Pennebaker & Mitchell, 1993; Wallace, 1991). JPEG stands for Joint Photographic Experts Group, the name of the joint ISO/CCITT committee that created the image compression standard in 1992. There are two compression modes in the JPEG standard, lossless and lossy, but the lossy mode dominates in almost all applications. The JPEG image compression codec has low complexity and is memory efficient.


However, its main criticism is the appearance of blocking artifacts, especially at high compression ratios. In 2001, the Joint Photographic Experts Group created a new image compression standard called JPEG 2000 (Taubman & Marcellin, 2002). This new standard provides improved compression performance over JPEG and avoids blocking artifacts completely. It also provides a progressive capability: the JPEG 2000 bitstream is organized in such a way that the reconstructed image improves progressively in quality or resolution as more of the bitstream is decoded (Lee, 2005). Compared to the JPEG standard, however, JPEG 2000 is not widely supported at present. It is hindered by the fact that some of its algorithms are patented. As a result, it cannot be included in open-source Web browsers, which affects its popularity.

IMAGE COMPRESSION METHODOLOGY

Basic Compression Scheme

In general, most natural image compression schemes share a common structure, as shown in Figure 1. The first stage is usually a color space conversion module. Typically, a color image is stored in the RGB format. Because compression in the RGB domain is very inefficient, the image is converted into a luminance-chrominance representation such as YCbCr, where the Y component carries the luminance information while the Cb and Cr components carry the color information. The image is then subjected to a transformation such as the discrete cosine transform (DCT) (Ahmed, Natarajan, & Rao, 1974) or the discrete wavelet transform (DWT) (Heil, Walnut, & Daubechies, 2006). The transformation decorrelates the image data and thus reduces redundancy. The resulting coefficients are next quantized and entropy encoded. To quantize a signal means to describe it with less precision, so some image information is inevitably discarded. Scalar quantization quantizes each coefficient separately using a predefined quantization table; it is the most common quantization scheme due to its simplicity.

Figure 1. General structure of image compression algorithms


The rate-distortion (RD) unit controls the quantization step size as a function of the bit-rate R and the distortion D (Sarshar & Wu, 2007). Sometimes an RD unit is not explicitly defined but is controlled indirectly by the nature of the quantizer. Instead of quantizing each coefficient separately as in a scalar quantizer, vector quantization (VQ) can be used to represent a signal piecewise by short vectors from a codebook. The codebook generally contains a limited number of entries that approximate the signal pieces. Compression is achieved in VQ because only the index of the best codebook entry needs to be encoded and transmitted.
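To make the quantization step concrete, here is a minimal sketch of uniform scalar quantization and dequantization of transform coefficients. The step size is illustrative only and not taken from any standard table.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return np.round(coeffs / step).astype(np.int32)

def dequantize(indices, step):
    """Reconstruct approximate coefficients from the quantization indices."""
    return indices.astype(np.float64) * step

coeffs = np.array([0.7, -3.2, 15.8, 0.1, -0.4])   # toy transform coefficients
step = 2.0                                        # illustrative quantization step
idx = quantize(coeffs, step)                      # [ 0 -2  8  0  0]
rec = dequantize(idx, step)                       # [ 0. -4. 16.  0.  0.]
print(idx, rec)
# A larger step gives a coarser approximation (more distortion) but smaller indices
# to entropy-code (lower bit-rate); this is the trade-off the RD unit controls.
```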

Exploiting the Limitations of the Human Visual System (HVS)

A high compression ratio is usually achieved by aggressively exploiting the limitations of the HVS. Psycho-visual experiments have shown that the HVS has reduced sensitivity to patterns with high spatial frequencies; this behavior is parameterized by the contrast sensitivity function (CSF). Exploiting it can significantly improve the compression ratio without incurring noticeable distortion. The quantization table in JPEG makes use of this phenomenon to some extent by using a large quantization step for high spatial frequency DCT coefficients in the luminance channel.

The sensitivity of the HVS to compression artifacts also varies with the strength of local contrasts. Thus, an artifact might be hidden by the presence of strong contrasts or locally active image regions. This phenomenon, referred to as masking, is exploited in some sophisticated compression schemes (Gonzalez & Woods, 2002).

Subjective quality evaluations have shown that the HVS is very sensitive to the loss of texture information; blurred images with texture loss usually appear unnatural. However, exact encoding of texture information is bit-rate intensive. To overcome this problem, a generative approach to texture region encoding is sometimes employed in advanced compression algorithms (Egger, Fleury, Ebrahimi, & Kunt, 1999; Ryan et al., 1996). In this approach, the texture is characterized by only a few parameters that can be encoded for a modest increase in bit-rate.


During decoding, the texture is synthesized from these parameters. Even if the synthesized texture is pixel-wise different from the original texture, the HVS is nevertheless deceived by their apparent similarity.

The HVS is also unable to distinguish small differences in color as easily as changes in brightness. This phenomenon is exploited in chroma subsampling by dropping half or more of the chrominance information in the image (Poynton, 2003). At normal viewing distances, there is no perceptible loss incurred by sampling the color detail at this lower rate.
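A minimal sketch of 4:2:0-style chroma subsampling follows, assuming the chroma plane is already available as a NumPy array. The 2×2 averaging and nearest-neighbour upsampling used here are simple illustrative choices rather than the filters mandated by any particular codec.

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane -> half resolution in both axes."""
    h, w = chroma.shape
    chroma = chroma[:h - h % 2, :w - w % 2]                 # crop to even size
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_2x(chroma_small):
    """Nearest-neighbour upsampling back to full resolution for the decoder."""
    return np.repeat(np.repeat(chroma_small, 2, axis=0), 2, axis=1)

# Toy chroma plane; in a codec this would be the Cb (or Cr) plane of the image.
cb = np.arange(16, dtype=np.float64).reshape(4, 4)
cb_small = subsample_420(cb)     # 2x2 plane: only one quarter of the samples remain
cb_rec = upsample_2x(cb_small)   # approximation used when reconstructing the image
```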

Compression Quality Evaluation

For lossy image compression, the image is not reproduced exactly. An approximation of the original image is sufficient for most purposes, as long as the error between the original and the compressed image is tolerable. Two types of quality evaluation are often used to assess the performance of compression algorithms: (i) objective error metrics and (ii) subjective quality evaluation. Objective error metrics are quantitative and do not take into account the properties of the HVS, whereas subjective quality evaluation measures only the perceptually important distortion as determined by the HVS and is usually qualitative.

The Peak Signal-to-Noise Ratio (PSNR) is the most commonly used objective error metric for measuring the quality of the reconstructed image. The PSNR is defined as

PSNR = 10 · log10(MAX_I² / MSE),

where MAX_I is the maximum pixel value of the image (255 for an 8-bit grayscale image). The MSE is the mean square error between the original image I and the reconstructed image I′ of size M × N, and is given by

MSE = (1 / MN) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} [I(i, j) − I′(i, j)]².

It is well known that PSNR does not correlate well with perceptual judgments of reconstruction quality. The most common subjective quality evaluation is visual inspection. A mean opinion score (MOS) can also be used to provide a numerical indication of the perceived reconstruction quality (ITU-R Recommendation, 1992). The MOS averages the subjective evaluations of a group of viewers on different visual criteria using a numerical scale, for example, from 1 = lowest perceived quality to 5 = highest perceived quality. Subjective quality evaluation is a difficult problem, and currently there is no universally accepted method of performing it (Wei, Li, & Chen, 2006).
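The PSNR and MSE definitions above translate directly into code; the following is a minimal sketch for 8-bit images.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two images of the same shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; max_value is 255 for 8-bit images."""
    err = mse(original, reconstructed)
    if err == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_value ** 2 / err)

# Toy example: the reconstruction differs from the original by small errors.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
reconstructed = np.clip(original.astype(np.int16) + rng.integers(-2, 3, size=(64, 64)), 0, 255)
print(f"PSNR = {psnr(original, reconstructed):.2f} dB")
```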

Post-Processing for Compression Artifacts Removal

Highly visible compression artifacts appear in the reconstructed image at high compression ratios. These artifacts are visually annoying to viewers, and much research has been done to suppress them. Post-processing the reconstructed image has been a popular solution to this problem (Liew & Yan, 2004; Liew, Yan, & Law, 2005; Yamatani & Saito, 2006). For coding algorithms based on the block DCT, such as JPEG, the major artifacts are blockiness in flat areas and ringing along object edges. For wavelet-based coding techniques, ringing is the most visible artifact. Reducing these artifacts can significantly improve the overall visual quality of the decoded images. The general approach is to reduce the block discontinuities along the 8×8 block boundaries by some form of smoothing while preserving genuine edges in the image (Law & Siu, 2001). Alternatively, smoothing can be applied to region segments that exhibit slow intensity variation (Liew, Yan, & Law, 2005). Ringing can be suppressed by locally smoothing the ringing ripples in the vicinity of strong edges (Liew & Yan, 2004).
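The post-processing methods cited above are considerably more sophisticated, but the basic idea of smoothing block boundaries while sparing genuine edges can be sketched as follows. The fixed threshold and simple averaging are illustrative assumptions, not any published algorithm.

```python
import numpy as np

def deblock_columns(img, block=8, threshold=10.0):
    """Toy deblocking: average pixel pairs across vertical block boundaries
    (between horizontally adjacent 8x8 blocks) when the jump is small, i.e.
    likely a blocking artifact rather than a genuine edge."""
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for x in range(block, w, block):          # columns at block boundaries
        left = out[:, x - 1].copy()
        right = out[:, x].copy()
        jump = np.abs(left - right)
        smooth = jump < threshold             # only weak discontinuities
        avg = 0.5 * (left + right)
        out[:, x - 1] = np.where(smooth, avg, left)
        out[:, x] = np.where(smooth, avg, right)
    return out

# A full deblocking filter would treat horizontal boundaries the same way and use
# adaptive, perceptually motivated thresholds rather than a fixed one.
```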

Compression Choices

Some of the important choices and considerations for image compression are as follows:

•	Lossless compression: For some applications, lossless compression is required. In this case the reconstructed image is exactly equal to the original image. Lossless compression is usually used for medical applications. The best compression factor that can be achieved is typically around three.
•	Visually lossless compression: If a human observer cannot see any difference between the original and the compressed image, the compression is visually lossless. Most compression algorithms exploit the properties of the HVS to achieve a high compression factor, up to 20 or 30, without noticeable degradation of the perceived image. This is done by discarding the part of the image information that is not perceivable by the HVS.
•	Scalable or progressive coding: A nonprogressively encoded bitstream must be received in its entirety before the inverse transform can be applied, whereas for a progressively encoded bitstream the inverse transform can be applied to a partially decoded bitstream. Progressive encoding arranges the bitstream so that the most important information is near the front of the bitstream and decreasingly important information is toward the back. During decoding, the critical information at the front is used to construct an approximate version of the image, and the quality or resolution of the reconstructed image is progressively improved as further parts of the bitstream are received and decoded.

CURRENT IMAGE COMPRESSION STANDARDS

JPEG

The lossy mode of the JPEG image compression standard is widely used in WWW and multimedia systems. First, the image is converted from the RGB color space to the YCbCr color space. Because the human eye is relatively insensitive to the chrominance information, the Cb and Cr components are usually downsampled. Next, each of the Y, Cb, and Cr components is partitioned into 8×8 blocks. The discrete cosine transform (DCT) is then applied independently to each block to obtain the DCT coefficients. This is followed by quantization, in which each DCT coefficient is divided by a number and rounded to the nearest integer. Because humans are less sensitive to differences in high frequency components than in low frequency components, the DCT coefficients are quantized differently: the divisor for the low frequency coefficients is much smaller than that for the high frequency coefficients, so many high frequency components are rounded to zero after quantization. Zig-zag scanning then organizes the two-dimensional block of DCT coefficients into a one-dimensional coefficient stream so that similar frequency components are grouped together. Run-length and Huffman coding are then applied to further compress the resulting stream. The image quality depends greatly on the choice of the divisors in the quantization process. Good image quality is usually obtained for compression ratios below 10. For compression ratios larger than 100, images appear blocky because quantization is applied independently to each of the 8×8 blocks.

The JPEG standard also has a lossless mode. Lossless JPEG was added to the standard in 1993 and uses a completely different technique from the lossy mode: a predictive scheme based on the three nearest causal neighbors (upper, left, and upper-left), with entropy coding applied to the prediction error. It was never widely adopted and its performance is not state-of-the-art.
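To make the per-block processing concrete, here is a minimal sketch of the DCT and quantization steps for one 8×8 luminance block, using the example luminance quantization table given in the JPEG specification; zig-zag scanning and entropy coding are omitted, and real encoders commonly scale such a table according to a quality setting.

```python
import numpy as np
from scipy.fft import dctn, idctn   # 2-D type-II DCT and its inverse

# Example luminance quantization table from the JPEG specification (Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float64)

def encode_block(block):
    """DCT of one level-shifted 8x8 block followed by quantization."""
    coeffs = dctn(block.astype(np.float64) - 128.0, norm="ortho")
    return np.round(coeffs / Q_LUMA).astype(np.int32)    # many entries become zero

def decode_block(qcoeffs):
    """Dequantize and inverse DCT back to pixel values."""
    rec = idctn(qcoeffs * Q_LUMA, norm="ortho") + 128.0
    return np.clip(np.round(rec), 0, 255).astype(np.uint8)

block = np.full((8, 8), 128, dtype=np.uint8) + np.arange(8, dtype=np.uint8)  # smooth ramp
print(encode_block(block))                  # mostly zeros away from the top-left (DC) corner
print(decode_block(encode_block(block)))    # close to, but not identical to, the input
```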


JPEG-LS

JPEG-LS is the newer lossless/near-lossless compression standard for continuous-tone images, ISO-14495-1/ITU-T.87. It was developed with the aim of providing a low complexity lossless image compression standard offering better compression efficiency than lossless JPEG. Part 1 of the standard was finalized in 1999. The standard is based on the LOCO-I algorithm (LOw COmplexity LOssless COmpression for Images) developed at Hewlett-Packard Laboratories, which relies on prediction, residual modeling, and context-based coding of the residuals (Weinberger, Seroussi, & Sapiro, 2000). Most of the low complexity of this technique comes from the assumption that prediction residuals follow a two-sided geometric distribution (also called a discrete Laplace distribution) and from the use of Golomb-like codes, which are known to be near-optimal for geometric distributions. Besides lossless compression, JPEG-LS also provides a lossy mode in which the maximum absolute error can be controlled by the encoder. Compression with JPEG-LS is generally much faster than JPEG 2000 and much better than the original lossless JPEG standard.
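The prediction step of LOCO-I can be illustrated with its median edge detector (MED) predictor, sketched below for a single pixel; the context modeling and Golomb coding stages are omitted.

```python
def med_predict(a, b, c):
    """LOCO-I / JPEG-LS median edge detector (MED) predictor.
    a = left neighbour, b = upper neighbour, c = upper-left neighbour."""
    if c >= max(a, b):
        return min(a, b)        # likely edge: predict from the flatter side
    if c <= min(a, b):
        return max(a, b)
    return a + b - c            # smooth region: planar prediction

# The encoder then entropy-codes the residual x - med_predict(a, b, c)
# with Golomb-like codes, using context modeling omitted from this sketch.
print(med_predict(a=100, b=102, c=101))   # smooth area -> 101
print(med_predict(a=50, b=120, c=121))    # edge above -> 50
```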

JPEG 2000

JPEG 2000 is another, more recently developed image compression standard, which provides better image quality and stronger compression than JPEG. Instead of using the block-based DCT, JPEG 2000 uses the wavelet transform, so blocking artifacts are avoided completely. As in the JPEG standard, an image is first converted from the RGB color space to the YCbCr or the reversible component transform (RCT) color space. The image is then split into rectangular regions called tiles, so that the wavelet transform can be applied to each tile independently with a different number of decomposition levels. The JPEG 2000 standard uses the floating-point biorthogonal 9/7 wavelet kernel for lossy compression and the integer 5/3 kernel for lossless compression. The wavelet coefficients are organized into a set of subbands, each showing spatial detail within a certain frequency range. These coefficients are then quantized to produce a set of integers. The quantized subbands are split into rectangular regions in the wavelet domain called precincts. Precincts are further split into codeblocks, and the Embedded Block Coding with Optimal Truncation (EBCOT) scheme is applied to each codeblock in bit-plane order. There are three coding passes: significance propagation, magnitude refinement, and cleanup. The significance propagation pass encodes bits of insignificant coefficients that have significant neighbors, while the magnitude refinement pass refines coefficients that are already significant. Context-driven binary arithmetic coding is then used to code the bits produced by the three coding passes.


The resultant bitstream is organized into packets, where a packet groups selected passes of all codeblocks from a precinct into one unit. These packets can be organized flexibly depending on the target application. For example, the packets from all subbands can be arranged into layers in such a way that the image quality improves progressively from layer to layer. Because of the inherent multiresolution nature of wavelets and of the encoding scheme, a JPEG 2000 bitstream is arranged progressively, from very low image quality (i.e., a small number of bits and a high compression ratio) up to effectively lossless compression. A compression ratio of 20 can often be achieved for natural images without any visible artifacts. In JPEG 2000, blocking artifacts are removed entirely, but smoothing and ringing artifacts are introduced at high compression ratios.

The lossless mode of JPEG 2000, based on the reversible integer 5/3 wavelet filter, runs more slowly and usually gives worse compression ratios than JPEG-LS. However, it is scalable and progressive and, being part of the JPEG 2000 codec, it is more widely supported than JPEG-LS.
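As an illustration of the reversible 5/3 transform, here is a one-level, one-dimensional lifting sketch. The boundary handling is simplified relative to the symmetric extension specified by the standard, but the forward/inverse pair shown is still exactly reversible.

```python
import numpy as np

def dwt53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform (1-D, even length).
    Returns (lowpass, highpass) integer bands."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass = odd - floor((left even + right even) / 2)
    right = np.append(even[1:], even[-1])          # simplified extension at the edge
    d = odd - ((even + right) >> 1)
    # Update step: low-pass = even + floor((d_left + d_right + 2) / 4)
    d_left = np.insert(d[:-1], 0, d[0])
    s = even + ((d_left + d + 2) >> 2)
    return s, d

def dwt53_inverse(s, d):
    """Exact inverse of dwt53_forward."""
    d_left = np.insert(d[:-1], 0, d[0])
    even = s - ((d_left + d + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = d + ((even + right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([10, 12, 14, 13, 11, 9, 8, 8])
s, d = dwt53_forward(x)
assert np.array_equal(dwt53_inverse(s, d), x)      # perfectly reversible
```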

NEW FEATURES IN IMAGE COMPRESSION

The aim of JPEG 2000 is not only to improve compression performance over the JPEG standard, but also to introduce added features that are valuable in various kinds of applications. Newly introduced features include region of interest coding, random access, and progressive transmission (Lee, 2005). Region of interest coding means that certain parts of the image can be encoded with higher (or lossless) quality than other regions. This feature is important for applications such as medical imaging, where critical parts can be coded with very high quality while less critical parts are coded at lower resolution. JPEG 2000 also introduces the idea of random codestream access, meaning that the compressed bitstream supports random spatial access at varying resolutions. Finally, JPEG 2000 induces a scalable structure on the image representation. This scalable structure enables a single bitstream to be generated for different purposes without having to rerun the compression algorithm. For example, the quality or the image resolution/size can be improved progressively. Thus, the same bitstream can be used on a high definition television with a 1280×720 display as well as on a PDA with a 160×160 display.

FUTURE TRENDS

Next generation compression algorithms are characterized by the use of an object-oriented image representation and of a single universal bitstream for different transmission and display media. Object-oriented compression enables the manipulation of objects in the scene as individual items that can be processed separately. A single universal bitstream allows the compressed data to be transmitted to or displayed on different media without having to rerun the compression algorithm. The JPEG 2000 bitstream enables region of interest coding for objects and is scalable thanks to the use of the wavelet transform. Its popularity, however, is hindered by software patents. Either the standard needs to be modified to avoid the patented algorithms, or further research is needed to replace them with freely licensed alternatives; otherwise, open-source Web browsers cannot include a JPEG 2000 decoder.

CONCLUSION

Image compression is a core technology for multimedia systems. Efficient transmission and storage of images is still required despite the continuing increase in network bandwidth. Transform coding is currently the most popular class of compression methods for natural images. To achieve high compression ratios, lossy compression that exploits the limitations of the HVS is often employed. There are a number of international standards for image compression. The JPEG standard was established in 1992 and is very popular in WWW and multimedia systems. However, it is well known that blocking artifacts appear at low bit-rates, which can greatly affect the perceived image quality. For lossless and near-lossless compression applications, the JPEG-LS standard was introduced in 1999. More recently, the JPEG 2000 standard was introduced. It not only removes blocking artifacts completely and improves compression performance over the JPEG standard, but also induces a scalable structure on the image representation. This scalable structure fits well with the characteristics of human visual perception. It enables the generation of a single bitstream for different transmission and display media, and potentially supports an object-oriented image representation. However, because of the software patent issue, JPEG 2000 has not been included in open-source Web browsers, which affects its popularity.

REFERENCES

Ahmed, N., Natarajan, T., & Rao, K.R. (1974). Discrete cosine transform. IEEE Transactions on Computers, 23, 90-93.


Egger, O., Fleury, P., Ebrahimi, T., & Kunt, M. (1999). High performance compression of visual information, a tutorial review part I: Still pictures. Proceedings of the IEEE, 87(6), 976-1013.

Gonzalez, R.C., & Woods, R.E. (2002). Digital image processing. Prentice Hall.

Heil, C., Walnut, D.F., & Daubechies, I. (2006). Fundamental papers in wavelet theory. Princeton University Press.

ITU-R Recommendation BT.500. (1992). Methodology for the subjective assessment of the quality of television pictures.

Law, N.F., & Siu, W.C. (2001). Successive structural analysis using wavelet transform for blocking artifacts suppression. Signal Processing, 81(7), 1373-1387.

Lee, D.T. (2005). JPEG 2000: Retrospective and new developments. Proceedings of the IEEE, 93(1), 32-41.

Liew, A.W.C., & Yan, H. (2004). Blocking artifacts suppression in block-coded images using overcomplete wavelet representation. IEEE Transactions on Circuits and Systems for Video Technology, 14(4), 450-461.

Liew, A.W.C., Yan, H., & Law, N.F. (2005). POCS-based blocking artifacts suppression using a smoothness constraint set with explicit region modeling. IEEE Transactions on Circuits and Systems for Video Technology, 15(6), 795-800.

Pennebaker, W.B., & Mitchell, J.L. (1993). JPEG: Still image data compression standard. Van Nostrand Reinhold.

Poynton, C. (2003). Digital video and HDTV: Algorithms and interfaces. Morgan Kaufmann.

Ryan, T.W., Sanders, D., Fisher, H.D., & Iverson, A.E. (1996). Image compression by texture modeling in the wavelet domain. IEEE Transactions on Image Processing, 5(1), 26-36.

Sarshar, N., & Wu, X. (2007). On rate-distortion models for natural images and wavelet coding performance. IEEE Transactions on Image Processing, 16(5), 1383-1394.

Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423.

Taubman, D.S., & Marcellin, M.W. (2002). JPEG 2000: Image compression fundamentals, standards and practice. Kluwer Academic.

Wallace, G. (1991). The JPEG still picture compression standard. Communications of the ACM, 34(4), 30-44.

Wei, X., Li, J., & Chen, G. (2006). An image quality estimation model based on HVS. In Proceedings of the 2006 IEEE Region 10 Conference (pp. 1-4).

Weinberger, M., Seroussi, G., & Sapiro, G. (2000). The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS. IEEE Transactions on Image Processing, 9(8), 1309-1324.

Xiao, L., Wu, H.Z., Wei, Z.H., & Bao, Y. (2005). Research and application of a new computational model of human vision system based on Ridgelet transform. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics (Vol. 8, pp. 5170-5175).

Yamatani, K., & Saito, N. (2006). Improvement of DCT-based compression algorithms using Poisson's equation. IEEE Transactions on Image Processing, 15(12), 3672-3689.

KEY TERMS

Blocking Artifacts: One of the artifacts often exhibited by the JPEG standard at high compression ratios; images appear to have a regular block structure.

JPEG: An image compression standard proposed in 1992 by the joint ISO/CCITT committee. It is one of the most common image file formats found on the Internet and in consumer products.

JPEG 2000: An image compression standard that aims not only to provide improved compression performance over JPEG, but also to provide new features such as region of interest coding, random access, and progressive coding.

Progressive Transmission: The bitstream is arranged so that the most important information is near the front of the bitstream and the least important information is at the back. Thus, during decoding, the quality of the decoded image increases progressively.

Region of Interest Coding: Certain parts of the image (i.e., the regions of interest) are encoded with more bits and thus have better quality than other parts of the image.

Ringing Artifacts: A type of artifact that often appears near the edges of an image, where edges become blurred and show an oscillating effect.

Scalability: A successive quality change obtained by bitstream manipulation. For example, PSNR scalability means that the PSNR improves as more bits in the bitstream are decoded.

Significant Wavelet Coefficients: Wavelet coefficients that have a large absolute magnitude. They usually carry important structural information such as edges in an image.

Transform Coding: A type of compression in which the image data is first transformed into another domain so that it becomes decorrelated in the new domain for further processing.

Wavelet Subband: A group of wavelet coefficients in a certain frequency range.