A Simple Block-Based Lossless Image Compression Scheme

S. Grace Chang
University of California, Berkeley, Berkeley, CA 94720
[email protected]
Gregory S. Yovanof
Hewlett-Packard Laboratories, Palo Alto, CA 94304
[email protected]

The work was performed at Hewlett-Packard Laboratories, Palo Alto, CA 94304.
Abstract
A novel low-complexity lossless scheme for continuous-tone images, dubbed the PABLO codec (Pixel And Block adaptive LOw complexity coder), is introduced. It comprises a simple pixel-wise adaptive predictor and a block-adaptive coder based on the Golomb-Rice coding method. PABLO is an asymmetric algorithm requiring no coding dictionary and only a small amount of working memory on the encoder side. Due to the simple data structure of the compressed data, the decoder is even simpler, lending itself to very fast implementations. Experimental results show the efficiency of the proposed scheme when compared with other state-of-the-art compression systems of considerably higher complexity.
1 Introduction: Predictive Coding

Among the various compression methods, predictive techniques have the advantage of relatively simple implementation. Predictive schemes exploit the fact that adjacent pixel values in a raster image are highly correlated. With a predictive codec, the encoder (decoder) predicts the value of the current pixel based on the values of pixels which have already been encoded (decoded) and compresses the error signal. If a good predictor is used, the distribution of the prediction error is concentrated near zero, meaning that the error signal has significantly lower entropy than the original and hence can be efficiently encoded by a lossless coding scheme such as Huffman coding, Rice coding, or arithmetic coding. The introduced algorithm falls under the category of predictive coding. The main processing steps are (see Fig. 1):
1) Prediction: predict the current pixel based on the "past" pixels to allow lossless differential predictive coding;
2) Error Preprocessing: map the prediction errors to non-negative integers while preserving a certain ordering of symbol probabilities to facilitate the coder;
3) Error Modeling: estimate the necessary statistics for the statistical coder;
4) Coding: the lossless encoding algorithm is based on a technique known as Rice coding.
Figure 1. Block diagram of the encoding scheme: the predictor forms x̂ from the uncompressed input x, the prediction error ε is mapped by the Rice mapper to ε̃, the error-modeling stage estimates the splitting factor k, and the coder produces the compressed bitstream.
1.1 Prediction
A good prediction method is essential to any coding scheme. To achieve accurate prediction and to keep the complexity low, we adopt a simple adaptive predictor which approximately performs edge detection along the horizontal and vertical directions. Each pixel is predicted from its neighbors A, B, and C as shown in Fig. 2. The predictor is as follows:
\hat{x} = \begin{cases}
A & \text{if } |A - C| > b \text{ and } |B - C| < s \\
B & \text{if } |B - C| > b \text{ and } |A - C| < s \\
\lceil (A + B)/2 \rceil & \text{otherwise}
\end{cases} \qquad (1)
The parameters b and s stand for a big threshold and a small threshold, respectively. The intuition is that if, say, A is not close to C but B is close to C, then there is probably a horizontal edge and we take A to be the predicted value. The analysis for a vertical edge is similar. The ⌈x⌉ operator in (1) denotes the smallest integer equal to or greater than x (i.e., rounding up).

Figure 2. The predictor support pixels: C is the above-left neighbor, B the neighbor above, and A the left neighbor of X, the current pixel to be coded.
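As a concrete illustration, here is a minimal Python sketch of the predictor in (1); the threshold values b and s are placeholders chosen for illustration, since the section does not fix them.

```python
import math

def predict(A, B, C, b=16, s=4):
    """Edge-detecting predictor of Eq. (1).

    A is the left neighbor, B the neighbor above, and C the above-left
    neighbor of the current pixel.  The thresholds b (big) and s (small)
    are illustrative values only.
    """
    if abs(A - C) > b and abs(B - C) < s:
        # A not close to C but B close to C: probably a horizontal edge.
        return A
    if abs(B - C) > b and abs(A - C) < s:
        # B not close to C but A close to C: probably a vertical edge.
        return B
    # Otherwise take the rounded-up average of the two causal neighbors.
    return math.ceil((A + B) / 2)
```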
1.2 Rice Codes
Encoding a given sequence of N-bit symbols with Rice coding is like using a collection of N - 1 different Huffman codebooks designed over a wide entropy range. For each entropy value, this allows the encoder to choose the best codebook from the N - 1 choices. When the data is Laplacian-distributed, the Rice coder has been shown to be equivalent to the multiple Huffman codebook approach [1, 2], but it does not require a codebook. For differential predictive coding, the Laplacian assumption is usually a valid one. Encoding of a given symbol with a Rice code comprises two components: the fundamental sequence (FS) and sample splitting. The FS is a comma code which takes a symbol value y and transforms it into y "0"s followed by a "1" (the comma). For example, the codeword for y = 3 is "0001". Sample splitting is based on the intuition that the few least significant bits (LSBs) are random, and thus non-compressible, and should be transmitted as is. Combining these two ideas, a symbol is encoded by splitting the k non-compressible LSBs from the N - k MSBs; the LSBs are transmitted as the original bits while the MSBs are transmitted as an FS code. The variable k will be referred to as the splitting factor. Clearly, the codeword length l_k for a given input symbol and splitting factor k is given by
l_k = m_k + 1 + k \qquad (2)
where m_k is the integer corresponding to the N - k MSBs. The default option is k = N, i.e., transmit the original bits, which guarantees that the symbol is not expanded. For each symbol of N bits we can find at least one optimal k ∈ {0, 1, ..., N - 2, N} which gives the minimal length. By selecting among the various k options, we essentially have multiple encoders to choose from.
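The construction described above can be sketched as follows; the bitstream is represented as a string of '0'/'1' characters purely for illustration, and the default k = N option (sending the N original bits) is handled by the caller.

```python
def rice_codeword(symbol, k):
    """Rice codeword for a non-negative symbol with splitting factor k.

    The N - k most significant bits (value m_k) are sent as a fundamental
    sequence: m_k zeros followed by a terminating '1'.  The k least
    significant bits are appended verbatim.
    """
    m_k = symbol >> k                                   # value of the MSB part
    lsb = format(symbol & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return "0" * m_k + "1" + lsb                        # length = m_k + 1 + k, as in Eq. (2)

# Example: rice_codeword(11, 2) == "00111"  (m_k = 2, LSBs = "11")
```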
1.3 Pre-processor – the Rice Mapper
Prediction errors are usually modeled as having a Laplacian distribution. For N-bit symbols in the range [0, 2^N - 1], the prediction error lies in the range [-(2^N - 1), 2^N - 1], requiring N + 1 bits to represent. However, given that we know the current prediction, there are only 2^N possible values for the current prediction error. Specifically, if we know x̂, then the prediction error can only be in the range [-x̂, 2^N - 1 - x̂], which requires N bits to code. We use the Rice mapper in [3] to map the original Laplacian-distributed integer error value to a non-negative integer following an approximately geometric distribution. In closed form, the Rice mapper is

\tilde{\varepsilon} = \begin{cases}
2\varepsilon & \text{if } 0 \le \varepsilon \le \theta \\
2|\varepsilon| - 1 & \text{if } -\theta \le \varepsilon < 0 \\
\theta + |\varepsilon| & \text{otherwise}
\end{cases} \qquad (3)

where ε = x - x̂ is the error residual and θ = min(x̂, 2^N - 1 - x̂).
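A minimal sketch of the mapping in (3), assuming 8-bit pixels by default:

```python
def rice_map(x, x_hat, N=8):
    """Map the prediction error x - x_hat to a non-negative integer, Eq. (3)."""
    eps = x - x_hat
    theta = min(x_hat, (1 << N) - 1 - x_hat)
    if 0 <= eps <= theta:
        return 2 * eps                  # small non-negative errors -> even values
    if -theta <= eps < 0:
        return 2 * abs(eps) - 1         # small negative errors -> odd values
    return theta + abs(eps)             # remaining errors fill in the tail

# Example: with x_hat = 3 and N = 8, errors in -3..252 map one-to-one onto 0..255.
```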
1.4 Error Modeling – Estimating the Statistics
In the case of Rice coding, the necessary statistic is the value of the optimal k for a processing block. There are several ways to find the optimal k adaptively. One method, as suggested by the original block-based Rice coder, finds an optimal k for each processing block through an exhaustive search among the allowable k's. To find the best splitting factor k, a cumulative counter is kept for each allowable k, keeping track of the total block length if every pixel in the block were coded with this k. The optimal k is the one yielding the smallest cumulative code length. There are simpler methods for estimating k with only a slight loss in performance. One method is to compare the cumulative sum of each block, Σ_{i=1}^{J} ε̃[i] + J, where J is the number of symbols in the block, against decision boundaries derived from assuming k random LSBs [3]. Another method is to note that adjacent blocks are highly correlated, and thus it suffices to search within k ± 1, where k is the value used for the previous block.
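A sketch of the exhaustive search over the allowable splitting factors for one block of mapped errors; the candidate set follows Section 1.2, with k = N standing for the uncompressed fallback.

```python
def best_k(block, N=8):
    """Exhaustive search for the optimal splitting factor of a block.

    `block` holds the Rice-mapped (non-negative) prediction errors of one
    processing block.  Returns (k, total_bits); k == N means "transmit the
    original N-bit values".
    """
    best_k_value, best_bits = N, N * len(block)         # default: no compression
    for k in range(N - 1):                              # allowable k in {0, ..., N-2}
        bits = sum((e >> k) + 1 + k for e in block)     # per-symbol length, Eq. (2)
        if bits < best_bits:
            best_k_value, best_bits = k, bits
    return best_k_value, best_bits
```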
2 Rice-Coding Based Compression Schemes

2.1 Original Rice Algorithm
The original Rice coder is a block-based algorithm [3]. The processing block is a one-dimensional 16x1 vector. A cumulative counter is updated for each k as described in Section 1.4, and the best k is taken to be the one which yields the least number of compressed bits. At the beginning of each block, ID bits are sent to indicate the k value used (in the case of 8 bpp grayscale images, 3 ID bits are sent), followed by the encoded output of the entire block using k as the splitting factor. In our investigation we have experimented with several variations of the original Rice coding method.
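Putting the previous sketches together, a block encoder in the spirit of this scheme might look as follows; mapping the k = N case to the all-ones ID value is an assumption made for illustration, as the paper does not spell out the ID-bit assignment.

```python
def encode_block_rice(block, N=8, id_bits=3):
    """Encode one block of mapped errors: ID bits announcing k, then the data."""
    k, _ = best_k(block, N)
    id_value = (1 << id_bits) - 1 if k == N else k      # assumed: all-ones ID marks k = N
    header = format(id_value, f"0{id_bits}b")
    if k == N:                                          # default: send the original bits
        return header + "".join(format(e, f"0{N}b") for e in block)
    return header + "".join(rice_codeword(e, k) for e in block)
```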
2.2 Mixed Rice and Binary Encoding
Notice that for Rice encoding, even when choosing an optimal k for each block, some codewords within the block are expanded rather than compressed. Therefore, it would be better to code some symbols with the optimal k for that block, but binary encode, i.e., send as default, the symbols that would be expanded by the chosen k value. To accomplish this, we keep a 1 bit per pixel bitmap indicating whether each pixel is Rice or binary encoded. This 1 bpp bitmap is an expensive overhead, so we must decide whether it is worth keeping. To make this decision, we keep two cumulative counts, one summing the total length if the entire block were Rice encoded with each allowable k, and the other summing the total length if some pixels were Rice encoded and some were binary encoded. If the saving in bits is more than the overhead, then we keep the bitmap and do a mixture of Rice coding and binary encoding; otherwise, we do purely Rice coding for the entire block. There needs to be an indicator for each block telling the decoder whether the block is mix-encoded or purely Rice encoded. To avoid further overhead, we reserve the value 6 as a MARKER in the 3 block-ID bits that are sent to indicate the k value used.
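A sketch of the per-block decision between pure Rice coding and mixed Rice/binary coding; the bitmap cost of one bit per pixel is charged against the bits saved by binary-coding the symbols the chosen k would expand (the MARKER/ID-bit bookkeeping is left out).

```python
def choose_block_mode(block, N=8):
    """Decide how to code a block: pure Rice, mixed Rice/binary, or raw binary.

    For every allowable k two cumulative counts are kept, one assuming the
    whole block is Rice coded and one assuming expanded symbols are sent in
    binary instead (which costs a 1 bit-per-pixel bitmap).
    """
    best = ("binary", N, N * len(block))                 # fallback: original bits
    for k in range(N - 1):                               # allowable k in {0, ..., N-2}
        lengths = [(e >> k) + 1 + k for e in block]      # Rice length per symbol, Eq. (2)
        pure = sum(lengths)
        mixed = len(block) + sum(min(l, N) for l in lengths)   # bitmap + cheaper coding
        if pure < best[2]:
            best = ("pure", k, pure)
        if mixed < best[2]:
            best = ("mixed", k, mixed)
    return best
```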
2.3 PABLO: Mixed Rice-Binary Encoding with Block Classification
The proposed PABLO scheme builds upon the previously described mixed Rice-binary encoding method with the purpose of improving its performance on images with large flat areas (text, graphics, compound documents). With such images, instead of coding every pixel (which can achieve at most an 8:1 compression ratio), better compression can be achieved with a scheme such as run-length encoding, which skips over large contiguous areas of a single value. One simple way to do this is by block classification: we use 1 bit per block to indicate whether that block is a FLAT block, meaning that the entire block has the same value. If it is FLAT, we send the value of the block; if not, we mix-encode the block as described in Section 2.2. From the previous discussion it can be seen that this scheme employs a pixel-by-pixel adaptive predictor and a block-adaptive Rice coder, hence the name PABLO (Pixel And Block adaptive LOw complexity coder). Obviously, the PABLO scheme is specifically targeted towards textual and graphics images. For regular images there are seldom FLAT blocks, and the 1 bit per block would be wasted. However, the overhead incurred by the block classification results in only approximately a 0.5% loss in coding efficiency on natural images and is thus quite insignificant.
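A minimal sketch of the per-block flow just described, reusing the hypothetical choose_block_mode helper from the previous sketch; the returned descriptors merely stand in for the actual bitstream syntax.

```python
def pablo_encode_block(pixels, mapped_errors, N=8):
    """PABLO block classification: FLAT blocks are sent as a flag plus one value.

    `pixels` are the original pixel values of the block, `mapped_errors` the
    corresponding Rice-mapped prediction errors.
    """
    if len(set(pixels)) == 1:                            # entire block has one value
        return {"flat": True, "value": pixels[0]}        # 1 classification bit + value
    mode, k, bits = choose_block_mode(mapped_errors, N)  # fall back to Section 2.2
    return {"flat": False, "mode": mode, "k": k, "bits": bits}
```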
2.4 Hierarchical Block Classification
The block-classification algorithm described in Section 2.3 can be improved by noticing that in compound images there are many blocks which are mostly flat but have some other values in a corner or at an edge. Instead of classifying the entire block as non-flat and using Rice encoding, we can split the block into smaller regions, classify whether it is half-flat or all flat except for one quadrant, and then mix-encode only the non-flat portion. Transmitting this information naturally incurs more overhead. To avoid too much overhead for pure images, we propose a tree-structured classification as shown in Figure 3. For regular images it incurs 1 bit per block of overhead, the same expense as the block-classification scheme. For compound images, 2 bits per block are needed to specify ENTIRELY-FLAT blocks, and 3 bits per block are needed to specify 3/4-FLAT and HALF-FLAT blocks, plus an additional 2 bits to specify which half (top, bottom, left, right) or quadrant is not flat. A block is REGULAR if it is neither ENTIRELY-FLAT, HALF-FLAT, nor 3/4-FLAT. The value of the flat region is then sent as original, and the non-flat portion is mix-encoded.

Figure 3. The tree-structured scheme for classifying the flat regions of a block: a REGULAR block is mix-encoded in its entirety; an ENTIRELY-FLAT block is sent as its single value; a 3/4-FLAT (or HALF-FLAT) block is sent as 2 location bits, the value of the flat region, and the mix-encoded non-flat quadrant (or half).
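A sketch of the tree-structured classification of Figure 3; the quadrant/half indexing convention is an assumption, since the text only states that 2 location bits identify the non-flat region.

```python
def classify_block(block, size=8):
    """Tree-structured flatness classification of a size x size block.

    Returns one of 'ENTIRELY_FLAT', 'THREE_QUARTER_FLAT', 'HALF_FLAT' or
    'REGULAR', plus the index of the non-flat region where applicable.
    `block` is a list of `size` rows of `size` pixel values.
    """
    def flat(region):
        return len(set(region)) == 1

    h = size // 2
    pixels = [p for row in block for p in row]
    if flat(pixels):
        return "ENTIRELY_FLAT", None

    # Quadrants: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    quads = [[block[r][c] for r in rs for c in cs]
             for rs in (range(0, h), range(h, size))
             for cs in (range(0, h), range(h, size))]
    for i in range(4):
        rest = [p for j in range(4) if j != i for p in quads[j]]
        if flat(rest) and not flat(quads[i]):
            return "THREE_QUARTER_FLAT", i               # three quadrants share one value

    # Halves: 0 = top, 1 = bottom, 2 = left, 3 = right (index of the NON-flat half).
    halves = [
        [block[r][c] for r in range(0, h) for c in range(size)],      # top
        [block[r][c] for r in range(h, size) for c in range(size)],   # bottom
        [block[r][c] for r in range(size) for c in range(0, h)],      # left
        [block[r][c] for r in range(size) for c in range(h, size)],   # right
    ]
    for i in range(4):
        if flat(halves[i ^ 1]) and not flat(halves[i]):  # complementary half is flat
            return "HALF_FLAT", i
    return "REGULAR", None
```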
3 Results

We now present some experimental results. Table 1 provides a comparison of our schemes with other existing schemes on the USC image database. Column 1 shows the compression attained by FELICS [4] (with the maximum k parameter set to 6), which is a low-complexity, context-based, pixel-wise adaptive algorithm also based on the Rice code. The JPEG data shown corresponds to the independent lossless JPEG function employing the 2-point predictor no. 7 and arithmetic coding [5]. The third column is the straightforward Rice algorithm with the parameter k estimated via exhaustive search and a processing block of size 8x8. The column MixEncode is the algorithm described in Section 2.2, and PABLO is the algorithm described in Section 2.3, both with 8x8 blocks. The next column is the 0th-order entropy of the entire image, using the predictor in (1); note that we have converted the bitrate to a compression ratio. For comparison, we also include the performance of the sequential LZW algorithm, i.e., the UNIX 'compress' utility. Our algorithms show improvement over the FELICS and JPEG schemes. Notice that PABLO is always slightly worse than MixEncode for pure images, since there is rarely any FLAT block and thus 1 bit per block is wasted. For the most part, MixEncode performs slightly better than pure RICE. However, in the case of the mand image it is slightly worse, because there are quite a few blocks with optimal k = 6, a value that is not used in our schemes.

Table 2 summarizes the performance of the introduced schemes and the LOCO-I algorithm as described in [6] on a number of images from the JPEG suite of standard test images. The LOCO-I scheme is a pixel-wise adaptive coder employing an adaptive predictor, context-based error modeling with special treatment of long pixel runs, and Rice coding. It is symmetric for both the encoder and the decoder. For the non-compound images, the HIER scheme (the hierarchical block classification) is about 2-7% worse than LOCO-I. For the compound images, HIER is about 20% worse due to its very simple encoding of blocks with mostly white space and some text. However, the advantage that PABLO offers over the LOCO-I scheme is that its decoder is extremely simple, since it does not require any statistical modeling.
4 Complexity

The main design objective for all the algorithms presented so far has been low complexity, in terms of both computational complexity and overall system resource requirements. The error-modeling part of the block-adaptive algorithms makes them more than one-pass, but only at the block level (typically an 8x8 block size). Only the current block plus the boundary pixels of the adjacent blocks (which are used by the predictor) need to be buffered. The collection of the statistics requires only a few counters, and the process merely involves addition and bit-shift operations. The formation of a Rice codeword is extremely simple, and these algorithms use very little working memory and no coding memory at all. The decoder is even simpler and faster than the encoder, since it does not have to estimate the k value, which is transmitted as overhead along with the compressed bitstream. Thus, these schemes are ideally suited for asymmetrical applications such as compression in a laserjet printer [7], where the decoder needs to operate at a much faster rate than the encoder.
5 Conclusion

This paper summarizes the results of an investigation of lossless compression schemes for grayscale images based on the Rice coding method, a low-complexity alternative to the popular Huffman coding. Due to the simple data structure of the compressed information, our algorithms have very simple and fast implementations, ideally suited for low-cost devices such as computer peripherals.
References

[1] S. W. Golomb, "Run-Length Encodings," IEEE Trans. Inf. Theory, vol. IT-12, pp. 399-401, July 1966.
[2] R. Gallager and D. Van Voorhis, "Optimal Source Codes for Geometrically Distributed Alphabets," IEEE Trans. Inf. Theory, vol. IT-21, pp. 228-230, March 1975.
[3] R. Rice, P.-S. Yeh, and W. Miller, "Algorithms for a Very High Speed Universal Noiseless Coding Module," JPL Publication 91-1, Jet Propulsion Laboratory, Pasadena, CA, Feb. 1991.
[4] P. Howard, "The Design and Analysis of Efficient Lossless Data Compression Systems," Ph.D. thesis, Department of Computer Science, Brown University, June 1993.
[5] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
[6] M. Weinberger, G. Seroussi, and G. Sapiro, "LOCO-I: A Low Complexity, Context-Based, Lossless Image Compression Algorithm," Proc. IEEE Data Compression Conference, Snowbird, UT, April 1996.
[7] G. S. Yovanof, "Compression in a Printer Pipeline," Proc. IEEE 29th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Oct. 30 - Nov. 1, 1995.
Images    | FELICS | JPEG | RICE | MixEnc | PABLO | 0th Entropy | LZW
crowd     | 1.82   | 1.86 | 1.85 | 1.85   | 1.84  | 1.74        | 1.33
lax       | 1.34   | 1.30 | 1.36 | 1.36   | 1.36  | 1.35        | 1.03
lena      | 1.74   | 1.72 | 1.82 | 1.82   | 1.81  | 1.75        | 1.13
man       | 1.62   | 1.64 | 1.71 | 1.71   | 1.70  | 1.64        | 1.11
woman2    | 2.23   | 2.28 | 2.37 | 2.37   | 2.36  | 2.21        | 1.40
lake      | 1.49   | 1.48 | 1.54 | 1.54   | 1.54  | 1.50        | 1.08
mand      | 1.27   | 1.26 | 1.31 | 1.31   | 1.31  | 1.27        | 1.00
milkdrop  | 2.07   | 2.05 | 2.12 | 2.13   | 2.12  | 2.11        | 1.34
moon      | 1.91   | 2.25 | 2.00 | 2.01   | 2.00  | 1.80        | 1.45
peppers   | 1.88   | 1.89 | 1.68 | 1.68   | 1.67  | 1.66        | 1.22
tank      | 1.28   | 1.47 | 1.27 | 1.27   | 1.27  | 1.49        | 1.46
urban     | 1.79   | 1.78 | 1.89 | 1.89   | 1.88  | 1.79        | 1.23

Table 1. Comparison of the compression ratio for various techniques on the USC images. The compression ratio is defined as the number of original bits divided by the number of compressed bits.

Images  | MixEncode | PABLO | HIER  | LOCO-I | LZW
finger  | 1.354     | 1.350 | 1.350 | 1.423  | 1.00
gold    | 1.995     | 1.990 | 1.999 | 2.046  | 1.39
hotel   | 2.019     | 2.011 | 2.011 | 2.118  | 1.17
water   | 3.404     | 4.187 | 4.178 | 4.467  | 3.44
woman   | 1.853     | 1.857 | 1.860 | 1.916  | 1.13
cmpnd1  | 3.552     | 4.742 | 5.133 | 6.141  | 4.80
cmpnd2  | 3.486     | 4.536 | 4.843 | 5.943  | 4.29

Table 2. Comparison of the compression ratio for various techniques on images from the JPEG standard database.