Multiband lossless compression of hyperspectral images Enrico Magli, Senior Member, IEEE

Abstract Hyperspectral images exhibit significant spectral correlation, whose exploitation is crucial for compression. In this paper we investigate the problem of predicting a given band of a hyperspectral image using more than one previous band. We present an information-theoretic analysis based on the concept of conditional entropy, which is used to assess the available amount of correlation and the potential compression gain. Then, we propose a new lossless compression algorithm that employs a Kalman filter in the prediction stage. Simulation results are presented on AVIRIS, HYDICE and HYMAP scenes, showing competitive performance with other state-of-the-art compression algorithms.

Index Terms Lossless compression; hyperspectral data; conditional entropy; 3D prediction; Kalman filter.

The author is with Dip. di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24 - 10129 Torino - Italy - Ph.: +39-011-5644195 - FAX: +39-011-5644099 - E-mail: [email protected].

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING


I. INTRODUCTION

Hyperspectral images have recently become very popular, as their high spectral resolution makes it possible to distinguish different materials, facilitating several image analysis tasks, especially image classification. However, hyperspectral datasets are very large, making acquisition, storage and transmission of these data problematic. For example, the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor acquires lines containing 614 pixels in 224 spectral bands, requiring hundreds of MBytes for storage. A possible solution lies in the adoption of compression techniques.

Previous work on lossless compression of hyperspectral images has mostly been based on the predictive coding paradigm, whereby each pixel is predicted from past data, and the prediction error is entropy coded [1], [2]. In [3] fuzzy prediction is introduced, switching the predictor among a predefined set using a fuzzy logic rule. In [4] the prediction is improved using edge-based analysis. In [5] classified prediction is introduced for near-lossless compression. Classified prediction is further developed in [6] for lossless and near-lossless compression. In [7] spectral prediction is performed using adaptive filtering. In [8] vector quantization is employed to yield lossless compression. In [9] the concept of clustered differential pulse code modulation (C-DPCM) is introduced. The spectra of the image are clustered, and an optimal predictor is computed for each cluster and used to decorrelate the spectra; the prediction error is coded using a range coder. In [10] it is proposed to employ a spectral predictor that is based on two previous bands. In [11] the spectral redundancy is exploited using a context matching method driven by the spectral correlation. In [12] it is proposed to employ distributed source coding to achieve lossless compression with a very simple encoder.
In [13] a simple algorithm is proposed, which encodes each image block independently. In [14] a low-complexity algorithm is introduced, based on linear prediction in the spectral domain. In [15] the performance of JPEG 2000 [16] is evaluated for lossless compression of AVIRIS scenes, in the framework of progressive lossy-to-lossless compression; lossy compression results are reported in [17]. In [18] it is proposed to employ as predictor a pixel in the previous band, whose value is equal to the pixel co-located to the one to be coded. In [19] spectral correlation is exploited


through context matching. In general, exploiting the spectral correlation is crucial to achieve a high compression ratio; this is why the algorithms with highest performance employ accurate models of this correlation. The algorithm in [15] employs JPEG 2000 with a reversible integer wavelet transform in the spectral dimension. While this potentially allows all bands to be exploited in the spectral decorrelation, transform-based methods are typically outperformed by prediction-based methods for lossless compression. Among predictive methods, in [3] the prediction employs a pair of previous bands to estimate the current pixel; however, the computational complexity of [3] is high with respect to other methods. In [10] two previous bands are also used, and they are combined in a predefined way; this decreases complexity, but requires a training phase to estimate the linear combination coefficients. In [9] all previous bands are used to predict the current pixels; the algorithm requires clustering the image before compression, as well as solving a linear regression problem for the predictor coefficients inside each cluster. Both steps are demanding in terms of computation and memory; moreover, the clustering stage is not causal, i.e. it requires availability of all bands during the encoding of each single band. Band reordering has been used in [20], [21], [22]; specifically, the spectral channels of the image are reordered in such a way as to maximize the correlation of adjacent bands, optimizing the performance of the subsequent compression stage. While this has been shown to be useful, it is a computationally demanding and non-causal step.

In this paper we specifically address the problem of exploiting more than one previous band in the prediction stage for lossless hyperspectral data compression, which in the following we refer to as multiband prediction; this is expected to improve compression efficiency, because the spectral correlation is very strong in hyperspectral data.
The objective of this paper is twofold. We first use an information-theoretic approach, and attempt to estimate the potential gain that can be achieved by exploiting more than one previous band. Then, we define a new algorithm which employs a Kalman filter to perform the prediction. The performance is evaluated on a large set of hyperspectral data, including images from the AVIRIS, HYMAP and HYDICE sensors.

This paper is organized as follows. In Sect. II we briefly review some aspects of the 3D context-based adaptive lossless image coding (CALIC) algorithm, which constitutes important background material for the definition of the proposed algorithm; moreover, we describe the dataset used for the experiments. In Sect. III we describe a statistical model, based on information theory, that makes it possible to predict the performance gain of multiband compression,


Fig. 1. Causal neighborhood for prediction of the current pixel in 3D-CALIC; the neighborhood is the gray area. Labels in the figure: Cj(x,y), Xj(x,y).



and apply it to the test dataset. In Sect. IV we define the proposed algorithm, whereas in Sect. V we provide lossless compression performance results. Finally, in Sect. VI we draw some conclusions.

II. BACKGROUND

In the following we briefly review the operation of 3D-CALIC [23], [24], and specifically the prediction stage, as it will be used in the definition of the proposed algorithm. Moreover, in Sect. II-B we introduce the dataset used for the experiments.

A. Review of 3D-CALIC prediction stage

We define as $x_{m,n,i}$ the sample in band $i$, line $m$ and pixel $n$ of a hyperspectral image containing $N_B$ bands, $N_L$ lines and $N_P$ pixels per line. Letting $x_{m,n,i}$ be the pixel currently being encoded, and $x_{m,n,i-1}$ the co-located pixel in the previous band, 3D-CALIC predicts $x_{m,n,i}$ and selects the context model for entropy coding using the neighborhoods of $x_{m,n,i}$ and $x_{m,n,i-1}$ as defined by the gray area in Fig. 1. For a given band $i$ and a given pixel $x_{m,n,i}$, we denote as $\mathcal{C}_{m,n}$ the set of pairs $(p,q)$ such that pixel $x_{p,q,i}$ is in the neighborhood of $x_{m,n,i}$.

In 3D-CALIC, the local correlation coefficient between pixels $x_{m,n,i}$ and $x_{m,n,i-1}$ is first computed as

$$\rho_{i-1} = \frac{|\mathcal{C}_{m,n}| \sum x_{p,q,i}\, x_{p,q,i-1} - \sum x_{p,q,i} \sum x_{p,q,i-1}}{\sqrt{\left(|\mathcal{C}_{m,n}| \sum x_{p,q,i}^2 - \left(\sum x_{p,q,i}\right)^2\right)\left(|\mathcal{C}_{m,n}| \sum x_{p,q,i-1}^2 - \left(\sum x_{p,q,i-1}\right)^2\right)}}$$

where the summations are over all $(p,q) \in \mathcal{C}_{m,n}$. The notation $\rho_{i-1}$ highlights that this coefficient measures the correlation of the current band with band $i-1$. If $\rho_{i-1}$ reaches a given threshold, an interband predictor is employed, otherwise an intraband predictor is used (the threshold is the one used in [24]). The interband predictor $\hat{x}^{C}_{m,n,i}$ of $x_{m,n,i}$, where the superscript "C" denotes the CALIC algorithm, is based on the following linear model:

$$\hat{x}^{C}_{m,n,i} = \alpha\, x_{m,n,i-1} + \beta \qquad (1)$$

where $\alpha$ and $\beta$ minimize the prediction error $\sum_{(p,q) \in \mathcal{C}_{m,n}} (x_{p,q,i} - \alpha\, x_{p,q,i-1} - \beta)^2$ in the neighborhood of the current pixel, i.e. the squared Euclidean norm of the error. This prediction is modified using two local gradients. The final non-linear predictor in 3D-CALIC is defined by pseudo-code of the form IF (...) ... ELSE IF (...) ... ELSE ..., in which the gradients are tested against a threshold (in practice we use the same value as in [23]). If $\rho_{i-1}$ is below the correlation threshold, the intraband predictor described in [23] shall be employed. Besides the prediction error, it is useful to note that 3D-CALIC also computes an estimate of the gradient at the coordinates of the current pixel, which is essential to the definition of the contexts for entropy coding.

The prediction error is encoded using a context-based arithmetic coder that employs 256 symbols and 8 contexts (a run mode is also available to encode flat image regions). For brevity, we do not report here the details of the conditional entropy coding process; the interested reader can refer to [23], [24] for a more detailed technical description.
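The linear stage of the interband predictor reviewed above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the shape of the causal window (`causal_neighborhood`, `radius`) and the default `rho_threshold=0.5` are hypothetical placeholders, not values taken from [23] or [24], and the gradient-based refinement and the intraband fallback are left out.

```python
import numpy as np

def causal_neighborhood(m, n, n_pixels, radius=2):
    """Hypothetical causal window: samples coded before (m, n) in raster order."""
    coords = []
    for p in range(max(0, m - radius), m + 1):
        for q in range(max(0, n - radius), min(n_pixels, n + radius + 1)):
            if p < m or q < n:  # keep only already-coded positions
                coords.append((p, q))
    return coords

def interband_prediction(cur, ref, m, n, rho_threshold=0.5):
    """Return (rho, prediction) for pixel (m, n) of band `cur` given band `ref`.

    The prediction is None when the neighborhood is too small, flat, or too
    weakly correlated; a caller would then use an intraband predictor.
    """
    coords = causal_neighborhood(m, n, cur.shape[1])
    if len(coords) < 2:
        return 0.0, None
    x = np.array([cur[p, q] for p, q in coords], dtype=np.float64)
    y = np.array([ref[p, q] for p, q in coords], dtype=np.float64)
    if x.std() == 0.0 or y.std() == 0.0:
        return 0.0, None
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    rho = float(cov / (x.std() * y.std()))   # local correlation coefficient
    if rho < rho_threshold:
        return rho, None
    alpha = cov / y.var()                    # least-squares gain
    beta = x.mean() - alpha * y.mean()       # least-squares offset
    return rho, float(alpha * ref[m, n] + beta)
```

Because the window only contains samples that precede the current pixel in raster order, a decoder could repeat the same computation without side information.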


B. Dataset description

The dataset used for the following experiments has been selected in order to provide an adequate degree of variety regarding the scenes being imaged and the hyperspectral sensors. Except for the last dataset, for all the sensors that have been considered, the data are calibrated, and each sample is represented as a 16-bit signed integer.

The first sensor is AVIRIS; we consider the radiance data of four scenes from the 1997 acquisitions, namely Cuprite, Jasper Ridge, Moffett Field and Lunar Lake, which are publicly available on the Internet at aviris.jpl.nasa.gov. These images have been used in many papers addressing hyperspectral image compression, hence their use facilitates comparisons with other algorithms. AVIRIS covers the 0.41-2.45 μm spectrum in 10-nm bands. The instrument consists of four spectrometers that are flown at 20 km altitude with 17 m resolution. Each image has 512 lines, 614 pixels per line, and 224 bands.

The Hyperspectral Digital Imagery Collection Experiment (HYDICE) is an airborne hyperspectral imaging system that provides high spatial and spectral resolution images of the Earth. For each band, an image is built up line by line in a pushbroom scan by the forward motion of the aircraft. The instrument covers the 0.40-2.50 μm spectrum; each image has 210 bands and 307 pixels per line. We consider two scenes, named Terrain and Urban, which contain respectively 500 and 307 lines. These images are publicly available on the Internet at www.tec.army.mil/Hypercube/.

The HYMAP sensor provides 128 bands across the reflective solar wavelength region of 0.45-2.5 μm with contiguous spectral coverage (except in the atmospheric water vapor bands) and bandwidths between 15 and 20 nm. We consider one image, with 1362 lines, 512 pixels per line, and 128 bands.

For a limited set of experiments, five AVIRIS raw (as opposed to calibrated) images have been employed. These scenes are referred to as Sc0, Sc3, Sc10, Hawaii and Maine. The first three scenes are represented on 16 bits, and have been acquired in 2006 over Yellowstone, WY, while the last two are represented on 12 bits. These images have been made publicly available by NASA, and can be downloaded at http://compression.jpl.nasa.gov/hyperspectral/.

III. INFORMATION-THEORETIC ANALYSIS

The goal of the following information-theoretic analysis is to evaluate the potential compression gain that can be achieved if, instead of using only the previous band for prediction, multiband prediction is employed. Addressing this problem is very difficult, primarily because the potential gain varies depending on the specific compression algorithm, and because each existing algorithm has its own specific prediction structures. In our analysis, we take a simpler, more general approach. Instead of investigating a specific compression structure, we attempt to answer the following question: “How much information regarding the current band is available in the previous bands?” Since the concept of information is not related to a specific compression algorithm, this approach leads to a more general answer. The goal of this analysis is to estimate the potential improvement, not to explain why a specific algorithm is more effective than another, since this would require tailoring the model to specific prediction structures. It will be seen in Sect. V that, when comparing the performance of several existing lossless compression algorithms, the smallest bit-rates are provided by an algorithm that uses only one band for prediction. Nevertheless, information theory ensures that, given a specific one-band method, there always exists a way to improve its performance using more than one band, as the previous bands always contain “new” information regarding the current band.

We use an abstract model of the compression process which leverages information theory, and aims at estimating the potential gain of multiband prediction. The analysis is based on a simple statistical model of hyperspectral data and of the prediction process; it employs the concept of entropy, which is a measure of information, and hence tells us how much information a given set of data contains regarding another set. It will be seen that, as simple as this model is, it accurately predicts the compression gains, and correlates well with the experimental results.

Specifically, we regard a hyperspectral image as a set of bands $X_i$, with $i = 1, \ldots, N_B$, each band being an array of pixels at spatial coordinates $(m,n)$, with $m = 1, \ldots, N_L$ and $n = 1, \ldots, N_P$, i.e. $X_i = \{x_{m,n,i}\}$; here $N_B$, $N_L$ and $N_P$ are the numbers of bands, lines and pixels per line. The pixels in each band are modeled as occurrences of an identically distributed random process $X_i$. Since the data generated by most hyperspectral sensors are integers represented on 16 bits per pixel (bpp), we assume that $X_i$ is a discrete-valued random process that takes on values in the alphabet $\mathcal{A} = \{0, \ldots, M-1\}$ with cardinality $M = 2^{16}$. 16-bit signed integers can be accommodated taking $\mathcal{A} = \{-2^{15}, \ldots, 2^{15}-1\}$, or adding $2^{15}$ to each sample. The first-order statistical properties are defined through the probabilities $p_i(a) = \Pr(X_i = a)$, $a \in \mathcal{A}$. The average amount of information contained in band $X_i$ is defined [25] as the entropy

$$H(X_i) = -\sum_{a \in \mathcal{A}} p_i(a) \log_2 p_i(a);$$

$H(X_i)$ is also the minimum bit-rate required for lossless compression of $X_i$ using an ideal entropy coder that exploits the first-order source statistics.

The conditional entropy of band $X_i$ given band $X_j$ is defined as

$$H(X_i | X_j) = -\sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{A}} p_{i,j}(a,b) \log_2 p_{i|j}(a|b),$$

where $p_{i,j}(a,b)$ and $p_{i|j}(a|b)$ are respectively the joint and conditional probabilities $\Pr(X_i = a, X_j = b)$ and $\Pr(X_i = a | X_j = b)$. The conditional entropy $H(X_i | X_j)$ represents the uncertainty left about $X_i$ when $X_j$ is known, and is the minimum achievable bit-rate for lossless compression of $X_i$ by a coder that compresses $X_i$ exploiting first-order statistical information from $X_j$ (e.g., by predicting $X_i$ from $X_j$). By definition, the joint entropy of $X_i$ and $X_j$ is equal to

$$H(X_i, X_j) = H(X_j) + H(X_i | X_j).$$

This formula mimics the compression procedure through which a reference band $X_j$ is coded with no prediction at rate approximately equal to $H(X_j)$, and then band $X_i$ is coded with prediction from $X_j$ at rate $H(X_i | X_j)$, using a total rate equal to $H(X_i, X_j)$.

The entropy can be estimated using the occurrence frequency of each value in $\mathcal{A}$, within a given band, as an estimate of the probability $p_i(a)$ for that band. Analogously, the conditional entropy can be estimated from the joint probability $p_{i,j}(a,b)$, which can be obtained counting the occurrences of all pairs $(a,b)$ in $(X_i, X_j)$, and the marginal probability $p_j(b)$.

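Conditional entropy can be easily extended to the case of multiple bands used for prediction, which is the scenario of interest in this paper. In particular, we define the set of $N$ reference bands as the vector $\mathbf{Y}_i$. For simplicity, and without loss of generality, we assume that band $X_i$ is coded using the $N$ previous bands for prediction, i.e. $\mathbf{Y}_i = (X_{i-1}, \ldots, X_{i-N})$; the case of band reordering can be obtained through a straightforward modification in the definition of $\mathbf{Y}_i$. The conditional entropy can be written as

$$H(X_i | \mathbf{Y}_i) = -\sum_{a} \sum_{\mathbf{b}} p(a, \mathbf{b}) \log_2 p(a | \mathbf{b}),$$

with $p(a, \mathbf{b}) = \Pr(X_i = a, \mathbf{Y}_i = \mathbf{b})$ and $p(a | \mathbf{b}) = \Pr(X_i = a | \mathbf{Y}_i = \mathbf{b})$. The conditional entropy can also be written as

$$H(X_i | X_{i-1}, \ldots, X_{i-N}) = H(X_i, X_{i-1}, \ldots, X_{i-N}) - H(X_{i-1}, \ldots, X_{i-N}).$$

This provides a simple algorithm for its estimation; it is sufficient to estimate the two joint entropies above by counting the occurrence frequency of each $(N+1)$-tuple and $N$-tuple in the sets $(X_i, X_{i-1}, \ldots, X_{i-N})$ and $(X_{i-1}, \ldots, X_{i-N})$, respectively. Estimates of their occurrence probabilities can be obtained as the corresponding relative frequencies; then, the two joint entropies $H(X_i, X_{i-1}, \ldots, X_{i-N})$ and $H(X_{i-1}, \ldots, X_{i-N})$ follow from the entropy definition. Note that an $(N+1)$-tuple in $(X_i, X_{i-1}, \ldots, X_{i-N})$ is simply a spectral vector of the original data cube.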
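The estimation procedure just described (two joint entropies obtained by counting tuples, then a difference) can be sketched as follows. This is a minimal illustration rather than the code used for the experiments; the function names `joint_entropy` and `conditional_entropy` are hypothetical, and bands are assumed to be NumPy integer arrays of equal shape.

```python
from collections import Counter
import math

import numpy as np

def joint_entropy(bands):
    """Entropy (bits per pixel) of the tuples of co-located samples of `bands`."""
    flat = [np.asarray(b).ravel().tolist() for b in bands]
    counts = Counter(zip(*flat))          # occurrence count of each distinct tuple
    total = sum(counts.values())          # number of pixels in a band
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(current, references):
    """H(current | references) = H(current, references) - H(references)."""
    if not references:
        return joint_entropy([current])   # no conditioning: plain entropy
    return joint_entropy([current, *references]) - joint_entropy(references)
```

With full 16-bit samples and several reference bands, almost every tuple occurs only once, so the estimate degenerates; this is one way to see why a smaller alphabet is needed.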

In practice, while the entropies $H(X_i, X_{i-1}, \ldots, X_{i-N})$ and $H(X_{i-1}, \ldots, X_{i-N})$ can be estimated as described above, the estimates become very inaccurate when two or more previous bands are used for prediction. The reason is that, as the entropy is conditioned to multiple bands, the set $(X_i, X_{i-1}, \ldots, X_{i-N})$ takes on values in the alphabet $\mathcal{A}^{N+1}$, whose size can become extremely large, i.e. $M^{N+1} = 2^{16(N+1)}$. As a consequence, a band does not contain enough pixels to provide statistically meaningful estimates of the joint probabilities of the $(N+1)$-tuples and $N$-tuples. To address this issue, we choose to not consider each hyperspectral band $X_i$ as a source that takes on values in an alphabet with $M = 2^{16}$ symbols, but rather as a set of 16 binary sources. These sources are taken as the “bit-planes” of $X_i$, i.e. the set of all bits of the same weight in the binary representation of the pixels of $X_i$; in particular, the $b$-th bit-plane $B_{i,b}$ (with $b = 0, \ldots, 15$) is equal to the array of bits of weight $2^b$ of $X_i$, i.e. $B_{i,b} = (X_i \gg b)\ \&\ \texttt{0x1}$, where $\gg$ denotes right-shift and $\&$ denotes bit-wise logical AND. A few remarks have to be made regarding the use of binary as opposed to $M$-ary sources.



Using binary sources greatly reduces the alphabet size; therefore, it is possible to obtain accurate entropy estimates and to use them to predict the potential gain of multiband prediction. However, the absolute value of the entropy of the binary source is not very representative of the actual bit-rate obtained by a practical coder. This is because the entropy of a binary source is a good indicator of the performance of a set of 16 ideal binary entropy coders that would compress each bit-plane separately. However, an $M$-ary entropy coder will achieve a smaller bit-rate, since it does not neglect the statistical dependencies among different bit-planes. In practice, although a few algorithms employ binary arithmetic coding of the bit-planes (e.g., JPEG 2000), most lossless and near-lossless compression algorithms, such as CALIC and JPEG-LS [26], use $M$-ary entropy coders applied to the prediction error. As a consequence, these algorithms are expected to outperform the entropy estimates based on binary entropy coding. Nevertheless, the main interest here is not to evaluate the absolute performance, but rather to estimate the potential gain, i.e. the performance difference, which is related to the difference of the entropies obtained using single-band and multiband prediction. If one assumes that the performance gap between binary and $M$-ary coding varies slowly in the spectral domain, the performance difference is almost the same for the binary and $M$-ary source. It will be seen in Sect. V that the practical achieved performance gains are close to the estimates obtained using the binary conditional entropy.

It should also be noted that the binary conditional entropy of band $X_i$ given one or more previous bands depends on the mean-squared error between band $X_i$ and the previous bands. This is due to the fact that we are looking at the correlation of bit-planes of equal weight, which is maximized when the number of different bits in a bit-plane of $X_i$ and the bit-plane of same weight in a previous band is minimized. Prediction stages typically compensate for this by modeling the current pixels as being equal to the co-located pixel in the previous band except for an offset and gain (see e.g. CALIC [23]). The linear model is justified by the fact that, when the bands are highly correlated as in the hyperspectral case, this assumption becomes very accurate. We also do the same for the estimation of the conditional entropy. We consider a model in which, instead of predicting band $X_i$ from the set $(X_{i-1}, \ldots, X_{i-N})$, $X_i$ is estimated from the modified set $(X'_{i-1}, \ldots, X'_{i-N})$, containing the modified bands $X'_{i-k} = \alpha_k X_{i-k} + \beta_k$, with $k = 1, \ldots, N$, where the coefficients $\alpha_k$ and $\beta_k$ minimize the mean-squared error $\|X_i - X'_{i-k}\|^2$. In the following this is referred to as correlation model “M1”. We also consider a case where $\alpha_k$ and $\beta_k$ are not constant for each band, but are constant over non-overlapping 8x8 blocks of pixels of each band, and minimize the mean-squared error over each block separately, providing an increased degree of spatial adaptivity. This will be referred to as correlation model “M2”.

Finally, it must be remarked that this statistical model of the prediction of band $X_i$

from multiple previous bands only accounts for spectral, and not spatial correlation. This is not an issue, since we are mainly interested in estimating the performance gain achievable through better exploitation of the spectral correlation; a practical compression algorithm would obtain an even smaller bit-rate, as the spatial decorrelation stage will remove most of the redundancy left after the spectral decorrelation. In fact, several recent compression algorithms, such as those in [2], [6], [9], show that explicit spatial decorrelation is not always necessary to achieve good performance, if the compression codebooks are properly designed.

The algorithm used to estimate the conditional entropy is described using pseudocode in box 1. We denote as $N$ the desired number of bands to be used for prediction, and as $B'_{i-k,b}$ and $B_{i,b}$ the bit-planes extracted from those bands and from the original band. The joint entropy function is described in box 2; for brevity, we do not describe the other auxiliary functions, as they are very simple.

The estimated conditional entropies $H_N$ are reported in Tab. I, where $H_N$ is the entropy conditioned to $N$ previous bands for prediction. $H_N$ is computed as the sum of the entropies of all 16 bit-planes, conditioned to the bit-planes of same weight in the previous bands, extracted using correlation model M1 or M2; $H_N$ is the average entropy over all bands for a given scene. For each band, only the previous bands that are actually available have been used; for example, the first band is not predicted (and hence the entropy is not conditioned), while the second band is predicted only from the first band even if $N > 1$, and so on. The case $N = 0$ refers to the average entropy of all bands when no spectral prediction is carried out. The last row ($\infty$) reports the estimated bit-rate gain when all previous bands are used for prediction; this is computed assuming that the gain for $N > 5$ is negligible.

Algorithm 1 Average conditional entropy estimation
  Set N to the desired number of previous bands to be used for prediction. Set H_N = 0.
  for each band i do
    for k = 1, ..., min(N, i-1) do
      X'_{i-k} = band X_{i-k} modified according to correlation model M1 or M2
    end for
    for b = 0, ..., 15 do
      B_{i,b} = (X_i >> b) & 0x1
      for k = 1, ..., min(N, i-1) do
        B'_{i-k,b} = (X'_{i-k} >> b) & 0x1
      end for
      H1 = joint entropy of (B'_{i-1,b}, ..., B'_{i-N,b})
      H2 = joint entropy of (B_{i,b}, B'_{i-1,b}, ..., B'_{i-N,b})
      if at least one previous band is used then
        H_N = H_N + H2 - H1
      else
        H_N = H_N + H2
      end if
    end for
  end for
  Divide H_N by the number of bands.
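The bit-plane estimate described above can be sketched for a single band as follows. This is a hedged illustration, not the implementation behind Tab. I: it assumes non-negative 16-bit samples stored as `cube[band, line, pixel]`, implements only the per-band gain/offset fit of model M1, and all function names are hypothetical.

```python
import numpy as np

def m1_modified_band(current, previous):
    """Model M1 sketch: one least-squares gain/offset per band, rounded to integers."""
    x = current.astype(np.float64).ravel()
    y = previous.astype(np.float64).ravel()
    var_y = y.var()
    alpha = 0.0 if var_y == 0 else np.mean((x - x.mean()) * (y - y.mean())) / var_y
    beta = x.mean() - alpha * y.mean()
    fitted = np.rint(alpha * previous.astype(np.float64) + beta)
    return np.clip(fitted, 0, 2**16 - 1).astype(np.int64)

def bitplane(band, b):
    """Array of the bits of weight 2**b."""
    return (band >> b) & 0x1

def joint_entropy_bits(planes):
    """Joint entropy of co-located bits, packing each bit tuple into one integer."""
    code = np.zeros(planes[0].size, dtype=np.int64)
    for p in planes:
        code = (code << 1) | p.ravel()
    counts = np.bincount(code)
    prob = counts[counts > 0] / code.size
    return float(-(prob * np.log2(prob)).sum())

def conditional_entropy_bitplanes(cube, i, n_prev, n_bits=16):
    """Sum over bit-planes of the entropy of band i given up to n_prev previous bands."""
    cur = cube[i].astype(np.int64)
    k_max = min(n_prev, i)                # only previous bands that actually exist
    refs = [m1_modified_band(cur, cube[i - k].astype(np.int64))
            for k in range(1, k_max + 1)]
    total = 0.0
    for b in range(n_bits):
        ref_planes = [bitplane(r, b) for r in refs]
        h2 = joint_entropy_bits([bitplane(cur, b)] + ref_planes)
        h1 = joint_entropy_bits(ref_planes) if ref_planes else 0.0
        total += h2 - h1
    return total
```

Averaging `conditional_entropy_bitplanes` over all bands of a scene would give a quantity analogous to the per-scene averages discussed in the text, under the assumptions listed above.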

Algorithm 2 Joint entropy estimation for a set of bit-planes
  Initialize an empty table of distinct bit vectors; set S = 0.
  for each pixel position do
    Verify whether the vector of co-located bits at this position is equal to an existing entry of the table.
    if the condition above is true then
      Increment the occurrence count of that entry.
    else
      Set S = S + 1, store the vector as entry S, and set its occurrence count to 1.
    end if
  end for
  Set H = 0.
  for s = 1, ..., S do
    H = H - p_s log2 p_s, with p_s the occurrence count of entry s divided by the number of pixels
  end for
  Return H

TABLE I
CONDITIONAL ENTROPIES $H_N$ (BPP) COMPUTED USING MODEL M1 AND M2, FOR VARIOUS VALUES OF $N$, OVER THE AVIRIS, HYDICE AND HYMAP SCENES.

          Cuprite       Jasper Ridge  Moffett Field Lunar Lake    Terrain       Urban         Hymap
N         M2     M1     M2     M1     M2     M1     M2     M1     M2     M1     M2     M1     M2     M1
0         9.53   9.53   9.51   9.51   9.91   9.91   9.96   9.96   8.58   8.58   8.32   8.32   8.19   8.19
1         6.32   6.47   6.33   6.62   6.48   7.06   6.47   6.69   5.54   5.90   5.44   5.83   5.25   5.57
2         6.04   6.27   6.16   6.51   6.30   6.96   6.18   6.51   5.35   5.75   5.28   5.70   5.06   5.47
3         5.94   6.22   6.10   6.48   6.25   6.92   6.07   6.46   5.28   5.67   5.22   5.64   5.00   5.44
4         5.88   6.19   6.08   6.46   6.22   6.89   6.01   6.43   5.23   5.63   5.18   5.60   4.97   5.41
5         5.86   6.17   6.06   6.44   6.21   6.87   5.98   6.41   5.21   5.59   5.16   5.58   4.95   5.39
∞ (gain)  0.46   0.30   0.27   0.18   0.27   0.19   0.49   0.18   0.33   0.31   0.28   0.25   0.30   0.18

The results shown in Tab. I exhibit a very consistent behavior. We first discuss model M1, which is the less accurate prediction model. For the four AVIRIS scenes, this model anticipates that the gain of using one previous band for prediction ($N = 1$), as opposed to performing no spectral prediction, varies between 2.85 and 3.27 bpp, and is slightly less for HYDICE and HYMAP scenes. This gain is known to be overestimated, as practical differences are about 1.8 bpp [10]. The reason lies in the overestimation of the bit-rate due to the use of a binary model, which is especially strong when no spectral prediction is made ($N = 0$). When one more band is used for prediction (i.e., $N = 2$), the additional gain is between 0.1 and 0.2 bpp. In this case the model provides a reasonably accurate estimate, which is consistent with the results in [10]. Adding more bands yields further gains of about 0.04, 0.02 and 0.01 bpp for $N$ equal to 3, 4 and 5. The gain tends to decrease as $N$ increases, because bands that are further away in the wavelength domain are generally less correlated, and because the amount of innovation they provide becomes small as the set of bands used for prediction gets large.

When model M2 is used, the prediction is more accurate. For AVIRIS scenes, the gain from $N = 1$ to $N = 2$ is between 0.17 and 0.29 bpp, which is very close to the results in [10], and validates the proposed model for the estimation of the multiband bit-rate gain. Adding more bands yields further gains of about 0.06, 0.04 and 0.02 bpp for $N$ equal to 3, 4 and 5. While this behavior is essentially the same as model M1, it can be seen that a better predictor yields a larger potential gain. The model foresees that, for lossless compression, the asymptotic gain of using all previous bands for prediction ($N = \infty$), as opposed to using only the previous band, would be about 0.3 bpp.

It is interesting to note that the results provided by model M2 depend only weakly on the block size. E.g., for Cuprite, the following considerations can be made. If we consider 8x8 blocks, Tab. I reports, for $N = 1$ and $N = 3$, conditional entropies equal to 6.32 bpp and 5.94 bpp, with an expected gain of 0.38 bpp. Using 16x16 blocks, the entropies become 6.34 bpp and 5.98 bpp, with an expected gain of 0.36 bpp. With 4x4 blocks, the entropies are 6.25 bpp and 5.85 bpp, with a gain of 0.40 bpp. As the block size decreases, the entropies become smaller because the prediction improves. However, the differences in expected gain are minor.

It is also interesting to look at the entropies and conditional entropies of each bit-plane, instead of the sum of their values over all bit-planes; these are shown in Fig. 2 for the Cuprite scene, using correlation model M2. As expected, the three least significant bit-planes are very noisy, have near-maximum entropy, and are almost uncorrelated with the co-located bit-planes in the previous bands. This is consistent with other existing algorithms, in which entropy coding of bit-planes can be switched off for the least significant ones [16], [27]. The fourth bit-plane is weakly correlated, while the other most significant bit-planes exhibit significant rate and correlation. Note that the three most significant bit-planes are very correlated, but their entropy is almost zero because symbol “0” occurs with probability almost 1.

IV. PROPOSED ALGORITHM

The objective of the proposed algorithm is to improve the CALIC predictor by taking into account information from previous bands. This is done not through explicit multiband prediction, but by employing a prediction process that updates statistical image parameters at every new band, keeping memory of the statistics of some previous bands. The predictor is based on the idea of continuously updating some state variables as a new band is coded. This process is causal, so that no ancillary information has to be sent to the decoder, which can repeat exactly the same computations. Since the predictor is obtained using a discrete Kalman filter [28], in the following we briefly review its basic concept.

Fig. 2. Entropy $H_0$ and conditional entropy $H_N$ ($N = 1, \ldots, 5$) for each bit-plane, using correlation model M2. The least significant bit-plane has index 0. (Axes: entropy in bpp versus bit-plane index.)

A. A brief review of Kalman filtering



The Kalman filter is employed to estimate the non-measurable components of the state $x_k \in \mathbb{R}^n$, with $\mathbb{R}$ the set of real numbers, of a discrete-time random process described by the following linear stochastic difference equation:

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1} \qquad (2)$$

using a measurement $z_k \in \mathbb{R}^m$ such that $z_k = H x_k + v_k$. The random variables $w_k$ and $v_k$ represent the process and measurement noise; they are assumed to be independent of each other, white, normally distributed, with zero mean value, and covariance matrices $Q$ and $R$ respectively. $A$ is an $n \times n$ matrix describing the transition of the state from step $k-1$ to step $k$. The matrix $B$ describes the relation between an optional control input $u_k$ (which we will not use) and the state $x_k$. The matrix $H$ relates the state to the measurement. It should be noted that the matrices of the model could also depend on $k$.

The Kalman filter computes an a posteriori estimate $\hat{x}_k$ of the state $x_k$ as a linear combination of an a priori estimate $\hat{x}_k^-$ and the difference between the measurement $z_k$ and the measurement prediction $H \hat{x}_k^-$, as $\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-)$, where $K_k$ is called Kalman gain, and the a priori estimate $\hat{x}_k^-$ is obtained from the a posteriori estimate at the previous step as

$$\hat{x}_k^- = A \hat{x}_{k-1}. \qquad (3)$$

The Kalman gain is chosen so as to minimize the covariance $P_k$ of the prediction error $x_k - \hat{x}_k$, through the following set of equations:

$$P_k^- = A P_{k-1} A^T + Q \qquad (4)$$
$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} \qquad (5)$$
$$P_k = (I - K_k H) P_k^- \qquad (6)$$

where it can be noted that the Kalman gain $K_k$ is adapted at each step $k$ depending on the previous state.

It should be noted that other filtering structures could also be used. For example, as has been seen in Tab. I, the additional information provided by bands further away than the fifth previous band is very small. Therefore, one could employ an adaptive filter of order five instead of a Kalman filter. If a 5-band predictor were used, the performance would arguably be very similar to the Kalman filter. While an adaptive filter is generally less complex than a Kalman filter, as will be seen in the following, we employ a very simple Kalman filter.

B. Proposed algorithm: prediction through Kalman filtering

The proposed algorithm is based on the use of a Kalman filter for spectral prediction; in the following we refer to this algorithm as the Kalman spectral prediction (KSP) algorithm. Several remarks can be made regarding the application of the Kalman filtering framework to this compression problem; these remarks help us identify the required characteristics of the filter, and drive its design.
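As a complement to the review in Sect. IV-A, one cycle of the standard discrete Kalman recursion (a priori estimate, a priori error covariance, gain, a posteriori update) can be sketched as below. This is the generic textbook filter without control input, not the simplified scalar filter of the KSP algorithm; the function name `kalman_step` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def kalman_step(x_post, P_post, z, A, H, Q, R):
    """One predict/update cycle of a discrete Kalman filter (no control input).

    Returns the a priori estimate, the a posteriori estimate, the updated
    error covariance and the Kalman gain.
    """
    x_prior = A @ x_post                                   # a priori state estimate
    P_prior = A @ P_post @ A.T + Q                         # a priori error covariance
    S = H @ P_prior @ H.T + R                              # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)                   # Kalman gain
    x_new = x_prior + K @ (z - H @ x_prior)                # a posteriori state estimate
    P_new = (np.eye(P_prior.shape[0]) - K @ H) @ P_prior   # a posteriori covariance
    return x_prior, x_new, P_new, K
```

In the scalar case all matrices are 1x1, so the inversion reduces to a division; this is the kind of simplification that keeps the complexity of a set of scalar filters low.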

First, it should be noted that a Kalman filter can be used in more than one way. In [29] it is shown that the Kalman filter can be seen as a recursive filter, by letting the measurement matrix $H$ be the hyperspectral image pixels, and the a posteriori estimation of the state be the coefficients of a recursive least-squares filter. In this paper, however, we take a different approach. In particular, we take the decoder’s point of view; for the decoder, the current pixel is unknown and can thus be regarded as the unknown state. On the other hand, a predictor can become the measurement, which is noisy as the prediction error tends to become white if the predictor is “good”. Therefore, we define the current pixel as the unknown state $X_j$, and the a posteriori estimate $\hat{X}_j$ as predictor; as measurement, we employ another “external” predictor. As will be seen in the following, this approach allows us to employ a simplified, lower-complexity version of the Kalman filter.

Moreover, the linear model is not always adequate for prediction. For example, it is known that non-linear predictors are typically better at capturing the spatial correlation in natural images, as they contain both edges and flat regions, which cannot be accurately described by a single linear predictor [23], [26]. On the other hand, a linear model is almost always adequate for spectral prediction, because of the high degree of correlation between adjacent bands; however, the parameters of the linear model should be adapted for each band, as in [24].

In our compression problem, the state to be estimated is the vector of pixels $X_j$ in band $j$ of a hyperspectral image. In (2) a given pixel of band $j$ is related with all the pixels of band $j-1$ through the state transition matrix $A$. That is, besides describing the spectral correlation between a given pixel in band $j$ and the co-located pixel in band $j-1$, $A$ also accounts for some spatial correlation. On the other hand, as has been said, a linear spatial correlation model is not very well suited to image data; moreover, the matrix formulation of the Kalman filter has high complexity. These remarks have led us to design a predictor which consists of a set of scalar Kalman filters, each of which predicts a pixel $X_j(x,y)$ at given spatial coordinates $(x,y)$; as will be seen shortly, this does not mean that spatial prediction is not performed. Instead, spatial prediction is exploited in the a priori estimation stage, i.e. the computation of $\hat{X}_j^-(x,y)$.

It should also be noted that, in this setting, the previous bands are not used explicitly for prediction, as would occur with an adaptive filter, but are rather used implicitly in (4), (5) and (6), where the statistics used in the Kalman gain are computed from the pixel values of all the previous bands.

Fig. 3. Block diagram of the proposed algorithm. Blocks: 3D-CALIC interband predictor (measurement $z_j(x,y)$), computation of $\alpha_j(x,y)$, a priori estimate, Kalman gain $K_j(x,y)$, predictor $\hat{X}_j(x,y)$, and Kalman filter buffer $P_j(x,y)$; the memory holds bands $j$ and $j-1$ together with the predictors $C_j(x,y)$ and $C_{j-1}(x,y)$ for those bands.

Specifically, the compression algorithm works as follows. We wish to predict the pixel of spatial coordinates $(x,y)$ in band $j$, i.e. $X_j(x,y)$. The value of this pixel is modeled as the unknown state of a Kalman filter such that:

$$X_j(x,y) = \alpha_j(x,y)\, X_{j-1}(x,y) + w_j(x,y)$$

with $\alpha_j(x,y)$ a suitable scalar number, and $w_j(x,y)$ an independent and identically distributed (i.i.d.) zero-mean Gaussian process. The prediction $\hat{X}_j(x,y)$ is obtained through the Kalman filter equation:

$$\hat{X}_j(x,y) = \hat{X}_j^-(x,y) + K_j(x,y) \left[ z_j(x,y) - \hat{X}_j^-(x,y) \right] \tag{7}$$

where we have taken $H = 1$, i.e. we assume that the measurement is equal to the true pixel value plus additive white Gaussian noise.

We still need to provide an a priori estimate $\hat{X}_j^-(x,y)$ and a measurement $z_j(x,y)$. Consistently with (3), the a priori estimate is taken in linear form, i.e. $\hat{X}_j^-(x,y) = \alpha_j(x,y)\, X_{j-1}(x,y)$. The coefficient $\alpha_j(x,y)$, which models the correlation between a pixel in band $j$ and the co-located pixel in band $j-1$, is taken as

$$\alpha_j(x,y) = \frac{X_j(x-1,y) + X_j(x,y-1)}{X_{j-1}(x-1,y) + X_{j-1}(x,y-1)} \tag{8}$$

Other formulas could also be used for $\alpha_j(x,y)$, see e.g. [30]. As can be seen, $\alpha_j(x,y)$ is the ratio of two causal estimators of $X_j(x,y)$ and $X_{j-1}(x,y)$ respectively, based on the left and top pixels in bands $j$ and $j-1$. Therefore, it can be seen that the Kalman filters employed by the algorithm, one for each pixel $(x,y)$, are scalar, but do consider the spatial correlation of the image. Indeed, the definition of $\alpha_j(x,y)$ involves the pixel values at spatial coordinates other than $(x,y)$.
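The computation of the coefficient can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that the two causal estimators are the sums of the left and top neighbours in bands $j$ and $j-1$, and the function name `alpha_map`, the border handling (use whichever neighbour exists) and the fallback value `default` are hypothetical choices that the text does not specify.

```python
def alpha_map(band_cur, band_prev, default=1.0):
    """Return alpha_j(x, y) for every pixel of band j.

    band_cur and band_prev are lists of rows (band[y][x]) holding bands j and j-1.
    Only the left (x-1, y) and top (x, y-1) neighbours are used, so the decoder
    can repeat the computation from already decoded pixels.
    """
    height, width = len(band_cur), len(band_cur[0])
    alpha = [[default] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num = den = 0
            if x > 0:  # left neighbour
                num += band_cur[y][x - 1]
                den += band_prev[y][x - 1]
            if y > 0:  # top neighbour
                num += band_cur[y - 1][x]
                den += band_prev[y - 1][x]
            # Assumption: keep the default when no neighbour exists or the
            # denominator is zero.
            if den != 0:
                alpha[y][x] = num / den
    return alpha


if __name__ == "__main__":
    prev = [[10, 20], [30, 40]]
    cur = [[11, 22], [33, 44]]  # band j is 1.1 times band j-1
    for row in alpha_map(cur, prev):
        print(row)
```

Because only causal neighbours enter the ratio, no side information about $\alpha_j(x,y)$ has to be transmitted.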

Since a linear model is not always effective, we would like the predictor to also comprise some nonlinear component; however, the Kalman filtering framework constrains the a priori estimate $\hat{X}_j^-(x,y)$ to be a linear function of the previous band. We overcome this with a proper definition of the measurement $z_j(x,y)$, which is taken as a non-linear function. Specifically, we take $z_j(x,y) = C_j(x,y)$, where $C_j(x,y)$ is the predictor of pixel $X_j(x,y)$ obtained using 3D-CALIC as described in Sect. II-A. For each Kalman filter, the Kalman gain $K_j(x,y)$ is computed using a scalar version of (4), (5) and (6), and specifically:

$$P_j^-(x,y) = \alpha_j^2(x,y)\, P_{j-1}(x,y) + \sigma^2_{w,j-1}(x,y) \tag{9}$$

$$K_j(x,y) = \frac{P_j^-(x,y)}{P_j^-(x,y) + \sigma^2_{v,j-1}(x,y)} \tag{10}$$

$$P_j(x,y) = \left[ 1 - K_j(x,y) \right] P_j^-(x,y) \tag{11}$$

In other terms, first the prediction error covariance at the previous step is used to predict the a priori estimate $P_j^-(x,y)$. Then, the a priori estimate is used to compute the Kalman gain $K_j(x,y)$ for the current step. Finally, the Kalman gain is used to update the a priori estimate $P_j^-(x,y)$ to an a posteriori estimate $P_j(x,y)$ to be used in the next step. The quantities $\sigma^2_{w,j}(x,y)$ and $\sigma^2_{v,j}(x,y)$ are running estimates at step $j$ of the variance of $w$ and $v$, i.e. of $X_j(x,y) - \alpha_j(x,y) X_{j-1}(x,y)$ and $z_j(x,y) - X_j(x,y)$ respectively. Note that, to keep the prediction causal, the computation of $P_j^-(x,y)$ employs $\sigma^2_{w,j-1}(x,y)$, which requires an initial value $\sigma^2_{w,0}(x,y)$.

The procedure described above is used to compute a prediction $\hat{X}_j(x,y)$ of each pixel. Then, the prediction error is computed as $e_j(x,y) = X_j(x,y) - \hat{X}_j(x,y)$. The prediction errors are coded following the same context definition and arithmetic coding steps in [10].
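As a worked illustration of one step of the scalar filter, consider a single pixel with the following hypothetical values (chosen only for the example, not taken from the experiments): $\alpha_j = 1.1$, $P_{j-1} = 4$, running variance estimates $\sigma^2_w = 2$ and $\sigma^2_v = 3$, previous-band pixel $X_{j-1} = 1000$, measurement $z_j = 1090$, and true pixel value $X_j = 1096$. The spatial coordinates are omitted for brevity.

$$
\begin{aligned}
\hat{X}_j^- &= \alpha_j X_{j-1} = 1.1 \cdot 1000 = 1100 \\
P_j^- &= \alpha_j^2 P_{j-1} + \sigma^2_w = 1.21 \cdot 4 + 2 = 6.84 \\
K_j &= \frac{P_j^-}{P_j^- + \sigma^2_v} = \frac{6.84}{9.84} \approx 0.695 \\
P_j &= (1 - K_j) P_j^- \approx 0.305 \cdot 6.84 \approx 2.09 \\
\hat{X}_j &= \hat{X}_j^- + K_j (z_j - \hat{X}_j^-) \approx 1100 + 0.695 \cdot (-10) \approx 1093.05 \\
e_j &= X_j - \hat{X}_j \approx 1096 - 1093.05 = 2.95
\end{aligned}
$$

The gain lies between 0 and 1, so the prediction is a weighted average of the linear a priori estimate and the non-linear measurement; the larger the measurement noise variance, the more the filter trusts the a priori estimate.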

Algorithm 3 KSP algorithm

for all pixels $X_j(x,y)$ do
    Compute $\alpha_j(x,y)$ using (8).
    Compute the a priori estimate $\hat{X}_j^-(x,y) = \alpha_j(x,y)\, X_{j-1}(x,y)$.
    Compute $z_j(x,y)$ applying the 3D-CALIC predictor.
    Compute $P_j^-(x,y)$ using (9).
    Compute the Kalman gain $K_j(x,y)$ using (10).
    Compute $P_j(x,y)$ using (11); $P_j(x,y)$ will be used at step $j+1$.
    Compute the predicted value $\hat{X}_j(x,y)$ using (7).
    Compute the prediction error $e_j(x,y) = X_j(x,y) - \hat{X}_j(x,y)$.
    Update the estimates $\sigma^2_w$ and $\sigma^2_v$ with the error terms $X_j(x,y) - \alpha_j(x,y) X_{j-1}(x,y)$ and $z_j(x,y) - X_j(x,y)$ respectively.
    Perform context-based entropy coding of $e_j(x,y)$.
end for
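The loop above can be sketched for a single spatial location followed across the bands. This is a minimal sketch under stated assumptions, not the author's code: the class and function names are hypothetical, the coefficients $\alpha_j$ and the measurements $z_j$ are passed in as plain lists (a real codec would obtain them from the neighbouring pixels and from the 3D-CALIC predictor), the running variances are kept as cumulative means of squared errors, the initial values are arbitrary, the prediction is rounded to obtain integer residuals, and entropy coding is omitted.

```python
class ScalarKalmanPredictor:
    """Scalar Kalman predictor for one spatial location, stepped band by band."""

    def __init__(self, p0=1.0, var_w0=1.0, var_v0=1.0):
        # Initial covariance and noise variances: illustrative assumptions.
        self.p = p0
        self.var_w = var_w0
        self.var_v = var_v0
        self.count = 1  # samples behind each running variance (initial value included)

    def predict(self, x_prev, alpha, z):
        x_prior = alpha * x_prev                        # a priori estimate
        p_prior = alpha * alpha * self.p + self.var_w   # a priori covariance
        gain = p_prior / (p_prior + self.var_v)         # scalar Kalman gain
        self.p = (1.0 - gain) * p_prior                 # a posteriori covariance
        return round(x_prior + gain * (z - x_prior))    # rounded prediction

    def update(self, x_true, x_prev, alpha, z):
        # Running (cumulative mean) estimates of the two noise variances.
        self.count += 1
        self.var_w += ((x_true - alpha * x_prev) ** 2 - self.var_w) / self.count
        self.var_v += ((z - x_true) ** 2 - self.var_v) / self.count


def encode(pixels, alphas, zs):
    """Return the integer prediction residuals for bands 1..n-1."""
    kf, residuals = ScalarKalmanPredictor(), []
    for j in range(1, len(pixels)):
        pred = kf.predict(pixels[j - 1], alphas[j], zs[j])
        residuals.append(pixels[j] - pred)
        kf.update(pixels[j], pixels[j - 1], alphas[j], zs[j])
    return residuals


def decode(first_pixel, residuals, alphas, zs):
    """Mirror of encode(): rebuild the pixel values from the residuals."""
    kf, pixels = ScalarKalmanPredictor(), [first_pixel]
    for j, res in enumerate(residuals, start=1):
        pred = kf.predict(pixels[j - 1], alphas[j], zs[j])
        pixels.append(res + pred)
        kf.update(pixels[j], pixels[j - 1], alphas[j], zs[j])
    return pixels


if __name__ == "__main__":
    pixels = [1000, 1100, 1210, 1180]   # one location across four bands
    alphas = [None, 1.1, 1.08, 0.98]    # hypothetical coefficients
    zs = [None, 1090, 1205, 1190]       # hypothetical external predictions
    res = encode(pixels, alphas, zs)
    print(res, decode(pixels[0], res, alphas, zs) == pixels)
```

The decoder runs exactly the same recursion as the encoder, using only quantities it has already reconstructed, which is why the variance estimates are updated after the pixel has been coded.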

The first band, for which no prediction is available, is encoded using the CALIC intraband predictor [23]. The operation sequence of the KSP algorithm is summarized in Algorithm 3. The operations as well as the memory management are sketched in Fig. 3.

C. Context-based arithmetic coding

For the KSP algorithm, we employ the same context-based entropy coding stage used in [10], identical to that of 3D-CALIC, and based on arithmetic coding. The only difference lies in the threshold setting for context selection; the thresholds are set to

&

  ,

&  , & =40

to accommodate 16-bit data, as in [10]. More details can be found in [10], [24].

V. EXPERIMENTAL RESULTS

In a first experiment, the proposed KSP compression algorithm has been compared with several state-of-the-art lossless compression algorithms on the Terrain, Urban and Hymap test images. The algorithms are listed below. It should be noted that some of these algorithms work in band-sequential (BSQ) format, and others in band-interleaved-by-line (BIL), or can use both formats. Since KSP uses the BSQ format, the results reported for these algorithms refer to the BSQ case, unless otherwise stated. The CALIC-based algorithms can use both BSQ and BIL formats; the performance on BIL data is slightly poorer, as reported in [10].

• JPEG-LS [26], the latest ISO standard for 2D lossless and near-lossless compression. This algorithm uses non-linear prediction and Golomb-Rice codes, and provides good compression performance with moderate complexity.

• 2D-CALIC [23], which performs 2D lossless and near-lossless compression using nonlinear prediction and arithmetic coding. It achieves very good compression performance, with relatively high complexity.

• 3D-CALIC [24], which extends 2D-CALIC in such a way that the predictor also performs spectral decorrelation using the previous band as reference.

• M-CALIC [10], which extends 3D-CALIC using a linear combination of the two previous bands as reference.

• The CCSDS lossless data compression standard (CCSDS-Rice [31], now also an ISO standard), which performs lossless data compression using Golomb-Rice codes. We used the optional 1D predictor and mapper to improve compression efficiency. This algorithm provides average performance with very low complexity.

• The JPEG 2000 Part 2 ISO standard [32], which performs 3D lossless compression using integer wavelets. It achieves good performance, with very high complexity.

Tab. II reports the bit-rates achieved by all algorithms on the three test images. Comparing JPEG-LS and 2D-CALIC, it can be seen that the latter consistently outperforms the former, mainly due to the use of an arithmetic coder. The performance difference between 2D-CALIC and 3D-CALIC is significant, and shows the importance of spectral decorrelation in the compression process. JPEG 2000 Part 2 was run with a spectral integer wavelet transform that spans blocks of $B$ bands, with $B$ equal to 8 and 12. The number of decomposition levels was set to five in both the spatial and spectral domains. Its performance is good, but still quite far from 3D-CALIC. The CCSDS-Rice algorithm yields the highest bit-rates, consistently with the fact that it employs a simple one-dimensional decorrelator. The proposed KSP algorithm consistently outperforms all other algorithms. The performance gain with respect to 3D-CALIC is up to 0.15 bpp. KSP outperforms M-CALIC by about 0.07 bpp on average, taking better advantage of the spectral correlation. It should be noted that the gain is slightly smaller than what can be predicted from Tab. I, which ranges from 0.08 to 0.15 bpp between

  and   . Using model M1, the average gain should be 0.12,

so that the algorithm is suboptimal by about 0.05 bpp.

TABLE II
BIT-RATES (BPP) ACHIEVED BY VARIOUS ALGORITHMS FOR LOSSLESS COMPRESSION OF THE Terrain, Urban AND Hymap TEST IMAGES.

Algorithm                  Terrain   Urban   Hymap
JPEG-LS                    6.84      6.80    6.89
2D-CALIC                   6.47      6.37    6.78
3D-CALIC                   4.10      4.06    4.06
JPEG 2000 Part 2 (B=8)     5.71      5.69    5.64
JPEG 2000 Part 2 (B=12)    5.08      5.06    5.20
CCSDS-Rice                 7.52      7.40    7.27
M-CALIC                    4.13      4.11    3.96
KSP                        4.05      4.04    3.91

We have also compared the performance of the KSP algorithm with that of other state-of-the-art techniques for which compression results are available in the literature. For these

comparisons, the full Cuprite, Jasper Ridge, Lunar Lake and Moffett Field images have been used. The following algorithms have been compared with KSP. For their bit-rates, we have used results available in the literature, converting the compression ratios to bit-rates. Again, we are mostly interested in BSQ results, though some of the techniques listed below can use the BIL format, as noted below.

• The JPEG-LS standard.

• Differential JPEG-LS, i.e. JPEG-LS run on the difference between each band and the previous band.

• JPEG 2000 Part 1 [16], which performs 2D lossless compression using integer wavelets, and differential JPEG 2000, i.e. applied to the difference between adjacent bands.

• The LP, SLSQ and SLSQ-HEU prediction-based algorithms proposed in [14]. We report the BSQ results for these algorithms, although the performance loss for using BIL data is minor [14].

• The BH block-based compression algorithm proposed in [13].

• The C-DPCM algorithm proposed in [9].

• The look-up table based algorithm (LUT) proposed in [18]. This algorithm uses the BIL format.

• The ABPCNEF algorithm proposed in [22], as well as the version using band reordering (R-ABPCNEF).

• The spectral relaxation labelled prediction (S-RLP) and spectral fuzzy-matching pursuit (S-FMP) algorithms proposed in [6].

• The adaptive filtering (AF) method proposed in [7], which employs the previous band for prediction, but adapts the predictor coefficients using recursive estimation. Note that, for this method, the last scene of each image is not included.

• The locally averaged interband scaling look-up table (LAIS-LUT) method proposed in [30]. This algorithm uses the BIL format.

• The M-CALIC algorithm [10].

The results are shown in Tab. III. The LAIS-LUT algorithm has the best overall performance. The C-DPCM, S-RLP and S-FMP algorithms are also very close to LAIS-LUT. C-DPCM is known to be one of the most effective lossless compression algorithms for hyperspectral data at the time of this writing; however, its complexity is high. The LUT algorithm also shows very high performance. Interestingly, this algorithm also has low complexity; its look-up tables occupy, e.g., about 0.125 MBytes of memory [18].

With the exception of the algorithms mentioned above, the KSP algorithm ranks best out of all the other algorithms. Interestingly, its performance is better than that of any other prediction- and transform-based algorithm that employs only the previous band for prediction. The average bit-rates of these algorithms range between 5.08 bpp for SLSQ and 5.38 bpp for LP; the bit-rate saving of KSP is hence about 0.16-0.46 bpp, which is very consistent with the gains estimated using the theoretical model and reported in Tab. I. The maximum gain with respect to 3D-CALIC is about 0.3 bpp, almost coincident with the maximum gain estimated using the theoretical model, although on the HYDICE scenes the performance gain is smaller. The performance of R-ABPCNEF comes close to KSP; however, R-ABPCNEF employs band reordering, which adds complexity to the algorithm, and requires that all bands are stored in memory. Although we do not have an implementation of KSP using band reordering, we expect that the performance improvement would be similar to the difference between ABPCNEF and R-ABPCNEF, i.e. about 0.2 bpp.

The performance gain of KSP with respect to M-CALIC is only marginal on the AVIRIS scenes. The entropies in Tab. I estimate a hypothetical gain of about 0.09 bpp using model M1; however, the actual gain is only 0.01 bpp, leading to a slightly larger suboptimality than on the Terrain, Urban and Hymap test images. It should also be noted that some of the M-CALIC parameters (i.e., the linear combination coefficients and some of the thresholds in the context definitions) are optimized for AVIRIS data, hence better performance is expected on these data.
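The figures quoted above can be checked directly against the bit-rates of Tab. III. The short sketch below is only a convenience for the reader: the dictionary and function names are hypothetical, and the values are copied from Tab. III (per-scene rates in the order Jasper, Lunar, Cuprite, Moffett, and the rounded "Average" column).

```python
# Per-scene bit-rates (bpp) from Tab. III: Jasper, Lunar, Cuprite, Moffett.
PER_SCENE = {
    "3D-CALIC": [5.15, 5.20, 5.18, 5.11],
    "KSP": [4.96, 4.90, 4.88, 4.92],
}

# Values of the "Average" column of Tab. III (bpp).
AVERAGE = {"LP": 5.38, "SLSQ": 5.08, "M-CALIC": 4.93, "KSP": 4.92}


def average_saving(reference, proposed="KSP"):
    """Average bit-rate saving of `proposed` over `reference`, in bpp."""
    return round(AVERAGE[reference] - AVERAGE[proposed], 2)


def max_scene_gain(reference, proposed="KSP"):
    """Largest per-scene saving of `proposed` over `reference`, in bpp."""
    gains = [r - p for r, p in zip(PER_SCENE[reference], PER_SCENE[proposed])]
    return round(max(gains), 2)


if __name__ == "__main__":
    print(average_saving("SLSQ"))      # 0.16
    print(average_saving("LP"))        # 0.46
    print(average_saving("M-CALIC"))   # 0.01
    print(max_scene_gain("3D-CALIC"))  # 0.3
```

The savings are computed from the rounded averages of the table, which is how the 0.16-0.46 bpp range in the text is obtained.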

It is also interesting to compare the KSP and M-CALIC results to the AF algorithm. AF uses as “measurement” vector for the current band the set of three samples including the top and left samples in the current band, and the co-located sample in the previous band. The scalar product between the measurement vector and the filter coefficients yields the prediction error coefficients; the measurement vector is then used to update the filter coefficients for the next band. It can be seen that AF is somewhat similar in spirit to KSP, as it updates the coefficients of the spectral (and also spatial) predictor based on past measurements, and employs a scalar filter (one for each band). While AF seems simpler than KSP, though, its compression performance is not as good as that of KSP and M-CALIC. Finally, it should be noted that the most effective algorithm, i.e. LAIS-LUT, only employs the previous band for prediction.

TABLE III
BIT-RATES (BPP) ACHIEVED BY VARIOUS ALGORITHMS FOR LOSSLESS COMPRESSION ON THE COMPLETE AVIRIS IMAGES.

Algorithm         Jasper   Lunar   Cuprite   Moffett   Average
JPEG-LS           8.38     7.48    7.66      8.04      7.89
Diff. JPEG-LS     5.69     5.46    5.50      5.63      5.57
JPEG 2000         8.99     8.16    8.38      8.79      8.58
Diff. JPEG 2000   5.67     5.44    5.48      n.a.      n.a.
2D-CALIC          7.54     6.74    6.86      7.29      7.11
3D-CALIC          5.15     5.20    5.18      5.11      5.16
LP                5.44     5.25    5.28      5.56      5.38
SLSQ              5.08     5.08    5.08      5.10      5.08
SLSQ-HEU          4.97     4.97    4.95      5.00      4.97
BH                5.23     5.11    5.11      5.26      5.18
C-DPCM            4.62     4.75    4.68      4.62      4.67
LUT               4.95     4.71    4.65      5.05      4.84
ABPCNEF           5.18     5.23    5.21      n.a.      n.a.
R-ABPCNEF         5.03     5.06    4.94      n.a.      n.a.
S-RLP             4.65     4.69    4.69      4.67      4.68
S-FMP             4.63     4.66    4.66      4.63      4.65
AF                5.04     4.97    4.95      4.99      4.99
LAIS-LUT          4.68     4.53    4.47      4.76      4.61
M-CALIC           5.00     4.91    4.89      4.92      4.93
KSP               4.96     4.90    4.88      4.92      4.92

The KSP and LUT algorithms have also been compared on five AVIRIS raw scenes. Although, as will be seen in the following, the complexity of the KSP algorithm is not suitable for on-board compression, KSP can be effectively used to archive raw data at the ground station. The results are shown in Tab. IV. As can be seen, the LUT algorithm performs poorly on the raw data, whereas KSP performs well. The fact that the LUT algorithm does not work well on raw data has also been noted in [33]. In that paper, the authors compare the performance of various LUT-based algorithms with that of S-FMP on a MIVIS scene in raw and calibrated format. They find that, on the calibrated scene, LUT outperforms S-FMP by 0.65 bpp, whereas on the raw scene S-FMP outperforms LUT by about 1 bpp. The results in Tab. IV on AVIRIS scenes are similar, with an average performance loss of LUT with respect to KSP of 0.75 and 0.49 bpp on the 16-bit and 12-bit scenes respectively.

TABLE IV
BIT-RATES (BPP) ACHIEVED BY KSP AND LUT FOR LOSSLESS COMPRESSION ON THE AVIRIS RAW IMAGES.

Scene     KSP    LUT
Sc0       6.34   7.13
Sc3       6.16   6.91
Sc10      5.53   6.25
Hawaii    2.84   3.27
Maine     2.90   3.44
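The two average losses quoted in the text follow from the rows of Tab. IV. The grouping of the scenes used below (Sc0, Sc3 and Sc10 as the 16-bit scenes, Hawaii and Maine as the 12-bit scenes) is an inference from the arithmetic rather than something the table states explicitly.

$$
\frac{(7.13 - 6.34) + (6.91 - 6.16) + (6.25 - 5.53)}{3} = \frac{0.79 + 0.75 + 0.72}{3} \approx 0.75 \text{ bpp}
$$

$$
\frac{(3.27 - 2.84) + (3.44 - 2.90)}{2} = \frac{0.43 + 0.54}{2} \approx 0.49 \text{ bpp}
$$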

Finally, we have performed an experiment aimed at verifying whether one could replace the 3D-CALIC predictor with another predictor as measurement in the KSP scheme. In particular, we have attempted to use the LUT predictor [18]. However, this did not work well, and ended up significantly worsening the KSP performance. The problem lies in the different statistical characteristics of the prediction error generated by LUT and 3D-CALIC. In particular, the LUT prediction error has higher variance, but very few outliers. The 3D-CALIC prediction error has a smaller overall variance, but has several outliers that require a lot of bits to be coded, which explains why, on calibrated images, LUT outperforms 3D-CALIC. When one uses the LUT predictor as measurement, the higher error variance makes this measurement worse than the 3D-CALIC predictor. In fact, for KSP the outliers are not a serious problem, as they are smoothed by the Kalman filter. At the same time, using LUT leads to a KSP prediction error with different statistical characteristics than using the 3D-CALIC predictor. Just as the entropy coding stages of LUT and 3D-CALIC employ different statistical models

to account for the statistical characteristics of the respective prediction errors, if one wants to use the LUT predictor as measurement, one needs to properly redesign the KSP entropy coding stage. In summary, this shows that, while the 3D-CALIC predictor is not necessarily the optimal measurement, switching to another measurement should be done carefully.

As for complexity, we have run some of the algorithms on a Pentium IV dual-processor workstation running Linux, and measured the encoding times. The results shown in Tab. V are normalized to the running time of JPEG-LS, and are expressed as multiples of it. As can be seen, besides providing a smaller bit-rate, the KSP algorithm is more than 10% faster than 3D-CALIC. In comparison with other algorithms, in [9] it is stated that C-DPCM is about 20 times more complex than differential JPEG-LS; KSP is about 25 times more complex than JPEG-LS, which probably makes it slightly slower than C-DPCM. Profiling the KSP algorithm reveals that the complexity lies in the interband prediction function (about 54%), arithmetic coding (about 26%), and in the computation of the correlation coefficient (about 16%). The complexity of the scalar Kalman filter is extremely small if compared to the other functions. This is also substantiated by the fact that KSP is slightly faster than 3D-CALIC. For efficient implementation, range coding (see e.g. [18]) or Golomb-Rice coding [31] could be used instead of arithmetic coding. Moreover, it would be useful to approximate the correlation coefficient with a simpler correlation metric.

TABLE V
ALGORITHM RUNNING TIMES (NORMALIZED TO JPEG-LS).

Algorithm    Complexity
JPEG-LS      1
2D-CALIC     5.9
3D-CALIC     28.4
KSP          25.2
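The speed-up with respect to 3D-CALIC follows from the normalized running times of Tab. V:

$$
1 - \frac{25.2}{28.4} \approx 1 - 0.887 = 0.113,
$$

i.e. the KSP encoding time is about 11% shorter than that of 3D-CALIC, in agreement with the "more than 10%" figure quoted in the text.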

VI. CONCLUSIONS

We have proposed a theoretical model that allows one to estimate the compression efficiency gain that can be achieved when multiple bands are exploited for spectral prediction of hyperspectral images. The model employs information-theoretic concepts and, in particular, the conditional entropy between each band and a set of previous bands. It provides an estimate of the gain of multiband prediction of around 0.3 bpp.

Moreover, we have proposed a lossless compression algorithm that employs Kalman filtering using a state space model, and uses the CALIC context-based arithmetic coding engine. The performance of this algorithm is in line with the theoretical expectations, with a maximum gain of about 0.3 bpp with respect to 3D-CALIC; its complexity is about 10% smaller.

REFERENCES

[1] R.E. Roger and M.C. Cavenor, “Lossless compression of AVIRIS images,” IEEE Transactions on Image Processing, vol. 5, no. 5, pp. 713–719, May 1996.
[2] J. Mielikainen, A. Kaarna, and P. Toivanen, “Lossless hyperspectral image compression via linear prediction,” in Proc. SPIE, vol. 4725, 2002.
[3] B. Aiazzi, P. Alba, L. Alparone, and S. Baronti, “Lossless compression of multi/hyperspectral imagery based on a 3-D fuzzy prediction,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, pp. 2287–2294, Sept. 1999.
[4] S.K. Jain and D.A. Adjeroh, “Edge-based prediction for lossless compression of hyperspectral images,” in Proc. IEEE Data Compression Conference, 2007.
[5] B. Aiazzi, L. Alparone, and S. Baronti, “Near-lossless compression of 3-D optical data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 39, no. 11, pp. 2547–2557, Nov. 2001.
[6] B. Aiazzi, L. Alparone, S. Baronti, and C. Lastri, “Crisp and fuzzy adaptive spectral predictions for lossless and near-lossless compression of hyperspectral imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 4, no. 4, pp. 532–536, Oct. 2007.
[7] M. Klimesh, “Low-complexity lossless compression of hyperspectral imagery via adaptive filtering,” in The Interplanetary Network Progress Report, 2005.
[8] M.J. Ryan and J.F. Arnold, “The lossless compression of AVIRIS images by vector quantization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 3, pp. 546–550, May 1997.
[9] J. Mielikainen and P. Toivanen, “Clustered DPCM for the lossless compression of hyperspectral images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 12, pp. 2943–2946, Dec. 2003.
[10] E. Magli, G. Olmo, and E. Quacchio, “Optimized onboard lossless and near-lossless compression of hyperspectral data using CALIC,” IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 1, pp. 21–25, Jan. 2004.
[11] H. Wang, S.D. Babacan, and K. Sayood, “Lossless hyperspectral image compression using context-based conditional averages,” in Proc. of IEEE Data Compression Conference, 2005.
[12] E. Magli, M. Barni, A. Abrardo, and M. Grangetto, “Distributed source coding techniques for lossless compression of hyperspectral images,” EURASIP Journal on Advances in Signal Processing, vol. 2007, 2007.
[13] M. Slyz and L. Zhang, “A block-based inter-band lossless hyperspectral image compressor,” in Proc. of IEEE Data Compression Conference, 2005, pp. 427–436.
[14] F. Rizzo, B. Carpentieri, G. Motta, and J.A. Storer, “Low-complexity lossless compression of hyperspectral imagery via linear prediction,” IEEE Signal Processing Letters, vol. 12, no. 2, pp. 138–141, Feb. 2005.
[15] B. Penna, T. Tillo, E. Magli, and G. Olmo, “Progressive 3D coding of hyperspectral images based on JPEG 2000,” IEEE Geoscience and Remote Sensing Letters, vol. 3, no. 1, pp. 125–129, Jan. 2006.
[16] D.S. Taubman and M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer, 2001.
[17] B. Penna, T. Tillo, E. Magli, and G. Olmo, “Transform coding techniques for lossy hyperspectral data compression,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 5, pp. 1408–1421, May 2007.
[18] J. Mielikainen, “Lossless compression of hyperspectral images using lookup tables,” IEEE Signal Processing Letters, vol. 13, no. 3, pp. 157–160, Mar. 2006.
[19] H. Wang, S.D. Babacan, and K. Sayood, “Lossless hyperspectral-image compression using context-based conditional average,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 12, pp. 4187–4193, Dec. 2007.
[20] S.R. Tate, “Band ordering in lossless compression of multispectral images,” IEEE Transactions on Computers, vol. 46, no. 4, pp. 477–483, 1997.
[21] P. Toivanen, O. Kubasova, and J. Mielikainen, “Correlation-based band-ordering heuristic for lossless compression of hyperspectral sounder data,” IEEE Geoscience and Remote Sensing Letters, vol. 2, no. 1, pp. 50–54, Jan. 2005.
[22] J. Zhang and G. Liu, “An efficient reordering prediction-based lossless compression algorithm for hyperspectral images,” IEEE Geoscience and Remote Sensing Letters, vol. 4, no. 2, pp. 283–287, Apr. 2007.
[23] X. Wu and N. Memon, “Context-based, adaptive, lossless image coding,” IEEE Transactions on Communications, vol. 45, no. 4, pp. 437–444, Apr. 1997.
[24] X. Wu and N. Memon, “Context-based lossless interband compression - extending CALIC,” IEEE Transactions on Image Processing, vol. 9, no. 6, pp. 994–1001, June 2000.
[25] T.M. Cover and J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[26] M.J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS,” IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1309–1324, Aug. 2000.
[27] A. Said and W.A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, June 1996.
[28] R.E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
[29] T.K. Moon and W.C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.
[30] B. Huang and Y. Sriraja, “Lossless compression of hyperspectral imagery via lookup tables with predictor selection,” in Proc. SPIE, vol. 6365, 2006.
[31] Lossless Data Compression, CCSDS-121.0-B-1 Blue Book, May 1997.
[32] JPEG 2000 Part 2 - Extensions, 2004, Document ISO/IEC 15444-2.
[33] L. Santurri, B. Aiazzi, L. Alparone, S. Baronti, and C. Lastri, “On-board lossless hyperspectral data compression: LUT-based or classified spectral prediction?,” in Proc. of On-board Payload Data Compression Workshop, ESTEC, The Netherlands, 2008.