Sparse Representations for Online Learning based Hyperspectral Image Compression

İREM ÜLKÜ,1,* BEHÇET UĞUR TÖREYİN2

1 Department of Electrical and Electronics Engineering, Çankaya University, 06790 Etimesgut, Ankara, Turkey
2 Informatics Institute, İstanbul Technical University, 34469 Maslak, İstanbul, Turkey
*Corresponding author: [email protected]
Sparse models represent data with the fewest possible number of non-zero elements. This inherent characteristic makes sparse models well suited to data compression, which is particularly relevant for hyperspectral data because of its large size. In this paper, a framework for sparsity based hyperspectral image compression using online learning is proposed. Since there are various sparse optimization models, a comparative analysis of sparse representations in terms of their hyperspectral image compression performance is presented: online learning based hyperspectral image compression methods are proposed using four different sparse representations. Results indicate that, independent of the sparsity model, the online learning based hyperspectral data compression schemes yield the best compression performance at data rates of 0.1 and 0.3 bits per sample, compared to other state-of-the-art hyperspectral data compression techniques, in terms of image quality measured as average peak signal-to-noise ratio (PSNR).
1. INTRODUCTION

Hyperspectral sensors acquire reflected or emitted energy in hundreds of narrow electromagnetic frequency bands (with bandwidths typically ~10 nm) from objects within their viewing range [1]. The acquired data are stored as hyperspectral image cubes with spatial and spectral content (cf. Fig. 1). Owing to the spectral insight gained by hyperspectral imaging, it is used in many applications, such as material classification, target/anomaly detection, food inspection and harvest estimation [2]. Hyperspectral imaging, however, comes at the cost of vast data size, so hyperspectral image compression plays an important role in the effective utilization of transmission and storage resources. Compression schemes, be they lossy or lossless, such as the Karhunen-Loeve transform (KLT) or principal component analysis (PCA), the discrete wavelet transform (DWT), the discrete cosine transform (DCT) and vector quantization (VQ), make use of the inherently highly correlated nature of hyperspectral data [3,4].

Data decorrelation can be achieved by performing PCA; however, its computational cost is high. In order to cope with this computational burden, an alternative reconstruction strategy that shifts the computational load towards the decoder side, called Compressive-Projection Principal Component Analysis (CPPCA), was proposed in [5]. The CPPCA technique was also utilized to reconstruct hyperspectral imagery (HSI), yielding satisfactory results in terms of computational cost and image quality [6]. Recently, data-specific methods based on sparse coding and dictionary learning algorithms were proposed for the compression and analysis of HSI [7-9]. In dictionary learning based methods, an initial dictionary of basic elements is updated according to the data. This update mechanism makes it possible to represent the data with a small number of dictionary elements, which is called the sparsity condition. The condition can be imposed using the l0 norm of the coefficient vector, which is simply its number of non-zero elements. In order to obtain the sparsest representation, it is necessary to minimize the l0 norm [10]. However, l0 norm minimization problems are NP-hard [11]. Hence, instead of the l0 norm, a more relaxed
notion of sparsity, such as the l1 norm, can be used, and more efficient solutions can be obtained [10]. Indeed, by relaxing the l0 norm to the l1 norm, the resulting problem can be solved with convex optimization methods [11].

Sparse representations can be grouped into three categories, namely, greedy pursuit algorithms, lp norm regularization based algorithms and iterative shrinkage algorithms [12]. Greedy pursuit algorithms aim at minimizing the l0 norm. The pioneering work in this category is the Matching Pursuit (MP) algorithm [13], in which dictionary elements are selected from the overall dictionary one at a time, at each step. This strategy may result in choosing the same element more than once. To remedy this inefficiency, the Orthogonal Matching Pursuit (OMP) representation was proposed [13, 14]. To guarantee that no dictionary element is selected more than once, OMP projects the data vector onto the subspace spanned by the dictionary elements selected so far [12]. Compared to the MP representation, OMP converges after fewer iterations but has a higher per-iteration computational complexity [12]. Both MP and OMP act on just one dictionary element at a time. The Generalized Orthogonal Matching Pursuit (gOMP) was developed to reduce the complexity and accelerate the execution of OMP by selecting multiple elements at each iteration [15]; OMP can be considered a special case of gOMP [15]. In this paper, a gOMP sparse representation based hyperspectral coding scheme is therefore proposed rather than an OMP based one.
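To make the greedy selection rule concrete, the following is a minimal NumPy sketch of a gOMP-style step, not the implementation of [15]: `n_select` atoms are chosen per iteration (setting `n_select = 1` recovers plain OMP), followed by a least-squares re-fit on the selected support. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def gomp(x, D, n_select=2, n_iter=10, tol=1e-6):
    """Minimal generalized OMP sketch: pick `n_select` atoms per iteration
    (n_select=1 reduces to plain OMP), then re-fit by least squares."""
    residual = x.copy()
    support = []
    for _ in range(n_iter):
        # Correlate the residual with all dictionary atoms (columns of D).
        correlations = np.abs(D.T @ residual)
        # Select the n_select most correlated atoms not already chosen.
        candidates = [j for j in np.argsort(-correlations) if j not in support]
        support.extend(candidates[:n_select])
        # Least-squares re-fit on the current support (projection step).
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
        if np.linalg.norm(residual) < tol:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coeffs
    return alpha
```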
The basic difference between lp norm regularization based algorithms and greedy pursuit algorithms is that the lp norm is minimized rather than the l0 norm. The Basis Pursuit (BP) algorithm is the most prominent example of lp norm regularization based techniques [16]. In contrast to the NP-hard l0 norm problems, taking p as unity, l1 regularized problems can be considered convex quadratic problems with linear inequality constraints [17]. In this category, the Specialized Interior-Point (SIP) representation, proposed to solve l1 regularized problems, achieves effective results for large-data-size cases [17]; it uses a preconditioned conjugate gradients algorithm to find the search step [17]. Another popular representation in this category is the Least Absolute Shrinkage and Selection Operator (LASSO) [18]. To solve the LASSO problem, the Alternating Direction Method of Multipliers (ADMM) algorithm was developed recently, realizing ridge regression iteratively [19]. ADMM was designed for distributed convex optimization problems, with the idea of utilizing solutions of small local sub-problems to solve a bigger global one [19] (see the illustrative sketch following the contribution list below). In this paper, since it is well suited to various large-scale problems, LASSO representation based hyperspectral image coding is also proposed, and the problem is solved via the ADMM algorithm. Another representation, called Bayesian Compressive Sensing (BCS), can also be classified under the category of lp norm regularization based algorithms [20]. In this representation, the original 2D image data is divided into small blocks, and each block is sampled and processed independently [21]. The image recovery method of the BCS representation is based on the Projected Landweber (PL) iteration, which applies Iterative Shrinkage Thresholding (IST) and Projection onto Convex Sets (POCS) successively. If a spatial-domain 2D Wiener filter is also applied for smoothing, the BCS recovery method is called BCS-SPL [21, 22]. However, in some cases Wiener filtering blurs sharp edges and degrades image details. To obtain a more effective representation, the Wiener filter is omitted; the resulting representation is called Block Compressed Sensing using Projected Landweber based on three-Dimensional Bivariate Shrinkage (BCS PL-3DBS) [23]. Furthermore, the 3D Wavelet Packet Transform (3D WPT) is also tailored into BCS PL-3DBS to improve data decorrelation; this representation is called BCS PL-3DBS + 3DWPT [23].

In this paper, an online sparse coding based hyperspectral image compression framework is proposed. As pointed out in the discussion above, there are various representative sparse optimization models. To the best of the authors' knowledge, this is the first study that proposes an online sparse coding based hyperspectral image compression framework using the Specialized Interior Point Method (SIP), Least Absolute Shrinkage and Selection Operator (LASSO), Basis Pursuit (BP) and generalized Orthogonal Matching Pursuit (gOMP) representations. Comparative results are also provided with state-of-the-art hyperspectral image compression methods. Note that the motivation behind the proposed research is to develop compression algorithms that represent the hyperspectral image cube using fewer dictionary elements than existing methods, while maintaining higher image quality; the motivation is not to propose computationally efficient hyperspectral image compression methods. Hence, issues related to computational load are not addressed in this paper.

The contributions of the paper are:

1. An online sparse coding based hyperspectral image compression framework using the following sparse representations is proposed:
   a. Specialized Interior Point Method (SIP)
   b. Least Absolute Shrinkage and Selection Operator (LASSO)
   c. Basis Pursuit (BP)
   d. Generalized Orthogonal Matching Pursuit (gOMP)

2. Rate-distortion performances of the proposed online sparse coding based hyperspectral image compression methods using different sparse representations are compared with various state-of-the-art hyperspectral image compression methods, such as BCS PL-3DBS + 3DWPT, BCS PL-2DBS + 2D DDWT, BCS SPL-2DBS + 2D DDWT and CPPCA [23].
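As referenced above, the ADMM iteration for the l1-regularized least-squares (LASSO) problem alternates a ridge-regression-like solve, a soft-thresholding step and a dual update. The following is a minimal sketch under those assumptions; it is not the implementation of [19], and the parameter values and names are illustrative.

```python
import numpy as np

def lasso_admm(D, x, lam=0.1, rho=1.0, n_iter=100):
    """Minimal ADMM sketch for min_a 0.5*||D a - x||_2^2 + lam*||a||_1."""
    k = D.shape[1]
    a = np.zeros(k)   # primal variable
    z = np.zeros(k)   # split copy of a
    u = np.zeros(k)   # scaled dual variable
    # Cache the ridge-regression system matrix (D^T D + rho I).
    Dtx = D.T @ x
    L = np.linalg.cholesky(D.T @ D + rho * np.eye(k))
    for _ in range(n_iter):
        # a-update: ridge-regression solve via the cached Cholesky factor.
        a = np.linalg.solve(L.T, np.linalg.solve(L, Dtx + rho * (z - u)))
        # z-update: soft thresholding (proximal operator of the l1 norm).
        v = a + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual update.
        u = u + a - z
    return z
```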
The paper is organized as follows. The online learning based hyperspectral image compression framework using four different sparse representations is presented in Section 2. In Section 3, compression performance results corresponding to the different sparse representations are provided. Conclusions are drawn in the last section.
2. HYPERSPECTRAL IMAGE COMPRESSION FRAMEWORK USING SPARSE REPRESENTATIONS
Hyperspectral images, being spectro-spatial cubes of highly correlated data, can be expressed in terms of a weighted sum of basic elements of a so-called dictionary [24, 27]. Sparse representations obtained using dictionary learning can therefore be utilized for hyperspectral data compression. In this paper, a hyperspectral image compression framework using online dictionary learning based on sparse representations is proposed. Dictionary learning is cast as an optimization problem that is solved using an online algorithm: a quadratic surrogate of the empirical cost is minimized at each step until the solution converges to a stationary point [9, 25].

The parameters used in the analysis are defined as follows. The number of bands in the hyperspectral cube is denoted by n_b, the number of lines by n_l and the number of samples by n_s, and k is the number of columns in the dictionary. Let D_0 ∈ R^{n_b×k} be the initial dictionary, A_0 ∈ R^{k×k} and B_0 ∈ R^{n_b×k} the auxiliary matrices used for updating the dictionary, T the number of iterations, E ∈ R^{k×1} the error, λ ∈ R the regularization parameter and α ∈ R^{k} the sparse coefficients. The empirical cost function is defined as

f_T(D) = \frac{1}{T} \sum_{i=1}^{T} l(x_i, D)    (1)

where X = [x_1, …, x_T] ∈ R^{n_b×T} is the finite training set, D ∈ R^{n_b×k} is the dictionary and l is a loss function. The loss function l can be defined as the optimal value of an l1 sparse coding problem [25]:

l(x_t, D) = \min_{\alpha_t \in \mathbb{R}^{k}} \frac{1}{2} \| x_t - D\alpha_t \|_2^2 + \lambda \| \alpha_t \|_1    (2)

where λ is the regularization parameter, and x_t and α_t are the training sample and the corresponding coefficient vector at iteration t, respectively. If the dictionary D is fixed, then (2) corresponds to an l1 regularized linear least-squares problem. The minimization of the empirical cost f_T(D) is not a convex optimization problem. In order to overcome this issue, a joint optimization problem is utilized that minimizes f_T(D) with respect to the dictionary D and the sparse coefficients Γ = [α_1, …, α_T] ∈ R^{k×T} separately. Fixing D or Γ one at a time, in a consecutive manner, makes it possible to formulate each sub-problem as a convex optimization problem:

\min_{D \in C,\ \Gamma \in \mathbb{R}^{k \times T}} \frac{1}{T} \sum_{i=1}^{T} \left( \frac{1}{2} \| x_i - D\alpha_i \|_2^2 + \lambda \| \alpha_i \|_1 \right)    (3)

where the constraint set C is defined as

C \triangleq \left\{ D \in \mathbb{R}^{n_b \times k} \ \text{s.t.} \ \forall j = 1, \ldots, k,\ d_j^{\top} d_j \leq 1 \right\}    (4)

Since the empirical cost is an approximation of the expected cost, instead of minimizing the empirical cost f_T(D), the following expected cost f(D) is minimized:

f(D) = \mathbb{E}_x \left[ l(x, D) \right] = \lim_{T \to \infty} f_T(D)    (5)

where f_T(D) converges to f(D) almost surely, and the expectation is taken over the unknown probability distribution of the data x. Following a similar discussion to that in [25], the dictionary update is performed with the method of projected first-order stochastic gradient descent:

D_t = \Pi_C \left[ D_{t-1} - \frac{\rho}{t} \nabla_D\, l(x_t, D_{t-1}) \right]    (6)

where D_t is the optimal dictionary at iteration t, ρ is the gradient step and Π_C is the orthogonal projector onto C. The dictionary update step is carried out in an online manner with the introduction of the quadratic surrogate function \hat{f}_t defined for f_t as

\hat{f}_t(D) = \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2} \| x_i - D\alpha_i \|_2^2 + \lambda \| \alpha_i \|_1 \right)    (7)

where the surrogate function \hat{f}_t aggregates the past values of the objective function in (2) computed up to iteration t. Once the dictionary is fixed, the minimization problem in (2) is called sparse coding. On the other hand, if the sparse coefficients are fixed while minimizing over the dictionary D in (2), then the problem is called dictionary update. The framework for sparsity based hyperspectral image compression consists of two algorithms, namely, dictionary learning and dictionary update. The dictionary learning and dictionary update algorithms presented below are based on consecutive solutions of the sparse coding and dictionary update equations for the different sparse representations. In this paper, the following sparse representations are considered: Specialized Interior Point Method (SIP), Least Absolute Shrinkage and Selection Operator (LASSO), Basis Pursuit (BP) and Generalized Orthogonal Matching Pursuit (gOMP). The sparse coding and dictionary update equations corresponding to these representations are presented in Table 1.

Algorithm: Dictionary Learning
  Construct a random initial dictionary D_0
  Set the A_0 and B_0 matrices to zero initially
  for t = 1 to T
    Choose x_t ∈ R^{n_b} randomly from the hyperspectral data cube
    Solve the sparse coding equation (8), (10), (12) or (14), depending on the representation
    Update A_t ← A_{t−1} + α_t α_t^T and B_t ← B_{t−1} + x_t α_t^T
    Evaluate D_t using the Dictionary Update algorithm
  end for
  Obtain the learned dictionary D_T

Algorithm: Dictionary Update
  D_t is calculated from D_{t−1}, A_t and B_t according to the dictionary update equation (9), (11), (13) or (15), depending on the sparse representation, where D = [d_1, …, d_k] ∈ R^{n_b×k}, A = [a_1, …, a_k] ∈ R^{k×k} and B = [b_1, …, b_k] ∈ R^{n_b×k}
  repeat
    for j = 1 to k
      u_j ← (1 / A[j, j]) (b_j − D a_j) + d_j
      d_j ← u_j / max(‖u_j‖_2, 1)
      E_j ← ‖d_j^{(t)} − d_j^{(t−1)}‖_2^2 / n_b
    end for
    E ← (1/k) Σ_{j=1}^{k} E_j
  until E < Threshold
  Use D in the Dictionary Learning algorithm
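To make the interplay of the two procedures concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the online loop described above. A plain ISTA solver stands in for the sparse coding step; any of the representations in Table 1 below (Eqs. (8), (10), (12) or (14)) could be substituted. Function names, the threshold and the parameter values are illustrative assumptions.

```python
import numpy as np

def sparse_code_ista(D, x, lam=0.1, n_iter=50):
    """Simple ISTA solver for the l1 sparse coding step; a stand-in for
    any of Eqs. (8)/(10)/(12)/(14)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1/L with L = ||D||_2^2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        v = a - step * grad
        a = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return a

def dictionary_update(D, A, B, tol=1e-4, max_sweeps=10):
    """Block-coordinate update of the dictionary columns with projection onto
    unit-norm columns, repeated until the mean column change E is small."""
    nb, k = D.shape
    for _ in range(max_sweeps):
        change = 0.0
        for j in range(k):
            if A[j, j] < 1e-12:
                continue                          # column not used yet
            old = D[:, j].copy()
            u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)
            change += np.linalg.norm(D[:, j] - old) ** 2 / nb
        if change / k < tol:
            break
    return D

def online_dictionary_learning(X, k=64, lam=0.1, T=1000, seed=0):
    """Online loop: sparse-code one random pixel spectrum per iteration,
    accumulate A and B, then update the dictionary column-wise."""
    rng = np.random.default_rng(seed)
    nb, n_pixels = X.shape                        # bands x (samples*lines)
    D = rng.standard_normal((nb, k))
    D /= np.linalg.norm(D, axis=0)                # unit-norm columns (set C)
    A = np.zeros((k, k))
    B = np.zeros((nb, k))
    for t in range(1, T + 1):
        x = X[:, rng.integers(n_pixels)]          # random training sample
        alpha = sparse_code_ista(D, x, lam=lam)   # sparse coding step
        A += np.outer(alpha, alpha)               # A_t = A_{t-1} + alpha alpha^T
        B += np.outer(x, alpha)                   # B_t = B_{t-1} + x alpha^T
        D = dictionary_update(D, A, B)
    return D
```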
Table 1. Sparse coding and dictionary update equations corresponding to the SIP, LASSO, BP and gOMP sparse representations.

SIP:
  Sparse coding (8):  \alpha_t = \arg\min_{\alpha \in \mathbb{R}^{k}} \frac{1}{2} \| x_t - D_{t-1}\alpha \|_2^2 + \lambda \| \alpha \|_1
  Dictionary update (9):  D_t = \arg\min_{D \in C} \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2} \| x_i - D\alpha_i \|_2^2 + \lambda \| \alpha_i \|_1 \right), where t = 1, …, T

LASSO:
  Sparse coding (10):  \alpha_t = \arg\min_{\alpha \in \mathbb{R}^{k}} \frac{1}{2} \| D_{t-1}\alpha - x_t \|_2^2 + \lambda \| \alpha \|_1
  Dictionary update (11):  D_t = \arg\min_{D \in C} \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2} \| D\alpha_i - x_i \|_2^2 + \lambda \| \alpha_i \|_1 \right), where t = 1, …, T

BP:
  Sparse coding (12):  \alpha_t = \arg\min_{\alpha \in \mathbb{R}^{k}} \| \alpha \|_1 \ \text{s.t.} \ D_{t-1}\alpha = x_t
  Dictionary update (13):  D_t = \arg\min_{D \in C} \frac{1}{t} \sum_{i=1}^{t} \frac{1}{2} \| x_i - D\alpha_i \|_2^2, where t = 1, …, T

gOMP:
  Sparse coding (14):  \alpha_t = \arg\min_{\alpha \in \mathbb{R}^{k}} \frac{1}{2} \| x_t - D_{t-1}\alpha \|_2^2 (solved greedily under an l0 sparsity constraint)
  Dictionary update (15):  D_t = \arg\min_{D \in C} \frac{1}{t} \sum_{i=1}^{t} \frac{1}{2} \| x_i - D\alpha_i \|_2^2, where t = 1, …, T

3. COMPRESSION PERFORMANCE RESULTS

The framework for sparsity based hyperspectral image compression using the four different sparse representations is tested with AVIRIS and Hyperion datasets [26]. The list of hyperspectral data along with image specifications is presented in Table 2. The hyperspectral data are stored as 2-byte values following the band-interleaved-by-pixel (BIP) convention. The Hyperion data are of little-endian byte order, whereas the AVIRIS data are of big-endian byte order. Although the hyperspectral data in both datasets are stored as 2 bytes per sample, the Hyperion data have a bit depth of 12 bits. The spectral coverage of both the AVIRIS and Hyperion datasets is 0.4-2.5 µm with 10 nm resolution [27].
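As an aside on data handling, the following is a minimal sketch of how a raw BIP cube with the byte orders listed above could be read with NumPy. The file name is hypothetical, the samples are assumed to be signed 16-bit integers, and real AVIRIS/Hyperion products include header/metadata files and radiometric scaling that are not handled here.

```python
import numpy as np

def load_bip_cube(path, n_samples, n_lines, n_bands, big_endian=True):
    """Minimal sketch: read a raw 16-bit BIP cube (band index varies fastest)
    into a (lines, samples, bands) array."""
    dtype = ">i2" if big_endian else "<i2"   # AVIRIS: big-endian, Hyperion: little-endian
    raw = np.fromfile(path, dtype=dtype, count=n_lines * n_samples * n_bands)
    # BIP ordering: for each pixel, all bands are stored consecutively.
    return raw.reshape(n_lines, n_samples, n_bands)

# Example (hypothetical file name), using the Jasper Ridge specifications from Table 2:
# cube = load_bip_cube("jasper_ridge.bip", n_samples=614, n_lines=2587, n_bands=224)
```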
Table 2. Image specifications of the AVIRIS/Hyperion hyperspectral datasets.

AVIRIS HYPERSPECTRAL DATA
  Name           No. Samples  No. Lines  No. Bands  Bit-depth  Flight Number       Year
  Jasper Ridge   614          2587       224        16         f970403t01p02_r03   1997
  Lunar Lake     614          1432       224        16         f970623t01p02_r07   1997
  Low Altitude   614          3689       224        16         f960705t01p02_r05   1996

HYPERION HYPERSPECTRAL DATA
  Name           No. Samples  No. Lines  No. Bands  Bit-depth  Image Number             Year
  Lake Monona    256          3176       242        12         EO1H0240302009166110PF   2009
  Mt. St. Helens 256          3242       242        12         EO1H0460282009231110KF   2009
  Erta Ale       256          3187       242        12         EO1H1680502010057110KF   2010
Table 3. Image quality performance of the proposed framework for sparsity based hyperspectral image compression using four different sparse representations, in comparison to the lossy hyperspectral image compression schemes BCS PL-3DBS + 3DWPT, BCS PL-2DBS + 2D DDWT, BCS SPL-2DBS + 2D DDWT and CPPCA [23]. The two highest PSNR values (in dB) in each column are marked with an asterisk.

BPS = 0.1
  ALGORITHM                Jasper Ridge  Low Altitude  Lunar Lake  Mean PSNR
  BCS PL-3DBS + 3DWPT      56.78         54.70         61.34*      57.61
  BCS PL-2DBS + 2D DDWT    50.60         47.97         54.62       51.06
  BCS SPL-2DBS + 2D DDWT   50.30         48.02         54.05       50.79
  CPPCA                    30.20         47.47         48.43       42.03
  SIP                      58.06         58.34         58.86       58.42
  gOMP                     59.40*        59.79*        58.37       59.19
  BP                       59.41*        59.96*        59.55*      59.64*
  LASSO                    59.30         59.59         59.54       59.48*

BPS = 0.3
  ALGORITHM                Jasper Ridge  Low Altitude  Lunar Lake  Mean PSNR
  BCS PL-3DBS + 3DWPT      64.21         61.74         69.38       65.11
  BCS PL-2DBS + 2D DDWT    54.18         51.67         59.39       55.08
  BCS SPL-2DBS + 2D DDWT   53.67         51.46         58.18       54.43
  CPPCA                    71.31*        60.98         72.19       68.16
  SIP                      66.74         66.18         70.71       67.87
  gOMP                     70.01         70.28*        73.84*      71.37*
  BP                       69.23         70.16*        73.85*      71.08*
  LASSO                    70.67*        68.85         73.34       70.95

BPS = 0.5
  ALGORITHM                Jasper Ridge  Low Altitude  Lunar Lake  Mean PSNR
  BCS PL-3DBS + 3DWPT      69.95         67.08         72.62       69.88
  BCS PL-2DBS + 2D DDWT    57.11         54.45         63.06       58.21
  BCS SPL-2DBS + 2D DDWT   56.45         54.40         61.35       57.40
  CPPCA                    76.40*        70.01         76.82*      74.41*
  SIP                      71.34         71.20         72.39       71.64
  gOMP                     71.14         72.68         74.92       72.91
  BP                       71.71         73.24*        76.55*      73.60
  LASSO                    73.17*        73.52*        75.20       73.96*
Rate-distortion performances of the different sparse representations are compared with those of other state-of-the-art lossy hyperspectral compression schemes [23]. In order to measure the compression performance, the peak signal-to-noise ratio (PSNR) is used. The bit rate r is measured in bits per sample (bps) as

r = \frac{z \cdot b_d}{n_b}    (16)

where z is the number of sparse coefficients and z