Hyperspectral Image Denoising Using Legendre Fenchel Transformation for Improved Multinomial Logistic Regression Based Classification

Aswathy C, Sowmya V, Gandhiraj R, Soman K P

Aswathy C, Sowmya V, and Soman K P are with the Centre for Excellence in Computational Engineering and Networking, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu (e-mail: [email protected], [email protected]). Gandhiraj R is with the Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu (e-mail: [email protected]).
Abstract— The abundant spectral and spatial information in hyperspectral images (HSI) is widely used in the field of remote sensing. Although highly sophisticated sensors capture the hyperspectral imagery, they suffer from issues such as hyperspectral noise and spectral mixing. These challenges demand the use of preprocessing techniques prior to hyperspectral image analysis. In this paper, we discuss the effective role of denoising by the Legendre Fenchel Transformation (LFT) as a preprocessing method to improve classification accuracy. LFT is based on the concept of duality, which makes it a fast and reliable denoising strategy that effectively reduces the noise present in each band of the hyperspectral imagery without losing much of the edge information. Experimental time analysis shows that the computational efficiency of the proposed method compares favourably with existing preprocessing methods. The denoising is performed on the standard AVIRIS Indian Pines dataset, and its performance is evaluated by analysing classification accuracy assessment measures. The denoised image is subjected to hyperspectral image classification using Multinomial Logistic Regression, which learns the posterior probability distribution of each class. The potential of the proposed method is demonstrated by the mean classification accuracy obtained experimentally without any post-processing technique (94.40%), which is better than the accuracies achieved by existing preprocessing techniques such as Total Variation denoising and wavelet based denoising.

Index Terms— Classification, Hyperspectral denoising, Legendre Fenchel transformation, Multinomial Logistic Regression, Segmentation.
I. INTRODUCTION
The advances in remote sensing technologies have opened a wide area of research possibilities in the field of hyperspectral data analysis. Despite the significant spectral and spatial information of hyperspectral imagery, it is susceptible to issues such as high dimensionality, noise, and linear and non-linear spectral mixing, which call for effective algorithms to obtain the relevant details needed for data processing. The quality of the captured image and the interpretation of the extracted data are considerably affected by the noise present in the hyperspectral image. The main causes of this noise are photon effects, calibration errors and the limitations of the imaging sensor. In order to mitigate this unwanted phenomenon and to improve the accuracy of data analysis, several noise reduction approaches have been introduced. Qiangqiang Yuan, Liangpei Zhang and Huanfeng Shen [1] proposed an adaptive total variation denoising method which considers both spectral and spatial noise differences in the noise reduction process. Perona-Malik diffusion [2], wavelet decompositions and sparse approximation methods are some other existing denoising techniques. In [5], the authors propose denoising based on the Legendre Fenchel transformation to remove Gaussian noise from colour remote sensing images.

The preprocessed hyperspectral data is used in several data analysis tasks such as classification and unmixing. In recent years, a number of classifiers have been used for land-cover classification of remotely sensed hyperspectral data. Some commonly used classifiers are the kernel based Support Vector Machine (SVM), sparsity based Orthogonal Matching Pursuit (OMP) and Independent Component Discriminant Analysis (ICDA). Jun Li, Jose M. Bioucas-Dias and Antonio Plaza [6] proposed an algorithm that uses Multinomial Logistic Regression (MLR) to find the posterior class probability, aided by a semi-supervised segmentation.

In this paper, we use denoising based on the Legendre Fenchel Transformation (LFT) as a spatial preprocessing step prior to classification using the MLR model. LFT maps the primal space to the dual space, which considerably reduces the run time and allows easy divergence calculation [5].

The rest of this paper is organized as follows. Section II presents the HSI denoising method. An overview of Multinomial Logistic Regression (MLR) based classification and segmentation is given in Section III. Section IV describes the proposed method. Section V discusses the experimental results and analysis, while Section VI concludes the paper.
II. HSI DENOISING USING LEGENDRE FENCHEL TRANSFORM
A hyperspectral data cube is generally represented as $X \in \mathbb{R}^{n_1 \times n_2 \times n_b}$, where $n_1 \times n_2$ is the number of pixels and $n_b$ is the number of bands. Noise is one of the major challenges in hyperspectral imagery, and it is reduced here using an effective denoising method based on LFT.
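To make the band-wise processing concrete, the following minimal sketch (assuming a NumPy array already loaded from the Indian Pines scene; the variable names and the synthetic cube are illustrative, not part of the original work) shows the cube layout, how an individual band is accessed for denoising, and how pixels are flattened into spectral feature vectors for classification.

```python
import numpy as np

# Illustrative hyperspectral cube: n1 x n2 pixels, nb spectral bands.
# For AVIRIS Indian Pines this would be 145 x 145 x 200 after band removal.
n1, n2, nb = 145, 145, 200
X = np.random.rand(n1, n2, nb)   # placeholder for the real reflectance cube

# Band-wise view used by the denoising stage (one 2-D image per band).
band_165 = X[:, :, 164]          # a single noisy band, processed independently

# Pixel-wise view used by the classification stage:
# each pixel becomes an nb-dimensional spectral feature vector.
features = X.reshape(-1, nb)     # shape: (n1*n2, nb)
print(band_165.shape, features.shape)   # (145, 145) (21025, 200)
```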
A. Legendre Fenchel Transformation (LFT)

Transformations help in mapping from one space to another. LFT uses the primal-dual concept, which maps the $(x, f(x))$ space to the $(p, f^{*}(p))$ space. The transformed function is known as the conjugate or dual function. For a continuous function $f : \mathbb{R} \rightarrow \mathbb{R}$, the LFT is given as
$$f^{*}(p) = \sup_{x \in \mathbb{R}} \{ px - f(x) \}.$$
The supremum of the transformation is used here, where $x$ is the primal variable. The dual variable $p$ is the slope at $x$, i.e., $p$ can be a differential or sub-differential. The mapping finds the point $x$ on $f(x)$ where the tangent of slope $p$ through $(x, f(x))$ makes the maximum intercept on the y-axis. The dual function $f^{*}(p)$ is the negative of the intercept of the tangential line at $(x, f(x))$, and it is always convex irrespective of the nature of $f(x)$.
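As a quick illustration of the definition above, the following sketch evaluates the conjugate of $f(x) = x^2$ numerically on a grid (its closed form is $f^{*}(p) = p^2/4$); the grid resolution and the choice of $f$ are assumptions made only for this example.

```python
import numpy as np

def legendre_fenchel(f_vals, x_grid, p_grid):
    """Numerical Legendre-Fenchel conjugate: f*(p) = sup_x { p*x - f(x) }."""
    # Outer product p*x gives a (len(p), len(x)) matrix of px values.
    px = np.outer(p_grid, x_grid)
    return np.max(px - f_vals[None, :], axis=1)

x = np.linspace(-10, 10, 4001)
p = np.linspace(-4, 4, 9)
f = x ** 2                                 # example primal function
f_star = legendre_fenchel(f, x, p)

print(np.allclose(f_star, p ** 2 / 4, atol=1e-3))   # True: matches p^2/4
```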
B. Dual Formulation of the ROF Model using LFT

The major drawback of the Euler-Lagrange ROF model [3] is that it consumes a lot of computation time. ROF based on LFT overcomes this drawback by reducing the computational complexity of the model. It utilises the concept of duality, which helps in a better understanding of the problem [4]. The standard form of the ROF model is given as:
$$\min_{u \in X} \; \|\nabla u\|_1 + \frac{\lambda}{2} \|u - g\|_2^2 \qquad (1)$$
By using the Legendre Fenchel Transformation,
$$\|\nabla u\|_1 = \max_{p \in P} \left( \langle p, \nabla u \rangle - \delta_P(p) \right) \qquad (2)$$
where $\delta_P$ is the indicator function of the set $P = \{ p : \|p\|_\infty \le 1 \}$. The dual formulation of ROF using LFT is then given as:
$$\min_{u \in X} \; \max_{p : \|p\|_\infty \le 1} \left( \langle p, \nabla u \rangle - \delta_P(p) + \frac{\lambda}{2} \|u - g\|_2^2 \right) \qquad (3)$$
The above objective function contains the primal form (the minimization with $u$ as variable) and the dual form (the maximization with $p$ as variable), which is used to derive the primal and dual updates. The updated primal and dual forms are given [5] as:
$$u^{n+1} = \frac{u^n + \tau \, \mathrm{div}\, p^{n+1} + \tau \lambda g}{1 + \tau \lambda} \qquad (4)$$
$$p^{n+1} = \frac{p^n + \sigma \nabla u^n}{\max\left(1, \left| p^n + \sigma \nabla u^n \right|\right)} \qquad (5)$$
The computational complexity of the method is greatly reduced by using the nabla matrix in the calculation of the gradient and the divergence. Hence the method is very fast and efficient compared to the general (Euler-Lagrange) TV-ROF model.
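A minimal NumPy sketch of the primal-dual iteration in equations (4) and (5) is given below for a single band; the step sizes, the forward-difference gradient/divergence operators and the componentwise dual projection are illustrative assumptions rather than the exact settings of the original implementation.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient of a 2-D image, returned as shape (2, H, W)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div(p):
    """Discrete divergence, the negative adjoint of grad."""
    px, py = p
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def lf_rof_denoise(g, lam=15.0, n_iter=30, tau=0.25, sigma=0.25):
    """Primal-dual ROF denoising of one noisy band g (cf. eqs. 4 and 5)."""
    u = g.copy()
    p = np.zeros((2,) + g.shape)
    for _ in range(n_iter):
        # Dual update (eq. 5): ascent step on p, then projection onto ||p|| <= 1.
        p_new = p + sigma * grad(u)
        p = p_new / np.maximum(1.0, np.abs(p_new))
        # Primal update (eq. 4): closed-form minimization in u.
        u = (u + tau * div(p) + tau * lam * g) / (1.0 + tau * lam)
    return u

# Usage: denoised_band = lf_rof_denoise(X[:, :, 164])
```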
III. HSI CLASSIFICATION AND SEGMENTATION

In the classification problem, the hyperspectral image is represented as $x \equiv (x_1, \ldots, x_n)$, where the $n$ pixels are represented as $d$-dimensional spectral vectors (feature vectors). $I \equiv \{1, \ldots, n\}$ denotes the set of integers that indexes the $n$ pixels (or $n$ spectral vectors) of the hyperspectral image, whereas $L \equiv \{1, \ldots, K\}$ denotes the set of $K$ class labels. The class labels are denoted by $y = (y_1, \ldots, y_n)$, with $y_i \in L$. The objective of hyperspectral image classification is to infer $y_i \in L$ for all $i \in I$ from the feature vectors $x_i$ and to create a two dimensional image whose class labels represent the class information of the original image.

A. Multinomial Logistic Regression based Classification

The MLR based supervised learning algorithm aims to design a classifier capable of distinguishing $K$ classes from the $L$ labelled training samples, when feature vectors are given as the input for classification [7]. The algorithm involves a training phase and a testing phase [6]. The $L$ training samples with known class labels form the training set $D_L = \{(x_1, y_1), \ldots, (x_L, y_L)\}$, and the posterior class distribution under the MLR model is computed for the MAP estimation of the regressors $w$. The general MLR model is given [7] as
$$P(y_i = k \mid x_i, w) = \frac{\exp(w^{(k)} x_i)}{\sum_{k=1}^{K} \exp(w^{(k)} x_i)} \qquad (6)$$
where $w^{(k)}$ is the set of logistic regressors for class $k$, $w = (w^{(1)T}, \ldots, w^{(K-1)T})$, and $w^{(K)}$ is set to zero, since the $K$-th class conditional probability is found by subtracting the sum of the estimated probabilities of the other $(K-1)$ classes from unity. $x \equiv (x_1, \ldots, x_L)$ represents the feature vectors selected for training the model. In this paper, a Gaussian Radial Basis Function (RBF) kernel, $K(x_i, x_j) \equiv \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$, is used to represent the training vectors, which offers improved data separability in the transformed space [8].

The posterior probability density of $w$ given $Y_L$ (the set of labels) and $X_L$ (the set of feature vectors in the labelled training samples) is represented [6] as
$$p(w \mid Y_L, X_L) \propto p(Y_L \mid X_L, w)\, p(w \mid X_L) \qquad (7)$$
Using Expectation Maximization (EM), the MAP estimate of $w$ (which maximizes the conditional log data likelihood) is given as
$$\hat{w} = \arg\max_{w} \{ l(w) + \log p(w \mid X_L) \} \qquad (8)$$
where the log-likelihood function of $w$ is expressed as
$$l(w) \equiv \log p(Y_L \mid X_L, w) \equiv \log \prod_{i=1}^{L} p(y_i \mid x_i, w) \equiv \sum_{i=1}^{L} \left( x_i^{T} w^{(y_i)} - \log \sum_{j=1}^{K} \exp\left(x_i^{T} w^{(j)}\right) \right) \qquad (9)$$
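The following sketch illustrates equations (6) and (9) with plain NumPy (the RBF feature construction and the simple gradient-ascent loop on the log-likelihood are simplified assumptions; the original work uses an EM/MAP formulation with a prior on $w$, which is not reproduced here).

```python
import numpy as np

def rbf_features(X, centers, sigma=1.0):
    """Gaussian RBF representation K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = (X ** 2).sum(1)[:, None] + (centers ** 2).sum(1)[None, :] - 2.0 * X @ centers.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def mlr_posterior(H, W):
    """Eq. (6): softmax posterior P(y = k | x, w) for feature matrix H and regressors W."""
    scores = H @ W                                  # (n_samples, K)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def log_likelihood(H, y, W):
    """Eq. (9): sum_i ( h_i^T w^(y_i) - log sum_j exp(h_i^T w^(j)) )."""
    scores = H @ W
    m = scores.max(axis=1, keepdims=True)
    return np.sum(scores[np.arange(len(y)), y]
                  - np.log(np.exp(scores - m).sum(axis=1)) - m.ravel())

# Toy usage with random data (a stand-in for the labelled Indian Pines samples).
rng = np.random.default_rng(0)
Xtr, ytr = rng.normal(size=(60, 200)), rng.integers(0, 3, 60)
H = rbf_features(Xtr, Xtr, sigma=2.0)
W = np.zeros((H.shape[1], 3))                       # last column (class K) kept at zero
for _ in range(200):                                # simple gradient ascent on l(w)
    P = mlr_posterior(H, W)
    G = H.T @ (np.eye(3)[ytr] - P)                  # gradient of the log-likelihood
    W[:, :-1] += 1e-3 * G[:, :-1]                   # w^(K) stays fixed at zero
print(log_likelihood(H, ytr, W))
```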
During the testing phase, the test samples and the estimated regression coefficients $\hat{w}$ are given to the MLR model to find the posterior class probability densities of each feature vector over the $K$ classes. The class label of a particular feature vector is determined by the index corresponding to the maximum posterior class probability of the given test pixel vector.

B. Segmentation of hyperspectral image

Classification performance is notably improved by segmenting similar regions. Segmentation utilises contextual information to partition the set $I \equiv \{1, \ldots, n\}$ into regions such that the pixels in each region are similar in certain aspects. Graph-cut segmentation is given [6] as
$$\hat{y} = \arg\min_{y \in L^n} \sum_{i=1}^{n} \left( -\log p(y_i \mid \hat{w}) - \mu \sum_{(i,j) \in C} \delta(y_i - y_j) \right) \qquad (10)$$
where $p(y_i \mid \hat{w}) = p(y_i \mid x_i, w)$ is computed at $\hat{w}$ [6], $\mu$ is a parameter that controls the level of smoothness of the segmentation, and $\delta(y_i - y_j)$ is the unit impulse function which represents the pairwise interaction. The $\alpha$-expansion min-cut based integer optimization algorithm is applied to obtain good approximations to the MAP segmentation problem.
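As a rough illustration of the energy in equation (10), the sketch below performs a few sweeps of iterated conditional modes (ICM) over a 4-connected grid; this is a deliberately simplified stand-in for the α-expansion graph-cut solver used in [6], and the probability map, the μ value and the neighbourhood system are illustrative assumptions.

```python
import numpy as np

def icm_segmentation(prob, mu=2.0, n_sweeps=5):
    """Approximate minimizer of eq. (10):
    sum_i [ -log p(y_i) - mu * sum_{j in N(i)} delta(y_i - y_j) ].
    prob: (H, W, K) posterior class probabilities from the MLR classifier."""
    H, W, K = prob.shape
    unary = -np.log(prob + 1e-12)            # data term per pixel and label
    y = prob.argmax(axis=2)                  # start from the pixel-wise MAP labels
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Count how many 4-neighbours currently carry each label.
                agree = np.zeros(K)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        agree[y[ni, nj]] += 1
                # Pick the label minimizing unary cost minus the smoothness reward.
                y[i, j] = np.argmin(unary[i, j] - mu * agree)
    return y

# Usage: labels = icm_segmentation(posterior_cube)   # posteriors from the MLR stage
```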
IV. PROPOSED METHOD

In this experiment, we intend to utilise the Legendre Fenchel Transformation as a preprocessing step to denoise the standard AVIRIS Indian Pines dataset, followed by MLR based hyperspectral image classification and segmentation. The proposed method involves three main stages: spatial preprocessing, classification and segmentation.

Noise developed in the images lowers the signal to noise ratio and causes a reduction in the classification accuracy. The use of LFT for band by band denoising of the hyperspectral image helps to smooth the noisy image without losing the edge information. During classification, the HSI is separated into training and testing samples; 10% of the samples are used for training and the remaining 90% for testing. Classification accuracy increases with an increase in the number of training samples. The proposed method uses MLR based classification, where the regression coefficients for each class are estimated during the training phase. The posterior class densities of each pixel vector are determined from the MLR model, and the class label for a particular test pixel vector is obtained from the index corresponding to the maximum posterior class probability. In order to improve the overall classification performance, a graph-cut based segmentation is performed after classification, which exploits the similarities of neighbouring pixels in the hyperspectral image; during segmentation, adjacent pixels of the image are encouraged to take the same class label. A sketch of the overall pipeline is given below.
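The following minimal end-to-end sketch ties the three stages together using the hypothetical helpers defined in the earlier snippets (lf_rof_denoise, rbf_features, mlr_posterior, icm_segmentation); the 10%/90% split follows the paper, while everything else here is an illustrative assumption rather than the authors' exact implementation.

```python
import numpy as np

def proposed_pipeline(X, labels, train_fraction=0.10, seed=0):
    """X: (n1, n2, nb) hyperspectral cube; labels: (n1, n2) ground truth, 0 = unlabelled.
    Relies on lf_rof_denoise, rbf_features, mlr_posterior and icm_segmentation
    from the earlier sketches."""
    rng = np.random.default_rng(seed)
    n1, n2, nb = X.shape

    # Stage 1: band-by-band LFT (primal-dual ROF) denoising, then re-stack the cube.
    cube = np.stack([lf_rof_denoise(X[:, :, b]) for b in range(nb)], axis=2)

    # Stage 2: MLR classification on a 10% / 90% split of the labelled pixels.
    feats, lab = cube.reshape(-1, nb), labels.ravel()
    idx = np.flatnonzero(lab > 0)
    train = rng.choice(idx, size=max(1, int(train_fraction * idx.size)), replace=False)
    K = int(lab.max())
    H = rbf_features(feats, feats[train], sigma=2.0)   # kernel features for all pixels
    W = np.zeros((H.shape[1], K))
    onehot = np.eye(K)[lab[train] - 1]
    for _ in range(200):                               # simple gradient ascent (cf. eq. 9)
        P = mlr_posterior(H[train], W)
        W[:, :-1] += 1e-3 * (H[train].T @ (onehot - P))[:, :-1]

    # Stage 3: smooth the per-pixel posteriors with the MRF/graph-cut surrogate.
    posterior = mlr_posterior(H, W).reshape(n1, n2, K)
    return icm_segmentation(posterior)                 # 0-based label map
```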
V. EXPERIMENTAL RESULTS AND ANALYSIS

This section gives a brief description of the dataset used for the experiment, the accuracy assessment measures, and an analysis of the effect of different preprocessing methods on hyperspectral image classification.

A. Dataset Description

Among the different standard datasets available, the Indian Pines dataset, captured using the AVIRIS sensor system, is used for the experimental study. It contains 224 closely spaced spectral bands over the range 0.4-2.5 micrometres, with a spectral resolution of 10 nm, a spatial resolution of 20 m per pixel and a radiometric resolution of 16 bits. The number of bands is reduced from 224 to 200 by eliminating the water absorption bands (104-108, 150-163 and 220) and bands without useful information. Each band consists of 145x145 pixels. The ground truth data consists of 16 classes.

B. Accuracy Assessment Measures

Accuracy assessment is performed both subjectively and objectively. Subjective assessment is done through visual interpretation, while the objective measure involves comparing the generated thematic (classification) map with the ground truth (reference) data. Another method of accuracy assessment is the computation of the error matrix, which is used to find the overall, class-wise and average accuracies [2]. The Kappa coefficient is another index used to quantify the agreement of the classification:
$$\text{Kappa coefficient} = \frac{N \cdot A - B}{N^2 - B} \qquad (13)$$
where $N$ is the total number of pixels, $A$ is the number of correctly classified pixels, and $B$ is the sum of the products of the row and column totals of the confusion matrix. To obtain a stable result for the probability based classification, 5 Monte-Carlo runs are performed and the mean values of the classification accuracies obtained over the runs are taken for assessment. A small sketch of these accuracy measures is given below.
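A minimal sketch of the overall accuracy (OA), average accuracy (AA) and Kappa coefficient of equation (13), computed from a confusion matrix; the helper name and the matrix construction below are ours, not the authors'.

```python
import numpy as np

def accuracy_measures(y_true, y_pred, num_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa from eq. (13)."""
    # Confusion matrix: rows = true class, columns = predicted class.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    N = cm.sum()                                   # total number of labelled pixels
    A = np.trace(cm)                               # correctly classified pixels
    B = (cm.sum(axis=1) * cm.sum(axis=0)).sum()    # sum of row-total * column-total
    oa = A / N
    aa = np.mean(np.diag(cm) / np.maximum(cm.sum(axis=1), 1))
    kappa = (N * A - B) / (N ** 2 - B)             # eq. (13)
    return oa, aa, kappa

# Example: oa, aa, kappa = accuracy_measures(gt_labels, predicted_labels, 16)
```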
C. Results and Discussions

In this experiment, the effectiveness of LFT for hyperspectral image denoising is demonstrated by comparing it with other existing preprocessing methods, namely denoising by Total Variation (Euler-Lagrange ROF) and denoising by wavelet shrinkage. This is shown by visual interpretation of the denoised images and by analysing the effect of the above mentioned preprocessing methods on the classification accuracy. Band 165 of the Indian Pines dataset (the noisy band shown in Fig. 2(a)) is taken as a sample band to show the effect of the various denoising techniques; the results of TV denoising, wavelet based denoising and LFT denoising can be visually analysed in Fig. 2(b), (c) and (d) respectively. It is clear from the figure that, although TV denoising and wavelet based denoising remove noise in the band, they fail to preserve the edges in the image. In contrast, LFT denoising (the proposed method) performs better on the hyperspectral band, smoothing the noisy image without losing the edge information. Subjective evaluation is conducted on the preprocessed image to select the best values of the various parameters that play effective roles in denoising. The values of the control parameter, the number of iterations and the Lipschitz constant used for the LFT denoising technique are chosen experimentally as 15, 30 and 8 respectively.

Time is one of the major constraints for the processing of higher dimensional data such as HSI. Tedious computation results in the consumption of more processing time, so efficient algorithms which minimize the processing time are preferred. The proposed work is experimented on an Intel(R) Core(TM) i7-4790S CPU @ 3.2 GHz, 64 bit operating system with 8 GB RAM. Table I shows the elapsed time to denoise each image of the hyperspectral data cube (AVIRIS Indian Pines) using the various preprocessing techniques: TV denoising, wavelet shrinkage denoising and LF-ROF denoising. Analysis of the table shows that LF-ROF denoising is a very fast algorithm for large data computation, as it takes far less computation time than TV denoising. Although wavelet shrinkage denoising takes less time than LF-ROF denoising, classification after the latter is more efficient than after the former.

TABLE I
Time analysis for HSI (Indian Pines) denoising

Sl No.   Denoising method            Elapsed time for HSI denoising (s)
1        TV denoising                72.545840
2        Wavelet based denoising      4.832360
3        LFT denoising                7.165953

This section also describes the contribution of the various denoising methods to the improvement of the HSI classification accuracy, to underline the performance of the proposed method. After preprocessing, each denoised band is stacked to form a hyperspectral data cube, which is used for HSI classification. In this experiment, classification of the hyperspectral imagery is performed using probability based Multinomial Logistic Regression with a Radial Basis Function (RBF) kernel learning method. A post-processing technique which uses maximum a posteriori segmentation is also included to further improve the classification accuracy. Table II compares the effect of the various denoising techniques on the mean classification accuracy of the hyperspectral image classification. The classification accuracy obtained on the hyperspectral data without applying any preprocessing method is also compared with that of the proposed technique. The experimental analysis shows that MLR based classification after LFT denoising provides better classification accuracy (OA = 94.40%) than the other existing denoising methods, without applying any post-processing technique. Even though the TV denoised image appears visually acceptable, most of the statistical properties similar to the original image are lost, which can be inferred from the reduced classification accuracy. The Average Accuracies and Kappa coefficients calculated for each run are provided in Table II.

Fig. 3(a) shows the training set used for classification of the hyperspectral data. The thematic map for the Indian Pines dataset classification without applying any preprocessing method is given in Fig. 3(b), and those obtained after applying the various preprocessing methods (TV denoising, wavelet denoising and Legendre Fenchel denoising) are shown in Fig. 3(c), (d) and (e) respectively. The effect of segmentation (post-processing) on the classification accuracies is depicted in Table III. The classification accuracy of the proposed method is improved further by the post-processing technique, from 94.40% to 96.25%. Fig. 4(a) shows the ground truth image together with the class descriptions of the AVIRIS Indian Pines data scene. The segmentation result of direct classification (without applying preprocessing) is shown in Fig. 4(b), and the results of classification after applying the various preprocessing steps (TV denoising, wavelet denoising and LFT denoising) are shown in Fig. 4(c), (d) and (e) respectively.
Fig. 2: Preprocessing on the Indian Pines dataset (Band 165) using different denoising methods: (a) Original image, (b) TV denoising, (c) Wavelet denoising, (d) LF denoising.

TABLE II
Comparison of mean classification accuracies obtained without and with preprocessing on the Indian Pines dataset

Monte-Carlo   Without Preprocessing      TV denoising               Wavelet based denoising    LF denoising
run           OA     AA     Kappa        OA     AA     Kappa        OA     AA     Kappa        OA     AA     Kappa
MC1           82.38  85.76  0.7998       84.14  86.48  0.8200       88.25  88.64  0.8667       93.87  95.57  0.9305
MC2           82.65  85.58  0.8033       83.28  85.16  0.8094       87.78  88.71  0.8613       94.97  95.32  0.9429
MC3           83.51  86.03  0.8132       83.03  85.50  0.8076       88.50  89.56  0.8692       94.33  95.09  0.9355
MC4           82.50  84.57  0.8014       84.15  86.53  0.8200       89.18  89.79  0.8768       94.01  94.57  0.9320
MC5           82.65  85.61  0.8037       83.94  85.91  0.8172       88.11  89.91  0.8650       94.82  94.70  0.9411
Classification (mean OA)  82.74          83.71                      88.36                      94.40
Fig. 3: Indian Pines dataset: (a) Training set. Classification maps of MC 5: (b) Without preprocessing (OA = 82.65%), (c) With TV denoising (OA = 83.94%), (d) With Wavelet denoising (OA = 88.11%), (e) With Legendre Fenchel denoising (OA = 94.82%).

TABLE III
Comparison of mean segmentation results obtained without and with preprocessing on the Indian Pines dataset

Monte-Carlo   Without Preprocessing      TV denoising               Wavelet based denoising    LF denoising
run           OA     AA     Kappa        OA     AA     Kappa        OA     AA     Kappa        OA     AA     Kappa
MC1           92.64  94.64  0.9167       92.68  87.84  0.9170       94.01  94.36  0.9320       95.87  97.00  0.9531
MC2           91.08  93.31  0.8993       91.51  93.27  0.9036       92.50  93.35  0.9151       96.92  97.02  0.9650
MC3           92.39  93.63  0.9138       92.65  94.96  0.9170       93.45  94.46  0.9255       95.49  96.25  0.9487
MC4           90.84  92.78  0.8959       92.43  94.65  0.9143       94.02  93.94  0.9320       96.16  96.63  0.9564
MC5           89.22  93.20  0.8786       92.92  87.32  0.9194       93.52  95.05  0.9266       96.80  97.32  0.9635
Segmentation (mean OA)  91.23            92.44                      93.50                      96.25
Fig. 4: Indian Pines dataset: (a) Ground truth. Segmentation results of MC 5: (b) Without preprocessing (OA = 89.22%), (c) With TV denoising (OA = 92.92%), (d) With Wavelet denoising (OA = 93.52%), (e) With Legendre Fenchel denoising (OA = 96.80%).

VI. CONCLUSION

In this paper, we have discussed a fast and efficient band by band denoising technique for hyperspectral images using the Legendre Fenchel transformation. In our experiment, each LF denoised band is concatenated to form a hyperspectral data cube which is given to a relatively simple and efficient Multinomial Logistic Regression classifier. The experimental analysis shows that the proposed denoising method is faster and gives better noise reduction than existing techniques. Our future work is to incorporate various optimization based classifiers to compare the effects of denoising on HSI.
ACKNOWLEDGMENT

The authors would like to thank Ms. Jun Li et al. for sharing the MLR based classification and segmentation codes. We also thank Mr. Nidhin Prabhakar, Research Associate, Centre for Excellence in Computational Engineering and Networking, Amrita Vishwa Vidyapeetham, for his valuable support.

REFERENCES

[1] Yuan, Qiangqiang, Liangpei Zhang, and Huanfeng Shen. "Hyperspectral image denoising employing a spectral-spatial adaptive total variation model." IEEE Transactions on Geoscience and Remote Sensing 50.10 (2012): 3660-3677.
[2] Balakrishnan, Kavitha, Sowmya V., and K. P. Soman. "Spatial preprocessing for improved sparsity based hyperspectral image classification." International Journal of Engineering Research and Technology 1.5 (2012).
[3] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. "Nonlinear total variation based noise removal algorithms." Physica D: Nonlinear Phenomena 60.1 (1992): 259-268.
[4] Handa, Ankur, et al. "Applications of Legendre-Fenchel transformation to computer vision problems." Department of Computing, Imperial College London, Technical Report DTR11-7 (2011).
[5] Santhosh, S., et al. "A novel approach for denoising coloured remote sensing image using Legendre Fenchel Transformation." 2014 International Conference on Recent Trends in Information Technology (ICRTIT), IEEE, 2014.
[6] Li, Jun, José M. Bioucas-Dias, and Antonio Plaza. "Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning." IEEE Transactions on Geoscience and Remote Sensing 48.11 (2010): 4085-4098.
[7] Böhning, Dankmar. "Multinomial logistic regression algorithm." Annals of the Institute of Statistical Mathematics 44.1 (1992): 197-200.
[8] Camps-Valls, Gustavo, and Lorenzo Bruzzone. "Kernel-based methods for hyperspectral image classification." IEEE Transactions on Geoscience and Remote Sensing 43.6 (2005): 1351-1362.