Anomaly detection and important bands selection for hyperspectral images via Sparse PCA

Santiago Velasco-Forero (1), Marcus Chen (2), Alvina Goh (2) and Sze Kim Pang (2)

(1) National University of Singapore, (2) DSO National Laboratories

June 25, 2014

Plan

1. Simultaneous anomaly and critical band detection
2. Sparse Principal Component Analysis (L0 version)
3. Experiments

What is the anomaly and why?

Problem 1:

Problem 2:

Problem 3:

Problem definition

We would like to identify the subspace (a linear combination of bands) in which the anomaly detection can be performed.

Figure : [Left] False colour representation of the original HSI in 127 bands. [Right] Outlier detector.

Can we determine the bands in which the anomalies are most easily detected?

Critical bands and Outlier detection

We study the problem of detecting critical (or important) bands for outlier detection. We use the word critical to indicate that the bands are selected according to their power to discriminate the outliers in the scene.

Applications:
Identify wavelengths to cue different bands for different tasks.
Reduce the measurement collection in an intelligent way.

Critical bands and Anomaly detection in HSI

Let us consider a data set of $n$ vectors, $X^T = [x_1, x_2, \ldots, x_n]$, where each element has an additive anomaly component:

$$X = Z + A \tag{1}$$

where $Z^T = [z_1, z_2, \ldots, z_n]$, with $z_i$ the original anomaly-free data at pixel $i$, and $A = [a_{i,j}]$ represents the anomaly at pixel $i$ in the $j$th channel.

Thus, the first question this model raises is: given $X$, how can $A$ and $Z$ be obtained?

Figure : Representation of the proposed decomposition to perform simultaneously anomaly detection and band selection.

HSI as a matrix

Our problem takes the matrix representation of a multivariate image $I$ of size $n_r \times n_c \times p$, denoted by $X$; i.e., $X$ is a data matrix of size $n \times p$, with $n = n_r n_c$, whose rows represent pixels and whose columns represent bands. Without loss of generality, assume the columns of $X$ are centred. The singular value decomposition (SVD) of $X$ decomposes it into three matrices $U$, $\Lambda$, $V$, as follows:

$$X = U \Lambda V^T \tag{2}$$

where the left and right singular matrices $U = [u_1, \ldots, u_n] \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{p \times p}$ are orthogonal, and the matrix $\Lambda \in \mathbb{R}^{n \times p}$ is rectangular diagonal, with the singular values $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{\min(n,p)} \ge 0$ on its diagonal.
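The matrix representation can be illustrated with a minimal sketch in plain Python. The function names `cube_to_matrix` and `centre_columns` and the toy pixel values are hypothetical, chosen only for illustration; a real hyperspectral cube would normally be handled with an array library.

```python
def cube_to_matrix(cube):
    """Flatten an nr x nc x p cube (nested lists) into an n x p matrix, n = nr * nc.

    Each row of the result is one pixel, each column one band.
    """
    return [list(pixel) for row in cube for pixel in row]


def centre_columns(X):
    """Subtract the mean of every column (band) so that each band has zero mean."""
    n, p = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    return [[X[i][j] - means[j] for j in range(p)] for i in range(n)]


# Toy 2 x 2 image with p = 3 bands (made-up values, for illustration only).
cube = [
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
    [[5.0, 6.0, 7.0], [7.0, 6.0, 9.0]],
]
X = centre_columns(cube_to_matrix(cube))
```

Here `X` has 4 rows (pixels) and 3 columns (bands), and every column sums to zero.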

Variational formulation for the largest singular value

The largest singular value of $X$ solves the following optimisation problem:

$$\sigma_1 = \max_{u \in \mathbb{R}^n} \max_{v \in \mathbb{R}^p} \frac{u^T X v}{\|u\|_2 \, \|v\|_2} \tag{3}$$

If we restrict $u$ and $v$ to be unit vectors:

$$(u_1, v_1) = \arg\max_{\|u\|_2 = 1,\ \|v\|_2 = 1} u^T X v \tag{4}$$
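One standard way to solve (4) numerically is to alternate between the two vectors: with $v$ fixed the maximiser is $u = Xv / \|Xv\|_2$, and with $u$ fixed it is $v = X^T u / \|X^T u\|_2$. The sketch below is a plain power iteration, not the method of the talk; the function names, the starting vector, and the iteration count are assumptions made for illustration.

```python
import math


def matvec(X, v):
    """Return X v for a matrix stored as a list of rows."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]


def rmatvec(X, u):
    """Return X^T u."""
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def normalise(w):
    """Rescale w to unit L2 norm (assumes w is not the zero vector)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def leading_singular_triplet(X, n_iter=200):
    """Alternate the two closed-form updates of problem (4)."""
    v = normalise([1.0] * len(X[0]))  # arbitrary start, assumed not orthogonal to v_1
    for _ in range(n_iter):
        u = normalise(matvec(X, v))   # best unit u for the current v
        v = normalise(rmatvec(X, u))  # best unit v for the current u
    u = normalise(matvec(X, v))
    sigma = sum(ui * zi for ui, zi in zip(u, matvec(X, v)))  # value of u^T X v
    return u, sigma, v


# Small example whose singular values are 3 and 1.
u1, sigma1, v1 = leading_singular_triplet([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
```

For this example the iteration converges to `sigma1` close to 3, with `v1` aligned with the first coordinate axis.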

Artificial example (SVD)

$$(u_1, v_1) = \arg\max_{\|u\|_2 = 1,\ \|v\|_2 = 1} u^T X v$$

Original data vs lower-dimensional projection:

$$X = (X - u_1 v_1^T) + u_1 v_1^T = Z + A,$$

where $A$ should be sparse in columns and rows.

From L2 to L0

We propose the regularised problem using the L0 norm, i.e.,

$$(\tilde{u}, \tilde{v}) = \arg\max_{u, v} u^T X v \quad \text{subject to} \quad \|u\|_2 = 1,\ \|v\|_2 = 1,\ \mathrm{card}(u) + \beta\,\mathrm{card}(v) \le \alpha$$

The L0 norm is an integer-valued, discontinuous and non-convex function, so the problem is hard to solve in general. A common approach is to replace it with the L1 norm and solve the relaxed problem. We use an alternating minimisation approach instead.

If $v$ is fixed, with $z = Xv$:

$$\tilde{u} = \arg\max_u u^T X v = \arg\max_u u^T z \quad \text{subject to} \quad \|u\|_2 = 1,\ \mathrm{card}(u) \le \alpha - \beta\,\mathrm{card}(v)$$

Similarly, if $u$ is fixed, with $z = X^T u$:

$$\tilde{v} = \arg\max_v u^T X v = \arg\max_v z^T v \quad \text{subject to} \quad \|v\|_2 = 1,\ \mathrm{card}(v) \le \frac{\alpha - \mathrm{card}(u)}{\beta}$$
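Each of the two subproblems has a simple closed form: for a fixed cardinality budget $k$, the unit vector maximising $u^T z$ keeps the $k$ largest-magnitude entries of $z$ and rescales them to unit norm. The sketch below alternates these two updates. It is a simplification, not the algorithm of the talk: the cardinalities `k_u` and `k_v` are hypothetical parameters fixed in advance, standing in for the joint budget $\mathrm{card}(u) + \beta\,\mathrm{card}(v) \le \alpha$, and the function names and toy matrix are made up for illustration.

```python
import math


def hard_threshold(z, k):
    """Keep the k largest-magnitude entries of z, zero the rest, rescale to unit norm.

    Assumes the kept entries are not all zero.
    """
    keep = set(sorted(range(len(z)), key=lambda i: -abs(z[i]))[:k])
    w = [z[i] if i in keep else 0.0 for i in range(len(z))]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def sparse_rank_one(X, k_u, k_v, n_iter=50):
    """Alternate the u-update and the v-update under fixed cardinalities."""
    n, p = len(X), len(X[0])
    v = hard_threshold([1.0] * p, p)  # dense unit-norm start
    for _ in range(n_iter):
        z = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # z = X v
        u = hard_threshold(z, k_u)
        z = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # z = X^T u
        v = hard_threshold(z, k_v)
    return u, v


# Toy matrix: flat background 0.01 plus a block of ones in rows {0, 1}, columns {2, 3}.
X = [[0.01] * 5 for _ in range(6)]
for i in (0, 1):
    for j in (2, 3):
        X[i][j] += 1.0

u, v = sparse_rank_one(X, k_u=2, k_v=2)
u_support = [i for i, ui in enumerate(u) if ui != 0.0]
v_support = [j for j, vj in enumerate(v) if vj != 0.0]
```

On this toy matrix the supports of `u` and `v` recover the rows and columns of the planted block, which is the sense in which $A$ is sparse in both rows (anomalous pixels) and columns (critical bands).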

Algorithms

Lemma [Kim, 2012]. For a given vector $x$ and a fixed constant $\alpha > 0$, the solution of

$$\tilde{u} = \arg\max_{\|u\|_2 = 1} u^T x - \alpha\,\mathrm{card}(u) \tag{5}$$

is $\tilde{u} = \tau(x, \theta) / \|\tau(x, \theta)\|_2$, where $\theta$ is the minimum integer that satisfies $|x|_{(\theta - 1)} \le \sqrt{\alpha^2 + 2\alpha\|x\|}$ and $\tau(x, \theta)$ is the hard-thresholding operator at level $\theta$. It gives an exact solution for a given $\alpha$.
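The penalised problem (5) can also be solved by direct enumeration, which gives a simple way to check a thresholding rule. For a fixed support of size $k$, the best unit vector is the normalised restriction of $x$ to that support, the best support consists of the $k$ largest-magnitude entries, and the objective value is then the norm of those entries minus $\alpha k$. The sketch below scans all $k$; it is a brute-force check under these observations, not the closed-form rule of the lemma, and the function name is hypothetical.

```python
import math


def penalised_sparse_direction(x, alpha):
    """Maximise u^T x - alpha * card(u) over unit vectors u by scanning all supports
    made of the k largest-magnitude entries of x (assumes x is not the zero vector).
    """
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    best_k, best_val, sq_sum = 1, -math.inf, 0.0
    for k, i in enumerate(order, start=1):
        sq_sum += x[i] * x[i]
        val = math.sqrt(sq_sum) - alpha * k  # objective value for the top-k support
        if val > best_val:
            best_k, best_val = k, val
    keep = set(order[:best_k])
    norm = math.sqrt(sum(x[i] * x[i] for i in keep))
    return [x[i] / norm if i in keep else 0.0 for i in range(len(x))]


# Vector used in the solution-path figure of the slides.
x = [-0.5368, 0.2443, -0.0723, 0.2915, -0.7379, -0.1321]
u_small = penalised_sparse_direction(x, alpha=0.1)
u_large = penalised_sparse_direction(x, alpha=1.0)
```

With $\alpha = 0.1$ the solution keeps the two largest-magnitude entries ($-0.7379$ and $-0.5368$); with $\alpha = 1$ only the largest one survives, since a unit vector must have at least one non-zero entry.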

Boundary vs Ordered absolute values

Figure : The solution of the optimisation problem $\tilde{u} = \arg\max_{\|u\|_2 = 1} u^T x - \alpha\,\mathrm{card}(u)$ is the first value such that $B(i) < |x|_{(i)}$.

Lemma. The entire solution path of (5) can be identified by the iterative function

$$\alpha_{i+1} = \sqrt{1 + \left(\tau_1(x, \alpha_i)_{(2)}\right)^2} - 1,$$

where $\alpha_0 = 0$ and the subscript $(2)$ denotes the second ordered value of the vector.

Note: finding the solution for a given $\alpha$ has the same complexity as finding all the possible solutions (for all possible values of $\alpha$).

Complete solution-path

Figure : The complete solution-path for the vector x = [−.5368, .2443, −.0723, .2915, −.7379, −.1321]

Algorithm

Figure : L0 regularised SVD

Plan

1. Simultaneous anomaly and critical band detection
2. Sparse Principal Component Analysis (L0 version): Methods for Sparse SVD
3. Experiments

Example 1: Spiked model of covariance

$$\Sigma = u u^T + c\, V V^T / n$$

where $u \in \mathbb{R}^p$, $\|u\| = 1$, is the true sparse leading eigenvector, with $\mathrm{card}(u) = k$, and $V \in \mathbb{R}^{p \times n}$.

Figure : c=.1

Figure : c=.5

Figure : c=1

Several eigenvalues are significantly larger than all the others.

Error $= \|u - \tilde{u}\|_2$
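The spiked model and the error measure can be reproduced in a few lines. The sketch below builds $\Sigma$, estimates its leading eigenvector with a dense power iteration (no sparsity is imposed here), and reports the distance to the true $u$. Several choices are assumptions not stated on the slides: the entries of $V$ are drawn as standard Gaussians, $u$ has $k$ equal non-zero entries, and the sizes $p = 20$, $n = 200$, $k = 5$ are arbitrary.

```python
import math
import random


def spiked_covariance(p, n, k, c, seed=0):
    """Return (Sigma, u) with Sigma = u u^T + c V V^T / n."""
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(k)] * k + [0.0] * (p - k)  # assumed form of the sparse spike
    V = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]  # assumed Gaussian
    sigma = [[u[a] * u[b] + c * sum(V[a][t] * V[b][t] for t in range(n)) / n
              for b in range(p)] for a in range(p)]
    return sigma, u


def leading_eigenvector(S, n_iter=500):
    """Dense power iteration on a symmetric positive semi-definite matrix."""
    w = [1.0] * len(S)
    for _ in range(n_iter):
        w = [sum(sij * wj for sij, wj in zip(row, w)) for row in S]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


sigma, u = spiked_covariance(p=20, n=200, k=5, c=0.1)
u_hat = leading_eigenvector(sigma)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, u_hat)))  # ||u - u_hat||_2
```

Increasing `c` strengthens the noise term relative to the spike, so the dense estimate drifts away from the sparse $u$; this is the regime in which a sparsity constraint is expected to help.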

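Example 2

We generate a matrix $U = [u_1, \ldots, u_{1000}]$ of size $1000 \times 150$, with uniformly distributed coefficients in $[0, 1]$. We let $v = [v_1, v_2, \ldots, v_{150}] \in \mathbb{R}^{150}$ be a sparse vector with:

$$v_i = \begin{cases} 1 & \text{if } i \le 50, \\ \dfrac{1}{i - 50} & \text{if } 50 < i \le 100, \\ 0 & \text{otherwise.} \end{cases}$$

We form a test matrix

$$X = [x_1, \ldots, x_{1000}], \qquad x_j = \begin{cases} u_j + \sigma v & \text{if } 1 \le j \le 100, \\ u_j & \text{otherwise,} \end{cases} \tag{6}$$

where $\sigma$ is the signal-to-noise ratio.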
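The construction of this test matrix can be sketched as follows. The function name `make_example`, its parameters, and the seed are hypothetical; the sketch also assumes that $X$ has the same 1000 rows as $U$, with the first 100 rows carrying the pattern $\sigma v$.

```python
import random


def make_example(sigma, n=1000, p=150, n_anomalous=100, seed=0):
    """Build the test matrix: uniform background U plus sigma * v on the first rows."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(p)] for _ in range(n)]  # uniform coefficients
    v = [0.0] * p
    for i in range(1, p + 1):  # i is the 1-based index used in the text
        if i <= 50:
            v[i - 1] = 1.0
        elif i <= 100:
            v[i - 1] = 1.0 / (i - 50)
    X = [[U[j][i] + sigma * v[i] if j < n_anomalous else U[j][i] for i in range(p)]
         for j in range(n)]
    return X, U, v


X, U, v = make_example(sigma=5.0)
```

By construction only the first 100 rows (anomalous pixels) and the first 100 columns (bands where $v$ is non-zero) differ from the background, so a good sparse pair $(\tilde{u}, \tilde{v})$ should concentrate on those rows and columns.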

(a) Two vector example.
(b) True patterns in $u$ and $v$.
(c) $(\tilde{u}, \tilde{v})$ by other sparse SVD. 257 ms.
(d) $(\tilde{u}, \tilde{v})$ by L0 SVD. 22 ms. $\alpha = .1$

Figure : Artificial example, $\sigma = 5$

Intuition!

Optimisation problem:

$$(\tilde{u}, \tilde{v}) = \arg\max_{u, v} u^T X v \quad \text{subject to} \quad \|u\|_2 = 1,\ \|v\|_2 = 1,\ \mathrm{card}(u) + \beta\,\mathrm{card}(v) \le \alpha$$

Representation:

Example World-Trade Center

The first hyperspectral image scene used for experiments in this work was collected by the AVIRIS instrument, which was flown by NASA's Jet Propulsion Laboratory over the World Trade Center area in New York City on 16 September 2001, just five days after the terrorist attacks that collapsed the two main towers and other buildings in the WTC complex. [401,261,224]

Example World-Trade Center

Left: False Colour by Critical Bands [220,150,170], Right: Original False Colour [40,80,70]

Future work

Parameter selection (Automatic / False rate detection).
Extension to critical/important bands in target detection.
Other applications (Change detection in remote sensing time series).

Thanks for your attention!