A Novel Approach to Supervised Hyperspectral Image Classification

Journal of Photogrammetry and Remote Sensing, Volume 9, No. 4, December 2004, pp. 47-70

Yang-Lang Chang¹, Chin-Chuan Han², Kuo-Chin Fan³, K. S. Chen⁴

ABSTRACT This paper presents a new supervised classification technique for hyperspectral imagery that consists of two algorithms, referred to as greedy modular eigenspace (GME) and positive Boolean function (PBF). The GME method extracts features through a simple and efficient GME feature module. It uses the data correlation matrix to reorder spectral bands, from which a group of feature eigenspaces is generated to reduce dimensionality. It can be implemented as a feature extractor that generates a particular feature eigenspace for each of the material classes present in hyperspectral data. The residual reconstruction errors (RRE) are then calculated by projecting the samples into the individual GME-generated modular eigenspaces. The PBF is further developed for classification. It is a stack filter built by using the binary RRE as classifier parameters for supervised training. It adopts the minimum classification error (MCE) as its criterion and uses the positive and negative sample learning ability of the MCE criterion to improve classification accuracy. The performance of the proposed method is evaluated on MODIS/ASTER airborne simulator (MASTER) images acquired for land cover classification during the Pacrim II campaign. Experimental results demonstrate that the GME feature extractor suits the nonlinear PBF-based multi-class classifier well as a classification preprocessing step. The proposed approach is not only an effective method for land cover classification in earth remote sensing but also substantially reduces the computational complexity of eigen-decomposition compared with conventional principal components analysis (PCA).

Key Words: principal components analysis (PCA), hyperspectral supervised classification, greedy modular eigenspaces (GME), positive Boolean function (PBF), stack filter.

¹ Associate Professor, Department of Information Management, National Taipei College of Business
² Associate Professor, Department of Computer Science and Information Engineering, National United University
³ Professor, Institute of Computer Science and Information Engineering, National Central University
⁴ Professor, Center for Space and Remote Sensing Research, National Central University

Received Date: Nov. 11, 2003. Revised Date: July 25, 2004. Accepted Date: July 27, 2004.


1. INTRODUCTION

With the evolution of remote sensing technology, an increasing number of spectral bands have become available. Data can be collected in a few multispectral bands, in as many as several hundred hyperspectral bands, or even in thousands of ultraspectral bands. A lot of attention has been focused on the development of high-dimensional classification devoted to earth remote sensing. The growth of such high-dimensional data greatly enhances the information content, but it challenges the current techniques for analyzing such data. The improved spectral resolution also comes at a price, known as the curse of dimensionality. That term is used by the statistical community to describe the difficulties associated with the feasibility of distribution estimation in high-dimensional data sets. One of the most common issues in hyperspectral classification is how to achieve the best class separability without being restricted by the limited number of training samples (Jimenez and Landgrebe, 1999; Kumar et al., 2001).

Numerous techniques have been developed for feature extraction to reduce dimensionality without loss of class separability. Most of them focus on the estimation of statistics at full dimensionality to extract classification features. For example, the most widely used conventional principal components analysis (PCA) assumes that the covariances of different classes are equal, i.e. a common covariance pool (matrix) is used, and the potential differences between class covariances are not utilized. The PCA reorganizes the data coordinates in accordance with data variances so that features are extracted based on the magnitudes of their corresponding eigenvalues (Richards and Jia, 1999). Fisher discriminant analysis uses the between-class and within-class variances to extract desired features and to reduce dimensionality (Duda and Hart, 1973). Another example is the orthogonal subspace projection (OSP) developed by Harsanyi and Chang (1994). OSP projects all undesired pixels into a space orthogonal to the space generated by the desired pixels to achieve dimensionality reduction.

Figure 1. Data flow of the proposed GME/PBF-based multi-class classification scheme.

This paper presents a new approach to achieve dimensionality reduction while increasing classification accuracy. It is comprised of two algorithms, greedy modular eigenspace (GME) and positive Boolean function (PBF). The GME, based on a correlation matrix (the second-order statistic), is developed by grouping highly correlated bands into a small set of bands. The GME overcomes the dependency on the global statistics as much as possible, while preserving the inherent separability of the different classes. Most classifiers seek only one set of features that discriminates all classes simultaneously. This not only requires a large number of features, but also increases the complexity of the potential decision boundary. This paper will show that the proposed GME/PBF method solves this problem and improves classification accuracy. The GME provides very good discrimination among the classes of interest using the idea of the modular eigenspaces description, which has been shown to give very high recognition rates in face recognition (Pentland and Moghaddam, 1994). The greedy approach uses a condensed correlation coefficient matrix reordering transformation to find a set of highly correlated GME. It performs a greedy iterative search algorithm which reorders the correlation coefficients in the data correlation matrix row by row and column by column to group highly correlated bands as GME feature eigenspaces that can be further used for feature extraction.

Reordering the bands regardless of the original order of wavelengths and spectrally adjacent bands in a high-dimensional data set is an important characteristic of GME. The well-known segment principal components analysis (PCA) technique has a very high accuracy in remote sensing image classification (Jia et al., 1999; Richards et al., 1999). Our proposed GME improves the segment PCA performance by reordering the bands to produce new and efficient GME subspaces from the correlation matrices.

The proposed GME approach divides the whole set of high-dimensional features into an arbitrary number of highly correlated subgroups. It makes good use of these highly correlated band subgroups to speed up the computation of PCA. Each ground cover type or material class has a distinct set of GME-generated feature eigenspaces. When a set of GME-based feature eigenspaces $\Phi^k$ is generated from the training samples, a modular eigenspace projection is next taken to calculate the residual reconstruction error (RRE) vectors $e^k$ for the class $\omega_k$. These RRE vectors are then normalized and quantized before being applied to the subsequent PBF-based classifier. The follow-up PBF is developed to design a multi-class classifier that can take advantage of using the binary RRE as its training samples and can also fully utilize the learning ability of the minimum classification error (MCE) (Juang et al., 1997) as its optimization criterion to explore the special characteristics of GME. The MCE characteristic can best combine the PBF-based multi-class classifier with the binary RRE generated from GME. Finally, a classification map, $\Omega$, can be obtained by applying the PBF-based multi-class classifier to test samples. Fig. 1 illustrates the flow chart of this GME/PBF-based multi-class classification scheme.

Recently, several hyperspectral data feature extraction and reduction techniques have been developed that are well suited for application to pairwise classifier architectures to enhance classification accuracy. Jimenez et al. (1999) proposed a projection pursuit method utilizing a projection index function to select potential interesting projections from the local optimization-over-projection directions of the index (Bhattacharyya distance between two classes) of performance. A best-bases feature extraction algorithm developed by Kumar et al. (2001) used an extended local discriminant bases technique to reduce the feature spaces.

Figure 2. An example illustrates a CMPM with different gray levels and its corresponding correlation matrix with different correlation coefficients in percentage (White = 100; black = 0) for the class ωk. Note that four squares with fine black borders represent the highly correlated modular subspaces which have higher correlation coefficient compared with their neighborhood.

Both of them used pairwise classifiers to divide multi-class problems into a set of two-class problems. Our proposed GME/PBF-based classifier performs multi-class classification, which was previously proposed in (Han and Tsai, 2001; Han, 2002), to enhance the precision of image classification significantly by exploring the capabilities of multi-class classifiers rather than pairwise classifiers. The main characteristic of a PBF-based multi-class classification scheme is its nonlinear discrete properties. It makes use of the MCE criteria (Juang et al., 1997) to apply both positive and negative samples of the binary RRE as training samples for the PBF-based multi-class classifier. The features of GME are very well suited to the PBF-based multi-class classification properties compared to traditional feature extraction. These facts are demonstrated in the experimental results.

The rest of this paper is organized as follows. In Section 2, the proposed GME/PBF-based multi-class classifier is described in detail. In Section 3, a set of experiments is conducted to demonstrate the feasibility and utility of the proposed approach. Finally, in Section 4, several conclusions are presented.

2. METHODOLOGY

Referring to Fig. 1, there are four stages in implementing our proposed GME/PBF-based multi-class classification scheme. 1) A GME transformation algorithm is applied to achieve dimensionality reduction and feature extraction. 2) The second stage is a modular eigenspace projection similarity measure, also known as distance decomposition. 3) The third stage is a threshold decomposition which normalizes and quantizes the RRE. 4) Finally, a PBF-based multi-class classification is performed.

2.1. Greedy Modular Eigenspaces

A visual scheme to display the magnitude of the correlation matrix for emphasizing the second-order statistics of high-dimensional data was proposed by Lee and Landgrebe (1993). Shown in Fig. 2 is a correlation matrix pseudo-color map (CMPM) in which the gray scale is used to represent the magnitude of its corresponding correlation matrix. It is also equal to a modular subspace set $\Phi^k$. Different material classes have different value sequences of correlation matrices. These can be treated as special sequence codes for feature extraction.

We define a correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$) which belongs to the $l$-th modular subspace $\Phi_l^k$ of a complete set $\Phi^k = (\Phi_1^k, \dots, \Phi_l^k, \dots, \Phi_{n_k}^k)$ for a class $\omega_k$, where $m_l$ and $n_k$ represent, respectively, the number of bands (feature spaces) in modular subspace $\Phi_l^k$ and the total number of modular subspaces for a complete set $\Phi^k$, i.e. $l \in \{1, \dots, n_k\}$, as shown in Fig. 2. The original correlation matrix $c_{X^k}$ ($m_t \times m_t$) is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a GME set $\Phi^k$ for the class $\omega_k$. There are $m_t!$ (the factorial of $m_t$) possible combinations to construct exactly one complete set. Here $m_t$ represents the total number of original bands,

$$m_t = \sum_{l=1}^{n_k} m_l, \qquad (1)$$

where $m_l$ represents the number of bands in the $l$-th feature subspace of a complete set for a class $\omega_k$.

We first define a complete modular subspace (CMS) set which is composed of all possible combinations of the CMPM. The original correlation matrix $c_{X^k}$ ($m_t \times m_t$) is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a CMPM for the class $\omega_k$. There are $m_t!$ different kinds of CMPMs in a CMS set, as shown in Fig. 3. Each different CMPM is associated with a unique sequence of band order. It needs $m_t!$ swapping operations by band order to find a complete and exhaustive CMS set. In Fig. 4, a visual interpretation is introduced to highlight the relations between swapping and rotating operations in terms of band order.
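As a minimal sketch of the quantities introduced in this subsection, the following Python fragment computes a per-class band correlation matrix, the size $m_t!$ of a CMS set, and the constraint of Eq. (1). The function names and the synthetic pixel data are illustrative assumptions and are not part of the original method description.

```python
import math
import numpy as np

def class_correlation_matrix(samples):
    """Band correlation matrix (m_t x m_t) of one class.

    samples: array of shape (num_pixels, m_t), one row per training pixel.
    """
    # rowvar=False treats each column (spectral band) as a variable.
    return np.corrcoef(samples, rowvar=False)

def cms_size(m_t):
    """Number of band orderings (CMPMs) in a complete modular subspace set."""
    return math.factorial(m_t)

def satisfies_eq1(block_sizes, m_t):
    """Check Eq. (1): the subspace sizes m_l must sum to m_t."""
    return sum(block_sizes) == m_t

# Synthetic stand-in for the training pixels of one class (assumption).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(50, 4))
corr = class_correlation_matrix(pixels)
num_cmpms = cms_size(4)  # 4! orderings for the four-band example of Fig. 3
```

The factorial growth of `cms_size` is the reason an exhaustive search over band orderings quickly becomes impractical as the number of bands increases.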

Figure 3. (a.) The initial CMPM for four original bands (A, B, C and D), mt = 4, is applied to the exhaustive swapping operations by band order. (b.) An example shows a CMS set which is composed of 24 (mt! = 4!) possible CMPMs. The CMPM with a dotted-line square is the optimal CMPM in a CMS set.

Figure 4. The rotation operation between the Block K and Block 2 is performed by swapping their horizontal and vertical correlation coefficient lists row by row and column by column. Note that a pair of blocks (squares) switched by this swapping operation in terms of band order should have the same size of any length along the diagonal of correlation matrices. The Fixed Blk 1 and Fixed Blk 2 will rotate 90 degrees.


Figure 5. (a) and (b) The original CMPM and its corresponding correlation matrix for class $\omega_k$. (c) and (d) The GME set $\Phi^k$ for class $\omega_k$ and its corresponding correlation matrix after greedy modular subspaces transformation.

There is one optimal CMPM in a CMS set, as shown in Fig. 3. This optimal CMPM is defined as the specific CMPM that is composed of a set of modular subspaces $\Phi_l^k$, $l \in \{1, \dots, n_k\}$ and $\Phi_l^k \in \Phi^k$, and that has the highest correlation inside each individual modular subspace $\Phi_l^k$. It attempts to reach the condition in which the highly correlated blocks (with high correlation coefficient values) are placed adjacent to each other, as near as possible, to construct the optimal modular subspace set along the diagonal of the CMPM. An exhaustive computation to find the optimal CMPM in a CMS set is too expensive for a large $m_t$. In order to overcome this drawback, we develop a fast searching algorithm to construct an alternative greedy CMPM (modular subspace) instead of the optimal CMPM. This new greedy CMPM is defined as a GME feature module set. It not only reduces the redundant operations in finding the greedy CMPM but also provides an efficient method to construct GME sets.

In this paper, we propose a GME set $\Phi^k$ which is composed of a group of modular eigenspaces. Each modular eigenspace $\Phi_l^k$ includes a set of highly correlated bands regardless of the original order of wavelengths. This scheme has several merits. 1) It reduces the number of bands in each GME set to speed up the eigen-decomposition computation compared with conventional PCA. 2) A highly correlated GME makes PCA work more efficiently due to the redundancy reduction property of PCA. 3) The GME tends to equalize all the bands in a subgroup with highly correlated variances to avoid potential bias problems that may occur in conventional PCA (Jia and Richards, 1999). 4) Different classes are best distinguished by different abundant GME feature sets.

We define a correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$) which belongs to the $l$-th modular eigenspace $\Phi_l^k$ of a GME $\Phi^k = (\Phi_1^k, \dots, \Phi_l^k, \dots, \Phi_{n_k}^k)$ for a class $\omega_k$, where $m_l$ and $n_k$ represent, respectively, the number of bands in modular eigenspace $\Phi_l^k$ and the total number of modular eigenspaces of a GME set $\Phi^k$, i.e. $l \in \{1, \dots, n_k\}$, as shown in Fig. 5. The original correlation matrix $c_{X^k}$ ($m_t \times m_t$), where $m_t$ is the total number of original bands, is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a GME set $\Phi^k$ for the class $\omega_k$. There are $m_t!$ possible combinations to construct a candidate GME set, as a CMS set does. That is to say, it takes $m_t!$ operations to compose $m_t!$ different sets of the correlation submatrices associated with different sequences of band order. Only one of them can be chosen as the GME set.

It is computationally expensive to make an exhaustive search to construct a GME set if $m_t$ is a large number. In this paper, we propose a fast greedy band reordering algorithm, called greedy modular eigenspace transformation (GMET), based on the assumption that highly correlated bands often appear adjacent to each other in hyperspectral data (Richards and Jia, 1999). In this algorithm, the absolute value of every correlation coefficient $c_{i,j}$ in the correlation matrix $c_{X^k}$ is compared to a threshold value $t_c$ ($0 \le t_c \le 1$). Those adjacent correlation coefficients $c_{i,j}$ that are larger than the threshold value $t_c$ are used to construct a modular eigenspace $\Phi_l^k$ in an iterative mode. A greedy searching iteration is initially carried out at $c_{0,0} \in c_{X^k}$ ($m_t \times m_t$) for a class $\omega_k$ to condense a GME set. Each $c_{i,j}$ is assigned an attribute during a GMET. If the attribute of a $c_{i,j}$ is set as available, this $c_{i,j}$ has not been assigned to any modular eigenspace $\Phi_l^k$. If a $c_{i,j}$ is assigned to a modular eigenspace $\Phi_l^k$, the attribute of this $c_{i,j}$ is set to used. All attributes of the original correlation matrix $c_{X^k}$ are first set as available. The proposed GMET algorithm is as follows:

Step 1. Initialization: A new modular eigenspace $\Phi_l^k \in \Phi^k$ for a class $\omega_k$ is initialized by a new correlation coefficient $c_{d,d}$, where $c_{d,d}$ and $d$ are defined, respectively, as the first available element and its subindex in the diagonal list $[c_{0,0}, \dots, c_{m_t-1,m_t-1}]$ of the correlation matrix $c_{X^k}$. This diagonal coefficient $c_{d,d}$ is set to used and then assigned as the current $c_{i,j}$, i.e. the only one activated at the current time. Then, go to step 2. Note that this GMET algorithm is terminated if the last diagonal coefficient $c_{d,d}$ is already set to used and the last subgroup $\Phi_{n_k}^k$ of the GME has been obtained for class $\omega_k$.

Step 2. If the column subindex $j$ of the current $c_{i,j}$ has reached the last column (i.e. $j = m_t - 1$) in the correlation matrix, then a new modular eigenspace $\Phi_l^k$ is constructed with all used correlation coefficients $c_{i,j} \in \Phi_l^k$, these used coefficients $c_{i,j}$ are removed from the correlation matrix, and the algorithm goes to step 1 for another round to find a new modular eigenspace. Otherwise, it goes to step 3.

Step 3. GMET moves the current $c_{i,j}$ to the next adjacent column $c_{i,j+1}$, which then acts as the current $c_{i,j}$, i.e. $c_{i,j} \to c_{i,j+1}$. If the current $c_{i,j}$ is available and its absolute value is larger than $t_c$, then go to step 4. Otherwise, go to step 2.

Step 4. If $j \ne d$, swap $c_{*,j}$ with $c_{*,d}$ and $c_{j,*}$ with $c_{d,*}$ respectively, where the asterisk symbol "*" indicates any row or column subindex in the correlation matrix. Fig. 4 and Fig. 6 show a graphical mechanism of this swapping operation. The attributes of $c_{d,*}$ and $c_{*,d}$ are then marked used. Then let $c_{i \leftrightarrow d,d}$ and $c_{d,i \leftrightarrow d} \in \Phi_l^k$, where $i \leftrightarrow d$ means including all coefficients between subindex $i$ and subindex $d$. Go to step 2.

Eventually, a GME set $\Phi^k = (\Phi_1^k, \dots, \Phi_l^k, \dots, \Phi_{n_k}^k)$ is composed for ground cover class $\omega_k$. For convenience, we sort these modular eigenspaces according to the number of their feature bands, i.e. the number of feature spaces $m_1, \dots, m_l, \dots, m_{n_k}$, in descending order.
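The greedy idea behind the steps above can be illustrated with a deliberately simplified Python sketch. It is not a line-by-line implementation of GMET: instead of swapping rows and columns in place, it assumes that each group is seeded by the first band still marked available and that every later available band whose absolute correlation with the seed exceeds $t_c$ joins the group. The names `greedy_band_groups` and `reorder` and the toy correlation matrix are hypothetical.

```python
import numpy as np

def greedy_band_groups(corr, t_c):
    """Greedily group bands whose |correlation| with a seed band exceeds t_c.

    Returns a list of groups (lists of original band indices), sorted by
    group size in descending order, mirroring the final sorting step.
    """
    m_t = corr.shape[0]
    available = [True] * m_t          # attribute of each band: available/used
    groups = []
    for d in range(m_t):              # d: first available diagonal subindex
        if not available[d]:
            continue
        available[d] = False          # mark the seed as used
        group = [d]
        for j in range(d + 1, m_t):   # sweep the remaining columns
            if available[j] and abs(corr[d, j]) > t_c:
                available[j] = False
                group.append(j)
        groups.append(group)
    groups.sort(key=len, reverse=True)  # stable sort keeps ties in order
    return groups

def reorder(corr, groups):
    """Permute rows and columns so that grouped bands become adjacent."""
    order = [band for group in groups for band in group]
    return corr[np.ix_(order, order)], order

# Toy matrix (assumption): bands 0 and 2 correlate, as do bands 1 and 3.
corr = np.array([
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.1],
    [0.1, 0.8, 0.1, 1.0],
])
groups = greedy_band_groups(corr, t_c=0.5)
reordered, order = reorder(corr, groups)
```

Each band is visited a bounded number of times, so this sketch needs on the order of $m_t^2$ comparisons rather than the $m_t!$ orderings of an exhaustive search.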

Figure 6. The original CMPM (White=1 or -1; black=0) and the CMPM after swapping band Nos.0-9 and 30-39.



Figure 7. GME sets for the six ground cover types used in the experiment. Each of them can be treated as a unique feature for a distinguishable class.

Figure 8. The GME sets for different ground cover classes $\omega_k, \dots, \omega_j$. A GME set $\Phi^k$ is composed of a group of modular eigenspaces $(\Phi_1^k, \dots, \Phi_l^k, \dots, \Phi_{n_k}^k)$.

Each square ($m_l \times m_l$) is filled with the average value of all correlation coefficients inside its correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$). Fig. 5 illustrates the original CMPM and the reordered one after a GMET. Each ground cover type or material class has its uniquely ordered GME set. For instance, in our experiment, six ground cover types were transformed into six different GME sets, as shown in Fig. 7.

In this visualization scheme, we can build a GME efficiently and bypass the redundant procedures of rearranging the band order from the original hyperspectral data sets. Moreover, the GMET algorithm can greatly reduce the eigen-decomposition computation compared to the feature extraction of conventional PCA. The computational complexity for conventional PCA is on the order of $O(m_t^2)$, and it is $O(\sum_{i=1}^{n_k} m_i^2)$ for GME (Jia and Richards, 1999). The GME makes good use of these highly correlated band subgroups to speed up the computation compared to the PCA computation over the whole set of bands. An example in which the GMET was applied to real hyperspectral data is shown in Fig. 8. After finding the highly correlated GME sets $\Phi^k$ for all classes, $k \in \{1, \dots, N\}$, the eigenspace projections are executed.

Figure 9. The RRE for classes $\omega_j$ and $\omega_k$ in the hyperplane.

2.2. Distance Measures: Greedy Modular Eigenspace Projection

The GME can distinguish different classes well by the individual highly correlated features of their modular eigenspaces. It can make use of the PCA, also known as the Karhunen-Loève Transform (Han and Tsai, 2001), to extract the most significant information as a similarity measure from each GMET-generated eigenspace while removing redundancy in the spectral domain. Let us assume that $\Phi_l^k$, which is the $l$-th modular eigenspace of the GME set $\Phi^k$ for class $\omega_k$, has $m_l$ feature bands, as shown in Fig. 5. The basis functions in a PCA are obtained by solving the eigenvalue problem

$$\Lambda_l^k = {\phi_l^k}^{T}\, \Sigma_l^k\, \phi_l^k, \qquad (2)$$

where $\Sigma_l^k$ is the covariance matrix of $\Phi_l^k$, $\phi_l^k$ is the eigenvector matrix of $\Sigma_l^k$, and $\Lambda_l^k$ is the corresponding diagonal matrix of eigenvalues. Moghaddam et al. (1995) decomposed the vector space $R^{m_l}$ into two exclusive and complementary subspaces: the principal subspace ($W$ feature spaces) $\phi_l^k = \{\phi_{li}^k\}_{i=1}^{W}$ and the orthogonal complement subspace $\bar{\phi}_l^k = \{\phi_{li}^k\}_{i=W+1}^{m_l}$, where $\phi_{li}^k$ is the $i$-th eigenvector of $\phi_l^k$. A residual reconstruction error of the modular eigenspace $\Phi_l^k$ is defined as

$$e_l^k(x) = \sum_{j=W+1}^{m_l} y_j^2 = \|\tilde{x}\|^2 - \sum_{j=1}^{W} y_j^2, \qquad (3)$$

where $\tilde{x} = x - \bar{x}$ is the mean-normalized vector of sample $x$, and $y_j$ is the projected value of sample $x$ on the eigenvector $\phi_{lj}^k$ corresponding to the $j$-th largest eigenvalue. Here, $e_l^k(x)$ represents the distance between the query sample $x$ and the modular eigenspace $\Phi_l^k$ of the training samples. This GME projection, also known as distance decomposition, is applied to all modular eigenspaces $\Phi_l^k$, $l \in \{1, \dots, n_k\}$, to generate an RRE vector $e^k = (e_1^k, \dots, e_l^k, \dots, e_{n_k}^k)$ for class $\omega_k$. A graphical interpretation of the RRE vectors is depicted in Fig. 9. For convenience, we refer to this distance measure, RRE, as an RRE-decomposition (distance decomposition).
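The residual reconstruction error of Eq. (3) can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the training data are synthetic points lying on a line, and each modular eigenspace is assumed to contain at least two bands so that the covariance is a proper matrix.

```python
import numpy as np

def fit_modular_eigenspace(train, num_principal):
    """Estimate the mean and the W leading eigenvectors of one eigenspace.

    train: array of shape (num_samples, m_l) restricted to the m_l bands
    of the modular eigenspace. Returns (mean, principal), where principal
    has shape (m_l, W).
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    return mean, eigvecs[:, order[:num_principal]]

def residual_reconstruction_error(x, mean, principal):
    """Eq. (3): squared norm of x_tilde minus the energy in the principal subspace."""
    x_tilde = x - mean
    y = principal.T @ x_tilde                # projections y_1 .. y_W
    return float(x_tilde @ x_tilde - y @ y)

# Synthetic training samples on the line spanned by (1, 1, 0) (assumption).
train = np.array([[t, t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)])
mean, principal = fit_modular_eigenspace(train, num_principal=1)
rre_on_line = residual_reconstruction_error(np.array([3.0, 3.0, 0.0]), mean, principal)
rre_off_line = residual_reconstruction_error(np.array([0.0, 0.0, 2.0]), mean, principal)
```

A sample lying in the principal subspace has an RRE close to zero, whereas a sample orthogonal to it keeps its full squared norm as residual, which is the behavior the distance measure relies on.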

2.3. Threshold Decomposition

After defining the RRE vector $e^k$ in the previous stage, the threshold decompositions described below are performed to create normalized and quantized RRE vectors $e_{NQ}^k$ for all classes. The RRE $e_l^k$ is first normalized to the range (0,1) by the nonlinear sigmoid function:

$$E_l^k = \frac{e_l^k - \mu_l^k}{a_l^k}, \qquad (4)$$

and

$$e_{N_l}^k(E_l^k) = \frac{1}{1 + \exp(-t E_l^k)}, \qquad (5)$$

where $\mu_l^k$, $a_l^k$ and $t$ denote, respectively, the mean and the standard deviation of the RRE $e_l^k$ and a new threshold value. This normalized RRE $e_{N_l}^k$ is then uniformly quantized into $L-1$ levels.

Finally, these normalized distance vectors $e_N^k = (e_{N_1}^k, \dots, e_{N_l}^k, \dots, e_{N_{n_k}}^k)$ are converted into binary vectors $e_{NQ}^k = (e_{NQ_{l1}}^k, \dots, e_{NQ_{lp}}^k, \dots, e_{NQ_{ln_k}}^k)$, where $l \in \{1, \dots, L-1\}$ denotes the level and $p$ indexes the modular eigenspace. In Fig. 10, the threshold decomposition function $T(\cdot)$ transforms the RRE vectors $e^k$ into the binary RRE vectors $e_{NQ}^k$ for all classes $\omega_k$, where $k \in \{1, \dots, N\}$.

Figure 10. Determination process of GME/PBF-based multi-class classification scheme for supervised hyperspectral image classification.

2.4. Stack Filter and Positive Boolean Function
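The normalization of Eqs. (4) and (5) and the threshold decomposition described in Section 2.3, which is also the first of the two stack filter properties discussed in this section, can be sketched as follows. The sketch assumes that the normalized values are quantized to the integers $0, \dots, L-1$, so that there are $L-1$ threshold levels; the function names and the example numbers are hypothetical.

```python
import numpy as np

def normalize_rre(e, mu, a, t=1.0):
    """Eqs. (4)-(5): standardize the RRE, then map it into (0, 1) by a sigmoid."""
    big_e = (np.asarray(e, dtype=float) - mu) / a
    return 1.0 / (1.0 + np.exp(-t * big_e))

def quantize(e_norm, num_values):
    """Uniformly quantize values in (0, 1) to integers 0 .. num_values - 1."""
    q = np.floor(np.asarray(e_norm) * num_values).astype(int)
    return np.clip(q, 0, num_values - 1)

def threshold_decompose(q, num_values):
    """Binary array of shape (num_values - 1, n); row l-1 is 1 where q >= l."""
    levels = np.arange(1, num_values)[:, None]
    return (np.asarray(q)[None, :] >= levels).astype(int)

# Example RRE vector with mean 0 and unit standard deviation (assumption).
e_norm = normalize_rre([0.0, 10.0, -10.0], mu=0.0, a=1.0, t=1.0)
q = quantize(e_norm, num_values=4)          # L = 4, values 0..3
binary = threshold_decompose(q, num_values=4)
```

Summing the binary rows recovers the quantized values, and each column is non-increasing from the lowest to the highest level, which is what allows a multi-level signal to be processed level by level.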

The PBF is developed from a stack filter. Each stack filter corresponding to a PBF possesses the weak superposition property (the threshold decomposition) and the ordering property (the stacking property) (Wendt et al., 1986). The class of stack filters includes many familiar filter types, such as the median filter, weighted median filter, order statistic filter and weighted order median filter. The main function of a stack filter is to remove noise, detect edges, etc. This technology has received considerable interest in the communications and signal processing communities. Wendt et al. (1986) defined stack filters as a class of nonlinear digital filters and pointed out the close connections between stack filters and PBF. Maragos (1987) used mathematical morphology to construct a class of morphological filters and their equivalent classes. He also showed that a PBF has exactly one sum-of-product form or one product-of-sum form produced by morphological operations. Dougherty (1992) defined a basis class of filters satisfying the morphological basis criteria and applied the erosion operation as an estimator to find the optimal morphological filter by minimizing the mean square error. Postaire et al. (1993) proposed a binary morphological technique to cluster N-dimensional observations. It is an unsupervised pattern clustering approach based on mathematical morphological operations. Lin and Coyle (1990) developed a generalized stack filter, including all rank order filters, stack filters, and digital morphological filters. They showed that choosing a generalized stack filter is equivalent to massively parallel threshold-crossing decision making when these decisions are consistent with each other.

Figure 11. An example of the use of the proposed PBF-based multi-class classifier corresponding to a stack filter for processing a map classification.

60


In

traditional

classification,

the

supervised Bayes

decision

pattern

positive and negative samples of the binary

theory

RRE generated simultaneously from GME. It

models the classification as a distribution

can

estimation problem. The Bayes theory can be

consideration

used to solve classification problems, but it

PBF-based classifier parameter. This is very

requires complete knowledge of probability

important

distributions. In hyperspectral data, obtaining

knowledge of the data distribution particularly

such complete knowledge is generally difficult

in dealing with hyperspectral data. It also

even in an unsupervised manner. This is

makes multi-class classification possible for

because samples used for training and learning

our proposed method. The proposed binary

may be mixed pixels and do not provide

RRE vector

accurate information. Juang et al. (1997)

characteristics of MCE criteria. It not only

indicated the superiority of MCE method over

improves the classification accuracy but also

the distribution estimation method. The goal of

overcomes the restriction of a limited number

MCE training is to be able to correctly

of training samples which is one of the most

discriminate

common

the

observations

for

best

take

all

competing when

because

classes

searching

we

lack

for

into the

complete

k , fully utilizes the special e NQ

problems of

in

remote

hyperspectral

classification results rather than to fit the

classification

sensing.

In

our

distributions to the data.

approach, an optimal stack filter Sf is defined

An optimal stack filter Sf is originally

as a filter whose MCE between the output and

defined as a filter whose mean absolute error

the desired signals is minimum value. We

(MAE) between the filter’s output and the

apply the window of length n to the number of

desired signals is minimum value. Since it

modular eigenspaces Φ for all classes. As an

possesses

threshold

example, Fig. 10 and Fig. 11 show that the

decomposition and stacking properties, the

windows of fixed length n slides consist of

multi-level MAE value can be decomposed

k , of all classes. They are used as binary RRE eNQ

into a summation of absolute errors occurring

the MCE training parameters for a PBF.

the

well-known

k

at each level. Han (2002) proposed an

At the supervised training stage, each

alternative PBF-based multi-class classification

level of the training samples is assigned to true

scheme based on MCE criteria to resolve

(1) or false (0). We assume there are N ground

multi-class problems. We further apply this

cover classes. We extract only the first n

MCE version of a PBF-based classification

k k k , (eNQ1 ,...eNQn ), as a window of elements of eNQ

scheme

hyperspectral

fixed length n for all classes, where n ≤nk for

imagery. It has the ability to learn from both

all classes ωk and k ∈{1,...,N}, to develop a

to

remote

sensing


61

PBF-based multi-class classifier. A PBF is exactly one sum-of-product form without any negative components. The classification errors can be calculated from the summation of the absolute errors incurred at each level. The proposed scheme is constructed by minimizing the classification error rate using the training samples. In Fig. 10, we assume there are N classes of training samples. M stands for a fixed number of training samples for each class $\omega_k$. The MN training samples are projected to N GME sets and then decomposed into $MN^2$ binary RRE vectors $e^k_{NQ}$ in the GME projection stage and the threshold decomposition stage respectively. Let $x_{ij}$ and $e^k_{NQ_{ij}}$ represent the MN training samples and the $MN^2$ binary vectors $e^k_{NQ}$ of length n respectively, where $i \in \{1,\ldots,N\}$, $j \in \{1,\ldots,M\}$ and $k \in \{1,\ldots,N\}$. Next, the desired values for each occurrence at each level have to be determined for the output of the PBF. The desired value of an occurrence $e^k_{NQ_{ij}}$ can be treated as the error value of sample $x_{ij}$ at class $\omega_k$. If sample $x_{ij}$ belongs to class $\omega_k$, no error occurs, i.e. the desired value of $e^k_{NQ_{ij}}$ is 0. Otherwise, it is equal to 1.

$$
d(e^k_{NQ_{ij}}) =
\begin{cases}
0 & \text{if } i = k,\ x_{ij} \in \omega_k, \\
1 & \text{if } i \neq k,\ x_{ij} \notin \omega_k.
\end{cases}
\qquad (6)
$$

According to the PBF criteria (Han, 2002), a Boolean function $B^k_f(\cdot)$ is defined for an occurrence $e^k_{NQ_{ijlp}}$ at level l in a window of fixed length n (the number of Boolean binary variables) for the class $\omega_k$ as shown in Fig. 11, where $l \in \{1,\ldots,L-1\}$ and $p \in \{1,\ldots,n\}$.

Based on the threshold decomposition property, an occurrence of the binary RRE vector $e^k_{NQ_{ij}}$ as an input of the PBF can be decomposed into binary vectors $e^k_{NQ_{ijl}} = (e^k_{NQ_{ijl1}}, \ldots, e^k_{NQ_{ijlp}}, \ldots, e^k_{NQ_{ijln}})$, with a window of fixed length n (n dimensions), where $p \in \{1,\ldots,n\}$. Let us consider two samples u, v and a class $\omega_k$. We assume that $e^k_{NQ_{ul}} \le e^k_{NQ_{vl}}$ for all dimensional elements. If sample u does not belong to class $\omega_k$ (an error occurs), sample v does not belong to class $\omega_k$. On the other hand, if sample v is an element of class $\omega_k$ (no error), sample u should be an element of class $\omega_k$. We define $E_f(\cdot)$ as an error function. Thus, $E^k_f(u) \le E^k_f(v)$ if $e^k_{NQ_{ul}} \le e^k_{NQ_{vl}}$ for all dimensional elements. We further define $E_f(x)$ to be the PBF $B^k_f(\cdot)$ of an occurrence $e^k_{NQ_{xl}}$ of the class $\omega_k$ at each level l. If $e^k_{NQ_{ul}} \le e^k_{NQ_{vl}}$, then $B^k_f(e^k_{NQ_{ul}}) \le B^k_f(e^k_{NQ_{vl}})$, satisfying the stacking property. This indicates that the stacking property offers the ability to solve classification problems. The $MN^2$ occurrences are treated as the training occurrences of the classifier. They are decomposed into $MN^2(L-1)$ binary vectors of length n. The desired value $d(e^k_{NQ_{ij}})$ for each occurrence $e^k_{NQ_{ij}}$ is determined by Eq. (6). The classification error (CE), which is defined as the expected value (E) of the differences between the desired values $d(e^k_{ij})$ and the stack filter's binary outputs $S_f(e^k_{ij})$, is obtained by

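$$
\begin{aligned}
CE &= E\left[\, \left| d(e^k_{ij}) - S_f(e^k_{ij}) \right| \,\right] \\
   &= E\left[\, \left| \sum_{l=1}^{L-1} \left( d(T_l(e^k_{ij})) - B^k_f(T_l(e^k_{ij})) \right) \right| \,\right]
      && \text{(by threshold decomposition property)} \\
   &= E\left[\, \sum_{l=1}^{L-1} \left| d(e^k_{NQ_{ij}}) - B^k_f(e^k_{NQ_{ij}}) \right| \,\right]
      && \text{(by stacking property)} \\
   &= \sum_{l=1}^{L-1} E\left[\, \left| d(e^k_{NQ_{ij}}) - B^k_f(e^k_{NQ_{ij}}) \right| \,\right] \\
   &= \sum_{l=1}^{L-1} \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{N} \left| d(e^k_{NQ_{ij}}) - B^k_f(e^k_{NQ_{ij}}) \right|
\end{aligned}
\qquad (7)
$$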
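The threshold decomposition, the stacking property and the summed absolute error in the last line of Eq. (7) can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each RRE vector has already been quantized to integers in 0..L-1, that the same desired value d is used at every level, and it uses a hypothetical three-input majority function (`majority3`) as a stand-in PBF. All function names are illustrative.

```python
from itertools import product

import numpy as np


def threshold_decompose(e, L):
    """Return an (L-1, n) binary array; row l-1 is T_l(e), i.e. 1 where e >= l."""
    e = np.asarray(e)
    levels = np.arange(1, L).reshape(-1, 1)
    return (e.reshape(1, -1) >= levels).astype(int)


def stack_filter(e, L, pbf):
    """Stack filter output: sum of the PBF binary outputs over all levels."""
    return sum(pbf(row) for row in threshold_decompose(e, L))


def satisfies_stacking(pbf, n):
    """Exhaustively check that u <= v (element-wise) implies pbf(u) <= pbf(v)."""
    vectors = list(product((0, 1), repeat=n))
    return all(
        pbf(u) <= pbf(v)
        for u in vectors
        for v in vectors
        if all(a <= b for a, b in zip(u, v))
    )


def classification_error(samples, desired, L, pbf):
    """Sum over samples and levels of |d - B_f(T_l(e))| (assumed reading of Eq. 7)."""
    total = 0
    for e, d in zip(samples, desired):
        for row in threshold_decompose(e, L):
            total += abs(d - pbf(row))
    return total


def majority3(x):
    """Hypothetical PBF: x1*x2 + x1*x3 + x2*x3, with no negated variables."""
    return int(x[0] + x[1] + x[2] >= 2)


# Two toy occurrences with L = 4 levels and window length n = 3.
samples = [[3, 1, 2], [0, 1, 0]]
desired = [1, 0]  # 1 = sample is not in the class, 0 = sample is in the class
error = classification_error(samples, desired, L=4, pbf=majority3)
```

Because `satisfies_stacking` enumerates all pairs of binary vectors, it is only practical for small n, which matches the remark in the text that searching over the $2^n$ binary vectors becomes expensive as n grows.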

Fig. 12 MASTER images with different band combinations (left: RGB-bands 5-3-1, mid: RGB-bands 8-5-3, right: RGB-bands 44-13-3)

Figure 13. Classified map of the Au-Ku test site using GME/PBF-based multi-class classification method.



In Eq. (7), a stack filter $S_f(\cdot)$, a threshold function $T_l(\cdot)$ at level l, and a Boolean function $B^k_f(\cdot)$ are used at each level. Furthermore, we can reformulate the CE value as the minimal sum over the $2^n$ possible binary vectors of length n. Finding an optimal stack filter is a time-consuming procedure if n is a large number. Referring to our previous work (Han et al., 1997), a graphic search-based algorithm further improves the searching process by utilizing the greedy and constraint satisfaction searching criteria. It guarantees that the output filter is a global optimal solution. Finally, a classification map, Ω, can be generated by this PBF-based multi-class classifier. The mechanism illustrated in Fig. 11 is an example of the use of the proposed PBF-based multi-class classifier to process a map classification. All PBF $B^k_f(\cdot)$ of different classes at each level can simultaneously calculate the PBF binary outputs to classify a test sample. The outputs of the stack filter, i.e. the summations of the PBF binary outputs at each level for each class $\omega_k$, are compared to each other and the one which has the smallest value is chosen as the final decision class for that test sample. This shows that our proposed PBF-based method can be used as a basis for a multi-class classifier.

3. EXPERIMENTAL RESULTS

The image data were obtained by the MODIS/ASTER airborne simulator (MASTER) instrument, a hyperspectral sensor instrument, as part of the PacRim II project (Hook et al., 2001). The MASTER images of the test area with different band combinations are displayed in Fig. 12. A plantation area in Au-Ku on the east coast of Taiwan shown in Fig. 13 was chosen for study. A ground survey was made of the selected six land cover types at the same time. The MASTER is most appropriate for this study because it is a well-calibrated instrument and it provides spectral data in the visible to shortwave infrared region of the electromagnetic spectrum. The proposed PBF-based multi-class classification method was applied to 35 bands selected from the 50 contiguous bands of MASTER, excluding the low signal-to-noise ratio mid-infrared channels (Hook et al., 2001). A, B, S, P, O and R stand for six ground cover classes, sugar cane A, sugar cane B, seawater, pond, bare soil and rice (N = 6). The criterion for calculating the classification accuracy of experiments was based on exhaustive test cases. One hundred and fifty labeled samples were randomly collected from ground survey data sets by iterating every fifth sample interval for each class. Thirty labeled samples were chosen as training samples, while the rest were used as test samples, i.e. the samples were partitioned into 30 (20%) training and 120 (80%) test samples (M = 120) for each test case. Three correlation coefficient threshold values, $t_c$ = 0.75, 0.80 and 0.85, were selected to carry out GMET. Four different window lengths, n ∈ {3,4,5,6}, were selected for each class. Based on PBF-based multi-class classification criteria, there were NM test samples (NM = 720) for each class. The cumulative percentages of eigenvalues were set as 90% to perform GME projection and generate RRE vectors $e^k$. Finally, the accuracy was obtained by averaging all of the multiple combinations stated above. An example of an error matrix is given in Table 1 to illustrate the accuracy assessment of the PBF-based multi-class classification method.

Table 1. An example of an error matrix for clear boundary sampling. A summary of classification errors is appended below.
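The GME projection step in the setup above keeps eigenvectors up to a 90% cumulative share of the eigenvalues and then measures a residual reconstruction error (RRE). The following is a minimal sketch of that idea for a single group of bands, not the authors' code. It assumes the RRE is the squared norm of the part of a mean-centered sample that lies outside the retained eigenspace (the exact definition is given earlier in the paper); `fit_eigenspace`, `rre` and the synthetic data are illustrative names and values.

```python
import numpy as np


def fit_eigenspace(train, energy=0.90):
    """Fit an eigenspace on the rows of `train`.

    Keeps the leading eigenvectors of the covariance matrix whose cumulative
    eigenvalue share first reaches `energy`.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    share = np.cumsum(eigvals) / eigvals.sum()
    keep = min(int(np.searchsorted(share, energy)) + 1, len(eigvals))
    return mean, eigvecs[:, :keep]


def rre(x, mean, basis):
    """Squared norm of the component of (x - mean) outside the eigenspace."""
    centered = x - mean
    residual = centered - basis @ (basis.T @ centered)
    return float(residual @ residual)


# Synthetic three-band example: almost all variance lies along (1, 1, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
train = np.outer(t, [1.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(200, 3))

mean, basis = fit_eigenspace(train)
in_space = rre(mean + np.array([2.0, 2.0, 0.0]), mean, basis)   # along the eigenspace
off_space = rre(mean + np.array([0.0, 0.0, 1.0]), mean, basis)  # orthogonal to it
```

A sample that resembles the training class gives a small RRE, while one that departs from the class eigenspace gives a large RRE, which is why the RRE can serve as the input to the classifier.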

Two different traditional feature extraction techniques, minimum Euclidean distance (MED) and conventional PCA, were applied to the same PBF-based multi-class classifier for accuracy comparison. In Fig. 14, 5 GME specifies a window of fixed length five (n = 5) that was selected for each class. 5 EUB, 5 EAB and 5 ELB represent respectively the best five upper bound feature bands, the average five bands (between upper bound and lower bound) and the worst five lower bound feature bands obtained by using the MED of the original data sets. 5 PCA stands for the first five principal components of PCA of the original data sets. As shown in Fig. 14, the accuracy rates are improved by enlarging the feature dimensions for GME and MED, but not those of PCA. It can be interpreted that the last few PCA components contain some global variances which can be treated as noise in MASTER data sets. Compared to MED and PCA, GME feature extraction is very suitable for the PBF-based multi-class classifier in general.

There is a decline in accuracy rates for GME when six ground covers are selected. This is caused by the fact that some classes of high-dimensional data sets have only a few effective modular eigenspaces $\Phi^k_l$ inside their own GME set $\Phi^k$. In Fig. 7, we can see that some ground cover classes, such as rice (R), have many ineffective modular eigenspaces $\Phi^k_l$ which have small numbers of feature bands $m_l$ in their GME set $\Phi^k$. Note that we have previously sorted and reordered every $\Phi^k_l$ in $\Phi^k$ according to their number of $m_l$ in descending order. If we choose a large window of length n, i.e. the top modular eigenspaces $(\Phi^k_1, \ldots, \Phi^k_l, \ldots, \Phi^k_{n_k})$ of $\Phi^k$, then some classes which have ineffective modular eigenspaces in n will be misclassified. This is why the accuracy rates fall when the sixth ground cover class is included in Fig. 14. It can be interpreted as a great benefit in that the GME provides a very useful pre-classification estimation measurement before we choose a proper window of length n for each training class as the parameters to build an efficient classifier.

Interestingly, we demonstrated a case in which there was a difference in classification accuracy between vague and clear boundary sampling. By vague boundary sampling we mean that some mixed pixels in the uncertain boundaries are chosen when we visually select labeled samples from the hyperspectral data sets. Conversely, clear boundary sampling means that the chosen labeled samples are pure pixels without ambiguity. A vague version of a GME error matrix is shown in Table 2. It uses the same conditions as described above but applies to vague boundary sampling. Compared to Table 1, Table 2 has better classification accuracy. This suggests that PBF-based multi-class classifiers do have more nonlinearly discrete capacity to increase class separability if vague boundary sampling is used to develop the classifiers.

The discrete and nonlinear properties of the proposed PBF-based multi-class classifier are considered to be advantages in classification. In Table 3, a summary evaluation of classification accuracies under different conditions illustrates the validity of these unique properties of the proposed PBF-based multi-class classifiers. The encouraging results show that adequate classification rates, close to 94% average accuracy for the best case, are achieved with only a few training samples. Shown in Fig. 13 is an example of the classified map obtained by applying GME/PBF-based multi-class classification schemes to the MASTER high-dimensional data sets.
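The effect of the window length n discussed in this section (modules are sorted by their number of bands, and a large window may pull in modules with very few bands) can be made concrete with a toy sketch. The grouping routine below is a deliberately simplified stand-in and not the GME algorithm defined earlier in the paper: it assumes that a module is formed by collecting all remaining bands whose absolute correlation with a seed band is at least $t_c$. The function names and the 6-band correlation matrix are hypothetical.

```python
import numpy as np


def greedy_band_groups(corr, tc):
    """Greedily group bands by correlation with a seed band (simplified stand-in).

    Returns the groups sorted by number of bands, largest first.
    """
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed] + [b for b in unassigned if abs(corr[seed, b]) >= tc]
        unassigned = [b for b in unassigned if b not in group]
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


def top_window(groups, n):
    """Select the first n modules as the window of fixed length n."""
    return groups[:n]


# Hypothetical correlation matrix: bands {0,1,2} and {3,4} are highly
# correlated among themselves, band 5 is weakly correlated with everything.
corr = np.full((6, 6), 0.1)
for block in ([0, 1, 2], [3, 4]):
    for i in block:
        for j in block:
            corr[i, j] = 0.9
np.fill_diagonal(corr, 1.0)

groups = greedy_band_groups(corr, tc=0.80)
window = top_window(groups, n=3)
# Modules with a single band carry little correlation structure; in the
# paper's terms they would count as ineffective modular eigenspaces.
single_band = [g for g in window if len(g) < 2]
```

In this toy case a window of length 2 contains only multi-band modules, while a window of length 3 already includes a single-band module, which mirrors the drop in accuracy reported when n exceeds the number of effective modules for a class.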

4. CONCLUSIONS

In this paper, a sophisticated GME/PBF-based multi-class classifier is proposed for hyperspectral supervised classification. We first introduce the GME which can be obtained by a quick greedy band reordering algorithm. It is efficient with little computational complexity. The GME is built by grouping highly correlated bands into a small set of bands. The GME can be treated as not only a preprocess of the filter-based classifier but also a unique spectral-based feature set that explores correlation among bands for high-dimensional data. It makes use of the potential separability of different classes to overcome the drawback of the common covariance bias problems encountered in conventional PCA. The characteristic of GME is suitable for a multi-class classifier. The proposed GME/PBF-based multi-class classifier enhances the separable features of different classes to improve the classification accuracy significantly. The conducted experiments demonstrated the validity of our proposed GME/PBF-based multi-class classification scheme.

The proposed PBF-based multi-class classifier is developed to effectively find nonlinear boundaries of pattern classes in hyperspectral data. Combining the GMET algorithm with the PBF-based multi-class classifier provides unique advantages for hyperspectral image classification. It possesses well-known threshold decomposition and stacking properties. The advantages of a PBF-based multi-class classifier are its discrete and nonlinear binary properties. It utilizes the MCE learning ability to improve the classification accuracy particularly in dealing with hyperspectral data sets in which training data are always inadequate and knowledge of the data distribution is usually incomplete. This MCE characteristic can best harmonize the PBF-based multi-class classifiers with the features extracted from GME. It improves classification accuracy significantly and fully promotes multi-class classifiers instead of pairwise classifiers.

The experiments validate the utility of the GME/PBF-based multi-class classifier. The highly correlated property of the GME can help us develop a suitable preprocessing algorithm for hyperspectral data compression. Furthermore, we can take the benefits of the parallel property in GME to increase computation speed and achieve real time computation. These valuable advantages for practical implementation will be included in our future study.

ACKNOWLEDGMENTS

The authors would like to thank the Center for Space and Remote Sensing Research of National Central University for providing MASTER datasets used for experiments in this paper.



Figure 14. Classification accuracy comparison of three feature extraction techniques, GME, MED and PCA, using the same PBF-based multi-class classifier.

Table 2. Classification error matrix for vague boundary sampling. A summary of classification errors is appended below.

Table 3. Summary evaluation of classification accuracy for different boundary sampling types and number of modular eigenspaces.



REFERENCES

Dougherty, E.R., 1992, "Optimal Mean-Square N-Observation Digital Morphological Filters. II. Optimal Gray-Scale Filters," CVGIP: Image Understanding, 55(1), pp. 55–72.

Duda, R.O. and Hart, P.E., 1973, "Nonparametric Techniques," Pattern Classification and Scene Analysis, John Wiley & Sons, New York.

Han, C. C., Fan, K. C. and Chen, Z. M., 1997, "Finding of optimal stack filter by graphic searching methods," IEEE Trans. Signal Processing, 45(7), pp. 1857–1862.

Han, C. C. and Tsai, C. L., 2001, "A multi-resolutional face verification system via filter-based integration," IEEE Int. Carnahan Conf. Security Technology, pp. 278–281.

Han, C. C., 2002, "A supervised classification scheme using positive boolean function," accepted for publication in International Conference on Pattern Recognition, IEEE, Quebec, Canada, pp. 100–103.

Harsanyi, J. and Chang, C.-I, 1994, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Remote Sens., 32(4), pp. 779–785.

Hook, S. J., Myers, J. J., Thome, K. J., Fitzgerald, M. and Kahle, A. B., 2001, "The MODIS/ASTER airborne simulator (MASTER) - a new instrument for earth science studies," Remote Sensing of Environment, 76(1), pp. 93–102.

Jia, X. and Richards, J. A., 1999, "Segmented principal components transformation for efficient hyperspectral remote-sensing image display and classification," IEEE Trans. Geosci. Remote Sens., 37(1), pp. 538–542.

Jimenez, L. O. and Landgrebe, D. A., 1999, "Hyperspectral data analysis and supervised feature reduction via projection pursuit," IEEE Trans. Geosci. Remote Sens., 37(6), pp. 2653–2667.

Juang, B. H., Chou, W. and Lee, C. H., 1997, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech and Audio Processing, 5(3), pp. 257–265.

Kumar, S., Ghosh, J. and Crawford, M. M., 2001, "Best-bases feature extraction algorithms for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., 39(7), pp. 1368–1379.

Lee, C. and Landgrebe, D. A., 1993, "Analyzing high-dimensional multispectral data," IEEE Trans. Geosci. Remote Sens., 31(4), pp. 792–800.

Lin, J. H. and Coyle, E. J., 1990, "Minimum mean absolute error estimation over the class of generalized stack filters," IEEE Trans. Acoust., Speech, Signal Processing, 38, pp. 663–678.

Maragos, P. and Schafer, R. S., 1987, "Morphological filters. Part II: Their relations to median, order-statistic, and stack filters," IEEE Trans. Acoustics, Speech, and Signal Processing, 35, pp. 1170–1184.

Moghaddam, B. and Pentland, A., 1995, "Probabilistic visual learning for object detection," in Proc. 5th International Conference on Computer Vision, Boston, MA., pp. 786–793.

Pentland, A. and Moghaddam, B., 1994, "View-based and modular eigenspaces for face recognition," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 84–91.

Postaire, J.G., Zhang R.D. and Lecocq-Botte, C., 1993, "Cluster analysis by binary morphology," IEEE Trans. Pattern Analysis and Machine Intelligence, 15, pp. 170–180.

Richards, J. A. and Jia, X., 1999, "Interpretation of Hyperspectral Image Data," Remote Sensing Digital Image Analysis, An Introduction, 3rd ed., Springer-Verlag, New York.

Wendt, P. D., Coyle, E. J. and Gallagher, N. C., 1986, "Stack filter," IEEE Trans. Acoustics, Speech, and Signal Processing, 34(4), pp. 898–911.


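A Novel Approach to Supervised Hyperspectral Remote Sensing Image Classification

Yang-Lang Chang 1, Chin-Chuan Han 2, Kuo-Chin Fan 3, K. S. Chen 4

ABSTRACT

Hyperspectral imagery is an advanced remote sensing technology. The spectral resolution of remote sensing imagery has progressed steadily, from conventional sensors with a few spectral bands, to multispectral sensors with tens of bands, to hyperspectral sensors with hundreds of bands, and on to ultraspectral sensors with thousands of bands. Hyperspectral sensors are widely used in satellite remote sensing image recognition, medical image diagnosis, industrial product inspection, and non-destructive inspection of aircraft and other precision machinery, and hyperspectral imaging has become an emerging and important research field in remote sensing. This paper proposes a new algorithm for hyperspectral remote sensing image classification with two main steps: the first is greedy modular eigenspaces (GME), and the second is the positive Boolean function (PBF). Using calibrated, complete hyperspectral remote sensing image data of Taiwan together with field-measured ground truth, we demonstrate that the GME method provides an excellent means of feature extraction and is the most suitable preprocessor for the PBF classification method. This paper discusses the derivation of the GME algorithm in detail, gives a complete description of the basic theory of the PBF, analyzes the relationship between the two in detail, and, building on the characteristics of both, proposes a solution applicable to the classification of general high-dimensional data. Finally, experiments and a comparison with other traditional classification methods for multispectral remote sensing image data confirm that the method is well suited to the classification of high-dimensional data.

Keywords: principal components analysis, hyperspectral supervised classification, greedy modular eigenspaces, Boolean function, stack filter

1. Associate Professor, Department of Information Management, National Taipei College of Business
2. Associate Professor, Department of Computer Science and Information Engineering, National United University
3. Professor, Department and Graduate Institute of Computer Science and Information Engineering, National Central University
4. Professor, Center for Space and Remote Sensing Research and Institute of Space Science, National Central University

Received: November 11, 2003
Revised: July 25, 2004
Accepted: July 27, 2004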
