Journal of Photogrammetry and Remote Sensing, Volume 9, No. 4, December 2004, pp. 47-70
A Novel Approach to Supervised Hyperspectral Image Classification

Yang-Lang Chang (1), Chin-Chuan Han (2), Kuo-Chin Fan (3), K. S. Chen (4)
ABSTRACT

This paper presents a new supervised classification technique for hyperspectral imagery, which consists of two algorithms, referred to as greedy modular eigenspace (GME) and positive Boolean function (PBF). The GME method is designed to extract features by a simple and efficient GME feature module. The GME makes use of the data correlation matrix to reorder spectral bands, from which a group of feature eigenspaces can be generated to reduce dimensionality. It can be implemented as a feature extractor to generate a particular feature eigenspace for each of the material classes present in hyperspectral data. The residual reconstruction errors (RRE) are then calculated by projecting the samples into the individual GME-generated modular eigenspaces. The PBF is further developed for classification. It is a stack filter built by using the binary RRE as classifier parameters for supervised training. It adopts the minimum classification error (MCE) as its criterion and exploits the ability of the MCE criterion to learn from both positive and negative samples in order to improve classification accuracy. The performance of the proposed method is evaluated on MODIS/ASTER airborne simulator (MASTER) images for land cover classification acquired during the Pacrim II campaign. Experimental results demonstrate that the GME feature extractor suits the nonlinear PBF-based multi-class classifier well for classification preprocessing. The proposed approach is not only an effective method for land cover classification in earth remote sensing but also substantially reduces the computational cost of eigen-decomposition compared with conventional principal components analysis (PCA).
Key Words: principal components analysis (PCA), hyperspectral supervised classification, greedy modular eigenspaces (GME), positive Boolean function (PBF), stack filter.
1. Associate Professor, Department of Information Management, National Taipei College of Business
2. Associate Professor, Department of Computer Science and Information Engineering, National United University
3. Professor, Institute of Computer Science and Information Engineering, National Central University
4. Professor, Center for Space and Remote Sensing Research, National Central University

Received Date: Nov. 11, 2003. Revised Date: July 25, 2004. Accepted Date: July 27, 2004.
1. INTRODUCTION

With the evolution of remote sensing technology, an increasing number of spectral bands have become available. Data can be collected in anything from a few multispectral bands to several hundred hyperspectral bands, or even thousands of ultraspectral bands. Much attention has been focused on developing high-dimensional classification techniques for earth remote sensing. The growth of such high-dimensional data greatly enhances the information content, but poses a challenge to current techniques for analyzing such data. This improved spectral resolution also comes at a price, known as the curse of dimensionality. The term is used by the statistical community to describe the difficulties associated with the feasibility of distribution estimation in high-dimensional data sets. One of the most common issues in hyperspectral classification is how to achieve the best class separability without being restricted by the limited number of training samples (Jimenez and Landgrebe, 1999; Kumar et al., 2001).

Numerous techniques have been developed for feature extraction to reduce dimensionality without loss of class separability. Most of them focus on the estimation of statistics at full dimensionality to extract classification features. For example, the most widely used conventional principal components analysis (PCA) assumes that the covariances of different classes are equal, i.e. a common covariance pool (matrix) is used, and the potential differences between class covariances are not utilized. The PCA reorganizes the data coordinates in accordance with data variances so that features are extracted based on the magnitudes of their corresponding eigenvalues (Richards and Jia, 1999). Fisher discriminant analysis uses the between-class and within-class variances to extract desired features and to reduce dimensionality (Duda and Hart, 1973). Another example is the orthogonal subspace projection (OSP) developed by Harsanyi and Chang (1994). OSP projects all undesired pixels into a space orthogonal to the space generated by the desired pixels to achieve dimensionality reduction.
Figure 1. Data flow of the proposed GME/PBF-based multi-class classification scheme.
This paper presents a new approach to achieve dimensionality reduction while increasing classification accuracy. It comprises two algorithms, greedy modular eigenspace (GME) and positive Boolean function (PBF). The GME, based on a correlation matrix (the second-order statistic), is developed by grouping highly correlated bands into a small set of bands. The GME overcomes the dependency on global statistics as much as possible, while preserving the inherent separability of the different classes. Most classifiers seek only one set of features that discriminates all classes simultaneously. This not only requires a large number of features, but also increases the complexity of the potential decision boundary. This paper will show that the proposed GME/PBF method solves this problem and improves classification accuracy. The GME provides very good discrimination among the classes of interest using the idea of a modular eigenspace description, which has been shown to yield very high recognition rates in face recognition (Pentland and Moghaddam, 1994). The greedy approach uses a condensed correlation coefficient matrix reordering transformation to find a set of highly correlated GME. It performs a greedy iterative search algorithm which reorders the correlation coefficients in the data correlation matrix row by row and column by column to group highly correlated bands as GME feature eigenspaces that can be further used for feature extraction.

Reordering the bands regardless of the original order of wavelengths and spectrally adjacent bands in a high-dimensional data set is an important characteristic of GME. The well-known segment principal components analysis (PCA) technique has a very high accuracy in remote sensing image classification (Jia et al., 1999; Richards et al., 1999). Our proposed GME improves the segment PCA performance by reordering the bands to produce new and efficient GME subspaces from the correlation matrices.

The proposed GME approach divides the whole set of high-dimensional features into an arbitrary number of highly correlated subgroups. It makes good use of these highly correlated band subgroups to reduce the computation time of PCA. Each ground cover type or material class has a distinct set of GME-generated feature eigenspaces. When a set of GME-based feature eigenspaces $\Phi^k$ is generated from the training samples, a modular eigenspace projection is next taken to calculate the residual reconstruction error (RRE) vectors $e^k$ for the class $\omega_k$. These RRE vectors are then normalized and quantized before being applied to the subsequent PBF-based classifier.

The follow-up PBF is developed to design a multi-class classifier that can take advantage of using the binary RRE as its training samples and can also fully utilize the minimum classification error (MCE) (Juang et al., 1997) learning ability as its optimization criterion to explore the special characteristics of GME. The
MCE characteristic can best combine the PBF-based multi-class classifier with the binary RRE generated from GME. Finally, a classification map, Ω, can be obtained by applying the PBF-based multi-class classifier to test samples. Fig. 1 illustrates the flow chart of this GME/PBF-based multi-class classification scheme.

Recently, several hyperspectral data feature extraction and reduction techniques have been developed that are well suited for application to pairwise classifier architectures to enhance classification accuracy. Jimenez et al. (1999) proposed a projection pursuit method utilizing a projection index function to select potentially interesting projections from the local optimization-over-projection directions of the index (Bhattacharyya distance between two classes) of performance. A best-bases feature extraction algorithm developed by Kumar et al. (2001) used an extended local discriminant bases technique to reduce the feature spaces.
Figure 2. An example illustrates a CMPM with different gray levels and its corresponding correlation matrix with different correlation coefficients in percentage (White = 100; black = 0) for the class ωk. Note that four squares with fine black borders represent the highly correlated modular subspaces which have higher correlation coefficient compared with their neighborhood.
Both of them used pairwise classifiers to divide multi-class problems into a set of two-class problems. Our proposed GME/PBF-based classifier performs multi-class classification, previously proposed in (Han and Tsai, 2001; Han, 2002), to enhance the precision of image classification significantly by exploring the capabilities of multi-class classifiers rather than pairwise classifiers. The main characteristic of a PBF-based multi-class classification scheme is its nonlinear discrete properties. It makes use of MCE criteria (Juang et al., 1997) to apply both positive and negative
samples of the binary RRE as training samples for the PBF-based multi-class classifier. The features of GME are very well suited to the PBF-based multi-class classification properties compared to traditional feature extraction. These facts are demonstrated in the experimental results.

The rest of this paper is organized as follows. In Section 2, the proposed GME/PBF-based multi-class classifier is described in detail. In Section 3, a set of experiments is conducted to demonstrate the feasibility and utility of the proposed approach. Finally, in Section 4, several conclusions are presented.

2. METHODOLOGY

Referring to Fig. 1, there are four stages to implementing our proposed GME/PBF-based multi-class classification scheme. 1) A GME transformation algorithm is applied to achieve dimensionality reduction and feature extraction. 2) The second stage is a modular eigenspace projection similarity measure, also known as distance decomposition. 3) The third stage is a threshold decomposition which normalizes and quantizes the RRE. 4) Finally, a PBF-based multi-class classification is performed.

2.1. Greedy Modular Eigenspaces

A visual scheme to display the magnitude of the correlation matrix for emphasizing the second-order statistics of high-dimensional data was proposed by Lee and Landgrebe (1993). Shown in Fig. 2 is a correlation matrix pseudo-color map (CMPM) in which the gray scale is used to represent the magnitude of its corresponding correlation matrix. It is also equal to a modular subspace set $\Phi^k$. Different material classes have different value sequences of correlation matrices, which can be treated as special sequence codes for feature extraction. We define a correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$) which belongs to the $l$-th modular subspace $\Phi_l^k$ of a complete set $\Phi^k = (\Phi_1^k, \ldots, \Phi_l^k, \ldots, \Phi_{n_k}^k)$ for a class $\omega_k$, where $m_l$ and $n_k$ represent, respectively, the number of bands (feature spaces) in modular subspace $\Phi_l^k$ and the total number of modular subspaces for a complete set $\Phi^k$, i.e. $l \in \{1, \ldots, n_k\}$, as shown in Fig. 2. The original correlation matrix $c_{X^k}$ ($m_t \times m_t$) is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a GME set $\Phi^k$ for the class $\omega_k$. There are exactly $m_t!$ (the factorial of $m_t$) possible combinations to construct one complete set. Here $m_t$ represents the total number of original bands,

$$m_t = \sum_{l=1}^{n_k} m_l, \qquad (1)$$

where $m_l$ is the number of bands in the $l$-th feature subspace of a complete set for a class $\omega_k$.

We first define a complete modular subspace (CMS) set, which is composed of all possible combinations of the CMPM. The
original correlation matrix $c_{X^k}$ ($m_t \times m_t$) is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a CMPM for the class $\omega_k$. There are $m_t!$ different kinds of CMPMs in a CMS set, as shown in Fig. 3. Each different CMPM is associated with a unique sequence of band order. It takes $m_t!$ swapping operations by band order to find a complete and exhaustive CMS set. In Fig. 4, a visual interpretation is introduced to highlight the relations between swapping and rotating operations in terms of band order.
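To make the size of a CMS set concrete, the following Python sketch enumerates every band ordering of a small correlation matrix and rebuilds the reordered matrix for each. It is illustrative only: the 4-band matrix `corr` and the helper names `reorder_correlation` and `enumerate_cms` are assumptions introduced here and do not appear in the original method description.

```python
from itertools import permutations

import numpy as np


def reorder_correlation(corr, order):
    """Return the correlation matrix with its bands rearranged into `order`."""
    idx = np.asarray(order)
    # np.ix_ permutes rows and columns together, so symmetry is preserved.
    return corr[np.ix_(idx, idx)]


def enumerate_cms(corr):
    """Yield (band order, reordered matrix) for every permutation of the bands."""
    m_t = corr.shape[0]
    for order in permutations(range(m_t)):
        yield order, reorder_correlation(corr, order)


# Hypothetical correlation matrix for four bands A, B, C, D (m_t = 4).
corr = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
])

cms = list(enumerate_cms(corr))
print(len(cms))  # number of candidate band orderings, m_t! = 24

# Ordering (A, C, B, D) places the two highly correlated pairs next to each other.
grouped = reorder_correlation(corr, (0, 2, 1, 3))
print(grouped)
```

Even for this toy case the enumeration already produces 24 candidates, which is why an exhaustive search becomes impractical once $m_t$ reaches the hundreds of bands found in hyperspectral data.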
Figure 3. (a.) The initial CMPM for four original bands (A, B, C and D), mt = 4, is applied to the exhaustive swapping operations by band order. (b.) An example shows a CMS set which is composed of 24 (mt! = 4!) possible CMPMs. The CMPM with a dotted-line square is the optimal CMPM in a CMS set.
Figure 4. The rotation operation between the Block K and Block 2 is performed by swapping their horizontal and vertical correlation coefficient lists row by row and column by column. Note that a pair of blocks (squares) switched by this swapping operation in terms of band order should have the same size of any length along the diagonal of correlation matrices. The Fixed Blk 1 and Fixed Blk 2 will rotate 90 degrees.
Figure 5. (a) and (b) The original CMPM and its corresponding correlation matrix for class $\omega_k$. (c) and (d) The GME set $\Phi^k$ for class $\omega_k$ and its corresponding correlation matrix after greedy modular subspaces transformation.
There is one optimal CMPM in a CMS set, as shown in Fig. 3. This optimal CMPM is defined as a specific CMPM which is composed of a set of modular subspaces $\Phi_l^k$, $l \in \{1, \ldots, l, \ldots, n_k\}$ and $\Phi_l^k \in \Phi^k$. It has the highest correlation inside each individual modular subspace $\Phi_l^k$. It attempts to reach the condition that the highly correlated blocks (with high correlation coefficient values) are placed adjacently, as near as possible, to construct the optimal modular subspace set along the diagonal of the CMPM. It is too expensive to make an exhaustive computation for a large value of $m_t$ to find an optimal CMPM in a CMS set. In order to overcome this drawback, we develop a fast searching algorithm to construct an alternative greedy CMPM (modular subspace) instead of the optimal CMPM. This new greedy CMPM is defined as a GME feature module set. It not only reduces the redundant operations in finding the greedy CMPM but also provides an efficient method to construct GME sets.

In this paper, we propose a GME set $\Phi^k$ which is composed of a group of modular eigenspaces. Each modular
eigenspace $\Phi_l^k$ includes a set of highly correlated bands regardless of the original order of wavelengths. There are several merits to this scheme. 1) It reduces the number of bands in each GME set to speed up the eigen-decomposition computation compared with conventional PCA. 2) A highly correlated GME makes PCA work more efficiently due to the redundancy reduction property of PCA. 3) The GME tends to equalize all the bands in a subgroup with highly correlated variances to avoid potential bias problems that may occur in conventional PCA (Jia and Richards, 1999). 4) Different classes are best distinguished by different GME feature sets.

We define a correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$) which belongs to the $l$-th modular eigenspace $\Phi_l^k$ of a GME $\Phi^k = (\Phi_1^k, \ldots, \Phi_l^k, \ldots, \Phi_{n_k}^k)$ for a class $\omega_k$, where $m_l$ and $n_k$ represent, respectively, the number of bands in modular eigenspace $\Phi_l^k$ and the total number of modular eigenspaces of a GME set $\Phi^k$, i.e. $l \in \{1, \ldots, n_k\}$, as shown in Fig. 5. The original correlation matrix $c_{X^k}$ ($m_t \times m_t$), where $m_t$ is the total number of original bands, is decomposed into $n_k$ correlation submatrices $c_{\Phi_1^k}$ ($m_1 \times m_1$), ..., $c_{\Phi_l^k}$ ($m_l \times m_l$), ..., $c_{\Phi_{n_k}^k}$ ($m_{n_k} \times m_{n_k}$) to build a GME set $\Phi^k$ for the class $\omega_k$. There are $m_t!$ possible combinations to construct a candidate GME set, as for a CMS set. That is to say, it takes $m_t!$ operations to compose $m_t!$ different sets of correlation submatrices with different sequences of band order. Only one of them can be chosen as the GME set.

It is computationally expensive to make an exhaustive search to construct a GME set if $m_t$ is a large number. In this paper, we propose a fast greedy band reordering algorithm, called greedy modular eigenspace transformation (GMET), based on the assumption that highly correlated bands often appear adjacent to each other in hyperspectral data (Richards and Jia, 1999). In this algorithm, the absolute value of every correlation coefficient $c_{i,j}$ in the correlation matrix $c_{X^k}$ is compared to a threshold value $t_c$ ($0 \le t_c \le 1$). Those adjacent correlation coefficients $c_{i,j}$ that are larger than the threshold value $t_c$ are used to construct a modular eigenspace $\Phi_l^k$ in an iterative mode. A greedy searching iteration is initially carried out at $c_{0,0} \in c_{X^k}$ ($m_t \times m_t$) for a class $\omega_k$ to condense a GME set. Each $c_{i,j}$ is assigned an attribute during a GMET. If the attribute of a $c_{i,j}$ is set as available, this $c_{i,j}$ has not been assigned to any modular eigenspace $\Phi_l^k$. If a $c_{i,j}$ is assigned to a modular eigenspace $\Phi_l^k$, the attribute of this $c_{i,j}$ is set to used. All attributes of the original correlation matrix $c_{X^k}$ are first set as available. The proposed GMET algorithm is as follows:

Step 1. Initialization: A new modular eigenspace $\Phi_l^k \in \Phi^k$ for a class $\omega_k$ is initialized by a new correlation coefficient $c_{d,d}$, where $c_{d,d}$ and $d$ are defined, respectively, as the first available element and its associated subindex in the diagonal list $[c_{0,0}, \ldots, c_{m_t-1,m_t-1}]$ of the correlation matrix $c_{X^k}$. This diagonal coefficient
$c_{d,d}$ is set to used and then assigned as the current $c_{i,j}$, i.e. the only one activated at the current time. Then, go to step 2. Note that this GMET algorithm is terminated if the last diagonal coefficient $c_{d,d}$ is already set to used and the last subgroup $\Phi_{n_k}^k$ of the GME has been obtained for class $\omega_k$.

Step 2. If the column subindex $j$ of the current $c_{i,j}$ has reached the last column (i.e. $j = m_t - 1$) in the correlation matrix, then a new modular eigenspace $\Phi_l^k$ is constructed with all used correlation coefficients $c_{i,j} \in \Phi_l^k$, these used coefficients $c_{i,j}$ are removed from the correlation matrix, and the algorithm goes to step 1 for another round to find a new modular eigenspace. Otherwise, it goes to step 3.

Step 3. GMET moves the current $c_{i,j}$ to the next adjacent column $c_{i,j+1}$, which will act as the current $c_{i,j}$, i.e. $c_{i,j} \rightarrow c_{i,j+1}$. If the current $c_{i,j}$ is available and its value is larger than $t_c$, then go to step 4. Otherwise, go to step 2.

Step 4. If $j \ne d$, swap $c_{*,j}$ with $c_{*,d}$ and $c_{j,*}$ with $c_{d,*}$ respectively, where the asterisk symbol "*" indicates any row or column subindex in the correlation matrix. Fig. 4 and Fig. 6 show a graphical mechanism of this swapping operation. The attributes of $c_{d,*}$ and $c_{*,d}$ are then marked used. Then let $c_{i \leftrightarrow d,d}$ and $c_{d,i \leftrightarrow d} \in \Phi_l^k$, where $i \leftrightarrow d$ means including all coefficients between subindex $i$ and subindex $d$. Go to step 2.

Eventually, a GME set, $\Phi^k = (\Phi_1^k, \ldots, \Phi_l^k, \ldots, \Phi_{n_k}^k)$, is composed for ground cover class $\omega_k$. For convenience, we sort these modular eigenspaces according to the number of their feature bands, i.e. the number of feature spaces $m_1, \ldots, m_l, \ldots, m_{n_k}$, in descending order.
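The grouping behaviour of the steps above can be approximated by the following Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: instead of swapping rows and columns in place, it collects for each seed band $d$ every still-available band $j$ whose $|c_{d,j}|$ exceeds $t_c$, and derives the band permutation afterwards. The function name `greedy_band_groups` and the 5-band example matrix are hypothetical.

```python
import numpy as np


def greedy_band_groups(corr, t_c):
    """Group bands greedily by thresholding correlations against a seed band.

    Simplified stand-in for GMET: bands are collected per seed rather than
    swapped in place, and the permutation is built from the groups afterwards.
    """
    m_t = corr.shape[0]
    available = np.ones(m_t, dtype=bool)   # attribute: "available" vs "used"
    groups = []
    for d in range(m_t):                   # d: first available diagonal index
        if not available[d]:
            continue
        available[d] = False               # the seed c[d, d] becomes "used"
        group = [d]
        for j in range(m_t):               # scan the columns of row d
            if available[j] and abs(corr[d, j]) > t_c:
                available[j] = False
                group.append(j)
        groups.append(group)
    # Sort the groups by number of bands, in descending order.
    groups.sort(key=len, reverse=True)
    return groups


# Hypothetical 5-band correlation matrix (symmetric, unit diagonal).
corr = np.array([
    [1.0, 0.1, 0.9, 0.2, 0.8],
    [0.1, 1.0, 0.2, 0.9, 0.1],
    [0.9, 0.2, 1.0, 0.1, 0.9],
    [0.2, 0.9, 0.1, 1.0, 0.2],
    [0.8, 0.1, 0.9, 0.2, 1.0],
])

groups = greedy_band_groups(corr, t_c=0.7)
band_order = [band for group in groups for band in group]
reordered = corr[np.ix_(band_order, band_order)]

print(groups)       # band indices per group
print(band_order)   # resulting band permutation
```

In the reordered matrix the highly correlated bands form contiguous blocks along the diagonal, and the group sizes add up to $m_t$ as required by Eq. (1).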
Figure 6. The original CMPM (White=1 or -1; black=0) and the CMPM after swapping band Nos.0-9 and 30-39.
Figure 7. GME sets for the six ground cover types used in the experiment. Each of them can be treated as a unique feature for a distinguishable class.
Figure 8. The GME sets for different ground cover classes $\omega_k, \ldots, \omega_j$. A GME set $\Phi^k$ is composed of a group of modular eigenspaces $(\Phi_1^k, \ldots, \Phi_l^k, \ldots, \Phi_{n_k}^k)$.
Each square ($m_l \times m_l$) is filled with the average value of all correlation coefficients inside its correlation submatrix $c_{\Phi_l^k}$ ($m_l \times m_l$). Fig. 5 illustrates the original CMPM and the reordered one after a GMET. Each ground cover type or material class has its uniquely ordered GME set. For instance, in our experiment, six ground cover types were transformed into six different GME sets, as shown in Fig. 7.

In this visualization scheme, we can build a GME efficiently and bypass the redundant procedures of rearranging the band order of the original hyperspectral data sets. Moreover, the GMET algorithm can greatly reduce the eigen-decomposition computation compared to conventional feature extraction with PCA. The computational complexity for conventional PCA is on the order of $O(m_t^2)$ and it is $O(\sum_{i=1}^{n_k} m_i^2)$ for GME
(Jia and Richards, 1999). It makes good use of these highly correlated band subgroups to reduce the computation time compared to the PCA computation on the whole set of bands. An example in which the GMET was applied to real hyperspectral data is shown in Fig. 8. After finding the highly correlated GME sets $\Phi^k$ for all classes, $k \in \{1, \ldots, N\}$, the eigenspace projections are executed.

Figure 9. The RRE for classes $\omega_j$ and $\omega_k$ in the hyperplane.

2.2. Distance Measures: Greedy Modular Eigenspace Projection

The GME can distinguish different classes well by their individual highly correlated features of modular eigenspaces. It can make use of the PCA, also known as the Karhunen-Loève Transform (Han and Tsai, 2001), to extract the most significant information as a similarity measure from each GMET-generated eigenspace while removing redundancy in the spectral domain. Let us assume that $\Phi_l^k$, which is the $l$-th modular eigenspace of the GME set $\Phi^k$ for class $\omega_k$, has $m_l$ feature bands, as shown in Fig. 5. The basis functions in a PCA are obtained by solving the eigenvalue problem

$$\Lambda_l^k = (\phi_l^k)^T \, \Sigma_l^k \, \phi_l^k, \qquad (2)$$

where $\Sigma_l^k$ is the covariance matrix of $\Phi_l^k$, $\phi_l^k$ is the eigenvector matrix of $\Sigma_l^k$, and $\Lambda_l^k$ is the corresponding diagonal matrix of eigenvalues. Moghaddam et al. (1995) decomposed the vector space $R^{m_l}$ into two mutually exclusive and complementary subspaces: the principal subspace ($W$ feature spaces) $\phi_l^k = \{\phi_{li}^k\}_{i=1}^{W}$ and the orthogonal complement subspace $\bar{\phi}_l^k = \{\phi_{li}^k\}_{i=W+1}^{m_l}$, where $\phi_{li}^k$ is the $i$-th eigenvector of $\phi_l^k$. A residual reconstruction error of the modular eigenspace $\Phi_l^k$ is defined as:

$$e_l^k(x) = \sum_{j=W+1}^{m_l} y_j^2 = \|\tilde{x}\|^2 - \sum_{j=1}^{W} y_j^2, \qquad (3)$$

where $\tilde{x} = x - \bar{x}$ is the mean-normalized vector of sample $x$, and $y_j$ is the projected value of sample $x$ by the eigenvector $\phi_{lj}^k$ corresponding to the $j$-th largest eigenvalue. Here, $e_l^k(x)$ represents the distance between the query sample $x$ and the modular eigenspace $\Phi_l^k$ of the training samples. This GME projection, also known as distance decomposition, is applied to all modular eigenspaces $\Phi_l^k$, $l \in \{1, \ldots, l, \ldots, n_k\}$, to generate an RRE vector $e^k = (e_1^k, \ldots, e_l^k, \ldots, e_{n_k}^k)$ for class $\omega_k$. A graphical interpretation of the RRE vectors is depicted in Fig. 9. For convenience, we redefine this distance measure, RRE, as an RRE-decomposition (distance decomposition).
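As a numerical check of Eq. (3), the following Python sketch computes the RRE of one sample against one modular eigenspace in both forms of the equation. It is a minimal sketch under stated assumptions: the random training data, the subgroup size of six bands, the choice $W = 2$, and the helper names `fit_eigenspace` and `rre` are all hypothetical and introduced only for illustration.

```python
import numpy as np


def fit_eigenspace(train):
    """Return the mean and the eigenvectors of the covariance of one band subgroup.

    `train` has shape (n_samples, m_l). Eigenvectors are returned as columns,
    ordered by decreasing eigenvalue.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def rre(x, mean, eigvecs, W):
    """Residual reconstruction error of x outside the first W eigenvectors."""
    x_tilde = x - mean                       # mean-normalized sample
    y = eigvecs.T @ x_tilde                  # projections onto all eigenvectors
    return float(np.sum(y[W:] ** 2))         # left-hand form of Eq. (3)


# Hypothetical training samples for one class and one 6-band subgroup.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
mean, eigvecs = fit_eigenspace(train)

x = train[0]
W = 2
e_direct = rre(x, mean, eigvecs, W)

# Right-hand form of Eq. (3): squared norm minus energy in the principal subspace.
x_tilde = x - mean
y = eigvecs.T @ x_tilde
e_identity = float(x_tilde @ x_tilde - np.sum(y[:W] ** 2))

print(e_direct, e_identity)
```

The two values agree up to floating-point error because the eigenvectors form an orthonormal basis. In practice only the right-hand form is needed, since it requires just the first $W$ eigenvectors.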
2.3. Threshold Decomposition

After defining the RRE vector $e^k$ in the previous stage, the threshold decompositions described below are performed to create normalized and quantized RRE vectors $e_{NQ}^k$ for all classes. The RRE $e_l^k$ is first normalized to the range (0,1) by the nonlinear sigmoid function:

$$E_l^k = \frac{e_l^k - \mu_l^k}{a_l^k}, \qquad (4)$$

and

$$e_{Nl}^k(E_l^k) = \frac{1}{1 + \exp(-t E_l^k)}, \qquad (5)$$

where $\mu_l^k$, $a_l^k$ and $t$ denote, respectively, the mean of the RRE $e_l^k$, its standard deviation, and a threshold value. This normalized RRE $e_{Nl}^k$ is then uniformly quantized into $L-1$ levels. Finally, these normalized distance vectors $e_N^k = (e_{N1}^k, \ldots, e_{Nl}^k, \ldots, e_{Nn_k}^k)$ are converted into binary vectors $e_{NQ}^k = (e_{NQl1}^k, \ldots, e_{NQlp}^k, \ldots, e_{NQln_k}^k)$, where $l \in \{1, \ldots, L-1\}$. In Fig. 10, the threshold decomposition function $T(\cdot)$ transforms the RRE vectors $e^k$ into the binary RRE vectors $e_{NQ}^k$ for all classes $\omega_k$, where $k \in \{1, \ldots, N\}$.

Figure 10. Determination process of GME/PBF-based multi-class classification scheme for supervised hyperspectral image classification.

2.4. Stack Filter and Positive Boolean Function

The PBF is developed from a stack filter. Each stack filter corresponding to a PBF possesses the weak superposition property (the threshold decomposition)
and the ordering property (the stacking property) (Wendt et al., 1986). The class of stack filters includes many familiar filter types, such as the median filter, weighted median filter, order statistic filter and weighted order median filter. The main function of a stack filter is to remove noise, detect edges, etc. This technology has received considerable interest in the communications and signal processing communities. Wendt et al. (1986) defined stack filters as a class of nonlinear digital filters and pointed out the close connections between stack filters and PBF. Maragos (1987) used mathematical morphology to construct a class of morphological filters and their equivalent classes. He also showed that a PBF has exactly one sum-of-product form or one product-of-sum form produced by morphological operations. Dougherty (1992) defined a basis class of filters satisfying the morphological basis criteria and applied the erosion operation as an estimator to find the optimal morphological filter by minimizing the mean square error. Postaire et al. (1993) proposed a binary morphological technique to cluster N-dimensional observations. It is an unsupervised pattern clustering approach based on mathematical morphological operations. Lin and Coyle (1990) developed a generalized stack filter, including all rank order filters, stack filters, and digital morphological filters. They showed that choosing a generalized stack filter is equivalent to massively parallel threshold-crossing decision making when these decisions are consistent with each other.
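The threshold decomposition and stacking properties can be illustrated with the median filter, the best-known member of the stack filter class. The Python sketch below is an illustration under stated assumptions rather than the classifier proposed in this paper: it uses a window of three values, $L = 4$ quantization levels, and the three-input majority function as the positive Boolean function. The helper names `threshold_decompose`, `majority_pbf` and `stack_filter` are hypothetical.

```python
import numpy as np


def threshold_decompose(x, levels):
    """Binary slices T_l(x) = 1 if x >= l else 0, for l = 1..levels."""
    x = np.asarray(x)
    return np.array([(x >= l).astype(int) for l in range(1, levels + 1)])


def majority_pbf(b):
    """Positive Boolean function of three binary inputs (binary median).

    Sum-of-products form with no negated variable: b0*b1 + b0*b2 + b1*b2.
    """
    b0, b1, b2 = b
    return (b0 & b1) | (b0 & b2) | (b1 & b2)


def stack_filter(window, levels, pbf):
    """Apply `pbf` at every threshold level and add up the binary outputs."""
    slices = threshold_decompose(window, levels)
    outputs = [int(pbf(s)) for s in slices]
    # Stacking property: the output at a higher level never exceeds
    # the output at a lower level when the Boolean function is positive.
    assert all(outputs[i] >= outputs[i + 1] for i in range(len(outputs) - 1))
    return sum(outputs)


window = [2, 0, 3]            # values in 0..L-1 with L = 4
result = stack_filter(window, levels=3, pbf=majority_pbf)
print(result)                 # equals the median of the window
```

Summing the per-level binary outputs reproduces the multi-level median, which is the sense in which a multi-level filtering problem reduces to a set of binary decisions, one per threshold level.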
Figure 11. An example of the use of the proposed PBF-based multi-class classifier corresponding to a stack filter for processing a map classification.
In traditional supervised pattern classification, the Bayes decision theory models the classification as a distribution estimation problem. The Bayes theory can be used to solve classification problems, but it requires complete knowledge of probability distributions. In hyperspectral data, obtaining such complete knowledge is generally difficult even in an unsupervised manner. This is because samples used for training and learning may be mixed pixels and do not provide accurate information. Juang et al. (1997) indicated the superiority of the MCE method over the distribution estimation method. The goal of MCE training is to be able to correctly discriminate the observations for best classification results rather than to fit the distributions to the data.

An optimal stack filter $S_f$ is originally defined as a filter whose mean absolute error (MAE) between the filter's output and the desired signals is minimal. Since it possesses the well-known threshold decomposition and stacking properties, the multi-level MAE value can be decomposed into a summation of absolute errors occurring at each level. Han (2002) proposed an alternative PBF-based multi-class classification scheme based on MCE criteria to resolve multi-class problems. We further apply this MCE version of a PBF-based classification scheme to remote sensing hyperspectral imagery. It has the ability to learn from both positive and negative samples of the binary RRE generated simultaneously from GME. It can take all competing classes into consideration when searching for the PBF-based classifier parameters. This is very important because we lack complete knowledge of the data distribution, particularly in dealing with hyperspectral data. It also makes multi-class classification possible for our proposed method. The proposed binary RRE vector $e_{NQ}^k$ fully utilizes the special characteristics of MCE criteria. It not only improves the classification accuracy but also overcomes the restriction of a limited number of training samples, which is one of the most common problems in hyperspectral classification of remote sensing. In our approach, an optimal stack filter $S_f$ is defined as a filter whose MCE between the output and the desired signals is minimal. We apply a window of length $n$ to the modular eigenspaces $\Phi^k$ for all classes. As an example, Fig. 10 and Fig. 11 show that the sliding windows of fixed length $n$ consist of the binary RRE $e_{NQ}^k$ of all classes. They are used as the MCE training parameters for a PBF.

At the supervised training stage, each level of the training samples is assigned to true (1) or false (0). We assume there are $N$ ground cover classes. We extract only the first $n$ elements of $e_{NQ}^k$, $(e_{NQ1}^k, \ldots, e_{NQn}^k)$, as a window of fixed length $n$ for all classes, where $n \le n_k$ for all classes $\omega_k$ and $k \in \{1, \ldots, N\}$, to develop a
PBF-based multi-class classifier. A PBF is exactly one sum-of-product form without any negative components. The classification errors can be calculated from the summation of the absolute errors incurred at each level. The proposed scheme is constructed by minimizing the classification error rate using the training samples. In Fig. 10, we assume there are $N$ classes of training samples. $M$ stands for a fixed number of training samples for each class $\omega_k$. The $MN$ training samples are projected to the $N$ GME sets and then decomposed into $MN^2$ binary RRE vectors $e_{NQ}^k$ in the GME projection stage and the threshold decomposition stage respectively. Let $x_{ij}$ and $e_{NQij}^k$ represent the $MN$ training samples and the $MN^2$ binary vectors $e_{NQ}^k$ of length $n$ respectively, where $i \in \{1, \ldots, N\}$, $j \in \{1, \ldots, M\}$ and $k \in \{1, \ldots, N\}$. Next, the desired values for each occurrence at each level have to be determined for the output of the PBF. The desired value of an occurrence $e_{NQij}^k$ can be treated as the error value of sample $x_{ij}$ at class $\omega_k$. If sample $x_{ij}$ belongs to class $\omega_k$, no error occurs, i.e. the desired value of $e_{NQij}^k$ is 0. Otherwise, it is equal to 1:

$$d(e_{NQij}^k) = \begin{cases} 0 & \text{if } i = k,\ x_{ij} \in \omega_k \\ 1 & \text{if } i \ne k,\ x_{ij} \notin \omega_k \end{cases} \qquad (6)$$

According to the PBF criteria (Han, 2002), a Boolean function $B_f^k(\cdot)$ is defined as an occurrence $e_{NQijlp}^k$ at level $l$ in a window of fixed length $n$ (the number of Boolean binary variables) for the class $\omega_k$, as shown in Fig. 11, where $l \in \{1, \ldots, L-1\}$ and $p \in \{1, \ldots, n\}$.

Based on the threshold decomposition property, an occurrence of the binary RRE vector $e_{NQij}^k$ as an input of the PBF can be decomposed into binary vectors $e_{NQijl}^k = (e_{NQijl1}^k, \ldots, e_{NQijlp}^k, \ldots, e_{NQijln}^k)$, with a window of fixed length $n$ ($n$ dimensions), where $p \in \{1, \ldots, n\}$. Let us consider two samples $u$, $v$ and a class $\omega_k$. We assume that $e_{NQul}^k \le e_{NQvl}^k$ for all dimensional elements. If sample $u$ does not belong to class $\omega_k$ (an error occurs), sample $v$ does not belong to class $\omega_k$ either. On the other hand, if sample $v$ is an element of class $\omega_k$ (no error), sample $u$ should be an element of class $\omega_k$. We define $E_f(\cdot)$ as an error function. Thus, $E_f(u) \le E_f(v)$ if $e_{NQul}^k \le e_{NQvl}^k$ for all dimensional elements. We further define $E_f(x)$ to be the PBF $B_f^k(\cdot)$ of an occurrence $e_{NQxl}^k$ of the class $\omega_k$ at each level $l$. If $e_{NQul}^k \le e_{NQvl}^k$, then $B_f^k(e_{NQul}^k) \le B_f^k(e_{NQvl}^k)$, satisfying the stacking property. This indicates that the stacking property offers the ability to solve the classification problems. The $MN^2$ occurrences are treated as the training occurrences of the classifier. They are decomposed into $MN^2(L-1)$ binary vectors of length $n$. The desired value $d(e_{NQij}^k)$ for each occurrence $e_{NQij}^k$ is determined by Eq. (6). The classification error (CE), which is defined as the expected value ($E$) of the differences between the desired values $d(e_{ij}^k)$ and the stack filter's binary outputs $S_f(e_{ij}^k)$, is obtained by

$$CE = E\left[\,\left| d(e_{ij}^k) - S_f(e_{ij}^k) \right|\,\right]$$
62
Journal of Photogrammetry and Remote Sensing Volume 9, No. 4, December 2004
⎡ L=1 ⎤ = E ⎢| ∑ d (Tl (eijk )) − Blk (Tl (eijk )) |⎥ ⎣ l =1 ⎦ (by threshold decomposition property)
(by stacking property)
=
∑ E [| d (e
k NQij
L =1 N
N
L =1 l =1
⎡ L =1 ⎤ k k = E ⎢∑ | d (eNQ ) − B kf (eNQ ) |⎥ ij ij ⎣ l =1 ⎦
=
M
k ) − B kf (e NQ )| ij
∑∑∑∑ | d (e l =1 i =1 j =1 k =1
k NQij
]
k ) − B kf (eNQ )|, ij
(7)
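As a small numerical illustration of the threshold decomposition, the stacking property, the desired values of Eq. (6) and the summed absolute error in the last line of Eq. (7), the following Python sketch may help. It is not the trained filter of this paper: the number of levels `LEVELS`, the Boolean function `toy_pbf`, and the four toy occurrences are all hypothetical, and the RRE windows are assumed to be already quantized to integers in 0..L-1.

```python
from itertools import product

LEVELS = 4  # assumed number of quantization levels L, so values lie in 0..L-1


def threshold_decompose(values, levels=LEVELS):
    """Binary vectors T_l(values) for l = 1..levels-1, with T_l(v) = 1 when v >= l."""
    return [[1 if v >= l else 0 for v in values] for l in range(1, levels)]


def toy_pbf(bits):
    """Hypothetical positive Boolean function b0*b1 + b2 (no negated variables)."""
    return int((bits[0] and bits[1]) or bits[2])


def has_stacking_property(pbf, n):
    """Check that u <= v componentwise implies pbf(u) <= pbf(v) for all binary u, v."""
    vectors = list(product((0, 1), repeat=n))
    return all(
        pbf(u) <= pbf(v)
        for u in vectors
        for v in vectors
        if all(a <= b for a, b in zip(u, v))
    )


def stack_filter(values, pbf, levels=LEVELS):
    """Stack filter output: the sum of the PBF outputs over all threshold levels."""
    return sum(pbf(bits) for bits in threshold_decompose(values, levels))


def desired_value(i, k):
    """Eq. (6): 0 if the sample's true class i equals k, otherwise 1."""
    return 0 if i == k else 1


def classification_error(occurrences, pbf, levels=LEVELS):
    """Last line of Eq. (7): absolute errors summed over levels and occurrences."""
    return sum(
        abs(desired_value(i, k) - pbf(bits))
        for i, k, values in occurrences
        for bits in threshold_decompose(values, levels)
    )


# Toy occurrences: (true class i, eigenspace class k, quantized RRE window, n = 3).
occurrences = [
    (1, 1, [0, 1, 0]),
    (2, 1, [3, 3, 2]),
    (2, 1, [1, 0, 3]),
    (1, 1, [2, 2, 0]),
]
print(has_stacking_property(toy_pbf, 3))
print([stack_filter(values, toy_pbf) for _, _, values in occurrences])
print(classification_error(occurrences, toy_pbf))
```

In this toy setting only the last occurrence contributes to the error: it belongs to class 1 but has large residuals in the first two window positions, so the filter fires at two of the three levels. A training procedure would search over positive Boolean functions for the one that minimizes this sum.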
Figure 12. MASTER images with different band combinations (left: RGB bands 5-3-1; middle: RGB bands 8-5-3; right: RGB bands 44-13-3).
Figure 13. Classified map of the Au-Ku test site using the GME/PBF-based multi-class classification method.
In Eq. (7), a stack filter $S_f(\cdot)$, a threshold function $T_l(\cdot)$ at level l, and a Boolean function $B^k_f(\cdot)$ are used at each level. Furthermore, we can reformulate the CE value as the minimal sum over the $2^n$ possible binary vectors of length n. Finding an optimal stack filter is a time-consuming procedure if n is a large number. Referring to our previous work (Han et al., 1997), a graphic search-based algorithm further improves the searching process by utilizing the greedy and constraint satisfaction searching criteria. It guarantees that the output filter is a globally optimal solution. Finally, a classification map, Ω, can be generated by this PBF-based multi-class classifier. The mechanism illustrated in Fig. 11 is an example of the use of the proposed PBF-based multi-class classifier to process a map classification. All PBF $B^k_f(\cdot)$ of different classes at each level can simultaneously calculate the PBF binary outputs to classify a test sample. The outputs of the stack filter, i.e. the summations of the PBF binary outputs at each level for each class $\omega_k$, are compared to each other and the one which has the smallest value is chosen as the final decision class for that test sample. This shows that our proposed PBF-based method can be used as a basis for a multi-class classifier.
3. EXPERIMENTAL RESULTS
The image data were obtained by the MODIS/ASTER airborne simulator (MASTER) instrument, a hyperspectral sensor instrument, as part of the PacRim II project (Hook et al., 2001). The MASTER images of the test area with different band combinations are displayed in Fig. 12. A plantation area in Au-Ku on the east coast of Taiwan, shown in Fig. 13, was chosen for study. A ground survey was made of the selected six land cover types at the same time. The MASTER is most appropriate for this study because it is a well-calibrated instrument and it provides spectral data in the visible to shortwave infrared region of the electromagnetic spectrum. The proposed PBF-based multi-class classification method was applied to 35 bands selected from the 50 contiguous bands of MASTER, excluding the low signal-to-noise ratio mid-infrared channels (Hook et al., 2001). A, B, S, P, O and R stand for six ground cover classes, sugar cane A, sugar cane B, seawater, pond, bare soil and rice (N = 6). The criterion for calculating the classification accuracy of experiments was based on exhaustive test cases. One hundred and fifty labeled samples were randomly collected from ground survey data sets by iterating every fifth sample interval for each class. Thirty labeled samples were chosen as training samples, while the rest were used as test samples, i.e. the samples were partitioned into 30 (20%) training and 120 (80%) test samples (M = 120) for each test case. Three correlation coefficient threshold values, tc =
0.75, 0.80 and 0.85, were selected to carry out GMET. Four different window lengths, n ∈ {3,4,5,6}, were selected for each class. Based on PBF-based multi-class classification criteria, there were NM test samples (NM = 720) for each class. The cumulative percentages of eigenvalues were set as 90% to perform GME projection and generate RRE vectors $e^k$. Finally, the accuracy was obtained by averaging all of the multiple combinations stated above. An example of an error matrix is given in Table 1 to illustrate the accuracy assessment of the PBF-based multi-class classification method.
Table 1. An example of an error matrix for clear boundary sampling. A summary of classification errors is appended below.
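To make the 90% cumulative eigenvalue setting and the residual reconstruction error concrete, the sketch below shows one way such a projection could be computed for a single eigenspace. It is a simplification under stated assumptions, not the GMET procedure itself: it uses an ordinary covariance eigen-decomposition on one group of bands, and the function names `fit_eigenspace` and `residual_reconstruction_error` as well as the toy data are hypothetical.

```python
import numpy as np


def fit_eigenspace(train, energy=0.90):
    """Return (mean, basis), keeping the leading eigenvectors whose
    cumulative eigenvalue share first reaches `energy`."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    keep = min(int(np.searchsorted(ratio, energy)) + 1, len(eigvals))
    return mean, eigvecs[:, :keep]


def residual_reconstruction_error(x, mean, basis):
    """Squared distance between a sample and its projection onto the eigenspace."""
    centered = x - mean
    projected = basis @ (basis.T @ centered)
    return float(np.sum((centered - projected) ** 2))


# Toy training set: ten 3-band samples lying on a single line.
train = np.outer(np.arange(10.0), [1.0, 2.0, 2.0]) / 3.0
mean, basis = fit_eigenspace(train)

on_line = train[3]                                # consistent with the class
off_line = mean + np.array([2.0, -1.0, 0.0])      # orthogonal to the line
print(basis.shape)
print(residual_reconstruction_error(on_line, mean, basis))
print(residual_reconstruction_error(off_line, mean, basis))
```

A sample that fits the class eigenspace leaves almost no residual, while a sample displaced orthogonally to it keeps the full squared displacement as its error. This contrast is what the RRE vectors pass on to the classifier.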
Two different traditional feature extraction techniques, minimum Euclidean distance (MED) and conventional PCA, were applied to the same PBF-based multi-class classifier for accuracy comparison. In Fig. 14, 5 GME specifies a window of fixed length five (n = 5) that was selected for each class. 5 EUB, 5 EAB and 5 ELB represent respectively the best five upper bound feature bands, the average five bands (between upper bound and lower bound) and the worst five lower bound feature bands obtained by using the MED of the original data sets. 5 PCA stands for the first five principal components of PCA of the original data sets. As shown in Fig. 14, the accuracy rates are improved by enlarging the feature dimensions for GME and MED, but not those of PCA. It can be interpreted that the last few PCA components contain some global variances which can be treated as noise in MASTER data sets. Compared to MED and PCA, GME feature extraction is very suitable
for the PBF-based multi-class classifier in general.
There is a decline in accuracy rates for GME when six ground covers are selected. This is caused by the fact that some classes of high-dimensional data sets have only a few effective modular eigenspaces $\Phi^k_l$ inside their own GME set $\Phi^k$. In Fig. 7, we can see that some ground cover classes, such as rice (R), have many ineffective modular eigenspaces $\Phi^k_l$ which have small numbers of feature bands $m_l$ in their GME set $\Phi^k$. Note that we have previously sorted and reordered every $\Phi^k_l$ in $\Phi^k$ according to their number of bands $m_l$ in descending order. If we choose a large window of length n, i.e. the top modular eigenspaces $(\Phi^k_1, \ldots, \Phi^k_l, \ldots, \Phi^k_{n_k})$ of $\Phi^k$, then some classes which have ineffective modular eigenspaces within the window will be misclassified. This is why the accuracy rates fall when the sixth ground cover class is included in Fig. 14. It can be interpreted as a great benefit in that the GME provides a very useful pre-classification estimation measurement before we choose a proper window of length n for each training class as the parameters to build an efficient classifier.
Interestingly, we demonstrated a case in which there was a difference in classification accuracy between vague and clear boundary sampling. By vague boundary sampling we mean that some mixed pixels in the uncertain boundaries are chosen when we visually select labeled samples from the hyperspectral data sets. Conversely, clear boundary sampling means that the chosen labeled samples are pure pixels without ambiguity. A vague version of a GME error matrix is shown in Table 2. It uses the same conditions as described above but applies to vague boundary sampling. Compared to Table 1, Table 2 has better classification accuracy. This indicates that PBF-based multi-class classifiers do have more nonlinearly discrete capacity to increase class separability if vague boundary sampling is used to develop the classifiers.
The discrete and nonlinear properties of the proposed PBF-based multi-class classifier are considered to be advantages in classification. In Table 3, a summary evaluation of classification accuracies under different conditions illustrates the validity of these unique properties of the proposed PBF-based multi-class classifiers. The encouraging results show that adequate classification rates, with almost 94% average accuracy for the best case, are achieved with only a few training samples. Shown in Fig. 13 is an example of the classified map obtained by applying GME/PBF-based multi-class classification schemes to the MASTER high-dimensional data sets.
4. CONCLUSIONS
In this paper, a sophisticated GME/PBF-based multi-class classifier is proposed for hyperspectral supervised classification. We first introduce the GME which can be obtained
by a quick greedy band reordering algorithm. It is efficient with little computational complexity. The GME is built by grouping highly correlated bands into a small set of bands. The GME can be treated as not only a preprocess of the filter-based classifier but also a unique spectral-based feature set that explores correlation among bands for high-dimensional data. It makes use of the potential separability of different classes to overcome the drawback of the common covariance bias problems encountered in conventional PCA. This characteristic of GME is suitable for multi-class classifiers. The proposed GME/PBF-based multi-class classifier enhances the separable features of different classes to improve the classification accuracy significantly. The conducted experiments demonstrated the validity of our proposed GME/PBF-based multi-class classification scheme.
The proposed PBF-based multi-class classifier is developed to effectively find nonlinear boundaries of pattern classes in hyperspectral data. Combining the GMET algorithm with the PBF-based multi-class classifier provides unique advantages for hyperspectral image classification. It possesses the well-known threshold decomposition and stacking properties. The advantages of a PBF-based multi-class classifier are its discrete and nonlinear binary properties. It utilizes the MCE learning ability to improve the classification accuracy, particularly in dealing with hyperspectral data sets in which training data are always inadequate and knowledge of the data distribution is usually incomplete. This MCE characteristic can best harmonize the PBF-based multi-class classifiers with the features extracted from GME. It improves classification accuracy significantly and fully promotes multi-class classifiers instead of pairwise classifiers.
The experiments validate the utility of the GME/PBF-based multi-class classifier. The highly correlated property of the GME can help us develop a suitable preprocessing algorithm for hyperspectral data compression. Furthermore, we can take advantage of the parallel property in GME to increase computation speed and achieve real-time computation. These valuable advantages for practical implementation will be included in our future study.
ACKNOWLEDGMENTS
The authors would like to thank the Center for Space and Remote Sensing Research of National Central University for providing the MASTER datasets used for experiments in this paper.
Figure 14. Classification accuracy comparison of three feature extraction techniques, GME, MED and PCA, using the same PBF-based multi-class classifier.
Table 2. Classification error matrix for vague boundary sampling. A summary of classification errors is appended below.
Table 3. Summary evaluation of classification accuracy for different boundary sampling types and number of modular eigenspaces.
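As a sketch of how an error matrix and an overall accuracy of the kind summarized in Tables 1 to 3 could be tabulated from the classifier decisions, the following Python fragment applies the smallest-output decision rule and counts the results. The stack filter outputs and labels are invented toy values for three classes, the function names are hypothetical, and ties are assumed to go to the lowest class index.

```python
def decide(stack_outputs):
    """Return the index of the class whose stack filter output is smallest.
    Ties go to the lowest class index (an assumption of this sketch)."""
    return min(range(len(stack_outputs)), key=lambda k: stack_outputs[k])


def error_matrix(true_labels, predicted_labels, n_classes):
    """Rows are the true classes, columns are the decided classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal of the error matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


# Toy stack filter outputs: one row per test sample, one column per class.
outputs = [
    [0, 3, 2],
    [2, 1, 3],
    [3, 2, 0],
    [1, 0, 2],
]
true_labels = [0, 1, 2, 0]

predicted = [decide(row) for row in outputs]
matrix = error_matrix(true_labels, predicted, n_classes=3)
print(predicted)
print(matrix)
print(overall_accuracy(matrix))
```

The off-diagonal entries record which classes are confused with each other, which is the information a per-class summary of classification errors would be drawn from.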
REFERENCES
Dougherty, E. R., 1992, "Optimal Mean-Square N-Observation Digital Morphological Filters. II. Optimal Gray-Scale Filters," CVGIP: Image Understanding, 55(1), pp. 55–72.
Duda, R. O. and Hart, P. E., 1973, "Nonparametric Techniques," Pattern Classification and Scene Analysis, John Wiley & Sons, New York.
Han, C. C., Fan, K. C. and Chen, Z. M., 1997, "Finding of optimal stack filter by graphic searching methods," IEEE Trans. Signal Processing, 45(7), pp. 1857–1862.
Han, C. C. and Tsai, C. L., 2001, "A multi-resolutional face verification system via filter-based integration," IEEE Int. Carnahan Conf. Security Technology, pp. 278–281.
Han, C. C., 2002, "A supervised classification scheme using positive boolean function," accepted for publication in International Conference on Pattern Recognition, IEEE, Quebec, Canada, pp. 100–103.
Harsanyi, J. and Chang, C.-I, 1994, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Remote Sens., 32(4), pp. 779–785.
Hook, S. J., Myers, J. J., Thome, K. J., Fitzgerald, M. and Kahle, A. B., 2001, "The MODIS/ASTER airborne simulator (MASTER) - a new instrument for earth science studies," Remote Sensing of Environment, 76(1), pp. 93–102.
Jia, X. and Richards, J. A., 1999, "Segmented principal components transformation for efficient hyperspectral remote-sensing image display and classification," IEEE Trans. Geosci. Remote Sens., 37(1), pp. 538–542.
Jimenez, L. O. and Landgrebe, D. A., 1999, "Hyperspectral data analysis and supervised feature reduction via projection pursuit," IEEE Trans. Geosci. Remote Sens., 37(6), pp. 2653–2667.
Juang, B. H., Chou, W. and Lee, C. H., 1997, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech and Audio Processing, 5(3), pp. 257–265.
Kumar, S., Ghosh, J. and Crawford, M. M., 2001, "Best-bases feature extraction algorithms for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., 39(7), pp. 1368–1379.
Lee, C. and Landgrebe, D. A., 1993, "Analyzing high-dimensional multispectral data," IEEE Trans. Geosci. Remote Sens., 31(4), pp. 792–800.
Lin, J. H. and Coyle, E. J., 1990, "Minimum mean absolute error estimation over the class of generalized stack filters," IEEE Trans. Acoust., Speech, Signal Processing, 38, pp. 663–678.
Maragos, P. and Schafer, R. S., 1987, "Morphological filters. Part II: Their relations to median, order-statistic, and stack filters," IEEE Trans. Acoustics, Speech, and Signal Processing, 35, pp. 1170–1184.
Moghaddam, B. and Pentland, A., 1995, "Probabilistic visual learning for object detection," in Proc. 5th International Conference on Computer Vision, Boston, MA, pp. 786–793.
Pentland, A. and Moghaddam, B., 1994, "View-based and modular eigenspaces for face recognition," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 84–91.
Postaire, J. G., Zhang, R. D. and Lecocq-Botte, C., 1993, "Cluster analysis by binary morphology," IEEE Trans. Pattern Analysis and Machine Intelligence, 15, pp. 170–180.
Richards, J. A. and Jia, X., 1999, "Interpretation of Hyperspectral Image Data," Remote Sensing Digital Image Analysis, An Introduction, 3rd ed., Springer-Verlag, New York.
Wendt, P. D., Coyle, E. J. and Gallagher, N. C., 1986, "Stack filter," IEEE Trans. Acoustics, Speech, and Signal Processing, 34(4), pp. 898–911.
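A Novel Approach to Supervised Hyperspectral Image Classification
Yang-Lang Chang 1
Chin-Chuan Han 2
Kuo-Chin Fan 3
K. S. Chen 4
ABSTRACT (translated from the Chinese) Hyperspectral imagery is an advanced remote sensing technology. The spectral resolution of remote sensing images has advanced steadily, from ordinary sensors with a few spectral bands, to multispectral sensors with tens of bands, to hyperspectral sensors with hundreds of bands, and on to ultraspectral sensors with thousands of bands. Hyperspectral sensors are widely used in the recognition of satellite remote sensing images, the diagnosis of medical images, the inspection of industrial products, and the nondestructive inspection of aircraft and other precision machinery, and hyperspectral remote sensing has become an emerging and important research field. This paper proposes a new algorithm for hyperspectral remote sensing image classification, implemented in two main steps: the first is greedy modular eigenspaces (GME) and the second is the positive Boolean function (PBF). Using complete, calibrated hyperspectral remote sensing image data of Taiwan together with ground truth data from field surveys, we show that the GME method provides an excellent means of feature extraction and is a most suitable preprocessor for the PBF classification method. The paper discusses the derivation of the GME algorithm in detail, fully describes the underlying theory of the PBF, analyzes the relationship between the two, and, building on their properties, proposes a solution applicable to the classification of general high-dimensional data. Finally, experimental verification and a comparison with other traditional multispectral remote sensing classification methods confirm that the method is well suited to the classification of high-dimensional data.
Keywords: principal components analysis, hyperspectral supervised classification, greedy modular eigenspaces, positive Boolean function, stack filter
1. Associate Professor, Department of Information Management, National Taipei College of Business
2. Associate Professor, Department of Computer Science and Information Engineering, National United University
3. Professor, Department and Graduate Institute of Computer Science and Information Engineering, National Central University
4. Professor, Center for Space and Remote Sensing Research and Institute of Space Science, National Central University
Received: November 11, 2003
Revised: July 25, 2004
Accepted: July 27, 2004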