Laboratory for Applications .... Linear Programming ...... CPU Times (in Sec) for Training ...... In this application the event X _ Ixl,xe,...,xnl is n ...... 8085. 80.46. 7927. 79.17. 78.47. 77.58. 76.49. 6865. 80.06. 80.26. ___qf__pj x e I ...... Architecture.
/
/
Statistical Methods and Neural Network Approaches for Classification of Data from Multiple Sources
Jon Atli Benediktsson Philip H. Swain
TR-EE 90-64 December 1990
Laboratory for Applications Remote Sensing
of
and
School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 iwo 1-2
G5/03
37_0
Oncl as 0016,425
STATISTICAL METItODS FOR CLASSIFICATION
Jon Atli
Laboratory
AND NEURAL NETWORK APPROACtII,;S OF DATA FROM MULTIPLE SOURCES
Benediktsson and Philip TR-EE 90-64 December 1990
for Applications
H. Swain
of Remote
Sensing
and
School West
This
work
EOS-like
was supported Data"
(Principal
in part
of Electrical Engineering Purdue University I,afayel,te, Indiana 47907
by NASA
Investigators:
David
Grant
No.
NAGW-925
A. Landgrebe
"Earth
and Chris
Observation
Johannsen)
Research
- Using
Multistage
ACKNOWLED
The
authors
would
like to thank
GME NT S
Dr. Okan
K. Ersoy
for his contributions
to this research. The
Anderson
River
loaned
by
tile
Mines,
and
Resources,
was now
originally at
Canada
of the
acquired,
Colorado
SAR/MSS Centre
for
Government
preprocessed,
State
data
University.
set
Remote
was
acquired,
Sensing, of Canada.
and
loaned
Access
to
preproeessed,
Department The
Colorado
by Dr. Roger both
data
sets
and
of Energy
Itoffer,
data
set,
who
is
is gral_efu]ly
acknowledged. The Space
research
Administration
was
supported through
Grant
in part
by
the
No. NAGW-925.
National
Aeronautics
and
°°.
111
TABLE
OF CONTENTS
Page LIST
OF TABLES
.........................................................................................
vi
LIST
OF FIGURES
.......................................................................................
xv
ABSTRACT
................................................................................................
CHAPTER
1 - INTRODUCTION
1.1
The
1.2
Two Different'
Research
1.3 Report CHAPTER 2.1 2.2
Problein
STATISTICAL
A Survey
.........................................................................
Classification
Organization
2
...................................................................
of Previous
Approaches
..............................................
............................................................................ MLTttOI)S Work
.....................................................
.................................................................
Consensus Theoretic Approaches ......................................................... 2.2.1 Linear Opinion l)ools .................................................................. 2.2.2 Choice of Weights for I,inear Opinion ])()()Is .............................. 2.2.3
Linear Opinion 1'oo1,_ for Multisource Classification ..............................................................................
2.2.4 The 2.3 Statistical
Logarithmic Multisource
Opinion Pool .................................................. Analysis .........................................................
2.3.1 2.3.2 2.3.3
Controlling Reliability Association
the Influence of the Data Sources ............................ Measures ................................................................... .................................................................................
2.3.4 2.3.5
Linear Programming Approach ................................................. Non-Linear Programming Approach ..........................................
2.3.6 Bordley's Log Odds Approach ................................................... 2.3.7 Morris' Axiomatic Approach ...................................................... 2.4 Group Interaction Methods ................................................................ 2.5 The Super Bayesian Approach ........................................................... 2.6 Overview of the Consensus Theoretic Approaches .............................. 2.7
Classification
of Non-Gaussian
Data
..................................................
xvii 1 1 2 .t 6 6 9 10 l:i 20 21 25 29 31 38 39 ,12 ,t3 ,14 46 46 48 4.(}
iv
Page 2.7.1
Itistogram
2.7.2
Parzen Density g,,timation ........................................................ Maximum Penalized Likelihood Estimators ...............................
5(} 51
Discussion
.................................
52
.................................
53
2.7.3 2.7.4 CHAPTER 3.1 3.2
3.3 3.4
of Density
3 - NEURAL
Estimation
NETWORK
Methods
APPROACHES
49
Neural Network Methods for Pattern Recognition ............................. Previous Work ................................................................................... 3.2.! The Delta Rule ..........................................................................
53 58 62
3.2.2 The Backpropagation "Fast" NeLlral Ne, tworks
63 66
Algorithm ............................................... .....................................................................
Including 3.4.1 The
St:t_,istics in Neural Networks ............................................. "Probabilistic Neural Network". .........................................
68 6(J
3.4.2
Tile
Polynomial
71
3.4.3 3.4.4
Higher Order Neural Overview of Statistics
CHAPTER 4.1 4.2
Appron,ch ..................................................................
Adaline
............................................................
Networks .................................................. in Neural Network
73
Models ........................................................................................
74
4 - EXPERIMENTAL
76
Source-Specific The Colorado
RESULTS
...............................................
Probabilities ............................................................... Data Set .......................................................................
77 78
Statistical Approaches .................................................. Neural Network Models ................................................
7(9 87
4.2.1 4.2.2
Results: Results:
4.2.3
Second
Experiment
4.2.4
Results Methods
of Second Experiment: Statistical .....................................................................................
4.2.5
Results Network
of the Second Experiment: Neural Methods .....................................................................
125
4.2.6
Summary
.................................................................................
131
on Colorado
Data
.......................................
94 94
4.3
Experiments 4.3.1 Results: 4.3.2 Results:
with Anderson River Data ........................................... Statistical Methods ..................................................... Neural Network Methods ............................................
135 139 168
4.4
Experiments with Simulated HIRIS Data ......................................... 4.4.1 20-Dimensional Data ................................................................
174 175
4.4.2 4.4.3
40-Dimensional 60-Dimensional
184 193
4.4.4
Summary
Data Data
................................................................ ................................................................
.................................................................................
218
Page CHAPTER 5 - CONCLUSIONSAND SUG(IEST1ONSFOR I,'UTURE WORK ..........................................................................221 5.1 Conclusions...................................................................................... '2'21 5.2 LIST
Future
l{esearch
OF REFERENCES
Direct,ions
..............................................................
.............................................................................
225 22S
vi
LIST OF TABI,ES
Table
Page
3.1 SampleSizeRequiredin ParzenDensity Estimation when Estimating a Standard Multivariate Normal Density Using a Normal Kernel I45}.............................................................................. 72 4.1
4.2
"['raining and Test Samples for Information Classes in the First Experiment on tile Colorado Data Set ................................. Classification Minimum
4.3
4.4
Statistical
Statistical Colorado
4.6
Equivocation
4.7
4.8
4.9
Euclidean
Classification Results Minimum Euclidean
Colorado 4.5
Results
for Training Distance
is Applied
.................................
82
for Test Samples ])istance Classifier
when is Applied
.................................
82
Classification
Training
Multisource Data:
Test
when
Classifier
Multisource Data:
Samples
80
Samples Classilication
Samples
........................................................... of
..................................................................
85 86
Conjugate Colorado
Gradient Linear Classifier Applied to Data: Training Samples ..........................................................
89
Conjugate Colorado
Gradient Linear Classifier Applied to Data: Test Samples ................................................................
89
Colorado
Gradient Data:
Sources
84
...........................................................
Conjugate
of tile Data
of
Backprot)agation
Training
Samples
Applied
to
..........................................................
91
vii
Table 4.10
l'age
Conjugate
Gradient
Colorado ,t.ll
4.12
Data:
Samples
to
...............................................................
91
Test Saint)los ['or Information t';xpcrimcnt on the Colorado
Classes Data Set .............................
Classification
Results
when
for Training
li;uclidean
Classification Minimum
4.14
Test
Applied
Training and in the Second
Minimum 4.13
Backt)ropagation
Distance
Results
Pairwise
Classifier
for Test
Euclidean
is Applied
Samples
Distance
JM Distances
Samples
the
................................
96
................................
96
when
Classifier
Between
is Applied
10 Information
Classes
in the Landsat MSS Data Source (Maximum Sel)arability is 1.41421.) .............................................................................................. 4.15
Statistical
Multisource
Classification
98
of Colorado
Data when Topographic Sources were Modeled by ! tistogram A pproac}l: Training Samph's .............................................. 4.16
Statistical
Multisource
Classification
!}5
1{):_
of Colorado
Data when Topographic Sources were Modeled by Histogram Approach: Test Samples .....................................................
104
4.17
Lqu]voeatlon
l()f;
4.18
Equivocation of Topographic Respect to Different Modeling
4.19
4.20
4.21
of MS,
Data
Soure(, ....................................................... Data Sources with Methods ................................................
106
Linear Sources
Opinion Pool At)plied to Colorado Data were Modeled by llistogram Approach:
Set. Topographic Training Samples
Linear Sources
Opinion Applied to Colorado were Modeled by Itistogram
Topographic Test Samples
Statistical when
Multisource
Topographic
Likelihood
Method:
(;lassification
Sources Training
Data Set. Approach: of Colorado
were, Modeled Samples
...... 108
.............
109
Data
by Maximum
Penalized
.................................................
11'2
°°o
Vlll
'Fabh_ 4.22
P age
StaListical
Multisource
when Topographic Likelihood Method: 4.23
Linear when
Opinion
4.24
Pool
Method:
Statistical
Method:
Test
Multisource
Statistical
Linear when
Linear
Pool
4.30
4.31
Source-Specific Classification):
Penalized
................................................. Data Set by Maximum
of Colorado
of Colorado
Pool
to Colorado were Modeled
Training
when Topographic Density Estimation: 4.29
Applied
Applied
Samples
Data
Colorado
Data:
Training
Classifier Samples
120
Set
by Parzen
Data
Applied
123
Set
CPU Time (Training Plus Landsat MSS Data Source ..........................................
Linear
119
Data
Sources were Modeled by Parzen Test Samples ........................................................
Gradient
117
Data
Source-Specific CPU Times (Training Plus Classification) for Topographic Data Sources with Respect to Different Modeling Methods ........................................ Conjugate
116
Penalized
.................................................
to Colorado
113
Set
........................................................
Classification
Sources
Estimation: Opinion
Samples
Data
by Maximum
Sources were Modeled by Parzen Test Samples ........................................................
Topographic
Density 4.28
Samples
Classification
Multisouree
Opinion
Data
Sources were Modeled by Parzen Training Samples .................................................
when Topographic Density Estimation: 4.27
to Colorado were Modeled
Training
when Topographic Density Estimation: 4.26
Applied
Sources
Linear Opinion Pool Applied to Colorado when Topographic Sources were Modeled Likelihood
4.25
of Colorado
Sources were Modeled by Maximum Penalized Test Samples ........................................................
Topographic
Likelihood
Classification
124
126
126
to
.......................................................
127
ix
Table 4.32
.t.33
I 'age Conjugate Colorado Conjugate Neurons
4.34
Conjugate Neurons
4.35
Conjugate Neurons
4.36
Conjugate Neurons
4.37
4.38
Gradient Applied
Gradient Applied
Applied
Training
Backpropagation
with
Data:
with
Data:
Training
Backpropagation
to Colorado
with
l)ata:
Test
Conjugate
Gradient Applied
Samples
with
Data:
Test
.......................
129
8 Hidden ..............................
129
16 Hidden Samples
.......................
J30
16 tlid(ten
Samples
..............................
with 32 Hidden Training Samples
Backpropagation
to Colorado
127
8 tlidden
Test, Samples
Baekpropagation
to Colorado
Gradient
with
Data:
to Colorado
Gradient Applied
Backpropagation
to Colorado
Conjugate Gradient, Backpropagation Neurons Applied to Colorado Data:
Neurons 4.39
Gradient Linear Classifier Applied to Data: Test Samples ..............................................................
.......................
1:_0
132
32 ttidden Samples
..............................
132
Information Classes, Training and Test Samples Selected from the Anderson River Data Set .........................................
137
4.40
Pairwise
JM Distances:
ABMSS
..................................................
138
4.41
Pairwise
JM Distances:
SAR
Shallow
Data ..........................................
138
4.42
Pairwise
JM Distances:
SAR
Steep
4.,13
Classitlcation tlesults ot' Training Samt)les for the Anderson River l)at,a Set when the Minimum ]"uclidear_ Distance Method and the Maximum Likelihood Method Gausslan
4.44
Data
are Applied
Classification l{esults Anderson River Data Distance Method and Gaussian
Data
Data
Data .............................................
are Applied
for
..................................................................
of Test, Samples for the Set when the Minimum Euclidean t,he Maxirnurn Likelihood Method
138
143
for
.................................................................
1,13
X
Table 4.45
4.46
Page Statistical
Multisource
River
Data:
were
Modeled
Statistical
Classification
Training
Samples.
by tlistogram
Multisource
of Anderson
Topographic Approach
...............................................
Classification
River Data: Test Samples. were Modeled by tlistogram
Sources
of Anderson
Topographic Sources Approach ...............................................
The
Equivocations
Sources ...............................
149
4.48
The with
Equivocations of the Non-Gaussian Data Sources Regard to the Three Modeling Methods Used .............................
149
Linear
Opinion
Pool
Anderson River Data: Sources were Modeled 4.50
Linear
Opinion
Pool
Anderson River Data: Sources were Modeled 4.51
Statistical
Multisource
Applied
Data
147
4.47
4.49
of the Gaussian
146
in Classification
of
Training Samples. Topographic by Histogram Approach .................................. Applied
in Classification
Test Samples. by Histogram
of
Topographic Approach ..................................
Classification
Statistical
Multisource
Classification
River Data: Test Samples. were Modeled by Maximum 4.53
Linear Opinion Anderson River Sources
4.54
Linear
were Opinion
Statistical
.................
155
.................
156
of Anderson
Topographic Penalized
Sources Likelihood
Method
Pool Applied in Classification of Data: Training Samples. Topographic
Modeled Pool
Anderson River Data: Sources were Modeled 4.55
152
of Anderson
River Data: Training Samples. Topographic Sources were Modeled by Maximum Penalized Likelihood Method 4.52
151
Multisource
by Maximum Applied
Penalized
in Classification
Test Samples. by Maximum Classification
Likelihood
Method
..... 158
Method
..... 159
of
Topographic Penalized Likelihood of Anderson
River Data: "['raining Samples. Topographic were Modeled by Parzen Density Estimation
Sources ......................................
161
xi
Table
4.56
4.57
] ) age
Statistical Data:
were
Modeled
4.59
4.60
Classification
Samples.
by Parzen
4.63
Source-Specific Classification
Estimation
......................................
162
Density
[';stimation
.........................
164
.........................
1{;5
CPU Times (ill Sec) for Training Plus of Gaussian Data Sources ..............................................
167
Source-Specific CPU Times (in Sec) for Training Plus Classification of Non-(laussian l)ata Sources
Conjugate
to Different Gradient
Modeling
I,inear
Methods
.......................................
167
Classifier
of the Anderson Samples ......................................................
Conjugate Gradient I,inear Classifier Applied in Classification of the Anderson River Data Set: Test Samples .............................................................
170
170
Conjugate Gradient Backpropagation Applied in Classification of the Anderson Data
Set,: Training
Samples
......................................................
171
Conjugate Gradient Backpropagation Applied in Classification of the Anderson River
4.65
Sources
Linear Opinion Pool Applied in Classification of Anderson River Data: Test Samples. "l'opogral)hic Sources were Modeled by l'arzen Density l_;stimal,ion
River 4.64
Density
by Parzen
Applied in Classification River Data Set: Training 4.62
of Anderson
Topographic
Pool Applied in Classification (ff Data: Training Samples. Topographic
were Modeled
with Regard 4.61
Test
Linear Opinion Anderson River Sources
4.58
Multisource
River
Data
Pairwise Simulated
Set: Test
Samples
.............................................................
JM 1)istances for the 20-Dimensional tIIRIS Data ........................................................................
171
176
xii
Table
Page
4.66 Minimum EuclideanDistanceClassifierApplied to 20-DimensionalSimulatedHIR1SData: Training Samples..................177 4.67 Minimum EuclideanDistanceClassifierApplied to 20-DimensionalSimulatedHIRIS Data: Test Samples........................177 4.68 Maximum Likelihood Method for GaussianData Applied to 20-DimensionalSimulatedHIRIS Data: Training Samples............. 178 4.69 Maximum LikelihoodMethod for GaussianData Applied to 20-DimensionalSimulatedHIRIS Data: Test Samples....................178 4.70 ConjugateGradient BuckpropagationApplied to 20-DimensionalSimulatedHIRIS Data: Training Samples............. 179 4.71 4.72
Conjugate Gradient Backpropagation to 20-1)imensional Simulated HIRIS Conjugate
Gradient
to 20-l)imensional
4.73 4.74 4.75
4.77 4.78
Classifier
Simulated
HIRIS
Conjugate Gradient Linear to 20-Dimensional Simulated Pairwise
JM Distances
Data:
Training
Samples
Samples
Minimum
Euclidean
Distance
179
.............
....................
180
180
40-Dimensional
H1RIS Data ........................................................................
Simulated
....................
Applied
Classifier Applied HIRIS Data: Test
for the
Samples
Simulated
40-Dimensional
4.76
Linear
Applied Data: Test
Classifier
HIRIS
Data:
Minimum Euclidean Distance Classifier 40-Dimensional Simulated HIRIS Data:
Applied Training
185
to Samples
Applied to Test Samples
..................
........................
Maximum Likelihood Method for Gaussian to 40-1)imensional Simulated HIRIS Data:
Data Applied Training Samples
Maxinmm Likelihood Method to 40-Dimensional Simulated
Data Applied Test Samples ....................
for Gaussian H1RIS Data:
.............
186
186
187
187
XII!
Table 4.79
4.80
4.81
4.82
4.83
|)age
Conjugat, e Gradient Backpropagation to 40-Dimensional Simulated ItIRIS
Applied Data: Training
Conj ugate G radient Backpropagation to 4()-l)im(msional Simuh_ted I IIRIS
A pt)lied l)ata: 'l'rst
Samples
.............
188
S:/u_t)l(,, ....................
I88
Conjugate Gradient l,in(,ar Classifier Applied t,o 40-Dimensiot_al Simulated IIII{.IS Data: Training Sanlt)lr,_ ..................
IS!)
Conjugate Gradient Linear Classifier Applied to 40-1)itn(,nsion:d Simulate(t I llI{IS I)ata: Test Sanlples
I S(.I
........................
Minimuin Euclidean Distance Classifier 60-Dimensional Simulated tlIRIS Data:
Applied Training
Minimum Euclidean Distance Classifier 60-Dimensional Simulated HIRIS Data:
Apt)lied to Test Samples
4.85
Pairwise
JM Distances
forl)ata
Source
://'1........................................
1:_8
4.86
Pairwise
.IM l)istances
for Data
Source
//2 ........................................
1:_8
4.8,t
4.87
Statistical H[RIS
4.88
Statistical HIRIS
4.89
I)ata
Statistical I-]IRIS
4.92
Data
Statistical ltlRIS
4.91
Data
Statistical HIRIS
4.90
Data
Data
Statistical HIRIS
Data
Mult,isource (100 Training Multisource (200 Training Multisource (300 Training Multis(mrce
Classification and
and
and
(,t00 '"[rai "r"_mg and Multisource (500 Training Multisource. (600 Training
and
Samples
200
Samples
per ('?lass) ...............
201
Samples
per (7:lass) ...............
202
Samples
per (.:lass) ...............
203
of Simulated
175 lest,
Samples
Classification
of Simulated
and
Samples
75 Test
per Class) ...............
of Simulat(_d
275 Test
Classitication
196
of Simulated
375 Test
Classification
........................
1!16
of Simulated
475 Test
Classitlcation
.......
of Simulated
575 Test
Classification
to _ amples
per (,lass)
................
per Class) ................
04
205
xiv
Table
Page
4.93 Source-SpecificEquvivocationsfor Simulated HIRIS Data Versus Number of Training Samples...............................206 4.94 Source-SpecificJM Distancesfor Simulated HIRIS Data VersusNumber of Training Samples..........................................206 4.95
Linear HIRIS
4.96
l.inear HIRIS
4.97
Linear HIRIS
4.98
Linear HIRIS
4.99
Linear HIRIS
4.100
Data
Data
Conjugate 60-I)imensional
4.102
4.103
and
Pool
Pool
Pool
Gradient
and
475 Test
and
375 Test
Samples
Applied
275 Test
Applied
175 Test
75 Test
I_ackprol)agati(m llll(IS
Data:
Conjugate
Conjugate 60-Dimensional
Gradient
Linear
Simulated Gradient
Linear
Simulated
Classifier ttlRIS
Data:
Classifier HIRIS
Data:
per Class) ...............
Samples
per Class) ...............
per Class) ...............
210
211
of Simulated
Samples
per
Applied
to
Training
Applied
Class) ................
Samples
..................
........................
212
215
215
to
Training
Test
209
of Simulated
Samples
Applied
208
of Simulated
in Classificat,ion
and
per Class) ...............
Samples
in Classification
and
207
of Simulated
in Classification
and
per Class) ...............
of Simulated
in Classification
Applied
Simulated
Samples
in Classification
Applied
(600 Training
575 Test
of Simulated
Conjugate Gradient Backpropagation Applied to 60-Dimensional Simulated IIIRIS Data: Test Samples
60-Dimensional 4.104
Pool
(500 Training
Opinion Data
in Classification
Pool Applied
(400 Training
Opinion Data
Training
(300 Training
Opinion Data
Applied
(200 Training
Opinion Data
Pool
(100
Opinion
Linear HIRIS
4.101
Opinion
Samples
..................
217
to Samples
........................
217
XV
IJST
OF I,'l(;[_R1,;s
Figure
l)age
2.1
Schematic
Diagram
of Statistical
3.1
Schematic
Diagram
of Neural
Network
Training
3.2
Schematic
Diagram
of Neural
Network
Used
of Image. 4.1
4.2
Procedure
...................
32 55
for Ciassiticatior_ 56
93
Class Histograms of Elevation Data in the Colorado Set ..........................................................................................................
99
of Slope
4.4
Class
of Aspect
4.5
Summary of Best on Colorado Data
4.8
.........................
Summary of Best Classitication Results for First Experiment on Colorado Data ....................................................................................
Class Histograms
4.7
Classifier
1)at, a .........................................................................................
4.3
't.6
Multisource
Histograms
Da_a Data
in the in the
Colorado Colorado
Data
Data
Set ....................
100
Set ..................
101
l)at:_
Classification Results For Second t,;xp(,riment ..................................................................................
IZa
Class liner
llistograms of Eh_vation l)ata in the A[,d(,rsort Data Set .... : .................................................................................
1,111
Class River
Histograms of Slope Data in the Anderson Data Set .......................................................................................
14 1
Class River
Histograms of Aspect l)ata in the Anderson Data Set, ......................................................................................
1.t2
xvi
Figure
Page
4.9 Summary of BestClassificationResultsfor Experiment on AndersonRiver Data............................................................................. 17.q 4.10 Classificationof Training Data (20 Dimensions)..................................182 4.11 Classificationof Test Data (20 Dimensions)........................................183 4.12 Glassiticationo1'Training Data (40 Dimensions)..................................190 4.13Classificationof Test Data (40 Dimensions)..........................................191 4.14 Classification
of Training
4.15 Classification
of Test
Data
Data
(60 Dimensions)
(60 Dimensions)
...................................
..........................................
194 195
4.16 Global Statistical Correlation Coefficient Image of HIRIS Data Set .................................................................................
197
4.17 Statistical Methods: Time versus Training
Plus Classification Size ........................................................
214
4.18 Neural Network Models: '['raining Plus Classification Time versus Training Sample Size ........................................................
216
Training Sample
xvii
ABS'I'I(A('/I'
Statistical (e.g.,
methods
Landsat
and
MSS
compared
multivariate
statistical that
the
sources.
data
methods
is that
statistical
the need
focuses
method
ot" statistical
to
methods
this
research
are
automatically
take
source
have.
should
a very and
care
[ong
of the
On the time.
investigated.
hay(,
and
for t,lm classes
iJ_
statistical
consensus
classification
on
neurnl no
to speed
F,xperimental
how traininfa up
that
the
but
Hlo>l
this.
This
problems:
:_
[(eliability
arc
suggested
and
network
models.
The
i_ an
The
results
for
theory.
prior
This
involving their
means
these
Jlletho(ts
methods.
hand,
This
ov(_rcomc
in these
is need(_,d.
Methods
is
:_ m('(:hanism
can
since
problem
other
with
which
investigated
types
1.o {.h(,ir rclial)ility
not
sources
conventional
;recording
focuses
classification
using
reliat)le.
distribution-fr(_e
statistical
are
not equally
sources
or t,he data
with
data
of (t:_ta of multiple
problem
analysis
Secondly,
data)
casT, or be ;_ssum_'d
methods
data
most
introduced
do
multisource
distribution
take
wcighte(l
t,he
networks
statistical
are
multiple
topographic
for classification
common
on statistical
from
A problem
distribution
sources be
and
models.
for weighting
investigated.
can
data
of data
data
network
Another
research
neural
radar
approaches
classification
measures
classification
a multivariate
sources
over
data,
to neural
in general
data
for
knowledge obvious
neural much
oF
:_dvantag('
networks weight
t ht,
e:tch
also data
process
is iterative
and
th(_ training
procedure
are
using
|)olh
of classification
xviii
neural network modelsand statistical methodsare given, and the approaches are comparedbasedoil theseresults.
CHAPTER 1 INTRODUCTION
1.1 The Research Problem Computerized information extraction from remotely sensed imagery has been applied successfullyover the last two decades. The data usedin l,hv processinghaw, mostly been inultispectral
Within
the
last decade advances in space
technologies
have
Earth
its environment.
and
only
spectral
radar
data
this reason retarding
made
data and
but
may
the same
accuracy
classification data.
This
be modeled nmltit, ype.
to amass
The
(iat, a are
include,
methods is due
These
classiticai, cannot
to several
by a convenient They
information
can
now
reasons.
the
data,
typically cover
mull.isource
maps, For
data. and
g¢_l_
lrlu[tivariale
multisource
eleval.ion
riot,
data.
i,formati()n
model
the
differe, rlt. sources
in processing
stat, isCical
be spectral
from
about
slope
t,h(', conv(qltional
is that
COIllt)llt('r
ground
Iilorc
satisfactorily One
multivariate
for exalnple
to exl, ract
llowever,
be used
(':died
more
and
da(,a
alia
of data
and
maps,
of
are collectively
ion.
more
as elevation
kinds
data.
amounts
forest
such
many
to use, all these in
large
for example,
be available
scene.
It is desirable higher
it possible
topographic
there
st,:diMical
(multNariate classificalion) met,hods are now widely
pa_tern recognition known.
tim
dal,a atnd
since
t.ultisourc(, data
cannot
_he (]aLa, art'
ra.ng(,s
and
cwm
2
non-numericaldata such as ground cover classesor soil types. The data are alsonot necessarilyin commonunits and thereforescalingproblemsmay arise. Another problemwith st,:d,isticalclassillcationmethodsis I,hat the dal,:_,_ninfori,mtiv(_.
al encoding
(A (::_[i{>ratod
his beli(,f_ ;_s t)robnbiliti(,_.)
45
Axiom
C:
If the
decision
calibrated Axiom it.
By
expert's A places
applying
processing
experts'
opinions.
C(X) where
A
eel(X)
the
decision what
-- k eel(X)
decision
the
expert
is also the
weights. geographic
the
axiom
Axiom
decision
and
experts
determine
of
B,
the
form
a
of the
C is equivalent
maker
must
the
also
axioms
in effect
calibrate results
the in
a
(2.46)
eel(X)
are
to the
the
is a calibration
all calibrated
and
data
concept
But
be
of
A is unsatisfactory
to
outcome
ignore
be
the
function
which
independent,
then
equal
[40] has of the
This
to his showed
estimated
demonstrates
aad
becomes
that
of the
own
the
The
decision the
sources
(the
when
pool
remote
data
are
of
are
maker
method
multisource sources
expert
The
are
opinion
case
axioms
(2.45).
that
extreme
regardless
rule
noting data
an
prior
that
a logarithmic
classification
it can be assumed
the
of
that
processing
it is al:;o worth
in the
opinion
in t lis approach.
opinions.
approach case
axiom
decides
v(,ry imt)ortant
easily
axiomatic
that
Schervish
in spirit.
can
In
maker
experts'
Bayesian
Morris'
does not
but
• • • pn(X)p(%)
constant
states).
due
calibration
functions
adopt
experts:
pl(X)
argued
makes
contradictory
truly
determined. The
rule
with
application
If the
[39] has
maker
calibrate
he should
1.
Lindley when
prior,
processing
conjunction
for multiple
empirically.
=
in
B together.
k is a normalization
is defined
on the
Sequential
rule
noninformative
as his own.
be completely
A and
multiplicative
a
a condition
can
axioms
has
prior
axiom
rule
to both
maker
the
self-
issue
of
must is not density
independent, with sensing
independent
equal and but
46
not equally reliable. Therefore, the logarithmic opinion pool with weights
2.4
is a more
Group
using
the
in
method
can
it is difficult,
other
sources neural
feedback thus
Super
for some a
probabilistic unknown llayesian expert
as ad
approach, are
all
hoe.
problems
evaluate
the. ln(:thod in Chapter
ion
This
is natural for the
the
They but
3.
and
the
for
weights Although
it is hard geographic
to
data,
the
performance,
has
some
The
neural
networks
use
I)eGroot
method
will
The
consensus
approaches
also
out
means
of
the
of the
parallels
the
maker.
on
discussed expert
witch
joint called
the
whys.
do Th('y
combined distributions the
super
:_ssunlpLioIl
Therefore,
above
weights
ill-(teiin(_d
situation,
:tpproach,
is basc]/4il°g(--),°io
identical) approach
where
oi
to
(2.47)
Bordley's
log-odds
is completely
equipped
_'i is a function
of m_,j, m0q c
48
and _. However,there is very little empirical evidenceavailable to determine the super Bayesian's choice of the jointly normal distribution of X. The dependencebetweenthe data sourceshas to be modeledand that problem is very diftlcult especially ill classitica/,ionof" mult,isource rvlltote sensing :rod geographicdata. As noted earlier, we are usually either unable or unwilling to model this dependence. Therefore, the super llayesian approach is in cases
not
applicable
2.{} Overview The
of the
consensus
characteristics.
theoretic
e.g.,
The
give
externally
because
multimodal
consensus pool
and
Section
2.3.1
will be used.
will be used The opinion statistical nor
the
the
statistical pool.
Both
nmltisourc_ of the
multisouree group
implementation
reliability
of reliability
classitier
interaction difficulty
a
rule
Bayes'
and
opinion
in Chapter classifier
is
will
pool
gives
4, the
linear
introduced in Section
in 2.3.2
experiments. of
by Bordley
above.
will be used methods.
when
version
proposed
as discussed
for these
is
several theorem
introduced
in the
has
impossibility
linear
measures
,'lassilb,c
methods
the
different
shortcomings
multisource
factors
approaches
these
experiments
of statistical
The
the
and
dictatorship
whereas
have
simple
and
ow.'rcomes
In the
above
is very
Bayesian
densities
version
for selection
pool
pool
densities.
opinion
discussed
of source-specific
opinion
here.
Approaches
approaches
it is not
consensus
discussed
Theoretic
opinion
logarithmic
unimodal
problem
linear
its application
used.
research
Consensus
The
shortcomings, limits
to the
trmst
Neither
the are
the
in experiments
logaritIimic related
super
to the Bayesian
because
of the
49
In order to apply the consensustheoretic approachesall the data sources have to be modeled by probability densities. Some data sourcescan be assumedto have Gaussian Gaussian
and
these
methods.
Such
methods
2.7
handle
are
part
problem
of non-Gaussian
following
three
a
simplest
histogram way
the
Parzen
density
estimator.
Several
other
e.g.,
nearest
neighbor
functions
and
addressed
here,
2.7.1
Histogram The
the In
simplest
traioing this
data.
method
['l,['2,...,l'N, by
th(, proportion
classifying
estimation
section.
approaches
have
series
histogram
Two
more
maximum
been
reviewed
density
estimators.
methods
discussed
efficiently.
field.
advanced
in the
should
the
is the
methods
penalized
the
In
approach
estimation
For
below
data
is to
will be addressed.
The
the
estimation,
classifier
research
of modeling
data. and
multisource
established
is discussed.
density
will be non-
density
non-Gaussian
approaches
estimation
three
next
sources
by
a statistical
is a well
main
orthogonal the
and
non-Gaussian
addressed:
modeled
in the
of designing
approach
to model
be
other
Data
data
sections
All of the
to
discussed
of modeling
Modeling
First
need
of Non-Gaussian
important
the
classes.
sources
Classification A very
data
are
likelihood
literature using
research
[45], weight
problem
be sufficient.
Approach way
to model
Here the whose
non-Gaussian
a fixed
data
space
volumes
of samples
cells
data
histogram
approach
is partitioned are
which
equal.
into
The
fMls into
is to use the
density
each
cell.
[29,45] mutually
histograms is described. disjoint
function When
of
cells
is estimated the
data
have
50
beenmodeledby the histogram approach, they call be classi[iedby, e.g., the maximum likelihood algorithm [45]. Th,' histogram approach is distribution-free and, if regular meshesare usedfor the I"s, tile selectionof cellsis straightrorw:lx_l, l low(,w'r, one mnjor disadvantageof this method is that it requirestoo much storage;for example, Nk cells for k variables with N sc,ctionsfor each variabh's. Therefore, most modificationswhich have beenprogosedare designedto reduce the number of cells.The variable cellsmethod [29] Although univariate more
the
data,
advanced
histogram
methods.
do a good job
is one
commonly
histogram
penalized
2.7.2
d is the
(ialled
the
density
kernel
with
the
K can
condition:
data.
Another da, ta
job
in terms
to use
multivariate
estimator N
K(
dimensionality
smoothing
upon
desirable
univariate
a good
more
t'arzen
modeling
of accuracy general
density
method
which
is
m_t, hod
lho
of
by
methods estimation
irnproves of
upon
maximum
Estimation
1
where
does
improved
method.
for
variant.
estimators.
Density
t'arzen
The
such
such
usually
is also
of modeling
used
likelihood
Th,'
It
approach
Parzen
approach
it can be significantly
which
the
is one
with
X -- Xi
of the
parameter. be of any
kernel
K is de,fined
) data
( 48) and
cr is the'
N is _h(' nLnrt,b,'r shape
by [29,45,46]:
(rectangular,
window
(,f t,rainiri_
triangular,
width, samples,
Gaussian,
also X i. etc.)
51
f K(X)
dX = 1
(2.49)
R,I
If the kernel
K is both
a density
function.
function
and
properties
the
estimate of the
2.7.3
The
The
likelihood
estimate
e.g.,
roughness
of
functional
R(f).
applying the
curve
tails
problem,
K is
density
Apart choice
Likelihood
from
estimation class
methods with f.
directly
of densities
the The
from
entire
estimates. detail
longsample
Also,
in the
drawback,
the
non-Gaussian
estimator
As pointed
relating
the
applied.
if
main Parzen
data.
Estimators
Thi_' particular f.
of the
for modeling
likelihood
curve
across
this
and
to data
essential
of a one-dimensional
on the
However, by
then
differentiability
studied
applied
is fixed
in the
this
desirable
of observations.
restrictions
and
widely
when
width
be lost.
penalized
likelihood
maximized.
to avoid can
for a particular
maximum
continuity
been
drawback
appearing
Penalized
sample
(2.49),
t3(X) will be a probability
the
has
window
to noise
is a very
linear
random
placing
[45].
maximum
piecewise
estimator a slight
distribution
Maximum
all
satisfies
K.
smoothed
estimator
and
that
inherit
from
leads is
non-negative
from this
also
density
it suffers
this often
density
used,
will
distributions
lobe
use
t3(X)
Parzen
However,
and
It follows
of the kernel
The
tailed
everywhere
function
method
tries
which
term
for
the
term can
a given
is to be can
quantifies described
to
without
likelihood which
the
possible
likelihood
be
a
to maximize
estimation
maximum a
computes
in [45] it is not
for density
to the
roughness
density
out
over
likelihood
[45,47]
be the
by
a
52
The penalizedlog-likelihood is now definedby: N
b(f) = E log r(x,) -vR(0 b=l
where
Y is a positive
The
probability
smoothing
density
approach
is attraetiw_
Also,
approach
fit.
the The
2.7.4
the is
improv,xt
upon
approach
has
curve accurate
in
Parzen which
can
Parzen size
methods
the
balance
for
shouht method of data all three
of
multivariate
gen¢_ralize has
the
drawback
to be classified will be used
than
than that
in the
the
To
experiments
can
slow
explore
method
maximum
penalized
should
with be more
approach. estimation
Also, method.
Parzen
this
the
differences 4.
The nmthod density
Itowever,
and
in (:haptcr
thai,
estimation
histogram d_msity
b_"
'l'h_' histogr:m_
density
histogr;ml
it is very
is large.
met, hod
likelihood
estimation.
the
histogram
this method
estM_lished
density
better
goodness-of-
the
data.
The
it combines properties
well
this
of test
data.
is a w_ry
This
estimation.
and
here,
discussed
for univariate
data
!,i5].
Methods
penaliz_d
test
l-(f)
to density
Ilowever,
since
of s;,mph_._.
effects.
accuracy
of its smoothing
classification
t,mximizing
maximm,l
is attractive
numb(,r
smoothness
undesirable
of classification
effective
})e used
between
st_raight-forward.
are most
met.hod
by
mebhods
the
N is the
estinmtion
Estimation
with
Because
curve
predefines
in common
density
estimation
most
estimation
fitting.
it relates
estimation
in terms
methods
likelihood
the
the
and
15 is found
of Density
density
approach
these
since
penalty
Discussion Of
function
controls
roughness
parameter
the
is a probh'n_ of these
if
53
CHAPTER 3 NEURAL NETWORK
APPROACHES
Neural networks for classificationof multisource data are addressedin this chapter. The chapterbeginswith a generaldiscussionof neural networks used for pattern recognition, followed by a discussionof well-known neural network models and previous work on classificationof remote sensingdata using neural neLworks. Next "fast" neural network models are addressedin conjunction with classificationof multisource remote sensingand geographic data. Finally, methods to implement statistics in neural networks are discussed.
3.1 Neural Network Methods for Pattern Recognition A neural network is an interconnectionof neurons,wherea neuroncan be describedin the following way. A neuron hasmany (continuous-valued)input signals xj, momentary input
{48].
frequency
j-
1,2,...,N,
which
frequency In the
of neural
simplest
of the neuron,
imimlses
formal o, is often
N
o =K
- 0) J
repr(_s(,_nt
1
model
the
act,ivity
delivered
tim
by another
of a neuron,
approximated
at
the output
by a function
input neuron value
or
the
to this or the
54
where
K is a constant
for positive synaptic
arguments
e1_cacies
In
the
network
idea
neural as
where
followed
neuron
number.
x the
process
the
with
compared output
and
the The
i.e.,
the
when change
actual
less
than
to perform
the
representation
three-layer Data models.
neural
do not
representation A straightforward
the
sample.
classifier is very coding
from
and
through
for each is shown
important approach
weights network
data
are
provides
A schematic
in Figure
if it is
and
for a
of its
class
adaptive to
network
response
is
the
desired
in the
neural
has
to the
network
stabilized,
next fed
iteration into
at the
the
output
diagram
of a
3.2.
in application used
o i ----- 1, if
The
between
the
pixel.
3.1).
error
the
i (i
is presented
output
the
x
A general
an
actual
one iteration
the
neurons
vectors,
samples
The
Then
neural
vectors
representation
The
when
input
are either
weights
to modify
the
classes).
as binary
input.
amount.
of a number
1
wj are called
output
(see Figure
is ended
tile classification,
value
or o i ---- 0 (or -1)
a set of training
is used
a threshold
network
learn
change
x,
a binary
for the
procedure
weights
network class
output
training
vector are coded
to each
response
its
of
the outputs
representation
response
to the desired
network.
or
particular
o i from
is that
in which
a set
of information
give
to
The
the
recognition
receives
input
outputs
takes
arguments.
pattern
number
values
is then
some
to
theory
signal
procedure
an output
the
which
0 is a threshold.
which
current
vector
training
will give
for the
function
responses
network
the
(iterative) input
on
means
The
box
produces
in neural
input
and
approach
a black
L depends
This
specific
0 (or -1) for negative
and
i is active
inactlve.
and
network
signals)
_---1,...,L
¢ is a nonlinear
[48] or weights,
operates
(observed
and
by most
of neural researchers
network is to
55
O
m
ro
"OI--
_e..,
G) .0.-,
,
O
20
o 575
475 Test
Figure 4.11
375
275
175
75
Samples/Class
Classification
of Test
Data (20 Dimensions)
184
The as the
CGLC
CGBP
(Tables
trained
with
The
CGLC
did rather
was
perfect
but
data,
with
it showed
consuming
The
classify
the
data
4.4.2
40-Dimensional
the
(M],
4.81
(CGLC
required the
data
classification
shown
4.78
(ML
and
4.75
result
algorithm
was,
as expected,
times
when
longer
CGLC
more
way
neurons).
For
the
is not
Thus
to
the
it
test
as time
neurons
time
in the
train
and
CGLC
is a
in this experiment.
and
of the
in training
4.75
(MD training),
4.70
and
(CGBP
test).
4.13
(test).
The
4.76
(test))
The
results
20-dimensional
were
was data.
all other
(Tables
4.77
classification
instead than
4.74.
added
classification 4.76 4.80 are
to
of the
(MD
test),
(CGBP
test),
also summarized of
very
the
similar
MD
to
Classification
were
than
used
were
performance
20 dimensions
faster
for
training),
(CGLC
in Table
20 features
in Tables
ML method
40 dimensions
when results
of 2 when much
as shown
The
and
the
separable,
4.4.1.
test),
4.82
using
a factor
increased
same
per class
worse.
hidden
experiment.
data
samples
The of the
in the
(240 input
did
7 times
increased
(training)
about
accuracy
in this
in Section
increased
The
1.5 to
JM distance
(training)
(Table
data
CGBP.
are relatively
are
training)
algorithm
from
was
100 training
(because
20-dimensional
data
4.12
CGBP
input
it always
to the
CGLC
data
average
training),
Figures
similar
size
(test))
Data
20-dimensional
4.77
sample
as the
for the
40-dimensional
in
With
40-dimensional
Predictably the
well in training.
than
alternative
4.73
binary
increased
CGBP
and
Gray-coded
training
better
The
(training)
performance
during
CGBP).
4.72
added,
the time
but
the
MD
methods.
(training)
of 20, but for the
and
4.78
it took
20-dimensional
(test))
about
3.9 data.
185
Table Pairwise
JM Distances Simulated
Class
_
1
4.74 for the 40-Dimensional HIRIS Data.
2
3
1.41189
1.40192
2
Average:
I 1.32275
1.378855
186
Table Minimum 40-Dimensional
Euclidean Simulated
# of Training Samples 100 200 300 400 500 600
CPU Time 4 4 4 4 4 4
Distance HIRIS
Percent 1
# of Training Samples 100 200 300 400 500 600
Classifier Applied to Data: Training Samples.
Agreement 2
84.0 84.0 85.0 84.0 85.4 84.8
with Reference for Class 3 II OA
47.0 49.0 50.0 54.8 51.8 50.3
Table Minimum 40-Dimensional
4.75
56.0 55.0 59.3 60.3 61.8 59.8
4.76
Euclidean Distance Classifier Simulated HIRIS Data:
Percent 1 83.1 82.9 83.5 80.4 73.7 80.0
62.33 62.67 64.78 66.33 66.33 65.00
Asreement 2 48.9 48.2 51.2 44.4 46.3 58.7
Applied to Test Samples.
with Reference for Class 3 [I OA 58.8 59.6 58.1 69.8 68.6 49.3
63.59 63.58 64.27 64.85 62.86 62.67
187
Table Maximum to
Likelihood
40-Dimensional
#
Method Simulated
of Training
Samples
4.77 for
Gaussian
HIRIS
CPU
Percent
Time
1
Data:
Agreement
Data
Applied
Training
with
2
Samples.
Reference
for Class
[[ OA
3
100.00
100
61
100.0
100.0
100.0
200
61
100.0
100.0
99.0
99.67
300
62
100.0
97.3
97.3
98.22
400
62
100.0
97.8
97.8
98.50
500
67
97.6
96.8
98.07
600
75
97.2
96.5
97.89
99.8 100.0
Table Maximum to
Likelihood
40-Dimensional
Method Simulated
4.78 for
Gaussian
HIRIS
Agreement
Data:
with
Data Test
Reference
Applied Samples.
# of Training
Percent
Samples 100
1 90.8
2• 50.3
3 I[ OA 77.0 72.70
for Class
200
92.6
55.8
73.5
73.96
300
98.1
92.5
93.1
94.58
400
97.8
91.6
92.7
94.06
500
97.7
92.0
92.6
94.10
600
94.7
93.3
93.3
93.78
NI
188
Table Conjugate to
Gradient
40-Dimensional
of
CPU
iterations
100
Backpropagation
Simulated
Number
Sample size
4.79
HIRIS
Percent
Agreement
1
time
Applied
Data:
Training
with
Samples.
Reference
2
for Class
fl OA
3
64
485
I00.0
I00.0
i00.0
I00.00
150
1548
I00.0
100.0
100.0
100.00
30O
374
4889
I00.0
i00.0
100.0
100.00
40O
274
5225
I00.0
I00.0
100.0
i00.00
5OO 200
264
6386
100.0
I00.0
100.0
100.00
600
524
14899
I00.0
100.0
i00.0
I00.00
Table Conjugate to
Gradient
40-Dimensional
Sample size
Number iterations
of
4.80
Backpropagation
Applied
Simulated
HIRIS
Data:
Percent 1
Agreement 2
with
Test
Samples.
Reference for Class 3 H OA II
100
64
87.1
57.6
54.6
66.43
200
150
83.8
56.4
48.2
62.81
300
374
85.6
61.6
60.5
69.24
4OO
274
86.2
57.1
95.6
67.52
5OO
264
83.4
54.9
57.7
65.33
6OO
524
85.3
68.0
60.0
71.11
189
Table Conjugate
Gradient
40-Dimensional
Sample size
Linear
Simulated
Classifier
HIRIS
CPU
Number of iterations
4.81
time
Data:
Applied
to
Training
Percent 1
Agreement 2
with
Samples.
Reference 3
for Class OA
100
146
194
100.0
100
100.0
100.00
200
469
898
100.0
100
100.0
100.00
300
903
2461
99.7
99
99.7
99.56
400
650
2293
100.0
98
97.0
98.33
500
629
2657
100.0
89
89.4
92.87
600
492
2492
100.0
87
85.3
90.83
Table Conjugate
Gradient
40-Dimensional
Sample size
Linear
Simulated
Number iterations
of
4.82
Percent 1
Classifier
HIRIS
Agreement 2
Data:
with
Applied Test
to Samples.
Reference 3
for Class OA
100
146
85.7
58.1
48.7
64.17
200
469
82.5
50.7
48.2
60.49
300
903
81,1
61.9
53.3
65.42
400
650
86.9
63.6
56.7
69.09
500
629
88.6
60.6
54.9
68.00
600
492
90,7
64.0
50.7
68.44
190
100
v
U L.
MD ML
U
CGBP
m m
[] CGLC
m >
0
2o
lOO
200
300
Training
Figure
4.12
Classification
400
500
600
Samples/Class
of Training
Data
(40 Dimensions)
191
IO0
8O A
V
0 &,.
6O
[] [] [] []
0 0
L-
4O
>
0
2O
575
475 Test
Figure
4.13
375
275
175
75
Samples/Class
Classification
of Test
Data (40 Dimensions)
MD ML CGBP CGLC
192
The
classification
sizes.
As
accuracy
for
of training
data
the
20-dimensional
data
significantly
when
300 or more
training
overall
performance
the
dimensional
data
The trained
CGBP with
of
neural
network
size.
The
data
up
times
of the The
dramatic
The
CGLC an
accuracy
classification CGLC
compared times
were
faster
of
than
longer net
not
(480
used
input
all sample
data
improved
were
used.
good
for
the
for
the
of 20.
in just
greatly
Thus, the
40-
As in the
data
to 5 times
was longer
However, and
4.81
gave
similar to it was
similar
classification
time
data.
4.82
data
sample
40
cases
size.
results.
40 data The case.
dimensions more
(test))
when
20-dimensional for
600
accuracy
20-dimensional
increased
in most
of CPU
and
training
converge
For
per class.
(training)
the
with
40-dimensional data.
samples
to
trained
40-dimensional
case of the
with
was
grew
classification
the
of
again
4 hours
the
(test))
time
of the
over
for
accuracy
decreased
of test
It was
training
However,
of
4.80
20-dimensional
(Tables
terms
data
CGBP
the
and
classification
converged
neurons)
instead
again
was for 600 training
in
training
up
than
improved
improvement
to 20 dimensions. than
very
neurons.
and
the ML method).
was
accuracy took
training
neural
improvement
dimensions
The
the
samples
most
showed
the
to 3 times
longer
test
was
(training)
and
sample
(200
class
4.79
size
increasing
class
(Tables
15 hidden
sample
per
for
of test
per
method
and
every
samples
samples
ML
perfect
accuracy
neurons
for
took
the
nearly
set.
480 input
to perfection
the
was
than
as two
193
4.4.3
60-Dimens|onal The
results
Figures the
MD
instead. more
matrices
bands.
(The
that
the
other
40 data
Gaussian
distribution
the
sources
data
The
information
classes source
in source
the
from Twenty
data
was
classes _2
had
1.35
#m were
The
information solJrces.
data
a higher
4.85
sources average
used
the
the
it was
-if-l) and
distance
spectral
spectral 1.97
from region
/_m
the from
=//:2 consisted
modeled
JM distance
or
determined
spectral
were
two
indicates
#m and
=ffl. Source
relatively
JM
were
uncorrelated
classes
were
since
into
are
4.16,
in the
(source
data
brightness
1.81
was
source
The
LOP
correlated
sets.
data.
be split
the
at Figure
channels
in Tables
in
to
a very data
between
to 1.47 #m and
as data
data
the
had
4.16;
treated
channels.
shown
pm
#m
data
more
By looking 0.7
in both
are
1.35
showed
60-dimensional and
iY_ data
20-dimensional
correlations
the
bands.)
which
the
The
(test))
high-dimensional
SMC
in Figure
from
4.84
to the
algorithm,
:tre summarized
of 60-dimensional
of the
The
tone,
data
other
as shown
lighter
bands.
0.7 #m to 1.35 #m, other
singular.
sources.
region
of the
be applied
data
regions
and
in classification
SMC
absorption
spectral
(training)
to use the
black
spectral
of the
were
The
are the water
In classification
not
can be visualized
correlation.
(test).
than
could
independent
channels
60-dimensional
4.83
slower
ML method
In order
of the
to classification
3 times
covariance
4.15
(Tables
performance
The
by the
separabilities 4.86
(source
separable than
the
of _2).
but
the
classes
_1. The
sizes
and
algorithm
It was about
the
of classification
4.14 (training)
similar
the
Data
and
results various
of the
SMC
source-specific
classifications weights
with are
respect
shown
to different
in Tables
4.87
sample through
in
194
100
• m el [] []
20
100
200
300
Training
Figure 4.14
Classification
400
500
60O
Samples/Class
of Training Data (60 Dimensions)
MD SMC LOP CGBP CGLC
195
100
80
D
• B
MD SMC
0
[]
LOP
[] []
CGBP CGLC
60
L_
4O
0 20
575
475 Test
Figure 4.15
375
275
175
75
Samples/Class
Classification
of Test Data (60 Dimensions)
196
Table Minimum 60-Dimensional
Euclidean Simulated
# of Training
CPU
100 200 300 400 500 600
5 6 B ti ti 6
4.83
Distance HIRIS
I Percent 1
Classifier Applied to Data: Training Samples.
Agreement 2
i 83.0 184.0 J 85.3 I 83.8 185.2 I 85.2
with Reference for Class 3 OA
50.0 51.0 50.7 55.8 53.0 51.5
Table
57.0 55.0 59.0 60.3 61.4 59.8
4.84
Minimum Euclidean Distance Classifier 60-Dimensional Simulated HIRIS Data:
# of Training
Percent 1
lOO 20o 30o 40o 50o 60o
83.5 83.6 84.0 81.1 74.3 80.0
Agreement 2 50.1 49.5 53.9 45.8 48.0 64.0
63.33 63.33 65.00 66.58 66.53 65.50
Applied to Test Samples.
with Reference for Class 3 OA 59.0 59.8 58.4 70.5 69.1 48.0
64.17 64.28 65.42 65.82 63.81 64.00
197
0.4 (p.m)
0.76
0.92
1.15
1.35
1.47
2.4
Figure
4.16
Global
Statistical
Correlation
Coefficient
Image
of HIRIS
Data Set
ORIGINAL OF POOR
PAGE IS QUALITY
198
Table l'airwise
JM Distances
Class _ 1 2 Averaee:
Source
_1.
1.24447 0.96362
1.173336
JM Distances Class _ 1 2 Averaee:
for Data
2 1.31192
Table Pairwise
4.85
4.86 for Data
2 1.40908 1.380562
Source
1.39607 1.33653
=_2.
199
4.92. Ranking of the sources according to the source-specificreliability measures based on the classification accuracy of training data, the equivocation measure(Table 4.93) and JM distance separability (Table 4.94) agreedin all cases,regardlessof sample size. Tile reliability measuresalways estimatedsource#2 as more reliable than source#1. Using thesereliability measures to weight the data sources in combination gave the highest accuraciesof training data for sample sizesup to 300 training samples per class(Tables4.87, 4.88 the
best
accuracies
200 samples got
the
unexpected 100 and 4.89) 1.0
and
and
suggest
highest
test
source
or more cases
by
as shown
4.92)
and
several The
results
was
samples given
were
several
using
the
were
used
for each
weight
SMC results.
Both
of the
60-dimensional
of test
samples
increased
algorithms
were
(Tables
tim SMC
data
set. with
very
and
For the
fast,
4.95 LOP
both
nulnber with
the
the
#1
methods
the
of training
a slight
test
edge
(Table by
performance
(Tables #1.
4.90,
4.91
Using
400
In most
accuracies.
4.100)
excellent
only
excellent
was sufficient.
were
#1
These
used
gave
best
highest
through
source
was weighted
source
data
100 and
with
were
class
than
achieve
0.1 or {}.2.
weights
gave
achieve
when
class
other
SMC
could
LOP
per
The
more
for
undertrained
source
for the high-dimensional
combinations
reached
were
when
did not
significant
by either
300 samples
However,
weights
were
sources
reached
4.89.
same
weighted
data
When
# 2 was
to the
these
"best"
was
0.9.
samples
weight
results
class.
training
differences
the
in Table
source
training
that
the
The
_2
accuracy
#2
400 or more
the
source
per
ttowever,
data.
where
1.0 and
results
accuracies when
class,
200 samples
the
4.89).
for test
per
weight
and
were
very
similar
in the classification
classification samples
to the
accuracy
used. LOP
Both
which
of uses
200
Table Statistical HIRIS
Data
Multisource
4.87
Classification
(100 Training
and
Percent Agreement
of Simulated
575 Test
Samples
with Reference for Class
Training 1
per Class).
Testing
2
3
lJ
OA
Single
1
2
3
[I OA
Sources l
Source
#1
99.0
93.0
96.0
96.00
[ 85.9
67.3
57.7
70.32
Source
#2
100.0
100.0
100.0
100.00
[ 83.3
47.8
82.1
71.07
sl
s2
1.
1.
Multiple
Sources
100.0
100.0
100.0
100.00
89.6
51.0
83.1
74.55
1..9
100.0
100.0
100.0
100.00
89.7
50.8
83.3
74.61
1..8
100.0
100.0
100.0
100.00
96.6
51.3
83.3
75.07
1..7
100.0
100.0
100.0
100.00
91.1
51.3
83.5
75.30
1..6
100.0
100.0
100.0
100.00
90.6
52.3
83.7
75.53
1..5
100.0
100.0
100.0
100.00
91.1
52.9
83.0
75.48
1..4
100.0
100.0
100.0
i00.00
91.1
54.6
83.1
76.29
1..3
100.0
100.0
100.0
100.00
91.3
55.3
83.5
76.70
1..2
100.0
100.0
100.0
100.00
90.1
64.2
77.6
77.28
1..1
100.0
100.0
100.0
100.00
90.1
84.2
77.6
77.28
96.0
96.00
85.9
67.3
57.7
70.32
1..0
99.0
93.0
.9
1.
100.0
100.0
100.0
100.00
89.6
50.4
82.8
74.26
.8
1.
100.0
100.0
100.0
100.00
89.6
50.3
82.6
74.14
.7
I.
100.0
100.0
100.0
100.00
88.7
50.1
82.8
73.86
.6
1.
100.0
100.0
100.0
100.00
88.3
49.9
83.0
73.74
.5
1.
100.0
100.0
100.0
100.00
87.8
49.7
83.0
73.51
.4
I.
100.0
100.0
100.0
100.00
87.3
49.4
82.8
73.16
.3
I.
100.0
100.0
100.0
100.00
87.0
48.7
82.6
72.77
.2
1.
100.0
100.0
100.0
100.00
86.1
48.3
81.9
72.12
.i
I.
100.0
100.0
100.0
100.00
84.5
48.0
82.1
71.54
.0
I.
100.0
100.0
100.0
100.00
83.3
47.8
82.1
71.07
100
100
100
300
575
575
575
1725
# of pixels
The
columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).
CPU
time for training
and
classification:
81 sec.
201
Table
Statistical HIRIS
Data
4.88
Multisource (200
Classification
Training
Percent
and
A_eement
with
Training
1
475
of
Test
Reference
2
3
[t_qA
97.5
91.0
92.0
II 93.50
100.0
100.0
99.5
_
1.1.
100.0
100.0
99.5
1..9
100.0
100.0
99.5
1..8
100.0
100.0
1..7
100.0
100.0
1..6
100.0
1..5
100.0
1..4
I
Simulated
Samples
per
Class).
for Class
2 Testing 3
1
_le_Sou_rces__ 70.3
57.3
71.93
53.1
80.4
73.82
89.1
55.8
82.3
89.1
56.0
82.3
75.75 75.79
99.5
89.5
55.8
82.5
99.5
89.7
56.0
82.5
100.0
99.5
89.9
58.1
82.5
100.0
99.5
90.5
60.0
83.4
100.0
99.5
99.0
90.3
62.3
84.0
1..3
100.0
99.0
99.0
90.7
63.6
84.4
78.88 77.96 79.58
I..2
100.0
97.5
99.0
90.7
71.2
76.4
79.37
1..1
100.0
96.0
98.5
90.5
71.2
76.4
1..0
97.5
91.0
92.0
88.2
70.3
57.3
79.37 71.93
source
#1
]t 88.2 II
sl s2
88.0
Multi')le
Sources
75.93 76.07 76.84
.9 1.
100.0
100.0
99.5
89.5
55.4
82.3
75.72
.8 1.
100.0
100.0
99.5
89.3
55.4
82.5
75.72
.7 1.
100.0
100.0
99.5
89.1
55.4
82.3
75.79
.6 1.
100.0
100.0
99.5
88.9
54.9
82.3
.5 1.
100.0
100.0
99.5
88.6
54.5
82.1
75.37 75.09
.4 1.
100.0
100.0
99.5
88.2
54.3
81.7
74.74
.3 1.
100.0
100.0
99.5
88.2
53.5
81.7
74.46
.2 1.
100.0
100.0
99.5
88.0
53.1
81.3
74.10
.1 1.
100.0
100.0
99.5
87.8
52.8
80.8
73.82
.0 1.
100.0
100.0
99.5
88.0
53.1
80.4
73.82
200
200
200
475
475
475
_#
of pixels The
columns
labeled
applied CPU
time
sl
and
to sources
for training
s2
indicate
1 (81) and and
the
weights
2 (82).
classification:
85 sec.
I
I I I
I t
202
Table 4.89 Statistical Multisource Classificationof Simulated HIRIS Data (300Training and 375Test Samplesper Class).
Percent
Agreement Training 2 3
1
with
Reference
for Class Testing
[[
OA
1
Single
Sources
I 99.11 90.78
90.7 96.0
Multiple
Sources
2
3
tl OA
82.4 94.4
79.5 94.1
[I 94.84 84.18
source #1 source #2 sl s2
96.3
86.7
89.3
100.0
99.0
98.3
1.
100.0
99.0
98.7
99.22
96.5
96.0
95.2
95.91
1..9
100.0
98.7
99.3
99.33
97.1
96.3
95.2
96.18
1..8
100.0
98.0
99.3
99.11
96.0
96.3
95.2
95.82
1..7
99.7
98.0
99.0
98.89
94.7
95.7
93.8
94.76
1..6
98.7
95.7
97.7
97.33
94.1
93.3
91.7
93.07
1..5
98.3
95.0
96.7
96.67
93.3
92.3
89.1
91.56
1..4
97.3
94.0
95.0
95.44
92.8
90.1
87.7
90.22
1..3
97.0
91.3
93.3
93.89
92.3
87.5
86.4
88.71
1..2
96.7
90.7
92.3
93.22
91.7
86.1
83.7
87.20
1..1
96.3
89.0
90.7
92.00
91.2
84.3
81.9
85.78
1..0
96.3
87.0
88.7
90.67
90.7
82.4
79.5
84.18
.9 1.
100.0
99.0
98.7
99.22
96.8
95.7
94.9
95.82
.8 1.
100.0
99.0
98.7
99.22
96.5
95.5
94.9
95.64
.7 1.
100.0
99.3
98.7
99.33
96.5
95.5
94.7
95.56
.6 1.
100.0
99.3
98.7
99.33
96.5
95.5
94.4
95.47
.51.
100.0
99.3
98.7
99.33
96.3
94.9
94.4
95.29
.4 1.
100.0
99.3
98.7
99.33
96.3
94.9
94.4
95.20
.3 1.
100.0
99.3
98.7
99.33
96.3
94.9
94.4
95.20
.2 1.
100.0
99.0
98.7
99.22
96.0
94.4
94.4
94.93
.1 1.
100.0
99.0
98.7
99.22
96.0
94.7
94.4
95.02
.0 1.
100.0
99.0
98.3
99.11
96.0
94.4
94.1
94.84
of pixels
300
300
300
900
375
375
375
1125
The
columns
1.
labeled
applied
CPU
time
to
for
sl
and
sources
training
s2 1 (sl)
and
indicate
the
and
2 (s2).
classification:
weights
87
see.
203
Table Statistical HIRIS
Data
Multisource
Classification
(400 Training
Percent
4.90
and
Agreement
with
of Simulated
275 Test
Samples
Reference
for Class
Training 1
2
per
Class).
Testing 3
II
OA
1
2
3
"____OA _
82.2 94.2
71.3 96.7
79.03 95.03
Si._l_Sou_ce_ source
#I
96.8
source
#2
100.0
87.3 99.0
87.3 98.0
1. 1.
99.8
99.3
98.8
99.25
94.2
94.5
96.4
95.03
1..9
99.8
99.0
99.0
99.25
93.8
94.5
95.6
94.67
1..8
99.8
99.0
99.0
99.25
93.5
94.9
95.6
94.67
1..7
99.8
99.0
99.0
99.25
92.7
94.5
95.6
94.30
1..6
99.3
98.5
99.0
98.92
90.9
94.9
96.0
93.94
1..5
99.0
98.3
99.0
98.75
90.2
93.8
96.0
93.33
1..4
98.5
97.8
98.8
98.33
88.4
92.4
95.6
92.12
1..3
98.5 98.5
97.5 95.5
97.17 95.67
85.5 85.5
91.6 88.7
89.8 88.4
89.70 I
1..2
95.5 93.0
1..1
97.3
90.5
92.5
93.42
84.4
85.5
82.2
84.00
1..0
96.8
87.5
87.3
90.42
83.6
82.2
71.3
79.03
.9 1.
99.8
99.3
98.8
99.25
94.5
94.9
96.4
95.27
.8 1.
99.8
99.3
99.0
99.33
94.5
95.3
96.4
95.39
.7 1.
99.8
99.3
99.0
99.33
94.5
94.9
96.4
95.27
.6 1.
100.0
99.3
99.0
99.42
94.5
94.5
96.0
95.03
.5 1.
100.0
99.3
98.8
99.33
94.5
94.5
96.0
95.03
.4 1.
100.0
99.3
98.8
99.33
94.5
94.5
96.0
95.03
.3 1.
100.0
99.3
98.8
99.33
95.3
94.5
96.4
95.39
.2 1.
100.0
99.3
98.5
99.25
94.9
94.5
97.1
95.27]
.1 1.
100.0
99.0
98.3
99.08
94.2
94.5
97.1
95.27
.0 I.
100.0
99.0
98.0
99.00
94.2
94.2
96.7
95.03[
400
400
1200
275
275
275
825
sl
#
s2
I-90.42 99.00 Multiple
of pixels
The
400
_8_,6[l 94.2 Sources
_____
columns labeled sl and s2 indicate applied to sources 1 (sl) and 2
CPU
time
for training
and
classification:
90 sec.
87.52
I
i
I
204
Table
Statistical HIRIS
Data
Multisource (500
Classification
Training
Percent
4.91
and
Agreement
175
with
of
Test
Simulated
Samples
2
Testing 3
II
OA
1
Single source
Class).
for Class
Reference
Training 1
per
2
3
JJ OA
94.3
98.9
J 76.95 95.81
Sources
#1
source _2
99.8
98.4
96.8
98.33
94.3
99.6
98.8
98.2
Multiple
Sources
98.87
1..9
99.6
98.8
92.6
95.4
97.7
95.23
98.4
98.93
93.1
95.4
97.7
1..8
99.6
95.43
98.6
98.2
98.80
93.1
94.9
97.7
1..7
95.24
99.2
98.8
98.4
98.80
93.1
96.0
97.7
95.62
1..6
99.0
98.6
98.4
98.67
90.3
96.0
97.7
94.67
1..5
98.4
98.0
98.4
98.27
88.0
95.4
97.7
93.71
1..4
98.0
96.8
97.8
97.53
87.4
93.7
97.1
92.76
1..3
97.8
95.0
96.4
96.40
85.7
92.6
91.4
89.90
1..2
97.6
92.4
94.0
94.67
84.6
88.6
89.7
87.62
1..1
96.6
90.6
90.4
92.53
82.9
83.4
83.4
83.24
1..0
96.2
88.0
84.8
89.53
78.9
78.3
73.7
76.95
.9 1.
99.6
98.8
98.0
98.80
92.6
95.4
97.7
95.23
.8 1.
99,6
98,8
98,0
98,80
93,1
94,9
97,7
95.24
.7 1.
99.6
98.8
97.8
98.67
94.3
94.9
97.7
95.62
.6 1.
99.6
98.6
97.8
98.67
94.9
94.9
98.9
96.19
.5 1.
99.6
98.6
97.8
98.67
94.3
94.9
99.4
96.19
.4 1.
99.6
98.6
97.8
98.67
94.3
94.9
99.4
96.19
.3 1.
99.6
98.6
97.8
98.67
94.3
94.9
99.4
96.19
.2 1.
99.6
98.6
97.4
98.53
93.7
94.3
99.4
95.81
.1 1.
99.8
98.4
97.2
98.47
94.3
94.3
99.4
96.0O
.0 1.
99.8
98.4
96.8
98.33
94.3
94.3
98.9
95.81
500
500
500
1500
175
175
175
525
sl
s2
1.
1-
of pixels The
columns
labeled
applied
CPU
time
to
for
sl
and
sources
training
s2 1 (sl)
and
indicate and
classification:
the
weights
2 (s2).
90
sec.
205
Table
Statistical HIRIS
Multisource
Data
(600
Percent
1
4.92
Classification
Training
and
75
Agreement
with
2Training :l
[[ OA
#1
source
#2
sl
95.2 i00.0
86.5
85.2
98.3
96.8
s2
!98.39 88.94
I
Simulated
Samples
Reference
Sin$1e source
of
Test
per
Class).
for Class
1
Testing
t
Sources I 72.0 92.0
r ....
78.7
73.3
97.3
97.3
74.67 95.56 J
Mu!tiple
Sources
98.78
92.0
100.0
98.7
96.89
92.0
100.0
98.7
i96.89
99.5
98.5
98.3
1..8
99.5 99.5
98.3 98.5
98.5 98.5
92.0
100.0
98.7
1..7
99.3
98.7
98.3
98.78
92.0
97.3
98.7
1..6
99.3
98.5
98.2
98.39
89.3
97.3
98.7
1..5
99.0
98.0
08.2
98.39
85.3
97.3
98.7
1..4
98.2
96.7
97.8
97.56
82.7
93.3
96.0
1..3
97.5
95.3
96.2
96.33
80.0
92.0
93.3
1..2
97.2
92.7
94.0
94.61
76.0
88.0
96.7
1..1
96.3
90.3
91.3
92.67
73.3
85.3
88.0
82.22]
1..0
95.2
86.5
85.2
88.94
72.0
80.0
78.7
74.674
.91.
99.5
98.3
98.3
98.72
92.0
100.0
98.7
96.89J
.8 1.
99.7
98.3
98.3
98,78
92.0
100.0
98.7
.7 1.
99.7
98.3
98.2
98.72
92.0
100.0
98.7
96.89 96.89
.6 1.
99.8
98.3
98.0
98.72
92.0
100.0
98.7
96.891
.5 1.
99.8
98.3
98.0
98.72
92.0
i00.0
98.7
.4 1.
99.8
98.3
98.0
98,72
92.0
100.0
98.7
.3 I.
99.8
08.3
97.7
08.61
90.7
I00.0
98.7
96.89 96.89 96.44
.2 1.
99.8
98.3
97.5
98.56
90.7
98.7
98.7
96.00[
.1 1.
100.0
98.3
97.0
98.44
92.0
98.7
97.3
.0 1.
100.0
98.3
96.8
98.39
92.0
97.3
97.3
600
600
600
1800
75
75
75
1.1.
1..9
#
of pixels
'
The
98.78 98.83
columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).
CPU
time
for training
and
classification:
90 see.
96.891 i 96"00 95.111
I
93.78
I
_90.67 88.44 84.89_
I [
96.00 95.56 225
I
206
Table
4.93
Source-Specific Equvivocations for Simulated HIRIS Data Versus Number of Training Samples.
Eouivocation Training Samples 100 200 3OO 400 5OO 6O0
1Sourlce _: 2 0.2581 0.0000 0.3325 0.0105 0.4057 0,0637 0.3958 0.0686 0.4154 0.0947 0.4356 0.0981
Table
4.94
Source-Specific JM Distances for Simulated HIRIS Data Versus Number of Training Samples.
JM Distancc Training Samvles 100 200 300 4OO 500 600
Sollr
i 1.313423 1.273153 1.208442 1.234641 1.214116 1.195292
ce_ 2 1.413598 1.4107_0 1.392419 1.389006 1.386011 1.383060
207
Table
Linear HIRIS
Opinion
Pool
Data
(100
Applied
Training
Percent
4.95
in and
Agreement
Classification 575
with
Test
of
Simulated
Samples
Reference
per
Class).
for Class Testing
Training
1
2
99.0
93.0
96.0
100.0
100.0
100.0
1. 1.
100.0
100.0
100.0
100.00
1..9
100.0
100.0
100.0
100.00
1..8
100.0
100.0
99.0
1..7
100.0
98.0
1..6
100.0
1..5
99.0
1..4
l
2
3
67.3
57.7
70.32
47.8
82.1
71.07
89.6
5O.6
82.3
91.5
61.0
79.8
99.67
90.8
65.9
73.0
76.58
99.0
99.00
90.4
67.5
71.0
76.29]
97.0
99.0
98.67
89.4
67.3
69.2
75.30
97.0
99.0
98.33
88.2
67.8
66.6
74.20
99.0
97.0
99.0
98.33
88.2
67.8
66.6
74.20
1..3
99.0
96.0
99.0
98.00
87.5
67.1
62.2
72.29
1..2
99,0
95.0
97.0
97.00
87.0
67.0
60.3
71.42
1..1
99.0
94.0
96.0
96.33
86.3
66.6
59.0
70.61
1..0
99.0
93.0
96.0
96.00
85.9
67.3
57.7
70.32
.9 1.
100.0
100.0
100.0
100.00
85.7
50.3
82.1
72.70
.8 1.
100.0
100.0
100.0
100.00
85.4
49.9
81.9
72.41
.7 1.
100.0
100.0
100.0
100.00
84.7
49.7
82.1
72.17
49.6
81.9
72.12
3
H
OA
Single source source
#1 #2
sl s2
1
Sources
96.00
I! 85.9
!00.00 li 83.3 Multiple
Sources
77.45
.6 1.
100.0
100.0
100,0
100.00
84.9
.5 1.
100.0
100.0
100.0
100.00
84.3
49.4
81.7
71.83
.4 i.
I00.0
100.0
100.0
100.00
84.0
48.9
81.9
71.59
.3 1.
100.0
100.0
100.0
100.00
84.0
48.3
81.9
71.42
.2 1.
100.0
100.0
100.0
100.00
83.8
48.0
81.9
71.25
.I I.
i00.0
IO0.O
100.0
100.00
83.8
47.8
82.1
.0 1.
100.0
100.0
100.0
100.00
83.3
47.8
82.1
71:97
_##of i__
100
100
100
3O0
575
575
575
1725
The
columns
labeled
applied
CPU
time
to
for
sl sources
training
and
s2 1 (sl)
and
indicate and
classification:
the 2
weights
(s2). 81
sec.
71.25
208
Table
Linear
Opinion
HIRIS
Data
(200
Pool
Applied
Training
Percent
4.96
in and
A_reement
Classification
475
with
Test
of
Samples
Simulated per
Class).
for Class
Reference
Training
Testing OA
1
Single
Sources
1
2
3
]]
2
3
97.5
91.0
92.0
[ 93.50
I00.0
I00.0
99.5
I 99.83 Multiple
Sources
I00.0
I00.0
99.5
99.83
1..9
I00.0
i00.0
99.5
99.83
1..8
I00.0
99.5
98.5
1..7
I00.0
98.5
1..6
I00.0
1..5
I00.0
]]OA
88.2
70.3
57.3
I 71.93
88.0
53.1
80.4
t 73.82
89.3
54.5
81.9
75.09
92.0
55.2
80.6
75.93
99.33
90.7
72.0
72.4
78.67
98.5
99.00
90.7
72.0
72.4
78.39
98.5
98.0
98.83
90.5
72.4
71.2
78.04
97.0
97.0
98.00
90.1
71.8
67.6
76.49
i
source
_1
source
#2
sl
s2
1.
1.
1..4
98.5
97.0
97.0
97.50
89.5
71.2
65.5
75.37
1..3
98.0
96.0
96.0
96.67
89.3
71.4
63.2
74.60
1..2
98.0
94.5
95.0
95.83
88.8
71.6
60.4
73.75
1..1
97.5
92.5
92.5
94.17
88.8
7O.3
59.2
73.19
1..0
97.5
91.0
92.0
93.50
88.2
70.3
57.3
71.93
.9 1.
i00.0
I00.0
99.5
99.83
88.8
54.5
81.9
75.09
.8 1.
I00.0
I00.0
99.5
99.83
88.8
54.3
82.1
75.09
.7 1.
I00.0
i00.0
99.5
99.83
88.2
54.1
81.9
74.74
.6 I.
i00.0
I00.0
99.5
99.83
88.2
53.9
81.9
74.67
.5 1.
100.0
100.0
99.5
99.83
88.0
53.9
81.5
74.46
.4 1.
100.0
100.0
99.5
99.83
88.0
53.9
81.3
74.18
.3 1.
100.0
100.0
99.5
99.83
88.0
53.1
80.8
73.96
.2 1.
100.0
100.0
99.5
99.83
88.0
52.8
81.1
73.96
.1 1.
100.0
100.0
99.5
99.83
88.0
53.1
80.6
73.89
.0 1.
100.0
100.0
99.5
99.83
88.0
53.1
80.4
73.82
200
200
200
600
475
475
475
1425
# of pixels The
columns
labeled
applied
CPU
time
to
for
sl sources
training
and
s2 1 (sl)
and
indicate and
the
weights
2 (s2).
classification:
84
see.
209
Table 4.97 Linear Opinion Pool Applied in Classificationof Simulated HIRIS Data (300Training and 375Test Samplesper Class).
Percent Agreement with Training
1
2
Reference
#1
source
#2
100.0
99.0
Testing
3 LIoA Single
source
for Class
98.3
sl s2
2 Sources
1199.11
96.0
Multiple
3 .UoA
_
94.4
94.1
94.84
Sources
99.7
99.0
99,0
99.22
96.5
96.0
95.2
95.91
1..9
99.7
98.7
99.0
99.11
96.5
96.0
95.2
95.91
1..8
99.7
98.7
99.3
99.22
96.8
96.3
95.2
96.09
1..7
99.7
98.3
99.3
99.11
96.5
96.3
95.2
96.00
1..6
99.7
97.3
99.3
99.11
95.7
96.0
95.5
95.73
1..5
99.7
97.3
99.3
98.78
94.4
96.0
95.7
95.38
1..4
99.0
97.0
99.7
98.56
93.9
94.4
95.5
94.58
1..3
98.3
94.7
98.3
97.11
93.1
92.0
92.5
92.53
1..2
98.0
93.7
95.0
95.56
92.5
88.0
89.1
90,13
1..1
97.0
90.7
92.7
93.44
92.0
86.9
86.9
88.62
1..0
96.3
86.7
89.3
90.78
90.7
82.4
79.5
84.18
.9 1.
99.7
99.0
98.7
99.11
96.3
96.3
94.9
95.82
.8 1.
99.7
99.3
98.7
99.22
96.3
95.7
94.9
95.64
.7 1.
99.7
99.3
98.7
99.22
96.3
95.5
94.9
95.56
.6 1.
99.7
99.3
98.7
99,22
96.5
95.5
94.9
95.64
.5 1.
100.0
99.3
98.7
99.33
96.5
95.2
94.7
95.47
.4 1.
100.0
99.3
98.7
99.33
96.3
95.2
94.4
95.29
.3 1.
100.0
99.3
98.7
99.33
96.3
95.2
94.4
95.29
1°
1.
.1 1.
100.0 100.0
99.0 99.0
98.7 98.7
99.22 99.22
96.3 98.0
94.7 94.7
94.4 94.4
I 95.11 95.02
.0 1.
100,0
99.0
98,3
99.11
96.0
94.4
94.1
94.84
pixels
300
300
300
375
375
375
The
columns
.2 1.
__of
labeled
applied
CPU
time
to
for
sl sources
training
and
s2 indicate 1 (sl)
and
and
classification:
the
weights
2 (s2).
86
sec.
210
Table 4.98 Linear Opinion Pool Applied in Classificationof Simulated HIRIS Data (400Training and 275Test Samplesper Class).
percent
Agreement
w!th
Reference
for Class
Training
I
Testing
2
3 (I OA Single
ooo .o
source #1 source #2
96.8
87.3
2
3 IIOA
Sources
oooo
87.3
sl s2
1
90.42
83.6
Multiple
Sources
rl
,,
82.2
71.3
I[ 79.03
99.8
99.3
98.8
99.25
95.3
94.9
96.0
95.39
1..9
99.8
99.0
98.8
99.17
95.3
94.5
95.3
95.03
1..8
99.5
99.0
99.0
99.17
94.2
94.2
95.3
94.55
1..7
99.0
99.0
98.3
99.75
91.3
93.5
92.0
92.24
1..6
98.8
97.0
97.3
97.67
89.5
92.0
89.5
90.30
1..5
98.8
95.5
96.3
96.83
86.5
90.5
88.7
88.61
1..4
98.5
93.3
93.8
95.17
85.1
89.8
86.2
87.03
1..3
97.8
92.5
92.0
94.08
84.7
87.3
82.5
84.85
1..2
97.3
90.5
91.0
92.92
84.4
84.7
79.3
82.79
1..1
97.3
89.0
89.8
92.00
83.6
85.5
75.3
80.48
1..0
96.8
87.3
87.3
90.42
83.6
82.2
71.3
79.03
.9 I.
100.0
99.3
98.8
99.33
95.3
94.9
96.0
95.39
.8 1.
100.0
99.3
99.0
99.42
95.6
94.9
96.0
95.52
.7 I.
100.0
99.3
99.0
99.42
96.0
94.5
96.0
95.52
.6 I.
i00.0
99.3
98.8
99.33
95.6
94.2
96.4
95.27
.5 I.
I00.0
99.3
98.8
99.33
95.6
94.5
96.0
95.52
.4 I.
100.0
99.3
98.5
99.25
95.3
94.5
96.4
95.36
.3 I.
100.0
99.3
98.5
99.25
94.9
94.5
96.4
95.27
.2 1.
100.0
09.3
98.5
99.25
94.9
94.5
96.4
05.27
.I 1.
100.0
99.0
98.3
99.08
94.2
94.2
96.7
95.03
.0 I.
100.0
99.0
98.0
99.00
94.2
94.2
96.7
95.03
400
400
400
1200
275
275
275
825
I.
1.
# of pixels The
columns
labeled
applied
CPU
time
for
to
sl
and
sources
training
s2 1 (sl)
and
indicate and
classification:
the
weights
2 (s2).
89
sec.
211
Table
Linear HIRIS
4.99
Opinion Pool Applied in Classification of Simulated Data (500 Training and 175 Test Samples per Class).
Percent 1
source #1 source #2 sl s2
95.8 99.8
I.I. i..9 1..8 i..7 I..6 1..5 1..4 1..3 1..2 1..1 I..0
99.6 99.6 99.6 98.6 98.2 98.2 97.6 97.0 96.8 96.4 95.8
.9 .8 .7 .6 .5 .4 .3 .2 .1 .0
Agreement
with Reference
Training 2 3
l[
88.0 98.4
84.8 96.8
for Class Test"
OA
i
I Single Sources { 89.53 78.9 I 98.33 94.3 Multiple
Sources
2
g 3
78.3 94.3
73.7 98.9
_OA__A _ 95_ 76. 95.811
988 98.0 98.8o954 954 977 96i j 988 982 98.87949 954 977 9600j 98.8 98.2 98.87 93.7 95.4 97.7 95.62 i 98.6 97.2
98.0 98.4
98.40 97.27
91.4 89.7
94.3 93.7
96.0 92.0
93.90 91.81
I
95.4 93.2 91.8
95.6 93.8 92.8
96.40 93.87 93.87
86.3 84.6 82.9
92.6 89.1 86.9
90.3 89.1 85.7
89.71 87.62 85.14
91.0 89.8 88.0
89.8 87.6 84.8
92.53 91.27 89.53
82.9 80.0 78.9
84.0 80.6 78.3
82.9 76.6 73.7
83.24 79.05 76.95
99.6 99.6 99.6 99.6 99.6 99.6 99,6 99.6 99.8 99.8
98.8 98.8 98.8
98.0 98.0 97.8
98.80 98.80 98.67
94.9 95.4 94.9
95.4 94.9 94.9
97.7 98.3 98.3
96.00 96.19 96.00 J
98.6 98.6 98.6 98.6
97.6 97.6 97.4 97.4
98.67 98.60 98.53 98.53
94.3 94.3 94.3 94.3
94.3 94.9 94.9 94.3
98.9 99.4 99.4 99.4
96.00 96.19 96.19 96.00
98.6 98.4 98.4
97.4 97.2 96.8
98.60 98.47 98.33
94.3 94.3 94.3
94.3 94.3 94.3
99.4 99.4 98.9
96.00 96.00 95.81___
500
500
500
1500
175
175
175
525___
{ h
--7
1. 1. 1. 1. 1. 1. i. 1. 1. 1.
# of pixels The
columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).
CPU
time
for
training
and
classification:
90 see.
I I
212
Table Linear HIRIS
source
4.100
Opinion Pool Applied in Classification of Simulated Data (600 Training and 75 Test Samples per Cl_s).
Percent
Agreement
with
Reference
1
2Training 3 1,0A
I
for Class
1
Single
Sources
88.94 98.39
72.0 92.0
2Testing 3 I]OA
#1 95.2 100.0
86.5 98.3
85.2 96.8
73.3 97.3
II 95.56 74.67
Multiple
Sources
99.7
98.5
98.3
98.83
93.3
1..9
99.5
98.5
98.5
98.83
93.3
I00.0
98.7
97.33
I00.0
98.7
1..8
99.5
98.3
98.5
98.78
97.33
92.0
97.3
98.7
1..7
99.2
98.5
98.2
96.00
98.61
85.3
97.3
97.3
1..6
98.3
97.5
93.33
96.5
97.44
81.3
93.3
93.3
1..5
97.8
89.33
96.3
95.2
96.44
80.0
92.0
92.0
1..4
88.00
97.2
94.3
94.0
95.17
78.7
89.3
90.7
86.22
1..3
96.8
91.8
92.2
93.61
74.7
84.0
88.0
82.22
1..2
96.3
90.2
90.7
92.39
73.3
82.7
85.3
80.44
1..1
95.5
89.2
87.8
90.83
73.3
81.3
81.3
78.67
1..0
95.2
86.5
85.2
88.94
72.0
78.7
73.3
74.67
.9 1.
99.8
98.3
98.3
98.83
93.3
I00.0
98.7
97.33
.8 1.
99.8
98.3
98.3
98.78
93.3
I00.0
98.7
97.33
.7 1.
99.8
98.3
98.0
98.72
93.3
I00.0
98.7
97.33
.6 1.
99.8
98.3
97.8
98.67
92.0
100.0
98.7
96.89
.5 1.
99.8
98.3
97.8
98.67
90.7
100.0
98.7
96.44
.4 1.
99.8
98.3
97.7
98.61
90.7
i00.0
98.7
96.44
.3 1.
99.8
98.3
97.2
98.56
90.7
I00.0
98.7
96.44
.2 1.
99.8
98.3
97.2
98.44
92.0
98.7
98.7
96.44
so,urge ¢#2 sl s2 I.
1.
78.7 97.3
.1 1.
I00.0
98.3
96.8
98.39
92.0
98.7
97.3
96.00
.0 1.
I00.0
98.3
96.8
98.39
92.0
97.3
97.3
95.56
600
600
600
1800
75
75
75
# of plxels
The
columns labeled sl _nd s2 indicate the weights applied to sources 1 (sl) and 2 (s2).
CPU
time
for training
and
classification:
90 sec.
225
213
addition rather than multiplication in its global membership function. As compared to the 40-dimensionalML classification, these two methods were about 25_oslower (Figure 4.17). It is worth noting that a ML classificationof 60-dimensionaldata would have been still slower. Also, classificationusing the LOP and the SMC improved in terms of accuracyas comparedto the ML classificationof The trained
CGBP
(test)).
better
than
larger
increased
slowly.
(Figure iterations
the
other
In
4.104), CGLC
for the 40-dimensional grew
of training
was
about
data
(Table
was
a little
two
of 60-dimensional
rapidly
the
60 dimensions
sample
were
times data
size.
;dmost
the
the
Oddly
Also,
4.101
(training)
and
data,
it was
The
size
600
CPU
hours.
40-dimensional
classification
case,
of the
the
rapidly
samples
LOP
and the
per
the
SMC
CGBP
was
60-dimensional
algorithm
or
converged grew
training
The
a little
of 300
CGBP
to convergence
For
case
was
a sample
data.
used.
60-dimensional
classification with
and
neurons)
of test
test
to the
20 hidden
(Tables
its time
in 3.65
in training
CGLC
for
samples
If compared
the
data
cases.
accuracy
converged
the
neurons,
of classification
experiments
dimensionality
classification
The
overall
slower
than
As the
(Table
the
of training
4.18).
input
60-dimensional
of accuracy
faster.
1.2 times
(720
lower-dimensional
algorithm
146 times
about
for the
the
number
the
were
for
data.
network
In terms
As with the
class,
neural
to perfection
4.102
with
40-dimensional
data
needed
fewer
data. CGLC
(720
4.103). better
enough
neurons)
In classification than
faster but
input
the
for the
than
the
time the
did
sa,,,m for t,h(, C(1I,( _, with
in
of test
samples
40-dhnensional
data.
CGBP
algorithm
to convergence
training
better
times
also
for the
300 training;
in grew
40 and samph's
214
80
A
o o
6O
--O'--
E
--O--
MD 20 D ML 20 D
---m--
MD40 D
---e--
ML40 D MD 60 D
o_
I-
--A_.
40
-----M---
2O
0 0
100
200
Training
Figure 4.17
300
400
500
600
700
Samples/Class
Statistical Methods: Training Plus Classification Time versus Training Sample Size
SMC60 D LOP 60 D
215
Table Conjugate
Gradient
60-Dimensional
Sample size
Backpropagation
Simulated
Number iterations
CPU
of
4.101
HIRIS
-
_ time
Percent l
Applied
Data:
Training
Agreement
with
1
to Samples.
Reference
2
3
OA
100
59
650
100.0
100.0
100.0
200
91
1921
I 100.0
100.0
100.0
4696
I 100.0
i00.0
100.0
5969
I 100.0
100.0
100.0
[ 100.0 L jOO=p .......
100.0 190.0 .......
100.0 1_00.0_ _
300
183
400
189
500
172
600
250
7622 _1_3_7_4
Table Conjugate
Gradient
60-Dimensional
Sample
Number
size
iterations
of
Percent
100.00 100.00 100.00 100.00 100.00 _10p:90
4.102
Backpropagation
Simulated
for Class
HIRIS
Agreement
Applied Data:
with
Test
to
Samples.
Reference
for Class
1
2
3
OA
100
59
89.7
57.9
52.5
68.72
200
91
89.3
57.5
46.9
64.56
300
183
89.1
62.7
56.5
69.42
400
169
88.0
55.6
61.8
68.48
500
172
86.9
59.4
65.7'
70.67
800
25___g__092.0
60.0
57.3
69.78
_
......
216
16000
---o---
CGLC 20D
14000
:-
CGBP 20D CGLC 40D
12000
_" --'¢P-
CGBP 40D CGLC 60D
¢J 10000
=
¢D
CGBP
60D
8000 Qm
I--
6000
4000
20O0
0 0
100
200
300
400
500
600
700
Training Samples/Class Figure
4.18
Neural Network Models: Time versus Training
Training Plus Classification Sample Size
217
Table Conjugate
Gradient
60-Dimensional
Sample
Linear
Simulated
Number
size
of
iterations
4.103 Classifier
HIRIS
CPU
Percent
time
1
Data:
Applied
to
Training
Agreement
with
Samples.
Reference
2
for Class
3
OA
100
102
201
100.0
100.0
100.0
100.00
200
246
843
100.0
100.0
100.0
100.00
300
517
2140
100.0
100.0
99.7
400
565
3030
100.0
100.0
100.0
500
1041
6511
100.0
89.8
99.4
99.73
600
857
6931
100.0
98.0
98.2
98.72
Table Conjugate
Gradient
60-Dimensional
Sample size
Simulated
Number iterations
of
Percent 1
99.89 100.00
4.104
Linear
Classifier
HIRIS
Agreement 2
Data:
with
Applied Test
to Samples.
Reference 3
for OA
100
102
87.3
61.6
46.8
65.22
200
246
84.8
56.8
47.2
62.95
300
517
84.8
55.2
57.3
65.78
400
565
84.4
55.3
58.2
65.94
500
1041
86.9
61.7
56.6
68.38
600
857
88.0
66.7
58.7
71.11
Class
218
or less. two
With
times
4.4.4
faster
(Figure
Statistical
methods The
in the
ML
accurate,
size, the
40-dimensional
classification
was
about
4.18).
distance
and
considered
the
that
If the
data
sources,
neural
are
faster
of
reason
MD
the
classifier
statistics,
of dimensions
the it
data
is
In fact,
than
the
showed
multisource the
two
LOP
bound
{85 I. Also,
its classification
very
poor
and
in
Sections
sources
very
good
is not
independent
4.2
were
fast.
appropriate.
performance
than
and
4.3.
rather
in The
agreeable.
performance.
not
to
poorly
does not
be
also be applied
Since
accuracy
must
can
adequately.
saturation,
minimum
extremely
It
it shows
the
and
performance.
perform
not
covariance
methods
method better
data
can provide
showed data
HIRIS
and
very-high-dimensional
a far
data
fast
singular
accurate
ML
here.
It could
two or more
ML method the
both sets.
two
of
into
can be very
when
a
these
classification
LOP
LOP
of
network
performed
outperformed
can be split
data the
is that
discriminate
dimensional
the
LOP
neural
best, data
because
the
for data
and
this is the case the The
cannot
alternatives
the
40-dimensional
methods.
of multitype
classifications
apparent When
network
experiments
to the data
clearly
data and
in classification
in these
was
20- and
SMC
high-dimensional
in classification
the
applicable,
the
SMC
superior
of very-high-dimensional
of the
case
the
consistently
60-dimensional
desirable
data.
Also,
when
for In
were
classifications
in classification
applied
They
methods
method,
matrix.
ordcr
sample
Summary The
be
a larger
it does in
use
fast
any
classification
i.e., above increase.
is very
a certain
but
second of
high-
number
In the experiments,
219
the MD classificationaccuracy did not improve for data sets more complex than the 20-dimensionaldata. Of the neural network methods applied, CGBP showed excelleilt performance in classification of training data. However, its classification accuracyfor test data did not go much over 70_o. The CGBP was very slow in training and increasingthe number of training samplesslowedthe training processmarkedly. In contrast increasingthe number of training samplesdid not significantly improve the classificationaccuracy of test data. It seems evident that CGBP needsto haveseenalmostevery sampleduring training to be able to classifythem correctly during testing. Training
of
the
CGBP is
more efficient
than
conventional
backpropagation and requiresfewer parameterselections. However,as in the conventionalbackpropagation,the number of hiddenneuronsmust be selected empiric.'_liy. We selectedthe lowest number of hidden neurons which gave 100% accuracyduring training. Use of too many neural
network
(analogous
to the Hughes
The
CGLC
dimensional good CGLC
computationally
uses
data
performance was
not
as accurate
accuracies
CGBP,
so it seems data.
it did of the
similar
dimensional
no
complex
phenomenon hidden not
do
CGLC
in classifying to be the
test better
and
worse
indicates
as CGBP
can
neurons
degrade
makes
its
the
performance
[29]).
neurons, much
and
hidden
in the than
good
The
alternative
CGBP.
separability
in classifying data.
the
experiments
training CGLC
with The
of the data
converged
for classification
high-
relatively data.
but
The
achieved
faster
than
of very-high-
220
In defenseof the neural network methods, it can be said that the maximum likelihoodmethodhad an unfair advantagesincethe simulated data were generatedto be Gaussian. Neural networks are easy to implement and do not
need
statistical
model
methods data
were
sets.
results.
prior
has
the
These of
computation
time
the
neural
increased
time when
methods,
to the
very the
training
do
which
on
rapidly
not
statistical sample
Also,
have
was
methods
which
size increases.
suitable
neural
difficult
in
the
statistical
machines.
an increased
a
as much
evident
to the
parallel with
whereas
in classifying
be comparable
implemented
increases
contrast
networks
will not
unless
in
data
potential
to have
statistical
samples
the
earlier
methods
speed
about
for the ML method.
shown
as
information
to be available
However,
generalize
terms
any
network multitype ability test
to data
methods
Currently
in their
number
of training
require
almost
no
221
CI _APTER 5 CONCLUSIONS AND SUGGESTIONS FOR FUTURE WORK
5.1
Goncluaiona This
empirical
classification
of both
high-dimensional The
evaluation
data
neural
performance
terms
of classification data
neural
neural
better network
procedure specific
for
test
considered The
some
models,
networks
data. in the neural
the
achieved
models
have
an
too many the
network and
therefore
distributions
of the
CGLC
and for
data. with
models
the
CGBP,
However, statistical
cycles,
networks
data.
This
good
If
the
networks
give
sensed
methods
problem. neural
is a shortcoming
knowledge
showed
used
methods.
)roblem
no
very-
in
in classification
and
have
for
and
remotely
statistical
ng data
of neural
networks data
multisource
to the
]turning
train
neural
differences.
overtraining
overtraining
application
striking
of training
were
and
sensing/geographic
:;uperior
results
This
methods
methods
were
accuracy
in classifying
distribution-free statistical
has revealed
recognition
goes through
too
remote
as pattern
Both
test
multisource
network
data.
of statistical
less than
Also,
their
the
training will get
optimal that
of
results
has
to
be
for classification.
the
advantage
is needed is an obvious
about
that the
advantage
they
are
underlying over
most
222
statistical methods requiring modeling of the data, which is difficult when there is no prior knowledgeof the distribution functions or when the data are non-Gaussian. It also avoids the problem of determining how much influence a sourceshould havein the classification,which is necessaryfor both the SMC and LOP methods. However,the neural networks, especiallythe CGBP, are computationally complex. When the sample size was large in the experiments, the training time could be very long. The experimentsalso showed how important the representationof the data is when using a neural network. To perform well the neural network models must be trained using representative training samples. Any trainable classifier needs to be trained using representative training samplesbut the neural networks are more sensitiveto this than are the statistical methods.If the neural networks are trained with representative training samplesthe results showedthat a two-layer or do almost samples.
as well
as the
However,
the
statistical
methods
(simulated)
HIRIS
Gaussian; methods methods. of
multiple
statistical The topographic
they did
neural
in data.
were not
The
statistical
types
network
It was
and
methods
known
much
network
in multisource
classification
simulated
have
neural
the
methods
cannot
chance
of
the
Therefore,
are more modeled
HIRIS
to the
data
neural
than
appropriate a
of test
inferior
the
better
by
can
very-high-dimensional
that
doing
net
classification clearly
the
beforehand way.
be
were
of
that
models
a three-layer
when
convenient
the the
were
network statistical data
are
multivariate
model. SMC data.
method The
worked classification
well
for of
four
combining and
multispectral six
data
sources
and gave
223
significant improvement in compared
to single
different
sources
in overall
density
classification.
also showed
promise
different
modeling of
estimation
showed
accuracy.
However,
the
very the
method)
when
likelihood
method and
alternatives
for modeling
test
The
data
statistical the
SMC
since
hand,
intensive
the
approach Also,
to use
SMC.
SMC,
the
as
of weights
for
of increase
approach
and
test
the
in its own right.
accuracy.
outperformed with
data.
more
capable
neural
insight Also, time
of
neural
to the
SMC
and
effort
are
when
the
Parzen
of the
SMC
the
neural
network
in classifying form
samples
require
of the make
not
seen
computationally
algorithm. on
become
tends
functions
to
density can
useful
but
in the
density
required
density
are
samples
knowledge
models
penalized
classification.
networks
generalizing
network
penalized
method
as are
modeled
consuming
the Parzen
training
prior
Carefully
time
in multisonrce
the
classification
maximum
Both
for density
maximum
likelihood data
Parzen
more
The
representative
in contrast
training
large.
experiments
of overall
was
representative
being
more
training
the
good
the The
in terms
penalized
provided
more
the
very
in
sources.
size was
requires
significantly
analyst
accuracies
in terms
estimation
of non-Gaussian
for
iterative
density
to their
used
performance
sample
algorithm
model(s)
levels
experiments
data
maximum
it was
training.
expensive
the
algorithm
statistical
during
with
the
to be as sensitive
models.
different
were
(histogram
also gave
estimation
test
Parzen
likelihood
not
in the
methods
good
methods
SMC
classification
Using
non-Gaussian
other
The
average
accuracy.
estimation
than
and
source
classification
Three
overall
On the
the
part
estimation
other of the is used
computationally
224
The LOP did not do well multisource
remote
because
of
its
multisource data.
and
faster to the
clearly
algorithms
the
and had the
shows reason
relatively
accuracy The
three
suggested
the
worked
well for the
data
criteria
and
LOP
determine
and tended
the
problem
is
optimum
data
in classification
can
do
in
the
well
for
geographic
data
modeling
dictatorship weights
for
investigated. the
of multisource
SMC
That
well
sources
were
LOP help
because
of training
in the
the
rather
high-
agreeable
case for the
sources
data
sources
are
and
improve
the
as
the
the
were
both
adequate.
and
were
source. and
not
of very-
agreeable
It is very the
LOP.
weighting
an excellent
geographic
The
of multitype
optimum
give
These
classification
sources
SMC
ranking
classifications.
in classifications
will certainly sensing
in
employed
sizes
of the best both
were
number
the
LOP
sample
the
the
classifications.
and
where
both
in contrast
in classification source
of these
be used
were
When
of
methods
LOP
was not the
SMC
not
Both
of the
data
With
remote
classification
data,
always
of
is appealing
in classification
limited
measures
could
being
can with
two
single
They
toward
and
experiments.
reliability
worked
optimum
still
the
for
SMC
performance
SMC in all cases
data.
sensing
that
LOP
performance.
problems
good
The
appropriate
excellent
to the
sources
also
high-dimensional remote
LOP
data.
classification
very-high-dimensional
accuracies.
as compared
for
the
the
multisource
to the
of the
was
the
criteria
ranking
for
the
not
inferior
classification
agreeable
is
singularity
source-specific
multisource
in
ML classifiers
classification high
it
showed
conventional
The
dimensional
overall
It was
ML which
samples.
in
but
all
ge,_graphic
in classification
SMC
than
and
simplicity
data.
However,
LOP
sensing
at
data.
hard
to
That and
performance
225
In general the main advantagestatistical classificationalgorithms have over the neural network models is that if the distribution functions of the information classesare But
for
those
do not
cases,
know
the
appropriate,
are
network Suggestions
6.2
Future
several
selection.
weights
As observed
mathematical These
to
to
accurately.
classification, network
in which
models
can
we
be more
expense.
both
the
classification
directions
previously,
statistical
which
in this area
be
with
the
the
it is very
classifiers. use
programming
methods
related
problem
multisource
appears
neural
very
need
and
neural
further
work.
are discussed
next.
Directions
important
statistical
perform
computational
in multisource research
can
in multisource
functions,
problems
Research most
methods
at considerable
for future
The
the
distribution
approaches
these
as for instance
although
There
known
hard
One
of
statistical
more
research
to
Chapter
3,
need
approach
optimization suggested be
is
to find optimum
general
methods
methods
for
for
determining
similar
in Sections
applicable
weights
for
techniques
weight
2.3.4
to
and
the
2.3.5.
optimum
weight
selection. As explicitly statistical However, neural prior
discussed in
neural
consensus one network
statistical
consensus
in
networks. theory
possibility algorithm information
approaches.
it
is very
Therefore, approaches
for
a
consensual
described but
difficult it
and
is very the
neural
as follows. is somewhat
to
implement hard
to
statistics combine
neural
networks
network
is the
This
network
analogous
to
models. stage-wise
does the
the
not
use
statistical
226
for
In the
stage-wise
a fixed
number
When
training
stage
is computed.
second
The
stages
is computed
of the
based
consensual
stages.
classification
error
similar
way
and
all the
are
but
with
training
of
are
computed
consensual
neural
be the
is computed
is created
an]
than
trained
o'
the in a
non-linearly
has the
e.g., Then
is lower
for
both
weights)
error
stage
stage.
from
stage.
set
input
first
can,
networ_
another this
to the
of each
to the
original
stage-specific
neural
stage
for that
data
weights
accuracies
a new
error
input
consensus
stage-specific
classification
converges. error
the
the
(using
consensual
stage,
After
consensual
added
the
finished,
out,put
the
activities
in
the
error training
decreases.
If the
is stopped.
Testing
network
consensual
as
long
as
classification
can be done
the error
by applying
all
in parallel. consensual
different
multisource
transformed
sum
The
stage,
data.
the
decreasing,
The
first
second
classification
the stages
in
the
weighted
consensual
The
fashion
finished,
is trained
stages.
Stages consensual
the
classification
transforming
has
for the
network
procedure
is created.
classification
error
If
input
consensus
various
overall
for the
to
transformed
is not
stages.
training
in a similar
stage
from
classification
both
from
second
neural
the
non-linearly
the
on the
finished,
is trained
the
the
stage
by taking
activities
selected
by
until
has
another
stage
a single-stage
or
stage
obtained
training
of output
first
second
the
network
iterations
Then
are
When
using
of
of the
stage
vectors.
neural
neural
"sources."
network
In contrast
classification, data
which
algorithm
have
the been
to the "sources" transformed
combines data
the
sources
here
information usually
consist
several
times
of
from
referred
to
non-linearly from
the
raw
227
data. In neural networksit is very important to find the "best" representation of input data and the consensualneural network attempts to averageover the results from several input representations. Also, in the network, this method This multisource guidance linear
testing
can
attractive type
be done
between
for implementation
of consensual classification.
of weight-selection
transformation.
in parallel
neural
for
on parallel
network
However, the
all the
it
may needs
sources
and
consensual
stages,
neural
which
makes
machines.
be a desirable further selection
alternative
work
in
of the
terms best
for of non-
228
LIST
OF
REFERENCES
[11
P.H. Swain, J.A. Richards and T. Lee, '_4uttisource Data Analysis in Remote Sensing and Geographic Information Processing," Proceedings of the 11th International Symposium on Machine Processing of Remotely Sensed Data 1985, West Lafayette, Indiana, pp. 211-217, June 1985.
[2]
T. Lee, J.A. Richards and P.H. Swain, '*Probabilistic and Evidential Approaches for Multisource Data Analysis," IEEE Transactions on Geoscienee and Remote Sensing, vol. GE-25, no. 3, pp. 283-293, May 1987.
[3]
A.H. Strahler and N.A. Bryant, Classification Accuracy from Landsat Information," Proceedings Twelfth Remote Sensing of the Environment, Michigan, pp. 927-942, April 1978.
[4]
J. Franklin, T.L. Logan, C.E. Woodcock "Coniferous Forest Classification and Inventory Digital Terrain Data," IEEE Transactions Remote Sensing, vol GE-25, no. 1, pp. 139-149,
[5]
A.R. Jones, J.J. Settle and B.K. Wyatt, '_tSse of Digital Terrain Data in the Interpretation of SPOT-1 HRV Multispectral Imagery," International Journal of Remote Sensing, vol. 9, no. 4, pp. 669-682, 1988.
[6]
C.F. Hutchinson, "Techniques for Combining Landsat and Ancillary Data for Digital Classification Improvement," Photogrammetrie Engineering and Remote Sensing, vol. 48, no. 1, pp. 123-130, 1982.
[71
R.M. Hoffer, M.D. Fleming, L.A. Bartolucci, S.M. Davis, R.F. Digital Processing of Landsat MSS and Topographic Data to Capabilities for Computerized Mapping of Forest Cover Types, Technical Report 011579, Laboratory for Applications of Sensing in cooperation with Department of Forestry and Resources, Purdue University, W. Lafayette, IN 47906, 1979.
"Improving by Incorporating International Environmental
Forest Cover Topographic Symposium on Institute of
and A.H. Strahler, Using Landsat and on Geoscience and 1986.
Nelson, Improve LARS Remote Natural
229
Is]
H. Kim and P.H. Swain, '_VIultisource Data Analysis in Remote Sensing and Geographic Information Systems Based on Shafer's Theory of Evidence," Proceedings IGARSS '89, IGARSS '89 12th Canadian Symposium on Remote Sensing, vol. 2, pp. 829-832, 1989.
[91
J.A. Richards, D.A. Landgrebe Utilizing Ancillary Information Remote Sensing of Environment,
[lO]
R.M. Hoffer and staff, "Computer-Aided Analysis of Skylab Multispectral Scanner Data in Mountainous Terrain for Land Use, Forestry, Water Resources and Geological Applications," LARS Information Note 121275, Laboratory for Applications of Remote Sensing, Purdue University, W. Lafayette, IN 47907, 1975.
[11]
A. Torchinsky, Redwood City,
[12]
S. French, "Group Consensus Survey," in Bayesian Statistics Lindley, A.F.M. Smith (eds.), 1985.
[13]
K.J. McConway, The Probability Assessment: Thesis, University College,
[14]
C.G. Wagner• Probablhtms,
[lS]
R.F. Bordley and R.W. Probability Estimates," 1981.
[16]
K.J. McConway, '_Iarginalization Journal of the American Statistical 1981.
[17]
C. Berenstein, L.N. Kanal and D. Lavine, "Consensus Rules," in Uncertainty in Artificial Intelligence, L.N. Kanal and J.F. Lemmer (eds.), North Holland, New York, New York, 1986.
[lS]
M. Stone, "The Opinion pp. 1339-1342, 1961.
[19}
N. Dalkey, Studies Lexington, MA, 1972.
•
•
•
t_
and P.H. Swain, in Multispectral vol. 12, pp. 463-477,
Real Variables, Addison-Wesley California, 1988.
"Allocation, Lehrer Theory and Decision,
in
Company,
Experts' Opinions in Considerations, Ph.D.
Models, and the Consensus vol. 14, pp. 207-220, 1982.
Wolff, "On Management
the
Publishing
Probability Distributions: A Critical 2, J.M. Bernardo, M.H. DeGroot, D.V. North Holland, New York, New York,
Combination of Some Theoretical London, 1980.
Pool,"
"A Means for Classification," 1982.
of
the Aggregation of Individual Sciences, vol. 27, pp. 959-964,
and Linear Opinion Pools," Association, vol. 76, pp. 410-414,
Annals
Quality
Mathematical
of
Life,
Statistics,
Lexington
32,
Books,
230
[20]
J.V. Zidek, Multi-Bayesianity, British Columbia, Vancouver,
[21]
R.L. Winkler, Distributions", Oct. 1068.
[22]
R.L. Winkler, Information 1981.
[23]
R.L. Winkler, "The Quantification Methodological Suggestions," Journal of Association, vol. 62, no. 320, pp. 1105-1120,
[24]
R.L. Winkler, "Scoring Rules and Assessors," Journal of the American pp. 1073-1078, 1969
[2s]
C. Genest and J.V. Zidek, "Combining Probability Distributions: Critique and and Annotated Bibliography," Statistical Science, 1., no. 1, pp. 114-118, 1986.
[28]
M. Bacharach, Bayesian Church, Oxford, 1973.
[27]
C. Genest, K.3. McConway and M..I. Schervish, Externally Bayesian Pooling Operators," The vol. 14, no. 2, pp. 487-501, 1986.
[2s]
R.F Bordley, Studies in Mathematical Group Decision Theory, thesis, University of California, Berkeley, California, 1979.
[29]
K. Fukunaga, Introduction to Statistical edition, Academic Press, New York, 1990.
[30]
P.H. Swain, '_undamentals of Pattern Recognition in Remote Sensing," in Remote Sensing - The Quantitative Approach, edited by P.H. Swain and S. Davis, McGraw-Hill Book Company, New York, 1978.
[31]
J.A. Richards, Remote Sensing Digital Image Introduction, Springer-Verlag, Berlin, W. Germany,
"The Management
Technical Canada, Consensus Science,
"Combining Probability Sources," Management
Dialogues,
Report 1984.
no. 05, University
of Subjective vo[. 15, no. 2, pp.
Distributions Sciences, vol.
of
Probability B-61 - B-75,
from Dependent 27, pp. 479-488,
of Judgement: the American 1967.
Some Statistical
the Evaluation of Probability Statistical Association, vol. 64,
unpublished
manuscript,
A vol.
Christ
"Characterization of Annals of Statistics,
Pattern
Recognition,
Analysis 1986.
Ph.D.
2nd
An
231
[32]
S.J. Whitsitt and D.A. Landgrebe, Error Estimation and Separability Measures in Feature Selection for Multielass Pattern Recognition, LARS Publication 082377, The Laboratory for Application of Remote Sensing, Purdue University, W. Lafayette, Indiana, 1977.
[33]
C.E. Shannon Communication,
[34]
R.J. MeEliece, The Theory of Information and Coding, Encyclopedia of Mathematics and its Applications, vol. 3, Addison-Wesley Publishing Company, Reading, Massachusetts, 1977.
[35]
J.A. Benediktsson and P.H. Swain, Methods for Multisource Data Analysis in Remote Sensing, School of Electrical Engineering and Laboratory for Application of Remote Sensing, TR-EE 87-26, Purdue University, W. Lafayette, IN 47907, 1987.
[36]
R.F. Bordley, Assessments,
"A Multiplicative Formula for Aggregating Probability Management Science, vol. 28, pp. 1137-1148, 1982.
[37]
D.H. Krantz, Measurement
R.D. Lute, P. Suppes and A. Tversky, Foundation Vol. 1, Academic Press, New York, New York, 1971.
[38]
P.A. Morris, Management
[39]
D.V. Lindley, "Another Resolution," Management
[40]
M.J. Schervish, "Comments Judgements," Management
[41}
M. H. DeGroot, "Reaching a Consensus," Journal of the Statistical Society, vol. 69, no. 345, pp. 118-121, 1974.
[42]
P.A. Morris, Management
[43]
J.O. Berger, Statistical edition, Springer-Verlag,
I44]
S. French, '_lpdating of Belief in the Opinion," Journal of the Royal Statistical pp. 43-48, 1980.
and W. University
"An Science,
Weaver, of Illinois
The Mathematical Theory Press, Chicago, 1963.
Axiomatic Approach to vol. 29, pp. 24-32, 1983. Look at Science,
Expert
of
of
Resolution,"
an Axiomatic Approach to Expert vol. 32, pp. 303-306, 1986.
on Some Axioms for Combining Expert Science, vol. 32, pp. 306-312, 1981}.
"Combining Expert Judgements, a Bayesian Science, vol. 23, pp. 679-693, 1977. Decision Theory and Bayesian New York, 1085.
American
Approach"
Analysis,
2nd
Light of Someone Else's Society, Series A, vol. 143,
232
[45]
B.W. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, Chapman and Hall, New York, New York, 1986.
[46]
R.O. Duda and P.E. A Wiley Interscience New York, 1973.
Hart, Pattern Publication,
[47]
D.W. Scott, R.A. Probability Density Likelihood Criteria," 832, 1980.
Tapia and Estimation The Annals
[48]
T. Kohonen, Networks, vol.
[49]
B.P. Holt,
[50]
F. Rosenblatt, "The Perceptron: Information Storage and Organization Review, vol. 65, pp. 386-408, 1958.
[51]
R.A. IEEE
[52]
J.J. Hopfield, "Neural Networks and Physical Systems with Emergen Collective Computational Properties Like Those of Two-State Neurons," Proceeding of the National Academy of Science, vol. 81, pp. 3088-3092, May 1984.
[53]
S. Grossberg, "Nonlinear Neural Networks: Principles, Mechanisms, and Architectures," Neural Networks, vol. 1, no. 1, pp. 17-62, 1988.
[54}
G.A. Carpenter and S. Grossberg, "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," Neural Networks and Natural Intelligence, S. Grossberg (ed.), MIT Press, Cambridge, MA, pp. 251-315, 1988.
[55]
J.A. Hartigan, York, 1975.
[56]
T. Kohonen, Self-Organization Springer-Verlag, New York,
[57]
M. Minsky and Press, Cambridge,
"An Introduction 1, no. 1, pp. 3-16,
Lathi, Modern Digital Rinehart and Winston,
Classification John Wiley
J.R. Thompson, "Nonparametric by Discrete Maximum Penalizedof Statistics, vol. 8, no.4, pp. 820-
to 1988.
Neural
Computing,"
Neural
and Analog Communication New York, 1983. A
Algorithms,
S. Pappert, MA, 1988.
John
and Associative 1988. Perceptrons-
Systems,
Probabilistic in the brain,"
Lippmau, "An Introduction to Computing ASS,° Magazine, April 1987.
Clustering
and Scene Analysis, and Sons, New York,
Model for Psychological
with
Wiley
Neural
and
Memory,
Expanded
Nets,"
Sons,
2nd
Edition,
New
edition,
MIT
233
[58]
G.E. McClellan, R.N. DeWitt, T.H. Hemmer, L.N. G.O. Moe, ']VIultispectral Image Processing with Backpropagation Network," Proceedings of IJCNN 151-153, Washington D.C., 1989.
[sg]
S E Decatur. "Application Classfficatxon, Proceedings Washington D.C., 1989. •
•
•
°
,_
of
of Neural Networks IJCNN '89, vol 1.,
Matheson and a Three-Layer '89, vol. 1, pp.
to pp.
Terrain 283-288,
[80]
S.E. Decatur, Application of Neural Networks to Terrain Classification, M.S. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1989.
{81]
O.K. Ersoy and D. Hong, "A Hierarchical Nonlinear Spectral Processing," presented D.C., 1989.
[62]
P.H. Heermann and N. Khazenie, for Classification of Multi-Source Data," Proceedings of IGARSS Washington D.C., 1990.
[63]
J. Maslanik, J. Key and A. Schweiger, "Neural Network Identification of Sea-Ice Seasons in Passive Microwave Data," Proceedings IGARSS '90, vol. 2, pp. 1281-1284, Washington D,C., 1900. and M.E. Convention
Neural Network Involving at IJCNN '89, Washington
"Application Multi-Spectral '90, vol.
Hoff, "Adaptive Switching Record, IRE, pp. 96-104,
of Neural Networks Remote Sensing 2, pp. 1273-1276,
[64]
B. Widrow WESCON
[65]
J.A. Anderson and E. Rosenfeld Cambridge, MA, 1988.
[86]
P.J. Werbos, Beyond Regression: New Analysis in the Behavioral Sciences," University, Cambridge, MA, 1974.
[87]
D. Parker, Learning Logic, Computational Research in MIT, Cambridge, MA, 1985.
[88]
Y. Le Cun, "Learning Processes in an Asymmetric Threshold Network," Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Souli and G. Weisbruch (eds.), Springer, Berlin, 1986.
(eds.),
Technical Economics
Circuits," New York,
of
Neurocomputing,
1960 IRE 1960. M_IT Press,
Tools [or Prediction and Ph.D. thesis, Harvard
report TR-87, Center for and Management Science,
234
[69]
D.E. Rumelhart, G.E. Hinton and R.J. Williams, '_Learning Internal Representation by Error Propagation," Parallel Distributed Processing: Ezploration_ in the Microstructures of Cognition, Vol. 1, D.E. Rumelhart and J.L. MeClelland (eds.), pp. 318-362, MIT Press, Cambridge, MA, 1986.
[70]
D.E. Rumelhart, G.E. Hinton Representations by Back-propagating 536, 1986.
[71]
R.A. Rate
[72]
J.A. Benediktsson, P.H. Swain and O.K. Ersoy, "Neural Network Approaches Versus Statistical Methods in Classification of Multisource Remote Sensing Data," Proceedings IGARSS '89 and the 12th Canadian Symposium on Remote Sensing, vol. 2, pp. 489-492, Vancouver, Canada, 1989.
[73]
R.P. Gorman and T.J. Sejnowski, "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Signals", Neural Networks, vol. 1, no. 1, pp. 75-90, 1988.
[74]
E. Barnard and R.A. Cole, A Neural-Net Training Program Based on Conjugate-Gradient Optimization, Technical Report No. CSE 89-014, Oregon Graduate Center, July 1989.
[75]
R.L. Watrous, Learning Algorithms for Connectionis Networks: Applied Gradient Methods of Nonlinear Optimization, Technical Report MS-CIS-88-62, LINC LAB 124, University of Pennsylvania, 1988.
L76]
D.G. Luenberger, Addison-Wesley,
[77]
H. White, Perspective,"
[78]
W. Kan and I. Aleksander, "A Probabilistic Logic Neuron Network for Associative Learning," in Neural Computing Architectures - The Design of Brain-Like Machines, edited by I. Aleksander, M.I.T. Press, Cambridge, MA, 1989.
I79]
D.F. Specht, '_Probabilistic ICNN, San Diego, 1988.
Jacobs, "Increased Adaption," Neural
Williams, 'Zearning Nature, 323, pp. 533-
Rates of Convergence Through Networks, vol. 1, no. 4, pp. 295-307,
Linear Reading,
'_earning Neural
and R.J. Errors,"
and Nonlinear MA, 1984.
in Artificial Computation,
Neural
Programming,
Neural Networks: vol.1, pp. 425-464,
Network
(PNN)",
Learning 1988.
2nd
ed.,
A Statistical 1989.
Proceedings
of
235
[8o]
D.F. Specht, '_robabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 111 - 121, March 1990.
[81]
O.K. Ersoy, Lecture University, 1990.
[82]
J.A. Benediktsson, P.H. Swain and O.K. Ersoy, Approaches Versus Statistical Methods in Multisource Remote Sensing Data," IEEE Geoscience and Remote Sensing, vol. GE-28, no. 4, 1990.
[83]
D.G. Goodenough, M. Goldberg, G. Plunkett and J. Zelek, "The CCRS SAR/MSS Anderson River Data Set," IEEE Transactions on Geoscienee and Remote Sensing, vol. GEo25, no. 3, pp. 360-367, May 1987.
{84]
J.P. Kerekes and D.A. Landgrebe, RSSIM: A Simulation Program for Optical Remote Sensing Systems, TR-EE 89-48, School of Electrical Engineering, Purdue University, West Lafayette, IN, August 1989.
[85]
C. Lee, Classification Algorithms for High Dimensional thesis proposal, School of Electrical Engineering, Purdue 1989.
Notes,
School
of Electrica_
Engineer|ng,
Purdue
"Neural Network Classification of Transactions on pp. 540-552, July
Data, Ph.D. University,
_tj e _