Statistical Methods and Neural Network Approaches

1 downloads 0 Views 8MB Size Report
Laboratory for Applications .... Linear Programming ...... CPU Times (in Sec) for Training ...... In this application the event X _ Ixl,xe,...,xnl is n ...... 8085. 80.46. 7927. 79.17. 78.47. 77.58. 76.49. 6865. 80.06. 80.26. ___qf__pj x e I ...... Architecture.
/

/

Statistical Methods and Neural Network Approaches for Classification of Data from Multiple Sources

Jon Atli Benediktsson Philip H. Swain

TR-EE 90-64 December 1990

Laboratory for Applications Remote Sensing

of

and

School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 iwo 1-2

G5/03

37_0

Oncl as 0016,425

STATISTICAL METItODS FOR CLASSIFICATION

Jon Atli

Laboratory

AND NEURAL NETWORK APPROACtII,;S OF DATA FROM MULTIPLE SOURCES

Benediktsson and Philip TR-EE 90-64 December 1990

for Applications

H. Swain

of Remote

Sensing

and

School West

This

work

EOS-like

was supported Data"

(Principal

in part

of Electrical Engineering Purdue University I,afayel,te, Indiana 47907

by NASA

Investigators:

David

Grant

No.

NAGW-925

A. Landgrebe

"Earth

and Chris

Observation

Johannsen)

Research

- Using

Multistage

ACKNOWLED

The

authors

would

like to thank

GME NT S

Dr. Okan

K. Ersoy

for his contributions

to this research. The

Anderson

River

loaned

by

tile

Mines,

and

Resources,

was now

originally at

Canada

of the

acquired,

Colorado

SAR/MSS Centre

for

Government

preprocessed,

State

data

University.

set

Remote

was

acquired,

Sensing, of Canada.

and

loaned

Access

to

preproeessed,

Department The

Colorado

by Dr. Roger both

data

sets

and

of Energy

Itoffer,

data

set,

who

is

is gral_efu]ly

acknowledged. The Space

research

Administration

was

supported through

Grant

in part

by

the

No. NAGW-925.

National

Aeronautics

and

°°.

111

TABLE

OF CONTENTS

Page LIST

OF TABLES

.........................................................................................

vi

LIST

OF FIGURES

.......................................................................................

xv

ABSTRACT

................................................................................................

CHAPTER

1 - INTRODUCTION

1.1

The

1.2

Two Different'

Research

1.3 Report CHAPTER 2.1 2.2

Problein

STATISTICAL

A Survey

.........................................................................

Classification

Organization

2

...................................................................

of Previous

Approaches

..............................................

............................................................................ MLTttOI)S Work

.....................................................

.................................................................

Consensus Theoretic Approaches ......................................................... 2.2.1 Linear Opinion l)ools .................................................................. 2.2.2 Choice of Weights for I,inear Opinion ])()()Is .............................. 2.2.3

Linear Opinion 1'oo1,_ for Multisource Classification ..............................................................................

2.2.4 The 2.3 Statistical

Logarithmic Multisource

Opinion Pool .................................................. Analysis .........................................................

2.3.1 2.3.2 2.3.3

Controlling Reliability Association

the Influence of the Data Sources ............................ Measures ................................................................... .................................................................................

2.3.4 2.3.5

Linear Programming Approach ................................................. Non-Linear Programming Approach ..........................................

2.3.6 Bordley's Log Odds Approach ................................................... 2.3.7 Morris' Axiomatic Approach ...................................................... 2.4 Group Interaction Methods ................................................................ 2.5 The Super Bayesian Approach ........................................................... 2.6 Overview of the Consensus Theoretic Approaches .............................. 2.7

Classification

of Non-Gaussian

Data

..................................................

xvii 1 1 2 .t 6 6 9 10 l:i 20 21 25 29 31 38 39 ,12 ,t3 ,14 46 46 48 4.(}

iv

Page 2.7.1

Itistogram

2.7.2

Parzen Density g,,timation ........................................................ Maximum Penalized Likelihood Estimators ...............................

5(} 51

Discussion

.................................

52

.................................

53

2.7.3 2.7.4 CHAPTER 3.1 3.2

3.3 3.4

of Density

3 - NEURAL

Estimation

NETWORK

Methods

APPROACHES

49

Neural Network Methods for Pattern Recognition ............................. Previous Work ................................................................................... 3.2.! The Delta Rule ..........................................................................

53 58 62

3.2.2 The Backpropagation "Fast" NeLlral Ne, tworks

63 66

Algorithm ............................................... .....................................................................

Including 3.4.1 The

St:t_,istics in Neural Networks ............................................. "Probabilistic Neural Network". .........................................

68 6(J

3.4.2

Tile

Polynomial

71

3.4.3 3.4.4

Higher Order Neural Overview of Statistics

CHAPTER 4.1 4.2

Appron,ch ..................................................................

Adaline

............................................................

Networks .................................................. in Neural Network

73

Models ........................................................................................

74

4 - EXPERIMENTAL

76

Source-Specific The Colorado

RESULTS

...............................................

Probabilities ............................................................... Data Set .......................................................................

77 78

Statistical Approaches .................................................. Neural Network Models ................................................

7(9 87

4.2.1 4.2.2

Results: Results:

4.2.3

Second

Experiment

4.2.4

Results Methods

of Second Experiment: Statistical .....................................................................................

4.2.5

Results Network

of the Second Experiment: Neural Methods .....................................................................

125

4.2.6

Summary

.................................................................................

131

on Colorado

Data

.......................................

94 94

4.3

Experiments 4.3.1 Results: 4.3.2 Results:

with Anderson River Data ........................................... Statistical Methods ..................................................... Neural Network Methods ............................................

135 139 168

4.4

Experiments with Simulated HIRIS Data ......................................... 4.4.1 20-Dimensional Data ................................................................

174 175

4.4.2 4.4.3

40-Dimensional 60-Dimensional

184 193

4.4.4

Summary

Data Data

................................................................ ................................................................

.................................................................................

218

Page CHAPTER 5 - CONCLUSIONSAND SUG(IEST1ONSFOR I,'UTURE WORK ..........................................................................221 5.1 Conclusions...................................................................................... '2'21 5.2 LIST

Future

l{esearch

OF REFERENCES

Direct,ions

..............................................................

.............................................................................

225 22S

vi

LIST OF TABI,ES

Table

Page

3.1 SampleSizeRequiredin ParzenDensity Estimation when Estimating a Standard Multivariate Normal Density Using a Normal Kernel I45}.............................................................................. 72 4.1

4.2

"['raining and Test Samples for Information Classes in the First Experiment on tile Colorado Data Set ................................. Classification Minimum

4.3

4.4

Statistical

Statistical Colorado

4.6

Equivocation

4.7

4.8

4.9

Euclidean

Classification Results Minimum Euclidean

Colorado 4.5

Results

for Training Distance

is Applied

.................................

82

for Test Samples ])istance Classifier

when is Applied

.................................

82

Classification

Training

Multisource Data:

Test

when

Classifier

Multisource Data:

Samples

80

Samples Classilication

Samples

........................................................... of

..................................................................

85 86

Conjugate Colorado

Gradient Linear Classifier Applied to Data: Training Samples ..........................................................

89

Conjugate Colorado

Gradient Linear Classifier Applied to Data: Test Samples ................................................................

89

Colorado

Gradient Data:

Sources

84

...........................................................

Conjugate

of tile Data

of

Backprot)agation

Training

Samples

Applied

to

..........................................................

91

vii

Table 4.10

l'age

Conjugate

Gradient

Colorado ,t.ll

4.12

Data:

Samples

to

...............................................................

91

Test Saint)los ['or Information t';xpcrimcnt on the Colorado

Classes Data Set .............................

Classification

Results

when

for Training

li;uclidean

Classification Minimum

4.14

Test

Applied

Training and in the Second

Minimum 4.13

Backt)ropagation

Distance

Results

Pairwise

Classifier

for Test

Euclidean

is Applied

Samples

Distance

JM Distances

Samples

the

................................

96

................................

96

when

Classifier

Between

is Applied

10 Information

Classes

in the Landsat MSS Data Source (Maximum Sel)arability is 1.41421.) .............................................................................................. 4.15

Statistical

Multisource

Classification

98

of Colorado

Data when Topographic Sources were Modeled by ! tistogram A pproac}l: Training Samph's .............................................. 4.16

Statistical

Multisource

Classification

!}5

1{):_

of Colorado

Data when Topographic Sources were Modeled by Histogram Approach: Test Samples .....................................................

104

4.17

Lqu]voeatlon

l()f;

4.18

Equivocation of Topographic Respect to Different Modeling

4.19

4.20

4.21

of MS,

Data

Soure(, ....................................................... Data Sources with Methods ................................................

106

Linear Sources

Opinion Pool At)plied to Colorado Data were Modeled by llistogram Approach:

Set. Topographic Training Samples

Linear Sources

Opinion Applied to Colorado were Modeled by Itistogram

Topographic Test Samples

Statistical when

Multisource

Topographic

Likelihood

Method:

(;lassification

Sources Training

Data Set. Approach: of Colorado

were, Modeled Samples

...... 108

.............

109

Data

by Maximum

Penalized

.................................................

11'2

°°o

Vlll

'Fabh_ 4.22

P age

StaListical

Multisource

when Topographic Likelihood Method: 4.23

Linear when

Opinion

4.24

Pool

Method:

Statistical

Method:

Test

Multisource

Statistical

Linear when

Linear

Pool

4.30

4.31

Source-Specific Classification):

Penalized

................................................. Data Set by Maximum

of Colorado

of Colorado

Pool

to Colorado were Modeled

Training

when Topographic Density Estimation: 4.29

Applied

Applied

Samples

Data

Colorado

Data:

Training

Classifier Samples

120

Set

by Parzen

Data

Applied

123

Set

CPU Time (Training Plus Landsat MSS Data Source ..........................................

Linear

119

Data

Sources were Modeled by Parzen Test Samples ........................................................

Gradient

117

Data

Source-Specific CPU Times (Training Plus Classification) for Topographic Data Sources with Respect to Different Modeling Methods ........................................ Conjugate

116

Penalized

.................................................

to Colorado

113

Set

........................................................

Classification

Sources

Estimation: Opinion

Samples

Data

by Maximum

Sources were Modeled by Parzen Test Samples ........................................................

Topographic

Density 4.28

Samples

Classification

Multisouree

Opinion

Data

Sources were Modeled by Parzen Training Samples .................................................

when Topographic Density Estimation: 4.27

to Colorado were Modeled

Training

when Topographic Density Estimation: 4.26

Applied

Sources

Linear Opinion Pool Applied to Colorado when Topographic Sources were Modeled Likelihood

4.25

of Colorado

Sources were Modeled by Maximum Penalized Test Samples ........................................................

Topographic

Likelihood

Classification

124

126

126

to

.......................................................

127

ix

Table 4.32

.t.33

I 'age Conjugate Colorado Conjugate Neurons

4.34

Conjugate Neurons

4.35

Conjugate Neurons

4.36

Conjugate Neurons

4.37

4.38

Gradient Applied

Gradient Applied

Applied

Training

Backpropagation

with

Data:

with

Data:

Training

Backpropagation

to Colorado

with

l)ata:

Test

Conjugate

Gradient Applied

Samples

with

Data:

Test

.......................

129

8 Hidden ..............................

129

16 Hidden Samples

.......................

J30

16 tlid(ten

Samples

..............................

with 32 Hidden Training Samples

Backpropagation

to Colorado

127

8 tlidden

Test, Samples

Baekpropagation

to Colorado

Gradient

with

Data:

to Colorado

Gradient Applied

Backpropagation

to Colorado

Conjugate Gradient, Backpropagation Neurons Applied to Colorado Data:

Neurons 4.39

Gradient Linear Classifier Applied to Data: Test Samples ..............................................................

.......................

1:_0

132

32 ttidden Samples

..............................

132

Information Classes, Training and Test Samples Selected from the Anderson River Data Set .........................................

137

4.40

Pairwise

JM Distances:

ABMSS

..................................................

138

4.41

Pairwise

JM Distances:

SAR

Shallow

Data ..........................................

138

4.42

Pairwise

JM Distances:

SAR

Steep

4.,13

Classitlcation tlesults ot' Training Samt)les for the Anderson River l)at,a Set when the Minimum ]"uclidear_ Distance Method and the Maximum Likelihood Method Gausslan

4.44

Data

are Applied

Classification l{esults Anderson River Data Distance Method and Gaussian

Data

Data

Data .............................................

are Applied

for

..................................................................

of Test, Samples for the Set when the Minimum Euclidean t,he Maxirnurn Likelihood Method

138

143

for

.................................................................

1,13

X

Table 4.45

4.46

Page Statistical

Multisource

River

Data:

were

Modeled

Statistical

Classification

Training

Samples.

by tlistogram

Multisource

of Anderson

Topographic Approach

...............................................

Classification

River Data: Test Samples. were Modeled by tlistogram

Sources

of Anderson

Topographic Sources Approach ...............................................

The

Equivocations

Sources ...............................

149

4.48

The with

Equivocations of the Non-Gaussian Data Sources Regard to the Three Modeling Methods Used .............................

149

Linear

Opinion

Pool

Anderson River Data: Sources were Modeled 4.50

Linear

Opinion

Pool

Anderson River Data: Sources were Modeled 4.51

Statistical

Multisource

Applied

Data

147

4.47

4.49

of the Gaussian

146

in Classification

of

Training Samples. Topographic by Histogram Approach .................................. Applied

in Classification

Test Samples. by Histogram

of

Topographic Approach ..................................

Classification

Statistical

Multisource

Classification

River Data: Test Samples. were Modeled by Maximum 4.53

Linear Opinion Anderson River Sources

4.54

Linear

were Opinion

Statistical

.................

155

.................

156

of Anderson

Topographic Penalized

Sources Likelihood

Method

Pool Applied in Classification of Data: Training Samples. Topographic

Modeled Pool

Anderson River Data: Sources were Modeled 4.55

152

of Anderson

River Data: Training Samples. Topographic Sources were Modeled by Maximum Penalized Likelihood Method 4.52

151

Multisource

by Maximum Applied

Penalized

in Classification

Test Samples. by Maximum Classification

Likelihood

Method

..... 158

Method

..... 159

of

Topographic Penalized Likelihood of Anderson

River Data: "['raining Samples. Topographic were Modeled by Parzen Density Estimation

Sources ......................................

161

xi

Table

4.56

4.57

] ) age

Statistical Data:

were

Modeled

4.59

4.60

Classification

Samples.

by Parzen

4.63

Source-Specific Classification

Estimation

......................................

162

Density

[';stimation

.........................

164

.........................

1{;5

CPU Times (ill Sec) for Training Plus of Gaussian Data Sources ..............................................

167

Source-Specific CPU Times (in Sec) for Training Plus Classification of Non-(laussian l)ata Sources

Conjugate

to Different Gradient

Modeling

I,inear

Methods

.......................................

167

Classifier

of the Anderson Samples ......................................................

Conjugate Gradient I,inear Classifier Applied in Classification of the Anderson River Data Set: Test Samples .............................................................

170

170

Conjugate Gradient Backpropagation Applied in Classification of the Anderson Data

Set,: Training

Samples

......................................................

171

Conjugate Gradient Backpropagation Applied in Classification of the Anderson River

4.65

Sources

Linear Opinion Pool Applied in Classification of Anderson River Data: Test Samples. "l'opogral)hic Sources were Modeled by l'arzen Density l_;stimal,ion

River 4.64

Density

by Parzen

Applied in Classification River Data Set: Training 4.62

of Anderson

Topographic

Pool Applied in Classification (ff Data: Training Samples. Topographic

were Modeled

with Regard 4.61

Test

Linear Opinion Anderson River Sources

4.58

Multisource

River

Data

Pairwise Simulated

Set: Test

Samples

.............................................................

JM 1)istances for the 20-Dimensional tIIRIS Data ........................................................................

171

176

xii

Table

Page

4.66 Minimum EuclideanDistanceClassifierApplied to 20-DimensionalSimulatedHIR1SData: Training Samples..................177 4.67 Minimum EuclideanDistanceClassifierApplied to 20-DimensionalSimulatedHIRIS Data: Test Samples........................177 4.68 Maximum Likelihood Method for GaussianData Applied to 20-DimensionalSimulatedHIRIS Data: Training Samples............. 178 4.69 Maximum LikelihoodMethod for GaussianData Applied to 20-DimensionalSimulatedHIRIS Data: Test Samples....................178 4.70 ConjugateGradient BuckpropagationApplied to 20-DimensionalSimulatedHIRIS Data: Training Samples............. 179 4.71 4.72

Conjugate Gradient Backpropagation to 20-1)imensional Simulated HIRIS Conjugate

Gradient

to 20-l)imensional

4.73 4.74 4.75

4.77 4.78

Classifier

Simulated

HIRIS

Conjugate Gradient Linear to 20-Dimensional Simulated Pairwise

JM Distances

Data:

Training

Samples

Samples

Minimum

Euclidean

Distance

179

.............

....................

180

180

40-Dimensional

H1RIS Data ........................................................................

Simulated

....................

Applied

Classifier Applied HIRIS Data: Test

for the

Samples

Simulated

40-Dimensional

4.76

Linear

Applied Data: Test

Classifier

HIRIS

Data:

Minimum Euclidean Distance Classifier 40-Dimensional Simulated HIRIS Data:

Applied Training

185

to Samples

Applied to Test Samples

..................

........................

Maximum Likelihood Method for Gaussian to 40-1)imensional Simulated HIRIS Data:

Data Applied Training Samples

Maxinmm Likelihood Method to 40-Dimensional Simulated

Data Applied Test Samples ....................

for Gaussian H1RIS Data:

.............

186

186

187

187

XII!

Table 4.79

4.80

4.81

4.82

4.83

|)age

Conjugat, e Gradient Backpropagation to 40-Dimensional Simulated ItIRIS

Applied Data: Training

Conj ugate G radient Backpropagation to 4()-l)im(msional Simuh_ted I IIRIS

A pt)lied l)ata: 'l'rst

Samples

.............

188

S:/u_t)l(,, ....................

I88

Conjugate Gradient l,in(,ar Classifier Applied t,o 40-Dimensiot_al Simulated IIII{.IS Data: Training Sanlt)lr,_ ..................

IS!)

Conjugate Gradient Linear Classifier Applied to 40-1)itn(,nsion:d Simulate(t I llI{IS I)ata: Test Sanlples

I S(.I

........................

Minimuin Euclidean Distance Classifier 60-Dimensional Simulated tlIRIS Data:

Applied Training

Minimum Euclidean Distance Classifier 60-Dimensional Simulated HIRIS Data:

Apt)lied to Test Samples

4.85

Pairwise

JM Distances

forl)ata

Source

://'1........................................

1:_8

4.86

Pairwise

.IM l)istances

for Data

Source

//2 ........................................

1:_8

4.8,t

4.87

Statistical H[RIS

4.88

Statistical HIRIS

4.89

I)ata

Statistical I-]IRIS

4.92

Data

Statistical ltlRIS

4.91

Data

Statistical HIRIS

4.90

Data

Data

Statistical HIRIS

Data

Mult,isource (100 Training Multisource (200 Training Multisource (300 Training Multis(mrce

Classification and

and

and

(,t00 '"[rai "r"_mg and Multisource (500 Training Multisource. (600 Training

and

Samples

200

Samples

per ('?lass) ...............

201

Samples

per (7:lass) ...............

202

Samples

per (.:lass) ...............

203

of Simulated

175 lest,

Samples

Classification

of Simulated

and

Samples

75 Test

per Class) ...............

of Simulat(_d

275 Test

Classitication

196

of Simulated

375 Test

Classification

........................

1!16

of Simulated

475 Test

Classitlcation

.......

of Simulated

575 Test

Classification

to _ amples

per (,lass)

................

per Class) ................

04

205

xiv

Table

Page

4.93 Source-SpecificEquvivocationsfor Simulated HIRIS Data Versus Number of Training Samples...............................206 4.94 Source-SpecificJM Distancesfor Simulated HIRIS Data VersusNumber of Training Samples..........................................206 4.95

Linear HIRIS

4.96

l.inear HIRIS

4.97

Linear HIRIS

4.98

Linear HIRIS

4.99

Linear HIRIS

4.100

Data

Data

Conjugate 60-I)imensional

4.102

4.103

and

Pool

Pool

Pool

Gradient

and

475 Test

and

375 Test

Samples

Applied

275 Test

Applied

175 Test

75 Test

I_ackprol)agati(m llll(IS

Data:

Conjugate

Conjugate 60-Dimensional

Gradient

Linear

Simulated Gradient

Linear

Simulated

Classifier ttlRIS

Data:

Classifier HIRIS

Data:

per Class) ...............

Samples

per Class) ...............

per Class) ...............

210

211

of Simulated

Samples

per

Applied

to

Training

Applied

Class) ................

Samples

..................

........................

212

215

215

to

Training

Test

209

of Simulated

Samples

Applied

208

of Simulated

in Classificat,ion

and

per Class) ...............

Samples

in Classification

and

207

of Simulated

in Classification

and

per Class) ...............

of Simulated

in Classification

Applied

Simulated

Samples

in Classification

Applied

(600 Training

575 Test

of Simulated

Conjugate Gradient Backpropagation Applied to 60-Dimensional Simulated IIIRIS Data: Test Samples

60-Dimensional 4.104

Pool

(500 Training

Opinion Data

in Classification

Pool Applied

(400 Training

Opinion Data

Training

(300 Training

Opinion Data

Applied

(200 Training

Opinion Data

Pool

(100

Opinion

Linear HIRIS

4.101

Opinion

Samples

..................

217

to Samples

........................

217

XV

IJST

OF I,'l(;[_R1,;s

Figure

l)age

2.1

Schematic

Diagram

of Statistical

3.1

Schematic

Diagram

of Neural

Network

Training

3.2

Schematic

Diagram

of Neural

Network

Used

of Image. 4.1

4.2

Procedure

...................

32 55

for Ciassiticatior_ 56

93

Class Histograms of Elevation Data in the Colorado Set ..........................................................................................................

99

of Slope

4.4

Class

of Aspect

4.5

Summary of Best on Colorado Data

4.8

.........................

Summary of Best Classitication Results for First Experiment on Colorado Data ....................................................................................

Class Histograms

4.7

Classifier

1)at, a .........................................................................................

4.3

't.6

Multisource

Histograms

Da_a Data

in the in the

Colorado Colorado

Data

Data

Set ....................

100

Set ..................

101

l)at:_

Classification Results For Second t,;xp(,riment ..................................................................................

IZa

Class liner

llistograms of Eh_vation l)ata in the A[,d(,rsort Data Set .... : .................................................................................

1,111

Class River

Histograms of Slope Data in the Anderson Data Set .......................................................................................

14 1

Class River

Histograms of Aspect l)ata in the Anderson Data Set, ......................................................................................

1.t2

xvi

Figure

Page

4.9 Summary of BestClassificationResultsfor Experiment on AndersonRiver Data............................................................................. 17.q 4.10 Classificationof Training Data (20 Dimensions)..................................182 4.11 Classificationof Test Data (20 Dimensions)........................................183 4.12 Glassiticationo1'Training Data (40 Dimensions)..................................190 4.13Classificationof Test Data (40 Dimensions)..........................................191 4.14 Classification

of Training

4.15 Classification

of Test

Data

Data

(60 Dimensions)

(60 Dimensions)

...................................

..........................................

194 195

4.16 Global Statistical Correlation Coefficient Image of HIRIS Data Set .................................................................................

197

4.17 Statistical Methods: Time versus Training

Plus Classification Size ........................................................

214

4.18 Neural Network Models: '['raining Plus Classification Time versus Training Sample Size ........................................................

216

Training Sample

xvii

ABS'I'I(A('/I'

Statistical (e.g.,

methods

Landsat

and

MSS

compared

multivariate

statistical that

the

sources.

data

methods

is that

statistical

the need

focuses

method

ot" statistical

to

methods

this

research

are

automatically

take

source

have.

should

a very and

care

[ong

of the

On the time.

investigated.

hay(,

and

for t,lm classes

iJ_

statistical

consensus

classification

on

neurnl no

to speed

F,xperimental

how traininfa up

that

the

but

Hlo>l

this.

This

problems:

:_

[(eliability

arc

suggested

and

network

models.

The

i_ an

The

results

for

theory.

prior

This

involving their

means

these

Jlletho(ts

methods.

hand,

This

ov(_rcomc

in these

is need(_,d.

Methods

is

:_ m('(:hanism

can

since

problem

other

with

which

investigated

types

1.o {.h(,ir rclial)ility

not

sources

conventional

;recording

focuses

classification

using

reliat)le.

distribution-fr(_e

statistical

are

not equally

sources

or t,he data

with

data

of (t:_ta of multiple

problem

analysis

Secondly,

data)

casT, or be ;_ssum_'d

methods

data

most

introduced

do

multisource

distribution

take

wcighte(l

t,he

networks

statistical

are

multiple

topographic

for classification

common

on statistical

from

A problem

distribution

sources be

and

models.

for weighting

investigated.

can

data

of data

data

network

Another

research

neural

radar

approaches

classification

measures

classification

a multivariate

sources

over

data,

to neural

in general

data

for

knowledge obvious

neural much

oF

:_dvantag('

networks weight

t ht,

e:tch

also data

process

is iterative

and

th(_ training

procedure

are

using

|)olh

of classification

xviii

neural network modelsand statistical methodsare given, and the approaches are comparedbasedoil theseresults.

CHAPTER 1 INTRODUCTION

1.1 The Research Problem Computerized information extraction from remotely sensed imagery has been applied successfullyover the last two decades. The data usedin l,hv processinghaw, mostly been inultispectral

Within

the

last decade advances in space

technologies

have

Earth

its environment.

and

only

spectral

radar

data

this reason retarding

made

data and

but

may

the same

accuracy

classification data.

This

be modeled nmltit, ype.

to amass

The

(iat, a are

include,

methods is due

These

classiticai, cannot

to several

by a convenient They

information

can

now

reasons.

the

data,

typically cover

mull.isource

maps, For

data. and

g¢_l_

lrlu[tivariale

multisource

eleval.ion

riot,

data.

i,formati()n

model

the

differe, rlt. sources

in processing

stat, isCical

be spectral

from

about

slope

t,h(', conv(qltional

is that

COIllt)llt('r

ground

Iilorc

satisfactorily One

multivariate

for exalnple

to exl, ract

llowever,

be used

(':died

more

and

da(,a

alia

of data

and

maps,

of

are collectively

ion.

more

as elevation

kinds

data.

amounts

forest

such

many

to use, all these in

large

for example,

be available

scene.

It is desirable higher

it possible

topographic

there

st,:diMical

(multNariate classificalion) met,hods are now widely

pa_tern recognition known.

tim

dal,a atnd

since

t.ultisourc(, data

cannot

_he (]aLa, art'

ra.ng(,s

and

cwm

2

non-numericaldata such as ground cover classesor soil types. The data are alsonot necessarilyin commonunits and thereforescalingproblemsmay arise. Another problemwith st,:d,isticalclassillcationmethodsis I,hat the dal,:_,_ninfori,mtiv(_.

al encoding

(A (::_[i{>ratod

his beli(,f_ ;_s t)robnbiliti(,_.)

45

Axiom

C:

If the

decision

calibrated Axiom it.

By

expert's A places

applying

processing

experts'

opinions.

C(X) where

A

eel(X)

the

decision what

-- k eel(X)

decision

the

expert

is also the

weights. geographic

the

axiom

Axiom

decision

and

experts

determine

of

B,

the

form

a

of the

C is equivalent

maker

must

the

also

axioms

in effect

calibrate results

the in

a

(2.46)

eel(X)

are

to the

the

is a calibration

all calibrated

and

data

concept

But

be

of

A is unsatisfactory

to

outcome

ignore

be

the

function

which

independent,

then

equal

[40] has of the

This

to his showed

estimated

demonstrates

aad

becomes

that

of the

own

the

The

decision the

sources

(the

when

pool

remote

data

are

of

are

maker

method

multisource sources

expert

The

are

opinion

case

axioms

(2.45).

that

extreme

regardless

rule

noting data

an

prior

that

a logarithmic

classification

it can be assumed

the

of

that

processing

it is al:;o worth

in the

opinion

in t lis approach.

opinions.

approach case

axiom

decides

v(,ry imt)ortant

easily

axiomatic

that

Schervish

in spirit.

can

In

maker

experts'

Bayesian

Morris'

does not

but

• • • pn(X)p(%)

constant

states).

due

calibration

functions

adopt

experts:

pl(X)

argued

makes

contradictory

truly

determined. The

rule

with

application

If the

[39] has

maker

calibrate

he should

1.

Lindley when

prior,

processing

conjunction

for multiple

empirically.

=

in

B together.

k is a normalization

is defined

on the

Sequential

rule

noninformative

as his own.

be completely

A and

multiplicative

a

a condition

can

axioms

has

prior

axiom

rule

to both

maker

the

self-

issue

of

must is not density

independent, with sensing

independent

equal and but

46

not equally reliable. Therefore, the logarithmic opinion pool with weights

2.4

is a more

Group

using

the

in

method

can

it is difficult,

other

sources neural

feedback thus

Super

for some a

probabilistic unknown llayesian expert

as ad

approach, are

all

hoe.

problems

evaluate

the. ln(:thod in Chapter

ion

This

is natural for the

the

They but

3.

and

the

for

weights Although

it is hard geographic

to

data,

the

performance,

has

some

The

neural

networks

use

I)eGroot

method

will

The

consensus

approaches

also

out

means

of

the

of the

parallels

the

maker.

on

discussed expert

witch

joint called

the

whys.

do Th('y

combined distributions the

super

:_ssunlpLioIl

Therefore,

above

weights

ill-(teiin(_d

situation,

:tpproach,

is basc]/4il°g(--),°io

identical) approach

where

oi

to

(2.47)

Bordley's

log-odds

is completely

equipped

_'i is a function

of m_,j, m0q c

48

and _. However,there is very little empirical evidenceavailable to determine the super Bayesian's choice of the jointly normal distribution of X. The dependencebetweenthe data sourceshas to be modeledand that problem is very diftlcult especially ill classitica/,ionof" mult,isource rvlltote sensing :rod geographicdata. As noted earlier, we are usually either unable or unwilling to model this dependence. Therefore, the super llayesian approach is in cases

not

applicable

2.{} Overview The

of the

consensus

characteristics.

theoretic

e.g.,

The

give

externally

because

multimodal

consensus pool

and

Section

2.3.1

will be used.

will be used The opinion statistical nor

the

the

statistical pool.

Both

nmltisourc_ of the

multisouree group

implementation

reliability

of reliability

classitier

interaction difficulty

a

rule

Bayes'

and

opinion

in Chapter classifier

is

will

pool

gives

4, the

linear

introduced in Section

in 2.3.2

experiments. of

by Bordley

above.

will be used methods.

when

version

proposed

as discussed

for these

is

several theorem

introduced

in the

has

impossibility

linear

measures

,'lassilb,c

methods

the

different

shortcomings

multisource

factors

approaches

these

experiments

of statistical

The

the

and

dictatorship

whereas

have

simple

and

ow.'rcomes

In the

above

is very

Bayesian

densities

version

for selection

pool

pool

densities.

opinion

discussed

of source-specific

opinion

here.

Approaches

approaches

it is not

consensus

discussed

Theoretic

opinion

logarithmic

unimodal

problem

linear

its application

used.

research

Consensus

The

shortcomings, limits

to the

trmst

Neither

the are

the

in experiments

logaritIimic related

super

to the Bayesian

because

of the

49

In order to apply the consensustheoretic approachesall the data sources have to be modeled by probability densities. Some data sourcescan be assumedto have Gaussian Gaussian

and

these

methods.

Such

methods

2.7

handle

are

part

problem

of non-Gaussian

following

three

a

simplest

histogram way

the

Parzen

density

estimator.

Several

other

e.g.,

nearest

neighbor

functions

and

addressed

here,

2.7.1

Histogram The

the In

simplest

traioing this

data.

method

['l,['2,...,l'N, by

th(, proportion

classifying

estimation

section.

approaches

have

series

histogram

Two

more

maximum

been

reviewed

density

estimators.

methods

discussed

efficiently.

field.

advanced

in the

should

the

is the

methods

penalized

the

In

approach

estimation

For

below

data

is to

will be addressed.

The

the

estimation,

classifier

research

of modeling

data. and

multisource

established

is discussed.

density

will be non-

density

non-Gaussian

approaches

estimation

three

next

sources

by

a statistical

is a well

main

orthogonal the

and

non-Gaussian

addressed:

modeled

in the

of designing

approach

to model

be

other

Data

data

sections

All of the

to

discussed

of modeling

Modeling

First

need

of Non-Gaussian

important

the

classes.

sources

Classification A very

data

are

likelihood

literature using

research

[45], weight

problem

be sufficient.

Approach way

to model

Here the whose

non-Gaussian

a fixed

data

space

volumes

of samples

cells

data

histogram

approach

is partitioned are

which

equal.

into

The

fMls into

is to use the

density

each

cell.

[29,45] mutually

histograms is described. disjoint

function When

of

cells

is estimated the

data

have

50

beenmodeledby the histogram approach, they call be classi[iedby, e.g., the maximum likelihood algorithm [45]. Th,' histogram approach is distribution-free and, if regular meshesare usedfor the I"s, tile selectionof cellsis straightrorw:lx_l, l low(,w'r, one mnjor disadvantageof this method is that it requirestoo much storage;for example, Nk cells for k variables with N sc,ctionsfor each variabh's. Therefore, most modificationswhich have beenprogosedare designedto reduce the number of cells.The variable cellsmethod [29] Although univariate more

the

data,

advanced

histogram

methods.

do a good job

is one

commonly

histogram

penalized

2.7.2

d is the

(ialled

the

density

kernel

with

the

K can

condition:

data.

Another da, ta

job

in terms

to use

multivariate

estimator N

K(

dimensionality

smoothing

upon

desirable

univariate

a good

more

t'arzen

modeling

of accuracy general

density

method

which

is

m_t, hod

lho

of

by

methods estimation

irnproves of

upon

maximum

Estimation

1

where

does

improved

method.

for

variant.

estimators.

Density

t'arzen

The

such

such

usually

is also

of modeling

used

likelihood

Th,'

It

approach

Parzen

approach

it can be significantly

which

the

is one

with

X -- Xi

of the

parameter. be of any

kernel

K is de,fined

) data

( 48) and

cr is the'

N is _h(' nLnrt,b,'r shape

by [29,45,46]:

(rectangular,

window

(,f t,rainiri_

triangular,

width, samples,

Gaussian,

also X i. etc.)

51

f K(X)

dX = 1

(2.49)

R,I

If the kernel

K is both

a density

function.

function

and

properties

the

estimate of the

2.7.3

The

The

likelihood

estimate

e.g.,

roughness

of

functional

R(f).

applying the

curve

tails

problem,

K is

density

Apart choice

Likelihood

from

estimation class

methods with f.

directly

of densities

the The

from

entire

estimates. detail

longsample

Also,

in the

drawback,

the

non-Gaussian

estimator

As pointed

relating

the

applied.

if

main Parzen

data.

Estimators

Thi_' particular f.

of the

for modeling

likelihood

curve

across

this

and

to data

essential

of a one-dimensional

on the

However, by

then

differentiability

studied

applied

is fixed

in the

this

desirable

of observations.

restrictions

and

widely

when

width

be lost.

penalized

likelihood

maximized.

to avoid can

for a particular

maximum

continuity

been

drawback

appearing

Penalized

sample

(2.49),

t3(X) will be a probability

the

has

window

to noise

is a very

linear

random

placing

[45].

maximum

piecewise

estimator a slight

distribution

Maximum

all

satisfies

K.

smoothed

estimator

and

that

inherit

from

leads is

non-negative

from this

also

density

it suffers

this often

density

used,

will

distributions

lobe

use

t3(X)

Parzen

However,

and

It follows

of the kernel

The

tailed

everywhere

function

method

tries

which

term

for

the

term can

a given

is to be can

quantifies described

to

without

likelihood which

the

possible

likelihood

be

a

to maximize

estimation

maximum a

computes

in [45] it is not

for density

to the

roughness

density

out

over

likelihood

[45,47]

be the

by

a

52

The penalizedlog-likelihood is now definedby: N

b(f) = E log r(x,) -vR(0 b=l

where

Y is a positive

The

probability

smoothing

density

approach

is attraetiw_

Also,

approach

fit.

the The

2.7.4

the is

improv,xt

upon

approach

has

curve accurate

in

Parzen which

can

Parzen size

methods

the

balance

for

shouht method of data all three

of

multivariate

gen¢_ralize has

the

drawback

to be classified will be used

than

than that

in the

the

To

experiments

can

slow

explore

method

maximum

penalized

should

with be more

approach. estimation

Also, method.

Parzen

this

the

differences 4.

The nmthod density

Itowever,

and

in (:haptcr

thai,

estimation

histogram d_msity

b_"

'l'h_' histogr:m_

density

histogr;ml

it is very

is large.

met, hod

likelihood

estimation.

the

histogram

this method

estM_lished

density

better

goodness-of-

the

data.

The

it combines properties

well

this

of test

data.

is a w_ry

This

estimation.

and

here,

discussed

for univariate

data

!,i5].

Methods

penaliz_d

test

l-(f)

to density

Ilowever,

since

of s;,mph_._.

effects.

accuracy

of its smoothing

classification

t,mximizing

maximm,l

is attractive

numb(,r

smoothness

undesirable

of classification

effective

})e used

between

st_raight-forward.

are most

met.hod

by

mebhods

the

N is the

estinmtion

Estimation

with

Because

curve

predefines

in common

density

estimation

most

estimation

fitting.

it relates

estimation

in terms

methods

likelihood

the

the

and

15 is found

of Density

density

approach

these

since

penalty

Discussion Of

function

controls

roughness

parameter

the

is a probh'n_ of these

if

53

CHAPTER 3 NEURAL NETWORK

APPROACHES

Neural networks for classificationof multisource data are addressedin this chapter. The chapterbeginswith a generaldiscussionof neural networks used for pattern recognition, followed by a discussionof well-known neural network models and previous work on classificationof remote sensingdata using neural neLworks. Next "fast" neural network models are addressedin conjunction with classificationof multisource remote sensingand geographic data. Finally, methods to implement statistics in neural networks are discussed.

3.1 Neural Network Methods for Pattern Recognition A neural network is an interconnectionof neurons,wherea neuroncan be describedin the following way. A neuron hasmany (continuous-valued)input signals xj, momentary input

{48].

frequency

j-

1,2,...,N,

which

frequency In the

of neural

simplest

of the neuron,

imimlses

formal o, is often

N

o =K

- 0) J

repr(_s(,_nt

1

model

the

act,ivity

delivered

tim

by another

of a neuron,

approximated

at

the output

by a function

input neuron value

or

the

to this or the

54

where

K is a constant

for positive synaptic

arguments

e1_cacies

In

the

network

idea

neural as

where

followed

neuron

number.

x the

process

the

with

compared output

and

the The

i.e.,

the

when change

actual

less

than

to perform

the

representation

three-layer Data models.

neural

do not

representation A straightforward

the

sample.

classifier is very coding

from

and

through

for each is shown

important approach

weights network

data

are

provides

A schematic

in Figure

if it is

and

for a

of its

class

adaptive to

network

response

is

the

desired

in the

neural

has

to the

network

stabilized,

next fed

iteration into

at the

the

output

diagram

of a

3.2.

in application used

o i ----- 1, if

The

between

the

pixel.

3.1).

error

the

i (i

is presented

output

the

x

A general

an

actual

one iteration

the

neurons

vectors,

samples

The

Then

neural

vectors

representation

The

when

input

are either

weights

to modify

the

classes).

as binary

input.

amount.

of a number

1

wj are called

output

(see Figure

is ended

tile classification,

value

or o i ---- 0 (or -1)

a set of training

is used

a threshold

network

learn

change

x,

a binary

for the

procedure

weights

network class

output

training

vector are coded

to each

response

its

of

the outputs

representation

response

to the desired

network.

or

particular

o i from

is that

in which

a set

of information

give

to

The

the

recognition

receives

input

outputs

takes

arguments.

pattern

number

values

is then

some

to

theory

signal

procedure

an output

the

which

0 is a threshold.

which

current

vector

training

will give

for the

function

responses

network

the

(iterative) input

on

means

The

box

produces

in neural

input

and

approach

a black

L depends

This

specific

0 (or -1) for negative

and

i is active

inactlve.

and

network

signals)

_---1,...,L

¢ is a nonlinear

[48] or weights,

operates

(observed

and

by most

of neural researchers

network is to

55

O

m

ro

"OI--

_e..,

G) .0.-,

,

O

20

o 575

475 Test

Figure 4.11

375

275

175

75

Samples/Class

Classification

of Test

Data (20 Dimensions)

184

The as the

CGLC

CGBP

(Tables

trained

with

The

CGLC

did rather

was

perfect

but

data,

with

it showed

consuming

The

classify

the

data

4.4.2

40-Dimensional

the

(M],

4.81

(CGLC

required the

data

classification

shown

4.78

(ML

and

4.75

result

algorithm

was,

as expected,

times

when

longer

CGLC

more

way

neurons).

For

the

is not

Thus

to

the

it

test

as time

neurons

time

in the

train

and

CGLC

is a

in this experiment.

and

of the

in training

4.75

(MD training),

4.70

and

(CGBP

test).

4.13

(test).

The

4.76

(test))

The

results

20-dimensional

were

was data.

all other

(Tables

4.77

classification

instead than

4.74.

added

classification 4.76 4.80 are

to

of the

(MD

test),

(CGBP

test),

also summarized of

very

the

similar

MD

to

Classification

were

than

used

were

performance

20 dimensions

faster

for

training),

(CGLC

in Table

20 features

in Tables

ML method

40 dimensions

when results

of 2 when much

as shown

The

and

the

separable,

4.4.1.

test),

4.82

using

a factor

increased

same

per class

worse.

hidden

experiment.

data

samples

The of the

in the

(240 input

did

7 times

increased

(training)

about

accuracy

in this

in Section

increased

The

1.5 to

JM distance

(training)

(Table

data

CGBP.

are relatively

are

training)

algorithm

from

was

100 training

(because

20-dimensional

data

4.12

CGBP

input

it always

to the

CGLC

data

average

training),

Figures

similar

size

(test))

Data

20-dimensional

4.77

sample

as the

for the

40-dimensional

in

With

40-dimensional

Predictably the

well in training.

than

alternative

4.73

binary

increased

CGBP

and

Gray-coded

training

better

The

(training)

performance

during

CGBP).

4.72

added,

the time

but

the

MD

methods.

(training)

of 20, but for the

and

4.78

it took

20-dimensional

(test))

about

3.9 data.

185

Table Pairwise

JM Distances Simulated

Class

_

1

4.74 for the 40-Dimensional HIRIS Data.

2

3

1.41189

1.40192

2

Average:

I 1.32275

1.378855

186

Table Minimum 40-Dimensional

Euclidean Simulated

# of Training Samples 100 200 300 400 500 600

CPU Time 4 4 4 4 4 4

Distance HIRIS

Percent 1

# of Training Samples 100 200 300 400 500 600

Classifier Applied to Data: Training Samples.

Agreement 2

84.0 84.0 85.0 84.0 85.4 84.8

with Reference for Class 3 II OA

47.0 49.0 50.0 54.8 51.8 50.3

Table Minimum 40-Dimensional

4.75

56.0 55.0 59.3 60.3 61.8 59.8

4.76

Euclidean Distance Classifier Simulated HIRIS Data:

Percent 1 83.1 82.9 83.5 80.4 73.7 80.0

62.33 62.67 64.78 66.33 66.33 65.00

Asreement 2 48.9 48.2 51.2 44.4 46.3 58.7

Applied to Test Samples.

with Reference for Class 3 [I OA 58.8 59.6 58.1 69.8 68.6 49.3

63.59 63.58 64.27 64.85 62.86 62.67

187

Table Maximum to

Likelihood

40-Dimensional

#

Method Simulated

of Training

Samples

4.77 for

Gaussian

HIRIS

CPU

Percent

Time

1

Data:

Agreement

Data

Applied

Training

with

2

Samples.

Reference

for Class

[[ OA

3

100.00

100

61

100.0

100.0

100.0

200

61

100.0

100.0

99.0

99.67

300

62

100.0

97.3

97.3

98.22

400

62

100.0

97.8

97.8

98.50

500

67

97.6

96.8

98.07

600

75

97.2

96.5

97.89

99.8 100.0

Table Maximum to

Likelihood

40-Dimensional

Method Simulated

4.78 for

Gaussian

HIRIS

Agreement

Data:

with

Data Test

Reference

Applied Samples.

# of Training

Percent

Samples 100

1 90.8

2• 50.3

3 I[ OA 77.0 72.70

for Class

200

92.6

55.8

73.5

73.96

300

98.1

92.5

93.1

94.58

400

97.8

91.6

92.7

94.06

500

97.7

92.0

92.6

94.10

600

94.7

93.3

93.3

93.78

NI

188

Table Conjugate to

Gradient

40-Dimensional

of

CPU

iterations

100

Backpropagation

Simulated

Number

Sample size

4.79

HIRIS

Percent

Agreement

1

time

Applied

Data:

Training

with

Samples.

Reference

2

for Class

fl OA

3

64

485

I00.0

I00.0

i00.0

I00.00

150

1548

I00.0

100.0

100.0

100.00

30O

374

4889

I00.0

i00.0

100.0

100.00

40O

274

5225

I00.0

I00.0

100.0

i00.00

5OO 200

264

6386

100.0

I00.0

100.0

100.00

600

524

14899

I00.0

100.0

i00.0

I00.00

Table Conjugate to

Gradient

40-Dimensional

Sample size

Number iterations

of

4.80

Backpropagation

Applied

Simulated

HIRIS

Data:

Percent 1

Agreement 2

with

Test

Samples.

Reference for Class 3 H OA II

100

64

87.1

57.6

54.6

66.43

200

150

83.8

56.4

48.2

62.81

300

374

85.6

61.6

60.5

69.24

4OO

274

86.2

57.1

95.6

67.52

5OO

264

83.4

54.9

57.7

65.33

6OO

524

85.3

68.0

60.0

71.11

189

Table Conjugate

Gradient

40-Dimensional

Sample size

Linear

Simulated

Classifier

HIRIS

CPU

Number of iterations

4.81

time

Data:

Applied

to

Training

Percent 1

Agreement 2

with

Samples.

Reference 3

for Class OA

100

146

194

100.0

100

100.0

100.00

200

469

898

100.0

100

100.0

100.00

300

903

2461

99.7

99

99.7

99.56

400

650

2293

100.0

98

97.0

98.33

500

629

2657

100.0

89

89.4

92.87

600

492

2492

100.0

87

85.3

90.83

Table Conjugate

Gradient

40-Dimensional

Sample size

Linear

Simulated

Number iterations

of

4.82

Percent 1

Classifier

HIRIS

Agreement 2

Data:

with

Applied Test

to Samples.

Reference 3

for Class OA

100

146

85.7

58.1

48.7

64.17

200

469

82.5

50.7

48.2

60.49

300

903

81,1

61.9

53.3

65.42

400

650

86.9

63.6

56.7

69.09

500

629

88.6

60.6

54.9

68.00

600

492

90,7

64.0

50.7

68.44

190

100

v

U L.

MD ML

U

CGBP

m m

[] CGLC

m >

0

2o

lOO

200

300

Training

Figure

4.12

Classification

400

500

600

Samples/Class

of Training

Data

(40 Dimensions)

191

IO0

8O A

V

0 &,.

6O

[] [] [] []

0 0

L-

4O

>

0

2O

575

475 Test

Figure

4.13

375

275

175

75

Samples/Class

Classification

of Test

Data (40 Dimensions)

MD ML CGBP CGLC

192

The

classification

sizes.

As

accuracy

for

of training

data

the

20-dimensional

data

significantly

when

300 or more

training

overall

performance

the

dimensional

data

The trained

CGBP with

of

neural

network

size.

The

data

up

times

of the The

dramatic

The

CGLC an

accuracy

classification CGLC

compared times

were

faster

of

than

longer net

not

(480

used

input

all sample

data

improved

were

used.

good

for

the

for

the

of 20.

in just

greatly

Thus, the

40-

As in the

data

to 5 times

was longer

However, and

4.81

gave

similar to it was

similar

classification

time

data.

4.82

data

sample

40

cases

size.

results.

40 data The case.

dimensions more

(test))

when

20-dimensional for

600

accuracy

20-dimensional

increased

in most

of CPU

and

training

converge

For

per class.

(training)

the

with

40-dimensional data.

samples

to

trained

40-dimensional

case of the

with

was

grew

classification

the

of

again

4 hours

the

(test))

time

of the

over

for

accuracy

decreased

of test

It was

training

However,

of

4.80

20-dimensional

(Tables

terms

data

CGBP

the

and

classification

converged

neurons)

instead

again

was for 600 training

in

training

up

than

improved

improvement

to 20 dimensions. than

very

neurons.

and

the ML method).

was

accuracy took

training

neural

improvement

dimensions

The

the

samples

most

showed

the

to 3 times

longer

test

was

(training)

and

sample

(200

class

4.79

size

increasing

class

(Tables

15 hidden

sample

per

for

of test

per

method

and

every

samples

samples

ML

perfect

accuracy

neurons

for

took

the

nearly

set.

480 input

to perfection

the

was

than

as two

193

4.4.3

60-Dimens|onal The

results

Figures the

MD

instead. more

matrices

bands.

(The

that

the

other

40 data

Gaussian

distribution

the

sources

data

The

information

classes source

in source

the

from Twenty

data

was

classes _2

had

1.35

#m were

The

information solJrces.

data

a higher

4.85

sources average

used

the

the

it was

-if-l) and

distance

spectral

spectral 1.97

from region

/_m

the from

=//:2 consisted

modeled

JM distance

or

determined

spectral

were

two

indicates

#m and

=ffl. Source

relatively

JM

were

uncorrelated

classes

were

since

into

are

4.16,

in the

(source

data

brightness

1.81

was

source

The

LOP

correlated

sets.

data.

be split

the

at Figure

channels

in Tables

in

to

a very data

between

to 1.47 #m and

as data

data

the

had

4.16;

treated

channels.

shown

pm

#m

data

more

By looking 0.7

in both

are

1.35

showed

60-dimensional and

iY_ data

20-dimensional

correlations

the

bands.)

which

the

The

(test))

high-dimensional

SMC

in Figure

from

4.84

to the

algorithm,

:tre summarized

of 60-dimensional

of the

The

tone,

data

other

as shown

lighter

bands.

0.7 #m to 1.35 #m, other

singular.

sources.

region

of the

be applied

data

regions

and

in classification

SMC

absorption

spectral

(training)

to use the

black

spectral

of the

were

The

are the water

In classification

not

can be visualized

correlation.

(test).

than

could

independent

channels

60-dimensional

4.83

slower

ML method

In order

of the

to classification

3 times

covariance

4.15

(Tables

performance

The

by the

separabilities 4.86

(source

separable than

the

of _2).

but

the

classes

_1. The

sizes

and

algorithm

It was about

the

of classification

4.14 (training)

similar

the

Data

and

results various

of the

SMC

source-specific

classifications weights

with are

respect

shown

to different

in Tables

4.87

sample through

in

194

100

• m el [] []

20

100

200

300

Training

Figure 4.14

Classification

400

500

60O

Samples/Class

of Training Data (60 Dimensions)

MD SMC LOP CGBP CGLC

195

100

80

D

• B

MD SMC

0

[]

LOP

[] []

CGBP CGLC

60

L_

4O

0 20

575

475 Test

Figure 4.15

375

275

175

75

Samples/Class

Classification

of Test Data (60 Dimensions)

196

Table Minimum 60-Dimensional

Euclidean Simulated

# of Training

CPU

100 200 300 400 500 600

5 6 B ti ti 6

4.83

Distance HIRIS

I Percent 1

Classifier Applied to Data: Training Samples.

Agreement 2

i 83.0 184.0 J 85.3 I 83.8 185.2 I 85.2

with Reference for Class 3 OA

50.0 51.0 50.7 55.8 53.0 51.5

Table

57.0 55.0 59.0 60.3 61.4 59.8

4.84

Minimum Euclidean Distance Classifier 60-Dimensional Simulated HIRIS Data:

# of Training

Percent 1

lOO 20o 30o 40o 50o 60o

83.5 83.6 84.0 81.1 74.3 80.0

Agreement 2 50.1 49.5 53.9 45.8 48.0 64.0

63.33 63.33 65.00 66.58 66.53 65.50

Applied to Test Samples.

with Reference for Class 3 OA 59.0 59.8 58.4 70.5 69.1 48.0

64.17 64.28 65.42 65.82 63.81 64.00

197

0.4 (p.m)

0.76

0.92

1.15

1.35

1.47

2.4

Figure

4.16

Global

Statistical

Correlation

Coefficient

Image

of HIRIS

Data Set

ORIGINAL OF POOR

PAGE IS QUALITY

198

Table l'airwise

JM Distances

Class _ 1 2 Averaee:

Source

_1.

1.24447 0.96362

1.173336

JM Distances Class _ 1 2 Averaee:

for Data

2 1.31192

Table Pairwise

4.85

4.86 for Data

2 1.40908 1.380562

Source

1.39607 1.33653

=_2.

199

4.92. Ranking of the sources according to the source-specificreliability measures based on the classification accuracy of training data, the equivocation measure(Table 4.93) and JM distance separability (Table 4.94) agreedin all cases,regardlessof sample size. Tile reliability measuresalways estimatedsource#2 as more reliable than source#1. Using thesereliability measures to weight the data sources in combination gave the highest accuraciesof training data for sample sizesup to 300 training samples per class(Tables4.87, 4.88 the

best

accuracies

200 samples got

the

unexpected 100 and 4.89) 1.0

and

and

suggest

highest

test

source

or more cases

by

as shown

4.92)

and

several The

results

was

samples given

were

several

using

the

were

used

for each

weight

SMC results.

Both

of the

60-dimensional

of test

samples

increased

algorithms

were

(Tables

tim SMC

data

set. with

very

and

For the

fast,

4.95 LOP

both

nulnber with

the

the

#1

methods

the

of training

a slight

test

edge

(Table by

performance

(Tables #1.

4.90,

4.91

Using

400

In most

accuracies.

4.100)

excellent

only

excellent

was sufficient.

were

#1

These

used

gave

best

highest

through

source

was weighted

source

data

100 and

with

were

class

than

achieve

0.1 or {}.2.

weights

gave

achieve

when

class

other

SMC

could

LOP

per

The

more

for

undertrained

source

for the high-dimensional

combinations

reached

were

when

did not

significant

by either

300 samples

However,

weights

were

sources

reached

4.89.

same

weighted

data

When

# 2 was

to the

these

"best"

was

0.9.

samples

weight

results

class.

training

differences

the

in Table

source

training

that

the

The

_2

accuracy

#2

400 or more

the

source

per

ttowever,

data.

where

1.0 and

results

accuracies when

class,

200 samples

the

4.89).

for test

per

weight

and

were

very

similar

in the classification

classification samples

to the

accuracy

used. LOP

Both

which

of uses

200

Table Statistical HIRIS

Data

Multisource

4.87

Classification

(100 Training

and

Percent Agreement

of Simulated

575 Test

Samples

with Reference for Class

Training 1

per Class).

Testing

2

3

lJ

OA

Single

1

2

3

[I OA

Sources l

Source

#1

99.0

93.0

96.0

96.00

[ 85.9

67.3

57.7

70.32

Source

#2

100.0

100.0

100.0

100.00

[ 83.3

47.8

82.1

71.07

sl

s2

1.

1.

Multiple

Sources

100.0

100.0

100.0

100.00

89.6

51.0

83.1

74.55

1..9

100.0

100.0

100.0

100.00

89.7

50.8

83.3

74.61

1..8

100.0

100.0

100.0

100.00

96.6

51.3

83.3

75.07

1..7

100.0

100.0

100.0

100.00

91.1

51.3

83.5

75.30

1..6

100.0

100.0

100.0

100.00

90.6

52.3

83.7

75.53

1..5

100.0

100.0

100.0

100.00

91.1

52.9

83.0

75.48

1..4

100.0

100.0

100.0

i00.00

91.1

54.6

83.1

76.29

1..3

100.0

100.0

100.0

100.00

91.3

55.3

83.5

76.70

1..2

100.0

100.0

100.0

100.00

90.1

64.2

77.6

77.28

1..1

100.0

100.0

100.0

100.00

90.1

84.2

77.6

77.28

96.0

96.00

85.9

67.3

57.7

70.32

1..0

99.0

93.0

.9

1.

100.0

100.0

100.0

100.00

89.6

50.4

82.8

74.26

.8

1.

100.0

100.0

100.0

100.00

89.6

50.3

82.6

74.14

.7

I.

100.0

100.0

100.0

100.00

88.7

50.1

82.8

73.86

.6

1.

100.0

100.0

100.0

100.00

88.3

49.9

83.0

73.74

.5

1.

100.0

100.0

100.0

100.00

87.8

49.7

83.0

73.51

.4

I.

100.0

100.0

100.0

100.00

87.3

49.4

82.8

73.16

.3

I.

100.0

100.0

100.0

100.00

87.0

48.7

82.6

72.77

.2

1.

100.0

100.0

100.0

100.00

86.1

48.3

81.9

72.12

.i

I.

100.0

100.0

100.0

100.00

84.5

48.0

82.1

71.54

.0

I.

100.0

100.0

100.0

100.00

83.3

47.8

82.1

71.07

100

100

100

300

575

575

575

1725

# of pixels

The

columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).

CPU

time for training

and

classification:

81 sec.

201

Table

Statistical HIRIS

Data

4.88

Multisource (200

Classification

Training

Percent

and

A_eement

with

Training

1

475

of

Test

Reference

2

3

[t_qA

97.5

91.0

92.0

II 93.50

100.0

100.0

99.5

_

1.1.

100.0

100.0

99.5

1..9

100.0

100.0

99.5

1..8

100.0

100.0

1..7

100.0

100.0

1..6

100.0

1..5

100.0

1..4

I

Simulated

Samples

per

Class).

for Class

2 Testing 3

1

_le_Sou_rces__ 70.3

57.3

71.93

53.1

80.4

73.82

89.1

55.8

82.3

89.1

56.0

82.3

75.75 75.79

99.5

89.5

55.8

82.5

99.5

89.7

56.0

82.5

100.0

99.5

89.9

58.1

82.5

100.0

99.5

90.5

60.0

83.4

100.0

99.5

99.0

90.3

62.3

84.0

1..3

100.0

99.0

99.0

90.7

63.6

84.4

78.88 77.96 79.58

I..2

100.0

97.5

99.0

90.7

71.2

76.4

79.37

1..1

100.0

96.0

98.5

90.5

71.2

76.4

1..0

97.5

91.0

92.0

88.2

70.3

57.3

79.37 71.93

source

#1

]t 88.2 II

sl s2

88.0

Multi')le

Sources

75.93 76.07 76.84

.9 1.

100.0

100.0

99.5

89.5

55.4

82.3

75.72

.8 1.

100.0

100.0

99.5

89.3

55.4

82.5

75.72

.7 1.

100.0

100.0

99.5

89.1

55.4

82.3

75.79

.6 1.

100.0

100.0

99.5

88.9

54.9

82.3

.5 1.

100.0

100.0

99.5

88.6

54.5

82.1

75.37 75.09

.4 1.

100.0

100.0

99.5

88.2

54.3

81.7

74.74

.3 1.

100.0

100.0

99.5

88.2

53.5

81.7

74.46

.2 1.

100.0

100.0

99.5

88.0

53.1

81.3

74.10

.1 1.

100.0

100.0

99.5

87.8

52.8

80.8

73.82

.0 1.

100.0

100.0

99.5

88.0

53.1

80.4

73.82

200

200

200

475

475

475

_#

of pixels The

columns

labeled

applied CPU

time

sl

and

to sources

for training

s2

indicate

1 (81) and and

the

weights

2 (82).

classification:

85 sec.

I

I I I

I t

202

Table 4.89 Statistical Multisource Classificationof Simulated HIRIS Data (300Training and 375Test Samplesper Class).

Percent

Agreement Training 2 3

1

with

Reference

for Class Testing

[[

OA

1

Single

Sources

I 99.11 90.78

90.7 96.0

Multiple

Sources

2

3

tl OA

82.4 94.4

79.5 94.1

[I 94.84 84.18

source #1 source #2 sl s2

96.3

86.7

89.3

100.0

99.0

98.3

1.

100.0

99.0

98.7

99.22

96.5

96.0

95.2

95.91

1..9

100.0

98.7

99.3

99.33

97.1

96.3

95.2

96.18

1..8

100.0

98.0

99.3

99.11

96.0

96.3

95.2

95.82

1..7

99.7

98.0

99.0

98.89

94.7

95.7

93.8

94.76

1..6

98.7

95.7

97.7

97.33

94.1

93.3

91.7

93.07

1..5

98.3

95.0

96.7

96.67

93.3

92.3

89.1

91.56

1..4

97.3

94.0

95.0

95.44

92.8

90.1

87.7

90.22

1..3

97.0

91.3

93.3

93.89

92.3

87.5

86.4

88.71

1..2

96.7

90.7

92.3

93.22

91.7

86.1

83.7

87.20

1..1

96.3

89.0

90.7

92.00

91.2

84.3

81.9

85.78

1..0

96.3

87.0

88.7

90.67

90.7

82.4

79.5

84.18

.9 1.

100.0

99.0

98.7

99.22

96.8

95.7

94.9

95.82

.8 1.

100.0

99.0

98.7

99.22

96.5

95.5

94.9

95.64

.7 1.

100.0

99.3

98.7

99.33

96.5

95.5

94.7

95.56

.6 1.

100.0

99.3

98.7

99.33

96.5

95.5

94.4

95.47

.51.

100.0

99.3

98.7

99.33

96.3

94.9

94.4

95.29

.4 1.

100.0

99.3

98.7

99.33

96.3

94.9

94.4

95.20

.3 1.

100.0

99.3

98.7

99.33

96.3

94.9

94.4

95.20

.2 1.

100.0

99.0

98.7

99.22

96.0

94.4

94.4

94.93

.1 1.

100.0

99.0

98.7

99.22

96.0

94.7

94.4

95.02

.0 1.

100.0

99.0

98.3

99.11

96.0

94.4

94.1

94.84

of pixels

300

300

300

900

375

375

375

1125

The

columns

1.

labeled

applied

CPU

time

to

for

sl

and

sources

training

s2 1 (sl)

and

indicate

the

and

2 (s2).

classification:

weights

87

see.

203

Table Statistical HIRIS

Data

Multisource

Classification

(400 Training

Percent

4.90

and

Agreement

with

of Simulated

275 Test

Samples

Reference

for Class

Training 1

2

per

Class).

Testing 3

II

OA

1

2

3

"____OA _

82.2 94.2

71.3 96.7

79.03 95.03

Si._l_Sou_ce_ source

#I

96.8

source

#2

100.0

87.3 99.0

87.3 98.0

1. 1.

99.8

99.3

98.8

99.25

94.2

94.5

96.4

95.03

1..9

99.8

99.0

99.0

99.25

93.8

94.5

95.6

94.67

1..8

99.8

99.0

99.0

99.25

93.5

94.9

95.6

94.67

1..7

99.8

99.0

99.0

99.25

92.7

94.5

95.6

94.30

1..6

99.3

98.5

99.0

98.92

90.9

94.9

96.0

93.94

1..5

99.0

98.3

99.0

98.75

90.2

93.8

96.0

93.33

1..4

98.5

97.8

98.8

98.33

88.4

92.4

95.6

92.12

1..3

98.5 98.5

97.5 95.5

97.17 95.67

85.5 85.5

91.6 88.7

89.8 88.4

89.70 I

1..2

95.5 93.0

1..1

97.3

90.5

92.5

93.42

84.4

85.5

82.2

84.00

1..0

96.8

87.5

87.3

90.42

83.6

82.2

71.3

79.03

.9 1.

99.8

99.3

98.8

99.25

94.5

94.9

96.4

95.27

.8 1.

99.8

99.3

99.0

99.33

94.5

95.3

96.4

95.39

.7 1.

99.8

99.3

99.0

99.33

94.5

94.9

96.4

95.27

.6 1.

100.0

99.3

99.0

99.42

94.5

94.5

96.0

95.03

.5 1.

100.0

99.3

98.8

99.33

94.5

94.5

96.0

95.03

.4 1.

100.0

99.3

98.8

99.33

94.5

94.5

96.0

95.03

.3 1.

100.0

99.3

98.8

99.33

95.3

94.5

96.4

95.39

.2 1.

100.0

99.3

98.5

99.25

94.9

94.5

97.1

95.27]

.1 1.

100.0

99.0

98.3

99.08

94.2

94.5

97.1

95.27

.0 I.

100.0

99.0

98.0

99.00

94.2

94.2

96.7

95.03[

400

400

1200

275

275

275

825

sl

#

s2

I-90.42 99.00 Multiple

of pixels

The

400

_8_,6[l 94.2 Sources

_____

columns labeled sl and s2 indicate applied to sources 1 (sl) and 2

CPU

time

for training

and

classification:

90 sec.

87.52

I

i

I

204

Table

Statistical HIRIS

Data

Multisource (500

Classification

Training

Percent

4.91

and

Agreement

175

with

of

Test

Simulated

Samples

2

Testing 3

II

OA

1

Single source

Class).

for Class

Reference

Training 1

per

2

3

JJ OA

94.3

98.9

J 76.95 95.81

Sources

#1

source _2

99.8

98.4

96.8

98.33

94.3

99.6

98.8

98.2

Multiple

Sources

98.87

1..9

99.6

98.8

92.6

95.4

97.7

95.23

98.4

98.93

93.1

95.4

97.7

1..8

99.6

95.43

98.6

98.2

98.80

93.1

94.9

97.7

1..7

95.24

99.2

98.8

98.4

98.80

93.1

96.0

97.7

95.62

1..6

99.0

98.6

98.4

98.67

90.3

96.0

97.7

94.67

1..5

98.4

98.0

98.4

98.27

88.0

95.4

97.7

93.71

1..4

98.0

96.8

97.8

97.53

87.4

93.7

97.1

92.76

1..3

97.8

95.0

96.4

96.40

85.7

92.6

91.4

89.90

1..2

97.6

92.4

94.0

94.67

84.6

88.6

89.7

87.62

1..1

96.6

90.6

90.4

92.53

82.9

83.4

83.4

83.24

1..0

96.2

88.0

84.8

89.53

78.9

78.3

73.7

76.95

.9 1.

99.6

98.8

98.0

98.80

92.6

95.4

97.7

95.23

.8 1.

99,6

98,8

98,0

98,80

93,1

94,9

97,7

95.24

.7 1.

99.6

98.8

97.8

98.67

94.3

94.9

97.7

95.62

.6 1.

99.6

98.6

97.8

98.67

94.9

94.9

98.9

96.19

.5 1.

99.6

98.6

97.8

98.67

94.3

94.9

99.4

96.19

.4 1.

99.6

98.6

97.8

98.67

94.3

94.9

99.4

96.19

.3 1.

99.6

98.6

97.8

98.67

94.3

94.9

99.4

96.19

.2 1.

99.6

98.6

97.4

98.53

93.7

94.3

99.4

95.81

.1 1.

99.8

98.4

97.2

98.47

94.3

94.3

99.4

96.0O

.0 1.

99.8

98.4

96.8

98.33

94.3

94.3

98.9

95.81

500

500

500

1500

175

175

175

525

sl

s2

1.

1-

of pixels The

columns

labeled

applied

CPU

time

to

for

sl

and

sources

training

s2 1 (sl)

and

indicate and

classification:

the

weights

2 (s2).

90

sec.

205

Table

Statistical HIRIS

Multisource

Data

(600

Percent

1

4.92

Classification

Training

and

75

Agreement

with

2Training :l

[[ OA

#1

source

#2

sl

95.2 i00.0

86.5

85.2

98.3

96.8

s2

!98.39 88.94

I

Simulated

Samples

Reference

Sin$1e source

of

Test

per

Class).

for Class

1

Testing

t

Sources I 72.0 92.0

r ....

78.7

73.3

97.3

97.3

74.67 95.56 J

Mu!tiple

Sources

98.78

92.0

100.0

98.7

96.89

92.0

100.0

98.7

i96.89

99.5

98.5

98.3

1..8

99.5 99.5

98.3 98.5

98.5 98.5

92.0

100.0

98.7

1..7

99.3

98.7

98.3

98.78

92.0

97.3

98.7

1..6

99.3

98.5

98.2

98.39

89.3

97.3

98.7

1..5

99.0

98.0

08.2

98.39

85.3

97.3

98.7

1..4

98.2

96.7

97.8

97.56

82.7

93.3

96.0

1..3

97.5

95.3

96.2

96.33

80.0

92.0

93.3

1..2

97.2

92.7

94.0

94.61

76.0

88.0

96.7

1..1

96.3

90.3

91.3

92.67

73.3

85.3

88.0

82.22]

1..0

95.2

86.5

85.2

88.94

72.0

80.0

78.7

74.674

.91.

99.5

98.3

98.3

98.72

92.0

100.0

98.7

96.89J

.8 1.

99.7

98.3

98.3

98,78

92.0

100.0

98.7

.7 1.

99.7

98.3

98.2

98.72

92.0

100.0

98.7

96.89 96.89

.6 1.

99.8

98.3

98.0

98.72

92.0

100.0

98.7

96.891

.5 1.

99.8

98.3

98.0

98.72

92.0

i00.0

98.7

.4 1.

99.8

98.3

98.0

98,72

92.0

100.0

98.7

.3 I.

99.8

08.3

97.7

08.61

90.7

I00.0

98.7

96.89 96.89 96.44

.2 1.

99.8

98.3

97.5

98.56

90.7

98.7

98.7

96.00[

.1 1.

100.0

98.3

97.0

98.44

92.0

98.7

97.3

.0 1.

100.0

98.3

96.8

98.39

92.0

97.3

97.3

600

600

600

1800

75

75

75

1.1.

1..9

#

of pixels

'

The

98.78 98.83

columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).

CPU

time

for training

and

classification:

90 see.

96.891 i 96"00 95.111

I

93.78

I

_90.67 88.44 84.89_

I [

96.00 95.56 225

I

206

Table

4.93

Source-Specific Equvivocations for Simulated HIRIS Data Versus Number of Training Samples.

Eouivocation Training Samples 100 200 3OO 400 5OO 6O0

1Sourlce _: 2 0.2581 0.0000 0.3325 0.0105 0.4057 0,0637 0.3958 0.0686 0.4154 0.0947 0.4356 0.0981

Table

4.94

Source-Specific JM Distances for Simulated HIRIS Data Versus Number of Training Samples.

JM Distancc Training Samvles 100 200 300 4OO 500 600

Sollr

i 1.313423 1.273153 1.208442 1.234641 1.214116 1.195292

ce_ 2 1.413598 1.4107_0 1.392419 1.389006 1.386011 1.383060

207

Table

Linear HIRIS

Opinion

Pool

Data

(100

Applied

Training

Percent

4.95

in and

Agreement

Classification 575

with

Test

of

Simulated

Samples

Reference

per

Class).

for Class Testing

Training

1

2

99.0

93.0

96.0

100.0

100.0

100.0

1. 1.

100.0

100.0

100.0

100.00

1..9

100.0

100.0

100.0

100.00

1..8

100.0

100.0

99.0

1..7

100.0

98.0

1..6

100.0

1..5

99.0

1..4

l

2

3

67.3

57.7

70.32

47.8

82.1

71.07

89.6

5O.6

82.3

91.5

61.0

79.8

99.67

90.8

65.9

73.0

76.58

99.0

99.00

90.4

67.5

71.0

76.29]

97.0

99.0

98.67

89.4

67.3

69.2

75.30

97.0

99.0

98.33

88.2

67.8

66.6

74.20

99.0

97.0

99.0

98.33

88.2

67.8

66.6

74.20

1..3

99.0

96.0

99.0

98.00

87.5

67.1

62.2

72.29

1..2

99,0

95.0

97.0

97.00

87.0

67.0

60.3

71.42

1..1

99.0

94.0

96.0

96.33

86.3

66.6

59.0

70.61

1..0

99.0

93.0

96.0

96.00

85.9

67.3

57.7

70.32

.9 1.

100.0

100.0

100.0

100.00

85.7

50.3

82.1

72.70

.8 1.

100.0

100.0

100.0

100.00

85.4

49.9

81.9

72.41

.7 1.

100.0

100.0

100.0

100.00

84.7

49.7

82.1

72.17

49.6

81.9

72.12

3

H

OA

Single source source

#1 #2

sl s2

1

Sources

96.00

I! 85.9

!00.00 li 83.3 Multiple

Sources

77.45

.6 1.

100.0

100.0

100,0

100.00

84.9

.5 1.

100.0

100.0

100.0

100.00

84.3

49.4

81.7

71.83

.4 i.

I00.0

100.0

100.0

100.00

84.0

48.9

81.9

71.59

.3 1.

100.0

100.0

100.0

100.00

84.0

48.3

81.9

71.42

.2 1.

100.0

100.0

100.0

100.00

83.8

48.0

81.9

71.25

.I I.

i00.0

IO0.O

100.0

100.00

83.8

47.8

82.1

.0 1.

100.0

100.0

100.0

100.00

83.3

47.8

82.1

71:97

_##of i__

100

100

100

3O0

575

575

575

1725

The

columns

labeled

applied

CPU

time

to

for

sl sources

training

and

s2 1 (sl)

and

indicate and

classification:

the 2

weights

(s2). 81

sec.

71.25

208

Table

Linear

Opinion

HIRIS

Data

(200

Pool

Applied

Training

Percent

4.96

in and

A_reement

Classification

475

with

Test

of

Samples

Simulated per

Class).

for Class

Reference

Training

Testing OA

1

Single

Sources

1

2

3

]]

2

3

97.5

91.0

92.0

[ 93.50

I00.0

I00.0

99.5

I 99.83 Multiple

Sources

I00.0

I00.0

99.5

99.83

1..9

I00.0

i00.0

99.5

99.83

1..8

I00.0

99.5

98.5

1..7

I00.0

98.5

1..6

I00.0

1..5

I00.0

]]OA

88.2

70.3

57.3

I 71.93

88.0

53.1

80.4

t 73.82

89.3

54.5

81.9

75.09

92.0

55.2

80.6

75.93

99.33

90.7

72.0

72.4

78.67

98.5

99.00

90.7

72.0

72.4

78.39

98.5

98.0

98.83

90.5

72.4

71.2

78.04

97.0

97.0

98.00

90.1

71.8

67.6

76.49

i

source

_1

source

#2

sl

s2

1.

1.

1..4

98.5

97.0

97.0

97.50

89.5

71.2

65.5

75.37

1..3

98.0

96.0

96.0

96.67

89.3

71.4

63.2

74.60

1..2

98.0

94.5

95.0

95.83

88.8

71.6

60.4

73.75

1..1

97.5

92.5

92.5

94.17

88.8

7O.3

59.2

73.19

1..0

97.5

91.0

92.0

93.50

88.2

70.3

57.3

71.93

.9 1.

i00.0

I00.0

99.5

99.83

88.8

54.5

81.9

75.09

.8 1.

I00.0

I00.0

99.5

99.83

88.8

54.3

82.1

75.09

.7 1.

I00.0

i00.0

99.5

99.83

88.2

54.1

81.9

74.74

.6 I.

i00.0

I00.0

99.5

99.83

88.2

53.9

81.9

74.67

.5 1.

100.0

100.0

99.5

99.83

88.0

53.9

81.5

74.46

.4 1.

100.0

100.0

99.5

99.83

88.0

53.9

81.3

74.18

.3 1.

100.0

100.0

99.5

99.83

88.0

53.1

80.8

73.96

.2 1.

100.0

100.0

99.5

99.83

88.0

52.8

81.1

73.96

.1 1.

100.0

100.0

99.5

99.83

88.0

53.1

80.6

73.89

.0 1.

100.0

100.0

99.5

99.83

88.0

53.1

80.4

73.82

200

200

200

600

475

475

475

1425

# of pixels The

columns

labeled

applied

CPU

time

to

for

sl sources

training

and

s2 1 (sl)

and

indicate and

the

weights

2 (s2).

classification:

84

see.

209

Table 4.97 Linear Opinion Pool Applied in Classificationof Simulated HIRIS Data (300Training and 375Test Samplesper Class).

Percent Agreement with Training

1

2

Reference

#1

source

#2

100.0

99.0

Testing

3 LIoA Single

source

for Class

98.3

sl s2

2 Sources

1199.11

96.0

Multiple

3 .UoA

_

94.4

94.1

94.84

Sources

99.7

99.0

99,0

99.22

96.5

96.0

95.2

95.91

1..9

99.7

98.7

99.0

99.11

96.5

96.0

95.2

95.91

1..8

99.7

98.7

99.3

99.22

96.8

96.3

95.2

96.09

1..7

99.7

98.3

99.3

99.11

96.5

96.3

95.2

96.00

1..6

99.7

97.3

99.3

99.11

95.7

96.0

95.5

95.73

1..5

99.7

97.3

99.3

98.78

94.4

96.0

95.7

95.38

1..4

99.0

97.0

99.7

98.56

93.9

94.4

95.5

94.58

1..3

98.3

94.7

98.3

97.11

93.1

92.0

92.5

92.53

1..2

98.0

93.7

95.0

95.56

92.5

88.0

89.1

90,13

1..1

97.0

90.7

92.7

93.44

92.0

86.9

86.9

88.62

1..0

96.3

86.7

89.3

90.78

90.7

82.4

79.5

84.18

.9 1.

99.7

99.0

98.7

99.11

96.3

96.3

94.9

95.82

.8 1.

99.7

99.3

98.7

99.22

96.3

95.7

94.9

95.64

.7 1.

99.7

99.3

98.7

99.22

96.3

95.5

94.9

95.56

.6 1.

99.7

99.3

98.7

99,22

96.5

95.5

94.9

95.64

.5 1.

100.0

99.3

98.7

99.33

96.5

95.2

94.7

95.47

.4 1.

100.0

99.3

98.7

99.33

96.3

95.2

94.4

95.29

.3 1.

100.0

99.3

98.7

99.33

96.3

95.2

94.4

95.29



1.

.1 1.

100.0 100.0

99.0 99.0

98.7 98.7

99.22 99.22

96.3 98.0

94.7 94.7

94.4 94.4

I 95.11 95.02

.0 1.

100,0

99.0

98,3

99.11

96.0

94.4

94.1

94.84

pixels

300

300

300

375

375

375

The

columns

.2 1.

__of

labeled

applied

CPU

time

to

for

sl sources

training

and

s2 indicate 1 (sl)

and

and

classification:

the

weights

2 (s2).

86

sec.

210

Table 4.98 Linear Opinion Pool Applied in Classificationof Simulated HIRIS Data (400Training and 275Test Samplesper Class).

percent

Agreement

w!th

Reference

for Class

Training

I

Testing

2

3 (I OA Single

ooo .o

source #1 source #2

96.8

87.3

2

3 IIOA

Sources

oooo

87.3

sl s2

1

90.42

83.6

Multiple

Sources

rl

,,

82.2

71.3

I[ 79.03

99.8

99.3

98.8

99.25

95.3

94.9

96.0

95.39

1..9

99.8

99.0

98.8

99.17

95.3

94.5

95.3

95.03

1..8

99.5

99.0

99.0

99.17

94.2

94.2

95.3

94.55

1..7

99.0

99.0

98.3

99.75

91.3

93.5

92.0

92.24

1..6

98.8

97.0

97.3

97.67

89.5

92.0

89.5

90.30

1..5

98.8

95.5

96.3

96.83

86.5

90.5

88.7

88.61

1..4

98.5

93.3

93.8

95.17

85.1

89.8

86.2

87.03

1..3

97.8

92.5

92.0

94.08

84.7

87.3

82.5

84.85

1..2

97.3

90.5

91.0

92.92

84.4

84.7

79.3

82.79

1..1

97.3

89.0

89.8

92.00

83.6

85.5

75.3

80.48

1..0

96.8

87.3

87.3

90.42

83.6

82.2

71.3

79.03

.9 I.

100.0

99.3

98.8

99.33

95.3

94.9

96.0

95.39

.8 1.

100.0

99.3

99.0

99.42

95.6

94.9

96.0

95.52

.7 I.

100.0

99.3

99.0

99.42

96.0

94.5

96.0

95.52

.6 I.

i00.0

99.3

98.8

99.33

95.6

94.2

96.4

95.27

.5 I.

I00.0

99.3

98.8

99.33

95.6

94.5

96.0

95.52

.4 I.

100.0

99.3

98.5

99.25

95.3

94.5

96.4

95.36

.3 I.

100.0

99.3

98.5

99.25

94.9

94.5

96.4

95.27

.2 1.

100.0

09.3

98.5

99.25

94.9

94.5

96.4

05.27

.I 1.

100.0

99.0

98.3

99.08

94.2

94.2

96.7

95.03

.0 I.

100.0

99.0

98.0

99.00

94.2

94.2

96.7

95.03

400

400

400

1200

275

275

275

825

I.

1.

# of pixels The

columns

labeled

applied

CPU

time

for

to

sl

and

sources

training

s2 1 (sl)

and

indicate and

classification:

the

weights

2 (s2).

89

sec.

211

Table

Linear HIRIS

4.99

Opinion Pool Applied in Classification of Simulated Data (500 Training and 175 Test Samples per Class).

Percent 1

source #1 source #2 sl s2

95.8 99.8

I.I. i..9 1..8 i..7 I..6 1..5 1..4 1..3 1..2 1..1 I..0

99.6 99.6 99.6 98.6 98.2 98.2 97.6 97.0 96.8 96.4 95.8

.9 .8 .7 .6 .5 .4 .3 .2 .1 .0

Agreement

with Reference

Training 2 3

l[

88.0 98.4

84.8 96.8

for Class Test"

OA

i

I Single Sources { 89.53 78.9 I 98.33 94.3 Multiple

Sources

2

g 3

78.3 94.3

73.7 98.9

_OA__A _ 95_ 76. 95.811

988 98.0 98.8o954 954 977 96i j 988 982 98.87949 954 977 9600j 98.8 98.2 98.87 93.7 95.4 97.7 95.62 i 98.6 97.2

98.0 98.4

98.40 97.27

91.4 89.7

94.3 93.7

96.0 92.0

93.90 91.81

I

95.4 93.2 91.8

95.6 93.8 92.8

96.40 93.87 93.87

86.3 84.6 82.9

92.6 89.1 86.9

90.3 89.1 85.7

89.71 87.62 85.14

91.0 89.8 88.0

89.8 87.6 84.8

92.53 91.27 89.53

82.9 80.0 78.9

84.0 80.6 78.3

82.9 76.6 73.7

83.24 79.05 76.95

99.6 99.6 99.6 99.6 99.6 99.6 99,6 99.6 99.8 99.8

98.8 98.8 98.8

98.0 98.0 97.8

98.80 98.80 98.67

94.9 95.4 94.9

95.4 94.9 94.9

97.7 98.3 98.3

96.00 96.19 96.00 J

98.6 98.6 98.6 98.6

97.6 97.6 97.4 97.4

98.67 98.60 98.53 98.53

94.3 94.3 94.3 94.3

94.3 94.9 94.9 94.3

98.9 99.4 99.4 99.4

96.00 96.19 96.19 96.00

98.6 98.4 98.4

97.4 97.2 96.8

98.60 98.47 98.33

94.3 94.3 94.3

94.3 94.3 94.3

99.4 99.4 98.9

96.00 96.00 95.81___

500

500

500

1500

175

175

175

525___

{ h

--7

1. 1. 1. 1. 1. 1. i. 1. 1. 1.

# of pixels The

columns labeled sl and s2 indicate the weights applied to sources 1 (sl) and 2 (s2).

CPU

time

for

training

and

classification:

90 see.

I I

212

Table Linear HIRIS

source

4.100

Opinion Pool Applied in Classification of Simulated Data (600 Training and 75 Test Samples per Cl_s).

Percent

Agreement

with

Reference

1

2Training 3 1,0A

I

for Class

1

Single

Sources

88.94 98.39

72.0 92.0

2Testing 3 I]OA

#1 95.2 100.0

86.5 98.3

85.2 96.8

73.3 97.3

II 95.56 74.67

Multiple

Sources

99.7

98.5

98.3

98.83

93.3

1..9

99.5

98.5

98.5

98.83

93.3

I00.0

98.7

97.33

I00.0

98.7

1..8

99.5

98.3

98.5

98.78

97.33

92.0

97.3

98.7

1..7

99.2

98.5

98.2

96.00

98.61

85.3

97.3

97.3

1..6

98.3

97.5

93.33

96.5

97.44

81.3

93.3

93.3

1..5

97.8

89.33

96.3

95.2

96.44

80.0

92.0

92.0

1..4

88.00

97.2

94.3

94.0

95.17

78.7

89.3

90.7

86.22

1..3

96.8

91.8

92.2

93.61

74.7

84.0

88.0

82.22

1..2

96.3

90.2

90.7

92.39

73.3

82.7

85.3

80.44

1..1

95.5

89.2

87.8

90.83

73.3

81.3

81.3

78.67

1..0

95.2

86.5

85.2

88.94

72.0

78.7

73.3

74.67

.9 1.

99.8

98.3

98.3

98.83

93.3

I00.0

98.7

97.33

.8 1.

99.8

98.3

98.3

98.78

93.3

I00.0

98.7

97.33

.7 1.

99.8

98.3

98.0

98.72

93.3

I00.0

98.7

97.33

.6 1.

99.8

98.3

97.8

98.67

92.0

100.0

98.7

96.89

.5 1.

99.8

98.3

97.8

98.67

90.7

100.0

98.7

96.44

.4 1.

99.8

98.3

97.7

98.61

90.7

i00.0

98.7

96.44

.3 1.

99.8

98.3

97.2

98.56

90.7

I00.0

98.7

96.44

.2 1.

99.8

98.3

97.2

98.44

92.0

98.7

98.7

96.44

so,urge ¢#2 sl s2 I.

1.

78.7 97.3

.1 1.

I00.0

98.3

96.8

98.39

92.0

98.7

97.3

96.00

.0 1.

I00.0

98.3

96.8

98.39

92.0

97.3

97.3

95.56

600

600

600

1800

75

75

75

# of plxels

The

columns labeled sl _nd s2 indicate the weights applied to sources 1 (sl) and 2 (s2).

CPU

time

for training

and

classification:

90 sec.

225

213

addition rather than multiplication in its global membership function. As compared to the 40-dimensionalML classification, these two methods were about 25_oslower (Figure 4.17). It is worth noting that a ML classificationof 60-dimensionaldata would have been still slower. Also, classificationusing the LOP and the SMC improved in terms of accuracyas comparedto the ML classificationof The trained

CGBP

(test)).

better

than

larger

increased

slowly.

(Figure iterations

the

other

In

4.104), CGLC

for the 40-dimensional grew

of training

was

about

data

(Table

was

a little

two

of 60-dimensional

rapidly

the

60 dimensions

sample

were

times data

size.

;dmost

the

the

Oddly

Also,

4.101

(training)

and

data,

it was

The

size

600

CPU

hours.

40-dimensional

classification

case,

of the

the

rapidly

samples

LOP

and the

per

the

SMC

CGBP

was

60-dimensional

algorithm

or

converged grew

training

The

a little

of 300

CGBP

to convergence

For

case

was

a sample

data.

used.

60-dimensional

classification with

and

neurons)

of test

test

to the

20 hidden

(Tables

its time

in 3.65

in training

CGLC

for

samples

If compared

the

data

cases.

accuracy

converged

the

neurons,

of classification

experiments

dimensionality

classification

The

overall

slower

than

As the

(Table

the

of training

4.18).

input

60-dimensional

of accuracy

faster.

1.2 times

(720

lower-dimensional

algorithm

146 times

about

for the

the

number

the

were

for

data.

network

In terms

As with the

class,

neural

to perfection

4.102

with

40-dimensional

data

needed

fewer

data. CGLC

(720

4.103). better

enough

neurons)

In classification than

faster but

input

the

for the

than

the

time the

did

sa,,,m for t,h(, C(1I,( _, with

in

of test

samples

40-dhnensional

data.

CGBP

algorithm

to convergence

training

better

times

also

for the

300 training;

in grew

40 and samph's

214

80

A

o o

6O

--O'--

E

--O--

MD 20 D ML 20 D

---m--

MD40 D

---e--

ML40 D MD 60 D

o_

I-

--A_.

40

-----M---

2O

0 0

100

200

Training

Figure 4.17

300

400

500

600

700

Samples/Class

Statistical Methods: Training Plus Classification Time versus Training Sample Size

SMC60 D LOP 60 D

215

Table Conjugate

Gradient

60-Dimensional

Sample size

Backpropagation

Simulated

Number iterations

CPU

of

4.101

HIRIS

-

_ time

Percent l

Applied

Data:

Training

Agreement

with

1

to Samples.

Reference

2

3

OA

100

59

650

100.0

100.0

100.0

200

91

1921

I 100.0

100.0

100.0

4696

I 100.0

i00.0

100.0

5969

I 100.0

100.0

100.0

[ 100.0 L jOO=p .......

100.0 190.0 .......

100.0 1_00.0_ _

300

183

400

189

500

172

600

250

7622 _1_3_7_4

Table Conjugate

Gradient

60-Dimensional

Sample

Number

size

iterations

of

Percent

100.00 100.00 100.00 100.00 100.00 _10p:90

4.102

Backpropagation

Simulated

for Class

HIRIS

Agreement

Applied Data:

with

Test

to

Samples.

Reference

for Class

1

2

3

OA

100

59

89.7

57.9

52.5

68.72

200

91

89.3

57.5

46.9

64.56

300

183

89.1

62.7

56.5

69.42

400

169

88.0

55.6

61.8

68.48

500

172

86.9

59.4

65.7'

70.67

800

25___g__092.0

60.0

57.3

69.78

_

......

216

16000

---o---

CGLC 20D

14000

:-

CGBP 20D CGLC 40D

12000

_" --'¢P-

CGBP 40D CGLC 60D

¢J 10000

=

¢D

CGBP

60D

8000 Qm

I--

6000

4000

20O0

0 0

100

200

300

400

500

600

700

Training Samples/Class Figure

4.18

Neural Network Models: Time versus Training

Training Plus Classification Sample Size

217

Table Conjugate

Gradient

60-Dimensional

Sample

Linear

Simulated

Number

size

of

iterations

4.103 Classifier

HIRIS

CPU

Percent

time

1

Data:

Applied

to

Training

Agreement

with

Samples.

Reference

2

for Class

3

OA

100

102

201

100.0

100.0

100.0

100.00

200

246

843

100.0

100.0

100.0

100.00

300

517

2140

100.0

100.0

99.7

400

565

3030

100.0

100.0

100.0

500

1041

6511

100.0

89.8

99.4

99.73

600

857

6931

100.0

98.0

98.2

98.72

Table Conjugate

Gradient

60-Dimensional

Sample size

Simulated

Number iterations

of

Percent 1

99.89 100.00

4.104

Linear

Classifier

HIRIS

Agreement 2

Data:

with

Applied Test

to Samples.

Reference 3

for OA

100

102

87.3

61.6

46.8

65.22

200

246

84.8

56.8

47.2

62.95

300

517

84.8

55.2

57.3

65.78

400

565

84.4

55.3

58.2

65.94

500

1041

86.9

61.7

56.6

68.38

600

857

88.0

66.7

58.7

71.11

Class

218

or less. two

With

times

4.4.4

faster

(Figure

Statistical

methods The

in the

ML

accurate,

size, the

40-dimensional

classification

was

about

4.18).

distance

and

considered

the

that

If the

data

sources,

neural

are

faster

of

reason

MD

the

classifier

statistics,

of dimensions

the it

data

is

In fact,

than

the

showed

multisource the

two

LOP

bound

{85 I. Also,

its classification

very

poor

and

in

Sections

sources

very

good

is not

independent

4.2

were

fast.

appropriate.

performance

than

and

4.3.

rather

in The

agreeable.

performance.

not

to

poorly

does not

be

also be applied

Since

accuracy

must

can

adequately.

saturation,

minimum

extremely

It

it shows

the

and

performance.

perform

not

covariance

methods

method better

data

can provide

showed data

HIRIS

and

very-high-dimensional

a far

data

fast

singular

accurate

ML

here.

It could

two or more

ML method the

both sets.

two

of

into

can be very

when

a

these

classification

LOP

LOP

of

network

performed

outperformed

can be split

data the

is that

discriminate

dimensional

the

LOP

neural

best, data

because

the

for data

and

this is the case the The

cannot

alternatives

the

40-dimensional

methods.

of multitype

classifications

apparent When

network

experiments

to the data

clearly

data and

in classification

in these

was

20- and

SMC

high-dimensional

in classification

the

applicable,

the

SMC

superior

of very-high-dimensional

of the

case

the

consistently

60-dimensional

desirable

data.

Also,

when

for In

were

classifications

in classification

applied

They

methods

method,

matrix.

ordcr

sample

Summary The

be

a larger

it does in

use

fast

any

classification

i.e., above increase.

is very

a certain

but

second of

high-

number

In the experiments,

219

the MD classificationaccuracy did not improve for data sets more complex than the 20-dimensionaldata. Of the neural network methods applied, CGBP showed excelleilt performance in classification of training data. However, its classification accuracyfor test data did not go much over 70_o. The CGBP was very slow in training and increasingthe number of training samplesslowedthe training processmarkedly. In contrast increasingthe number of training samplesdid not significantly improve the classificationaccuracy of test data. It seems evident that CGBP needsto haveseenalmostevery sampleduring training to be able to classifythem correctly during testing. Training

of

the

CGBP is

more efficient

than

conventional

backpropagation and requiresfewer parameterselections. However,as in the conventionalbackpropagation,the number of hiddenneuronsmust be selected empiric.'_liy. We selectedthe lowest number of hidden neurons which gave 100% accuracyduring training. Use of too many neural

network

(analogous

to the Hughes

The

CGLC

dimensional good CGLC

computationally

uses

data

performance was

not

as accurate

accuracies

CGBP,

so it seems data.

it did of the

similar

dimensional

no

complex

phenomenon hidden not

do

CGLC

in classifying to be the

test better

and

worse

indicates

as CGBP

can

neurons

degrade

makes

its

the

performance

[29]).

neurons, much

and

hidden

in the than

good

The

alternative

CGBP.

separability

in classifying data.

the

experiments

training CGLC

with The

of the data

converged

for classification

high-

relatively data.

but

The

achieved

faster

than

of very-high-

220

In defenseof the neural network methods, it can be said that the maximum likelihoodmethodhad an unfair advantagesincethe simulated data were generatedto be Gaussian. Neural networks are easy to implement and do not

need

statistical

model

methods data

were

sets.

results.

prior

has

the

These of

computation

time

the

neural

increased

time when

methods,

to the

very the

training

do

which

on

rapidly

not

statistical sample

Also,

have

was

methods

which

size increases.

suitable

neural

difficult

in

the

statistical

machines.

an increased

a

as much

evident

to the

parallel with

whereas

in classifying

be comparable

implemented

increases

contrast

networks

will not

unless

in

data

potential

to have

statistical

samples

the

earlier

methods

speed

about

for the ML method.

shown

as

information

to be available

However,

generalize

terms

any

network multitype ability test

to data

methods

Currently

in their

number

of training

require

almost

no

221

CI _APTER 5 CONCLUSIONS AND SUGGESTIONS FOR FUTURE WORK

5.1

Goncluaiona This

empirical

classification

of both

high-dimensional The

evaluation

data

neural

performance

terms

of classification data

neural

neural

better network

procedure specific

for

test

considered The

some

models,

networks

data. in the neural

the

achieved

models

have

an

too many the

network and

therefore

distributions

of the

CGLC

and for

data. with

models

the

CGBP,

However, statistical

cycles,

networks

data.

This

good

If

the

networks

give

sensed

methods

problem. neural

is a shortcoming

knowledge

showed

used

methods.

)roblem

no

very-

in

in classification

and

have

for

and

remotely

statistical

ng data

of neural

networks data

multisource

to the

]turning

train

neural

differences.

overtraining

overtraining

application

striking

of training

were

and

sensing/geographic

:;uperior

results

This

methods

methods

were

accuracy

in classifying

distribution-free statistical

has revealed

recognition

goes through

too

remote

as pattern

Both

test

multisource

network

data.

of statistical

less than

Also,

their

the

training will get

optimal that

of

results

has

to

be

for classification.

the

advantage

is needed is an obvious

about

that the

advantage

they

are

underlying over

most

222

statistical methods requiring modeling of the data, which is difficult when there is no prior knowledgeof the distribution functions or when the data are non-Gaussian. It also avoids the problem of determining how much influence a sourceshould havein the classification,which is necessaryfor both the SMC and LOP methods. However,the neural networks, especiallythe CGBP, are computationally complex. When the sample size was large in the experiments, the training time could be very long. The experimentsalso showed how important the representationof the data is when using a neural network. To perform well the neural network models must be trained using representative training samples. Any trainable classifier needs to be trained using representative training samplesbut the neural networks are more sensitiveto this than are the statistical methods.If the neural networks are trained with representative training samplesthe results showedthat a two-layer or do almost samples.

as well

as the

However,

the

statistical

methods

(simulated)

HIRIS

Gaussian; methods methods. of

multiple

statistical The topographic

they did

neural

in data.

were not

The

statistical

types

network

It was

and

methods

known

much

network

in multisource

classification

simulated

have

neural

the

methods

cannot

chance

of

the

Therefore,

are more modeled

HIRIS

to the

data

neural

than

appropriate a

of test

inferior

the

better

by

can

very-high-dimensional

that

doing

net

classification clearly

the

beforehand way.

be

were

of

that

models

a three-layer

when

convenient

the the

were

network statistical data

are

multivariate

model. SMC data.

method The

worked classification

well

for of

four

combining and

multispectral six

data

sources

and gave

223

significant improvement in compared

to single

different

sources

in overall

density

classification.

also showed

promise

different

modeling of

estimation

showed

accuracy.

However,

the

very the

method)

when

likelihood

method and

alternatives

for modeling

test

The

data

statistical the

SMC

since

hand,

intensive

the

approach Also,

to use

SMC.

SMC,

the

as

of weights

for

of increase

approach

and

test

the

in its own right.

accuracy.

outperformed with

data.

more

capable

neural

insight Also, time

of

neural

to the

SMC

and

effort

are

when

the

Parzen

of the

SMC

the

neural

network

in classifying form

samples

require

of the make

not

seen

computationally

algorithm. on

become

tends

functions

to

density can

useful

but

in the

density

required

density

are

samples

knowledge

models

penalized

classification.

networks

generalizing

network

penalized

method

as are

modeled

consuming

the Parzen

training

prior

Carefully

time

in multisonrce

the

classification

maximum

Both

for density

maximum

likelihood data

Parzen

more

The

representative

in contrast

training

large.

experiments

of overall

was

representative

being

more

training

the

good

the The

in terms

penalized

provided

more

the

very

in

sources.

size was

requires

significantly

analyst

accuracies

in terms

estimation

of non-Gaussian

for

iterative

density

to their

used

performance

sample

algorithm

model(s)

levels

experiments

data

maximum

it was

training.

expensive

the

algorithm

statistical

during

with

the

to be as sensitive

models.

different

were

(histogram

also gave

estimation

test

Parzen

likelihood

not

in the

methods

good

methods

SMC

classification

Using

non-Gaussian

other

The

average

accuracy.

estimation

than

and

source

classification

Three

overall

On the

the

part

estimation

other of the is used

computationally

224

The LOP did not do well multisource

remote

because

of

its

multisource data.

and

faster to the

clearly

algorithms

the

and had the

shows reason

relatively

accuracy The

three

suggested

the

worked

well for the

data

criteria

and

LOP

determine

and tended

the

problem

is

optimum

data

in classification

can

do

in

the

well

for

geographic

data

modeling

dictatorship weights

for

investigated. the

of multisource

SMC

That

well

sources

were

LOP help

because

of training

in the

the

rather

high-

agreeable

case for the

sources

data

sources

are

and

improve

the

as

the

the

were

both

adequate.

and

were

source. and

not

of very-

agreeable

It is very the

LOP.

weighting

an excellent

geographic

The

of multitype

optimum

give

These

classification

sources

SMC

ranking

classifications.

in classifications

will certainly sensing

in

employed

sizes

of the best both

were

number

the

LOP

sample

the

the

classifications.

and

where

both

in contrast

in classification source

of these

be used

were

When

of

methods

LOP

was not the

SMC

not

Both

of the

data

With

remote

classification

data,

always

of

is appealing

in classification

limited

measures

could

being

can with

two

single

They

toward

and

experiments.

reliability

worked

optimum

still

the

for

SMC

performance

SMC in all cases

data.

sensing

that

LOP

performance.

problems

good

The

appropriate

excellent

to the

sources

also

high-dimensional remote

LOP

data.

classification

very-high-dimensional

accuracies.

as compared

for

the

the

multisource

to the

of the

was

the

criteria

ranking

for

the

not

inferior

classification

agreeable

is

singularity

source-specific

multisource

in

ML classifiers

classification high

it

showed

conventional

The

dimensional

overall

It was

ML which

samples.

in

but

all

ge,_graphic

in classification

SMC

than

and

simplicity

data.

However,

LOP

sensing

at

data.

hard

to

That and

performance

225

In general the main advantagestatistical classificationalgorithms have over the neural network models is that if the distribution functions of the information classesare But

for

those

do not

cases,

know

the

appropriate,

are

network Suggestions

6.2

Future

several

selection.

weights

As observed

mathematical These

to

to

accurately.

classification, network

in which

models

can

we

be more

expense.

both

the

classification

directions

previously,

statistical

which

in this area

be

with

the

the

it is very

classifiers. use

programming

methods

related

problem

multisource

appears

neural

very

need

and

neural

further

work.

are discussed

next.

Directions

important

statistical

perform

computational

in multisource research

can

in multisource

functions,

problems

Research most

methods

at considerable

for future

The

the

distribution

approaches

these

as for instance

although

There

known

hard

One

of

statistical

more

research

to

Chapter

3,

need

approach

optimization suggested be

is

to find optimum

general

methods

methods

for

for

determining

similar

in Sections

applicable

weights

for

techniques

weight

2.3.4

to

and

the

2.3.5.

optimum

weight

selection. As explicitly statistical However, neural prior

discussed in

neural

consensus one network

statistical

consensus

in

networks. theory

possibility algorithm information

approaches.

it

is very

Therefore, approaches

for

a

consensual

described but

difficult it

and

is very the

neural

as follows. is somewhat

to

implement hard

to

statistics combine

neural

networks

network

is the

This

network

analogous

to

models. stage-wise

does the

the

not

use

statistical

226

for

In the

stage-wise

a fixed

number

When

training

stage

is computed.

second

The

stages

is computed

of the

based

consensual

stages.

classification

error

similar

way

and

all the

are

but

with

training

of

are

computed

consensual

neural

be the

is computed

is created

an]

than

trained

o'

the in a

non-linearly

has the

e.g., Then

is lower

for

both

weights)

error

stage

stage.

from

stage.

set

input

first

can,

networ_

another this

to the

of each

to the

original

stage-specific

neural

stage

for that

data

weights

accuracies

a new

error

input

consensus

stage-specific

classification

converges. error

the

the

(using

consensual

stage,

After

consensual

added

the

finished,

out,put

the

activities

in

the

error training

decreases.

If the

is stopped.

Testing

network

consensual

as

long

as

classification

can be done

the error

by applying

all

in parallel. consensual

different

multisource

transformed

sum

The

stage,

data.

the

decreasing,

The

first

second

classification

the stages

in

the

weighted

consensual

The

fashion

finished,

is trained

stages.

Stages consensual

the

classification

transforming

has

for the

network

procedure

is created.

classification

error

If

input

consensus

various

overall

for the

to

transformed

is not

stages.

training

in a similar

stage

from

classification

both

from

second

neural

the

non-linearly

the

on the

finished,

is trained

the

the

stage

by taking

activities

selected

by

until

has

another

stage

a single-stage

or

stage

obtained

training

of output

first

second

the

network

iterations

Then

are

When

using

of

of the

stage

vectors.

neural

neural

"sources."

network

In contrast

classification, data

which

algorithm

have

the been

to the "sources" transformed

combines data

the

sources

here

information usually

consist

several

times

of

from

referred

to

non-linearly from

the

raw

227

data. In neural networksit is very important to find the "best" representation of input data and the consensualneural network attempts to averageover the results from several input representations. Also, in the network, this method This multisource guidance linear

testing

can

attractive type

be done

between

for implementation

of consensual classification.

of weight-selection

transformation.

in parallel

neural

for

on parallel

network

However, the

all the

it

may needs

sources

and

consensual

stages,

neural

which

makes

machines.

be a desirable further selection

alternative

work

in

of the

terms best

for of non-

228

LIST

OF

REFERENCES

[11

P.H. Swain, J.A. Richards and T. Lee, '_4uttisource Data Analysis in Remote Sensing and Geographic Information Processing," Proceedings of the 11th International Symposium on Machine Processing of Remotely Sensed Data 1985, West Lafayette, Indiana, pp. 211-217, June 1985.

[2]

T. Lee, J.A. Richards and P.H. Swain, '*Probabilistic and Evidential Approaches for Multisource Data Analysis," IEEE Transactions on Geoscienee and Remote Sensing, vol. GE-25, no. 3, pp. 283-293, May 1987.

[3]

A.H. Strahler and N.A. Bryant, Classification Accuracy from Landsat Information," Proceedings Twelfth Remote Sensing of the Environment, Michigan, pp. 927-942, April 1978.

[4]

J. Franklin, T.L. Logan, C.E. Woodcock "Coniferous Forest Classification and Inventory Digital Terrain Data," IEEE Transactions Remote Sensing, vol GE-25, no. 1, pp. 139-149,

[5]

A.R. Jones, J.J. Settle and B.K. Wyatt, '_tSse of Digital Terrain Data in the Interpretation of SPOT-1 HRV Multispectral Imagery," International Journal of Remote Sensing, vol. 9, no. 4, pp. 669-682, 1988.

[6]

C.F. Hutchinson, "Techniques for Combining Landsat and Ancillary Data for Digital Classification Improvement," Photogrammetrie Engineering and Remote Sensing, vol. 48, no. 1, pp. 123-130, 1982.

[71

R.M. Hoffer, M.D. Fleming, L.A. Bartolucci, S.M. Davis, R.F. Digital Processing of Landsat MSS and Topographic Data to Capabilities for Computerized Mapping of Forest Cover Types, Technical Report 011579, Laboratory for Applications of Sensing in cooperation with Department of Forestry and Resources, Purdue University, W. Lafayette, IN 47906, 1979.

"Improving by Incorporating International Environmental

Forest Cover Topographic Symposium on Institute of

and A.H. Strahler, Using Landsat and on Geoscience and 1986.

Nelson, Improve LARS Remote Natural

229

Is]

H. Kim and P.H. Swain, '_VIultisource Data Analysis in Remote Sensing and Geographic Information Systems Based on Shafer's Theory of Evidence," Proceedings IGARSS '89, IGARSS '89 12th Canadian Symposium on Remote Sensing, vol. 2, pp. 829-832, 1989.

[91

J.A. Richards, D.A. Landgrebe Utilizing Ancillary Information Remote Sensing of Environment,

[lO]

R.M. Hoffer and staff, "Computer-Aided Analysis of Skylab Multispectral Scanner Data in Mountainous Terrain for Land Use, Forestry, Water Resources and Geological Applications," LARS Information Note 121275, Laboratory for Applications of Remote Sensing, Purdue University, W. Lafayette, IN 47907, 1975.

[11]

A. Torchinsky, Redwood City,

[12]

S. French, "Group Consensus Survey," in Bayesian Statistics Lindley, A.F.M. Smith (eds.), 1985.

[13]

K.J. McConway, The Probability Assessment: Thesis, University College,

[14]

C.G. Wagner• Probablhtms,

[lS]

R.F. Bordley and R.W. Probability Estimates," 1981.

[16]

K.J. McConway, '_Iarginalization Journal of the American Statistical 1981.

[17]

C. Berenstein, L.N. Kanal and D. Lavine, "Consensus Rules," in Uncertainty in Artificial Intelligence, L.N. Kanal and J.F. Lemmer (eds.), North Holland, New York, New York, 1986.

[lS]

M. Stone, "The Opinion pp. 1339-1342, 1961.

[19}

N. Dalkey, Studies Lexington, MA, 1972.







t_

and P.H. Swain, in Multispectral vol. 12, pp. 463-477,

Real Variables, Addison-Wesley California, 1988.

"Allocation, Lehrer Theory and Decision,

in

Company,

Experts' Opinions in Considerations, Ph.D.

Models, and the Consensus vol. 14, pp. 207-220, 1982.

Wolff, "On Management

the

Publishing

Probability Distributions: A Critical 2, J.M. Bernardo, M.H. DeGroot, D.V. North Holland, New York, New York,

Combination of Some Theoretical London, 1980.

Pool,"

"A Means for Classification," 1982.

of

the Aggregation of Individual Sciences, vol. 27, pp. 959-964,

and Linear Opinion Pools," Association, vol. 76, pp. 410-414,

Annals

Quality

Mathematical

of

Life,

Statistics,

Lexington

32,

Books,

230

[20]

J.V. Zidek, Multi-Bayesianity, British Columbia, Vancouver,

[21]

R.L. Winkler, Distributions", Oct. 1068.

[22]

R.L. Winkler, Information 1981.

[23]

R.L. Winkler, "The Quantification Methodological Suggestions," Journal of Association, vol. 62, no. 320, pp. 1105-1120,

[24]

R.L. Winkler, "Scoring Rules and Assessors," Journal of the American pp. 1073-1078, 1969

[2s]

C. Genest and J.V. Zidek, "Combining Probability Distributions: Critique and and Annotated Bibliography," Statistical Science, 1., no. 1, pp. 114-118, 1986.

[28]

M. Bacharach, Bayesian Church, Oxford, 1973.

[27]

C. Genest, K.3. McConway and M..I. Schervish, Externally Bayesian Pooling Operators," The vol. 14, no. 2, pp. 487-501, 1986.

[2s]

R.F Bordley, Studies in Mathematical Group Decision Theory, thesis, University of California, Berkeley, California, 1979.

[29]

K. Fukunaga, Introduction to Statistical edition, Academic Press, New York, 1990.

[30]

P.H. Swain, '_undamentals of Pattern Recognition in Remote Sensing," in Remote Sensing - The Quantitative Approach, edited by P.H. Swain and S. Davis, McGraw-Hill Book Company, New York, 1978.

[31]

J.A. Richards, Remote Sensing Digital Image Introduction, Springer-Verlag, Berlin, W. Germany,

"The Management

Technical Canada, Consensus Science,

"Combining Probability Sources," Management

Dialogues,

Report 1984.

no. 05, University

of Subjective vo[. 15, no. 2, pp.

Distributions Sciences, vol.

of

Probability B-61 - B-75,

from Dependent 27, pp. 479-488,

of Judgement: the American 1967.

Some Statistical

the Evaluation of Probability Statistical Association, vol. 64,

unpublished

manuscript,

A vol.

Christ

"Characterization of Annals of Statistics,

Pattern

Recognition,

Analysis 1986.

Ph.D.

2nd

An

231

[32]

S.J. Whitsitt and D.A. Landgrebe, Error Estimation and Separability Measures in Feature Selection for Multielass Pattern Recognition, LARS Publication 082377, The Laboratory for Application of Remote Sensing, Purdue University, W. Lafayette, Indiana, 1977.

[33]

C.E. Shannon Communication,

[34]

R.J. MeEliece, The Theory of Information and Coding, Encyclopedia of Mathematics and its Applications, vol. 3, Addison-Wesley Publishing Company, Reading, Massachusetts, 1977.

[35]

J.A. Benediktsson and P.H. Swain, Methods for Multisource Data Analysis in Remote Sensing, School of Electrical Engineering and Laboratory for Application of Remote Sensing, TR-EE 87-26, Purdue University, W. Lafayette, IN 47907, 1987.

[36]

R.F. Bordley, Assessments,

"A Multiplicative Formula for Aggregating Probability Management Science, vol. 28, pp. 1137-1148, 1982.

[37]

D.H. Krantz, Measurement

R.D. Lute, P. Suppes and A. Tversky, Foundation Vol. 1, Academic Press, New York, New York, 1971.

[38]

P.A. Morris, Management

[39]

D.V. Lindley, "Another Resolution," Management

[40]

M.J. Schervish, "Comments Judgements," Management

[41}

M. H. DeGroot, "Reaching a Consensus," Journal of the Statistical Society, vol. 69, no. 345, pp. 118-121, 1974.

[42]

P.A. Morris, Management

[43]

J.O. Berger, Statistical edition, Springer-Verlag,

I44]

S. French, '_lpdating of Belief in the Opinion," Journal of the Royal Statistical pp. 43-48, 1980.

and W. University

"An Science,

Weaver, of Illinois

The Mathematical Theory Press, Chicago, 1963.

Axiomatic Approach to vol. 29, pp. 24-32, 1983. Look at Science,

Expert

of

of

Resolution,"

an Axiomatic Approach to Expert vol. 32, pp. 303-306, 1986.

on Some Axioms for Combining Expert Science, vol. 32, pp. 306-312, 1981}.

"Combining Expert Judgements, a Bayesian Science, vol. 23, pp. 679-693, 1977. Decision Theory and Bayesian New York, 1085.

American

Approach"

Analysis,

2nd

Light of Someone Else's Society, Series A, vol. 143,

232

[45]

B.W. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, Chapman and Hall, New York, New York, 1986.

[46]

R.O. Duda and P.E. A Wiley Interscience New York, 1973.

Hart, Pattern Publication,

[47]

D.W. Scott, R.A. Probability Density Likelihood Criteria," 832, 1980.

Tapia and Estimation The Annals

[48]

T. Kohonen, Networks, vol.

[49]

B.P. Holt,

[50]

F. Rosenblatt, "The Perceptron: Information Storage and Organization Review, vol. 65, pp. 386-408, 1958.

[51]

R.A. IEEE

[52]

J.J. Hopfield, "Neural Networks and Physical Systems with Emergen Collective Computational Properties Like Those of Two-State Neurons," Proceeding of the National Academy of Science, vol. 81, pp. 3088-3092, May 1984.

[53]

S. Grossberg, "Nonlinear Neural Networks: Principles, Mechanisms, and Architectures," Neural Networks, vol. 1, no. 1, pp. 17-62, 1988.

[54}

G.A. Carpenter and S. Grossberg, "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," Neural Networks and Natural Intelligence, S. Grossberg (ed.), MIT Press, Cambridge, MA, pp. 251-315, 1988.

[55]

J.A. Hartigan, York, 1975.

[56]

T. Kohonen, Self-Organization Springer-Verlag, New York,

[57]

M. Minsky and Press, Cambridge,

"An Introduction 1, no. 1, pp. 3-16,

Lathi, Modern Digital Rinehart and Winston,

Classification John Wiley

J.R. Thompson, "Nonparametric by Discrete Maximum Penalizedof Statistics, vol. 8, no.4, pp. 820-

to 1988.

Neural

Computing,"

Neural

and Analog Communication New York, 1983. A

Algorithms,

S. Pappert, MA, 1988.

John

and Associative 1988. Perceptrons-

Systems,

Probabilistic in the brain,"

Lippmau, "An Introduction to Computing ASS,° Magazine, April 1987.

Clustering

and Scene Analysis, and Sons, New York,

Model for Psychological

with

Wiley

Neural

and

Memory,

Expanded

Nets,"

Sons,

2nd

Edition,

New

edition,

MIT

233

[58]

G.E. McClellan, R.N. DeWitt, T.H. Hemmer, L.N. G.O. Moe, ']VIultispectral Image Processing with Backpropagation Network," Proceedings of IJCNN 151-153, Washington D.C., 1989.

[sg]

S E Decatur. "Application Classfficatxon, Proceedings Washington D.C., 1989. •





°

,_

of

of Neural Networks IJCNN '89, vol 1.,

Matheson and a Three-Layer '89, vol. 1, pp.

to pp.

Terrain 283-288,

[80]

S.E. Decatur, Application of Neural Networks to Terrain Classification, M.S. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1989.

{81]

O.K. Ersoy and D. Hong, "A Hierarchical Nonlinear Spectral Processing," presented D.C., 1989.

[62]

P.H. Heermann and N. Khazenie, for Classification of Multi-Source Data," Proceedings of IGARSS Washington D.C., 1990.

[63]

J. Maslanik, J. Key and A. Schweiger, "Neural Network Identification of Sea-Ice Seasons in Passive Microwave Data," Proceedings IGARSS '90, vol. 2, pp. 1281-1284, Washington D,C., 1900. and M.E. Convention

Neural Network Involving at IJCNN '89, Washington

"Application Multi-Spectral '90, vol.

Hoff, "Adaptive Switching Record, IRE, pp. 96-104,

of Neural Networks Remote Sensing 2, pp. 1273-1276,

[64]

B. Widrow WESCON

[65]

J.A. Anderson and E. Rosenfeld Cambridge, MA, 1988.

[86]

P.J. Werbos, Beyond Regression: New Analysis in the Behavioral Sciences," University, Cambridge, MA, 1974.

[87]

D. Parker, Learning Logic, Computational Research in MIT, Cambridge, MA, 1985.

[88]

Y. Le Cun, "Learning Processes in an Asymmetric Threshold Network," Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Souli and G. Weisbruch (eds.), Springer, Berlin, 1986.

(eds.),

Technical Economics

Circuits," New York,

of

Neurocomputing,

1960 IRE 1960. M_IT Press,

Tools [or Prediction and Ph.D. thesis, Harvard

report TR-87, Center for and Management Science,

234

[69]

D.E. Rumelhart, G.E. Hinton and R.J. Williams, '_Learning Internal Representation by Error Propagation," Parallel Distributed Processing: Ezploration_ in the Microstructures of Cognition, Vol. 1, D.E. Rumelhart and J.L. MeClelland (eds.), pp. 318-362, MIT Press, Cambridge, MA, 1986.

[70]

D.E. Rumelhart, G.E. Hinton Representations by Back-propagating 536, 1986.

[71]

R.A. Rate

[72]

J.A. Benediktsson, P.H. Swain and O.K. Ersoy, "Neural Network Approaches Versus Statistical Methods in Classification of Multisource Remote Sensing Data," Proceedings IGARSS '89 and the 12th Canadian Symposium on Remote Sensing, vol. 2, pp. 489-492, Vancouver, Canada, 1989.

[73]

R.P. Gorman and T.J. Sejnowski, "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Signals", Neural Networks, vol. 1, no. 1, pp. 75-90, 1988.

[74]

E. Barnard and R.A. Cole, A Neural-Net Training Program Based on Conjugate-Gradient Optimization, Technical Report No. CSE 89-014, Oregon Graduate Center, July 1989.

[75]

R.L. Watrous, Learning Algorithms for Connectionis Networks: Applied Gradient Methods of Nonlinear Optimization, Technical Report MS-CIS-88-62, LINC LAB 124, University of Pennsylvania, 1988.

L76]

D.G. Luenberger, Addison-Wesley,

[77]

H. White, Perspective,"

[78]

W. Kan and I. Aleksander, "A Probabilistic Logic Neuron Network for Associative Learning," in Neural Computing Architectures - The Design of Brain-Like Machines, edited by I. Aleksander, M.I.T. Press, Cambridge, MA, 1989.

I79]

D.F. Specht, '_Probabilistic ICNN, San Diego, 1988.

Jacobs, "Increased Adaption," Neural

Williams, 'Zearning Nature, 323, pp. 533-

Rates of Convergence Through Networks, vol. 1, no. 4, pp. 295-307,

Linear Reading,

'_earning Neural

and R.J. Errors,"

and Nonlinear MA, 1984.

in Artificial Computation,

Neural

Programming,

Neural Networks: vol.1, pp. 425-464,

Network

(PNN)",

Learning 1988.

2nd

ed.,

A Statistical 1989.

Proceedings

of

235

[8o]

D.F. Specht, '_robabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 111 - 121, March 1990.

[81]

O.K. Ersoy, Lecture University, 1990.

[82]

J.A. Benediktsson, P.H. Swain and O.K. Ersoy, Approaches Versus Statistical Methods in Multisource Remote Sensing Data," IEEE Geoscience and Remote Sensing, vol. GE-28, no. 4, 1990.

[83]

D.G. Goodenough, M. Goldberg, G. Plunkett and J. Zelek, "The CCRS SAR/MSS Anderson River Data Set," IEEE Transactions on Geoscienee and Remote Sensing, vol. GEo25, no. 3, pp. 360-367, May 1987.

{84]

J.P. Kerekes and D.A. Landgrebe, RSSIM: A Simulation Program for Optical Remote Sensing Systems, TR-EE 89-48, School of Electrical Engineering, Purdue University, West Lafayette, IN, August 1989.

[85]

C. Lee, Classification Algorithms for High Dimensional thesis proposal, School of Electrical Engineering, Purdue 1989.

Notes,

School

of Electrica_

Engineer|ng,

Purdue

"Neural Network Classification of Transactions on pp. 540-552, July

Data, Ph.D. University,

_tj e _

Suggest Documents