IMPROVING MODELLING OF FOREST TYPES AND STAND CONDITION USING NEW REMOTE SENSING DATA SETS

Shaun C. Cunningham, Peter Griffioen, Matt White and Ralph Mac Nally

A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD2434.

Shaun C. Cunningham* and Ralph Mac Nally
School of Biological Sciences, Monash University, VIC 3800

Peter Griffioen
Ecoinformatics Pty. Ltd., Heidelberg, VIC 3084

Matt White
Arthur Rylah Institute, Victorian Department of Environment and Primary Industries, Heidelberg, VIC 3084

* Corresponding author: Tel.: +61 3 9902 0142; Fax: +61 3 9905 5613
E-mail address: [email protected]

This report should be cited as: Cunningham SC, Griffioen P, White M and Mac Nally R, (2013) Improving Modelling of Forest Types and Stand Condition Using New Remote Sensing Data Sets. Murray-Darling Basin Authority, Canberra.

Cover image: Digital elevation model produced from LiDAR data collected over Gunbower-Koondrook-Perricoota Forests.

© Murray-Darling Basin Authority for and on behalf of the Commonwealth of Australia 2014 With the exception of photographs, the Commonwealth Coat of Arms, the Murray-Darling Basin Authority logo, all material presented in this document is provided under a Creative Commons Attribution 3.0 Australia licence (http://creativecommons.org/licences/by/3.0/au/). For the avoidance of any doubt, this licence only applies to the material set out in this document.

The details of the licence are available on the Creative Commons website (accessible using the links provided) as is the full legal code for the CC BY 3.0 AU licence (http://creativecommons.org/licences/by/3.0/legal code).

MDBA’s preference is that this publication (and any material sourced from it) be attributed using the following:

Publication title: Cunningham SC, Griffioen P, White M and Mac Nally R, (2014) Improving Modelling of Forest Types and Stand Condition Using New Remote Sensing Data Sets. Murray-Darling Basin Authority, Canberra.

Source: Licensed from the Murray-Darling Basin Authority under a Creative Commons Attribution 3.0 Australia Licence.

The contents of this publication do not purport to represent the position of the Commonwealth of Australia or the MDBA in any way and are presented for the purpose of informing and stimulating discussion for improved management of the Basin's natural resources. To the extent permitted by law, the copyright holders (including their employees and consultants) exclude all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this report (in part or in whole) and any information or material contained in it.

Executive Summary

Accurate predictions of stand condition depend on strong relationships between ground measurements of stand condition and remotely-sensed variables, including probability maps of the forest types across the floodplain. Early models of stand condition and forest type probability were based on Landsat imagery. In November 2011, the Landsat 5 satellite ceased to provide imagery over Australia. We reviewed possible alternative remotely-sensed data sets for predicting stand condition, including Rapideye imagery, which was subsequently used to build the Stand Condition Tool. Here, we report on an investigation into whether additional remotely-sensed data could be used to improve predictions of Stand Condition Models developed for the forests and woodlands of the Murray River floodplain.

The study focused on two contrasting floodplains (Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forest) on the Murray River. New remotely-sensed data sets were collected for the two focal floodplains, including an historical (2000-2010) Landsat composite, Rapideye, SPOTMap, LiDAR and PALSAR data sets. The ground survey from 2010 was used because these data had previously produced the weakest Landsat-based stand condition models, which may be improved by the new data sets. First, models of Forest Type probability were built using the new remotely-sensed data sets and the original Landsat data to provide comparisons. Second, models of Stand Condition were built using the new remotely-sensed data sets, including the most accurate Forest Type model built here, and the original Landsat data for comparison. All data combinations were modelled using neural networks and random forests to determine which provided more accurate predictions. Random forests produced more accurate models for Forest Type probability, whereas neural networks produced better models for stand condition.

The inclusion of the new data sets improved the predictions for the Forest Types relative to the original data set (compare Models 2 & 4, R² = 0.50 and 0.59, respectively). The inclusion of the new data sets in the neural network led to a substantial improvement in the predictions of stand condition across the two focal floodplains (compare Models 9 & 11, R² = 0.60 and 0.81, respectively).

The modelling reported here for two focal floodplains of the Murray River suggests that the prediction of stand condition across the Murray River could be improved by inclusion of new remotely-sensed data sets. The potential data sets include:

1. A new tree probability layer, like Pr(MurrayTree_new), built using the historical Landsat composite and PALSAR data set.
2. New probability maps for the target Forest Types built using Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery.
3. Rapideye imagery.
4. PALSAR imagery.

If additional remotely-sensed data sets were to be included in a stand condition model, several modelling approaches should be used to achieve the most accurate predictions.

Improving Modelling of Forest Types and Stand Condition 1


Table of Contents

Executive Summary
Introduction
Methods
Study area
Reference sites
Condition assessment
Original remotely-sensed data sets
New remotely-sensed data sets
Modelling
Forest Type modelling
Stand condition modelling
Results
Forest Types
Stand Condition
Discussion
Conclusion
Acknowledgements
References

INTRODUCTION

The forests and woodlands of the Murray River floodplain have been declining rapidly in condition over the past two decades (Margules & Partners, 1990; Cunningham et al., 2009c). This dieback is associated with increased regulation and extraction of water from the Murray River, and an extended period of drought. Water availability on the floodplain has decreased with reduced flooding and increasing salinity of soils, ground water and river water in many areas. ‘The Living Murray’ program of the Murray-Darling Basin Commission (now Murray-Darling Basin Authority) was established in 2002 to restore the health of the basin by returning water to these floodplains (MDBC, 2002). This has involved water recovery, construction of environmental works and measures, and an environmental watering and monitoring effort across the six Icon Sites. The Murray-Darling Basin Authority decided in 2008 that a remote sensing approach was necessary to provide adequate monitoring of changes in the condition of forests and woodlands across the whole Murray River floodplain.

We had previously quantified the condition of river red gum stands across the Victorian Murray River floodplain using a combination of quantitative ground surveys, remotely-sensed data and several modelling methods (Cunningham et al., 2009c). This approach allowed us to predict forest condition on this floodplain (ca 100 000 ha), with high accuracy (R² = 0.78) and resolution (25 m × 25 m pixels). This Living Murray project ‘Mapping Of Stand Condition For The Living Murray Icon Sites’ builds on the previous work, expanding into new forest types (black box and mixed box woodlands) and increasing predictive power by modelling data over three years.

The project aims to complement site-based ground surveys with annual maps of stand condition across the whole Murray River floodplain. The specific aims of the project are:

1. Survey condition of river red gum and black box stands across The Living Murray Icon Sites excluding the Lower Lakes, Coorong and Murray Mouth.
2. Predict and map stand condition of these Icon Sites in 2003, 2008, 2009, 2010 and 2012.
3. Build a Stand Condition Tool that can be used to predict stand condition of the Icon Sites annually using current ground assessments of reference sites and satellite imagery.

In the first year of the project, we assessed stand condition of forest types dominated by river red gum and black box using ground surveys of 175 reference sites (Cunningham et al., 2009a). These assessments were predicted successfully (R² = 0.68) from Landsat imagery using an artificial neural network. The 2009 Stand Condition Model predicted that 79% of the area covered by river red gum, black box and box communities in The Living Murray Icon Sites was in a stressed condition in 2009. Surveys of stand condition in 2010 at the reference sites were predicted less successfully (R² = 0.58) than those of the first year from Landsat imagery and derived structural data using an artificial neural network (Cunningham et al., 2011). This was predominantly due to the number of reference sites in good condition halving between 2009 and 2010 and the majority (77%) of sites being in poor to moderate condition. Consequently, the model fitted well at these intermediate values but poorly at extreme values (good and severe condition). This issue was addressed statistically by transforming the predictions towards equality (the line where observations equal predictions) and this capability was included in the final Stand Condition Tool.
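The report does not specify how predictions were transformed towards equality. As one possible illustration only, the sketch below assumes a simple linear recalibration: observed scores are regressed on predicted scores, and the fitted line is then applied to the predictions so that they fall along the 1:1 line. The function names, the linear form and the example values are assumptions for illustration, not the method implemented in the Stand Condition Tool.

```python
import numpy as np

def fit_recalibration(observed, predicted):
    """Least-squares fit of observed = a + b * predicted (assumed linear form)."""
    b, a = np.polyfit(predicted, observed, deg=1)  # polyfit returns slope first
    return a, b

def recalibrate(predicted, a, b, lower=0.0, upper=10.0):
    """Apply the fitted line and keep scores inside the 0-10 condition scale."""
    adjusted = a + b * np.asarray(predicted, dtype=float)
    return np.clip(adjusted, lower, upper)

# Hypothetical scores: predictions are compressed towards intermediate values.
observed = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
predicted = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

a, b = fit_recalibration(observed, predicted)
adjusted = recalibrate(predicted, a, b)
```

In this hypothetical case the predictions span a narrower range than the observations, and the recalibration stretches them back out so that extreme values (good and severe condition) are no longer pulled towards the middle of the scale.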

In 2011 there were two significant delays to the development of the Stand Condition Tool. First, the extensive floods that began in spring 2010 prevented access to the majority of reference sites during early 2011. It was decided to postpone ground surveys until early 2012, when flood waters should have receded and any positive growth response would still be apparent. Second, the Landsat 5 satellite stopped providing imagery over Australia in November 2011 due to a declining power source. This initiated a review of potential alternative remotely-sensed data to improve the prediction of stand condition (Cunningham et al., 2012). It was concluded that higher resolution reflectance data (Rapideye, SPOTMap) and structural information beneath the canopy (LiDAR, PALSAR) could improve predictions of the distribution of forest types and stand condition.

The lack of Landsat imagery in 2012 necessitated a shift to the equivalent imagery provided by the Rapideye satellite constellation. Consequently, in order to build a Stand Condition Tool based on Rapideye imagery, stand condition needed to be modelled for the three years (2009, 2010 and 2012) using Rapideye imagery to ensure strong predictive power in subsequent years. The individual year models were built successfully from Rapideye imagery for 2009 (R² = 0.75), 2010 (R² = 0.61) and 2012 (R² = 0.71, Cunningham et al., 2013a, b). Using Rapideye instead of Landsat imagery provided more accurate predictions of stand condition. The final Stand Condition Tool was built around a multi-year (2009, 2010 and 2012) model, which had strong predictive power (R² = 0.87) and validated well using an independent survey of 50 new sites (R² = 0.84, Cunningham et al., 2013c).

This part of The Living Murray project ‘Mapping Of Stand Condition For The Living Murray Icon Sites’ investigates whether the inclusion of additional remotely-sensed data sets, including those highlighted in our remote sensing review, could improve the predictions of stand condition. Here, we report on modelling to determine if predictions of Forest Type probability and stand condition could be improved by:

a) including new remotely-sensed data sets; or
b) using different modelling approaches.

METHODS

Study area

The study area included forests and woodlands of the Murray River floodplain in southeastern Australia. Two focal floodplains were chosen to provide contrasting extents of forest types and stand structures (Figure 1). Gunbower-Koondrook-Perricoota Forests (ca 35° 45′ S, 144° 20′ E) in the middle Murray is dominated by dense river red gum forests and more open river red gum woodlands. Chowilla Floodplain (ca 33° 55′ S, 140° 55′ E) on the lower Murray is dominated by sparse black box woodlands with river red gum woodlands and forests along more permanent waterways. All forest types that are dominated by river red gum (Eucalyptus camaldulensis) or black box (E. largiflorens) were included.

The distribution of river red gum and black box across The Living Murray Icon Sites was defined using existing digital vegetation maps (Cunningham et al., 2009b). Mapping for the New South Wales side of the Middle Murray did not distinguish black box woodlands from other box woodlands, so the forest type box woodland was included for Millewa and Koondrook-Perricoota only. Distributions for the following forest types were created for the treed Icon Sites along the Murray River floodplain:

1. River red gum forest – stands dominated by E. camaldulensis with 30-45% projective foliage cover.
2. River red gum woodland – stands dominated by E. camaldulensis with 20-25% projective foliage cover.
3. River red gum / black box woodland – mixed stand of E. camaldulensis and E. largiflorens.
4. Black box woodland – stands dominated by E. largiflorens.
5. Box woodland – stands dominated by E. largiflorens and E. macrocarpa (Gunbower-Koondrook-Perricoota Forests only).

Reference sites

A total of 175 reference sites were surveyed across The Living Murray Icon Sites to inform the previous 2009 and 2010 Landsat-based Stand Condition Models (Cunningham et al., 2009b; Cunningham et al., 2011). Within each Icon Site, reference sites were distributed across the forest types according to how much area they covered. In 2009, reference sites were chosen to be representative of the range of forest types, forest condition and landscape positions (e.g. riverine, wetland and floodplain) at each Icon Site. This approach provided sites with a full range of current stand condition (Cunningham et al., 2009c). Here, we focused only on Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests, for which there were 25 and 50 reference sites, respectively.

Condition assessment

The 2010 survey data set was chosen to investigate potential improvements to prediction of stand condition because, of the two years with Landsat-based models, it produced the weakest predictions, which may improve with the addition of new data sets. Reference sites were assessed between January and May 2010. At each site location, a 0.25 ha plot was established for assessments. Most plots were 50 m × 50 m, but four rectangular plots (125 m × 20 m) were used to assess linear stands along watercourses on the Chowilla Floodplain.

The stand condition assessment involved measuring three indicators (percentage live basal area, plant area index and crown extent), which are known to be reliable and objective indicators of condition in stands of river red gum (Cunningham et al., 2007). Plant area index (PAI) is the area of leaves and stems per unit ground area without adjustment for clumping of canopy components. PAI was estimated from hemispherical photographs of the canopy, which were first classified using image analysis (MultiSpec Application Version 3.1, Purdue University, Indiana), with the program Winphot 5.00 (ter Steege, 1996). Crown extent is the percentage of the potential crown, which is determined by the extent of the existing branching structure, that contains foliage. Crown extent was estimated by two observers using an interval scale (0%, 1-20%, 21-40%, 41-60%, 61-80%, 81-100%) from 30 trees representative of the range of tree size and condition within a plot. Percentage live basal area (%LBA) is the percentage of a stand’s basal area that is contributed by live trees. Trees were considered alive if there was live foliage within the crown. PAI was standardized relative to the maxima measured for each Forest Type within a Bioregion (Riverina, Murray Mallee). This accounted for the historical reduction in PAI owing to the decline in productivity associated with reduced rainfall, flooding and increased evaporation downstream along the Murray River floodplain (Bioregion) and local differences in water availability within a floodplain (Forest Type). Scores for each condition indicator were converted to values out of 10 (PAI and %LBA × 10, and crown extent × 2). A stand condition score (SCS) was calculated from the average score of the three condition indicators, which had a maximum of 10 points.
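The scoring arithmetic described above can be sketched as follows. The multipliers (× 10 for PAI and %LBA, × 2 for crown extent) and the averaging come from the text. The input scales are assumptions inferred from those multipliers: standardized PAI and %LBA are taken as proportions between 0 and 1, and crown extent as the index (0 to 5) of the six interval classes. The function name is hypothetical.

```python
def stand_condition_score(pai_standardized, live_basal_area, crown_extent_class):
    """Average of three indicator scores, each converted to a value out of 10.

    Assumed input scales (not stated explicitly in the report):
      pai_standardized   -- PAI divided by the maximum for its Forest Type
                            and Bioregion, so between 0 and 1
      live_basal_area    -- proportion of basal area from live trees, 0 to 1
      crown_extent_class -- plot-level crown extent on the interval scale,
                            0 (0%) to 5 (81-100%)
    """
    pai_score = pai_standardized * 10
    lba_score = live_basal_area * 10
    crown_score = crown_extent_class * 2
    return (pai_score + lba_score + crown_score) / 3

# Hypothetical plot: moderate canopy, mostly live trees, mid-range crowns.
example_scs = stand_condition_score(0.6, 0.9, 3)
```

Under these assumptions a stand at the maximum of all three indicators scores 10 and a stand with no live foliage scores 0.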

Original remotely-sensed data sets

LANDSAT imagery covering the study area was obtained from the National Earth Observatory Group, Geoscience Australia. Seven scenes of Landsat 5 data were required to cover the whole floodplain, with two scenes covering the areas of interest (Table 1). Imagery was obtained for early (January-April) 2003, 2009 and 2010 to match the timing of ground surveys of stand condition. It was not possible to obtain a cloud-free image of Gunbower-Koondrook-Perricoota Forests in early 2010, so an earlier image from November 2009 was used. Landsat imagery provides reflectance in six spectral bands and the normalized difference vegetation index (NDVI) was calculated from the red and near infrared spectra (Table 2).
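For readers unfamiliar with the index, NDVI is the difference between near infrared and red reflectance divided by their sum. The sketch below computes it with NumPy; the function name, the array inputs and the choice to return NaN where both bands are zero are assumptions for illustration, not a description of the processing used for this report.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise.

    Pixels where NIR + Red equals zero are returned as NaN rather than
    raising a division warning (an illustrative choice).
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    result = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=result, where=total != 0)
    return result

# Hypothetical reflectance values for three pixels.
red_band = np.array([0.1, 0.2, 0.0])
nir_band = np.array([0.3, 0.2, 0.0])
ndvi_values = ndvi(red_band, nir_band)
```

Dense green vegetation reflects strongly in the near infrared and absorbs red light, so higher NDVI values generally indicate more live foliage within a pixel.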

The Landsat images required processing before reflectance band data could be extracted to inform the modelling. The seven scenes were mosaicked into a single image of the floodplain using ENVI 4.5 (ITT Visual Information Solutions, Boulder, Colorado, USA). The coordinate systems differed among the seven scenes, with the two most eastern scenes in GDA Zone 55 and the other five in GDA Zone 54. The images were first colour balanced and then mosaicked with cubic convolution splining into the VicGrid coordinate system. A feathering of a 20 pixel overlap between the image boundaries was used to produce an almost seamless image of the entire study region. The final image was produced as rasters with a 25 m × 25 m pixel resolution. We considered this to be an appropriate pixel resolution to ensure a pixel fell within plot locations (50 m × 50 m) and to estimate stand condition across the Icon Sites, which were at least five orders of magnitude larger in area.

In 2009, the distribution of forest types compiled from existing vegetation maps provided the most definitive outline of the area for which the condition model should be applied (Cunningham et al., 2009b). However, these distributions are likely to contain some areas of sparse or cleared vegetation. Therefore, a map of tree cover was built to improve predictions of stand condition and remove areas that have no forests or woodlands. A layer of tree probability was created for the whole floodplain using an existing Tree/No Tree layer for Victoria (Griffioen & White, unpublished). The Tree/No Tree layer was built using multiple feed-forward, multilayer perceptron artificial neural networks learned by a backpropagation algorithm (Rumelhart et al., 1986) using nine years of Landsat imagery and tree/no tree training data. Estimates of tree cover across Victoria were determined for three temporal groupings of these satellite images, a) 1989, 1991 and 1992, b) 1995, 1998 and 2000, and c) 2002, 2004 and 2005, using artificial neural networks. Another artificial neural network was used to combine the above three networks by introducing the new classification (always tree, never tree, tree loss and tree gain) and to determine a tree probability for each 25 m × 25 m pixel.

After visually confirming the utility of the Tree/No Tree layer across the floodplain, a layer of tree probability, Pr(MurrayTree), was built by training neural networks to recognise trees in satellite images of the floodplain in 2003, 2008 and 2009 using the Tree/No Tree layer to supply the exemplars. A total of 10 000 data points were extracted from the Tree/No Tree layer and the reflectance bands from the 2003, 2008 and 2009 images across the Murray River floodplain. The resolution of the Tree/No Tree layer means that each pixel (25 m × 25 m) would usually encompass more than one tree within forested areas. A visual comparison of the predictions of Pr(MurrayTree) with independent imagery (Google Earth) found that the model predicted trees well. The probability threshold for trees differed across the floodplain, with river red gum forests in the Riverina predicted at Pr > 0.7 while sparse black box woodlands on the outer floodplain of Chowilla were predicted at Pr > 0.1. These thresholds demonstrated that Pr(MurrayTree) was a good predictor of tree cover within a pixel, which provided useful information for modelling stand condition. That is, areas with a higher probability of having trees may also have a higher stand condition.

In 2009, probability maps for the five Forest Types were built to provide additional information for predicting stand condition (Cunningham et al., 2009b). A set of 2220 points was randomly sampled from across the Forest Type distribution maps produced from existing vegetation maps, ensuring sufficient sampling of the types with more limited extents. A neural network was built in Statistica 6.0 to predict Forest Type from the values of the six reflectance bands in the 2003 and 2009 Landsat composite of the floodplain. This neural network predicted the probability of each Forest Type occurring at a site. The network had an overall accuracy of 67.1% and accurately predicted (84%) the distribution of river red gum forest and black box woodlands (Cunningham et al., 2009b). These original Forest Type probability maps were used here in the original models of stand condition. To provide an appropriate comparison for the new models of Forest Type built here, the original models had to be replicated using the same set of samples from within the focal floodplains only.

New remotely-sensed data sets

A range of new remotely-sensed data sets were obtained that were anticipated to provide improved predictions for either forest type or stand condition (Cunningham et al., 2012). These data sets include reflectance with finer spatial resolution (e.g. SPOTMap) or finer spectral resolution (e.g. Rapideye) than Landsat. Data sets from active sensors (e.g. LiDAR), which emit energy and detect the reflected radiation, were included to provide information about the structure beneath the canopy, which reflectance data do not provide.

An HISTORICAL LANDSAT COMPOSITE was produced by Geoscience Australia, which included images over the period January 1st 2000 to December 31st 2010. The composite included median values for six spectral bands (Table 3) and five indices calculated from ratios of these bands (Table 4). Floodplain vegetation is dynamic and the understorey of floodplain forests is quite visually distinct between wet and dry years. An extended period of satellite imagery is likely to provide a more consistent vegetation map across the floodplain than one based on a single year of reflectance imagery, which may include flooded and unflooded areas.
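The idea of a seasonal median composite can be illustrated with a short sketch. It assumes a stack of co-registered images for one band held as a NumPy array of shape (time, rows, columns), with cloud-affected pixels already set to NaN, and a matching array of acquisition months. These inputs, the function name and the example values are assumptions; the composite used in this report was produced by Geoscience Australia and its processing chain is not described here. The summer months follow the period given in Table 3 (Dec 1 to Mar 31).

```python
import numpy as np

def seasonal_median(stack, months, season_months):
    """Per-pixel median of all images acquired in the given months.

    stack         -- array of shape (time, rows, cols); NaN marks masked pixels
    months        -- acquisition month (1-12) for each image in the stack
    season_months -- months that make up the season of interest
    """
    in_season = np.isin(np.asarray(months), season_months)
    return np.nanmedian(stack[in_season], axis=0)

# Hypothetical stack of four images over a 1 x 2 pixel area.
stack = np.array([
    [[0.1, 0.2]],      # January
    [[0.3, np.nan]],   # February, second pixel masked by cloud
    [[0.9, 0.9]],      # July, outside the summer period
    [[0.2, 0.4]],      # December
])
months = [1, 2, 7, 12]
summer = seasonal_median(stack, months, season_months=[12, 1, 2, 3])
```

Because the median ignores occasional extreme values, a single flooded or unusually dry acquisition has little influence on the composite, which is the consistency benefit described above.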

RAPIDEYE imagery covering the Murray River floodplain was obtained for 2009, 2010 and 2012 from the AAM Group. Sixty-seven tiles (25 km × 25 km each) of Rapideye data were required to cover the whole floodplain (Cunningham et al., 2013a), with six tiles covering the areas of interest (Table 5). Ideally, imagery would be cloudless and captured over the same period as the ground surveys (January-May). This was not possible for tiles over Gunbower-Koondrook-Perricoota in 2012 but useful tiles were obtained from December 2011 (Table 5). Tiles were supplied by AAM as a top of atmosphere corrected mosaic of the floodplain. Rapideye imagery provides spectral information at a 5 m pixel resolution. This is a much finer resolution than the actual ground surveys (50 m × 50 m plots). Consequently, the Rapideye imagery was resampled to a 25 m × 25 m scale, converting it to the same scale as Landsat imagery. The rescaling allowed the estimation of a mean and standard deviation for each pixel (Table 6).
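Resampling from 5 m to 25 m pixels means that each coarse pixel summarises a 5 × 5 block of fine pixels. The sketch below shows one way to derive a block mean and standard deviation with NumPy. It assumes the band is a 2-D array aligned to the coarse grid; the function name and the trimming of incomplete edge blocks are illustrative choices, not the procedure used by the authors.

```python
import numpy as np

def block_mean_std(band, factor=5):
    """Aggregate a fine-resolution band into coarse pixels.

    Each output pixel summarises a factor x factor block of input pixels
    (factor=5 takes 5 m pixels to 25 m pixels). Rows and columns that do
    not fill a complete block are dropped.
    """
    rows, cols = band.shape
    rows -= rows % factor
    cols -= cols % factor
    blocks = band[:rows, :cols].reshape(rows // factor, factor,
                                        cols // factor, factor)
    return blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3))

# Hypothetical 10 x 10 band (50 m x 50 m at 5 m resolution).
fine = np.arange(100, dtype=float).reshape(10, 10)
coarse_mean, coarse_std = block_mean_std(fine)
```

The standard deviation retains some of the within-pixel texture that would otherwise be lost by averaging, which is presumably why both statistics were kept as candidate predictors.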

SPOTMaps were obtained for the floodplains of interest from Astrium Services. These maps were made using imagery collected on three dates: Chowilla Floodplain (19/1/2010) and Gunbower-Koondrook-Perricoota Forests (30/12/2009, 2/3/2010). SPOTMaps provide three spectral bands (blue, green, red) at the high resolution of 2.5 m pixels (Table 7). The SPOTMap imagery was resampled to the same scale as Landsat imagery (25 m × 25 m), allowing the estimation of a mean and standard deviation for each pixel.

LiDAR (Light Detection And Ranging) data were only available for Gunbower-Koondrook-Perricoota Forests. The data set provided by the MDBA included a 1-m grid of first strike heights, i.e. heights determined from first returns of the laser from objects. These data were converted into 1-m presence/absence grids of seven height classes to represent different vegetation strata within these forests (Table 8). These presences were then summed over a 25 × 25 pixel array to calculate the percentage within each stratum over a 25 m × 25 m pixel. A 1-m grid of digital elevation (DEM) was also supplied by the MDBA. These data sets were only used in the models of Gunbower-Koondrook-Perricoota Forests.
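The conversion from a 1-m height grid to percentage cover per stratum can be sketched as below. The height class boundaries are defined in Table 8 and are not reproduced here, so the `class_edges` values in the example are purely hypothetical, as are the function name and the example grid.

```python
import numpy as np

def strata_percent_cover(heights, class_edges, block=25):
    """Percentage of 1-m cells in each height class per block x block pixel.

    heights     -- 2-D array of first strike heights on a 1-m grid
    class_edges -- ascending class boundaries; class i covers
                   class_edges[i] <= height < class_edges[i + 1]
    Returns an array of shape (n_classes, rows // block, cols // block).
    """
    rows, cols = heights.shape
    rows -= rows % block
    cols -= cols % block
    trimmed = heights[:rows, :cols]
    layers = []
    for lower, upper in zip(class_edges[:-1], class_edges[1:]):
        present = (trimmed >= lower) & (trimmed < upper)  # presence/absence grid
        blocks = present.reshape(rows // block, block, cols // block, block)
        layers.append(100.0 * blocks.sum(axis=(1, 3)) / (block * block))
    return np.stack(layers)

# Hypothetical 25 m x 25 m area: bare ground with a 5 m wide strip of 12 m trees.
heights = np.zeros((25, 25))
heights[:5, :] = 12.0
cover = strata_percent_cover(heights, class_edges=[0.0, 1.0, 10.0, 30.0])
```

Each 25 m × 25 m pixel then carries one percentage per height class, which gives the models a simple description of vertical stand structure.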

ALOS-PALSAR data were included because the sensor detects microwaves in the L-band, which provides structural information beneath the canopy on the biomass of a forest. This information could be useful in distinguishing among the different Forest Types (e.g. river red gum versus black box woodland). Two PALSAR data sets with different pixel resolutions were obtained for the focal floodplains (Table 9). The PALSAR 50 m Orthorectified Mosaic for Australia created by the ALOS Kyoto and Carbon Initiative Project was used (www.eorc.jaxa.jp/ALOS/en/kc_mosaic/kc_50_australia.htm). This is a mosaic of images from June to September 2009 and included dual polarisations of HH and HV. The second data set, supplied by the MDBA, was PALSAR at a 12.5 m resolution.

A HEIGHT-ABOVE-RIVER data set was created from the nine-second DEM and the nine-second DEM stream network (Stein, 2006) for Australia. This data set was included to help distinguish among the Forest Types, which are often found at different elevations on the floodplain.

Given the new data sets available, it was decided to rebuild the tree probability layer Pr(MurrayTree). The historical Landsat composite (2000-2010) provides a much more consistent image than the individual year composites because it is derived from ca 200 images per scene instead of a single scene. The ALOS-PALSAR data set provided structural information for both focal floodplains. A new model Pr(MurrayTree_new) was built by training neural networks to recognise trees in the historical Landsat composite (including seasonal reflectance bands and indices) and the ALOS-PALSAR imagery. The 10 000 exemplars from the Tree/No Tree layer used to build the original model were used again for consistency. The predictions of Pr(MurrayTree_new) were compared visually with independent imagery (Google Earth) and were found to predict trees well.

Figure 1 Location of the focal floodplains of Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests along the Murray River.

Table 1 Landsat satellite imagery obtained over the areas of interest for the 2003, 2009 and 2010 composites.

Path  Row  Day  Month  Year  Satellite
Gunbower-Koondrook-Perricoota Forests
93    85   4    4      2003  Landsat 7
93    85   23   2      2009  Landsat 5
93    85   6    11     2009  Landsat 5
Chowilla Floodplain
96    84   9    3      2003  Landsat 7
96    84   12   1      2009  Landsat 5
96    84   16   2      2010  Landsat 5

Table 2 Spectral variables derived from Landsat 5 composites for 2003, 2009 and 2010.

Spectral variable  Explanation
LS Blue            Reflectance in the blue spectrum (0.45-0.52 µm)
LS Green           Reflectance in the green spectrum (0.52-0.60 µm)
LS Red             Reflectance in the red spectrum (0.63-0.69 µm)
LS NIR             Reflectance in the near infrared (0.76-0.90 µm)
LS MIR             Reflectance in the middle infrared (1.55-1.75 µm)
LS FIR             Reflectance in the far infrared (2.08-2.35 µm)
LS NDVI            Normalised difference vegetation index

Table 3 Median seasonal band values derived from the historical Landsat composite (2000-2010) used in all the models.

Variable       Explanation                                        Period
HLS Sum_Blue   Reflectance in the blue spectrum (0.45-0.52 µm)    Dec 1 to Mar 31
HLS Sum_Green  Reflectance in the green spectrum (0.52-0.60 µm)   Dec 1 to Mar 31
HLS Sum_Red    Reflectance in the red spectrum (0.63-0.69 µm)     Dec 1 to Mar 31
HLS Sum_NIR    Reflectance in the near infrared (0.76-0.90 µm)    Dec 1 to Mar 31
HLS Sum_MIR    Reflectance in the middle infrared (1.55-1.75 µm)  Dec 1 to Mar 31
HLS Sum_FIR    Reflectance in the far infrared (2.08-2.35 µm)     Dec 1 to Mar 31
HLS Aut_Blue   Reflectance in the blue spectrum (0.45-0.52 µm)    Mar 1 to Jun 30
HLS Aut_Green  Reflectance in the green spectrum (0.52-0.60 µm)   Mar 1 to Jun 30
HLS Aut_Red    Reflectance in the red spectrum (0.63-0.69 µm)     Mar 1 to Jun 30
HLS Aut_NIR    Reflectance in the near infrared (0.76-0.90 µm)    Mar 1 to Jun 30
HLS Aut_MIR    Reflectance in the middle infrared (1.55-1.75 µm)  Mar 1 to Jun 30
HLS Aut_FIR    Reflectance in the far infrared (2.08-2.35 µm)     Mar 1 to Jun 30
HLS Win_Blue   Reflectance in the blue spectrum (0.45-0.52 µm)    Jun 30 to Sept 30
HLS Win_Green  Reflectance in the green spectrum (0.52-0.60 µm)   Jun 30 to Sept 30
HLS Win_Red    Reflectance in the red spectrum (0.63-0.69 µm)     Jun 30 to Sept 30
HLS Win_NIR    Reflectance in the near infrared (0.76-0.90 µm)    Jun 30 to Sept 30
HLS Win_MIR    Reflectance in the middle infrared (1.55-1.75 µm)  Jun 30 to Sept 30
HLS Win_FIR    Reflectance in the far infrared (2.08-2.35 µm)     Jun 30 to Sept 30
HLS Spr_Blue   Reflectance in the blue spectrum (0.45-0.52 µm)    Sept 1 to Dec 31
HLS Spr_Green  Reflectance in the green spectrum (0.52-0.60 µm)   Sept 1 to Dec 31
HLS Spr_Red    Reflectance in the red spectrum (0.63-0.69 µm)     Sept 1 to Dec 31
HLS Spr_NIR    Reflectance in the near infrared (0.76-0.90 µm)    Sept 1 to Dec 31
HLS Spr_MIR    Reflectance in the middle infrared (1.55-1.75 µm)  Sept 1 to Dec 31
HLS Spr_FIR    Reflectance in the far infrared (2.08-2.35 µm)     Sept 1 to Dec 31


Table 4 Median seasonal indices derived from the historical Landsat composite (2000-2010).

Variable       Explanation                                        Period
HLS Sum_NDVI   Normalised Difference Vegetation Index             Dec 1 to Mar 31
               = (B4 - B3) / (B3 + B4)
HLS Sum_EVI    Enhanced Vegetation Index                          Dec 1 to Mar 31
               = (B4 - B3) / (B4 + 6*B3 - 7.5*B1 + 1)
HLS Sum_SATVI  Soil Adjusted Total Vegetation Index               Dec 1 to Mar 31
               = [(B5 - B3) / (B5 - B3 + 0.5)] * 1.5 - (B7 / 2)
HLS Sum_SLAVI  Specific Leaf Area Vegetation Index                Dec 1 to Mar 31
               = B4 / (B3 + B5)
HLS Sum_NDMI   Normalised Difference Moisture Index               Dec 1 to Mar 31
               = (B4 - B5) / (B4 + B5)
HLS Sum_NDSI   Normalised Difference Soil Index                   Dec 1 to Mar 31
               = (B3 - B5) / (B3 + B5)
HLS Aut_NDVI   Normalised Difference Vegetation Index             Mar 1 to Jun 30
HLS Aut_EVI    Enhanced Vegetation Index                          Mar 1 to Jun 30
HLS Aut_SATVI  Soil Adjusted Total Vegetation Index               Mar 1 to Jun 30
HLS Aut_SLAVI  Specific Leaf Area Vegetation Index                Mar 1 to Jun 30
HLS Aut_NDMI   Normalised Difference Moisture Index               Mar 1 to Jun 30
HLS Aut_NDSI   Normalised Difference Soil Index                   Mar 1 to Jun 30
HLS Win_NDVI   Normalised Difference Vegetation Index             Jun 30 to Sept 30
HLS Win_EVI    Enhanced Vegetation Index                          Jun 30 to Sept 30
HLS Win_SATVI  Soil Adjusted Total Vegetation Index               Jun 30 to Sept 30
HLS Win_SLAVI  Specific Leaf Area Vegetation Index                Jun 30 to Sept 30
HLS Win_NDMI   Normalised Difference Moisture Index               Jun 30 to Sept 30
HLS Win_NDSI   Normalised Difference Soil Index                   Jun 30 to Sept 30
HLS Spr_EVI    Enhanced Vegetation Index                          Sept 1 to Dec 31
HLS Spr_SATVI  Soil Adjusted Total Vegetation Index               Sept 1 to Dec 31
HLS Spr_SLAVI  Specific Leaf Area Vegetation Index                Sept 1 to Dec 31
HLS Spr_NDMI   Normalised Difference Moisture Index               Sept 1 to Dec 31
HLS Spr_NDSI   Normalised Difference Soil Index                   Sept 1 to Dec 31
HLS Spr_EVI    Enhanced Vegetation Index                          Sept 1 to Dec 31
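The ratio indices in Table 4 that depend only on the red (B3), near infrared (B4) and middle infrared (B5) bands can be sketched as plain functions of reflectance. This is a minimal illustration of the printed formulas for a single pixel, not the processing chain used for the composite; the function names and the example reflectance values are hypothetical.

```python
# Sketch of four Table 4 indices for one pixel (hypothetical helper names).
# b3 = red, b4 = near infrared, b5 = middle infrared reflectance.

def ndvi(b3, b4):
    """Normalised Difference Vegetation Index: (B4 - B3) / (B3 + B4)."""
    return (b4 - b3) / (b3 + b4)


def slavi(b3, b4, b5):
    """Specific Leaf Area Vegetation Index: B4 / (B3 + B5)."""
    return b4 / (b3 + b5)


def ndmi(b4, b5):
    """Normalised Difference Moisture Index: (B4 - B5) / (B4 + B5)."""
    return (b4 - b5) / (b4 + b5)


def ndsi(b3, b5):
    """Normalised Difference Soil Index: (B3 - B5) / (B3 + B5)."""
    return (b3 - b5) / (b3 + b5)


# Illustrative reflectances only (not values from the report).
red, nir, mir = 0.05, 0.30, 0.15
indices = {
    "NDVI": ndvi(red, nir),
    "SLAVI": slavi(red, nir, mir),
    "NDMI": ndmi(nir, mir),
    "NDSI": ndsi(red, mir),
}
```

With these example values, a densely vegetated pixel would show a high NDVI and a negative NDSI, which is the contrast the indices are designed to capture.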


Table 5 Rapideye satellite imagery obtained over the areas of interest for the 2009, 2010 and 2012 composites.

Tile ID  Location   Acquisition dates for composites
                    2009        2010        2012
Gunbower-Koondrook-Perricoota Forests
5522505  Gunbower   8/04/2009   28/01/2010  5/12/2011
5522605  Tantonan   8/04/2009   28/01/2010  5/12/2011
5522504  Cohuna     22/02/2009  28/01/2010  5/12/2011
5522604  Koondrook  22/02/2009  28/01/2010  5/12/2011
Chowilla Floodplain
5423415  Cal Lal    3/02/2009   6/02/2010   24/12/2012
5423414  Chowilla   7/04/2009   4/04/2010   17/01/2012

Table 6 Spectral variables derived from Rapideye imagery for 2009, 2010 and 2012. SD = standard deviation of the mean.

Environmental variable  Explanation
RE Blue mean            Reflectance in the blue spectrum (0.44-0.51 µm)
RE Green mean           Reflectance in the green spectrum (0.52-0.59 µm)
RE Red mean             Reflectance in the red spectrum (0.63-0.69 µm)
RE Red edge mean        Reflectance at the red edge (0.69-0.73 µm)
RE NIR mean             Reflectance in the near infrared (0.76-0.85 µm)
RE NDVI mean            Normalised difference vegetation index
RE Blue SD              SD of reflectance in the blue spectrum
RE Green SD             SD of reflectance in the green spectrum
RE Red SD               SD of reflectance in the red spectrum
RE Red edge SD          SD of reflectance at the red edge
RE NIR SD               SD of reflectance in the near infrared
RE NDVI SD              SD of normalised difference vegetation index


Table 7 Spectral variables extracted from SPOTMap derived from SPOT 5 satellite imagery. SD = standard deviation of the mean.

Spectral variable   Explanation
SPOTMap Blue mean   Reflectance in the blue spectrum (0.43-0.55 µm)
SPOTMap Green mean  Reflectance in the green spectrum (0.50-0.59 µm)
SPOTMap Red mean    Reflectance in the red spectrum (0.61-0.68 µm)
SPOTMap Blue SD     SD of reflectance in the blue spectrum (0.43-0.55 µm)
SPOTMap Green SD    SD of reflectance in the green spectrum (0.50-0.59 µm)
SPOTMap Red SD      SD of reflectance in the red spectrum (0.61-0.68 µm)


Table 8 Variables derived from the LiDAR data set over Gunbower-Koondrook-Perricoota Forests. Percentage cover values were estimated for different strata from first-strike data capture.

Variable            Explanation
LiDAR DEM           Digital elevation model derived from LiDAR data set
LiDAR cover 0.5 m   Percentage cover at 0-0.5 m
LiDAR cover 1.5 m   Percentage cover at 0.5-1.5 m
LiDAR cover 2.5 m   Percentage cover at 1.5-2.5 m
LiDAR cover 4.5 m   Percentage cover at 2.5-4.5 m
LiDAR cover 8.5 m   Percentage cover at 4.5-8.5 m
LiDAR cover 16.5 m  Percentage cover at 8.5-16.5 m
LiDAR cover 32.5 m  Percentage cover at 16.5-32.5 m

Table 9 Satellite-derived variables used in the new modelling.

Variable             Explanation
PALSAR SEAus LL_HH   Horizontal-horizontal polarisation of L-band, 50 m resolution
PALSAR SEAus LL_HV   Horizontal-vertical polarisation of L-band, 50 m resolution
PALSAR Murray LL_HH  Horizontal-horizontal polarisation of L-band, 12.5 m resolution
PALSAR Murray LL_HV  Horizontal-vertical polarisation of L-band, 12.5 m resolution
HAR                  Height above river (< 10 m accuracy)
Pr(MurrayTree_new)   Probability of trees being present, built from the historical Landsat composite and PALSAR data sets


Modelling

Both neural networks and random forests were used to model all data combinations. Previously, neural networks were found to provide stronger predictions than regression trees (e.g. random forests) when predicting stand condition from Landsat data (Cunningham et al., 2009c). In contrast, random forests provided stronger predictions of stand condition than neural networks when using Rapideye imagery (Cunningham et al., 2013b). Because we were predicting both Forest Type and stand condition with Landsat, Rapideye and a suite of new remotely sensed variables, we needed to determine which modelling approach provided better predictions.

We used feed-forward multilayer perceptron artificial neural networks trained with a backpropagation algorithm (MLP neural network, Rumelhart et al., 1986) in the program Statistica 10.0 (StatSoft, 2011). MLP neural networks are useful for modelling ecological data, which rarely meet parametric statistical assumptions and commonly involve non-linear relationships. They make no prior assumptions about the relationship between the input variables and the underlying mathematical distributions of the data (Özesmi et al., 2006). An MLP neural network is best conceptualised as a series of layers of nodes (neurons), with weighted connections between adjacent layers. Here, the neural network included an input layer of remotely sensed predictor variables, hidden layers and an output layer of the response variable (Forest Type or stand condition score). Networks were built from 20 random starts and the best network was chosen based on statistical fit (R² values).
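Choosing the best of several random starts by statistical fit can be sketched as follows. The sketch assumes R² is computed as the coefficient of determination (1 minus residual over total sum of squares), which the report does not specify; the function names and the example predictions are hypothetical and stand in for networks trained in Statistica.

```python
# Sketch: pick the best of several trained networks by R^2 (assumed definition).

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def best_start(observed, predictions_by_start):
    """Return (start_id, R^2) for the random start with the highest R^2."""
    scores = {
        start: r_squared(observed, preds)
        for start, preds in predictions_by_start.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative values only: two hypothetical random starts.
observed = [2.0, 4.0, 6.0, 8.0]
predictions_by_start = {
    1: [2.5, 3.5, 6.5, 7.5],  # tracks the observations closely
    2: [5.0, 5.0, 5.0, 5.0],  # predicts the mean everywhere
}
winner, winner_r2 = best_start(observed, predictions_by_start)
```

In practice the dictionary would hold one entry per random start (20 in this study), each produced by a separately initialised network.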

Random forests were used because they are well suited to modelling large sets of independent variables, many of which may be highly correlated. They also select relevant environmental variables and can model interactions among variables. This modelling technique creates a forest of regression trees. Individual trees relate values of a response (leaves) to its predictors through a series of binary decisions or branches (Friedman, 2001). At each branch of the tree, the algorithm randomly selects a small number of independent variables from all of those available and creates the node on the basis of which variable minimises the model error. Random forests may be used to classify categorical response variables, with each tree in the forest voting to include a class (e.g. Forest Type), or as an ensemble of regression trees to solve for a continuous response variable (e.g. stand condition).

Ensembles of regression trees, such as random forests, overcome the inherent inaccuracies in seeking a single parsimonious model by constructing an ensemble of models. Bootstrap aggregating (or bagging), which is similar to model averaging, is used to improve the accuracy of predictions. Models were fitted to 20 bootstrap samples (individual trees) to create an ensemble (forest) that predicted the variable of interest. We used a particular type of random forest known as predictive clustering trees (Kocev et al., 2007) in the program Clus. While most decision tree learners induce classification or regression trees, predictive clustering trees generalise this approach by learning trees that are interpreted as cluster hierarchies.
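The bagging step can be illustrated with a deliberately small sketch: 20 bootstrap samples, one single-split regression tree (a stump) per sample, and the ensemble prediction taken as the mean across trees. This is a generic illustration under simplifying assumptions (one predictor, one split per tree) and is not the predictive clustering tree algorithm implemented in Clus; all function names and data are hypothetical.

```python
import random

# Sketch of bootstrap aggregating with one-split regression trees (stumps).

def fit_stump(xs, ys):
    """Find the threshold on x that minimises squared error of two leaf means."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # exclude max so both leaves are non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # bootstrap sample had a single distinct x value
        mean_y = sum(ys) / len(ys)
        return (float("inf"), mean_y, mean_y)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_trees=20, seed=0):
    """Fit one stump per bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Ensemble prediction for a continuous response: mean over all trees."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Illustrative step-shaped data: low response for x <= 5, high for x > 5.
xs = list(range(1, 11))
ys = [1.0] * 5 + [5.0] * 5
forest = bagged_stumps(xs, ys)
low_prediction = predict_forest(forest, 1)
high_prediction = predict_forest(forest, 10)
```

For a categorical response such as Forest Type, the final averaging step would be replaced by a vote across trees.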

While over-fitting is often seen as a problem in statistical modelling, predictions of regression trees for independent data sets are not compromised by using a large number of variables and are generally superior to other methods (e.g. GLM, GAM and MARS; Elith et al., 2006). In contrast, neural networks must consider all independent variables supplied simultaneously. Without an order of magnitude more exemplars than predictor variables, neural networks may produce over-fitted models that are less robust and validate poorly compared with those produced with a carefully selected subset of independent variables.

All modelling data sets were divided randomly into separate training (60%), cross-validation (20%) and model testing (20%) data sets. For the neural networks, training data were used to train the network, cross-validation data were used to detect over-fitting of the network and the model testing data were used as an independent test of model fit. Similarly, for the random forests, training data were used to train the forest, cross-validation data were used to optimally prune the trees of the forest and the model testing data were used to independently test model fit.
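A random 60/20/20 division of this kind can be sketched with the standard library. The function name, the fixed seed and the rounding rule are assumptions made for illustration, so the subset sizes it produces need not match the exact counts used in the report.

```python
import random

# Sketch: random division into training, cross-validation and testing subsets.

def split_samples(samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle a copy of the samples and cut it into three disjoint subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to model testing
    return train, valid, test


# Sample identifiers standing in for the 4500 Forest Type samples.
train, valid, test = split_samples(range(4500))
```

Keeping the three subsets disjoint is what allows the testing subset to act as an independent check on models tuned with the other two.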

The accuracy of predictions from the various models was compared using the model fit (R²) for the overall data set. Confusion matrices were also used to assess the accuracy of predictions for the categorical variable of Forest Type but not the continuous variable of stand condition. The importance of individual variables to prediction of Forest Type and stand condition was determined by sensitivity analyses. The error ratio was used for the neural networks. This is the ratio of the error in estimating the response variable (e.g. stand condition) when a predictor variable is excluded to the error when it is included, so ratios further above 1.0 indicate variables that contribute more to reducing error. This approach is not possible for random forests, so the percentage of branches across the forest that used a predictor variable was used as a sensitivity analysis.
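An error ratio of this kind can be sketched for any fitted model that exposes a prediction function. The sketch assumes that a variable is "excluded" by replacing it with its mean across samples and that error is measured as root mean square error; both are illustrative assumptions rather than a description of the Statistica procedure, and the toy model and data are hypothetical.

```python
# Sketch: sensitivity of a fitted model to one predictor via an error ratio.

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def error_ratio(predict, rows, observed, column):
    """Error with one predictor withheld divided by error with it available.

    The predictor is withheld by substituting its mean (an assumption).
    """
    baseline = rmse(observed, [predict(r) for r in rows])
    column_mean = sum(r[column] for r in rows) / len(rows)
    withheld_rows = [r[:column] + [column_mean] + r[column + 1:] for r in rows]
    withheld = rmse(observed, [predict(r) for r in withheld_rows])
    return withheld / baseline


# Toy "fitted model" that relies on the first predictor only.
def toy_model(row):
    return 2.0 * row[0]


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 0.0]]
observed = [2.5, 3.5, 6.5, 7.5]
ratio_first = error_ratio(toy_model, rows, observed, 0)
ratio_second = error_ratio(toy_model, rows, observed, 1)
```

Here the first predictor yields a ratio well above 1.0 because the model depends on it, whereas the unused second predictor yields a ratio of 1.0.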

Forest Type modelling

Replicating original models

This modelling was designed to replicate the approach used to produce Forest Type probability maps in the 2009 Stand Condition report (Cunningham et al., 2009b). A total of 4500 samples were used in the modelling of Forest Type across the two focal floodplains. The six reflectance bands from Landsat imagery in 2003 and 2009, and the original tree probability layer Pr(MurrayTree), were included as predictor variables in the original Forest Type models.

The Forest Type maps created from existing digital maps for the 2009 report were used to create random samples of the four target Forest Types: river red gum forest, river red gum woodland, river red gum / black box woodland and black box woodland. For each Forest Type, 500 samples were selected from Chowilla Floodplain and 500 samples from Gunbower-Koondrook-Perricoota Forests. The one exception was river red gum / black box woodland, which only occurs on Chowilla Floodplain, so 500 samples were selected from that floodplain alone. One thousand samples of non-forest locations were selected evenly across the two focal floodplains to provide random absences.
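The sample totals follow directly from this design, as the short tally below shows. The dictionary layout and function name are hypothetical; the per-class counts are those stated in the text, with the 1000 non-forest samples assumed to be 500 per floodplain.

```python
# Sketch: tally of the stratified Forest Type samples described in the text.

SAMPLES_PER_TYPE_PER_FLOODPLAIN = 500
NON_FOREST_PER_FLOODPLAIN = 500  # 1000 selected evenly across two floodplains

# Floodplains on which each target Forest Type was sampled.
FOREST_TYPE_FLOODPLAINS = {
    "river red gum forest": ["Chowilla", "GKP"],
    "river red gum woodland": ["Chowilla", "GKP"],
    "river red gum / black box woodland": ["Chowilla"],
    "black box woodland": ["Chowilla", "GKP"],
}


def sample_counts(floodplains):
    """Number of samples per class for the chosen floodplains."""
    counts = {
        forest_type: SAMPLES_PER_TYPE_PER_FLOODPLAIN
        * sum(1 for f in sampled_on if f in floodplains)
        for forest_type, sampled_on in FOREST_TYPE_FLOODPLAINS.items()
    }
    counts["non-forest"] = NON_FOREST_PER_FLOODPLAIN * len(floodplains)
    return counts


total_both = sum(sample_counts(["Chowilla", "GKP"]).values())
total_gkp = sum(sample_counts(["GKP"]).values())
```

The two totals correspond to the 4500 samples used across both floodplains and the 2000 samples available for models restricted to Gunbower-Koondrook-Perricoota Forests.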

New models

The new models of Forest Type used the same 4500 samples and modelling approaches but included new remotely-sensed data. New remotely-sensed data included the new tree probability layer Pr(MurrayTree_new), reflectance bands and indices from the historical Landsat composite, Rapideye from 2009, 2010 and 2012, SPOTMap, PALSAR and height above river (Tables 3, 4, 6, 7 & 9). There were a total of 104 potential variables from these new data sets, so the number of selected variables needed to be reduced for the neural networks to avoid overly complex models. Fifty neural networks were built to investigate the sensitivity of Forest Type predictions to the extensive list of potential variables. This sensitivity analysis was used to reduce the list to 28 variables: spring and summer medians for the six reflectance bands and three indices (NDMI, SATVI & SLAVI) from the historical Landsat composite, three reflectance bands (blue, green & red) from the SPOTMap, three reflectance bands (red, red edge & near infrared) from the 2009 and 2010 Rapideye composites, and Pr(MurrayTree_new). This step was not necessary for the random forests as the approach limits the number of variables that can be considered for each branch of the tree.

An additional eight LiDAR-derived variables were available across Gunbower-Koondrook-Perricoota Forests (GKP) only (Table 8). Separate neural networks and random forests were built to predict Forest Type using the samples from GKP (N = 2000), the same remotely-sensed data as the above new models and the additional LiDAR-derived data. To provide a direct comparison for these models, another pair of original Forest Type models was built using the same predictor variables as the above original models but using the GKP samples only.

A total of eight Forest Type models were built (Table 10). These included combinations of the old and new data sets, samples from both floodplains or from GKP alone (to assess the improvement from using LiDAR data), and the two modelling approaches (neural networks and random forests). The best Forest Type model was then used as an input variable into the new models of stand condition.

Stand condition modelling

Replicating original models

This modelling was designed to replicate the approach used to produce the stand condition map in the 2010 Stand Condition report (Cunningham et al., 2011). The 2010 survey data set was chosen to investigate potential improvements to prediction of stand condition because it produced the weakest of the Landsat-based Stand Condition Models, which may be improved with the addition of new data sets. The 2010 ground survey also had the best temporal alignment with the available remotely-sensed data sets. A total of 75 surveys of reference sites were available from 2010 across the two focal floodplains. The original models of stand condition used reflectance bands from Landsat in 2009 and 2010, and the original Forest Type probability models, including Pr(MurrayTree) and the models for individual types.

New models

The new models of stand condition included several potential improvements on the original models. The potential predictor variables included Rapideye from 2009 and 2010, SPOTMap, PALSAR and height above river. Forest distribution information was provided by Pr(MurrayTree_new) and the best Forest Type probability model (Model 4; see Results for details) built for both focal floodplains in the new modelling. Given the stand condition models have a temporal component, the historical (2000-2010) Landsat composite was not included in these models due to its long period of capture. Again, the eight LiDAR-derived variables were used to predict stand condition using the samples from GKP (N = 50).

A total of eight stand condition models were built (Table 10). These included combinations of the old and new data sets, samples from both floodplains or from GKP alone (to assess the improvement from using LiDAR data), and the two modelling approaches (neural networks and random forests).


Table 10 Details of the various models used to predict Forest Type and Stand Condition.

Model  Model type      Data sets included                                                                           Samples used  Area sampled
Forest Type models
1      Neural network  Landsat 2003 and 2009, Pr(MurrayTree)                                                        4500          Chowilla and GKP
2      Random forest   Landsat 2003 and 2009, Pr(MurrayTree)                                                        4500          Chowilla and GKP
3      Neural network  Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap & Pr(MurrayTree_new)                      4500          Chowilla and GKP
4      Random forest   Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap & Pr(MurrayTree_new)                      4500          Chowilla and GKP
5      Neural network  Landsat 2003 and 2009, Pr(MurrayTree)                                                        2000          GKP
6      Random forest   Landsat 2003 and 2009, Pr(MurrayTree)                                                        2000          GKP
7      Neural network  Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap, Pr(MurrayTree_new) & LiDAR               2000          GKP
8      Random forest   Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap, Pr(MurrayTree_new) & LiDAR               2000          GKP
Stand Condition models
9      Neural network  Landsat, Pr(MurrayTree), original Pr(Forest Types)                                           75            Chowilla and GKP
10     Random forest   Landsat, Pr(MurrayTree), original Pr(Forest Types)                                           75            Chowilla and GKP
11     Neural network  Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMap           75            Chowilla and GKP
12     Random forest   Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMap           75            Chowilla and GKP
13     Neural network  Landsat, Pr(MurrayTree), original Pr(Forest Types)                                           50            GKP
14     Random forest   Landsat, Pr(MurrayTree), original Pr(Forest Types)                                           50            GKP
15     Neural network  Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMap & LiDAR    50            GKP
16     Random forest   Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMap & LiDAR    50            GKP

Results

Modelling Forest Type

The neural network that predicted Forest Type from the original Landsat and tree probability distribution (Model 1) had moderately accurate predictions (48.4%, Table 11). Using a random forest to model the same data set (Model 2) provided a minor improvement in accuracy (50.2%, Table 12). These models differed in how accurately they predicted the individual Forest Types. The random forest provided more accurate predictions for river red gum forest, river red gum / black box woodland and non-forest locations, while the neural network provided more accurate predictions for black box woodlands and river red gum woodlands (Tables 11 & 12). Important variables for predicting Forest Type in the neural network were reflectance in the blue, red and far infrared spectra (Table 13), while the probability of trees and reflectance in the blue and near infrared spectra were important variables in the random forest (Table 14).

Including the new remotely-sensed data sets improved the predictions of Forest Type relative to the original models using both a neural network (Model 3, 56.5%, Table 15) and a random forest (Model 4, 58.9%, Table 16). The random forest produced more accurate predictions of river red gum forest and non-forest, similar accuracies for black box woodland and river red gum woodland but a slightly lower accuracy for river red gum / black box woodlands than the neural network (Tables 15 & 16). Important predictor variables for the neural network included variables obtained from Rapideye (red and red edge), SPOTMap (green and red) and the historical Landsat composite (far infrared in summer, Table 17). In contrast, the most important predictors for the random forest were the five spectral bands from the 2009 Rapideye image (Table 18).

The neural network and random forest built for Forest Types across Gunbower-Koondrook-Perricoota Forests (GKP) from the original data sets had similar accuracies (53.0% and 54.2%, Models 5 & 6 respectively, Tables 19 & 20). However, they differed substantially in the accuracy of their predictions for individual Forest Types, with the neural network providing better accuracy for black box and river red gum woodlands while the random forest provided better accuracy for river red gum forest and non-forest locations. Two important predictors for both models were the probability of trees and reflectance in the middle infrared spectrum (Tables 21 & 22).

The models for Forest Type across GKP built using the new remotely-sensed data sets including LiDAR-derived variables provided similar accuracies (63.6% and 64.9%, Models 7 & 8 respectively, Tables 23 & 24). Both models were more accurate than the original models for Forest Type across GKP. The random forest provided similar accuracies for river red gum woodland and river red gum forest, more accurate predictions of non-forest locations, but a lower accuracy for black box woodlands than the neural network (Tables 23 & 24). Important predictor variables for the neural network included the red edge spectrum in 2009 from Rapideye, the red spectrum from the SPOTMap and the probability of trees (Table 25). The most important predictors for the random forest were the blue and red spectra in 2009 from Rapideye and the HH polarisation from the PALSAR imagery (Table 26). In both models, LiDAR-derived variables were not important predictors of Forest Type.

In summary, random forests consistently provided slightly better test accuracies than neural networks when using the same data set (Table 27). The inclusion of the new data sets substantially improved the predictions for the Forest Types relative to the original data set for the overall models and the reduced models for GKP. Variables derived from Rapideye imagery were consistently important predictors in the new models of Forest Type. Other important predictors in the new models were from the SPOTMap, PALSAR and the historical Landsat composite (Tables 17, 18, 25 & 26). The most accurate model of Forest Types across the focal floodplains was the random forest built using the new data sets (Model 4). The probability maps for Forest Types produced by Model 4 were included as new data sets for the modelling of stand condition.

Modelling Stand Condition

The neural networks consistently provided more accurate predictions of stand condition than the random forests for the same data set (Table 28). The inclusion of the new data sets in the neural network led to a substantial improvement in the predictions of stand condition across the two focal floodplains (compare Models 9 & 11, R² = 0.60 and 0.81, respectively). Using new data sets including LiDAR-derived variables in the neural network of stand condition at GKP did not provide an improvement in predictions (compare Models 13 & 15, Table 28).

The models of stand condition for the two focal floodplains using the original data sets (Models 9 & 10) had the probability of trees and Forest Types as important predictors (Tables 29 & 30). When the new data sets were included in models of stand condition for the two focal floodplains (Models 11 & 12), the most important predictors were HV polarisation from the PALSAR imagery for the neural network and the new model of tree probability for the random forest (Tables 31 & 32). Rapideye variables were also important predictors, particularly for the random forest.

The original models for GKP (Models 13 & 14), like those for the combined floodplains, had the probability of trees and Forest Types as important predictors, but reflectance from the near infrared spectrum from Landsat was also an important predictor (Tables 33 & 34). The new models for GKP (Models 15 & 16) also had reflectance from the near infrared as an important predictor but from Rapideye imagery instead of Landsat (Tables 35 & 36). Other important predictors included LiDAR cover below 0.5 m and probability of non-forest in the neural network, and probability of trees in the random forest.


Table 11 Confusion matrix of Forest Types predicted by Model 1 (neural network using the original data sets) for the samples used to test the model (N = 903 samples).

Predicted Forest Type   Observed Forest Type
                        Non-forest  Black box  RRG-black box  RRG       RRG
                                    woodland   woodland       woodland  forest
Non-forest              105         33         17             22        9
Black box woodland      66          106        49             34        18
RRG-black box woodland  4           9          24             27        15
RRG woodland            25          12         13             90        40
RRG forest              16          15         7              35        112
Accuracy                48.6%       60.6%      21.8%          43.3%     57.7%
Total accuracy          48.4%
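The accuracies reported with the confusion matrix can be recomputed from its counts. The sketch below uses the Table 11 counts and assumes rows are predicted classes and columns are observed classes, a reading that reproduces the printed percentages; the function names are hypothetical.

```python
# Sketch: per-class and overall accuracy from the Table 11 confusion matrix.
# Assumed layout: rows = predicted class, columns = observed class.

LABELS = [
    "Non-forest",
    "Black box woodland",
    "RRG-black box woodland",
    "RRG woodland",
    "RRG forest",
]

MATRIX = [
    [105, 33, 17, 22, 9],
    [66, 106, 49, 34, 18],
    [4, 9, 24, 27, 15],
    [25, 12, 13, 90, 40],
    [16, 15, 7, 35, 112],
]


def class_accuracy(matrix):
    """Percentage of each observed class (column) that was predicted correctly."""
    n = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [100.0 * matrix[c][c] / column_totals[c] for c in range(n)]


def overall_accuracy(matrix):
    """Percentage of all test samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


per_class = dict(zip(LABELS, (round(a, 1) for a in class_accuracy(MATRIX))))
overall = round(overall_accuracy(MATRIX), 1)
```

The same two functions apply unchanged to the other confusion matrices in this section, including the four-class matrices for GKP.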

Table 12 Confusion matrix of Forest Types predicted by Model 2 (random forest using the original data sets) for the samples used to test the model (N = 903 samples).

Predicted Forest Type   Observed Forest Type
                        Non-forest  Black box  RRG-black box  RRG       RRG
                                    woodland   woodland       woodland  forest
Non-forest              132         43         22             23        16
Black box woodland      34          89         30             27        24
RRG-black box woodland  13          12         40             32        12
RRG woodland            22          15         8              75        25
RRG forest              15          16         10             51        117
Accuracy                61.1%       50.9%      36.4%          36.1%     60.3%
Total accuracy          50.2%


Table 13 Sensitivity analysis for variables used in Model 1 (neural network using the original data sets) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.

Variable        Error ratio
LS(2003) FIR    4.06
LS(2003) Red    3.96
LS(2003) Blue   2.54
LS(2009) Blue   2.48
LS(2009) MIR    2.18
LS(2009) Red    1.92
LS(2009) FIR    1.92
LS(2003) MIR    1.92
LS(2003) Green  1.60
LS(2009) Green  1.32
LS(2003) NIR    1.23
Pr(MurrayTree)  1.15
LS(2009) NIR    1.04

Table 14 Sensitivity analysis for variables used in Model 2 (random forest using the original data sets) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable        % Forests used in
Pr(MurrayTree)  10.8
LS(2003) NIR    8.8
LS(2003) Blue   8.7
LS(2003) MIR    8.6
LS(2009) MIR    8.2
LS(2003) Red    8.1
LS(2003) Green  7.8
LS(2003) FIR    7.8
LS(2009) Blue   7.0
LS(2009) NIR    6.8
LS(2009) FIR    6.1
LS(2009) Red    5.9
LS(2009) Green  5.3


Table 15 Confusion matrix of Forest Types predicted by Model 3 (neural network using the new data sets) for the samples used to test the model (N = 903 samples).

Predicted Forest Type   Observed Forest Type
                        Non-forest  Black box  RRG-black box  RRG       RRG
                                    woodland   woodland       woodland  forest
Non-forest              113         25         18             19        9
Black box woodland      56          125        20             34        10
RRG-black box woodland  10          10         58             26        13
RRG woodland            22          5          8              84        32
RRG forest              15          10         6              45        130
Accuracy                52.3%       71.4%      52.7%          40.4%     67.0%
Total accuracy          56.5%

Table 16 Confusion matrix of Forest Types predicted by Model 4 (random forest using the new data sets) for the samples used to test the model (N = 903 samples).

Predicted Forest Type   Observed Forest Type
                        Non-forest  Black box  RRG-black box  RRG       RRG
                                    woodland   woodland       woodland  forest
Non-forest              134         27         13             11        8
Black box woodland      37          124        24             24        8
RRG-black box woodland  9           7          49             33        14
RRG woodland            21          3          12             85        24
RRG forest              15          14         12             55        140
Accuracy                62.0%       70.8%      44.5%          40.9%     72.2%
Total accuracy          58.9%


Table 17 Sensitivity analysis for variables used in Model 3 (neural network using the new data sets) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.

Variable                 Error ratio
RE (2009) Red edge mean  4.298
SPOTMap Green mean       4.406
SPOTMap Red mean         5.786
HLS Sum_FIR              3.305
RE (2009) Red mean       3.829
HLS Sum_Blue             1.980
HLS Spr_Green            2.217
RE (2010) Red edge mean  2.487
HLS Spr_FIR              2.423
SPOTMap Blue mean        2.612
HLS Spr_MIR              3.050
HLS Sum_MIR              2.805
RE (2010) Red mean       2.121
HLS Sum_Red              2.674
HLS Sum_Green            3.003
HLS Sum_SATVI            2.023
HLS Spr_NIR              1.746
HLS Spr_Red              1.595
HLS Sum_NIR              1.509
RE (2009) NIR mean       1.765
HLS Spr_Blue             1.106
HLS Spr_SATVI            1.392
HLS Spr_SLAVI            1.338
RE (2010) NIR mean       1.200
HLS Sum_SLAVI            1.221
HLS Spr_NDMI             1.241
HLS Sum_NDMI             1.233
Pr(MurrayTree_new)       1.087


Table 18 Sensitivity analysis for variables used in Model 4 (random forest using the new data sets) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable                 % Forests used in
RE (2009) Blue mean      1.93
RE (2009) Red edge mean  1.82
RE (2009) Red mean       1.75
RE (2009) Green mean     1.72
RE (2009) NIR mean       1.68
RE (2009) NDVI SD        1.65
RE (2009) Blue SD        1.64
RE (2010) NIR mean       1.64
Pr(MurrayTree_new)       1.56
RE (2009) NDVI mean      1.55
RE (2010) Red edge mean  1.50
RE (2009) Green SD       1.48
PALSAR Murray LL_HH      1.48
RE (2010) Red mean       1.47
RE (2009) NIR SD         1.46
RE (2010) Green mean     1.45
RE (2010) NIR SD         1.44
RE (2009) Red SD         1.42
PALSAR Murray LL_HV      1.41
RE (2012) NIR mean       1.39
RE (2010) Blue SD        1.38
RE (2010) NDVI mean      1.37
RE (2012) Blue mean      1.37
RE (2010) Blue mean      1.36
RE (2009) Red edge SD    1.35
PALSAR SEAus LL_HH       1.33
RE (2012) Green mean     1.30
RE (2012) NDVI mean      1.30
RE (2010) Red edge SD    1.29
RE (2012) NIR SD         1.28
RE (2012) NDVI SD        1.21
RE (2010) Green SD       1.19
RE (2010) NDVI SD        1.18
RE (2012) Blue SD        1.17
RE (2012) Green SD       1.17
RE (2012) Red mean       1.13
RE (2012) Red edge mean  1.13
PALSAR SEAus LL_HV       1.12
RE (2010) Red SD         1.06
RE (2012) Red SD         1.06
RE (2012) Red edge SD    1.00
SPOTMap Green mean       0.99
HLS Win_Blue             0.96
SPOTMap Blue mean        0.93
SPOTMap Red mean         0.92
HLS Win_NIR              0.92
HLS Sum_NIR              0.92
HLS Sum_NDSI             0.91


Table 18 (cont.) Sensitivity analysis for variables used in Model 4 (random forest using the new data sets) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable                   % Forests used in
HLS Spr_EVI                0.90
HLS Spr_SATVI              0.90
HLS Win_FIR                0.89
HLS Spr_NDSI               0.88
HLS Sum_SATVI              0.88
HLS Win_SLAVI              0.87
HLS Spr_NIR                0.86
HLS Win_NDVI               0.86
HLS Win_Red                0.84
HLS Spr_Red                0.84
HLS Win_NDSI               0.84
SPOTMap Red SD             0.83
HLS Win_EVI                0.83
HLS Sum_MIR                0.82
HLS Win_Green              0.81
HLS Spr_NDMI               0.81
HLS Sum_EVI                0.81
HLS Aut_NDSI               0.81
HLS Spr_Blue               0.79
HLS Win_SATVI              0.79
SPOTMap Green SD           0.78
HLS Spr_Green              0.78
HLS Spr_SLAVI              0.78
HLS Sum_Blue               0.77
HLS Aut_NIR                0.77
HLS Win_NDMI               0.77
HLS Spr_FIR                0.76
SPOTMap Blue SD            0.75
HLS Sum_Green              0.74
HLS Sum_NDMI               0.74
HLS Win_MIR                0.73
HLS Aut_EVI                0.73
HLS Aut_SATVI              0.73
HLS Sum_FIR                0.72
HAR                        0.70
HLS Sum_SLAVI              0.69
HLS Spr_MIR                0.68
HLS Sum_Red                0.68
HLS Aut_NDMI               0.68
HLS Aut_Red                0.66
HLS Aut_MIR                0.66
HLS Sum_NDVI               0.63
HLS Aut_NDVI               0.63
HLS Spr_NDVI               0.62
HLS Aut_SLAVI              0.62
HLS Aut_Blue               0.60
HLS Aut_Green              0.57
HLS Aut_FIR                0.47


Table 19 Confusion matrix of Forest Types predicted by Model 5 (neural network using the original data sets for GKP only) for the samples used to test the model (N = 404 samples).

                         Observed Forest Type
Predicted Forest Type    Non-forest    Black box    RRG woodland    RRG forest
Non-forest               41            13           7               5
Black box                43            55           18              16
RRG woodland             14            11           52              13
RRG forest               14            12           24              66
Type accuracy            36.6%         60.4%        51.5%           66.0%
Total accuracy           53.0%


Table 20 Confusion matrix of Forest Types predicted by Model 6 (random forest using the original data sets for GKP only) for the samples used to test the model (N = 404 samples).

                         Observed Forest Type
Predicted Forest Type    Non-forest    Black box    RRG woodland    RRG forest
Non-forest               61            25           14              11
Black box                25            43           12              15
RRG woodland             13            8            54              13
RRG forest               13            15           21              61
Type accuracy            54.5%         47.3%        53.5%           61.0%
Total accuracy           54.2%
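The accuracies in the confusion matrices can be reproduced directly from the counts. The sketch below uses the Table 20 counts and assumes, as the column totals suggest, that Type accuracy is the percentage of each observed type (column) that was predicted correctly and that Total accuracy is the percentage of all test samples on the diagonal. The function names are illustrative only.

```python
# Counts from Table 20 (Model 6): rows are predicted types, columns are
# observed types, both in the order given in TYPES.
TYPES = ["Non-forest", "Black box", "RRG woodland", "RRG forest"]
MATRIX = [
    [61, 25, 14, 11],
    [25, 43, 12, 15],
    [13, 8, 54, 13],
    [13, 15, 21, 61],
]


def type_accuracy(matrix):
    """Percentage of each observed type (column) that was predicted correctly."""
    n = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [100.0 * matrix[c][c] / column_totals[c] for c in range(n)]


def total_accuracy(matrix):
    """Percentage of all samples that lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


if __name__ == "__main__":
    for name, acc in zip(TYPES, type_accuracy(MATRIX)):
        print(f"{name}: {acc:.1f}%")
    print(f"Total: {total_accuracy(MATRIX):.1f}%")
```

Rounded to one decimal place, these calculations return the Type accuracy and Total accuracy values shown in Table 20.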


Table 21 Sensitivity analysis for variables used in Model 5 (neural network using the original data sets for GKP only) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.

Variable          Error ratio
Pr(MurrayTree)    3.32
LS(2003) FIR      2.49
LS(2003) MIR      1.77
LS(2009) MIR      1.40
LS(2009) Red      1.36
LS(2009) NIR      1.31
LS(2009) Green    1.18
LS(2003) Green    1.14
LS(2009) Blue     1.11
LS(2009) FIR      1.08
LS(2003) Red      1.08
LS(2003) NIR      1.07
LS(2003) Blue     1.04


Table 22 Sensitivity analysis for variables used in Model 6 (random forest using the original data sets for GKP only) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable          % Forests used in
Pr(MurrayTree)    11.25
LS(2003) MIR      8.89
LS(2003) Red      8.76
LS(2003) FIR      8.65
LS(2009) MIR      8.51
LS(2003) NIR      8.10
LS(2003) Blue     7.80
LS(2003) Green    7.26
LS(2009) NIR      7.16
LS(2009) FIR      6.51
LS(2009) Red      6.26
LS(2009) Blue     6.11
LS(2009) Green    4.75
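The percentages in Table 22 sum to approximately 100, which suggests that each value is a variable's share of all variable uses across the forest rather than the fraction of trees containing that variable. The sketch below illustrates that interpretation only; it is an assumption, not the authors' documented procedure. The representation of a tree as a list of the variables used at its splits, the function name and the example forest are all hypothetical.

```python
from collections import Counter


def usage_share(trees):
    """Percentage share of all split-variable uses attributed to each variable.

    trees: list of trees, each given as the list of variable names used at
    its splits (a hypothetical, simplified representation of a fitted forest).
    """
    counts = Counter(var for tree in trees for var in tree)
    total = sum(counts.values())
    return {var: 100.0 * n / total for var, n in counts.most_common()}


# Hypothetical forest of three small trees.
forest = [
    ["Pr(MurrayTree)", "LS(2003) MIR", "LS(2003) Red"],
    ["Pr(MurrayTree)", "LS(2003) MIR"],
    ["Pr(MurrayTree)", "LS(2009) Green", "LS(2003) Red", "Pr(MurrayTree)"],
]
shares = usage_share(forest)
```

With this definition the shares always sum to 100, so a value is best read relative to the number of candidate variables: with 13 predictors, as in Table 22, an evenly used variable would score about 7.7.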


Table 23 Confusion matrix of Forest Types predicted by Model 7 (neural network using the new data sets including LiDAR for GKP only) for the samples used to test the model (N = 404 samples).

                         Observed Forest Type
Predicted Forest Type    Non-forest    Black box    RRG woodland    RRG forest
Non-forest               62            11           10              8
Black box                24            67           14              5
RRG woodland             18            6            60              19
RRG forest               8             7            17              68
Type accuracy            55.4%         73.6%        59.4%           68.0%
Total accuracy           63.6%


Table 24 Confusion matrix of Forest Types predicted by Model 8 (random forest using the new data sets including LiDAR for GKP only) for the samples used to test the model (N = 404 samples).

                         Observed Forest Type
Predicted Forest Type    Non-forest    Black box    RRG woodland    RRG forest
Non-forest               76            20           13              12
Black box                19            59           8               5
RRG woodland             13            5            57              13
RRG forest               4             7            23              70
Type accuracy            67.9%         64.8%        57.0%           69.3%
Total accuracy           64.9%


Table 25 Sensitivity analysis for variables used in Model 7 (neural network using the new data sets including LiDAR for GKP only) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.

Variable                   Error ratio
RE (2009) Red edge mean    1.636
SPOTMap Red mean           1.453
Pr(MurrayTree_new)         1.410
HLS Sum_Blue               1.362
RE (2009) MIR mean         1.291
HLS Spr_Green              1.276
HLS Sum_MIR                1.269
RE (2009) Red mean         1.265
RE (2010) Red mean         1.247
HLS Spr_MIR                1.230
HLS Spr_FIR                1.227
SPOTMap Green mean         1.210
SPOTMap Blue mean          1.185
HLS Sum_MIR                1.181
HLS Sum_SATVI              1.164
HLS Spr_NDMI               1.143
HLS Sum_NDMI               1.131
HLS Spr_SATVI              1.127
LiDAR cover 0.5 m          1.124
HLS Spr_SLAVI              1.116
HLS Spr_Red                1.113
LiDAR cover 16.5 m         1.108
LiDAR DEM                  1.101
LiDAR cover 8.5 m          1.100
RE (2010) NIR mean         1.096
HLS Sum_NIR                1.095
HLS Spr_NIR                1.081
HLS Sum_SLAVI              1.068
RE (2010) MIR mean         1.064
LiDAR cover 4.5 m          1.033
HLS Sum_Green              1.030
HLS Spr_Blue               1.027
HLS Sum_Red                1.024
LiDAR cover 2.5 m          1.019
LiDAR cover 32.5 m         1.003
LiDAR cover 1.5 m          1.002


Table 26 Sensitivity analysis for variables used in Model 8 (random forest using the new data sets including LiDAR for GKP only) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable                   % Forests used in
RE (2009) Blue mean        1.78
RE (2009) Red mean         1.77
PALSAR SEAus LL_HH         1.64
RE (2012) Blue mean        1.60
RE (2012) NDVI mean        1.58
RE (2009) Red edge mean    1.53
RE (2009) Red SD           1.51
RE (2009) NIR mean         1.51
Pr(MurrayTree_new)         1.49
RE (2009) Red edge SD      1.48
RE (2009) NDVI SD          1.48
RE (2009) Green mean       1.47
LiDAR DEM                  1.47
RE (2009) Blue SD          1.40
RE (2009) Green SD         1.32
RE (2010) Green mean       1.30
RE (2010) NIR SD           1.30
RE (2012) NIR mean         1.30
RE (2010) Blue mean        1.28
RE (2010) NIR mean         1.26
RE (2012) Red mean         1.26
RE (2009) NIR SD           1.24
RE (2010) Red edge mean    1.24
RE (2010) Green SD         1.23
HLS Spr_SATVI              1.23
RE (2009) NDVI mean        1.22
RE (2012) Blue SD          1.20
HLS Spr_NIR                1.20
RE (2010) NDVI mean        1.16
PALSAR SEAus LL_HV         1.16
RE (2010) Red SD           1.14
RE (2012) Red SD           1.13
HLS Win_Blue               1.13
HLS Sum_SATVI              1.10
RE (2012) NIR SD           1.09
PALSAR Murray LL_HV        1.07
HLS Sum_NDSI               1.07
RE (2010) Blue SD          1.05
LiDAR cover 16.5 m         1.05
RE (2012) Green mean       1.03
HLS Win_FIR                1.03
RE (2012) NDVI SD          1.02
HLS Aut_NDSI               1.02
HLS Win_NDVI               0.99
RE (2012) Red edge mean    0.98
PALSAR Murray LL_HH        0.98
HLS Spr_NDSI               0.98
HLS Win_NDSI               0.96
RE (2010) Red mean         0.94
RE (2012) Green SD         0.93
LiDAR cover 8.5 m          0.93


Table 26 (cont.) Sensitivity analysis for variables used in Model 8

Variable                   % Forests used in
HLS Win_NDMI               0.92
HLS Aut_SATVI              0.92
HLS Spr_EVI                0.90
HLS Win_NIR                0.89
HLS Aut_Blue               0.89
RE (2010) Red edge SD      0.88
HLS Win_SATVI              0.88
HLS Aut_SLAVI              0.88
HLS Sum_EVI                0.85
HLS Spr_MIR                0.84
RE (2012) Red edge SD      0.82
HLS Win_Green              0.82
HLS Win_Red                0.82
HLS Win_EVI                0.82
LiDAR cover 4.5 m          0.82
RE (2010) NDVI SD          0.81
HLS Spr_NDMI               0.81
HAR                        0.79
HLS Spr_Blue               0.79
HLS Aut_EVI                0.79
HLS Sum_NIR                0.77
HLS Win_MIR                0.76
HLS Spr_NDVI               0.75
SPOTMap Blue mean          0.73
HLS Aut_NIR                0.73
HLS Spr_Red                0.72
HLS Aut_Red                0.71
HLS Sum_SLAVI              0.71
HLS Spr_Green              0.69
HLS Sum_Blue               0.69
HLS Aut_Green              0.69
SPOTMap Red mean           0.68
SPOTMap Green mean         0.67
HLS Aut_MIR                0.67
HLS Win_SLAVI              0.67
HLS Sum_NDVI               0.64
HLS Sum_Green              0.63
HLS Spr_SLAVI              0.63
SPOTMap Red SD             0.62
HLS Spr_FIR                0.60
HLS Sum_MIR                0.59
SPOTMap Green SD           0.58
HLS Aut_NDVI               0.58
HLS Sum_NDMI               0.56
SPOTMap Blue SD            0.51
LiDAR cover 2.5 m          0.51
LiDAR cover 0.5 m          0.50
HLS Sum_FIR                0.47
HLS Sum_Red                0.46
LiDAR cover 1.5 m          0.46
HLS Aut_FIR                0.45
HLS Aut_NDMI               0.41
LiDAR cover 32.5 m         0.03


Table 27 Details and summary of the model fits (R2) for the various models used to predict Forest Type.

Model  Model type      Data sets included                                                              Samples used  Area sampled      Overall accuracy  Test accuracy
1      Neural network  Landsat 2003 and 2009, Pr(MurrayTree_new)                                       4500          Chowilla and GKP  53.1%             48.4%
2      Random forest   Landsat 2003 and 2009, Pr(MurrayTree_new)                                       4500          Chowilla and GKP  82.2%             50.2%
3      Neural network  Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP & Pr(MurrayTree_new)         4500          Chowilla and GKP  60.6%             56.5%
4      Random forest   Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP & Pr(MurrayTree_new)         4500          Chowilla and GKP  88.0%             58.9%
5      Neural network  Landsat 2003 and 2009, Pr(MurrayTree_new)                                       2000          GKP               57.0%             53.0%
6      Random forest   Landsat 2003 and 2009, Pr(MurrayTree_new)                                       2000          GKP               83.1%             54.2%
7      Neural network  Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP, Pr(MurrayTree_new) & LiDAR  2000          GKP               69.2%             63.6%
8      Random forest   Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP, Pr(MurrayTree_new) & LiDAR  2000          GKP               89.5%             64.9%


Table 28 Details and summary of the model fits (R2) for the various models used to predict Stand Condition.

Model  Model type      Data sets included                                                                         Samples  Area sampled      Overall R2  Test R2
9      Neural network  Landsat, Pr(MurrayTree_new), original Pr(Forest Types)                                     75       Chowilla and GKP  0.69        0.60
10     Random forest   Landsat, Pr(MurrayTree_new), original Pr(Forest Types)                                     75       Chowilla and GKP  0.81        0.55
11     Neural network  Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMAP         75       Chowilla and GKP  0.77        0.81
12     Random forest   Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMAP         75       Chowilla and GKP  0.81        0.59
13     Neural network  Landsat, Pr(MurrayTree_new), original Pr(Forest Types)                                     50       GKP               0.69        0.73
14     Random forest   Landsat, Pr(MurrayTree_new), original Pr(Forest Types)                                     50       GKP               0.77        0.36
15     Neural network  Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMAP & LiDAR  50       GKP               0.67        0.72
16     Random forest   Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMAP & LiDAR  50       GKP               0.76        0.52
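The report does not define how R2 was calculated. The sketch below assumes the usual coefficient of determination, one minus the ratio of the residual to the total sum of squares, and then uses the Overall and Test R2 values from Table 28 to show the gap between the two for each model. The function names and the small observed/predicted example are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# (Overall R2, Test R2) for Models 9-16, copied from Table 28.
TABLE_28 = {
    9: (0.69, 0.60), 10: (0.81, 0.55), 11: (0.77, 0.81), 12: (0.81, 0.59),
    13: (0.69, 0.73), 14: (0.77, 0.36), 15: (0.67, 0.72), 16: (0.76, 0.52),
}


def fit_gaps(table):
    """Overall R2 minus Test R2 per model; a large positive gap hints at overfitting."""
    return {model: round(overall - test, 2) for model, (overall, test) in table.items()}


# Hypothetical observed and predicted stand condition scores.
example_r2 = r_squared([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
gaps = fit_gaps(TABLE_28)
```

In Table 28 the gaps are largest for the random forests (Models 10, 12, 14 and 16), which is consistent with the comparison of test R2 values between the two modelling approaches.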


Table 29 Sensitivity analysis for variables used in Model 9 (neural network using Landsat and the original Forest Type probability maps) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.

Variable          Error ratio
Pr(MurrayTree)    1.67
Pr(RF)            1.42
Pr(RGBB)          1.39
LS(2009) NIR      1.28
LS(2009) FIR      1.13
LS(2009) MIR      1.08
LS(2009) Green    1.06
LS(2010) NIR      1.05
LS(2009) NDVI     1.04
LS(2010) FIR      1.02
LS(2009) Red      1.01
Pr(BX)            1.00
LS(2010) MIR      1.00
LS(2010) Blue     1.00
LS(2010) Green    1.00
LS(2010) Red      1.00
Pr(RW)            1.00
LS(2009) Blue     0.99
Pr(BB)            0.98


Table 30 Sensitivity analysis for variables used in Model 10 (random forest using Landsat and the original Forest Type probability maps) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable          % Forests used in
Pr(MurrayTree)    8.96
Pr(RW)            7.51
Pr(RF)            6.94
Pr(RGBB)          6.65
LS(2009) Blue     6.65
LS(2009) NDVI     6.65
LS(2010) Blue     6.36
LS(2010) MIR      5.20
LS(2010) Blue     4.91
LS(2009) MIR      4.34
LS(2009) FIR      4.05
LS(2009) NIR      3.76
Pr(BB)            3.47
LS(2009) Green    3.47
LS(2009) Red      3.47
LS(2010) NDVI     3.47
Pr(BX)            3.18
LS(2010) NIR      3.18
LS(2010) FIR      3.18
LS(2010) Red      2.60
LS(2010) Green    2.02


Table 31 Sensitivity analysis for variables used in Model 11 (neural network using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR & SPOTMAP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.

Variable                   Error ratio
PALSAR SEAus LL_HV         1.23
RE (2009) MIR mean         1.20
RE (2009) NDVI SD          1.20
Pr(RF_M4)                  1.17
RE (2009) NDVI mean        1.15
SPOTMap Blue SD            1.12
Pr(RGBB_M4)                1.11
PALSAR SEAus LL_HH         1.10
Pr(Non-forest_M4)          1.07
Pr(BB_M4)                  1.06
SPOTMap Red mean           1.05
RE (2010) MIR mean         1.05
Pr(RW_M4)                  1.04
PALSAR Murray LL_HV        1.04
RE (2009) Red SD           1.03
RE (2009) Red edge mean    1.03
SPOTMap Green mean         1.03
SPOTMap Blue mean          1.03
RE (2010) Red edge SD      1.03
SPOTMap Green SD           1.03
RE (2009) Blue SD          1.03
RE (2010) Red SD           1.03
RE (2009) Blue mean        1.02
SPOTMap Red SD             1.02
RE (2010) Blue SD          1.02
Pr(MurrayTree_new)         1.02
RE (2010) NDVI mean        1.01
PALSAR Murray LL_HH        1.01
RE (2009) NIR SD           1.01
RE (2010) Red edge mean    1.01
HAR                        1.01
RE (2009) Green SD         1.01
RE (2010) NIR SD           1.01
RE (2010) Green SD         1.01
RE (2009) Red edge SD      1.00
RE (2010) Red mean         1.00
RE (2010) NDVI SD          1.00
RE (2009) Green mean       1.00
RE (2010) Green mean       1.00
RE (2009) Red mean         1.00
RE (2010) Blue mean        0.99


Table 32 Sensitivity analysis for variables used in Model 12 (random forest using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR & SPOTMAP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable                   % Forests used in
Pr(MurrayTree_new)         8.75
RE (2009) NIR mean         5.83
RE (2009) Blue mean        5.54
RE (2009) Green mean       4.37
RE (2009) Red mean         4.37
RE (2009) Green SD         4.08
RE (2009) Red SD           4.08
RE (2009) NDVI mean        3.79
RE (2009) Red edge mean    3.50
RE (2009) Red edge SD      3.50
RE (2009) NIR SD           3.50
RE (2010) NDVI mean        3.50
RE (2010) NIR SD           3.21
RE (2010) Blue SD          2.92
PALSAR Murray LL_HH        2.62
RE (2009) Blue SD          2.62
RE (2010) Green mean       2.62
RE (2010) Red edge mean    2.62
RE (2010) Red edge SD      2.62
RE (2009) NDVI SD          2.04
RE (2010) NIR mean         2.04
PALSAR SEAus LL_HH         1.75
SPOTMap Blue SD            1.75
SPOTMap Green mean         1.75
RE (2010) Blue SD          1.75
RE (2010) Red mean         1.46
Pr(RW_M4)                  1.46
PALSAR Murray LL_HV        1.17
SPOTMap Blue mean          1.17
SPOTMap Red SD             1.17
RE (2010) Red mean         1.17
RE (2010) NDVI SD          1.17
Pr(BB_M4)                  0.87
SPOTMap Red mean           0.87
RE (2010) Green SD         0.87
HAR                        0.87
PALSAR SEAus LL_HV         0.58
SPOTMap Green SD           0.58
Pr(Non-forest_M4)          0.58
Pr(BB_M4)                  0.58
Pr(RF_M4)                  0.29


Table 33 Sensitivity analysis for variables used in Model 13 (neural network using Landsat and the original Forest Type probability maps for GKP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.

Variable          Error ratio
LS(2009) NIR      1.39
Pr(MurrayTree)    1.39
Pr(RF)            1.17
LS(2010) NIR      1.08
Pr(BX)            1.05
LS(2009) FIR      1.05
LS(2009) MIR      1.05
Pr(BB)            1.02
LS(2009) NDVI     1.02
LS(2010) NDVI     1.02
LS(2009) Blue     1.02
Pr(RW)            1.01
LS(2010) MIR      1.01
LS(2010) Red      1.00
LS(2010) FIR      1.00
LS(2010) Blue     1.00
LS(2009) Red      0.99
LS(2010) Green    0.99
LS(2009) Green    0.98
Pr(RGBB)          0.98


Table 34 Sensitivity analysis for variables used in Model 14 (random forest using Landsat and the original Forest Type probability maps for GKP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable          % Forests used in
Pr(BX)            8.22
LS(2009) NIR      8.22
Pr(RF)            7.31
Pr(RW)            6.85
Pr(RGBB)          6.85
LS(2009) NDVI     6.85
Pr(MurrayTree)    6.39
LS(2009) MIR      5.94
Pr(BX)            5.48
LS(2010) NIR      4.57
LS(2010) NDVI     4.57
LS(2009) Blue     4.11
LS(2009) FIR      4.11
LS(2009) Red      3.65
LS(2010) MIR      3.65
LS(2010) Red      3.20
LS(2010) FIR      3.20
LS(2010) Blue     2.74
LS(2010) Green    2.28
LS(2009) Green    1.83


Table 35 Sensitivity analysis for variables used in Model 15 (neural network using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR, SPOTMAP and LiDAR for GKP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.

Variable                   Error ratio
RE (2009) NIR mean         1.052
LiDAR cover 0.5 m          1.046
Pr(Non-forest_M4)          1.029
LiDAR cover 8.5 m          1.019
Pr(BB_M4)                  1.019
Pr(RW_M4)                  1.014
RE (2010) Blue mean        1.013
PALSAR Murray LL_HH        1.013
RE (2010) Blue SD          1.013
LiDAR cover 16.5 m         1.012
RE (2009) NDVI mean        1.011
RE (2009) Blue mean        1.008
PALSAR Murray LL_HV        1.008
Pr(RGBB_M4)                1.007
PALSAR SEAus LL_HV         1.007
LS(2010) NIR               1.006
LS(2010) Green             1.006
LiDAR cover 1.5 m          1.005
RE (2010) Red mean         1.004
SPOTMap Blue mean          1.004
LiDAR DEM                  1.004
RE (2010) NDVI mean        1.003
SPOTMap Green SD           1.003
RE (2010) Green SD         1.002
RE (2010) Red edge mean    1.002
RE (2009) Green mean       1.002
SPOTMap Red mean           1.002
SPOTMap Red SD             1.001
SPOTMap Green mean         1.001
Pr(MurrayTree_new)         1.001
LiDAR cover 4.5 m          1.001
RE (2009) Red edge mean    1.000
RE (2009) Red mean         1.000
SPOTMap Blue SD            1.000
PALSAR SEAus LL_HH         0.999
RE (2010) Red edge SD      0.999
LiDAR cover 2.5 m          0.998
HAR                        0.998
RE (2010) Red SD           0.998
RE (2010) NIR SD           0.997
RE (2009) NDVI SD          0.995
RE (2009) Green SD         0.995
RE (2009) NIR SD           0.994
RE (2009) Red SD           0.991
Pr(RF_M4)                  0.989
RE (2009) Red edge SD      0.989
RE (2009) Blue SD          0.987
RE (2010) NDVI SD          0.982
LiDAR cover 32.5 m         0.975


Table 36 Sensitivity analysis for variables used in Model 16 (random forest using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR, SPOTMAP and LiDAR for GKP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.

Variable                   % Forests used in
RE (2009) NIR SD           6.28
RE (2009) NIR mean         5.83
Pr(MurrayTree_new)         4.48
RE (2009) Blue SD          4.48
RE (2009) Green SD         4.48
SPOTMap Red mean           4.48
RE (2009) Red edge SD      4.04
RE (2010) NDVI mean        4.04
RE (2009) Blue mean        3.59
RE (2009) Red SD           3.59
RE (2009) NDVI mean        3.59
RE (2010) Green mean       3.59
RE (2010) Blue SD          3.14
RE (2010) Red mean         3.14
RE (2009) Green mean       2.69
RE (2010) Blue mean        2.69
RE (2010) NIR mean         2.69
RE (2009) Red mean         2.24
RE (2010) Red edge mean    2.24
RE (2010) NIR SD           2.24
HAR                        2.24
SPOTMap Blue mean          2.24
RE (2009) Red edge mean    1.79
RE (2010) Green SD         1.79
RE (2010) NDVI SD          1.79
PALSAR Murray LL_HH        1.79
PALSAR SEAus LL_HV         1.79
RE (2009) Red edge SD      1.35
LiDAR cover 0.5 m          1.35
LiDAR cover 4.5 m          1.35
Pr(BB_M4)                  1.35
RE (2009) NDVI SD          0.90
RE (2010) Red SD           0.90
LiDAR cover 2.5 m          0.90
LiDAR cover 8.5 m          0.90
LiDAR DEM                  0.90
Pr(RW_M4)                  0.90
SPOTMap Green mean         0.45
SPOTMap Green SD           0.45
PALSAR SEAus LL_HH         0.45
Pr(Non-forest_M4)          0.45
Pr(RF_M4)                  0.45
PALSAR Murray LL_HV        0
SPOTMap Red SD             0
SPOTMap Blue SD            0
LiDAR cover 1.5 m          0
LiDAR cover 16.5 m         0
LiDAR cover 32.5 m         0
Pr(RGBB_M4)                0


Discussion

Modelling Forest Type

The probability maps for the Forest Types were improved by using an alternative modelling approach and new remotely-sensed data sets. The original model of Forest Types from the 2009 Condition Report (Cunningham et al., 2009b) was built using a neural network. Here, we found that random forests were slightly more accurate than neural networks for modelling Forest Type (Table 27). Using the new remotely-sensed data sets substantially improved the accuracy of the Forest Type model across the focal floodplains (test accuracy increased from 50% to 59% using random forests, Table 27). Similarly, the Forest Type model restricted to Gunbower-Koondrook-Perricoota Forests was improved substantially by the inclusion of the new remotely-sensed data sets (54% to 65% using random forests, Table 27).

New remotely-sensed data sets that were important predictors of Forest Type included variables derived from Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery. Imagery from the Rapideye and SPOT satellites has a much finer resolution (5 m and 2.5 m, respectively) than Landsat, which provides reflectance at a 25 m scale. The finer resolution of Rapideye and SPOTMap imagery would provide more accurate estimates of reflectance at the 25 m scale than Landsat imagery. This increased accuracy of reflectance measurements may have improved the accuracy of predictions for Forest Types.

Rapideye provides the red edge spectral band, which measures reflectance between 680 and 730 nm, a range that is predominantly not measured by Landsat. The red edge is valuable in determining the physiological condition of vegetation, as it is directly related to chlorophyll production (Boochs et al., 1990). It may be that the red edge provided useful information on differences in the amount of canopy (woodland versus forest) or differences in spectral characteristics among species.

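The aggregation of fine-resolution pixels to the 25 m scale, discussed above for Rapideye and SPOTMap, can be illustrated with a short sketch. It assumes that the "mean" and "SD" variables listed in the sensitivity tables summarise the 5 m pixels falling within each 25 m cell (a 5 x 5 block); the report does not confirm this, and the function name and example grid are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_blocks(grid, factor=5):
    """Summarise each factor x factor block of fine pixels by its mean and SD.

    grid: list of rows of reflectance values at the fine resolution.
    Returns two coarse grids (means, standard deviations). Incomplete blocks
    at the right or bottom edge are ignored.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    means, sds = [], []
    for r0 in range(0, n_rows - factor + 1, factor):
        mean_row, sd_row = [], []
        for c0 in range(0, n_cols - factor + 1, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            mean_row.append(mean(block))
            sd_row.append(pstdev(block))  # population SD of the block
        means.append(mean_row)
        sds.append(sd_row)
    return means, sds


# Hypothetical 5 x 10 grid of 5 m pixels: a uniform block next to a patchy one.
fine = [[0.2] * 5 + [0.1, 0.3, 0.1, 0.3, 0.1] for _ in range(5)]
cell_means, cell_sds = aggregate_blocks(fine)
```

In this example the two 25 m cells have similar mean reflectance, but only the patchy cell has a non-zero SD, which is the kind of within-cell texture that a single 25 m Landsat pixel cannot record.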
Summer reflectance variables from the historical Landsat composite were important predictors in the neural networks of Forest Type. Given that this composite was created from images collected over a long period (2000-2010), it would provide a more consistent image than individual scenes, which vary considerably due to atmospheric conditions and sensor error. A more consistent measurement of reflectance across the floodplain is likely to have provided better differentiation among the Forest Types. The same historical Landsat composite was used successfully to distinguish among stands of river red gum, black box and coolabah (Cunningham et al., 2013d).
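The report does not state how the historical Landsat composite was built. As an illustration of why a multi-scene composite is more consistent than a single scene, the sketch below uses a per-pixel median across co-registered scenes, which is only one common compositing choice and is an assumption here. The function name and the example values are hypothetical.

```python
from statistics import median


def median_composite(scenes):
    """Per-pixel median across a stack of co-registered scenes.

    scenes: list of 2-D grids (lists of rows) of reflectance values, with
    None marking missing data such as cloud-masked pixels.
    """
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [s[r][c] for s in scenes if s[r][c] is not None]
            row.append(median(values) if values else None)
        composite.append(row)
    return composite


# Hypothetical stack of three 1 x 2 scenes; the third has an outlier
# (e.g. haze) in the first pixel and the second has a masked pixel.
stack = [
    [[0.10, 0.20]],
    [[0.12, None]],
    [[0.90, 0.22]],
]
composite = median_composite(stack)
```

The outlying value of 0.90 has no influence on the composite for the first pixel, which illustrates how compositing can suppress scene-specific atmospheric or sensor effects.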


LiDAR and PALSAR data sets were included in the modelling to provide structural information beneath the canopy that reflectance data cannot provide. LiDAR-derived variables were not found to be good predictors of Forest Type at Gunbower-Koondrook-Perricoota Forests. This is surprising because we estimated the percentage cover at seven strata between 0 and 32.5 m. These cover variables were expected to help differentiate between structural types (river red gum forest and woodland) due to differences in canopy height and cover. In contrast, the HH polarisation from the PALSAR imagery was an important predictor of Forest Type in Gunbower-Koondrook-Perricoota Forests.

Modelling Stand Condition

Predictions of stand condition in 2010 for the 75 sites across the two focal floodplains were substantially more accurate from the neural networks than from the random forests (Table 28). In a previous report (Cunningham et al., 2013b), we found that stand condition in 2010, based on the complete data set of 175 sites, was best predicted by neural networks when Landsat imagery was used. However, random forests provided more accurate predictions than neural networks when Rapideye imagery was used. This demonstrates that the modelling approach that provides the most accurate predictions of stand condition depends on both the survey data set and the remotely-sensed data sets included in the modelling. Therefore, different modelling approaches should be explored whenever new data sets are included.

Important predictors of stand condition from the new remotely-sensed data sets included variables derived from Rapideye, PALSAR and LiDAR, and the new tree presence and Forest Type probabilities. As was found previously, Rapideye provides more useful spectral data for predicting stand condition than Landsat (Cunningham et al., 2013b).

LiDAR and PALSAR data are likely to help differentiate between stands in good and degraded condition by detecting the structural complexity of good-condition stands compared with stands that have little canopy and/or few branches. Similarly, the probabilities of trees and the probability of non-forest would be useful estimates of the amount of canopy and, therefore, the stand condition of a location. The important LiDAR variable was cover below 0.5 m, suggesting that stand condition is associated with differences in understorey structure. This is consistent with the observed increase in plant richness of the understorey with decreasing stand condition of river red gum forests (Horner et al., 2012).
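The report does not describe how percentage cover was derived from the LiDAR data for each stratum. The sketch below shows one simple possibility, assuming that the seven strata are bounded above by the heights named in the LiDAR cover variables (0.5, 1.5, 2.5, 4.5, 8.5, 16.5 and 32.5 m) and that cover in a stratum is the percentage of all returns whose height falls within it. Both assumptions, the function name and the example heights are hypothetical.

```python
from bisect import bisect_left

# Assumed upper bounds (m) of the seven strata, taken from the variable names.
UPPER_BOUNDS = [0.5, 1.5, 2.5, 4.5, 8.5, 16.5, 32.5]


def strata_cover(heights, upper_bounds=UPPER_BOUNDS):
    """Percentage of LiDAR returns falling within each height stratum.

    heights: return heights above ground (m) for one grid cell.
    Returns outside 0-32.5 m still count towards the total but are not
    assigned to a stratum.
    """
    if not heights:
        return [0.0] * len(upper_bounds)
    counts = [0] * len(upper_bounds)
    for h in heights:
        i = bisect_left(upper_bounds, h)  # first stratum with upper bound >= h
        if h >= 0 and i < len(upper_bounds):
            counts[i] += 1
    return [100.0 * c / len(heights) for c in counts]


# Hypothetical return heights (m) for one cell.
cell_heights = [0.1, 0.3, 1.0, 3.0, 10.0, 12.0, 20.0, 25.0, 0.4, 7.0]
cover = strata_cover(cell_heights)
```

Under these assumptions the first element of the result corresponds to the cover below 0.5 m that is highlighted above as the important LiDAR variable for stand condition.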


Conclusions

The modelling reported here for two focal floodplains of the Murray River suggests that the prediction of stand condition across the Murray River could be improved by inclusion of new remotely-sensed data sets. The Stand Condition Tool could be improved by:

1. building new Forest Type extent maps. Currently, the Tool uses extents for the Forest Types that are based on polygons developed from aerial photography or predicted relationships with environmental variables. These Forest Type extents could be improved by modelling accurate location data against Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery;
2. building a new tree probability layer, like Pr(MurrayTree_new), for the whole Murray River floodplain using the historical Landsat composite and PALSAR data set;
3. rebuilding the models of stand condition that underlie the Tool using Rapideye, PALSAR, LiDAR and a new tree probability layer;
4. using several modelling approaches (e.g. random forest, neural networks) when building any new models to achieve the most accurate predictions.


Acknowledgements

This project was funded by the Murray-Darling Basin Authority as part of The Living Murray program. We appreciate the continuing support and discussions from the Environmental Monitoring Team at the MDBA (Greg Raisin, Stuart Little, David Hohnberg and Anne Stensletten). We thank Anisul Islam for organising the ORGE Panel request for remote sensing imagery. We also appreciate the continued support of the Icon Site Management agencies involved with the project: Forests NSW, Goulburn-Broken CMA, Mallee CMA, North Central CMA, South Australian Department of Environment, Water and Natural Resources, and Victorian Department of Environment and Primary Industries. We thank AAM Group, particularly Ken Gillan, for supplying the Rapideye mosaic across the Murray River floodplain for 2009 and 2010, and Astrium Services for supplying the SPOTMaps for Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests.


References

Boochs, F., Kupfer, G., Dockter, K. & Kuhbauch, W. (1990) Shape of the red edge as vitality indicator for plants. International Journal of Remote Sensing, 11, 1741-1753.

Cunningham, S.C., Griffioen, P. & White, M. (2012) Potential For Additional Remotely Sensed Data To Improve Mapping Of Stand Condition Across The Living Murray Icon Sites. A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD1114. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Read, J., Baker, P.J. & Mac Nally, R. (2007) Quantitative assessment of stand condition and its relationship to physiological stress in stands of Eucalyptus camaldulensis (Myrtaceae) in southeastern Australia. Australian Journal of Botany, 55, 692-699.

Cunningham, S.C., Mac Nally, R., Griffioen, P. & White, M. (2009a) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD1114. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Mac Nally, R., Griffioen, P. & White, M. (2009b) Mapping the Condition of River Red Gum and Black Box Stands in The Living Murray Icon Sites. Stand Condition Report 2009 (with modelled results for 2003 and 2008). Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2011) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Stand Condition Report 2010. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013a) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Stand Condition Report 2012. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013b) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Comparison of the predictive power of Landsat and Rapideye imagery, and validation of future predictions based on imagery only. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013c) A Tool for Mapping Stand Condition across the Floodplain Forests of the Living Murray Icon Sites. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., White, M., Griffioen, P., Newell, G. & Mac Nally, R. (2013d) Mapping Floodplain Vegetation Types across the Murray-Darling Basin Using Remote Sensing. Murray-Darling Basin Authority, Canberra.

Cunningham, S.C., Mac Nally, R., Read, J., Baker, P.J., White, M., Thomson, J.R. & Griffioen, P. (2009c) A robust technique for mapping vegetation condition across a major river system. Ecosystems, 12, 207-219.

Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S. & Zimmermann, N.E. (2006) Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29, 129-151.

Friedman, J.H. (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29, 1189-1232.

Horner, G.J., Cunningham, S.C., Thomson, J.R., Baker, P.J. & Mac Nally, R. (2012) Forest structure, flooding and grazing predict understorey composition of floodplain forests in southeastern Australia. Forest Ecology and Management, 286, 148-158.

Kocev, D., Vens, C., Struyf, J. & Džeroski, S. (2007) Ensembles of multi-objective decision trees. Machine Learning: ECML 2007. Proceedings of the 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007 (ed. by J. Kok, J. Koronacki, R. De Mántaras, S. Matwin, D. Mladenić and A. Skowron), pp. 624-631. Springer, Berlin.

Margules & Partners (1990) Riparian Vegetation of the River Murray. Report prepared by Margules and Partners Pty. Ltd., P. & J. Smith Ecological Consultants and Department of Conservation Forests and Lands. Murray-Darling Basin Commission, Canberra.

MDBC (2002) The Living Murray: a Discussion Paper on Restoring the Health of the River Murray. p. 94. Murray-Darling Basin Commission, Canberra.

Özesmi, S.L., Tan, C.O. & Özesmi, U. (2006) Methodological issues in building, training and testing artificial neural networks in ecological applications. Ecological Modelling, 195, 83-93.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986) Learning representations by back-propagating errors. Nature, 323, 533-536.

StatSoft (2011) Statistica Version 10. StatSoft, Inc. www.statsoft.com.

Stein, J.L. (2006) A Continental Landscape Framework for Systematic Conservation Planning for Australian Rivers and Streams. Available at http://hdl.handle.net/1885/49406. Australian National University, Canberra.

ter Steege, H. (1996) WINPHOT 5.0: a programme to analyze vegetation indices, light and light quality from hemispherical photographs. Tropenbos Guyana Programme, Report 95-2. Tropenbos, Guyana.