IMPROVING MODELLING OF FOREST TYPES AND STAND CONDITION USING NEW REMOTE SENSING DATA SETS
Shaun C. Cunningham, Peter Griffioen, Matt White and Ralph Mac Nally
A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD2434.
Shaun C. Cunningham* and Ralph Mac Nally, School of Biological Sciences, Monash University, VIC 3800
Peter Griffioen, Ecoinformatics Pty. Ltd., Heidelberg, VIC 3084
Matt White, Arthur Rylah Institute, Victorian Department of Environment and Primary Industries, Heidelberg, VIC 3084
*Corresponding author: Tel.: +61 3 9902 0142; Fax: +61 3 9905 5613
E-mail address: [email protected]
This report should be cited as: Cunningham SC, Griffioen P, White M and Mac Nally R, (2013) Improving Modelling of Forest Types and Stand Condition Using New Remote Sensing Data Sets. Murray-Darling Basin Authority, Canberra.
Cover image: Digital elevation model produced from LiDAR data collected over Gunbower-Koondrook-Perricoota Forests.
© Murray-Darling Basin Authority for and on behalf of the Commonwealth of Australia 2014. With the exception of photographs, the Commonwealth Coat of Arms and the Murray-Darling Basin Authority logo, all material presented in this document is provided under a Creative Commons Attribution 3.0 Australia licence (http://creativecommons.org/licences/by/3.0/au/). For the avoidance of any doubt, this licence only applies to the material set out in this document.
The details of the licence are available on the Creative Commons website (accessible using the links provided), as is the full legal code for the CC BY 3.0 AU licence (http://creativecommons.org/licences/by/3.0/legalcode). MDBA’s preference is that this publication (and any material sourced from it) be attributed using the following: Publication title: Cunningham SC, Griffioen P, White M and Mac Nally R, (2014) Improving Modelling of Forest Types and Stand Condition Using New Remote Sensing Data Sets. Murray-Darling Basin Authority, Canberra. Source: Licensed from the Murray-Darling Basin Authority under a Creative Commons Attribution 3.0 Australia Licence. The contents of this publication do not purport to represent the position of the Commonwealth of Australia or the MDBA in any way and are presented for the purpose of informing and stimulating discussion for improved management of the Basin's natural resources. To the extent permitted by law, the copyright holders (including their employees and consultants) exclude all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this report (in part or in whole) and any information or material contained in it.
Executive Summary
Accurate predictions of stand condition depend on strong relationships between ground measurements of stand condition and remotely-sensed variables, including probability maps of the forest types across the floodplain. Early models of stand condition and forest type probability were based on Landsat imagery. In November 2011, the Landsat 5 satellite ceased to provide imagery over Australia. We reviewed possible alternative remotely-sensed data sets for predicting stand condition, including Rapideye imagery, which was subsequently used to build the Stand Condition Tool. Here, we report on an investigation into whether additional remotely-sensed data could improve the predictions of Stand Condition Models developed for the forests and woodlands of the Murray River floodplain. The study focused on two contrasting floodplains on the Murray River: Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests. New remotely-sensed data sets were collected for the two focal floodplains, including a historical (2000-2010) Landsat composite and Rapideye, SPOTMap, LiDAR and PALSAR data sets. The 2010 ground survey was used because these data had previously produced the weakest Landsat-based stand condition models and so had the most scope for improvement with the new data sets. First, models of Forest Type probability were built using the new remotely-sensed data sets, and using the original Landsat data for comparison. Second, models of stand condition were built using the new remotely-sensed data sets, including the most accurate Forest Type model built here, and using the original Landsat data for comparison. All data combinations were modelled using both neural networks and random forests to determine which provided more accurate predictions. Random forests produced more accurate models of Forest Type probability, whereas neural networks produced better models of stand condition.
The inclusion of the new data sets improved the predictions of Forest Type relative to the original data set (compare Models 2 and 4, R² = 0.50 and 0.59, respectively). The inclusion of the new data sets in the neural network led to a substantial improvement in the predictions of stand condition across the two focal floodplains (compare Models 9 and 11, R² = 0.60 and 0.81, respectively). The modelling reported here for two focal floodplains of the Murray River suggests that the prediction of stand condition across the Murray River floodplain could be improved by including new remotely-sensed data sets. The potential data sets include:
1. A new tree probability layer, like Pr(MurrayTree_new), built using the historical Landsat composite and PALSAR data set.
2. New probability maps for the target Forest Types built using Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery.
3. Rapideye imagery.
4. PALSAR imagery.
If additional remotely-sensed data sets were to be included in a stand condition model, several modelling approaches should be tested to identify the one giving the most accurate predictions.
Improving Modelling of Forest Types and Stand Condition 1
Table of Contents
Executive Summary
Introduction
Methods
    Study area
    Reference sites
    Condition assessment
    Original remotely-sensed data sets
    New remotely-sensed data sets
    Modelling
    Forest Type modelling
    Stand condition modelling
Results
    Forest Types
    Stand Condition
Discussion
Conclusion
Acknowledgements
References
INTRODUCTION
The forests and woodlands of the Murray River floodplain have been declining rapidly in condition over the past two decades (Margules & Partners, 1990; Cunningham et al., 2009c). This dieback is associated with increased regulation and extraction of water from the Murray River and an extended period of drought. Water availability on the floodplain has decreased with reduced flooding and increasing salinity of soils, groundwater and river water in many areas. ‘The Living Murray’ program of the Murray-Darling Basin Commission (now the Murray-Darling Basin Authority) was established in 2002 to restore the health of the Basin by returning water to these floodplains (MDBC, 2002). This has involved water recovery, the construction of environmental works and measures, and an environmental watering and monitoring effort across the six Icon Sites. In 2008, the Murray-Darling Basin Authority decided that a remote sensing approach was necessary to provide adequate monitoring of changes in the condition of forests and woodlands across the whole Murray River floodplain.
We had previously quantified the condition of river red gum stands across the Victorian Murray River floodplain using a combination of quantitative ground surveys, remotely-sensed data and several modelling methods (Cunningham et al., 2009c). This approach allowed us to predict forest condition on this floodplain (ca 100 000 ha) with high accuracy (R² = 0.78) and resolution (25 m × 25 m pixels). This Living Murray project, ‘Mapping Of Stand Condition For The Living Murray Icon Sites’, builds on the previous work by expanding into new forest types (black box and mixed box woodlands) and by increasing predictive power through modelling data over three years.
The project aims to complement site-based ground surveys with annual maps of stand condition across the whole Murray River floodplain. The specific aims of the project are:
1. Survey condition of river red gum and black box stands across The Living Murray Icon Sites excluding the Lower Lakes, Coorong and Murray Mouth.
2. Predict and map stand condition of these Icon Sites in 2003, 2008, 2009, 2010 and 2012.
3. Build a Stand Condition Tool that can be used to predict stand condition of the Icon Sites annually using current ground assessments of reference sites and satellite imagery.
In the first year of the project, we assessed stand condition of forest types dominated by river red gum and black box using ground surveys of 175 reference sites (Cunningham et al., 2009a). These assessments were predicted successfully (R² = 0.68) from Landsat imagery using an artificial neural network. The 2009 Stand Condition Model predicted that 79% of the area covered by river red gum, black box and box communities in The Living Murray Icon Sites was in a stressed condition in 2009. Surveys of stand condition at the reference sites in 2010 were predicted less successfully (R² = 0.58) than those of the first year, using Landsat imagery and derived structural data in an artificial neural network (Cunningham et al., 2011). This was predominantly because the number of reference sites in good condition halved between 2009 and 2010 and the majority (77%) of sites were in poor to moderate condition. Consequently, the model fitted well at these intermediate values but poorly at extreme values (good and severe condition). This issue was addressed statistically by transforming the predictions towards equality (the line where observations equal predictions), and this capability was included in the final Stand Condition Tool.
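The report does not state the form of the transformation towards equality. The sketch below shows one common option and assumes a simple linear recalibration. It regresses the observed stand condition scores on the raw model predictions by ordinary least squares, then passes predictions through the fitted line so that observed-versus-predicted values fall along the 1:1 line. The function names, the synthetic data and the 0-10 clipping range are illustrative assumptions.

```python
import numpy as np


def fit_calibration(predicted, observed):
    """Fit observed = a + b * predicted by ordinary least squares (hypothetical helper)."""
    b, a = np.polyfit(np.asarray(predicted, float), np.asarray(observed, float), deg=1)
    return a, b  # np.polyfit returns coefficients highest power first: [slope, intercept]


def apply_calibration(predicted, a, b, lo=0.0, hi=10.0):
    """Map raw predictions onto the observation scale, clipped to the 0-10 score range."""
    return np.clip(a + b * np.asarray(predicted, float), lo, hi)


# Synthetic example: predictions compressed towards the mid-range (slope 0.5 around 5)
rng = np.random.default_rng(0)
observed = rng.uniform(1, 9, 200)
predicted = 5 + 0.5 * (observed - 5) + rng.normal(0, 0.3, 200)
a, b = fit_calibration(predicted, observed)
calibrated = apply_calibration(predicted, a, b)
```

Regressing the observations on OLS fitted values always gives a slope of 1 and an intercept of 0. The recalibration therefore stretches compressed predictions back towards the extremes (good and severe condition), but it cannot add information the original model lacked.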
In 2011 there were two significant delays to the development of the Stand Condition Tool. First, the extensive floods that began in spring 2010 prevented access to the majority of reference sites during early 2011. It was decided to postpone ground surveys until early 2012, when floodwaters should have receded and any positive growth response would still be apparent. Second, the Landsat 5 satellite stopped providing imagery over Australia in November 2011 owing to a declining power source. This initiated a review of potential alternative remotely-sensed data to improve the prediction of stand condition (Cunningham et al., 2012). The review concluded that higher-resolution reflectance data (Rapideye, SPOTMap) and structural information beneath the canopy (LiDAR, PALSAR) could improve predictions of the distribution of forest types and of stand condition.
The lack of Landsat imagery in 2012 necessitated a shift to the equivalent imagery provided by the Rapideye satellite constellation. Consequently, to build a Stand Condition Tool based on Rapideye imagery, stand condition needed to be modelled from Rapideye imagery for all three survey years (2009, 2010 and 2012) to ensure strong predictive power in subsequent years. The individual year models were built successfully from Rapideye imagery for 2009 (R² = 0.75), 2010 (R² = 0.61) and 2012 (R² = 0.71; Cunningham et al., 2013a, b). Using Rapideye instead of Landsat imagery provided more accurate predictions of stand condition. The final Stand Condition Tool was built around a multi-year (2009, 2010 and 2012) model, which had strong predictive power (R² = 0.87) and validated well against an independent survey of 50 new sites (R² = 0.84; Cunningham et al., 2013c).
This part of The Living Murray project ‘Mapping Of Stand Condition For The Living Murray Icon Sites’ investigates whether the inclusion of additional remotely-sensed data sets, including those highlighted in our remote sensing review, could improve the predictions of stand condition. Here, we report on modelling to determine whether predictions of Forest Type probability and stand condition could be improved by:
a) including new remotely-sensed data sets; or
b) using different modelling approaches.
METHODS
Study area
The study area included forests and woodlands of the Murray River floodplain in southeastern Australia. Two focal floodplains were chosen to provide contrasting extents of forest types and stand structures (Figure 1). Gunbower-Koondrook-Perricoota Forests (ca 35° 45′ S, 144° 20′ E), on the middle Murray, is dominated by dense river red gum forests and more open river red gum woodlands. Chowilla Floodplain (ca 33° 55′ S, 140° 55′ E), on the lower Murray, is dominated by sparse black box woodlands, with river red gum woodlands and forests along more permanent waterways. All forest types dominated by river red gum (Eucalyptus camaldulensis) or black box (E. largiflorens) were included.
The distribution of river red gum and black box across The Living Murray Icon Sites was defined using existing digital vegetation maps (Cunningham et al., 2009b). Mapping for the Middle Murray on the New South Wales side did not distinguish black box woodlands from other box woodlands, so the forest type box woodland was included for Millewa and Koondrook-Perricoota only. Distributions for the following forest types were created for the treed Icon Sites along the Murray River floodplain:
1. River red gum forest – stands dominated by E. camaldulensis with 30-45% projective foliage cover.
2. River red gum woodland – stands dominated by E. camaldulensis with 20-25% projective foliage cover.
3. River red gum / black box woodland – mixed stands of E. camaldulensis and E. largiflorens.
4. Black box woodland – stands dominated by E. largiflorens.
5. Box woodland – stands dominated by E. largiflorens and E. microcarpa (Gunbower-Koondrook-Perricoota Forests only).
Reference sites
A total of 175 reference sites were surveyed across The Living Murray Icon Sites to inform the previous 2009 and 2010 Landsat-based Stand Condition Models (Cunningham et al., 2009b; Cunningham et al., 2011). Within each Icon Site, reference sites were distributed across the forest types in proportion to the area each type covered. In 2009, reference sites were chosen to be representative of the range of forest types, forest condition and landscape positions (e.g. riverine, wetland and floodplain) at each Icon Site. This approach provided sites spanning the full range of current stand condition (Cunningham et al., 2009c). Here, we focused only on Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests, for which there were 25 and 50 reference sites, respectively.
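Allocating reference sites among forest types in proportion to their area is a small apportionment problem. The sketch below assumes a largest-remainder rule for rounding, because the report does not say how fractional shares were handled. The forest-type areas are invented for illustration.

```python
from math import floor


def allocate_sites(areas_ha, n_sites):
    """Split n_sites among forest types in proportion to area (largest-remainder rounding)."""
    total = sum(areas_ha.values())
    quotas = {name: n_sites * area / total for name, area in areas_ha.items()}
    alloc = {name: floor(q) for name, q in quotas.items()}
    leftover = n_sites - sum(alloc.values())
    # Give the remaining sites to the types with the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical areas (ha) for one Icon Site with 25 reference sites
areas = {"River red gum forest": 20000, "River red gum woodland": 8000, "Black box woodland": 12000}
allocation = allocate_sites(areas, n_sites=25)
```

In practice, the report notes that sites were also chosen to span condition and landscape position, so a purely area-based allocation would only be the starting point.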
Condition assessment
The 2010 survey data set was chosen to investigate potential improvements to the prediction of stand condition. Of the two years with Landsat-based models, it had produced the weaker predictions and so had the most to gain from the addition of new data sets. Reference sites were assessed between January and May 2010. At each site, a 0.25 ha plot was established for assessments. Most plots were 50 × 50 m, but four rectangular plots (125 × 20 m) were used to assess linear stands along watercourses on the Chowilla floodplain.
The stand condition assessment involved measuring three indicators (percentage live basal area, plant area index and crown extent), which are known to be reliable and objective indicators of condition in stands of river red gum (Cunningham et al., 2007). Plant area index (PAI) is the area of leaves and stems per unit ground area, without adjustment for clumping of canopy components. PAI was estimated from hemispherical photographs of the canopy. The photographs were first classified using image analysis (MultiSpec Application Version 3.1, Purdue University, Indiana) and then analysed with the program Winphot 5.00 (ter Steege, 1996). Crown extent is the percentage of the potential crown (determined by the extent of the existing branching structure) that contains foliage. Crown extent was estimated by two observers using an interval scale (0%, 1-20%, 21-40%, 41-60%, 61-80%, 81-100%) for 30 trees representative of the range of tree size and condition within a plot. Percentage live basal area (%LBA) is the percentage of a stand’s basal area that is contributed by live trees. Trees were considered alive if there was live foliage within the crown. PAI was standardised relative to the maximum measured for each Forest Type within a Bioregion (Riverina, Murray Mallee). Standardising by Bioregion accounted for the historical reduction in PAI downstream along the Murray River floodplain, owing to the decline in productivity associated with reduced rainfall, reduced flooding and increased evaporation. Standardising by Forest Type accounted for local differences in water availability within a floodplain. Scores for each condition indicator were converted to values out of 10 (PAI and %LBA × 10, and crown extent × 2). A stand condition score (SCS) was calculated as the average score of the three condition indicators, giving a maximum of 10 points.
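The scoring arithmetic can be written out as a short function. The sketch rests on two assumptions that the report does not state:

- Standardised PAI and %LBA are expressed as proportions (0-1) before multiplying by 10.
- Each tree's crown extent is recorded as its class index on the six-class scale (0 for 0% up to 5 for 81-100%), and the indices are averaged over the assessed trees before multiplying by 2.

The function and argument names are hypothetical.

```python
from statistics import mean

# Crown extent interval classes from the text; a tree's score is its index (0-5), an assumption.
CROWN_CLASSES = ["0%", "1-20%", "21-40%", "41-60%", "61-80%", "81-100%"]


def stand_condition_score(pai, pai_max, live_ba, total_ba, crown_classes):
    """Return (SCS, (PAI score, %LBA score, crown score)), each on a 0-10 scale.

    pai, pai_max      -- plot PAI and the maximum PAI for its Forest Type within its Bioregion
    live_ba, total_ba -- basal area of live trees and of all trees (same units)
    crown_classes     -- crown extent class index (0-5) for each assessed tree
    """
    pai_score = 10 * min(pai / pai_max, 1.0)  # guard: a plot cannot exceed the reference maximum
    lba_score = 10 * live_ba / total_ba
    crown_score = 2 * mean(crown_classes)
    parts = (pai_score, lba_score, crown_score)
    return mean(parts), parts


# Example plot: 15 trees in the 41-60% class and 15 in the 61-80% class
crowns = [CROWN_CLASSES.index("41-60%")] * 15 + [CROWN_CLASSES.index("61-80%")] * 15
scs, parts = stand_condition_score(pai=1.2, pai_max=2.0, live_ba=15.0, total_ba=20.0,
                                   crown_classes=crowns)
```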
Original remotely-sensed data sets
LANDSAT imagery covering the study area was obtained from the National Earth Observatory Group, Geoscience Australia. Seven scenes of Landsat 5 data were required to cover the whole floodplain, with two scenes covering the areas of interest (Table 1). Imagery was obtained for early (January-April) 2003, 2009 and 2010 to match the timing of ground surveys of stand condition. It was not possible to obtain a cloud-free image of Gunbower-Koondrook-Perricoota Forests in early 2010, so an earlier image from November 2009 was used. Landsat imagery provides reflectance in six spectral bands, and the normalised difference vegetation index (NDVI) was calculated from the red and near-infrared bands (Table 2).
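NDVI contrasts near-infrared and red reflectance. The sketch below uses the definition given in Table 4, NDVI = (B4 - B3) / (B3 + B4), with B3 as red and B4 as near infrared. Returning NaN where both bands are zero is an assumption of this sketch.

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells keep NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


values = ndvi(red=[0.1, 0.3, 0.0], nir=[0.5, 0.3, 0.0])
```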
The Landsat images required processing before reflectance band data could be extracted to inform the modelling. The seven scenes were mosaicked into a single image of the floodplain using ENVI 4.5 (ITT Visual Information Solutions, Boulder, Colorado, USA). The coordinate systems differed among the seven scenes, with the two easternmost scenes in GDA Zone 55 and the other five in GDA Zone 54. The images were first colour balanced and then mosaicked into the VicGrid coordinate system using cubic convolution resampling. A 20-pixel feathered overlap between image boundaries was used to produce an almost seamless image of the entire study region. The final image was produced as rasters with a 25 × 25 m pixel resolution. We considered this an appropriate pixel resolution for two reasons. It ensured that a pixel fell within each plot (50 × 50 m), and it allowed stand condition to be estimated across the Icon Sites, which were at least five orders of magnitude larger in area.
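Feathering blends two overlapping images with weights that change gradually across the overlap, so that no hard seam appears. The sketch below assumes a simple linear ramp across a column overlap between two images that are already co-registered and colour balanced. ENVI's own feathering may weight pixels differently (for example, by distance from the image edge), so this illustrates the idea rather than reproducing ENVI 4.5.

```python
import numpy as np


def feather_blend(left, right, overlap):
    """Join two images that share `overlap` columns, ramping weights linearly across the seam.

    left, right -- 2-D arrays with equal row counts; the last `overlap` columns of `left`
                   cover the same ground as the first `overlap` columns of `right`.
    """
    if overlap < 1:
        raise ValueError("overlap must be at least one column")
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Weight on the left image falls from near 1 to near 0 across the overlap
    w = np.linspace(1.0, 0.0, overlap + 2)[1:-1]
    seam = w * left[:, -overlap:] + (1.0 - w) * right[:, :overlap]
    return np.hstack([left[:, :-overlap], seam, right[:, overlap:]])


left = np.full((3, 30), 10.0)
right = np.full((3, 30), 20.0)
mosaic = feather_blend(left, right, overlap=20)  # 20-pixel feathered overlap, as in the text
```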
In 2009, the distribution of forest types compiled from existing vegetation maps provided the most definitive outline of the area to which the condition model should be applied (Cunningham et al., 2009b). However, these distributions are likely to contain some areas of sparse or cleared vegetation. Therefore, a map of tree cover was built to improve predictions of stand condition and to remove areas without forests or woodlands. A layer of tree probability was created for the whole floodplain using an existing Tree/No Tree layer for Victoria (Griffioen & White, unpublished). The Tree/No Tree layer was built using multiple feed-forward, multilayer perceptron artificial neural networks trained with a backpropagation algorithm (Rumelhart et al., 1986) on nine years of Landsat imagery and tree/no tree training data. Estimates of tree cover across Victoria were determined with artificial neural networks for three temporal groupings of these satellite images: (a) 1989, 1991 and 1992; (b) 1995, 1998 and 2000; and (c) 2002, 2004 and 2005. Another artificial neural network combined the outputs of these three networks into a new classification (always tree, never tree, tree loss and tree gain) and determined a tree probability for each 25 × 25 m pixel.
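A feed-forward multilayer perceptron trained by backpropagation can be sketched in a few lines of NumPy. This is a minimal stand-in, not the networks used for the Tree/No Tree layer. It assumes one tanh hidden layer, a sigmoid output giving Pr(tree), mean cross-entropy loss and full-batch gradient descent. It is trained on synthetic two-feature data rather than Landsat bands.

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron (tanh hidden, sigmoid output) by backpropagation.

    X -- (n_samples, n_features) array of standardised predictor values
    y -- (n_samples,) array of 0 (no tree) / 1 (tree) labels
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, n_hidden)
    b2 = 0.0
    y = y.astype(float)
    n = len(y)
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        # Backward pass: gradients of the mean cross-entropy loss
        d_out = (p - y) / n                              # dLoss/dz at the sigmoid output
        grad_W2 = h.T @ d_out
        grad_b2 = d_out.sum()
        d_hidden = np.outer(d_out, W2) * (1.0 - h ** 2)  # propagate back through tanh
        grad_W1 = X.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}


def predict_proba(model, X):
    """Probability of the 'tree' class for each row of X."""
    h = np.tanh(X @ model["W1"] + model["b1"])
    return 1.0 / (1.0 + np.exp(-(h @ model["W2"] + model["b2"])))


rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))            # stand-ins for standardised band values
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # synthetic "tree" rule
model = train_mlp(X, y)
prob = predict_proba(model, X)
accuracy = np.mean((prob > 0.5) == y)
```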
After visually confirming the utility of the Tree/No Tree layer across the floodplain, we built a layer of tree probability, Pr(MurrayTree). Neural networks were trained to recognise trees in satellite images of the floodplain in 2003, 2008 and 2009, with the Tree/No Tree layer supplying the exemplars. A total of 10 000 data points were extracted from the Tree/No Tree layer and from the reflectance bands of the 2003, 2008 and 2009 images across the Murray River floodplain. At the resolution of the Tree/No Tree layer, each pixel (25 × 25 m) would usually encompass more than one tree within forested areas. A visual comparison of the predictions of Pr(MurrayTree) with independent imagery (Google Earth) found that the model predicted trees well. The probability threshold for trees differed across the floodplain. River red gum forests in the Riverina were predicted at Pr > 0.7, whereas sparse black box woodlands on the outer floodplain of Chowilla were predicted at Pr > 0.1. These thresholds indicated that Pr(MurrayTree) reflected the amount of tree cover within a pixel, which provided useful information for modelling stand condition. That is, areas with a higher probability of having trees may also have higher stand condition.
In 2009, probability maps for the five Forest Types were built to provide additional information for predicting stand condition (Cunningham et al., 2009b). A set of 2220 points was randomly sampled from across the Forest Type distribution maps produced from existing vegetation maps, ensuring sufficient sampling of the types with more limited extents. A neural network was built in Statistica 6.0 to predict Forest Type from the values of the six reflectance bands in the 2003 and 2009 Landsat composites of the floodplain. This neural network predicted the probability of each Forest Type occurring at a site. The network had an overall accuracy of 67.1% and accurately predicted (84%) the distribution of river red gum forest and black box woodlands (Cunningham et al., 2009b). These original Forest Type probability maps were used here in the original models of stand condition. To provide an appropriate comparison for the new models of Forest Type built here, the original models had to be replicated using the same set of samples, drawn from within the focal floodplains only.
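Random sampling that still guarantees enough points for Forest Types of limited extent can be done by drawing a proportional share from each type and raising rare types to a minimum. The sketch below assumes a gridded Forest Type map with integer codes, with negative values marking cells outside the mapped distribution. The report does not give the minimum used, so `min_per_class` is a hypothetical parameter. The total number of points can exceed `n_total` when the minimum applies.

```python
import numpy as np


def sample_points(class_map, n_total, min_per_class, seed=0):
    """Randomly sample pixel locations from a categorical map with a minimum per class.

    class_map -- 2-D integer array of Forest Type codes; negative values are outside the map
    Returns a list of (row, col, code) tuples.
    """
    rng = np.random.default_rng(seed)
    codes, counts = np.unique(class_map[class_map >= 0], return_counts=True)
    # Proportional share of n_total, raised to the minimum for rare types and
    # capped at the number of pixels each type actually has
    share = np.round(n_total * counts / counts.sum()).astype(int)
    share = np.minimum(np.maximum(share, min_per_class), counts)
    samples = []
    for code, k in zip(codes, share):
        rows, cols = np.nonzero(class_map == code)
        pick = rng.choice(len(rows), size=k, replace=False)  # no pixel is drawn twice
        samples.extend((int(rows[i]), int(cols[i]), int(code)) for i in pick)
    return samples


forest_types = np.zeros((50, 50), dtype=int)  # code 0: an extensive type
forest_types[0:3, 0:10] = 1                   # code 1: a rare type (30 pixels)
forest_types[:, 49] = -1                      # outside the mapped distribution
points = sample_points(forest_types, n_total=100, min_per_class=20)
```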
New remotely-sensed data sets
A range of new remotely-sensed data sets that were expected to improve predictions of either forest type or stand condition was obtained (Cunningham et al., 2012). These data sets include reflectance with finer spatial resolution (e.g. SPOTMap) or finer spectral resolution (e.g. Rapideye) than Landsat. Data sets from active sensors (e.g. LiDAR), which emit energy and detect the reflected radiation, were included to provide information about the structure beneath the canopy, which reflectance data do not provide.
A HISTORICAL LANDSAT COMPOSITE was produced by Geoscience Australia from images acquired between 1 January 2000 and 31 December 2010. The composite included seasonal median values for six spectral bands (Table 3) and five indices calculated from ratios of these bands (Table 4). Floodplain vegetation is dynamic, and the understorey of floodplain forests is visually quite distinct between wet and dry years. An extended period of satellite imagery is therefore likely to provide a more consistent vegetation map across the floodplain than a single year of reflectance imagery, which may include both flooded and unflooded areas.
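A seasonal median composite takes, for each pixel, the median of all valid observations falling within a seasonal window. The sketch below uses the windows listed in Table 3 and assumes that cloudy or missing observations have already been set to NaN. It illustrates the idea and is not Geoscience Australia's processing chain.

```python
import datetime as dt

import numpy as np

# Seasonal windows from Table 3 as inclusive (month, day) pairs; summer wraps over the new year.
SEASONS = {
    "Sum": ((12, 1), (3, 31)),
    "Aut": ((3, 1), (6, 30)),
    "Win": ((6, 30), (9, 30)),
    "Spr": ((9, 1), (12, 31)),
}


def in_window(date, start, end):
    """True if the calendar day of `date` lies within the inclusive window [start, end]."""
    md = (date.month, date.day)
    if start <= end:
        return start <= md <= end
    return md >= start or md <= end  # window spans 31 December


def seasonal_median(stack, dates, season):
    """Per-pixel median of a (time, rows, cols) stack over acquisitions in one season.

    Observations masked as cloud or no-data must already be NaN so nanmedian skips them.
    """
    start, end = SEASONS[season]
    keep = np.array([in_window(d, start, end) for d in dates])
    return np.nanmedian(stack[keep], axis=0)


# Tiny example: four acquisitions of a 1 x 2 pixel image (one cloudy value as NaN)
dates = [dt.date(2005, 1, 10), dt.date(2005, 2, 10), dt.date(2005, 7, 1), dt.date(2006, 12, 15)]
stack = np.array([[[1.0, np.nan]], [[3.0, 5.0]], [[100.0, 100.0]], [[2.0, 7.0]]])
summer = seasonal_median(stack, dates, "Sum")
winter = seasonal_median(stack, dates, "Win")
```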
RAPIDEYE imagery covering the Murray River floodplain was obtained for 2009, 2010 and 2012 from the AAM Group. Sixty-seven tiles (25 km × 25 km each) of Rapideye data were required to cover the whole floodplain (Cunningham et al., 2013a), with six tiles covering the areas of interest (Table 5). Ideally, imagery would be cloudless and captured over the same period as the ground surveys (January-May). This was not possible for tiles over Gunbower-Koondrook-Perricoota in 2012, but useful tiles were obtained from December 2011 (Table 5). Tiles were supplied by AAM as a top-of-atmosphere corrected mosaic of the floodplain. Rapideye imagery provides spectral information at a 5 m pixel resolution, which is much finer than the ground survey plots (50 m × 50 m). Consequently, the Rapideye imagery was resampled to 25 m × 25 m, the same scale as the Landsat imagery. The rescaling allowed the estimation of a mean and standard deviation for each pixel (Table 6).
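Aggregating fine pixels into 25 m cells while keeping both the block mean and the standard deviation can be done with a reshape. The sketch below assumes that the fine grid is aligned with the 25 m grid. A factor of 5 corresponds to 5 m Rapideye pixels, and a factor of 10 would correspond to 2.5 m SPOTMap pixels. It uses the population standard deviation (NumPy's default `ddof=0`), which may differ from the convention behind Table 6.

```python
import numpy as np


def block_mean_std(band, factor):
    """Aggregate a fine-resolution band into coarse pixels of factor x factor cells.

    Returns (mean, std) arrays; edge rows/columns that do not fill a whole block are dropped.
    """
    band = np.asarray(band, dtype=float)
    rows = band.shape[0] // factor * factor
    cols = band.shape[1] // factor * factor
    # Axis 1 and axis 3 index the fine cells within each coarse block
    blocks = band[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3))


# 10 x 10 grid of 5 m pixels -> 2 x 2 grid of 25 m pixels
fine = np.arange(100).reshape(10, 10)
mean25, std25 = block_mean_std(fine, factor=5)
```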
SPOTMaps were obtained for the floodplains of interest from Astrium Services. These maps were made using imagery collected on three dates: Chowilla Floodplain (19/1/2010) and Gunbower-Koondrook-Perricoota Forests (30/12/2009, 2/3/2010). SPOTMaps provide three spectral bands (blue, green, red) at the high resolution of 2.5 m pixels (Table 7). The SPOTMap imagery was resampled to the same scale as the Landsat imagery (25 m × 25 m), allowing the estimation of a mean and standard deviation for each pixel.
LiDAR (Light Detection And Ranging) data were only available for Gunbower-Koondrook-Perricoota Forests. The data set provided by the MDBA included a 1-m grid of first strike heights, i.e. heights determined from the first returns of the laser from objects. These data were converted into 1-m presence/absence grids for seven height classes representing different vegetation strata within these forests (Table 8). The presences were then summed over each 25 × 25 array of 1-m cells to calculate the percentage of each stratum within a 25 m × 25 m pixel. A 1-m grid of digital elevation (DEM) was also supplied by the MDBA. These data sets were only used in the models of Gunbower-Koondrook-Perricoota Forests.
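The LiDAR strata layers follow the same aggregation pattern. Each 1-m first-return height is assigned to a height class, and the cells of each class are then counted within every 25 × 25 block. The class breaks below are hypothetical placeholders; the seven classes actually used are given in Table 8.

```python
import numpy as np

# Hypothetical class edges in metres giving seven classes; see Table 8 for the real strata.
HEIGHT_BREAKS = [0.5, 2.0, 5.0, 10.0, 15.0, 20.0, 30.0, np.inf]


def strata_percentages(heights, breaks=HEIGHT_BREAKS, cell=25):
    """Percentage of 1-m cells in each height class within each cell x cell block.

    heights -- 2-D array of first-return heights (m) on a 1-m grid
    breaks  -- increasing class edges; class i covers breaks[i] <= h < breaks[i + 1]
    Returns an array of shape (n_classes, rows // cell, cols // cell).
    """
    h = np.asarray(heights, dtype=float)
    rows = h.shape[0] // cell * cell
    cols = h.shape[1] // cell * cell
    h = h[:rows, :cols]  # drop partial blocks at the edges
    layers = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        present = (h >= lo) & (h < hi)  # 1-m presence/absence grid for this stratum
        counts = present.reshape(rows // cell, cell, cols // cell, cell).sum(axis=(1, 3))
        layers.append(100.0 * counts / (cell * cell))
    return np.stack(layers)


heights = np.zeros((50, 50))
heights[:25, :25] = 12.0    # top-left 25 m pixel: all first returns at 12 m
heights[25:30, 25:] = 3.0   # bottom-right 25 m pixel: 5 of 25 rows at 3 m
pct = strata_percentages(heights)
```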
ALOS-PALSAR data were included because PALSAR operates in the microwave L-band, which provides structural information beneath the canopy related to the biomass of a forest. This information could be useful in distinguishing among the different Forest Types (e.g. river red gum versus black box woodland). Two PALSAR data sets with different pixel resolutions were obtained for the focal floodplains (Table 9). The first was the PALSAR 50 m Orthorectified Mosaic for Australia created by the ALOS Kyoto and Carbon Initiative Project (www.eorc.jaxa.jp/ALOS/en/kc_mosaic/kc_50_australia.htm). This is a mosaic of images from June to September 2009 and includes dual polarisations (HH and HV). The second data set, supplied by the MDBA, was PALSAR at 12.5 m resolution.
A HEIGHT-ABOVE-RIVER data set was created from the nine-second DEM and the nine-second DEM stream network for Australia (Stein, 2006). This data set was included to help distinguish among the Forest Types, which are often found at different elevations on the floodplain.
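A height-above-river value can be computed as the elevation of each cell minus the elevation of its nearest stream cell. This sketch assumes that "nearest" means straight-line grid distance and uses a brute-force search suitable only for small grids. The method actually applied with the Stein (2006) stream network may differ (for example, by following flow paths), so treat this as an illustration of the concept.

```python
import numpy as np


def height_above_river(dem, stream_mask):
    """Elevation of each cell minus the elevation of its nearest stream cell.

    'Nearest' is straight-line distance on the grid, found by brute force (small grids only).
    """
    dem = np.asarray(dem, dtype=float)
    stream_mask = np.asarray(stream_mask, dtype=bool)
    stream_rc = np.argwhere(stream_mask)   # (k, 2) row/col of stream cells, row-major order
    stream_z = dem[stream_mask]            # their elevations, in the same row-major order
    rr, cc = np.indices(dem.shape)
    cells = np.stack([rr.ravel(), cc.ravel()], axis=1)                    # (n, 2)
    d2 = ((cells[:, None, :] - stream_rc[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    nearest = d2.argmin(axis=1)
    return (dem.ravel() - stream_z[nearest]).reshape(dem.shape)


dem = np.array([[10.0, 11.0, 12.0],
                [10.0, 12.0, 14.0],
                [9.0, 13.0, 15.0]])
streams = np.zeros_like(dem, dtype=bool)
streams[:, 0] = True  # a channel running down the first column
har = height_above_river(dem, streams)
```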
Given the new data sets available, it was decided to rebuild the tree probability layer Pr(MurrayTree). The historical Landsat composite (2000-2010) provides a much more consistent image than the individual-year composites because it is derived from ca 200 images per scene instead of a single scene. The ALOS-PALSAR data set provided structural information for both focal floodplains. A new model, Pr(MurrayTree_new), was built by training neural networks to recognise trees in the historical Landsat composite (including seasonal reflectance bands and indices) and the ALOS-PALSAR imagery. For consistency, the same 10 000 exemplars from the Tree/No Tree layer that were used to build the original model were used again. The predictions of Pr(MurrayTree_new) were compared visually with independent imagery (Google Earth) and were found to predict trees well.
[Map: the Murray River showing the focal floodplains (Chowilla Floodplain; Gunbower-Koondrook-Perricoota Forests), the states NSW, SA and Vic, Mildura, Wellington, Echuca and Hume Dam; scale bar 100 km]
Figure 1 Location of the focal floodplains of Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests along the Murray River.
Table 1 Landsat satellite imagery obtained over the areas of interest for the 2003, 2009 and 2010 composites.
Area | Path | Row | Day | Month | Year | Satellite
Gunbower-Koondrook-Perricoota Forests | 93 | 85 | 4 | 4 | 2003 | Landsat 7
Gunbower-Koondrook-Perricoota Forests | 93 | 85 | 23 | 2 | 2009 | Landsat 5
Gunbower-Koondrook-Perricoota Forests | 93 | 85 | 6 | 11 | 2009 | Landsat 5
Chowilla Floodplain | 96 | 84 | 9 | 3 | 2003 | Landsat 7
Chowilla Floodplain | 96 | 84 | 12 | 1 | 2009 | Landsat 5
Chowilla Floodplain | 96 | 84 | 16 | 2 | 2010 | Landsat 5
Table 2 Spectral variables derived from Landsat 5 composites for 2003, 2009 and 2010.
Spectral variable | Explanation
LS Blue | Reflectance in the blue spectrum (0.45-0.52 µm)
LS Green | Reflectance in the green spectrum (0.52-0.60 µm)
LS Red | Reflectance in the red spectrum (0.63-0.69 µm)
LS NIR | Reflectance in the near infrared (0.76-0.90 µm)
LS MIR | Reflectance in the middle infrared (1.55-1.75 µm)
LS FIR | Reflectance in the far infrared (2.08-2.35 µm)
LS NDVI | Normalised difference vegetation index
Table 3 Median seasonal band values derived from the historical Landsat composite (2000-2010) used in all the models.

Variable | Explanation | Period
HLS Sum_Blue | Reflectance in the blue spectrum (0.45-0.52 µm) | Dec 1 to Mar 31
HLS Sum_Green | Reflectance in the green spectrum (0.52-0.60 µm) | Dec 1 to Mar 31
HLS Sum_Red | Reflectance in the red spectrum (0.63-0.69 µm) | Dec 1 to Mar 31
HLS Sum_NIR | Reflectance in the near infrared (0.76-0.90 µm) | Dec 1 to Mar 31
HLS Sum_MIR | Reflectance in the middle infrared (1.55-1.75 µm) | Dec 1 to Mar 31
HLS Sum_FIR | Reflectance in the far infrared (2.08-2.35 µm) | Dec 1 to Mar 31
HLS Aut_Blue | Reflectance in the blue spectrum (0.45-0.52 µm) | Mar 1 to Jun 30
HLS Aut_Green | Reflectance in the green spectrum (0.52-0.60 µm) | Mar 1 to Jun 30
HLS Aut_Red | Reflectance in the red spectrum (0.63-0.69 µm) | Mar 1 to Jun 30
HLS Aut_NIR | Reflectance in the near infrared (0.76-0.90 µm) | Mar 1 to Jun 30
HLS Aut_MIR | Reflectance in the middle infrared (1.55-1.75 µm) | Mar 1 to Jun 30
HLS Aut_FIR | Reflectance in the far infrared (2.08-2.35 µm) | Mar 1 to Jun 30
HLS Win_Blue | Reflectance in the blue spectrum (0.45-0.52 µm) | Jun 30 to Sept 30
HLS Win_Green | Reflectance in the green spectrum (0.52-0.60 µm) | Jun 30 to Sept 30
HLS Win_Red | Reflectance in the red spectrum (0.63-0.69 µm) | Jun 30 to Sept 30
HLS Win_NIR | Reflectance in the near infrared (0.76-0.90 µm) | Jun 30 to Sept 30
HLS Win_MIR | Reflectance in the middle infrared (1.55-1.75 µm) | Jun 30 to Sept 30
HLS Win_FIR | Reflectance in the far infrared (2.08-2.35 µm) | Jun 30 to Sept 30
HLS Spr_Blue | Reflectance in the blue spectrum (0.45-0.52 µm) | Sept 1 to Dec 31
HLS Spr_Green | Reflectance in the green spectrum (0.52-0.60 µm) | Sept 1 to Dec 31
HLS Spr_Red | Reflectance in the red spectrum (0.63-0.69 µm) | Sept 1 to Dec 31
HLS Spr_NIR | Reflectance in the near infrared (0.76-0.90 µm) | Sept 1 to Dec 31
HLS Spr_MIR | Reflectance in the middle infrared (1.55-1.75 µm) | Sept 1 to Dec 31
HLS Spr_FIR | Reflectance in the far infrared (2.08-2.35 µm) | Sept 1 to Dec 31
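The seasonal medians in Table 3 can be pictured as a per-pixel calculation over every usable scene in the 2000-2010 archive. The sketch below is a minimal Python illustration under stated assumptions: the input format (a list of dated reflectance values for one pixel and one band), the function names and the example values are hypothetical, and cloud masking and scene selection are not shown. The seasonal windows are copied from Table 3, including their overlapping boundary dates.

```python
from datetime import date
from statistics import median

# Seasonal windows from Table 3 as ((start month, day), (end month, day)), inclusive.
# The windows overlap at their boundaries exactly as listed in the table.
SEASONS = {
    "Sum": ((12, 1), (3, 31)),  # wraps across the new year
    "Aut": ((3, 1), (6, 30)),
    "Win": ((6, 30), (9, 30)),
    "Spr": ((9, 1), (12, 31)),
}


def in_window(d, start, end):
    """Return True if date d falls inside a (month, day) window, handling year wrap."""
    md = (d.month, d.day)
    if start <= end:
        return start <= md <= end
    return md >= start or md <= end  # window crosses 31 December


def seasonal_median(observations, season):
    """Median reflectance of one pixel and band over all scenes in a season.

    observations: list of (date, reflectance) pairs, e.g. every usable Landsat
    scene from 2000-2010 for that pixel (hypothetical input format).
    Returns None when no scene falls within the season.
    """
    start, end = SEASONS[season]
    values = [v for d, v in observations if in_window(d, start, end)]
    return median(values) if values else None


# Hypothetical scenes for a single pixel and band
scenes = [
    (date(2001, 1, 15), 0.08),
    (date(2005, 2, 10), 0.10),
    (date(2008, 12, 20), 0.12),
    (date(2003, 7, 1), 0.30),
]
print(seasonal_median(scenes, "Sum"))  # median of the three Dec-Mar scenes
```

Because the median is taken over many years, a single unusually wet or dry season has limited influence on the composite value.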
Table 4 Median seasonal indices derived from the historical Landsat composite (2000-2010). B1, B3, B4, B5 and B7 denote Landsat bands 1 (blue), 3 (red), 4 (near infrared), 5 (middle infrared) and 7 (far infrared); formulae are given for summer and apply to every season.

Variable | Explanation | Period | Formula
HLS Sum_NDVI | Normalised Difference Vegetation Index | Dec 1 to Mar 31 | (B4 - B3) / (B4 + B3)
HLS Sum_EVI | Enhanced Vegetation Index | Dec 1 to Mar 31 | 2.5 * (B4 - B3) / (B4 + 6*B3 - 7.5*B1 + 1)
HLS Sum_SATVI | Soil Adjusted Total Vegetation Index | Dec 1 to Mar 31 | [(B5 - B3) / (B5 + B3 + 0.5)] * 1.5 - (B7 / 2)
HLS Sum_SLAVI | Specific Leaf Area Vegetation Index | Dec 1 to Mar 31 | B4 / (B3 + B5)
HLS Sum_NDMI | Normalised Difference Moisture Index | Dec 1 to Mar 31 | (B4 - B5) / (B4 + B5)
HLS Sum_NDSI | Normalised Difference Soil Index | Dec 1 to Mar 31 | (B3 - B5) / (B3 + B5)
HLS Aut_NDVI | Normalised Difference Vegetation Index | Mar 1 to Jun 30 | as above
HLS Aut_EVI | Enhanced Vegetation Index | Mar 1 to Jun 30 | as above
HLS Aut_SATVI | Soil Adjusted Total Vegetation Index | Mar 1 to Jun 30 | as above
HLS Aut_SLAVI | Specific Leaf Area Vegetation Index | Mar 1 to Jun 30 | as above
HLS Aut_NDMI | Normalised Difference Moisture Index | Mar 1 to Jun 30 | as above
HLS Aut_NDSI | Normalised Difference Soil Index | Mar 1 to Jun 30 | as above
HLS Win_NDVI | Normalised Difference Vegetation Index | Jun 30 to Sept 30 | as above
HLS Win_EVI | Enhanced Vegetation Index | Jun 30 to Sept 30 | as above
HLS Win_SATVI | Soil Adjusted Total Vegetation Index | Jun 30 to Sept 30 | as above
HLS Win_SLAVI | Specific Leaf Area Vegetation Index | Jun 30 to Sept 30 | as above
HLS Win_NDMI | Normalised Difference Moisture Index | Jun 30 to Sept 30 | as above
HLS Win_NDSI | Normalised Difference Soil Index | Jun 30 to Sept 30 | as above
HLS Spr_NDVI | Normalised Difference Vegetation Index | Sept 1 to Dec 31 | as above
HLS Spr_EVI | Enhanced Vegetation Index | Sept 1 to Dec 31 | as above
HLS Spr_SATVI | Soil Adjusted Total Vegetation Index | Sept 1 to Dec 31 | as above
HLS Spr_SLAVI | Specific Leaf Area Vegetation Index | Sept 1 to Dec 31 | as above
HLS Spr_NDMI | Normalised Difference Moisture Index | Sept 1 to Dec 31 | as above
HLS Spr_NDSI | Normalised Difference Soil Index | Sept 1 to Dec 31 | as above
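The index formulae can be expressed as small functions of band reflectance. This is a sketch, not the authors' processing code: it assumes surface reflectance scaled 0-1, and it uses the commonly published forms of EVI (gain of 2.5) and SATVI (soil adjustment factor L = 0.5, with B5 + B3 in the denominator). Function and argument names are illustrative.

```python
# Band arguments are surface reflectances scaled 0-1 (assumption):
# b1 = blue, b3 = red, b4 = near infrared, b5 = middle infrared, b7 = far infrared.


def ndvi(b3, b4):
    """Normalised Difference Vegetation Index."""
    return (b4 - b3) / (b4 + b3)


def evi(b1, b3, b4):
    """Enhanced Vegetation Index with the standard coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1."""
    return 2.5 * (b4 - b3) / (b4 + 6 * b3 - 7.5 * b1 + 1)


def satvi(b3, b5, b7, soil_l=0.5):
    """Soil Adjusted Total Vegetation Index; soil_l is the soil adjustment factor L."""
    return (b5 - b3) / (b5 + b3 + soil_l) * (1 + soil_l) - b7 / 2


def slavi(b3, b4, b5):
    """Specific Leaf Area Vegetation Index."""
    return b4 / (b3 + b5)


def ndmi(b4, b5):
    """Normalised Difference Moisture Index."""
    return (b4 - b5) / (b4 + b5)


def ndsi(b3, b5):
    """Normalised Difference Soil Index, using the red/middle infrared form given in Table 4."""
    return (b3 - b5) / (b3 + b5)


# Hypothetical reflectances for one vegetated pixel
pixel = {"b1": 0.05, "b3": 0.10, "b4": 0.40, "b5": 0.20, "b7": 0.10}
print(round(ndvi(pixel["b3"], pixel["b4"]), 3))
print(round(satvi(pixel["b3"], pixel["b5"], pixel["b7"]), 4))
```

Applying the same functions to each seasonal band median yields the seasonal index variables.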
Table 5 Rapideye satellite imagery obtained over the areas of interest for the 2009, 2010 and 2012 composites.

Tile ID | Location | Acquisition date (2009 composite) | Acquisition date (2010 composite) | Acquisition date (2012 composite)
Gunbower-Koondrook-Perricoota Forests
5522505 | Gunbower | 8/04/2009 | 28/01/2010 | 5/12/2011
5522605 | Tantonan | 8/04/2009 | 28/01/2010 | 5/12/2011
5522504 | Cohuna | 22/02/2009 | 28/01/2010 | 5/12/2011
5522604 | Koondrook | 22/02/2009 | 28/01/2010 | 5/12/2011
Chowilla Floodplain
5423415 | Cal Lal | 3/02/2009 | 6/02/2010 | 24/12/2012
5423414 | Chowilla | 7/04/2009 | 4/04/2010 | 17/01/2012
Table 6 Spectral variables derived from Rapideye imagery for 2009, 2010 and 2012. SD = standard deviation.

Variable | Explanation
RE Blue mean | Reflectance in the blue spectrum (0.44-0.51 µm)
RE Green mean | Reflectance in the green spectrum (0.52-0.59 µm)
RE Red mean | Reflectance in the red spectrum (0.63-0.69 µm)
RE Red edge mean | Reflectance at the red edge (0.69-0.73 µm)
RE NIR mean | Reflectance in the near infrared (0.76-0.85 µm)
RE NDVI mean | Normalised difference vegetation index
RE Blue SD | SD of reflectance in the blue spectrum
RE Green SD | SD of reflectance in the green spectrum
RE Red SD | SD of reflectance in the red spectrum
RE Red Edge SD | SD of reflectance at the red edge
RE NIR SD | SD of reflectance in the near infrared
RE NDVI SD | SD of normalised difference vegetation index
Table 7 Spectral variables extracted from SPOTMap derived from SPOT 5 satellite imagery. SD = standard deviation.

Spectral variable | Explanation
SPOTMap Blue mean | Reflectance in the blue spectrum (0.43-0.55 µm)
SPOTMap Green mean | Reflectance in the green spectrum (0.50-0.59 µm)
SPOTMap Red mean | Reflectance in the red spectrum (0.61-0.68 µm)
SPOTMap Blue SD | SD of reflectance in the blue spectrum (0.43-0.55 µm)
SPOTMap Green SD | SD of reflectance in the green spectrum (0.50-0.59 µm)
SPOTMap Red SD | SD of reflectance in the red spectrum (0.61-0.68 µm)
Table 8 Variables derived from the LiDAR data set over Gunbower-Koondrook-Perricoota Forests. Percentage cover values were estimated for different strata from first-strike data capture.

Variable | Explanation
LiDAR DEM | Digital elevation model derived from LiDAR data set
LiDAR cover 0.5 m | Percentage cover at 0-0.5 m
LiDAR cover 1.5 m | Percentage cover at 0.5-1.5 m
LiDAR cover 2.5 m | Percentage cover at 1.5-2.5 m
LiDAR cover 4.5 m | Percentage cover at 2.5-4.5 m
LiDAR cover 8.5 m | Percentage cover at 4.5-8.5 m
LiDAR cover 16.5 m | Percentage cover at 8.5-16.5 m
LiDAR cover 32.5 m | Percentage cover at 16.5-32.5 m
Table 9 Satellite-derived variables used in the new modelling.

Variable | Explanation
PALSAR SEAus LL_HH | Horizontal-horizontal (HH) polarisation of L-band, 50 m resolution
PALSAR SEAus LL_HV | Horizontal-vertical (HV) polarisation of L-band, 50 m resolution
PALSAR Murray LL_HH | Horizontal-horizontal (HH) polarisation of L-band, 12.5 m resolution
PALSAR Murray LL_HV | Horizontal-vertical (HV) polarisation of L-band, 12.5 m resolution
HAR | Height above river (< 10 m accuracy)
Pr(MurrayTree_new) | Probability of trees being present, built from the historical Landsat composite and PALSAR data sets
Modelling
Both neural networks and random forests were used to model all data combinations. Previously, neural networks were found to provide stronger predictions than regression-tree methods (e.g. random forests) when predicting stand condition from Landsat data (Cunningham et al., 2009c). When using Rapideye imagery, however, random forests provided stronger predictions of stand condition than neural networks (Cunningham et al., 2013b). Given that we were predicting both Forest Type and stand condition from Landsat, Rapideye and a suite of new remotely sensed variables, we needed to determine which modelling approach provided better predictions.
We used feed-forward multilayer perceptron artificial neural networks trained with the backpropagation algorithm (MLP neural networks; Rumelhart et al., 1986) in the program Statistica 10.0 (StatSoft, 2011). MLP neural networks are useful for modelling ecological data, which rarely meet parametric statistical assumptions and commonly involve non-linear relationships. They make no prior assumptions about the form of the relationships between the input variables and the response, or about the underlying mathematical distributions of the data (Özesmi et al., 2006). An MLP neural network is best conceptualised as a series of layers of nodes (neurons), with weighted connections between each pair of adjacent layers. Here, each network included an input layer of remotely sensed predictor variables, one or more hidden layers and an output layer for the response variable (Forest Type or stand condition score). Networks were built from 20 random starts and the best network was chosen based on statistical fit (R² values).
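The report used Statistica 10.0, and the sketch below is not Statistica's algorithm. It is a minimal plain-Python illustration of the two ideas in the paragraph above: a one-hidden-layer MLP trained by backpropagation, and keeping the best of 20 random starts. It models a continuous response (as for stand condition); the hidden-layer size, learning rate, epoch count, toy data and the use of held-out data to choose the best start are all assumptions.

```python
import math
import random


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def train_mlp(X, y, n_hidden=4, epochs=100, lr=0.05, seed=0):
    """One-hidden-layer MLP (tanh hidden units, linear output) trained by
    backpropagation with per-sample gradient descent on squared error."""
    rng = random.Random(seed)  # the seed defines this random start
    n_in = len(X[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = [0.0]  # output bias, kept in a one-element list and updated in place

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(n_hidden)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2[0]

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # gradient of 0.5 * (out - target)^2 w.r.t. out
            for j in range(n_hidden):
                # error signal at hidden unit j, computed before w2[j] changes
                delta = err * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * delta
                for i in range(n_in):
                    w1[j][i] -= lr * delta * x[i]
            b2[0] -= lr * err
    return lambda x: forward(x)[1]


def best_of_random_starts(X_train, y_train, X_val, y_val, n_starts=20):
    """Fit n_starts networks from different random initial weights and keep
    the one with the highest R^2 on held-out data. Returns (R^2, seed, model)."""
    best = None
    for seed in range(n_starts):
        model = train_mlp(X_train, y_train, seed=seed)
        score = r_squared(y_val, [model(x) for x in X_val])
        if best is None or score > best[0]:
            best = (score, seed, model)
    return best


# Toy data: one predictor with a linear response (hypothetical)
X = [[i / 10] for i in range(11)]
y = [0.5 * x[0] + 0.2 for x in X]
score, seed, model = best_of_random_starts(X[::2], y[::2], X[1::2], y[1::2])
print(f"best start: seed {seed}, R^2 = {score:.3f}")
```

Multiple random starts matter because backpropagation can settle in different local minima depending on the initial weights.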
Random forests were used because they are well suited to modelling large sets of independent variables, many of which may be highly correlated; they select relevant environmental variables; and they can model interactions among variables. This modelling technique creates a forest of classification or regression trees. Individual trees relate values of a response (leaves) to its predictors through a series of binary decisions or branches (Friedman, 2001). At each branch of a tree, the algorithm randomly selects a small number of independent variables from all of those available and creates the node on the basis of whichever variable minimises the model error. Random forests may be used to classify categorical response variables, with each tree in the forest voting for a class (e.g. Forest Type), or as an ensemble of regression trees to predict a continuous response variable (e.g. stand condition).
Ensembles of regression trees, such as random forests, overcome the inherent inaccuracy of relying on a single parsimonious model by constructing an ensemble of models. Bootstrap aggregating (bagging), which is similar to model averaging, is used to improve the accuracy of predictions: individual trees were fitted to 20 bootstrap samples to create an ensemble (forest) that predicted the variable of interest. We used a particular type of random forest built from predictive clustering trees (Kocev et al., 2007) in the program Clus. While most decision tree learners induce classification or regression trees, predictive clustering trees generalise this approach by learning trees that are interpreted as cluster hierarchies.
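To make bagging and random variable selection concrete, the following is a minimal, generic random-forest classifier in plain Python. It is not Clus and does not build predictive clustering trees: it grows shallow classification trees using Gini impurity, considers only `n_try` randomly chosen predictors at each node, fits each tree to a bootstrap sample (20 trees by default, matching the number of bootstrap samples described above) and predicts by majority vote. Tree depth, `n_try`, the absence of pruning and the toy data are assumptions for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, rng, max_depth=3, n_try=2):
    """Grow a classification tree; at each node only n_try randomly chosen
    predictors are considered. Leaves are class labels (must not be tuples)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), min(n_try, len(X[0]))):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # weighted impurity of the two child nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], rng, max_depth - 1, n_try),
            build_tree([X[i] for i in ri], [y[i] for i in ri], rng, max_depth - 1, n_try))


def predict_tree(node, x):
    """Follow the binary branches until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=20, seed=0):
    """Fit each tree to a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return trees


def predict_forest(trees, x):
    """Each tree votes for a class; the most common vote wins."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Toy data: the first predictor separates the classes, the second is uninformative
X = [[0.1, 0.5], [0.2, 0.5], [0.3, 0.5], [0.7, 0.5], [0.8, 0.5], [0.9, 0.5]]
y = ["woodland"] * 3 + ["non-forest"] * 3
forest = random_forest(X, y)
print(predict_forest(forest, [0.0, 0.5]))
```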
While over-fitting is often seen as a problem in statistical modelling, predictions from ensembles of regression trees for independent data sets are not compromised by using a large number of variables, and are generally superior to those of other methods (e.g. GLM, GAM and MARS; Elith et al., 2006). In contrast, neural networks must consider all supplied independent variables simultaneously. Without an order of magnitude more exemplars than predictor variables, neural networks may produce over-fitted models that are less robust and validate poorly compared with models built from a carefully selected subset of independent variables.
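The order-of-magnitude guideline can be turned into quick arithmetic. This sketch assumes the guideline is applied to the 60% training partition and requires at least 10 training exemplars per predictor; both choices are illustrative, because the report does not specify exactly how the guideline was applied.

```python
def max_predictors(n_samples, train_percent=60, exemplars_per_predictor=10):
    """Largest number of predictors that keeps at least `exemplars_per_predictor`
    training exemplars per predictor variable (an order-of-magnitude rule of thumb)."""
    n_train = n_samples * train_percent // 100  # integer arithmetic avoids float rounding
    return n_train // exemplars_per_predictor


for label, n in [("Forest Type, both floodplains", 4500),
                 ("Forest Type, GKP only", 2000),
                 ("Stand condition, both floodplains", 75),
                 ("Stand condition, GKP only", 50)]:
    print(f"{label}: {n} samples -> at most {max_predictors(n)} predictors")
```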
All modelling data sets were divided randomly into separate training (60%), cross-validation (20%) and model testing (20%) data sets. For the neural networks, training data were used to train the network, cross-validation data were used to detect over-fitting of the network and the model testing data were used as an independent test of model fit. Similarly, for the random forests, training data were used to train the forest, cross-validation data were used to optimally prune the trees of the forest and the model testing data were used to independently test model fit.
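A random 60/20/20 partition can be sketched as a shuffle followed by slicing. This version allocates exact proportions (2700/900/900 for 4500 samples); the test sets reported in Tables 11 and 12 contain 903 samples, so the partitioning in the original software evidently differed slightly. The function name and seed are hypothetical.

```python
import random


def split_60_20_20(samples, seed=0):
    """Randomly partition samples into training, cross-validation and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100
    n_cv = n * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])  # remainder (about 20%) for testing


train, cross_val, test = split_60_20_20(list(range(4500)))
print(len(train), len(cross_val), len(test))
```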
The accuracy of predictions from the various models was compared using the model fit (R²) for the overall data set. Confusion matrices were also used to assess the accuracy of predictions for the categorical Forest Type variable, but not for the continuous stand condition variable. The importance of individual variables for predicting Forest Type and stand condition was determined by sensitivity analyses. For the neural networks we used the error ratio: the ratio of the error in estimating the response variable (e.g. stand condition) when a predictor variable is omitted to the error when it is included, so that values above 1.0 indicate that the variable reduces prediction error. This approach is not possible for random forests, so the percentage of branches across the forest that used a predictor variable was used as the sensitivity measure instead.
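Both sensitivity measures are simple to compute once a model has been fitted. In this sketch, "omitting" a predictor is assumed to mean replacing it with its mean value, a common way of implementing the error ratio that is not confirmed here for Statistica, and error is measured as a sum of squared errors. The branch-usage function assumes that the predictor used at every split across the forest has already been extracted into a list. All names are hypothetical.

```python
from collections import Counter


def error_ratio(predict, X, y, j):
    """Error with predictor j replaced by its mean, divided by the error with all
    predictors. Values above 1.0 mean predictor j helps reduce prediction error."""
    def sse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y))

    baseline = sse(X)
    if baseline == 0:
        return float("inf")  # a perfect fit makes the ratio undefined
    mean_j = sum(r[j] for r in X) / len(X)
    X_omit = [r[:j] + [mean_j] + r[j + 1:] for r in X]
    return sse(X_omit) / baseline


def branch_usage(split_variables):
    """Percentage of branches (splits) across the forest that used each predictor."""
    counts = Counter(split_variables)
    total = sum(counts.values())
    return {v: 100 * c / total for v, c in counts.items()}


# Hypothetical fitted model that uses only the first predictor
model = lambda r: 2 * r[0]
X = [[1, 5], [2, 6], [3, 7]]
y = [2.5, 3.5, 6.5]
print(error_ratio(model, X, y, 0), error_ratio(model, X, y, 1))
print(branch_usage(["Pr(MurrayTree)", "Pr(MurrayTree)", "LS NIR", "LS Blue"]))
```

Because the branch percentages are shares of all splits, they sum to 100 across the predictors in a forest.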
Forest Type modelling
Replicating original models
This modelling was designed to replicate the approach used to produce Forest Type probability maps in the 2009 Stand Condition report (Cunningham et al., 2009b). A total of 4500 samples were used in the modelling of Forest Type across the two focal floodplains. The six reflectance bands from Landsat imagery in 2003 and 2009, and the original tree probability layer Pr(MurrayTree), were included as predictor variables in the original Forest Type models.
The Forest Type maps created from existing digital maps for the 2009 report were used to create random samples of the four target Forest Types: river red gum forest, river red gum woodland, river red gum / black box woodland and black box woodland. For each Forest Type, 500 samples were selected from Chowilla Floodplain and 500 samples from Gunbower-Koondrook-Perricoota Forests. The one exception was river red gum / black box woodland, which occurs only on Chowilla Floodplain, so 500 samples were selected from that floodplain alone. One thousand samples of non-forest locations were selected evenly across the two focal floodplains to provide random absences.
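The sampling design above can be checked with simple arithmetic: the per-type allocations sum to the 4500 samples used across both floodplains, and the GKP allocations sum to the 2000 samples used for the GKP-only models. The even 500/500 split of non-forest samples follows from "selected evenly"; the dictionary layout is illustrative.

```python
# Samples per Forest Type and floodplain, as described in the sampling design
design = {
    "river red gum forest": {"Chowilla": 500, "GKP": 500},
    "river red gum woodland": {"Chowilla": 500, "GKP": 500},
    "river red gum / black box woodland": {"Chowilla": 500},  # occurs on Chowilla only
    "black box woodland": {"Chowilla": 500, "GKP": 500},
    "non-forest": {"Chowilla": 500, "GKP": 500},  # 1000 absences split evenly
}

total = sum(n for sites in design.values() for n in sites.values())
gkp_total = sum(sites.get("GKP", 0) for sites in design.values())
print(total, gkp_total)
```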
New models
The new models of Forest Type used the same 4500 samples and modelling approaches but included new remotely sensed data: the new tree probability layer Pr(MurrayTree_new), reflectance bands and indices from the historical Landsat composite, Rapideye imagery from 2009, 2010 and 2012, SPOTMap, PALSAR and height above river (Tables 3, 4, 6, 7 & 9). There were 104 potential variables in total from these new data sets, so the number of variables needed to be reduced for the neural networks to avoid overly complex models. Fifty neural networks were built to investigate the sensitivity of Forest Type predictions to the extensive list of potential variables. This sensitivity analysis was used to reduce the list to 28 variables: spring and summer medians for the six reflectance bands and three indices (NDMI, SATVI & SLAVI) from the historical Landsat composite, three reflectance bands (blue, green & red) from the SPOTMap, three reflectance bands (red, red edge & near infrared) from the 2009 and 2010 Rapideye composites, and Pr(MurrayTree_new). This step was not necessary for the random forests, because that approach limits the number of variables that can be considered for each branch of the tree.
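The composition of the reduced set of 28 variables can be verified by enumerating it. The names follow the conventions of Tables 3, 4, 6, 7 and 17; the code is only a bookkeeping check and plays no part in the modelling.

```python
landsat_bands = ("Blue", "Green", "Red", "NIR", "MIR", "FIR")
landsat_indices = ("NDMI", "SATVI", "SLAVI")
seasons = ("Spr", "Sum")

selected = (
    [f"HLS {s}_{b}" for s in seasons for b in landsat_bands]          # 12 band medians
    + [f"HLS {s}_{i}" for s in seasons for i in landsat_indices]      # 6 index medians
    + [f"SPOTMap {b} mean" for b in ("Blue", "Green", "Red")]         # 3 SPOTMap bands
    + [f"RE ({yr}) {b} mean" for yr in (2009, 2010)
       for b in ("Red", "Red edge", "NIR")]                           # 6 Rapideye bands
    + ["Pr(MurrayTree_new)"]                                          # tree probability
)
print(len(selected))
```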
An additional eight LiDAR-derived variables were available only for Gunbower-Koondrook-Perricoota Forests (GKP) (Table 8). Separate neural networks and random forests were built to predict Forest Type using the samples from GKP (N = 2000), the same remotely sensed data as the new models above, and the additional LiDAR-derived data. To provide a direct comparison for these models, another pair of original Forest Type models was built using the same predictor variables as the original models above but using the GKP samples only.
A total of eight Forest Type models were built (Table 10). These covered combinations of the old and new data sets, samples from both floodplains or from GKP only (to assess the improvement gained from LiDAR data), and the two modelling approaches (neural networks and random forests). The best Forest Type model was then used as an input variable in the new models of stand condition.
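The eight Forest Type models form a full 2 x 2 x 2 design (area sampled, data set, modelling approach). Enumerating the factors in the order below reproduces the model numbering in Table 10, and the same design numbered from 9 gives the stand condition models. The factor labels are shorthand chosen for this sketch.

```python
from itertools import product

areas = ("Chowilla and GKP", "GKP")
data_sets = ("original", "new")
methods = ("Neural network", "Random forest")

# itertools.product varies the last factor fastest, matching Table 10's numbering
forest_type_models = dict(enumerate(product(areas, data_sets, methods), start=1))
stand_condition_models = dict(enumerate(product(areas, data_sets, methods), start=9))

for number, (area, data, method) in forest_type_models.items():
    print(number, method, data, area)
```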
Stand condition modelling
Replicating original models
This modelling was designed to replicate the approach used to produce the stand condition map in the 2010 Stand Condition report (Cunningham et al., 2011). The 2010 survey data set was chosen to investigate potential improvements to the prediction of stand condition because it produced the weakest of the Landsat-based Stand Condition Models, which might be improved by the addition of new data sets. The 2010 ground survey also had the best temporal alignment with the available remotely sensed data sets. A total of 75 surveys of reference sites were available from 2010 across the two focal floodplains. The original models of stand condition used reflectance bands from Landsat in 2009 and 2010, and the original Forest Type probability models, including Pr(MurrayTree) and the models for individual types.
New models
The new models of stand condition included several potential improvements on the original models. The potential predictor variables included Rapideye imagery from 2009 and 2010, SPOTMap, PALSAR and height above river. Forest distribution information was provided by Pr(MurrayTree_new) and by the best Forest Type probability model built for both focal floodplains in the new modelling (Model 4; see Results for details). Because the stand condition models have a temporal component, the historical (2000-2010) Landsat composite was not included in these models, given the long period over which it was captured. As for Forest Type, the eight LiDAR-derived variables were also used to predict stand condition using the samples from GKP (N = 50).
A total of eight stand condition models were built (Table 10). These covered combinations of the old and new data sets, samples from both floodplains or from GKP only (to assess the improvement gained from LiDAR data), and the two modelling approaches (neural networks and random forests).
Table 10 Details of the various models used to predict Forest Type and Stand Condition.

Model | Model type | Data sets included | Samples used | Area sampled
Forest Type models
1 | Neural network | Landsat 2003 and 2009, Pr(MurrayTree) | 4500 | Chowilla and GKP
2 | Random forest | Landsat 2003 and 2009, Pr(MurrayTree) | 4500 | Chowilla and GKP
3 | Neural network | Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap & Pr(MurrayTree_new) | 4500 | Chowilla and GKP
4 | Random forest | Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap & Pr(MurrayTree_new) | 4500 | Chowilla and GKP
5 | Neural network | Landsat 2003 and 2009, Pr(MurrayTree) | 2000 | GKP
6 | Random forest | Landsat 2003 and 2009, Pr(MurrayTree) | 2000 | GKP
7 | Neural network | Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap, Pr(MurrayTree_new) & LiDAR | 2000 | GKP
8 | Random forest | Historical Landsat, Rapideye, PALSAR, HAR, SPOTMap, Pr(MurrayTree_new) & LiDAR | 2000 | GKP
Stand Condition models
9 | Neural network | Landsat, Pr(MurrayTree), original Pr(Forest Types) | 75 | Chowilla and GKP
10 | Random forest | Landsat, Pr(MurrayTree), original Pr(Forest Types) | 75 | Chowilla and GKP
11 | Neural network | Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMap | 75 | Chowilla and GKP
12 | Random forest | Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMap | 75 | Chowilla and GKP
13 | Neural network | Landsat, Pr(MurrayTree), original Pr(Forest Types) | 50 | GKP
14 | Random forest | Landsat, Pr(MurrayTree), original Pr(Forest Types) | 50 | GKP
15 | Neural network | Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMap & LiDAR | 50 | GKP
16 | Random forest | Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMap & LiDAR | 50 | GKP
Results
Modelling Forest Type
The neural network that predicted Forest Type from the original Landsat data and tree probability distribution (Model 1) had moderately accurate predictions (48.4%, Table 11). Using a random forest to model the same data set (Model 2) provided a minor improvement in accuracy (50.2%, Table 12). These models differed in how accurately they predicted the individual Forest Types. The random forest provided the more accurate predictions for river red gum forest, river red gum / black box woodland and non-forest locations, while the neural network provided more accurate predictions for black box woodlands and river red gum woodlands (Tables 11 & 12). Important variables for predicting Forest Type in the neural network were reflectance in the blue, red and far infrared spectra (Table 13), while the probability of trees and reflectance in the blue and near infrared spectra were important variables in the random forest (Table 14).
Including the new remotely sensed data sets improved the predictions of Forest Type relative to the original models for both the neural network (Model 3, 56.5%, Table 15) and the random forest (Model 4, 58.9%, Table 16). Compared with the neural network, the random forest produced more accurate predictions of river red gum forest and non-forest, similar accuracies for black box woodland and river red gum woodland, but a lower accuracy for river red gum / black box woodland (Tables 15 & 16). Important predictor variables for the neural network included variables obtained from Rapideye (red and red edge), SPOTMap (green and red) and the historical Landsat composite (far infrared in summer; Table 17). In contrast, the most important predictors for the random forest were the five spectral bands from the 2009 Rapideye image (Table 18).
The neural network and random forest built for Forest Types across Gunbower-Koondrook-Perricoota Forests (GKP) from the original data sets had similar accuracies (53.0% and 54.2%, Models 5 & 6 respectively; Tables 19 & 20). However, they differed substantially in the accuracy of their predictions for individual Forest Types, with the neural network providing better accuracy for black box woodland and river red gum forest, while the random forest provided better accuracy for river red gum woodland and non-forest locations. Two important predictors for both models were the probability of trees and reflectance in the middle infrared spectrum (Tables 21 & 22).
The models for Forest Type across GKP built using the new remotely sensed data sets, including LiDAR-derived variables, provided similar accuracies (63.6% and 64.9%, Models 7 & 8 respectively; Tables 23 & 24). Both models were more accurate than the original models for Forest Type across GKP. Compared with the neural network, the random forest provided similar accuracies for river red gum woodland and river red gum forest, more accurate predictions of non-forest locations, but a lower accuracy for black box woodland (Tables 23 & 24). Important predictor variables for the neural network included the red edge spectrum in 2009 from Rapideye, the red spectrum from the SPOTMap and the probability of trees (Table 25). The most important predictors for the random forest were the blue and red spectra in 2009 from Rapideye and the HH polarisation from the PALSAR imagery (Table 26). In both models, LiDAR-derived variables were not important predictors of Forest Type.
In summary, random forests consistently provided slightly better test accuracies than neural networks when using the same data set (Table 27). The inclusion of the new data sets substantially improved the predictions of Forest Type relative to the original data set, both for the overall models and for the reduced models for GKP. Variables derived from Rapideye imagery were consistently important predictors in the new models of Forest Type. Other important predictors in the new models came from the SPOTMap, PALSAR and the historical Landsat composite (Tables 17, 18, 25 & 26). The most accurate model of Forest Types across the focal floodplains was the random forest built using the new data sets (Model 4). The probability maps for Forest Types produced by Model 4 were included as new data sets in the modelling of stand condition.
Modelling Stand Condition
The neural networks consistently provided more accurate predictions of stand condition than the random forests for the same data set (Table 28). The inclusion of the new data sets in the neural network led to a substantial improvement in the predictions of stand condition across the two focal floodplains (compare Models 9 & 11, R² = 0.60 and 0.81, respectively). Using new data sets including LiDAR-derived variables in the neural network of stand condition at GKP did not improve predictions (compare Models 13 & 15, Table 28).
The models of stand condition for the two focal floodplains using the original data sets (Models 9 & 10) had the probability of trees and Forest Types as important predictors (Tables 29 & 30). When the new data sets were included in models of stand condition for the two focal floodplains (Models 11 & 12), the most important predictors were HV polarisation from the PALSAR imagery for the neural network and the new model of tree probability for the random forest (Tables 31 & 32). Rapideye variables were also important predictors, particularly for the random forest.
The original models for GKP (Models 13 & 14), like those for the combined floodplains, had the probability of trees and Forest Types as important predictors, but reflectance in the near infrared spectrum from Landsat was also an important predictor (Tables 33 & 34). The new models for GKP (Models 15 & 16) also had reflectance in the near infrared as an important predictor, but from Rapideye imagery instead of Landsat (Tables 35 & 36). Other important predictors included LiDAR cover below 0.5 m and the probability of non-forest in the neural network, and the probability of trees in the random forest.
Table 11 Confusion matrix of Forest Types predicted by Model 1 (neural network using the original data sets) for the samples used to test the model (N = 903 samples). Columns give the observed Forest Type and rows the predicted Forest Type; accuracy is the percentage of each observed type that was correctly predicted.

Predicted \ Observed | Non-forest | Black box | RRG-black box woodland | RRG woodland | RRG forest
Non-forest | 105 | 33 | 17 | 22 | 9
Black box | 66 | 106 | 49 | 34 | 18
RRG-black box woodland | 4 | 9 | 24 | 27 | 15
RRG woodland | 25 | 12 | 13 | 90 | 40
RRG forest | 16 | 15 | 7 | 35 | 112
Accuracy | 48.6% | 60.6% | 21.8% | 43.3% | 57.7%
Total accuracy: 48.4%
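The accuracies in Table 11 can be reproduced from the matrix counts. The column totals (216, 175, 110, 208 and 194) are identical in Tables 11 and 12, as expected for a shared test set, which indicates that columns are the observed classes; each per-class accuracy is therefore the diagonal count divided by its column total, and the overall accuracy is the diagonal sum divided by N. The function names are illustrative.

```python
# Table 11 (Model 1): rows = predicted Forest Type, columns = observed Forest Type
classes = ["Non-forest", "Black box", "RRG-black box woodland", "RRG woodland", "RRG forest"]
matrix = [
    [105, 33, 17, 22, 9],
    [66, 106, 49, 34, 18],
    [4, 9, 24, 27, 15],
    [25, 12, 13, 90, 40],
    [16, 15, 7, 35, 112],
]


def class_accuracies(m):
    """Correctly predicted samples of each observed class / total samples of that class."""
    n = len(m)
    return [m[j][j] / sum(m[i][j] for i in range(n)) for j in range(n)]


def overall_accuracy(m):
    """Sum of the diagonal / total number of test samples."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


for name, acc in zip(classes, class_accuracies(matrix)):
    print(f"{name}: {100 * acc:.1f}%")
print(f"Total: {100 * overall_accuracy(matrix):.1f}%")
```

The same functions apply unchanged to the four-class GKP matrices.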
Table 12 Confusion matrix of Forest Types predicted by Model 2 (random forest using the original data sets) for the samples used to test the model (N = 903 samples).

Predicted \ Observed | Non-forest | Black box | RRG-black box woodland | RRG woodland | RRG forest
Non-forest | 132 | 43 | 22 | 23 | 16
Black box | 34 | 89 | 30 | 27 | 24
RRG-black box woodland | 13 | 12 | 40 | 32 | 12
RRG woodland | 22 | 15 | 8 | 75 | 25
RRG forest | 15 | 16 | 10 | 51 | 117
Accuracy | 61.1% | 50.9% | 36.4% | 36.1% | 60.3%
Total accuracy: 50.2%
Table 13 Sensitivity analysis for variables used in Model 1 (neural network using the original data sets) to predict Forest Type. The further an error ratio is above 1.0, the more important the variable is for reducing error in predicting Forest Type.

Variable | Error ratio
LS(2003) FIR | 4.06
LS(2003) Red | 3.96
LS(2003) Blue | 2.54
LS(2009) Blue | 2.48
LS(2009) MIR | 2.18
LS(2009) Red | 1.92
LS(2009) FIR | 1.92
LS(2003) MIR | 1.92
LS(2003) Green | 1.60
LS(2009) Green | 1.32
LS(2003) NIR | 1.23
Pr(MurrayTree) | 1.15
LS(2009) NIR | 1.04
Table 14 Sensitivity analysis for variables used in Model 2 (random forest using the original data sets) to predict Forest Type. Sensitivity for random forests was assessed by the percentage of branches across the forest that used each variable.

Variable | % of branches
Pr(MurrayTree) | 10.8
LS(2003) NIR | 8.8
LS(2003) Blue | 8.7
LS(2003) MIR | 8.6
LS(2009) MIR | 8.2
LS(2003) Red | 8.1
LS(2003) Green | 7.8
LS(2003) FIR | 7.8
LS(2009) Blue | 7.0
LS(2009) NIR | 6.8
LS(2009) FIR | 6.1
LS(2009) Red | 5.9
LS(2009) Green | 5.3
Table 15 Confusion matrix of Forest Types predicted by Model 3 (neural network using the new data sets) for the samples used to test the model (N = 903 samples).

Predicted \ Observed | Non-forest | Black box | RRG-black box woodland | RRG woodland | RRG forest
Non-forest | 113 | 25 | 18 | 19 | 9
Black box | 56 | 125 | 20 | 34 | 10
RRG-black box woodland | 10 | 10 | 58 | 26 | 13
RRG woodland | 22 | 5 | 8 | 84 | 32
RRG forest | 15 | 10 | 6 | 45 | 130
Accuracy | 52.3% | 71.4% | 52.7% | 40.4% | 67.0%
Total accuracy: 56.5%
Table 16 Confusion matrix of Forest Types predicted by Model 4 (random forest using the new data sets) for the samples used to test the model (N = 903 samples).

Predicted \ Observed | Non-forest | Black box | RRG-black box woodland | RRG woodland | RRG forest
Non-forest | 134 | 27 | 13 | 11 | 8
Black box | 37 | 124 | 24 | 24 | 8
RRG-black box woodland | 9 | 7 | 49 | 33 | 14
RRG woodland | 21 | 3 | 12 | 85 | 24
RRG forest | 15 | 14 | 12 | 55 | 140
Accuracy | 62.0% | 70.8% | 44.5% | 40.9% | 72.2%
Total accuracy: 58.9%
Table 17 Sensitivity analysis for variables used in Model 3 (neural network using the new data sets) to predict Forest Type. The further an error ratio is above 1.0, the more important the variable is for reducing error in predicting Forest Type.

Variable | Error ratio
RE (2009) Red edge mean | 4.298
SPOTMap Green mean | 4.406
SPOTMap Red mean | 5.786
HLS Sum_FIR | 3.305
RE (2009) Red mean | 3.829
HLS Sum_Blue | 1.980
HLS Spr_Green | 2.217
RE (2010) Red edge mean | 2.487
HLS Spr_FIR | 2.423
SPOTMap Blue mean | 2.612
HLS Spr_MIR | 3.050
HLS Sum_MIR | 2.805
RE (2010) Red mean | 2.121
HLS Sum_Red | 2.674
HLS Sum_Green | 3.003
HLS Sum_SATVI | 2.023
HLS Spr_NIR | 1.746
HLS Spr_Red | 1.595
HLS Sum_NIR | 1.509
RE (2009) NIR mean | 1.765
HLS Spr_Blue | 1.106
HLS Spr_SATVI | 1.392
HLS Spr_SLAVI | 1.338
RE (2010) NIR mean | 1.200
HLS Sum_SLAVI | 1.221
HLS Spr_NDMI | 1.241
HLS Sum_NDMI | 1.233
Pr(MurrayTree_new) | 1.087
Table 18 Sensitivity analysis for variables used in Model 4 (random forest using the new data sets) to predict Forest Type. Sensitivity for random forests was assessed by the percentage of branches across the forest that used each variable.

Variable | % of branches
RE (2009) Blue mean | 1.93
RE (2009) Red edge mean | 1.82
RE (2009) Red mean | 1.75
RE (2009) Green mean | 1.72
RE (2009) NIR mean | 1.68
RE (2009) NDVI SD | 1.65
RE (2009) Blue SD | 1.64
RE (2010) NIR mean | 1.64
Pr(MurrayTree_new) | 1.56
RE (2009) NDVI mean | 1.55
RE (2010) Red edge mean | 1.50
RE (2009) Green SD | 1.48
PALSAR Murray LL_HH | 1.48
RE (2010) Red mean | 1.47
RE (2009) NIR SD | 1.46
RE (2010) Green mean | 1.45
RE (2010) NIR SD | 1.44
RE (2009) Red SD | 1.42
PALSAR Murray LL_HV | 1.41
RE (2012) NIR mean | 1.39
RE (2010) Blue SD | 1.38
RE (2010) NDVI mean | 1.37
RE (2012) Blue mean | 1.37
RE (2010) Blue mean | 1.36
RE (2009) Red edge SD | 1.35
PALSAR SEAus LL_HH | 1.33
RE (2012) Green mean | 1.30
RE (2012) NDVI mean | 1.30
RE (2010) Red edge SD | 1.29
RE (2012) NIR SD | 1.28
RE (2012) NDVI SD | 1.21
RE (2010) Green SD | 1.19
RE (2010) NDVI SD | 1.18
RE (2012) Blue SD | 1.17
RE (2012) Green SD | 1.17
RE (2012) Red mean | 1.13
RE (2012) Red edge mean | 1.13
PALSAR SEAus LL_HV | 1.12
RE (2010) Red SD | 1.06
RE (2012) Red SD | 1.06
RE (2012) Red edge SD | 1.00
SPOTMap Green mean | 0.99
HLS Win_Blue | 0.96
SPOTMap Blue mean | 0.93
SPOTMap Red mean | 0.92
HLS Win_NIR | 0.92
HLS Sum_NIR | 0.92
HLS Sum_NDSI | 0.91
Table 18 (cont.) Sensitivity analysis for variables used in Model 4 (random forest using the new data sets) to predict Forest Type.

Variable | % of branches
HLS Spr_EVI | 0.90
HLS Spr_SATVI | 0.90
HLS Win_FIR | 0.89
HLS Spr_NDSI | 0.88
HLS Sum_SATVI | 0.88
HLS Win_SLAVI | 0.87
HLS Spr_NIR | 0.86
HLS Win_NDVI | 0.86
HLS Win_Red | 0.84
HLS Spr_Red | 0.84
HLS Win_NDSI | 0.84
SPOTMap Red SD | 0.83
HLS Win_EVI | 0.83
HLS Sum_MIR | 0.82
HLS Win_Green | 0.81
HLS Spr_NDMI | 0.81
HLS Sum_EVI | 0.81
HLS Aut_NDSI | 0.81
HLS Spr_Blue | 0.79
HLS Win_SATVI | 0.79
SPOTMap Green SD | 0.78
HLS Spr_Green | 0.78
HLS Spr_SLAVI | 0.78
HLS Sum_Blue | 0.77
HLS Aut_NIR | 0.77
HLS Win_NDMI | 0.77
HLS Spr_FIR | 0.76
SPOTMap Blue SD | 0.75
HLS Sum_Green | 0.74
HLS Sum_NDMI | 0.74
HLS Win_MIR | 0.73
HLS Aut_EVI | 0.73
HLS Aut_SATVI | 0.73
HLS Sum_FIR | 0.72
HAR | 0.70
HLS Sum_SLAVI | 0.69
HLS Spr_MIR | 0.68
HLS Sum_Red | 0.68
HLS Aut_NDMI | 0.68
HLS Aut_Red | 0.66
HLS Aut_MIR | 0.66
HLS Sum_NDVI | 0.63
HLS Aut_NDVI | 0.63
HLS Spr_NDVI | 0.62
HLS Aut_SLAVI | 0.62
HLS Aut_Blue | 0.60
HLS Aut_Green | 0.57
HLS Aut_FIR | 0.47
614
Improving Modelling of Forest Types and Stand Condition 30
615 616
Table 19 Confusion matrix of Forest Types predicted by Model 5 (neural network using the original data sets for GKP only) for the samples used to test the model (N = 404 samples).
                         Observed Forest Type
Predicted Forest Type    Non-forest  Black box  RRG woodland  RRG forest
Non-forest               41          13         7             5
Black box                43          55         18            16
RRG woodland             14          11         52            13
RRG forest               14          12         24            66
Accuracy                 36.6%       60.4%      51.5%         66.0%
Total accuracy           53.0%
Table 20 Confusion matrix of Forest Types predicted by Model 6 (random forest using the original data sets for GKP only) for the samples used to test the model (N = 404 samples).
                         Observed Forest Type
Predicted Forest Type    Non-forest  Black box  RRG woodland  RRG forest
Non-forest               61          25         14            11
Black box                25          43         12            15
RRG woodland             13          8          54            13
RRG forest               13          15         21            61
Accuracy                 54.5%       47.3%      53.5%         61.0%
Total accuracy           54.2%
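The accuracies in these confusion matrices can be reproduced from the counts. In Table 20, dividing each diagonal count by its column total (112, 91, 101 and 100 test samples) gives the reported per-type accuracies. This indicates that rows are predicted types and columns are observed types, and the sketch below assumes that orientation. The function name is illustrative.

```python
def type_accuracies(matrix, labels):
    """Per-type and overall accuracy (%) from a confusion matrix.

    matrix[i][j] = number of test samples predicted as labels[i]
    that were observed as labels[j] (rows = predicted, columns = observed).
    """
    n = len(labels)
    per_type = {}
    for j, label in enumerate(labels):
        observed_total = sum(matrix[i][j] for i in range(n))
        per_type[label] = 100.0 * matrix[j][j] / observed_total
    correct = sum(matrix[k][k] for k in range(n))
    total = sum(sum(row) for row in matrix)
    return per_type, 100.0 * correct / total


labels = ["Non-forest", "Black box", "RRG woodland", "RRG forest"]
# Counts transcribed from Table 20 (Model 6).
model6 = [
    [61, 25, 14, 11],
    [25, 43, 12, 15],
    [13, 8, 54, 13],
    [13, 15, 21, 61],
]
per_type, overall = type_accuracies(model6, labels)
for label, acc in per_type.items():
    print(f"{label}: {acc:.1f}%")
print(f"Total accuracy: {overall:.1f}%")
```

Running this prints 54.5%, 47.3%, 53.5% and 61.0% for the four types and 54.2% overall, matching Table 20.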
Table 21 Sensitivity analysis for variables used in Model 5 (neural network using the original data sets for GKP only) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.
Variable                  Error ratio
Pr(MurrayTree)            3.32
LS(2003) FIR              2.49
LS(2003) MIR              1.77
LS(2009) MIR              1.40
LS(2009) Red              1.36
LS(2009) NIR              1.31
LS(2009) Green            1.18
LS(2003) Green            1.14
LS(2009) Blue             1.11
LS(2009) FIR              1.08
LS(2003) Red              1.08
LS(2003) NIR              1.07
LS(2003) Blue             1.04
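The error ratio reported for the neural networks compares a model's error when a variable is withheld with its error when all variables are available. The sketch below assumes one common way of withholding a variable, replacing it by its mean over the data, and uses the sum of squared errors. The modelling software may define the error and the withholding step differently, and `predict` here is a hypothetical stand-in for a trained network.

```python
def error_ratios(predict, X, y, names):
    """Error ratio for each input variable of a fitted model.

    Assumption: a variable is withheld by replacing it with its mean over
    the data, and error is the sum of squared errors. A ratio above 1.0
    means the error grows without the variable.
    """
    def sse(rows):
        return sum((predict(row) - target) ** 2 for row, target in zip(rows, y))

    baseline = sse(X)
    if baseline == 0:
        raise ValueError("baseline error is zero; ratios are undefined")
    ratios = {}
    for j, name in enumerate(names):
        col_mean = sum(row[j] for row in X) / len(X)
        withheld = [row[:j] + [col_mean] + row[j + 1:] for row in X]
        ratios[name] = sse(withheld) / baseline
    # Rank variables from most to least important, as in Tables 21 and 25.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Toy data: the target depends on x0 only; x1 is irrelevant.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]]
y = [0.1, 0.9, 2.2, 2.8]


def predict(row):
    # Hypothetical stand-in for a trained neural network.
    return row[0]


print(error_ratios(predict, X, y, ["x0", "x1"]))
```

The irrelevant input x1 gets a ratio of exactly 1.0, which is why values close to 1.0 in the tables mark variables that contribute little.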
Table 22 Sensitivity analysis for variables used in Model 6 (random forest using the original data sets for GKP only) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
Pr(MurrayTree)            11.25
LS(2003) MIR              8.89
LS(2003) Red              8.76
LS(2003) FIR              8.65
LS(2009) MIR              8.51
LS(2003) NIR              8.1
LS(2003) Blue             7.8
LS(2003) Green            7.26
LS(2009) NIR              7.16
LS(2009) FIR              6.51
LS(2009) Red              6.26
LS(2009) Blue             6.11
LS(2009) Green            4.75
Table 23 Confusion matrix of Forest Types predicted by Model 7 (neural network using the new data sets including LiDAR for GKP only) for the samples used to test the model (N = 404 samples).
                         Observed Forest Type
Predicted Forest Type    Non-forest  Black box  RRG woodland  RRG forest
Non-forest               62          11         10            8
Black box                24          67         14            5
RRG woodland             18          6          60            19
RRG forest               8           7          17            68
Accuracy                 55.4%       73.6%      59.4%         68.0%
Total accuracy           63.6%
Table 24 Confusion matrix of Forest Types predicted by Model 8 (random forest using the new data sets including LiDAR for GKP only) for the samples used to test the model (N = 404 samples).
                         Observed Forest Type
Predicted Forest Type    Non-forest  Black box  RRG woodland  RRG forest
Non-forest               76          20         13            12
Black box                19          59         8             5
RRG woodland             13          5          57            13
RRG forest               4           7          23            70
Accuracy                 67.9%       64.8%      57.0%         69.3%
Total accuracy           64.9%
Table 25 Sensitivity analysis for variables used in Model 7 (neural network using the new data sets including LiDAR for GKP only) to predict Forest Type. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Forest Type.
Variable                  Error ratio
RE (2009) Red edge mean   1.636
SPOTMap Red mean          1.453
Pr(MurrayTree_new)        1.410
HLS Sum_Blue              1.362
RE (2009) MIR mean        1.291
HLS Spr_Green             1.276
HLS Sum_MIR               1.269
RE (2009) Red mean        1.265
RE (2010) Red mean        1.247
HLS Spr_MIR               1.230
HLS Spr_FIR               1.227
SPOTMap Green mean        1.210
SPOTMap Blue mean         1.185
HLS Sum_MIR               1.181
HLS Sum_SATVI             1.164
HLS Spr_NDMI              1.143
HLS Sum_NDMI              1.131
HLS Spr_SATVI             1.127
LiDAR cover 0.5 m         1.124
HLS Spr_SLAVI             1.116
HLS Spr_Red               1.113
LiDAR cover 16.5 m        1.108
LiDAR DEM                 1.101
LiDAR cover 8.5 m         1.100
RE (2010) NIR mean        1.096
HLS Sum_NIR               1.095
HLS Spr_NIR               1.081
HLS Sum_SLAVI             1.068
RE (2010) MIR mean        1.064
LiDAR cover 4.5 m         1.033
HLS Sum_Green             1.030
HLS Spr_Blue              1.027
HLS Sum_Red               1.024
LiDAR cover 2.5 m         1.019
LiDAR cover 32.5 m        1.003
LiDAR cover 1.5 m         1.002
Table 26 Sensitivity analysis for variables used in Model 8 (random forest using the new data sets including LiDAR for GKP only) to predict Forest Type. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
RE (2009) Blue mean       1.78
RE (2009) Red mean        1.77
PALSAR SEAus LL_HH        1.64
RE (2012) Blue mean       1.60
RE (2012) NDVI mean       1.58
RE (2009) Red edge mean   1.53
RE (2009) Red SD          1.51
RE (2009) NIR mean        1.51
Pr(MurrayTree_new)        1.49
RE (2009) Red edge SD     1.48
RE (2009) NDVI SD         1.48
RE (2009) Green mean      1.47
LiDAR DEM                 1.47
RE (2009) Blue SD         1.40
RE (2009) Green SD        1.32
RE (2010) Green mean      1.30
RE (2010) NIR SD          1.30
RE (2012) NIR mean        1.30
RE (2010) Blue mean       1.28
RE (2010) NIR mean        1.26
RE (2012) Red mean        1.26
RE (2009) NIR SD          1.24
RE (2010) Red edge mean   1.24
RE (2010) Green SD        1.23
HLS Spr_SATVI             1.23
RE (2009) NDVI mean       1.22
RE (2012) Blue SD         1.20
HLS Spr_NIR               1.20
RE (2010) NDVI mean       1.16
PALSAR SEAus LL_HV        1.16
RE (2010) Red SD          1.14
RE (2012) Red SD          1.13
HLS Win_Blue              1.13
HLS Sum_SATVI             1.10
RE (2012) NIR SD          1.09
PALSAR Murray LL_HV       1.07
HLS Sum_NDSI              1.07
RE (2010) Blue SD         1.05
LiDAR cover 16.5 m        1.05
RE (2012) Green mean      1.03
HLS Win_FIR               1.03
RE (2012) NDVI SD         1.02
HLS Aut_NDSI              1.02
HLS Win_NDVI              0.99
RE (2012) Red edge mean   0.98
PALSAR Murray LL_HH       0.98
HLS Spr_NDSI              0.98
HLS Win_NDSI              0.96
RE (2010) Red mean        0.94
RE (2012) Green SD        0.93
LiDAR cover 8.5 m         0.93
Table 26 (cont.) Sensitivity analysis for variables used in Model 8
Variable                  % Forests used in
HLS Win_NDMI              0.92
HLS Aut_SATVI             0.92
HLS Spr_EVI               0.90
HLS Win_NIR               0.89
HLS Aut_Blue              0.89
RE (2010) Red edge SD     0.88
HLS Win_SATVI             0.88
HLS Aut_SLAVI             0.88
HLS Sum_EVI               0.85
HLS Spr_MIR               0.84
RE (2012) Red edge SD     0.82
HLS Win_Green             0.82
HLS Win_Red               0.82
HLS Win_EVI               0.82
LiDAR cover 4.5 m         0.82
RE (2010) NDVI SD         0.81
HLS Spr_NDMI              0.81
HAR                       0.79
HLS Spr_Blue              0.79
HLS Aut_EVI               0.79
HLS Sum_NIR               0.77
HLS Win_MIR               0.76
HLS Spr_NDVI              0.75
SPOTMap Blue mean         0.73
HLS Aut_NIR               0.73
HLS Spr_Red               0.72
HLS Aut_Red               0.71
HLS Sum_SLAVI             0.71
HLS Spr_Green             0.69
HLS Sum_Blue              0.69
HLS Aut_Green             0.69
SPOTMap Red mean          0.68
SPOTMap Green mean        0.67
HLS Aut_MIR               0.67
HLS Win_SLAVI             0.67
HLS Sum_NDVI              0.64
HLS Sum_Green             0.63
HLS Spr_SLAVI             0.63
SPOTMap Red SD            0.62
HLS Spr_FIR               0.60
HLS Sum_MIR               0.59
SPOTMap Green SD          0.58
HLS Aut_NDVI              0.58
HLS Sum_NDMI              0.56
SPOTMap Blue SD           0.51
LiDAR cover 2.5 m         0.51
LiDAR cover 0.5 m         0.50
HLS Sum_FIR               0.47
HLS Sum_Red               0.46
LiDAR cover 1.5 m         0.46
HLS Aut_FIR               0.45
HLS Aut_NDMI              0.41
LiDAR cover 32.5 m        0.03
Table 27 Details and summary of the model fits (overall and test accuracy) for the various models used to predict Forest Type.
Model  Model type      Samples used  Area sampled      Overall accuracy  Test accuracy  Data sets included
1      Neural network  4500          Chowilla and GKP  53.1%             48.4%          Landsat 2003 and 2009, Pr(MurrayTree_new)
2      Random forest   4500          Chowilla and GKP  82.2%             50.2%          Landsat 2003 and 2009, Pr(MurrayTree_new)
3      Neural network  4500          Chowilla and GKP  60.6%             56.5%          Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP & Pr(MurrayTree_new)
4      Random forest   4500          Chowilla and GKP  88.0%             58.9%          Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP & Pr(MurrayTree_new)
5      Neural network  2000          GKP               57.0%             53.0%          Landsat 2003 and 2009, Pr(MurrayTree_new)
6      Random forest   2000          GKP               83.1%             54.2%          Landsat 2003 and 2009, Pr(MurrayTree_new)
7      Neural network  2000          GKP               69.2%             63.6%          Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP, Pr(MurrayTree_new) & LiDAR
8      Random forest   2000          GKP               89.5%             64.9%          Historical Landsat, Rapideye, PALSAR, HAR, SPOTMAP, Pr(MurrayTree_new) & LiDAR
Table 28 Details and summary of the model fits (R2) for the various models used to predict Stand Condition.
Model  Model type      Samples  Area sampled      Overall R2  Test R2  Data sets included
9      Neural network  75       Chowilla and GKP  0.69        0.60     Landsat, Pr(MurrayTree_new), original Pr(Forest Types)
10     Random forest   75       Chowilla and GKP  0.81        0.55     Landsat, Pr(MurrayTree_new), original Pr(Forest Types)
11     Neural network  75       Chowilla and GKP  0.77        0.81     Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMAP
12     Random forest   75       Chowilla and GKP  0.81        0.59     Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR & SPOTMAP
13     Neural network  50       GKP               0.69        0.73     Landsat, Pr(MurrayTree_new), original Pr(Forest Types)
14     Random forest   50       GKP               0.77        0.36     Landsat, Pr(MurrayTree_new), original Pr(Forest Types)
15     Neural network  50       GKP               0.67        0.72     Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMAP & LiDAR
16     Random forest   50       GKP               0.76        0.52     Landsat, Pr(MurrayTree_new), new Pr(Forest Types), Rapideye, PALSAR, HAR, SPOTMAP & LiDAR
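Table 28 reports model fit as R2, but the report does not state which definition was used. A common choice for held-out predictions is the coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares about the observed mean. It is sketched below with hypothetical stand condition scores. Unlike a squared correlation, this version becomes negative when predictions are worse than simply using the observed mean. With test sets drawn from only 75 or 50 sites, the test R2 can exceed the overall R2, as it does for Models 11, 13 and 15.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired values")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical stand condition scores for four test sites.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(round(r_squared(observed, predicted), 2))  # 0.95
```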
Table 29 Sensitivity analysis for variables used in Model 9 (neural network using Landsat, and the original Forest Type probability maps) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.
Variable                  Error ratio
Pr(MurrayTree)            1.67
Pr(RF)                    1.42
Pr(RGBB)                  1.39
LS(2009) NIR              1.28
LS(2009) FIR              1.13
LS(2009) MIR              1.08
LS(2009) Green            1.06
LS(2010) NIR              1.05
LS(2009) NDVI             1.04
LS(2010) FIR              1.02
LS(2009) Red              1.01
Pr(BX)                    1.00
LS(2010) MIR              1.00
LS(2010) Blue             1.00
LS(2010) Green            1.00
LS(2010) Red              1.00
Pr(RW)                    1.00
LS(2009) Blue             0.99
Pr(BB)                    0.98
Table 30 Sensitivity analysis for variables used in Model 10 (random forest using Landsat, and the original Forest Type probability maps) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
Pr(MurrayTree)            8.96
Pr(RW)                    7.51
Pr(RF)                    6.94
Pr(RGBB)                  6.65
LS(2009) Blue             6.65
LS(2009) NDVI             6.65
LS(2010) Blue             6.36
LS(2010) MIR              5.20
LS(2010) Blue             4.91
LS(2009) MIR              4.34
LS(2009) FIR              4.05
LS(2009) NIR              3.76
Pr(BB)                    3.47
LS(2009) Green            3.47
LS(2009) Red              3.47
LS(2010) NDVI             3.47
Pr(BX)                    3.18
LS(2010) NIR              3.18
LS(2010) FIR              3.18
LS(2010) Red              2.60
LS(2010) Green            2.02
Table 31 Sensitivity analysis for variables used in Model 11 (neural network using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR & SPOTMAP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.
Variable                  Error ratio
PALSAR SEAus LL_HV        1.23
RE (2009) MIR mean        1.20
RE (2009) NDVI SD         1.20
Pr(RF_M4)                 1.17
RE (2009) NDVI mean       1.15
SPOTMap Blue SD           1.12
Pr(RGBB_M4)               1.11
PALSAR SEAus LL_HH        1.10
Pr(Non-forest_M4)         1.07
Pr(BB_M4)                 1.06
SPOTMap Red mean          1.05
RE (2010) MIR mean        1.05
Pr(RW_M4)                 1.04
PALSAR Murray LL_HV       1.04
RE (2009) Red SD          1.03
RE (2009) Red edge mean   1.03
SPOTMap Green mean        1.03
SPOTMap Blue mean         1.03
RE (2010) Red edge SD     1.03
SPOTMap Green SD          1.03
RE (2009) Blue SD         1.03
RE (2010) Red SD          1.03
RE (2009) Blue mean       1.02
SPOTMap Red SD            1.02
RE (2010) Blue SD         1.02
Pr(MurrayTree_new)        1.02
RE (2010) NDVI mean       1.01
PALSAR Murray LL_HH       1.01
RE (2009) NIR SD          1.01
RE (2010) Red edge mean   1.01
HAR                       1.01
RE (2009) Green SD        1.01
RE (2010) NIR SD          1.01
RE (2010) Green SD        1.01
RE (2009) Red edge SD     1.00
RE (2010) Red mean        1.00
RE (2010) NDVI SD         1.00
RE (2009) Green mean      1.00
RE (2010) Green mean      1.00
RE (2009) Red mean        1.00
RE (2010) Blue mean       0.99
Table 32 Sensitivity analysis for variables used in Model 12 (random forest using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR & SPOTMAP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
Pr(MurrayTree_new)        8.75
RE (2009) NIR mean        5.83
RE (2009) Blue mean       5.54
RE (2009) Green mean      4.37
RE (2009) Red mean        4.37
RE (2009) Green SD        4.08
RE (2009) Red SD          4.08
RE (2009) NDVI mean       3.79
RE (2009) Red edge mean   3.50
RE (2009) Red edge SD     3.50
RE (2009) NIR SD          3.50
RE (2010) NDVI mean       3.50
RE (2010) NIR SD          3.21
RE (2010) Blue SD         2.92
PALSAR Murray LL_HH       2.62
RE (2009) Blue SD         2.62
RE (2010) Green mean      2.62
RE (2010) Red edge mean   2.62
RE (2010) Red edge SD     2.62
RE (2009) NDVI SD         2.04
RE (2010) NIR mean        2.04
PALSAR SEAus LL_HH        1.75
SPOTMap Blue SD           1.75
SPOTMap Green mean        1.75
RE (2010) Blue SD         1.75
RE (2010) Red mean        1.46
Pr(RW_M4)                 1.46
PALSAR Murray LL_HV       1.17
SPOTMap Blue mean         1.17
SPOTMap Red SD            1.17
RE (2010) Red mean        1.17
RE (2010) NDVI SD         1.17
Pr(BB_M4)                 0.87
SPOTMap Red mean          0.87
RE (2010) Green SD        0.87
HAR                       0.87
PALSAR SEAus LL_HV        0.58
SPOTMap Green SD          0.58
Pr(Non-forest_M4)         0.58
Pr(BB_M4)                 0.58
Pr(RF_M4)                 0.29
Table 33 Sensitivity analysis for variables used in Model 13 (neural network using Landsat and the original Forest Type probability maps for GKP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.
Variable                  Error ratio
LS(2009) NIR              1.39
Pr(MurrayTree)            1.39
Pr(RF)                    1.17
LS(2010) NIR              1.08
Pr(BX)                    1.05
LS(2009) FIR              1.05
LS(2009) MIR              1.05
Pr(BB)                    1.02
LS(2009) NDVI             1.02
LS(2010) NDVI             1.02
LS(2009) Blue             1.02
Pr(RW)                    1.01
LS(2010) MIR              1.01
LS(2010) Red              1.00
LS(2010) FIR              1.00
LS(2010) Blue             1.00
LS(2009) Red              0.99
LS(2010) Green            0.99
LS(2009) Green            0.98
Pr(RGBB)                  0.98
Table 34 Sensitivity analysis for variables used in Model 14 (random forest using Landsat and the original Forest Type probability maps for GKP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
Pr(BX)                    8.22
LS(2009) NIR              8.22
Pr(RF)                    7.31
Pr(RW)                    6.85
Pr(RGBB)                  6.85
LS(2009) NDVI             6.85
Pr(MurrayTree)            6.39
LS(2009) MIR              5.94
Pr(BX)                    5.48
LS(2010) NIR              4.57
LS(2010) NDVI             4.57
LS(2009) Blue             4.11
LS(2009) FIR              4.11
LS(2009) Red              3.65
LS(2010) MIR              3.65
LS(2010) Red              3.2
LS(2010) FIR              3.2
LS(2010) Blue             2.74
LS(2010) Green            2.28
LS(2009) Green            1.83
Table 35 Sensitivity analysis for variables used in Model 15 (neural network using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR, SPOTMAP and LiDAR for GKP) to predict Stand Condition. The higher an error ratio is over 1.0, the more important a variable is to reducing the error in predicting Stand Condition.
Variable                  Error ratio
RE (2009) NIR mean        1.052
LiDAR cover 0.5 m         1.046
Pr(Non-forest_M4)         1.029
LiDAR cover 8.5 m         1.019
Pr(BB_M4)                 1.019
Pr(RW_M4)                 1.014
RE (2010) Blue mean       1.013
PALSAR Murray LL_HH       1.013
RE (2010) Blue SD         1.013
LiDAR cover 16.5 m        1.012
RE (2009) NDVI mean       1.011
RE (2009) Blue mean       1.008
PALSAR Murray LL_HV       1.008
Pr(RGBB_M4)               1.007
PALSAR SEAus LL_HV        1.007
LS(2010) NIR              1.006
LS(2010) Green            1.006
LiDAR cover 1.5 m         1.005
RE (2010) Red mean        1.004
SPOTMap Blue mean         1.004
LiDAR DEM                 1.004
RE (2010) NDVI mean       1.003
SPOTMap Green SD          1.003
RE (2010) Green SD        1.002
RE (2010) Red edge mean   1.002
RE (2009) Green mean      1.002
SPOTMap Red mean          1.002
SPOTMap Red SD            1.001
SPOTMap Green mean        1.001
Pr(MurrayTree_new)        1.001
LiDAR cover 4.5 m         1.001
RE (2009) Red edge mean   1.000
RE (2009) Red mean        1.000
SPOTMap Blue SD           1.000
PALSAR SEAus LL_HH        0.999
RE (2010) Red edge SD     0.999
LiDAR cover 2.5 m         0.998
HAR                       0.998
RE (2010) Red SD          0.998
RE (2010) NIR SD          0.997
RE (2009) NDVI SD         0.995
RE (2009) Green SD        0.995
RE (2009) NIR SD          0.994
RE (2009) Red SD          0.991
Pr(RF_M4)                 0.989
RE (2009) Red edge SD     0.989
RE (2009) Blue SD         0.987
RE (2010) NDVI SD         0.982
LiDAR cover 32.5 m        0.975
Table 36 Sensitivity analysis for variables used in Model 16 (random forest using Landsat, the new Forest Type probability maps, Rapideye, PALSAR, HAR, SPOTMAP and LiDAR for GKP) to predict Stand Condition. Sensitivity for random forests was assessed by the proportion of forests that used a variable.
Variable                  % Forests used in
RE (2009) NIR SD          6.28
RE (2009) NIR mean        5.83
Pr(MurrayTree_new)        4.48
RE (2009) Blue SD         4.48
RE (2009) Green SD        4.48
SPOTMap Red mean          4.48
RE (2009) Red edge SD     4.04
RE (2010) NDVI mean       4.04
RE (2009) Blue mean       3.59
RE (2009) Red SD          3.59
RE (2009) NDVI mean       3.59
RE (2010) Green mean      3.59
RE (2010) Blue SD         3.14
RE (2010) Red mean        3.14
RE (2009) Green mean      2.69
RE (2010) Blue mean       2.69
RE (2010) NIR mean        2.69
RE (2009) Red mean        2.24
RE (2010) Red edge mean   2.24
RE (2010) NIR SD          2.24
HAR                       2.24
SPOTMap Blue mean         2.24
RE (2009) Red edge mean   1.79
RE (2010) Green SD        1.79
RE (2010) NDVI SD         1.79
PALSAR Murray LL_HH       1.79
PALSAR SEAus LL_HV        1.79
RE (2009) Red edge SD     1.35
LiDAR cover 0.5 m         1.35
LiDAR cover 4.5 m         1.35
Pr(BB_M4)                 1.35
RE (2009) NDVI SD         0.90
RE (2010) Red SD          0.90
LiDAR cover 2.5 m         0.90
LiDAR cover 8.5 m         0.90
LiDAR DEM                 0.90
Pr(RW_M4)                 0.90
SPOTMap Green mean        0.45
SPOTMap Green SD          0.45
PALSAR SEAus LL_HH        0.45
Pr(Non-forest_M4)         0.45
Pr(RF_M4)                 0.45
PALSAR Murray LL_HV       0
SPOTMap Red SD            0
SPOTMap Blue SD           0
LiDAR cover 1.5 m         0
LiDAR cover 16.5 m        0
LiDAR cover 32.5 m        0
Pr(RGBB_M4)               0
Discussion
Modelling Forest Type
The probability maps for the Forest Types were improved by using an alternative modelling approach and new remotely-sensed data sets. The original model of Forest Types from the 2009 Condition Report (Cunningham et al., 2009b) was built using a neural network. Here, we found that random forests slightly improved the test accuracy of the Forest Type models compared with neural networks (by 1.2 to 2.4 percentage points, Table 27). Substantial improvements in the accuracy of the Forest Type model across the focal floodplains were achieved by using the new remotely-sensed data sets (test accuracy increased from 50% to 59% using random forests, Table 27). Similarly, the Forest Type model restricted to Gunbower-Koondrook-Perricoota Forests was improved substantially by including the new remotely-sensed data sets (from 54% to 65% using random forests, Table 27). New remotely-sensed data sets that were important predictors of Forest Type included variables derived from Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery. Imagery from the Rapideye and SPOT satellites has a much finer resolution (5 m and 2.5 m, respectively) than Landsat, which provides reflectance at a 25 m scale. The finer resolution of Rapideye and SPOTMap imagery would provide more accurate estimates of reflectance at the 25 m scale than Landsat imagery, and this increased accuracy of reflectance measurements may have improved the accuracy of predictions for Forest Types. Rapideye also measures the red edge spectral band (680-730 nm), which Landsat largely does not measure. The red edge is valuable in determining the physiological condition of vegetation, as it is directly related to chlorophyll production (Boochs et al., 1990). The red edge may have provided useful information on differences in the amount of canopy (woodland versus forest) or on differences in spectral characteristics among species.
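The Rapideye variables are reported as a mean and an SD (e.g. RE (2009) NDVI mean and RE (2009) NDVI SD). The sketch below assumes these were summarised from the 5 m pixels falling within each 25 m cell, i.e. a 5 x 5 block. NDVI is the standard (NIR - Red) / (NIR + Red) ratio, while the block layout, the use of the population SD and the function names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def ndvi(nir, red):
    """Normalised difference vegetation index for one pixel."""
    return (nir - red) / (nir + red)


def block_ndvi_stats(nir_grid, red_grid, factor=5):
    """Mean and SD of fine-pixel NDVI within each coarse cell.

    nir_grid, red_grid: 2-D lists of reflectance at 5 m resolution.
    factor=5 groups 5 x 5 pixels into one 25 m cell; grid dimensions
    are assumed to be exact multiples of factor.
    """
    n_rows, n_cols = len(nir_grid), len(nir_grid[0])
    cells = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            values = [
                ndvi(nir_grid[r][c], red_grid[r][c])
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            row.append((mean(values), pstdev(values)))
        cells.append(row)
    return cells


# A 5 x 10 grid: a uniform canopy cell beside a partly bare cell.
nir = [[0.40] * 5 + [0.40] * 2 + [0.25] * 3 for _ in range(5)]
red = [[0.10] * 5 + [0.10] * 2 + [0.20] * 3 for _ in range(5)]
for ndvi_mean, ndvi_sd in block_ndvi_stats(nir, red)[0]:
    print(f"NDVI mean {ndvi_mean:.3f}, SD {ndvi_sd:.3f}")
```

A single 25 m Landsat pixel would record only one blended value for the partly bare cell, whereas the finer pixels also yield an SD that describes how patchy the canopy is.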
Summer reflectance variables from the historical Landsat composite were important predictors in the neural networks of Forest Type. Because this composite was created from images over a long period (2000-2010), it would provide a more consistent image than individual scenes, which vary considerably owing to atmospheric conditions and sensor error. A more consistent measurement of reflectance across the floodplain provided better differentiation among the Forest Types. The same historical Landsat composite was used to successfully distinguish among stands of river red gum, black box and coolabah (Cunningham et al., 2013d).
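The report does not describe here how the historical Landsat composite (the seasonal HLS variables) was assembled. One common approach, assumed in the sketch below, is a per-pixel median across many co-registered scenes for a given band and season, with cloud-affected pixels masked out. Because the median is robust to an occasional hazy or faulty scene, this illustrates why a multi-year composite gives a more consistent image than any single scene. The function name and masking convention are hypothetical.

```python
from statistics import median


def median_composite(scenes):
    """Per-pixel median across co-registered scenes of one band and season.

    scenes: list of 2-D lists with identical shape; None marks a pixel
    masked out (e.g. cloud). Pixels with no valid value stay None.
    """
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [s[r][c] for s in scenes if s[r][c] is not None]
            row.append(median(values) if values else None)
        composite.append(row)
    return composite


# Three summer scenes of a 1 x 2 pixel area; the third is affected by haze
# in the first pixel and masked in the second.
scenes = [[[0.10, 0.30]], [[0.12, 0.31]], [[0.45, None]]]
print(median_composite(scenes))
```

The hazy value of 0.45 does not shift the first pixel, which takes the median of the three observations (0.12).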
LiDAR and PALSAR data sets were included in the modelling to provide structural information beneath the canopy that reflectance data cannot provide. LiDAR-derived variables were not good predictors of Forest Type at Gunbower-Koondrook-Perricoota Forests. This is surprising, as we estimated the percentage cover in seven strata between 0 and 32.5 m. These cover variables were expected to help differentiate between structural types (river red gum forest and woodland) because of differences in canopy height and cover. In contrast, the HH polarisation from the PALSAR imagery was an important predictor of Forest Type in Gunbower-Koondrook-Perricoota Forests.
Modelling Stand Condition
Predictions of stand condition in 2010 for the 75 sites across the two focal floodplains were more accurate from the neural networks than from the random forests (test R2 of 0.60 and 0.81 for neural networks versus 0.55 and 0.59 for random forests, Table 28). In a previous report (Cunningham et al., 2013b), we found that stand condition in 2010, based on the complete data set of 175 sites, was best predicted by neural networks when Landsat imagery was used. However, random forests provided more accurate predictions than neural networks when Rapideye imagery was used. This demonstrates that the modelling approach that provides the most accurate predictions of stand condition depends on both the survey data set and the remotely-sensed data sets included in the modelling. Therefore, different modelling approaches should be explored whenever new data sets are included. Important predictors of stand condition from the new remotely-sensed data sets included variables derived from Rapideye, PALSAR and LiDAR, and the new tree presence and Forest Type probabilities. As found previously, Rapideye provides more useful spectral data for predicting stand condition than Landsat (Cunningham et al., 2013b).
LiDAR and PALSAR data are likely to help differentiate between good and degraded condition stands by detecting the structural complexity of good condition stands compared with stands with little canopy and/or few branches. Similarly, the probabilities of trees and of non-forest would be useful estimates of the amount of canopy and, therefore, of the stand condition of a location. The most important LiDAR variable in the neural network (Model 15) was cover below 0.5 m, suggesting that stand condition is associated with differences in understorey structure. This is consistent with the observed increase in plant richness of the understorey with decreasing stand condition of river red gum forests (Horner et al., 2012).
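The LiDAR cover variables refer to seven strata between 0 and 32.5 m, and the text above reads "LiDAR cover 0.5 m" as cover below 0.5 m. The sketch below assumes that each variable is the percentage of all returns in a grid cell that fall in the stratum ending at the named height. Operational cover metrics are often based on first returns, or are normalised by the returns that reach each stratum, so this simplified definition and the function name are illustrative assumptions.

```python
from bisect import bisect_left

# Upper bounds (m) of the seven strata, matching the LiDAR cover variables.
STRATA_TOPS_M = [0.5, 1.5, 2.5, 4.5, 8.5, 16.5, 32.5]


def strata_cover(heights_m):
    """Percentage of returns in each height stratum for one grid cell.

    heights_m: heights above ground (m) of the LiDAR returns in the cell.
    Stratum k spans (STRATA_TOPS_M[k-1], STRATA_TOPS_M[k]]; the first spans
    0 to 0.5 m. Percentages are of all returns, so returns below 0 m or
    above 32.5 m count in the denominator only.
    """
    counts = [0] * len(STRATA_TOPS_M)
    for h in heights_m:
        if 0.0 <= h <= STRATA_TOPS_M[-1]:
            counts[bisect_left(STRATA_TOPS_M, h)] += 1
    total = len(heights_m)
    return {
        f"LiDAR cover {top} m": (100.0 * n / total if total else 0.0)
        for top, n in zip(STRATA_TOPS_M, counts)
    }


# Hypothetical returns: two near the ground, one shrub, three in the canopy.
cover = strata_cover([0.1, 0.3, 1.0, 10.0, 20.0, 24.0])
print(cover["LiDAR cover 0.5 m"])
```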
Conclusions
The modelling reported here for two focal floodplains of the Murray River suggests that the prediction of stand condition across the Murray River could be improved by including new remotely-sensed data sets. The Stand Condition Tool could be improved by:
1. building new Forest Type extent maps. Currently, the Tool uses extents for the Forest Types that are based on polygons developed from aerial photography or on predicted relationships with environmental variables. These Forest Type extents could be improved by modelling accurate location data against Rapideye, the historical Landsat composite, SPOTMap and PALSAR imagery;
2. building a new tree probability layer, like Pr(MurrayTree_new), for the whole Murray River floodplain using the historical Landsat composite and PALSAR data set;
3. rebuilding the models of stand condition that underlie the Tool using Rapideye, PALSAR, LiDAR and a new tree probability layer; and
4. using several modelling approaches (e.g. random forests, neural networks) when building any new models, to achieve the most accurate predictions.
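The reason for item 4 can be illustrated with test-set results already reported. With the new data sets, the approach that scored best on held-out data differed between tasks: random forests for Forest Type (Table 27) and neural networks for stand condition (Table 28). The sketch below records those scores and selects the higher-scoring approach for each task. It is a minimal illustration; a fuller comparison would also consider how much test scores vary between data splits, which a single split does not capture.

```python
# Test-set scores transcribed from Table 27 (accuracy, %) and Table 28 (R2).
TEST_SCORES = {
    "Forest Type, Chowilla and GKP, new data sets": {
        "Neural network": 56.5, "Random forest": 58.9},
    "Forest Type, GKP, new data sets incl. LiDAR": {
        "Neural network": 63.6, "Random forest": 64.9},
    "Stand Condition, Chowilla and GKP, new data sets": {
        "Neural network": 0.81, "Random forest": 0.59},
    "Stand Condition, GKP, new data sets incl. LiDAR": {
        "Neural network": 0.72, "Random forest": 0.52},
}


def best_approach(scores):
    """Return the (approach, score) pair with the highest held-out score."""
    name = max(scores, key=scores.get)
    return name, scores[name]


for task, scores in TEST_SCORES.items():
    name, score = best_approach(scores)
    print(f"{task}: {name} ({score})")
```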
Acknowledgements
This project was funded by the Murray-Darling Basin Authority as part of The Living Murray program. We appreciate all the continuing support and discussions from the Environmental Monitoring Team at the MDBA (Greg Raisin, Stuart Little, David Hohnberg and Anne Stensletten). We thank Anisul Islam for organising the ORGE Panel request for remote sensing imagery. We also appreciate the continued support of the Icon Site Management agencies involved with the project: Forests NSW, Goulburn-Broken CMA, Mallee CMA, North Central CMA, South Australian Department of Environment, Water and Natural Resources, and Victorian Department of Environment and Primary Industries. We thank AAM Group, particularly Ken Gillan, for supplying the Rapideye mosaic across the Murray River floodplain for 2009 and 2010, and Astrium Services for supplying the SPOTMaps for Chowilla Floodplain and Gunbower-Koondrook-Perricoota Forests.
References
Boochs, F., Kupfer, G., Dockter, K. & Kuhbauch, W. (1990) Shape of the red edge as vitality indicator for plants. International Journal of Remote Sensing, 11, 1741-1753.
Cunningham, S.C., Griffioen, P. & White, M. (2012) Potential For Additional Remotely Sensed Data To Improve Mapping Of Stand Condition Across The Living Murray Icon Sites. A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD1114. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Read, J., Baker, P.J. & Mac Nally, R. (2007) Quantitative assessment of stand condition and its relationship to physiological stress in stands of Eucalyptus camaldulensis (Myrtaceae) in southeastern Australia. Australian Journal of Botany, 55, 692-699.
Cunningham, S.C., Mac Nally, R., Griffioen, P. & White, M. (2009a) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. A Milestone Report to the Murray-Darling Basin Authority as part of Contract MD1114. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Mac Nally, R., Griffioen, P. & White, M. (2009b) Mapping the Condition of River Red Gum and Black Box Stands in The Living Murray Icon Sites. Stand Condition Report 2009 (with modelled results for 2003 and 2008). Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2011) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Stand Condition Report 2010. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013a) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Stand Condition Report 2012. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013b) Mapping the Condition of River Red Gum (Eucalyptus camaldulensis Dehnh.) and Black Box (Eucalyptus largiflorens F.Muell.) Stands in The Living Murray Icon Sites. Comparison of the predictive power of Landsat and Rapideye imagery, and validation of future predictions based on imagery only. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Griffioen, P., White, M. & Mac Nally, R. (2013c) A Tool for Mapping Stand Condition across the Floodplain Forests of the Living Murray Icon Sites. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., White, M., Griffioen, P., Newell, G. & Mac Nally, R. (2013d) Mapping Floodplain Vegetation Types across the Murray-Darling Basin Using Remote Sensing. Murray-Darling Basin Authority, Canberra.
Cunningham, S.C., Mac Nally, R., Read, J., Baker, P.J., White, M., Thomson, J.R. & Griffioen, P. (2009c) A robust technique for mapping vegetation condition across a major river system. Ecosystems, 12, 207-219.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S. & Zimmermann, N.E. (2006) Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29, 129-151.
Friedman, J.H. (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29, 1189-1232.
Horner, G.J., Cunningham, S.C., Thomson, J.R., Baker, P.J. & Mac Nally, R. (2012) Forest structure, flooding and grazing predict understorey composition of floodplain forests in southeastern Australia. Forest Ecology and Management, 286, 148-158.
Kocev, D., Vens, C., Struyf, J. & Džeroski, S. (2007) Ensembles of multi-objective decision trees. Machine Learning: ECML 2007. Proceedings of the 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007 (ed. by J. Kok, J. Koronacki, R. De Mántaras, S. Matwin, D. Mladenić and A. Skowron), pp. 624-631. Springer, Berlin.
Margules & Partners (1990) Riparian Vegetation of the River Murray. Report prepared by Margules and Partners Pty. Ltd., P. & J. Smith Ecological Consultants and Department of Conservation Forests and Lands. Murray-Darling Basin Commission, Canberra.
MDBC (2002) The Living Murray: a Discussion Paper on Restoring the Health of the River Murray. p. 94. Murray-Darling Basin Commission, Canberra.
Özesmi, S.L., Tan, C.O. & Özesmi, U. (2006) Methodological issues in building, training and testing artificial neural networks in ecological applications. Ecological Modelling, 195, 83-93.
Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986) Learning representations by back-propagating errors. Nature, 323, 533-536.
StatSoft (2011) Statistica Version 10. StatSoft, Inc. www.statsoft.com.
Stein, J.L. (2006) A Continental Landscape Framework for Systematic Conservation Planning for Australian Rivers and Streams. Available at http://hdl.handle.net/1885/49406. Australian National University, Canberra.
ter Steege, H. (1996) WINPHOT 5.0: a programme to analyze vegetation indices, light and light quality from hemispherical photographs. Tropenbos Guyana Programme, Report 95-2, Tropenbos, Guyana.