
Using Tree Clusters to Derive Forest Properties from Small Footprint Lidar Data

Zachary J. Bortolot

Abstract

This paper describes a new object-oriented small footprint lidar algorithm in which the objects of interest are tree clusters. The algorithm first thresholds the lidar canopy height model (CHM) at two levels to produce tree cluster grids. Next, two metrics are calculated based on these grids. The metric values are used in a multiple regression equation to predict the forest parameter of interest. To set the two thresholds, an optimization algorithm is used in conjunction with training data consisting of subsets of the CHM in which the forest parameters are known through ground measurements. A test of the algorithm was performed using ground and lidar data from a non-intensively managed loblolly pine (Pinus taeda) plantation in Virginia. The accuracies of the lidar-based predictions of density (0.01 ≤ R² ≤ 0.80; 126 trees/ha ≤ RMSE ≤ 8,173 trees/ha) and biomass (0.04 ≤ R² ≤ 0.62; 12.4 t/ha ≤ RMSE ≤ 316.5 t/ha) depended on the combination of metrics used, whether trees with a diameter at breast height < 10 cm were excluded from the analysis, and the number of plots used for training and testing. However, the fit between the ground measurements and tree cluster-based predictions generally exceeded the fit between ground measurements and the output from an individual tree-based algorithm tested using the same data (100 percent of comparable cases when density was predicted, 85 percent of comparable cases when biomass was predicted, based on the coefficient of determination and RMSE).

Introduction

Object-oriented remote sensing processing techniques are designed to identify groups of adjacent pixels that represent a physical entity (object) that is of interest to the user (Geneletti and Gorte, 2003). Compared to techniques that do not attempt to identify groups of adjacent pixels, object-oriented approaches have both advantages and disadvantages. The first principal benefit is that the objects identified by the algorithm often correspond better to objects of interest to the user than individual pixels do because many objects cover more than one pixel (Blaschke et al., 2002). This makes it easier to incorporate the data into a GIS (Blaschke et al., 2002; Geneletti and Gorte, 2003). A second advantage is that information can be extracted from the objects that cannot be extracted from individual pixels. This information includes spatial attributes of the object such as area and compactness (Jensen, 2005, pp. 169–172) and sensor returns for just the objects of interest (e.g., Bortolot and Wynne, 2005). Assuming that the object is properly identified, the spatial attributes of the object should be relatively immune to minor sensor, site, or temporal differences. Disadvantages of object-oriented techniques include that they are often computationally expensive and that the image pixels must be smaller than the object of interest.

In forestry, object-oriented approaches have been used to process medium resolution optical satellite imagery (e.g., Hill, 1999; Dorren et al., 2003), high spatial resolution passive optical data (e.g., Gougeon, 1995; Wulder et al., 2000; Pekkarinen, 2002; Coops et al., 2004; Hyvönen et al., 2005), radar (e.g., Fosgate et al., 1997; Grover et al., 1999) and lidar (e.g., Hyyppä and Inkinen, 1999; Popescu et al., 2003; Coops et al., 2004). For these prior studies, the objects of interest have either been stands of trees or individual trees.

In recent years, considerable interest has developed in the use of small footprint lidar data for estimating forest properties, due in large part to the success enjoyed by past studies (e.g., Persson et al., 2002; Lim et al., 2003; Popescu et al., 2004). To process the lidar data, two general approaches have been used. The first is to derive distributional metrics from either the raw returns or a canopy height model (CHM), and then create a regression equation to relate these values to the stand attribute of interest. The second approach is to use an object-oriented algorithm. Typically, the object of interest has been individual trees (e.g., Hyyppä and Inkinen, 1999; Popescu et al., 2003; Bortolot and Wynne, 2005), but in a few studies (e.g., Diedershagen et al., 2003; van Aardt and Wynne, 2003) the use of stands of trees as the object has been investigated.

Past studies using individual trees as objects have shown that volume and biomass (Hyyppä and Inkinen, 1999; Popescu et al., 2004; Takahashi et al., 2005), crown width (Popescu et al., 2003), and height (Hyyppä and Inkinen, 1999; Popescu et al., 2002; McCombs et al., 2003) can be predicted accurately. Density has been somewhat harder to measure accurately. McCombs et al.
(2003) found that their algorithm missed 13 percent of trees in a low-density pine plantation and 35 percent of the trees in a high-density plantation. Persson et al. (2002) found that 29 percent of the trees in their study area were missed by their algorithm, and Takahashi et al. (2005) found that 14 to 31 percent of the trees were missed or misidentified. Studies that have examined the correlations between actual and predicted tree densities have often found low coefficients of determination; Popescu et al. (2004) found an R² value of 0.26, and Bortolot and Wynne (2005) found R² values between 0 and 0.61. These results suggest that in some situations, it is difficult to locate individual trees reliably, and to identify meaningful stand density differences.

In order to address the problem of locating individual trees and identifying stand density differences, a new object is proposed: the tree cluster. A tree cluster is a group of connected tree crowns, and examples of tree clusters can be seen in Figure 1. Although to the author’s knowledge tree clusters have not been used as objects in any prior remote sensing studies, tree clusters are closely related to tree cover pattern (Ministry of Sustainable Resource Management, 2002), an attribute some photointerpreters record when manually interpreting aerial photographs in order to assess a stand’s suitability as wildlife habitat. Using tree clusters as an object allows the use of object-based properties for predicting forest attributes of interest, yet does not require the program to find the locations and boundaries of individual trees, which is often a computationally expensive and error-prone task. It may also require a lower point density than that required for an individual tree-based approach, thereby reducing data costs and providing an alternative to statistical area-based techniques (e.g., Næsset, 1997), which are currently seen as the best option for low point density data. Not identifying individual trees is not problematic in many cases, since the majority of forest managers manage at the stand rather than the individual tree level.

This paper has three major objectives: (a) to develop an algorithm based on tree clusters; (b) to assess whether the algorithm can be used to predict forest density and biomass accurately and to compare the results to those obtained using other small footprint lidar processing techniques; and (c) to evaluate which combination of cluster-based metrics works best for predicting density and biomass.

Figure 1. (a) A lidar canopy height model (CHM) corresponding to plot 1. It has a 0.5 m spatial resolution. Brighter tones in the CHM correspond to areas of the grid with a taller canopy. (b) A grid created by applying a threshold to the CHM. The white areas are pixels that exceed the threshold and represent tree clusters.

Photogrammetric Engineering & Remote Sensing, Vol. 72, No. 12, December 2006, pp. 1389–1397.
Institute for Regional Analysis and Public Policy, Morehead State University, 100 Lloyd Cassity Building, Morehead, KY 40351 ([email protected]).
0099-1112/06/7212–1389/$3.00/0 © 2006 American Society for Photogrammetry and Remote Sensing

Algorithm Description

Algorithm Principles

Although there are multiple means of finding tree clusters, one of the simplest is to create a CHM and consider all pixels in the CHM that are above a threshold to be part of a tree cluster, and those that are below the threshold to not be part of a tree cluster. This procedure is computationally efficient and has been shown to separate treed from non-treed areas effectively (Næsset, 1997; Lim and Treitz, 2004). Figure 1 shows a lidar CHM after a threshold has been applied. Once the tree clusters have been identified, a number of metrics can be calculated that relate to forest properties of interest and are independent of the areal extent of the CHM being analyzed. For this paper, four metrics are used: (a) the percentage of pixels in the CHM that are part of a tree cluster; (b) the percentage of the pixels in the tree clusters that are core pixels; (c) the mean canopy height of the cluster pixels; and (d) the standard deviation of the canopy heights within the clusters.

The percentage of pixels in the grid that are part of a tree cluster (pct_thresh) is similar to the canopy cover density measurement that is commonly used in lidar research (e.g., Nilsson, 1996; Næsset, 1997) and to the crown closure measurement that is commonly used in aerial photography (Spurr, 1960, pp. 367–371). Past studies have shown that the crown cover density and crown closure can be valuable predictors of density, basal area, and volume. However, the nature of these relationships can be complex in some forest types (Spurr, 1960, pp. 367–371 and pp. 386–388; Nilsson, 1996; Næsset, 1997).

Figure 2. This figure illustrates the idea of a core pixel by showing a core pixel (black) and edge pixels (gray). In this paper, for a pixel to be considered to be a core pixel, all eight of its neighbors must be part of a tree cluster.

The percentage of tree cluster pixels that are core pixels (pct_core) was calculated using an 8-neighbor approach (Figure 2). The value of this metric can be interpreted in two ways depending on whether the tree clusters primarily correspond to single trees or groups of many trees. If the tree clusters primarily consist of single trees, this metric can be considered to be a surrogate for average crown width. The basis for this is that for a single tree, as the crown diameter increases, the percentage of core pixels also increases (Figure 3). If we assume that crown boundaries are perfectly circular and ignore the effects of pixelization, this relationship can be expressed mathematically as:

$$p = 100 \left[ 1 - \frac{4b}{d} + \frac{4b^{2}}{d^{2}} \right] \qquad (1)$$

where p is the percentage of the area of the cluster that is a core area, b is the width of the border area of the cluster, and d is the crown diameter. Crown diameter is an important variable because it is correlated to stem diameter, volume, and biomass (Spurr, 1960, pp. 379–383; Popescu et al., 2003).

Figure 3. If the tree clusters generally correspond to single trees, the percentage of the cluster pixels consisting of core pixels will increase as the crown width increases. This figure illustrates this concept using trees having a 4 m and an 8 m crown width. For simplicity, pixelization effects have been ignored.

If each cluster consists of multiple trees together, the percentage of tree cluster pixels that are core pixels would reflect the density of the stems. This is because if the stems are closer together (denser) one would expect fewer gaps in the canopy, thereby reducing the number of pixels that are on the edge of a tree cluster and increasing the value of this metric (Figure 4).

The mean canopy height (avg_height) was selected because canopy height measurements either alone or in combination with other stand attributes are good predictors of volume and biomass (Spurr, 1960, pp. 385–386; Nelson et al., 1988; Popescu et al., 2003; Lim and Treitz, 2004). Additionally, height can provide information on site quality and tree age (Clutter et al., 1983, pp. 33–40), which both influence properties such as density, volume, and biomass.

Finally, the standard deviation of the heights of trees (std_height) is a measure of vertical complexity. Vertical complexity in turn can provide information on stand age, species, and the degree and age of past disturbances (Ministry of Sustainable Resource Management, 2002). In small footprint lidar studies, it has been found to correlate with average stem diameter (Popescu et al., 2004) and volume (Takahashi et al., 2005).

Implementation
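The four cluster metrics described under Algorithm Principles can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function names, the list-of-lists grid, the treatment of pixels outside the grid as non-cluster, and the use of the population standard deviation are all assumptions made here. The last function evaluates Equation 1 for the 4 m and 8 m crowns of Figure 3, assuming a border width b of 0.5 m (one pixel), which gives about 56 percent and 77 percent core area respectively.

```python
from statistics import mean, pstdev


def threshold_grid(chm, t):
    """Return a boolean grid: True where the canopy height exceeds threshold t."""
    return [[h > t for h in row] for row in chm]


def is_core(grid, r, c):
    """A cluster pixel is a core pixel when all eight neighbors are cluster pixels.

    Assumption: pixels outside the grid count as non-cluster, so cluster
    pixels on the grid border are always edge pixels.
    """
    rows, cols = len(grid), len(grid[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not grid[rr][cc]:
                return False
    return True


def cluster_metrics(chm, t):
    """Compute pct_thresh, pct_core, avg_height and std_height for threshold t."""
    grid = threshold_grid(chm, t)
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    total = sum(len(row) for row in chm)
    if not cells:
        return {"pct_thresh": 0.0, "pct_core": 0.0,
                "avg_height": 0.0, "std_height": 0.0}
    heights = [chm[r][c] for r, c in cells]
    n_core = sum(is_core(grid, r, c) for r, c in cells)
    return {
        "pct_thresh": 100.0 * len(cells) / total,
        "pct_core": 100.0 * n_core / len(cells),
        "avg_height": mean(heights),
        "std_height": pstdev(heights),  # assumption: population standard deviation
    }


def circular_core_pct(b, d):
    """Equation 1: percentage of a circular crown of diameter d that is core area."""
    return 100.0 * (1.0 - 4.0 * b / d + 4.0 * b ** 2 / d ** 2)


# Synthetic 5 x 5 CHM (heights in m): a 3 x 3 block of tall canopy in low cover.
chm = [
    [1, 1, 1, 1, 1],
    [1, 9, 9, 9, 1],
    [1, 9, 12, 9, 1],
    [1, 9, 9, 9, 1],
    [1, 1, 1, 1, 1],
]
print(cluster_metrics(chm, t=5.0))
print(circular_core_pct(0.5, 4.0), circular_core_pct(0.5, 8.0))
```

In the toy grid only the central pixel has eight cluster neighbors, so one of the nine cluster pixels is a core pixel.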

In this implementation, two different thresholds (t1 and t2) are used to create two tree cluster grids (g1 and g2). These grids are then processed using two of the metrics described in the previous section in different combinations. Although it would be possible to manually set the thresholds used to find the tree clusters, this approach is not used because it is likely that the thresholds producing the best result vary from site to site, and that different metrics work better with different thresholds. Instead, an optimal set of thresholds is found based on training data consisting of ground plot measurements and the sections of the lidar CHM corresponding to the plots.

To find the optimal set of parameters, a grid search is performed (Figure 5). The first step of the grid search is to find the thresholds at which 90% (tmin) and 10% (tmax) of the pixels in the training CHMs exceed the threshold. Next, the set of all combinations of t1 and t2 for which (tmin ≤ t1 ≤ tmax) and (tmin ≤ t2 ≤ tmax) is created at a precision of 0.1 m. Each member of the set is then used to process the data and the results are evaluated. For each combination of t1 and t2, the thresholds are applied to the training CHMs to create tree cluster grids g1 and g2, and the values of two metrics (m1 and m2) are then calculated using these grids (Figure 5, Step 1). Metric m1 is calculated using g1, and metric m2 is calculated using g2. Multiple linear regression is then used to develop an equation relating the metric values to the forest parameter of interest (y) (Figure 5, Step 2):

$$y = b_0 + b_1 m_1 + b_2 m_2 \qquad (2)$$
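As an illustration of Equation 2, the sketch below fits the two-metric regression by ordinary least squares using the closed-form solution for two predictors. The function names and the training values are hypothetical; the data are synthetic and constructed to follow y = 200 + 10 m1 + 50 m2 exactly, so the fit should recover those coefficients.

```python
from statistics import mean


def fit_two_metric_regression(m1, m2, y):
    """Least-squares fit of y = b0 + b1*m1 + b2*m2 (Equation 2)."""
    m1_bar, m2_bar, y_bar = mean(m1), mean(m2), mean(y)
    d1 = [v - m1_bar for v in m1]
    d2 = [v - m2_bar for v in m2]
    dy = [v - y_bar for v in y]
    # Centered sums of squares and cross-products.
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("m1 and m2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = y_bar - b1 * m1_bar - b2 * m2_bar
    return b0, b1, b2


def predict(coeffs, m1, m2):
    """Apply Equation 2 to one pair of metric values."""
    b0, b1, b2 = coeffs
    return b0 + b1 * m1 + b2 * m2


# Synthetic training data for five hypothetical plots.
m1 = [40.0, 55.0, 60.0, 70.0, 80.0]   # e.g. a percentage metric from grid g1
m2 = [8.0, 12.0, 9.0, 15.0, 11.0]     # e.g. a height metric from grid g2
y = [1000.0, 1350.0, 1250.0, 1650.0, 1550.0]

coeffs = fit_two_metric_regression(m1, m2, y)
print(coeffs)
print(predict(coeffs, 65.0, 10.0))
```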

Figure 4. If the tree clusters generally correspond to a large number of trees, the percentage of the cluster pixels consisting of core pixels will increase as the density increases. This figure illustrates this concept for (a) hypothetical low (percent core = 6%), and (b) high density stands (percent core = 20%). In this diagram, core pixels are shown in black, and edge pixels are shown in gray.

In order to avoid multicollinearity, the variance inflation factor (VIF) of Equation 2 is then calculated (Figure 5, Step 3). If the VIF is greater than 10, suggesting that multicollinearity is a problem (Neter et al., 1996, pp. 386–388), the combination of t1 and t2 is considered non-optimal and is not considered for further analysis (Figure 5, Step 4). If the VIF is less than 10, Equation 2 is used to predict the parameter of interest for all training plots, and a score (s) reflecting the accuracy of the predictions is calculated (Figure 5, Step 5) using the equation:

$$s = \sum_{n=1}^{N} \left( y_{ng} - y_{np} \right)^{2} \qquad (3)$$

where N is the total number of training plots, y_ng is the ground measurement of the parameter of interest for the nth plot, and y_np is the predicted value of the parameter of interest for the nth plot. This process is repeated until all combinations of thresholds have been evaluated (Figure 5, Steps 6 and 7). Each new set of thresholds yields a new version of Equation 2, which is used to predict the parameter of interest for the training plots, and a new score is computed using Equation 3. The set of t1 and t2 producing the lowest (i.e., best) score is considered to be the optimal pair of thresholds.

After optimization, CHMs in which the parameter of interest is not known are processed. This processing uses the optimal pair of thresholds to calculate grids g1 and g2, and metrics m1 and m2 are derived (Figure 5, Step 8). The value of the parameter of interest is then calculated using the prediction equation associated with the thresholds having the lowest score (Figure 5, Steps 9 and 10).
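The training loop described above (Figure 5, Steps 1 through 7) can be sketched as follows for a single pair of metrics. This is a simplified illustration under stated assumptions, not the author's code: the function names are hypothetical, tmin and tmax are approximated by the 10th and 90th percentiles of the pooled training heights, both metrics are pct_thresh, and the training CHMs and ground values are synthetic. For two predictors the VIF reduces to 1 / (1 - r²), where r is the correlation between m1 and m2.

```python
from itertools import product
from statistics import mean


def pct_thresh(chm, t):
    """Percentage of CHM pixels whose height exceeds threshold t."""
    cells = [h for row in chm for h in row]
    return 100.0 * sum(h > t for h in cells) / len(cells)


def candidate_thresholds(chms, step=0.1):
    """Thresholds from tmin (90% of pixels exceed it) to tmax (10% exceed it)."""
    heights = sorted(h for chm in chms for row in chm for h in row)
    t_min = heights[int(0.10 * (len(heights) - 1))]
    t_max = heights[int(0.90 * (len(heights) - 1))]
    n_steps = int(round((t_max - t_min) / step))
    return [round(t_min + i * step, 1) for i in range(n_steps + 1)]


def fit(m1, m2, y):
    """Return (coeffs, vif) for y = b0 + b1*m1 + b2*m2, or None if degenerate."""
    a, b, c = mean(m1), mean(m2), mean(y)
    s11 = sum((u - a) ** 2 for u in m1)
    s22 = sum((v - b) ** 2 for v in m2)
    s12 = sum((u - a) * (v - b) for u, v in zip(m1, m2))
    s1y = sum((u - a) * (w - c) for u, w in zip(m1, y))
    s2y = sum((v - b) * (w - c) for v, w in zip(m2, y))
    det = s11 * s22 - s12 ** 2
    if s11 == 0 or s22 == 0 or det <= 1e-9 * s11 * s22:
        return None  # constant or (near-)perfectly collinear metrics
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    vif = s11 * s22 / det  # equals 1 / (1 - r^2) for two predictors
    return (c - b1 * a - b2 * b, b1, b2), vif


def grid_search(chms, y, metric1, metric2, max_vif=10.0):
    """Return (score, t1, t2, coeffs) with the lowest Equation 3 score."""
    best = None
    ts = candidate_thresholds(chms)
    for t1, t2 in product(ts, ts):
        m1 = [metric1(chm, t1) for chm in chms]   # Step 1
        m2 = [metric2(chm, t2) for chm in chms]
        result = fit(m1, m2, y)                   # Steps 2 and 3
        if result is None or result[1] > max_vif:
            continue                              # Step 4: reject this pair
        (b0, b1, b2), _ = result
        score = sum((yg - (b0 + b1 * u + b2 * v)) ** 2   # Step 5, Equation 3
                    for yg, u, v in zip(y, m1, m2))
        if best is None or score < best[0]:
            best = (score, t1, t2, (b0, b1, b2))
    return best


# Synthetic one-row "CHMs" (heights in m) and synthetic ground values.
chms = [[[2, 2, 7, 12]], [[2, 7, 7, 7]], [[7, 12, 12, 12]],
        [[2, 2, 2, 12]], [[2, 7, 12, 12]]]
y = [1200.0, 1250.0, 2100.0, 950.0, 1650.0]

best = grid_search(chms, y, pct_thresh, pct_thresh)
print(best)
```

The tuple returned by grid_search carries the coefficients of Equation 2 together with the chosen thresholds, which is what Steps 8 through 10 need in order to process a CHM whose ground value is unknown.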


Figure 5. A flow chart illustrating the steps followed by the tree cluster-based algorithm described in the text. The numbers in parentheses refer to steps described in the text.

Study Site and Data

To evaluate this technique, ground and lidar data were collected in a 4.6 km² section of the Appomattox-Buckingham State Forest in central Virginia (centered at 37.4193°N, 78.6757°W). The forest is located in the Piedmont physiographic province, and contains both natural hardwood stands and pine plantations. The stands that were examined consist of loblolly pine (Pinus taeda) plantations ranging from 11 to 16 years of age at the time the lidar data were collected. These stands were selected for two reasons. First, they correspond to the stands used by Bortolot and Wynne (2005) to test their individual tree-based algorithm, thereby facilitating a comparison of the results obtained using tree clusters and individual trees as objects. Second, loblolly pine plantations are of high commercial interest, and the age range that was examined is likely to be representative of trees grown both for pulp and saw timber. In addition to loblolly pine, some of the selected stands contain large numbers of subdominant Virginia pine (Pinus virginiana) volunteers (i.e., trees that regenerated naturally from seed).

Within the loblolly pine stands, 25 plots were established in 2002 to 2003, each having a fixed 15 m radius. To determine the plot locations, GPS data were collected at the center of each plot using a Corvallis Microtechnology (CMT) March II GPS unit (Corvallis Microtechnology, Inc., 2005). These data were then post-processed using the PC-GPS software package (Corvallis Microtechnology, Inc., 2005) in conjunction with data from the closest National Geodetic Survey Continuously Operating Reference Station (CORS) for which data were available. According to the manufacturer, post-processed data have a horizontal accuracy of 1.5 m to 2.5 m (Corvallis Microtechnology, Inc., 2005), although this error is likely to be larger under a tree canopy due to multipath errors. Although 15 m radius plots are considerably larger than the plots that are typically used in forest inventory, large plots were considered desirable in order to reduce edge effects and problems of misregistration between the GPS-derived ground plot locations and the lidar data.

Within each plot, the total number of trees with diameter at breast height (DBH) ≥ 7 cm was determined. This cutoff was selected because it was observed that the majority of the trees with DBH < 7 cm are Virginia pine volunteers. These trees are of little commercial interest and are not overstory trees, so they are unlikely to be detectable in the lidar CHM. The DBHs for a subset of these trees were then measured along with information on whether the trees were dominant or subdominant. This information was used to estimate the total number of trees with DBH ≥ 10 cm and the plot biomass. The number of trees in each plot with DBH ≥ 10 cm was estimated by calculating the fraction of counted trees with DBH ≥ 10 cm using the subset data, and then multiplying this value by the total number of trees counted in the plot. Plot biomass was estimated by predicting the biomass of each tree in the subset using the biomass prediction equations developed by Naidu et al. (1998), calculating the average biomass per tree, and then multiplying this value by the total number of trees in the plot. The biomass equations developed by Naidu et al. (1998) were selected because they were derived for loblolly pine in the Piedmont physiographic province, included the age range used in this study, and had a high coefficient of determination between actual and predicted biomass (0.99 for dominant trees, 0.98 for suppressed trees). Ground data for the 25 plots are given in Table 1.

TABLE 1. ATTRIBUTES OF THE STANDS USED IN ALGORITHM TESTING. THE SUBSET ROLE REFERS TO WHETHER THE PLOT WAS USED FOR ALGORITHM TRAINING (TRAIN) OR TESTING (TEST) IN RUNS WHERE ONLY A SUBSET OF THE DATA WERE USED FOR TRAINING AND TESTING. 95 PERCENT CONFIDENCE INTERVALS HAVE BEEN GIVEN FOR THE BIOMASS ESTIMATES

Plot  Subset  Age     Density,     Estimated density,  Aboveground biomass,  Aboveground biomass,
      Role    (2003)  DBH ≥ 7 cm   DBH ≥ 10 cm         DBH ≥ 7 cm            DBH ≥ 10 cm
                      (trees/ha)   (trees/ha)          (t/ha)                (t/ha)
1     Train   13      1457         1132                69.7 ± 9.2            64.8 ± 7.5
2     Test    15      1302         1146                85.2 ± 7.6            82.9 ± 6.5
3     Train   16      1684         1528                137.9 ± 15.1          136.0 ± 12.3
4     Test    16      1358         1259                77.4 ± 7.5            77.7 ± 5.4
5     Test    11      1273         1188                76.0 ± 5.4            74.4 ± 4.8
6     Train   13      1740         1302                81.9 ± 8.5            75.1 ± 6.9
7     Test    13      1471         1401                88.4 ± 6.7            87.3 ± 6.1
8     Test    13      2037         1882                95.8 ± 11.5           92.9 ± 9.8
9     Test    13      1500         1118                67.1 ± 5.5            61.1 ± 4.0
10    Test    13      2009         1966                103.4 ± 8.4           103.0 ± 7.9
11    Train   11      1952         1839                94.4 ± 10.3           92.1 ± 9.8
12    Test    13      1924         1896                87.6 ± 5.2            87.1 ± 5.0
13    Test    13      1924         1853                105.6 ± 7.6           105.4 ± 6.9
14    Test    11      1401         1302                87.9 ± 5.8            86.4 ± 5.0
15    Test    16      2264         1627                113.7 ± 16.0          104.8 ± 11.9
16    Train   16      1839         1415                125.3 ± 14.9          118.7 ± 10.9
17    Test    17      1712         1542                115.3 ± 10.8          112.6 ± 9.3
18    Test    12      1740         1330                75.4 ± 8.9            70.2 ± 5.8
19    Train   15      1924         1768                117.0 ± 10.9          114.7 ± 9.6
20    Test    15      1952         1825                117.0 ± 11.3          115.2 ± 10.3
21    Train   13      1401         1330                97.2 ± 8.2            96.2 ± 7.6
22    Train   17      1457         1316                114.5 ± 9.9           112.5 ± 8.3
23    Test    17      1485         1415                68.3 ± 5.0            67.2 ± 4.7
24    Test    11      2179         1853                118.3 ± 12.2          113.5 ± 10.0
25    Test    11      1556         1302                76.5 ± 6.1            73.4 ± 4.4
Mean  -       14      1698         1500                95.9                  92.9

The small footprint lidar data used to test the tree cluster-based algorithm were collected with the Digital Airborne Topographic Imaging System II (DATIS II) operated by Spectrum Mapping, LLC (Easton, Maryland). These data were flown in 2002, and have a 1 m average posting interval and up to five returns per pulse. The delivered products included a dataset consisting of returns Spectrum Mapping, LLC had determined to be ground returns. Small CHMs centered at the plot locations were produced by subtracting a grid produced from the ground returns from a grid produced from the first returns. Each CHM has a spatial resolution of 0.5 m, and is 42 m × 42 m in size in order to provide a buffer around the ground plots. The ground and first return grids were produced using the linear kriging interpolation procedure in Surfer (Golden Software, 1999), and were created independently of one another (i.e., they were not clipped from a larger grid).

Testing Procedure

To test the algorithm, all combinations of metrics were used to predict plot density and biomass for trees with DBH ≥ 7 cm and for trees with DBH ≥ 10 cm. For each calculation, the thresholds were applied to the entire 42 m × 42 m CHM, but the metrics were calculated using only the areas of the CHMs corresponding to the 15 m radius ground plots. Algorithm evaluation was performed in two ways: using all plots for both training and testing, and using eight randomly selected plots for training and the remaining 17 plots for testing (see Table 1).
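Two of the plot-level computations described above lend themselves to a short sketch: expanding plot totals from a 15 m radius plot to per-hectare values, and restricting a 42 m × 42 m, 0.5 m resolution CHM to the circular plot. The function names are hypothetical, the plot center is assumed to coincide with the CHM center, and the tree count and DBH values in the example are invented for illustration (a count of 103 trees was chosen because it expands to roughly 1,457 trees/ha, the density listed for plot 1 in Table 1).

```python
import math


def plot_area_ha(radius_m=15.0):
    """Area of a fixed-radius circular plot in hectares (1 ha = 10,000 m^2)."""
    return math.pi * radius_m ** 2 / 10_000.0


def per_hectare(plot_total, radius_m=15.0):
    """Expand a plot-level total (tree count or biomass) to a per-hectare value."""
    return plot_total / plot_area_ha(radius_m)


def estimate_count_above(total_count, subset_dbh_cm, cutoff_cm=10.0):
    """Estimate trees with DBH >= cutoff: subset fraction times total count."""
    fraction = sum(d >= cutoff_cm for d in subset_dbh_cm) / len(subset_dbh_cm)
    return fraction * total_count


def circular_mask(size_m=42.0, cell_m=0.5, radius_m=15.0):
    """Boolean grid marking CHM cells whose centers fall inside the plot circle.

    Assumption: the plot center coincides with the center of the CHM.
    """
    n = int(round(size_m / cell_m))
    half = size_m / 2.0
    mask = []
    for r in range(n):
        y = (r + 0.5) * cell_m - half
        mask.append([math.hypot((c + 0.5) * cell_m - half, y) <= radius_m
                     for c in range(n)])
    return mask


# Hypothetical field tally: 103 trees counted, DBH measured on a subset of 8.
print(round(per_hectare(103)))
print(estimate_count_above(103, [8, 9, 11, 12, 14, 15, 16, 10]))

mask = circular_mask()
print(len(mask), sum(sum(row) for row in mask))
```

Metrics would then be computed only over cells where the mask is True, while the thresholds are still applied to the full CHM.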

Results and Discussion

The results of the test are shown in Tables 2 through 5, and allow two of the objectives of this research to be addressed. The first objective that can be addressed is whether the algorithm developed in this paper can be used to predict forest density and biomass accurately, and to compare the results to those obtained using other small footprint lidar processing techniques. Based on the results when all plots were used for training and testing, the density of trees with DBH ≥ 10 cm can be predicted accurately, since the lowest RMSE corresponds to 8.4 percent of the mean density, and the highest coefficient of determination is 0.80. The density of trees with DBH ≥ 7 cm also has a low RMSE for the best (i.e., producing the most accurate predictions of the parameter of interest) set of metrics (10.7 percent of the mean), but the highest coefficient of determination is much lower (0.60). This coefficient of determination may be inadequate for some applications. The biomass predictions for both diameter cutoffs have a low RMSE (13.1 percent of the mean for a diameter cutoff of 7 cm, and 13.3 percent of the mean for a diameter cutoff of 10 cm using the best metric pairs) but have coefficients of determination that may be too low for some applications (a maximum of 0.60 and 0.62 for diameter cutoffs of 7 cm and 10 cm, respectively). These biomass statistics should be treated cautiously because of the uncertainty associated with the biomass estimates made on the ground (see Table 1).

It is important to note that there is considerable variability in the accuracies of the predictions made with different metric combinations. However, in an applied setting the combination resulting in the lowest RMSE

TABLE 2. THE DENSITY RESULTS OBTAINED USING THE TREE CLUSTER-BASED ALGORITHM WHEN ALL PLOTS WERE USED FOR BOTH TRAINING AND TESTING. m1 AND m2 REFER TO THE METRIC NAMES DESCRIBED IN THE TEXT. THE DIAMETER LIMIT INDICATES WHETHER THE TREES WITH DBH < 7 CM OR TREES WITH DBH < 10 CM WERE EXCLUDED, AND THE RMSE IS IN TREES/HA. THE NUMBER IN PARENTHESES FOLLOWING THE RMSE GIVES THE PERCENTAGE OF THE MEAN DENSITY THE RMSE REPRESENTS

m1          m2          Diameter Limit  RMSE (trees/ha)  R²
pct_thresh  pct_thresh  7 cm            195 (11.5%)      0.54
pct_core    pct_core    7 cm            196 (11.5%)      0.53
avg_height  avg_height  7 cm            209 (12.3%)      0.47
std_height  std_height  7 cm            219 (12.9%)      0.42
pct_thresh  pct_core    7 cm            189 (11.1%)      0.57
pct_thresh  avg_height  7 cm            191 (11.2%)      0.56
pct_thresh  std_height  7 cm            210 (12.4%)      0.46
pct_core    avg_height  7 cm            182 (10.7%)      0.60
pct_core    std_height  7 cm            204 (12.0%)      0.49
std_height  avg_height  7 cm            223 (13.1%)      0.39
pct_thresh  pct_thresh  10 cm           161 (10.7%)      0.67
pct_core    pct_core    10 cm           178 (11.9%)      0.60
avg_height  avg_height  10 cm           155 (10.3%)      0.69
std_height  std_height  10 cm           207 (13.8%)      0.45
pct_thresh  pct_core    10 cm           152 (10.1%)      0.71
pct_thresh  avg_height  10 cm           148 (9.9%)       0.72
pct_thresh  std_height  10 cm           187 (12.5%)      0.55
pct_core    avg_height  10 cm           126 (8.4%)       0.80
pct_core    std_height  10 cm           184 (12.3%)      0.57
std_height  avg_height  10 cm           215 (14.3%)      0.41
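The accuracy statistics reported in Tables 2 through 5 can be reproduced in outline with the sketch below. The paper does not state the exact formulas, so two assumptions are made and labeled in the code: the RMSE divides by the number of plots, and the coefficient of determination is taken as the squared Pearson correlation between ground and predicted values. The function names and the predicted values in the example are hypothetical; the observed values are the densities of plots 1 through 4 from Table 1. As a check on the parenthesized percentages, an RMSE of 126 trees/ha against the mean density of 1,500 trees/ha gives 8.4 percent.

```python
import math
from statistics import mean


def rmse(observed, predicted):
    """Root mean square error (assumption: divisor is the number of plots)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(mean(squared))


def percent_of_mean(rmse_value, mean_value):
    """Express an RMSE as a percentage of the mean ground value."""
    return 100.0 * rmse_value / mean_value


def r_squared(observed, predicted):
    """Assumption: R^2 is the squared Pearson correlation of the two series."""
    o_bar, p_bar = mean(observed), mean(predicted)
    sop = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    soo = sum((o - o_bar) ** 2 for o in observed)
    spp = sum((p - p_bar) ** 2 for p in predicted)
    return sop ** 2 / (soo * spp)


# Ground densities (trees/ha, DBH >= 10 cm) for plots 1-4 in Table 1,
# paired with hypothetical predictions.
observed = [1132.0, 1146.0, 1528.0, 1259.0]
predicted = [1200.0, 1100.0, 1450.0, 1300.0]

error = rmse(observed, predicted)
print(error, percent_of_mean(error, mean(observed)), r_squared(observed, predicted))
print(round(percent_of_mean(126, 1500), 1), round(percent_of_mean(182, 1698), 1))
```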

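TABLE 3. THE DENSITY RESULTS OBTAINED USING THE TREE CLUSTER-BASED ALGORITHM WHEN A SUBSET OF THE PLOTS WERE USED FOR TRAINING AND TESTING (SEE TABLE 1). m1 AND m2 REFER TO THE METRIC NAMES DESCRIBED IN THE TEXT. THE DIAMETER LIMIT INDICATES WHETHER THE TREES WITH DBH < 7 CM OR TREES WITH DBH < 10 CM WERE EXCLUDED, AND THE RMSE IS IN TREES/HA. THE NUMBER IN PARENTHESES FOLLOWING THE RMSE GIVES THE PERCENTAGE OF THE MEAN DENSITY THE RMSE REPRESENTS. BOTH THE RMSE AND R² WERE CALCULATED BASED ON THE TESTING DATA ONLY

m1          m2          Diameter Limit  RMSE (trees/ha)  R²
pct_thresh  pct_thresh  7 cm            305 (17.8%)      0.46
pct_core    pct_core    7 cm            346 (20.2%)      0.43
avg_height  avg_height  7 cm            7129 (416.7%)    0.28
std_height  std_height  7 cm            445 (26.0%)      0.00
pct_thresh  pct_core    7 cm            1540 (90.0%)     0.45
pct_thresh  avg_height  7 cm            264 (15.4%)      0.43
pct_thresh  std_height  7 cm            498 (29.1%)      0.25
pct_core    avg_height  7 cm            356 (20.8%)      0.50
pct_core    std_height  7 cm            671 (39.2%)      0.32
std_height  avg_height  7 cm            1900 (111.0%)    0.33
pct_thresh  pct_thresh  10 cm           319 (20.9%)      0.59
pct_core    pct_core    10 cm           1618 (106.2%)    0.39
avg_height  avg_height  10 cm           8173 (536.4%)    0.17
std_height  std_height  10 cm           397 (26.1%)      0.01
pct_thresh  pct_core    10 cm           1852 (121.5%)    0.40
pct_thresh  avg_height  10 cm           230 (15.1%)      0.68
pct_thresh  std_height  10 cm           362 (23.8%)      0.10
pct_core    avg_height  10 cm           272 (17.9%)      0.66
pct_core    std_height  10 cm           341 (22.4%)      0.15
std_height  avg_height  10 cm           1074 (70.5%)     0.13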

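TABLE 4. THE BIOMASS RESULTS OBTAINED USING THE TREE CLUSTER-BASED ALGORITHM WHEN ALL PLOTS WERE USED FOR BOTH TRAINING AND TESTING. m1 AND m2 REFER TO THE METRIC NAMES DESCRIBED IN THE TEXT. THE DIAMETER LIMIT INDICATES WHETHER THE TREES WITH DBH < 7 CM OR TREES WITH DBH < 10 CM WERE EXCLUDED, AND THE RMSE IS IN T/HA. THE NUMBER IN PARENTHESES FOLLOWING THE RMSE GIVES THE PERCENTAGE OF THE MEAN BIOMASS THE RMSE REPRESENTS

m1          m2          Diameter Limit  RMSE (t/ha)      R²
pct_thresh  pct_thresh  7 cm            13.6 (14.2%)     0.53
pct_core    pct_core    7 cm            13.4 (14.0%)     0.55
avg_height  avg_height  7 cm            13.5 (14.1%)     0.54
std_height  std_height  7 cm            17.1 (17.8%)     0.27
pct_thresh  pct_core    7 cm            13.4 (14.0%)     0.55
pct_thresh  avg_height  7 cm            13.5 (14.1%)     0.54
pct_thresh  std_height  7 cm            12.6 (13.1%)     0.60
pct_core    avg_height  7 cm            13.7 (14.3%)     0.53
pct_core    std_height  7 cm            13.3 (13.9%)     0.56
std_height  avg_height  7 cm            13.0 (13.6%)     0.57
pct_thresh  pct_thresh  10 cm           13.7 (14.7%)     0.53
pct_core    pct_core    10 cm           13.2 (14.2%)     0.56
avg_height  avg_height  10 cm           13.4 (14.4%)     0.55
std_height  std_height  10 cm           17.4 (18.7%)     0.24
pct_thresh  pct_core    10 cm           13.1 (14.1%)     0.57
pct_thresh  avg_height  10 cm           13.2 (14.2%)     0.56
pct_thresh  std_height  10 cm           12.4 (13.3%)     0.62
pct_core    avg_height  10 cm           13.3 (14.3%)     0.55
pct_core    std_height  10 cm           13.1 (14.1%)     0.57
std_height  avg_height  10 cm           13.0 (14.0%)     0.58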

and highest coefficients of determination would likely be used to the exclusion of all others.

Comparisons with other small footprint lidar studies are difficult due to differences in the study sites used, the forest parameters that were predicted, and the use of accuracy assessment techniques that cannot be applied to the present study. Small footprint lidar studies that predicted density and/or biomass and used comparable accuracy assessment techniques are listed in Table 6. For density, only two studies can be directly compared with the present study. The most comparable is the study by Bortolot and Wynne (2005) that used the same lidar and ground data. In all cases the results found in this study were better (i.e., lower RMSE, higher coefficient of determination) than the best results found by Bortolot and Wynne (2005). The other study that examined density is Popescu et al. (2004), in which the authors found a coefficient of determination between the predicted and actual density in a mixed pine/hardwood forest that was lower than the results obtained by all but one of the metric combinations assessed in this paper.

It is possible to compare the biomass results obtained from the algorithm presented in this paper to results obtained by a number of other researchers (Table 6). The RMSE between actual and predicted biomass is lower for the present study than for all past studies listed in Table 6 that used comparable units. This includes both individual tree-based studies (Popescu et al., 2004; Bortolot and Wynne, 2005) and studies that are based on the statistical properties of pulses in a sample of the lidar data (Nelson et al., 1988; Lim and Treitz, 2004). However, the coefficient of determination between actual and predicted biomass is lower than that obtained by three out of the five researchers. This may be due to the lack of biomass variability in the dataset used in this paper or the uncertainty associated with the ground-based biomass estimates due to the data collection method that was used.

Based on tests using the same field and lidar data, the cluster-based algorithm described in this paper outperforms the individual tree-based algorithm developed by Bortolot and Wynne (2005). However, it should be noted that the lidar data have a 1 m posting interval, which corresponds to a lower point density than that used by some researchers for individual tree-based algorithms (e.g., Hyyppä and Inkinen, 1999). Therefore, the cluster-based approach should not be considered to be superior to the individual tree-based approach until further testing using data with a higher point density and multiple algorithms has been conducted.

In all cases, training and testing using independent subsets of the data (Tables 3 and 5) produced poorer results than training and testing using all plots. A number of factors are likely to have contributed to this result, including the greater impact of measurement errors when fewer plots are used, not being exposed to the full range of site conditions during training, and overfitting.

TABLE 5. THE BIOMASS RESULTS OBTAINED USING THE TREE CLUSTER-BASED ALGORITHM WHEN A SUBSET OF THE PLOTS WERE USED FOR TRAINING AND TESTING (SEE TABLE 1). m1 AND m2 REFER TO THE METRIC NAMES DESCRIBED IN THE TEXT. THE DIAMETER LIMIT INDICATES WHETHER THE TREES WITH DBH < 7 CM OR TREES WITH DBH < 10 CM WERE EXCLUDED, AND THE RMSE IS IN T/HA. THE NUMBER IN PARENTHESES FOLLOWING THE RMSE GIVES THE PERCENTAGE OF THE MEAN BIOMASS THE RMSE REPRESENTS. BOTH THE RMSE AND R² WERE CALCULATED BASED ON THE TESTING DATA ONLY

m1          m2          Diameter Limit  RMSE (t/ha)      R²
pct_thresh  pct_thresh  7 cm            35.4 (38.6%)     0.51
pct_core    pct_core    7 cm            104.8 (114.3%)   0.27
avg_height  avg_height  7 cm            202.4 (220.7%)   0.06
std_height  std_height  7 cm            47.5 (51.8%)     0.17
pct_thresh  pct_core    7 cm            62.6 (68.3%)     0.32
pct_thresh  avg_height  7 cm            28.9 (31.5%)     0.54
pct_thresh  std_height  7 cm            29.5 (32.2%)     0.44
pct_core    avg_height  7 cm            319.3 (348.2%)   0.11
pct_core    std_height  7 cm            34.5 (37.6%)     0.37
std_height  avg_height  7 cm            226.2 (235.9%)   0.09
pct_thresh  pct_thresh  10 cm           36.1 (40.5%)     0.52
pct_core    pct_core    10 cm           112.3 (126.0%)   0.24
avg_height  avg_height  10 cm           214.8 (241.1%)   0.04
std_height  std_height  10 cm           47.5 (53.3%)     0.13
pct_thresh  pct_core    10 cm           142.3 (159.7%)   0.26
pct_thresh  avg_height  10 cm           28.9 (32.4%)     0.57
pct_thresh  std_height  10 cm           32.7 (36.7%)     0.43
pct_core    avg_height  10 cm           316.5 (355.2%)   0.09
pct_core    std_height  10 cm           36.7 (41.2%)     0.35
std_height  avg_height  10 cm           21.0 (23.6%)     0.33

TABLE 6. RESULTS OBTAINED IN OTHER SMALL FOOTPRINT LIDAR STUDIES. IN CASES WHERE THE RESEARCHERS OBTAINED MULTIPLE RESULTS, THE RESULTS GIVING THE HIGHEST R² AND LOWEST RMSE VALUES WERE SELECTED

Study                      Individual Tree-   Predicted Variable      Study Site             R²                  RMSE
                           based Algorithm?
Bortolot and Wynne (2005)  Yes                Density                 Same as current study  0.13 (DBH ≥ 7 cm)   313 trees/ha (DBH ≥ 7 cm)
                                                                                             0.31 (DBH ≥ 10 cm)  242 trees/ha (DBH ≥ 10 cm)
Popescu et al. (2004)      Yes                Density                 Pines and hardwoods    0.26                Not reported
Bortolot and Wynne (2005)  Yes                Biomass                 Same as current study  0.53 (DBH ≥ 7 cm)   13.6 t/ha (DBH ≥ 7 cm)
                                                                                             0.52 (DBH ≥ 10 cm)  13.8 t/ha (DBH ≥ 10 cm)
Lim and Treitz (2004)      No                 Biomass                 Hardwood forest        0.90                50 t/ha
Lim et al. (2003)          No                 Natural log of biomass  Hardwood forest        0.85                0.46 ln(t/ha)
Nelson et al. (1988)       No                 Biomass                 Pines and hardwoods    0.55                67 t/ha
Popescu et al. (2004)      Yes                Biomass                 Pines and hardwoods    0.82 (pines)        29 t/ha (pines)
                                                                                             0.33 (hardwoods)    44 t/ha (hardwoods)

The second objective that can be addressed by the results is to evaluate which combinations of cluster-based metrics work best for predicting density and biomass. As seen in Tables 2 through 5, there are considerable differences in the accuracies with which metric combinations can be used to predict density and biomass. Based on the complete training and testing dataset, the parameter combination yielding the highest coefficients of determination and lowest RMSE values between the ground- and lidar-derived plot density measurements was pct_core, avg_height. The

D e c e m b e r 2 0 0 6 1395

FL-03-05

11/11/06

7:46 AM

Page 1396

combination of pct_thresh and std_height gave the highest correlations between ground- and lidar-derived plot biomass. As noted in the algorithm description, there are physiological bases for these parameters being useful for predicting density and biomass. However, further research will need to be conducted to discover why these metric combinations outperformed other combinations. When training and tested were conducted using independent subsets of the data, these two combinations of metrics performed well based on the coefficients of determination and RMSE values between the field measured and lidar-derived estimates of density and biomass. However, in the cases of the density of stands based on a 10 cm diameter limit, and biomass based on both diameter limits, these combinations of metrics did not yield the highest coefficients of determination and lowest RMSE. This may indicate that other combinations of metrics are more robust, but further testing is needed to determine if this is the case.
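
The accuracy measures used throughout this comparison can be illustrated with a short sketch. The Python code below fits a two-metric multiple regression by ordinary least squares and reports the coefficient of determination and RMSE, first on the training plots and then on held-out plots. It is an illustration only: the function names, the use of the squared Pearson correlation as the coefficient of determination, and all plot values are assumptions made for the example and are not taken from the study.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_two_metric_regression(metric_a, metric_b, target):
    """Least-squares fit of target = b0 + b1*metric_a + b2*metric_b."""
    rows = [(1.0, a, b) for a, b in zip(metric_a, metric_b)]
    # Normal equations: (X^T X) coeffs = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict(coeffs, metric_a, metric_b):
    b0, b1, b2 = coeffs
    return [b0 + b1 * a + b2 * b for a, b in zip(metric_a, metric_b)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2 for this sketch)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    # Hypothetical plot-level values, NOT data from the study.
    pct_core = [40.0, 55.0, 62.0, 48.0, 70.0, 35.0, 58.0, 44.0]
    avg_height = [12.1, 14.3, 15.0, 13.2, 16.4, 11.0, 14.8, 12.6]
    density = [900.0, 700.0, 640.0, 780.0, 520.0, 980.0, 660.0, 850.0]  # trees/ha

    train, test = slice(0, 5), slice(5, 8)
    coeffs = fit_two_metric_regression(
        pct_core[train], avg_height[train], density[train]
    )
    for name, part in (("training", train), ("held-out", test)):
        pred = predict(coeffs, pct_core[part], avg_height[part])
        print(name, "RMSE:", round(rmse(density[part], pred), 1),
              "R2:", round(r_squared(density[part], pred), 2))
```

Evaluating on plots that were not used for fitting, as in the final loop, mirrors the independent training and testing comparison discussed above, where poorer fits than in the all-plots case are to be expected.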

Conclusions

Object-oriented approaches to processing small footprint lidar data have generally used individual trees as the object of interest. Although individual tree-based algorithms have given excellent results for volume, biomass, crown width, and height, they have often performed less well for density. This paper describes and tests an algorithm that is based on an alternative object, the tree cluster. Results using the tree cluster approach in a non-intensively managed loblolly pine plantation showed that high prediction accuracies for density can be achieved using a diameter limit of 10 cm, but that lower accuracies resulted when a diameter limit of 7 cm was used. When biomass was predicted, a low RMSE existed between the actual and predicted biomass values, but the coefficient of determination was lower than that found in several of the studies to which the algorithm was compared. This may be due to the ground data collection technique used in this study or the lack of biomass variability rather than poor algorithm performance.

Multiple combinations of cluster-based metrics were evaluated. It was determined that the percentage of cluster pixels that are core pixels (pct_core) and the mean canopy height of the cluster pixels (avg_height) gave the most accurate predictions of density, and the percentage of pixels in the sampled area that exceeded the threshold (pct_thresh) and the standard deviation of the canopy height of the cluster pixels (std_height) gave the most accurate predictions of biomass.

The results obtained using the tree cluster approach can be directly compared to results obtained by Bortolot and Wynne (2005) using an individual tree-based algorithm, since the same data were used in both studies. The comparison shows that in this case, the tree cluster approach performs better than the individual tree-based approach. However, the data used in these studies had a 1 m point spacing, which represents a lower point density than that used by some individual tree-based studies and therefore may unfairly bias the results against the individual tree-based approach. More testing is needed using multiple study sites, lidar datasets with a range of point spacings, and additional individual tree-based algorithms in order to fully assess the relative merits of both approaches.

It would also be useful to directly compare the tree cluster-based algorithm to algorithms that use statistical area-based techniques. Based on future testing, it may be possible to establish whether certain approaches are able to yield more accurate predictions of forest parameters using data with different point spacing (e.g., statistical area-based techniques may be best for low point density data, cluster-based techniques may be best for intermediate point density data, and individual tree-based approaches may be best for high point density data). Other areas of future research could include developing a better understanding of why some combinations of metrics perform better than others when predicting density and biomass, incorporating metrics that are based on intermediate returns, testing the algorithm on natural pine and hardwood stands, and performing a test on entire forest stands rather than plots within stands.
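
As a rough illustration of the four metrics named above, the following Python sketch computes them for a single plot from a gridded CHM. The exact definition of a core pixel is not restated in this section, so the sketch assumes, purely for illustration, that cluster pixels are CHM cells that exceed a lower threshold and that core pixels are cluster pixels that also exceed a second, higher threshold. The function name, the argument names, and the use of the population standard deviation are likewise assumptions.

```python
import statistics


def cluster_metrics(chm, cluster_threshold, core_threshold):
    """Compute pct_thresh, pct_core, avg_height, and std_height for one plot.

    chm is a list of rows of canopy heights (m). ASSUMPTION: cluster pixels
    are cells above cluster_threshold, and core pixels are cluster pixels
    that are also above core_threshold (core_threshold >= cluster_threshold).
    """
    heights = [h for row in chm for h in row]
    cluster = [h for h in heights if h > cluster_threshold]
    if not cluster:
        raise ValueError("no pixels exceed the cluster threshold")
    core = [h for h in cluster if h > core_threshold]
    return {
        # Percentage of all pixels in the sampled area above the threshold.
        "pct_thresh": 100.0 * len(cluster) / len(heights),
        # Percentage of cluster pixels that are core pixels.
        "pct_core": 100.0 * len(core) / len(cluster),
        # Mean canopy height of the cluster pixels.
        "avg_height": statistics.fmean(cluster),
        # Standard deviation (population form) of cluster pixel heights.
        "std_height": statistics.pstdev(cluster),
    }


if __name__ == "__main__":
    # Hypothetical 2 x 3 CHM subset in metres, NOT data from the study.
    example_chm = [[0.0, 5.0, 10.0], [12.0, 20.0, 2.0]]
    for name, value in cluster_metrics(example_chm, 4.0, 11.0).items():
        print(name, round(value, 2))
```

In the algorithm described in this paper the two thresholds are set by optimization against training plots; here they are simply passed in as fixed arguments.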

Acknowledgments and Note

The author would like to express his deep gratitude to Dr. Randolph H. Wynne for providing the lidar data used in this analysis; to Dr. Jan van Aardt, Gleb Tcheslavski, Jeffrey Bardwell, and Troy Wasky for their assistance in collecting the field data; to Dr. John Paul McTague, Mark Milligan, and the two anonymous reviewers for their comments and suggestions on the manuscript; and to the Institute for Regional Analysis and Public Policy. Portions of the algorithm described in this paper are patent pending.

References

Blaschke, T., S. Lang, E. Lorup, J. Strobl, and P. Zeil, 2000. Environmental Information for Planning, Politics and the Public, Volume 2 (A. Cremers and K. Greve, editors), Metropolis Verlag, Marburg, Germany, pp. 555–570.
Bortolot, Z.J., and R.H. Wynne, 2005. Estimating forest biomass using small footprint LiDAR data: An individual tree-based approach that incorporates training data, ISPRS Journal of Photogrammetry and Remote Sensing, 59(6):342–360.
Clutter, J.L., J.C. Fortson, L.V. Pienaar, G.H. Brister, and R.L. Bailey, 1983. Timber Management: A Quantitative Approach, John Wiley and Sons, New York, New York, 333 p.
Coops, N.C., M.A. Wulder, D.S. Culvenor, and B. St.-Onge, 2004. Comparison of forest attributes extracted from fine spatial resolution multispectral and lidar data, Canadian Journal of Remote Sensing, 30(6):855–866.
Corvallis Microtechnology, Inc., 2005. March-II-E, URL: http://www.cmtinc.com/fieldcmp/march.html, Corvallis Microtechnology, Inc., Corvallis, Oregon (last date accessed: 27 August 2006).
Diedershagen, O., B. Koch, H. Weinacker, and C. Schütt, 2003. Combining LiDAR and GIS data for the extraction of forest inventory parameters, Proceedings of ScandLaser 2003, 02–04 September, Umeå, Sweden, pp. 157–165.
Dorren, L.K.A., B. Maier, and A.C. Seijmonsbergen, 2003. Improved Landsat-based forest mapping in steep mountainous terrain using object-based classification, Forest Ecology and Management, 183(1):31–46.
Fosgate, C.H., H. Krim, W.W. Irving, W.C. Karl, and A.S. Willsky, 1997. Multiscale segmentation and anomaly enhancement of SAR imagery, IEEE Transactions on Image Processing, 6(1):7–20.
Geneletti, D., and B.G.H. Gorte, 2003. A method for object-oriented land cover classification combining Landsat TM data and aerial photographs, International Journal of Remote Sensing, 24(6):1273–1286.
Golden Software, 1999. Surfer 7.0 User’s Guide, Golden Software, Inc., Golden, Colorado, 619 p.
Gougeon, F.A., 1995. A crown-following approach to the delineation of individual tree crowns in high spatial resolution aerial images, Canadian Journal of Remote Sensing, 21(3):274–284.
Grover, K., S. Quegan, and C. da C. Freitas, 1999. Quantitative estimation of tropical forest cover by SAR, IEEE Transactions on Geoscience and Remote Sensing, 37(1):479–490.
Hill, R.A., 1999. Image segmentation for humid tropical forest classification in Landsat TM data, International Journal of Remote Sensing, 20(5):1039–1044.
Hyvönen, P., A. Pekkarinen, and S. Tuominen, 2005. Segment-level stand inventory for forest management, Scandinavian Journal of Forest Research, 20(1):75–84.
Hyyppä, J., and M. Inkinen, 1999. Detecting and estimating attributes for single trees using laser scanner, Photogrammetric Journal of Finland, 16(2):27–42.
Jensen, J.R., 2005. Introductory Digital Image Processing: A Remote Sensing Perspective, Third Edition, Pearson Prentice Hall, Upper Saddle River, New Jersey, 526 p.
Lim, K.S., and P.M. Treitz, 2004. Estimation of above ground forest biomass from airborne discrete return laser scanner data using canopy-based quantile estimators, Scandinavian Journal of Forest Research, 19(6):558–570.
Lim, K., P. Treitz, K. Baldwin, I. Morrison, and J. Green, 2003. Lidar remote sensing of biophysical properties of tolerant northern hardwood forests, Canadian Journal of Remote Sensing, 29(5):658–678.
McCombs, J.W., S.D. Roberts, and D.L. Evans, 2003. Influence of fusing Lidar and multispectral imagery on remotely sensed estimates of stand density and mean tree height in a managed loblolly pine plantation, Forest Science, 49(3):457–466.
Ministry of Sustainable Resource Management, 2002. Vegetation Resources Inventory: Photo Interpretation Procedures, Version 2.4, Resources Inventory Committee, Victoria, British Columbia, Canada, 121 p.
Næsset, E., 1997. Estimating timber volume of forest stands using airborne laser scanner data, Remote Sensing of Environment, 61(2):246–253.
Naidu, J.A., E.H. DeLucia, and R.B. Thomas, 1998. Contrasting patterns of biomass allocation in dominant and suppressed loblolly pine, Canadian Journal of Forest Research, 28(8):1116–1124.
Nelson, R., W. Krabill, and J. Tonelli, 1988. Estimating forest biomass and volume using airborne laser data, Remote Sensing of Environment, 24(2):247–267.
Neter, J., M.H. Kutner, C.J. Nachtsheim, and W. Wasserman, 1996. Applied Linear Statistical Models, Fourth Edition, Irwin, Chicago, Illinois, 1408 p.
Nilsson, M., 1996. Estimation of tree heights and stand volume using an airborne lidar system, Remote Sensing of Environment, 56(1):1–7.
Pekkarinen, A., 2002. A method for the segmentation of very high spatial resolution images of forested landscapes, International Journal of Remote Sensing, 23(14):2817–2836.
Persson, A., J. Holmgren, and U. Söderman, 2002. Detecting and measuring individual trees using an airborne laser scanner, Photogrammetric Engineering & Remote Sensing, 68(9):925–932.
Popescu, S.C., R.H. Wynne, and R.F. Nelson, 2002. Estimating plot level tree heights with LiDAR: Local filtering with a canopy height based variable window size, Computers and Electronics in Agriculture, 37(1–3):71–95.
Popescu, S.C., R.H. Wynne, and R.F. Nelson, 2003. Measuring individual tree crown diameter with LiDAR and assessing its influence on estimating forest volume and biomass, Canadian Journal of Remote Sensing, 29(5):564–577.
Popescu, S.C., R.H. Wynne, and J.A. Scrivani, 2004. Fusion of small-footprint LiDAR and multispectral data to estimate plot-level volume and biomass in deciduous and pine forests in Virginia, USA, Forest Science, 50(4):551–565.
Spurr, S.H., 1960. Photogrammetry and Photo-interpretation, Second Edition, Ronald Press, New York, New York, 472 p.
Takahashi, T., K. Yamamoto, Y. Senda, and M. Tsuzuku, 2005. Predicting individual stem volumes of sugi (Cryptomeria japonica D. Don) plantations in mountainous areas using small-footprint airborne LiDAR, Journal of Forest Research, 10(4):305–312.
van Aardt, J.A.N., and R.H. Wynne, 2004. A multi-resolution approach to forest segmentation as a precursor to estimation of volume and biomass by species, Proceedings of the ASPRS 2004 Annual Convention, 23–28 May, Denver, Colorado, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland, unpaginated CD-ROM.
Wulder, M., K.O. Niemann, and D.G. Goodenough, 2000. Local maximum filtering for the extraction of tree locations and basal area from high spatial resolution imagery, Remote Sensing of Environment, 73(1):103–114.