A Comparison of Statistical Methods for Estimating Forest Biomass from Light Detection and Ranging Data
Yuzhen Li, Hans-Erik Andersen, and Robert McGaughey
Strong regression relationships between light detection and ranging (LIDAR) metrics and indices of forest structure have been reported in the literature. However, most papers focus on empirical results and do not consider LIDAR metric selection and biological interpretation explicitly. In this study, three different variable selection methods (stepwise regression, principal component analysis [PCA], and Bayesian model averaging [BMA]) were compared using LIDAR data from three study sites: Capitol Forest in western Washington State, Mission Creek in central Washington State, and Kenai Peninsula in south central Alaska. Separate aboveground biomass regression models were developed for each site, as well as common models using the three study sites simultaneously. Final biomass models have R² values ranging from 0.67 to 0.88 across the three study sites. PCA indicates that three LIDAR metrics (mean height, coefficient of variation of height, and canopy LIDAR point density) explain the majority of variation contained within a larger set of metrics. Within each study area, forest biomass models using these three predictor variables had R² values similar to those of the stepwise and BMA regression models. Individual site models using these three variables are recommended because these models are straightforward in terms of model form and biological interpretation and are easily adopted for application.
Keywords: forest biomass, LIDAR, variable selection, principal component analysis, Bayesian model averaging
Airborne laser scanning (or light detection and ranging [LIDAR]) sensors collect high-precision estimates of the location of points on the earth's surface by combining information on their range from the sensor (provided by the time of flight of the emitted and reflected laser pulse) with sensor location and orientation provided by a global positioning system (GPS) and inertial measurement unit, respectively. A swath beneath the sensor platform is continually scanned, and a high point density is provided by the repetition rate of the laser, generally between 50 and 150 kHz in small-footprint discrete-return LIDAR (Moffiet et al. 2005). By studying the three-dimensional (3D) spatial distribution of LIDAR returns, information about objects within a scene that have interacted with the laser pulses can be retrieved. In the case of forest stands, some laser pulses penetrate partly into and possibly through the canopy and produce several separately recorded reflections along the way, e.g., from the canopy, understory, and ground vegetation, which provide opportunities to analyze forest canopy structure (Ackermann 1999). Canopy structure characteristics, such as canopy height profile and canopy LIDAR point density distribution, have been derived successfully from LIDAR data and used to estimate forest stand characteristics, such as basal area, stand density, stand volume, aboveground biomass, and canopy fuel parameters (Lefsky et al. 1999, 2002; Næsset 2002, Drake et al. 2003, Holmgren 2004, Lim and Treitz 2004, Maltamo et al. 2004, Messer et al. 2004, Andersen et al. 2005). Given the anticipated decline in the cost of LIDAR data collection in the near future, it is expected that LIDAR data will be an increasingly useful tool in forest inventory. In a few years, the use of LIDAR data may be as commonplace as the use of aerial photos and topographic maps today. However, most published LIDAR studies focus on developing empirical regression relationships between LIDAR metrics and forest structure field measures and do not consider LIDAR metric selection and biological interpretation explicitly. In addition, most LIDAR-based models were developed within a relatively small study area; little work has been done to assess the generality of these models across different forest types and regions. For LIDAR data to be useful as an operational tool in forest management, these questions have to be addressed.
LIDAR Metrics Selection
The forest canopy is the photosynthetic powerhouse of forest productivity, and it is closely related to what is commonly referred to as stand structure, defined as the size and number of woody stems per unit area and related statistics (Oliver and Larson 1996). The close connection between canopy structure and woody stems provides the biological basis for the strong regression relationship between LIDAR-derived (canopy-based) structural metrics and field measurements of stand structure (woody stem based). However, because of the complex 3D structure (position and orientation) of forest canopy components and the variation in reflectivity between leaves, branches, and twigs within tree crowns, interactions between canopy and laser pulses are very complex. A few existing studies have attempted to describe the laser photon interaction with forest canopy using SLICER large-footprint waveform data (Ni-Meister et al. 2001), but physical models using small-footprint discrete-return
Received June 21, 2007; accepted January 8, 2008. Yuzhen Li (yzhliii. u)asbingcon.edu), College of Forest Resources, University of Washington, Box 352/00, Seattle, WA 98195. Hans-Erik Andersen ([email protected]), US Forest Service, Pacific Northwest Research Station, Anchorage, Alaska. Robert McGaughey ([email protected] 'e1. us), US Forest Service, Pacific Northwest Research Station, Seattle, Washington. Copyright © 2008 by the Society of American Foresters. West. J. Appl. For. 23(4) 2008
LIDAR data still are not available, although with increases in pulse rate and data density this might become possible in the future. The most popular procedure described in the literature is to apply multiple linear regression techniques to link LIDAR canopy structure metrics with coincident forest stand field measurements. The large number and complex spatial arrangement of LIDAR returns over forest canopies can result in a large set of potential predictor variables for regression analysis. As an example, a total of 46, 44, and 39 LIDAR metrics were used in the regression models in Næsset (2002), Næsset (2004), and Hall et al. (2005), respectively. LIDAR data are 3D point location estimates; thus, most LIDAR metrics are related to canopy height and often are highly correlated. Regression models with highly correlated independent variables are not stable from the statistical perspective and are hard to interpret from the biological perspective. Model parsimony (minimizing the number of LIDAR metrics and avoiding redundant information) needs to be seriously considered in model building. Næsset et al. (2005) reduced the number of LIDAR variables from 34 original LIDAR metrics to 7, 5, and 3 noncorrelated principal components for the young forest, mature forest on poor sites, and mature forest on good sites, respectively. Although this method ensured that there was no correlation between predictor variables, the models are difficult to interpret because the principal components are linear combinations of the original LIDAR metrics and do not have a clear physical meaning. Hudak et al. (2006) applied best-subset regression on a suite of 26 predictor variables derived from LIDAR, Advanced Land Imager multispectral and panchromatic data, and geographic (X, Y, and Z) location and identified small sets of variables for predicting tree basal area and tree density.
Best-subset regression uses the branch-and-bound algorithm to find a specified number of best models containing a specified number of predictor variables. The problem with best-subset regression is that the number of predictors has to be defined in advance, so the best model is the best for a given number of predictor variables rather than across all possible models. Hall et al. (2005) selected LIDAR predictor variables from a pool of 39 LIDAR metrics based on mechanistic hypotheses of why these metrics should be good predictors for each stand structural variable considered. The problem is that the relationships between LIDAR canopy measurements and field stand structure are very complex, and it is difficult to validate such mechanistic hypotheses. Lefsky et al. (2005a) explored LIDAR metrics selection using large-footprint SLICER data in western Oregon and Washington states. The correlations between LIDAR canopy structure and field stand structure indices were analyzed using canonical correlation analysis, and mean height, cover (or leaf area index), and height variability were found to represent the fundamental data structure, contain the majority of data variability, and be associated with physical characteristics. This method provided a way to place both LIDAR canopy metrics and field stand indices within the overall covariance structure and can be used as a guide for model selection. Because the description of LIDAR canopy structure developed in their study was designed specifically for large-footprint SLICER waveform data, it is not clear how this method can be adapted to small-footprint LIDAR point data.
Generality of LIDAR-Based Forest Structure Prediction Models
Many site-specific empirical relationships have been developed across a variety of forest types in both Europe and North America, but published models are very different in terms of model precision,
model form, and the LIDAR predictor variables included. As high-resolution LIDAR data become increasingly available, there is a great need for simple, accurate, and physically meaningful prediction models that can be used in, or easily adapted to, different regions and sensor systems. Næsset et al. (2005) studied the effect of inventory site on estimating mean tree height, dominant height, mean diameter, stem number, basal area, and timber volume. Separate regression models were developed for each inventory area, as well as common models using two inventory areas simultaneously. They concluded that the coefficients of LIDAR-based models do not differ significantly across the two tested inventory sites except for the mean height. Lefsky et al. (2002) found that a single regression model based on mean canopy height and mean canopy cover derived from large-footprint waveform SLICER data was sufficient to model aboveground biomass across three biomes: temperate deciduous, temperate coniferous, and boreal coniferous. Lefsky et al. (2005b) compared the relationship between LIDAR-measured canopy structure and coincident field measurements of forest stand structure using data from five locations in the Pacific Northwest of the United States with contrasting composition. Of the 17 stand structure variables considered, they reported eight equations that were valid for all sites, including aboveground biomass and leaf area index. Instead of dividing data into training and testing samples, data from all study sites were used to develop prediction models, and the predicted values from the overall regression model were compared with the observed values for each site to check the generality of the model. The root mean squared error (RMSE) values reported in their article therefore were not truly RMSE, but residual standard deviation. It is highly possible that RMSE values were underestimated and the generality of the overall model was overestimated. In contrast, Drake et al.
(2003) reported that the relationship between LIDAR metrics and aboveground biomass was significantly different between two study areas using Laser Vegetation Imaging Sensor data. Apart from the different LIDAR systems applied, reasons for these inconsistent results are not clear, and additional work is needed to investigate the generality of LIDAR-based prediction models. This study tested three different variable selection methods (stepwise regression, principal component analysis [PCA], and Bayesian model averaging [BMA]) to develop LIDAR-based aboveground biomass prediction models for three different forest types: a moist Douglas-fir (Pseudotsuga menziesii)/western hemlock (Tsuga heterophylla) forest in western Washington state, a dry ponderosa pine (Pinus ponderosa) forest in the eastern Cascade Mountains of Washington state, and a birch/spruce forest on the Kenai Peninsula of Alaska. As an exploratory study, our objectives are to investigate (1) whether it is possible to develop LIDAR-based aboveground forest biomass models with a small set of LIDAR metrics that have a clear biological interpretation and (2) whether models from different variable selection methods are significantly different in terms of the goodness of the model fit.
Data and Methods
Study Sites
Both LIDAR and field data were collected over three study areas: (1) Capitol Forest in western Washington State, (2) Mission Creek in central Washington State, and (3) Kenai Peninsula in south central Alaska (Figure 1). A summary of the field plots for these study sites is shown in Table 1. Area 1 was a 5.2-km² study area within the Capitol State Forest, western Washington State (122.990W to 123.323W, 46.828N to
Figure 1. Location of three study sites (denoted by black stars).

Table 1. Summary of field plots for three study sites.

| Study site | Location | Forest type | Stand age (yr) | Plot size (ac) | No. of plots | Trees per acre |
|---|---|---|---|---|---|---|
| CF | Western Washington state | Douglas-fir and western hemlock, moist site | 70 | 0.2 | 98 | 60 |
| MC | Eastern Cascades, Washington state | Douglas-fir and ponderosa pine, dry site | 25 | 0.62 | 66 | 112 |
| KE | South central Alaska | Spruce and birch | 74 | 0.167 | 105 | 66 |

CF, Capitol Forest; MC, Mission Creek; KE, Kenai Peninsula.
47.087N). The area is dominated by Douglas-fir and western hemlock. Additional species include western redcedar (Thuja plicata), red alder (Alnus rubra), and maple (Acer spp.). A total of 98 field inventory plots of 0.2 ac each were used in this study. Field inventory was conducted in the fall of 1998 and spring of 1999, and measurements acquired at each plot included species and dbh for all trees greater than 14.2 cm in dbh. In addition, total height and height to base of live crown were measured on a representative selection (47%) of trees over the range of diameters using a handheld laser rangefinder. This site is the location of an ongoing experimental silvicultural trial, and a detailed description of the plot measurement protocol can be found in a previous report (Curtis et al. 2004). Area 2 was located in the Mission Creek watershed, in the eastern Cascade Mountains of Washington State (120.450W to 120.631W, 47.383N to 47.477N). The main species are Douglas-fir and ponderosa pine with scattered grand fir (Abies grandis). A total of 66 plots with a plot size of 50 × 50 m were used in this study. Data collected at each plot included tree species, dbh, and three height measurements for all trees: height to dead crown, height to live crown, and total height. Canopy closure, the proportion of open sky obscured by vegetation, was measured using a spherical densiometer (Model-A, Lemmon Forest Densiometers, Bartlesville, OK) at each sampled grid point. This site is part of an ongoing forest fire and fire surrogates experiment performed by the US Forest Service, and a detailed description of the plot measurement protocol can be found in a previous study (Lolley 2005). Field measurements were collected in the summer of 2003, and all trees were stem-mapped in the summer of 2004 using an Impulse laser rangefinder (Laser Technology, Inc.) and Trimble GPS system (Trimble Navigation Ltd.).
Area 3 was located in the west lowlands of Kenai Peninsula, south central Alaska (149.498W to 151.804W, 59.580N to 61.456N). The area covers approximately 5,000 mi², and elevation ranges from sea level to 600 m. Primary forest types are white spruce (Picea glauca), black spruce (Picea mariana), paper birch (Betula papyrifera), and mixed spruce and birch. A total of 105 Forest Inventory and Analysis (FIA) permanent field plots located in this area were used in this study. Each field plot consists of a cluster of four circular subplots approximately 1/24 ac in size with a radius of 24.0 ft. Most plots were measured by FIA crews in the summers of 2001-2003.
Trees 5 in. in dbh or more were tallied, and tree height was measured for several site trees within the plot. For detailed plot and tree measurement information, please refer to the field procedures for the coastal Alaska inventory (US Forest Service 2003a). Plot-level aboveground biomass (including leaves, branches, and stems) was estimated for the Capitol Forest and Mission Creek study areas using BIOPAK (US Forest Service 2003b) (Means et al. 1994). For the Kenai study area, aboveground biomass of individual trees was estimated using equations developed in Washington, Oregon, and British Columbia (Shaw 1979, Alemdag 1984, Manning et al. 1984, and Singh 1984), and plot-level aboveground biomass then was calculated by summing all trees within the four subplots.
LIDAR Data
LIDAR System Specification
High-density LIDAR data were acquired over the Capitol Forest study area with a SAAB TopEye system (TopEye AB) mounted on a helicopter platform in March 1999. LIDAR data for the Kenai Peninsula and Mission Creek study areas were acquired with an ALTM 30/70 kHz LIDAR system (Optech, Inc.) mounted on a twin-engine Cessna 320 in May and August 2004, respectively. The system settings and flight parameters are shown in Table 2.
Derivation of LIDAR Metrics
For each study site, the vendor provided raw LIDAR point data consisting of X-, Y-, and Z-coordinates and return intensity information for all LIDAR points in ASCII text format. In addition, the vendor provided "filtered ground" data representing ground returns isolated via a proprietary filtering algorithm. These filtered ground returns were used to generate a digital terrain model (DTM). All return observations (points) were spatially registered to the DTM according to their coordinates. The relative height of each point was computed as the difference between its Z-coordinate and the terrain surface height. Points with a relative height value less than 2 m were excluded to eliminate ground hits and the effect of stones, shrubs, and so on, and the remaining points were considered to be laser canopy hits. A set of variables that describe the canopy height distribution (the 10th, 25th, 50th, 75th, and 90th height percentiles; maximum height; mean height; and coefficient of variation of height)
Table 2. LIDAR system specification for three study sites.

| Study site | LIDAR system | Flying height (m) | Flying speed (m/s) | Swath width (m) | Laser pulse density (points/m²) | Beam footprint (diameter, cm) |
|---|---|---|---|---|---|---|
| CF | SAAB TOPEYE | 200 | 25 | 70 | 4 | 40 |
| MC | OPTECH ALTM 30/70 kHz LIDAR | 1,200 | 50 | 300 | >4 | 84 |
| KE | OPTECH ALTM 30/70 kHz LIDAR | 1,200 | 50 | 300 | >4 | 84 |

Flying height is aboveground level height. CF, Capitol Forest; MC, Mission Creek; KE, Kenai Peninsula.
were calculated from the laser canopy hits for each field plot. In addition, the canopy point density was calculated as the percentage of first returns that were canopy hits (the number of first-return canopy hits divided by the total number of first returns, both canopy hits and ground hits). At the Kenai Peninsula study site, each field plot is the sum of four 1/24-ac subplots, and LIDAR metrics were calculated at the plot level (i.e., four subplots) instead of the subplot level. The list of plot-level LIDAR metrics was then merged with the plot-level field-based aboveground biomass estimates and imported into the R statistical analysis software.
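The metric derivation described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the processing chain used in the study: the function name `lidar_metrics`, its arguments, and the choice to return canopy point density as a fraction (0 to 1) rather than a percentage are all hypothetical, and the terrain height under each return is assumed to have been interpolated from the DTM beforehand.

```python
import numpy as np

def lidar_metrics(z, z_ground, is_first, min_height=2.0):
    """Plot-level canopy metrics from LIDAR returns (illustrative sketch).

    z         : return elevations (m)
    z_ground  : DTM elevation under each return (m), assumed precomputed
    is_first  : True where the return is a first return
    """
    rel = np.asarray(z, dtype=float) - np.asarray(z_ground, dtype=float)
    is_first = np.asarray(is_first, dtype=bool)
    canopy = rel >= min_height          # exclude ground hits, stones, shrubs
    h = rel[canopy]                     # heights of laser canopy hits

    metrics = {f"p{q}": float(np.percentile(h, q)) for q in (10, 25, 50, 75, 90)}
    metrics["maxht"] = float(h.max())
    metrics["meanht"] = float(h.mean())
    metrics["cv"] = float(h.std(ddof=1) / h.mean())   # coefficient of variation
    # Canopy point density: first-return canopy hits / all first returns.
    # Returned as a fraction here (an assumption about the scale).
    metrics["d"] = float((canopy & is_first).sum() / is_first.sum())
    return metrics
```

For a plot with relative heights 0.5, 1, 3, 5, 7, and 9 m, the four returns at or above 2 m are treated as canopy hits, so the mean height is 6 m and the maximum height is 9 m.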
LIDAR Metrics Selection Methods
Stepwise Regression
Multiple linear regression models, which include all extracted LIDAR metrics as predictor variables, were first applied for each study area. Based on residual plots and variable transformations suggested by the alternating conditional expectations method (Raftery and Richardson 1996), logarithm-transformed forest biomass was used as the dependent variable. Standard backward stepwise regression then was conducted, and the best-fitting models were selected based on the lowest Akaike information criterion value.
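As a rough illustration of backward stepwise selection by AIC, the sketch below starts from the full model and repeatedly drops the predictor whose removal lowers the AIC the most, stopping when no removal helps. It is a minimal NumPy version written for this explanation, not the R procedure used in the study; the function names are hypothetical, and the AIC is computed up to an additive constant as n log(RSS/n) + 2k for a Gaussian linear model with k fitted coefficients.

```python
import numpy as np

def aic_ols(X, y):
    """AIC (up to a constant) of an ordinary least squares fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def backward_stepwise(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    keep = list(range(X.shape[1]))
    best = aic_ols(X[:, keep], y)
    while keep:
        trials = [(aic_ols(X[:, [j for j in keep if j != drop]], y), drop)
                  for drop in keep]
        aic, drop = min(trials)                   # best single removal
        if aic >= best:                           # no removal improves AIC
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best
```

In the setting of this study, `y` would hold the log-transformed plot biomass and the columns of `X` the candidate LIDAR metrics.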
BMA
BMA is a Bayesian method that averages over all possible combinations of independent variables and accounts for uncertainty about model form and assumptions (Raftery et al. 2005). Under BMA, all possible models are considered and predictor variables are selected based on their posterior probability. The posterior distribution of a predictor variable is a weighted average of its posterior distribution under each of the models considered, where a model's weight is equal to the posterior probability that it is correct, given that one of the models considered is correct. This method avoids the problem that the selected model depends on the order in which variable selection and outlier identification are performed. Suppose we have data D and we want to make inference about an unknown quantity x. If there are p possible predictors in the regression model, the number of models K could be quite large (as many as $2^p$). The BMA posterior distribution of x is

$$P(x \mid D) = \sum_{k=1}^{K} P(x \mid D, M_k)\, P(M_k \mid D),$$

where $P(x \mid D, M_k)$ is the posterior distribution of x given the model $M_k$, and $P(M_k \mid D)$ is the posterior probability that $M_k$ is the correct model, given that one of the models considered is correct. The posterior model probability is given by

$$P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)},$$

where $P(D \mid M_k)$ is the integrated likelihood of model $M_k$, which can be approximated using the Bayesian information criterion (BIC): $\mathrm{BIC}_k = n \log(1 - R_k^2) + p_k \log(n)$, where $R_k^2$ is the value of $R^2$ and $p_k$ is the number of predictors for the kth regression model, and n is the sample size (Raftery et al. 1997). The sum over all models is approximated by finding the models with the highest posterior probability
using the fast leaps and bounds algorithm. To select both LIDAR metrics and models at the same time, BMA was used and the model with the highest posterior model probability was selected.
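The BIC approximation to the posterior model probabilities can be illustrated with a small sketch. Two simplifications are assumed here and are not taken from the study: every subset of predictors is enumerated exhaustively instead of using the leaps and bounds search, and all models are given equal prior probability, so that the posterior probability of each model is proportional to exp(-BIC/2). The function names are hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

def bma_bic(X, y):
    """Posterior model and inclusion probabilities from the BIC approximation."""
    n, p = X.shape
    models, bic = [], []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            r2 = r_squared(X[:, list(subset)], y) if subset else 0.0
            models.append(subset)
            # BIC_k = n * log(1 - R_k^2) + p_k * log(n); the null model has BIC 0
            bic.append(n * np.log(1.0 - r2) + size * np.log(n))
    bic = np.array(bic)
    w = np.exp(-0.5 * (bic - bic.min()))      # equal prior P(M_k) assumed
    post = w / w.sum()                        # P(M_k | D)
    # Posterior inclusion probability of each predictor
    incl = np.array([sum(post[k] for k, m in enumerate(models) if j in m)
                     for j in range(p)])
    return models, post, incl
```

The model with the largest entry in `post` corresponds to the "highest posterior model probability" choice described above, and `incl` summarizes how strongly each metric is supported across all models. Exhaustive enumeration is practical only for a small number of candidate metrics, since the number of models grows as 2^p.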
PCA
PCA describes the variation of a set of multivariate data in terms of a set of uncorrelated variables, each of which is a particular linear combination of the original variables. The first principal component accounts for as much variation of the original data as possible, the second component is chosen to account for as much remaining variation as possible subject to being uncorrelated with the first component, and so on (Everitt and Dunn 2001). Using PCA, a subset of variables that explain the majority of variation can be selected from a large set of (possibly highly correlated) predictor variables. The procedure is as follows: (1) decide how much of the total variation contained in the original variables needs to be accounted for, where values between 70 and 90% are usually suggested (Jolliffe 1972); (2) find the number of components that explain such variation (this number indicates the effective dimensionality of the data and is the size of the subset of original variables to be retained); and, finally, (3) select original variables, one associated with each component, as the one not already chosen that has the greatest absolute coefficient value on the component. PCA was used to select LIDAR metrics from the pool of available LIDAR metrics: maximum height; mean height; 10th, 25th, 50th, 75th, and 90th height percentiles; coefficient of variation of height; and the canopy point density. The minimal variation that needed to be explained was set to 95%. Two kinds of principal component regression models were developed. One used the most significant principal components as predictor variables (denoted as PCA_1), and the other used the LIDAR metrics selected from PCA as predictor variables (denoted as PCA_2). Separate aboveground biomass regression models were developed using the selection methods described previously for each study site, as well as common models using the three study sites simultaneously.
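The three-step selection procedure can be sketched as follows. The sketch assumes that PCA is carried out on the correlation matrix of the metrics (the text does not state whether the correlation or covariance matrix was used), and the function name `pca_select` and its arguments are hypothetical.

```python
import numpy as np

def pca_select(X, names, threshold=0.95):
    """Pick one original variable per retained principal component."""
    corr = np.corrcoef(X, rowvar=False)           # correlation-matrix PCA (assumed)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]              # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Steps 1-2: smallest number of components reaching the variance threshold
    explained = np.cumsum(eigval) / eigval.sum()
    k = min(int(np.searchsorted(explained, threshold)) + 1, X.shape[1])

    # Step 3: for each component, take the not-yet-chosen variable with the
    # largest absolute coefficient on that component
    chosen = []
    for c in range(k):
        ranked = np.argsort(-np.abs(eigvec[:, c]))
        pick = next(int(j) for j in ranked if int(j) not in chosen)
        chosen.append(pick)
    return [names[j] for j in chosen], explained[:k]
```

A PCA_1-style model would regress log biomass on the first k component scores, whereas a PCA_2-style model would regress it on the selected original metrics returned by this function.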
Results
LIDAR Metrics Selected by PCA
PCA indicated that the first three principal components accounted for more than 95% of the total variation contained in the original set of LIDAR metrics. This is true for the three individual study sites and the combined data set. Specifically, the first three principal components explained 98.5, 96.0, 97.6, and 98.6% of the total variation contained in the original LIDAR metrics for the Capitol Forest, Mission Creek, and Kenai Peninsula study sites and the combined data set, respectively. Based on the criteria we set for variable selection, this means that only three original LIDAR metrics are needed to explain the majority of the variation contained in the LIDAR data. The coefficients defining
Table 3. Correlation between principal components and original LIDAR metrics.
Study site PC1 PC2 PC3 PC4 PC5
CF Maxht -0.368 -0.221 -0.174 0.247 0.595 Meanht -0.392 0.373 -0.468 CV 0.133 -0.776 0.720 -0.353 P10 -0.298 0.511 -0.491 -0.486 P25 -0.385 -0.155 -0.132 P50 -0.388 -0.110 P75 -0.383 -0.170 0.152 P90 -0.378 -0.206 -0.146 -0.979 0.128 D
MC Maxht 0.334 -0.357 -0.120 -0.473 0.662 Meanht -0.434 -0.158 -0.637 CV -0.696 -0.218 P10 -0.179 0.546 0.361 0.493 P25 0.360 0.325 0.293 P50 -0.427 0.109 -0.331 P75 -0.427 -0.143 -0.343 P90 -0.405 -0.216 -0.975 0.184 D
KE -0.313 0.789 Maxht -0.330 -0.391 0.167 Meanht -0.391 -0.232 -0.486 -0.788 CV -0.686 -0.305 P10 -0.343 0.334 -0.212 P25 -0.372 0.244 0.276 P50 -0.386 0.112 0.114 0.406 P75 -0.386 0.243 -0.169 P90 -0.378 -0.187 -0.971 D -0.191
Combined 0.709 '0.350 Maxht -0.351 -0.147 Meanht -0.365 -0.843 -0.365 -0.275 CV 0.274 P10 -0.342 -0.135 0.269 -0.869 -0.566 P25 -0.363 -0.113 0.188 -0.231 P50 -0.364 -0.164 0.192 P75 -0.362 -0.227 0.158 0.110 P90 -0.360 -0.101 -0.105 0.167 0.974 D
CF, Capitol Forest; MC, Mission Creek; KE, Kenai Peninsula; Maxht, maximum height; Meanht,
the nine principal components with the original LIDAR metrics are shown in Table 3. These coefficients were scaled so that they represent correlations between LIDAR metrics and the principal components. For all three study sites, mean height had the largest absolute correlation with the first principal component, coefficient of variation of height had the largest absolute correlation with the second principal component, and canopy point density had the largest absolute correlation with the third principal component. Therefore, mean height, coefficient of variation of height, and canopy point density explain most of the variation in the original LIDAR metrics set, and they were selected as the most predictive variables for regression model PCA_2. After combining the three study sites, mean height, canopy point density, and the coefficient of variation of height were selected again as the most predictive variables, but their order is slightly different from that for the individual sites (Table 3). For the individual study sites, coefficient of variation of height had the largest absolute correlation with the second principal component and canopy point density had the largest absolute correlation with the third principal component, whereas for the combined data set, the coefficient of variation of height had the largest absolute correlation with the third principal component and canopy point density had the largest absolute correlation with the second principal component.
PC6 PC7 PC8 PC9 0.600 -0.141 0.120
-0.443 0.785
0.515 0.271 0.129 -0.124 -0.189 -0.751 -0.227 -0.380 0.795 0.182 -0.394 -0.381 0.594 -0.315 -0.429 -0.279
0.119 0.883 0.627 0.408 0.246 0.247 -0.158 0.580 -0.135 -0.313 0.604 0.400 -0.301 -0.797 -0.141 -0.162 -0.623 0.423 -0.271
0.894 -0.274 0.100 0.341 0.256 -0.123 -0.597 -0.447 0.427 -0.111 -0.412 0.264 -0.654 -0.292 0.171 0.506 0.585 -0.193 0.493 -0.629 -0.157 -0.249 0.492
0.906
-0.165 0.611 0.304 -0.142 -0.220 -0.677 0.505 -0.217 -0.397 -0.195 -0.766 -0.129 -0.437 0.631 0.362 -0.244
mean height.
Model Comparisons
Table 4 lists final biomass models selected by the different variable selection methods for the individual study sites and the combined study sites. All models have high R² values ranging from 0.67 to 0.88. R² values at the Kenai site were lower than those at the Mission Creek site, which were a little lower than those at the Capitol Forest site. Within each study site, stepwise models had slightly higher R² values than BMA and PCA models, which means that stepwise models explained slightly more variation in aboveground biomass than BMA models and models from PCA (PCA_1 and PCA_2). BMA models explained almost the same amount of variation as models containing the first three principal components (PCA_1) and models containing only mean height, coefficient of variation of height, and canopy point density (PCA_2). Despite the similar R² values within each study site, the number of LIDAR metrics selected by different statistical methods was different, and stepwise models tended to contain more LIDAR metrics than BMA and PCA models. Canopy point density was the only LIDAR metric selected by stepwise, BMA, and PCA_2 models for all three study sites. The coefficient estimates of canopy point density were consistent (i.e., approximately the same) within each study site but not consistent
Table 4. Final regression models from different statistical methods.

| Study site | Method | Final model | No. of predictor variables | R² | Residual standard deviation (back-transformed, kg/ha) |
|---|---|---|---|---|---|
| CF | Step | LN(Bio) = 9.50 + 0.097 * meanht + 1.47 * cv - 0.05 * p90 + 2.42 * d | 4 | 0.88 | 58,918 (19%) |
| CF | BMA | LN(Bio) = 9.97 + 0.03 * p25 + 2.39 * d | 2 | 0.87 | 59,946 (21%) |
| CF | PCA_1 | LN(Bio) = 12.46 - 0.02 * p1 + 0.01 * p2 - 0.61 * p3 | 3 | 0.87 | 55,862 (18%) |
| CF | PCA_2 | LN(Bio) = 9.88 + 0.04 * meanht + 0.03 * cv + 2.35 * d | 3 | 0.87 | 57,003 (19%) |
| MC | Step | LN(Bio) = 7.97 - 0.03 * maxht + 0.47 * meanht + 4.73 * cv - 0.10 * p25 - 0.23 * p75 + 1.89 * d | 6 | 0.76 | 39,338 (28%) |
| MC | BMA | LN(Bio) = 8.83 + 0.05 * meanht + 2.29 * cv + 1.85 * d | 3 | 0.74 | 42,424 (31%) |
| MC | PCA_1 | LN(Bio) = 11.70 - 1.0 * p1 - 0.09 * p2 - 0.39 * p3 | 3 | 0.73 | 42,242 (31%) |
| MC | PCA_2 | LN(Bio) = 8.83 + 0.05 * meanht + 2.29 * cv + 1.85 * d | 3 | 0.74 | 42,424 (31%) |
| KE | Step | LN(Bio) = 1.58 - 2.72 * meanht + 14.03 * cv + 1.48 * p25 + 1.48 * p75 + 2.90 * d | 5 | 0.70 | 17,709 (35%) |
| KE | BMA | LN(Bio) = 2.83 + 8.70 * cv + 0.25 * p75 + 2.70 * d | 3 | 0.69 | 16,352 (32%) |
| KE | PCA_1 | LN(Bio) = 9.89 - 0.49 * p1 - 0.55 * p2 - 0.41 * p3 | 3 | 0.67 | 16,695 (33%) |
| KE | PCA_2 | LN(Bio) = 2.41 + 0.34 * meanht + 9.36 * cv + 2.61 * d | 3 | 0.68 | 15,523 (31%) |
| Combined | Step | LN(Bio) = 5.49 + 0.42 * meanht + 5.18 * cv - 0.66 * p50 + 0.66 * p75 - 0.30 * p90 + 2.98 * d | 6 | 0.75 | 45,724 (28%) |
| Combined | BMA | LN(Bio) = 5.49 + 0.42 * meanht + 5.18 * cv - 0.66 * p50 + 0.66 * p75 - 0.30 * p90 + 2.98 * d | 6 | 0.75 | 45,724 (28%) |
| Combined | PCA_1 | LN(Bio) = 11.23 - 0.42 * p1 + 0.70 * p2 - 0.(A * p3 | 3 | 0.71 | 43,220 (26%) |
| Combined | PCA_2 | LN(Bio) = 5.64 + 0.11 * meanht + 5.66 * cv + 3.14 * d | 3 | 0.72 | 42,850 (26%) |

PCA_1 denotes regression models using the first three principal components (p1, p2, and p3) as predictor variables. The number inside the parentheses is the back-transformed standard deviation as a percentage of mean biomass at the original scale. CF, Capitol Forest; MC, Mission Creek; KE, Kenai Peninsula.
across study sites. For the other LIDAR metrics selected, the coefficient estimates were very different across different selection methods and different study sites (Table 4), which indicated that the common model using the combined data set was not good enough to capture individual variation within each study site. Across different study sites, PCA_2 models contain the same set of LIDAR metrics: mean height, coefficient of variation of height, and canopy point density. Figure 2 shows LIDAR-based biomass predictions from PCA_2 models versus field-based biomass estimates for the three separate models and the common model from the combined data set. As indicated in Figure 2, overall model fit was good for both the separate models and the common model, as the relationship was not far from the 1:1 line. However, the coefficient estimates for these three LIDAR metrics were very different across study sites. The coefficient estimate of mean height was 0.04 at Capitol Forest, 0.05 at Mission Creek, 0.34 at Kenai Peninsula, and 0.11 for the combined data set. The coefficient estimate of the coefficient of variation of height was 0.03 at Capitol Forest, 2.29 at Mission Creek, 9.36 at Kenai Peninsula, and 5.66 for the combined data set. Finally, for the canopy point density, the coefficient estimate was 2.35 at Capitol Forest, 1.85 at Mission Creek, 2.61 at Kenai Peninsula, and 3.14 for the combined data set. Back-transformed SD is also shown in Table 4, along with the percentage of back-transformed SD divided by average field-based biomass at the original scale. At the original scale, SD from fitted models was about 20% of the mean biomass at the Capitol Forest
228 W1 v]. J. Api'i. FOR. 23(4) 2008
site, 30% at the Mission Creek site, 33% at the Kenai Peninsula site, and 28% for the three sites combined.
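The spread in the reported coefficient estimates can be summarized with a few lines of arithmetic. The sketch below is purely descriptive and is not part of the original analysis: it copies the PCA_2 coefficient estimates quoted in the text and, for each metric, reports the range across the three sites, the ratio of the largest to the smallest site estimate, and whether the combined-data estimate falls inside the site range. The dictionary keys and function names are hypothetical labels chosen for illustration, and the ratios depend on the scale of each predictor, so they should not be compared across metrics.

```python
# Coefficient estimates for the PCA_2 models, copied from the text.
# CF = Capitol Forest, MC = Mission Creek, KE = Kenai Peninsula.
# The key names are illustrative labels, not names used in the paper.
COEFFICIENTS = {
    "mean_height": {"CF": 0.04, "MC": 0.05, "KE": 0.34, "combined": 0.11},
    "cv_height": {"CF": 0.03, "MC": 2.29, "KE": 9.36, "combined": 5.66},
    "canopy_point_density": {"CF": 2.35, "MC": 1.85, "KE": 2.61, "combined": 3.14},
}

SITES = ("CF", "MC", "KE")


def site_spread(estimates):
    """Return (min, max, max/min ratio) over the three site-specific estimates."""
    values = [estimates[site] for site in SITES]
    low, high = min(values), max(values)
    return low, high, high / low


def combined_within_site_range(estimates):
    """True if the combined-data estimate lies between the site minimum and maximum."""
    low, high, _ = site_spread(estimates)
    return low <= estimates["combined"] <= high


if __name__ == "__main__":
    for metric, estimates in COEFFICIENTS.items():
        low, high, ratio = site_spread(estimates)
        inside = combined_within_site_range(estimates)
        print(f"{metric}: site range {low} to {high}, "
              f"max/min ratio {ratio:.1f}, combined inside range: {inside}")
```

With the values as quoted, the canopy point density estimate for the combined data set (3.14) lies above all three site-specific estimates, while the combined estimates for the other two metrics fall between the site extremes.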
Discussion
As expected, there is a significant relationship between field-based aboveground biomass estimates and LIDAR metrics at our three study sites. The biological basis for this relationship lies in the ecological and biomechanical links between canopy vertical structure and forest stand structure parameters. From the perspective of tree form and function development, differences in vertical canopy structure are usually connected to differences in forest biomass, both through forest succession and across areas with contrasting environmental conditions. For example, Larson (1963) reported that crown geometry and crown position exert considerable control over bole form and the vertical distribution of stem increment. LIDAR sensors directly measure 3D characteristics of forest canopy structure, which provides a good foundation for high correlations between LIDAR metrics and forest biomass. However, trees might develop different stem and crown shape relationships across different environmental conditions and geological regions, even within the same species. This might explain why model coefficient estimates differed across the three study sites. In this study, mean height, coefficient of variation of height, and canopy point density were selected by PCA as the most predictive variables, in the same order, for all three LIDAR data sets tested, and biomass models developed using these three metrics had high R² values.

Figure 2. Results of plot-level LIDAR-based estimation of aboveground biomass for Capitol Forest (CF), Mission Creek (MC), Kenai Peninsula (KE), and the combined data set with mean height, coefficient of variation of height, and canopy point density as predictor variables. The line is the 1:1 line. Panels: CF, MC, KE, and Combined; horizontal axis: Field LN(Biomass).

From a resource management standpoint, these kinds of LIDAR-based forest structure models are analogous to the aerial stand volume tables that have long been widely used in forest inventory. Aerial stand volume tables present (in tabular form) the relationship between stand volume and forest structure variables easily estimated from aerial photos, often mean tree height and percent canopy cover (Paine and Kiser 2003). Because aerial photos are passively sensed, these methods cannot account for variation in stand volume associated with the vertical structure of the canopy. Previous studies have indicated that crown ratio, defined as the ratio of crown length to total stem length, is an important indicator of the growth history of the tree and significantly influences the allometric scaling between foliage and wood biomass (Makela and Valentine 2006). The 3D forest structure information provided by LIDAR has the potential to provide reliable estimates of variability in canopy vertical structure. The most predictive LIDAR metric set found in this study (mean height, coefficient of variation of height, and canopy point density) is consistent with the mean tree height and percent canopy cover used in aerial stand volume tables, while the third variable, coefficient
of variation of height, is a measure of canopy vertical variation. Because most LIDAR returns are from the dominant trees, especially from their outer canopy, the distribution of LIDAR return heights is weighted toward the tallest trees. As a result, the LIDAR mean height likely represents the height of the overstory trees. Field-derived forest stand structure parameters, on the other hand, are calculated using all trees in the plot. Including the coefficient of variation of height therefore helps to account for intermediate tree crowns in the overstory and suppressed trees in the understory. Within each study site, LIDAR canopy structure information summarized by mean height, coefficient of variation of height, and canopy point density explained a similar amount of variation as the other models. The predictive ability of these three LIDAR metrics for forest biomass is good across all three forest types, which indicates that the combination of mean height, coefficient of variation of height, and canopy point density represents a sufficient and concise quantitative description of canopy structure and therefore provides a good representation of stand structure characteristics. Models using these three LIDAR metrics likely capture the fundamental allometric relationships between foliage volume and stem biomass. This finding is consistent with results from large-footprint SLICER data (Lefsky et al. 2005a), in which mean height, cover or leaf area index, and height variability were found to explain most of the variability in forest physical characteristics.

After our three data sets were combined, mean height, coefficient of variation of height, and canopy point density were again found to explain the majority of variation. However, the coefficients from the combined model differed from those of the individual models, which suggests that a general model representing all study sites may produce more bias at each individual site than a model developed for that specific site. Additional model sensitivity analysis is needed to confirm this. In comparison with the stepwise and BMA models, models containing mean height, coefficient of variation of height, and canopy point density (PCA_2) explained similar levels of variation in aboveground biomass, but the PCA_2 models are relatively simple in form and have a clear biological interpretation. The straightforward prediction models described in this study will greatly facilitate the application of LIDAR to practical forest inventory and management.
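To make the three predictor variables concrete, the following is a minimal sketch of how they might be computed from the return heights of a single plot. It is not the authors' processing chain. Several choices are assumptions made only for illustration: the function name `lidar_plot_metrics`, the use of all returns (rather than, say, first returns only) for the mean and the coefficient of variation, the definition of canopy point density as the fraction of returns above a height cutoff, and the 2 m default for that cutoff.

```python
import statistics


def lidar_plot_metrics(return_heights_m, canopy_threshold_m=2.0):
    """Summarize one plot's LIDAR return heights (meters above ground).

    Assumptions (not taken from the paper): all returns are used, and
    canopy point density is the fraction of returns above
    canopy_threshold_m, a hypothetical cutoff.
    """
    if len(return_heights_m) < 2:
        raise ValueError("need at least two returns to compute a standard deviation")

    mean_height = statistics.mean(return_heights_m)
    if mean_height <= 0:
        raise ValueError("mean height must be positive to compute a coefficient of variation")

    # Coefficient of variation: sample standard deviation divided by the mean.
    cv_height = statistics.stdev(return_heights_m) / mean_height

    # Fraction of returns that came from above the canopy cutoff.
    canopy_returns = sum(1 for h in return_heights_m if h > canopy_threshold_m)
    canopy_point_density = canopy_returns / len(return_heights_m)

    return {
        "mean_height": mean_height,
        "cv_height": cv_height,
        "canopy_point_density": canopy_point_density,
    }


if __name__ == "__main__":
    # Hypothetical return heights for one plot, for illustration only.
    heights = [0.0, 0.5, 3.0, 12.0, 18.0, 22.0, 24.0, 25.0]
    for name, value in lidar_plot_metrics(heights).items():
        print(f"{name}: {value:.3f}")
```

Because the mean is pulled toward the many returns from tall crowns, a plot with a dense overstory and a sparse understory yields a high mean height and a low coefficient of variation under this sketch, which matches the interpretation given in the text.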
Conclusions
This study used LIDAR return data from three different forest types to explore LIDAR metric selection and LIDAR-based model interpretation. Mean height, coefficient of variation of height, and canopy point density were found to explain the majority of variation in the LIDAR structural metrics. The high predictive ability of these three LIDAR metrics was illustrated by predicting forest biomass for three different forest types. A comparison of models using these three metrics with models developed using standard stepwise regression and BMA indicates that compressing a large number of LIDAR metrics into a small set of variables could be an efficient way of developing LIDAR-based allometric forest biomass models. Analysis of the combined LIDAR data from the three sites confirmed that mean height, coefficient of variation of height, and canopy point density were the most predictive LIDAR metrics. However, the coefficient estimates in the common models differed from those in the models developed within each individual study site. Therefore, even though similar predictor variables are used, models developed for each site may require individual parameterization.

Characterizing forest stand structure requires a remote measurement of canopy structure that is rapid, reproducible, and has a spatial resolution commensurate with the scale of structural variation, because existing ground-based approaches are slow, inexact, or highly averaged spatially. As a rapidly growing remote sensing technology, LIDAR offers great potential to capture detailed 3D canopy information rapidly. Findings from this study indicate that it is possible to develop straightforward regression models for different forest types using three primary LIDAR metrics: mean height, coefficient of variation of height, and canopy point density. If this holds for a wide range of forest types and LIDAR systems, the operational use of LIDAR for forest inventory may become common in the future.
Literature Cited
ACKERMANN, F. 1999. Airborne laser scanning-present status and future expectations. ISPRS J. Photogramm. Remote Sens. 54:64-67.
ALEMDAG, I.S. 1984. Total tree and merchantable stem biomass equations for Ontario hardwoods. Rep. PI-X-46, Can. For. Serv., Petawawa National Forestry Institute, Chalk River, ON, Canada. 54 p.
ANDERSEN, H.-E., R.J. MCGAUGHEY, AND S.E. REUTEBUCH. 2005. Estimating forest canopy fuel parameters using LIDAR data. Remote Sens. Environ. 94:441-449.
CURTIS, R., D. MARSHALL, AND D. DEBELL (EDS.). 2004. Silvicultural options for young-growth Douglas-fir forests: The Capitol Forest study-Establishment and first results. US For. Serv. Gen. Tech. Rep. PNW-GTR-598, Pacific Northwest Research Station, Portland, OR. 110 p.
DRAKE, J.B., R.G. KNOX, R.O. DUBAYAH, D.B. CLARK, R. CONDIT, J.B. BLAIR, AND M. HOFTON. 2003. Above-ground biomass estimation in closed canopy neotropical forests using lidar remote sensing: Factors affecting the generality of relationships. Global Ecol. Biogeogr. 12:147-159.
EVERITT, B.S., AND G. DUNN. 2001. Applied multivariate data analysis. Arnold, London. 342 p.
HALL, S.A., I.C. BURKE, D.O. BOX, M.R. KAUFMANN, AND J.M. STOKER. 2005. Estimating stand structure using discrete-return lidar: An example from low density, fire prone ponderosa pine forests. For. Ecol. Manag. 208:189-209.
HOLMGREN, J. 2004. Prediction of tree height, basal area, and stem volume in forest stands using airborne laser scanning. Scand. J. For. Res. 19:543-553.
HUDAK, A.T., N.L. CROOKSTON, J.S. EVANS, M.J. FALKOWSKI, A.S. SMITH, P.E. GESSLER, AND P. MORGAN. 2006. Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral satellite data. Can. J. Remote Sens. 32:1-13.
JOLLIFFE, I.T. 1972. Discarding variables in a principal component analysis. I: Artificial data. Appl. Stat. 21:160-173.
LARSON, P.R. 1963. Stem form development of forest trees. For. Sci. Monogr. 5. 42 p.
LEFSKY, M.A., W.B. COHEN, S.A. ACKER, G.G. PARKER, T.A. SPIES, AND D. HARDING. 1999. Lidar remote sensing of the canopy structure and biophysical properties of Douglas-fir western hemlock forests. Remote Sens. Environ. 70:339-361.
LEFSKY, M.A., W.B. COHEN, G.G. PARKER, AND D.J. HARDING. 2002. Lidar remote sensing for ecosystem studies. Bioscience 52(1):19-30.
LEFSKY, M.A., A.T. HUDAK, W.B. COHEN, AND S.A. ACKER. 2005a. Patterns of covariance between forest stand and canopy structure in the Pacific Northwest. Remote Sens. Environ. 95:517-531.
LEFSKY, M.A., A.T. HUDAK, W.B. COHEN, AND S.A. ACKER. 2005b. Geographic variability in lidar predictions of forest stand structure in the Pacific Northwest. Remote Sens. Environ. 95:532-548.
LIM, K.S., AND P.M. TREITZ. 2004. Estimation of above ground forest biomass from airborne discrete return laser scanner data using canopy-based quantile estimators. Scand. J. For. Res. 19:558-570.
LOLLEY, M.R. 2005. Wildland fuel conditions and effects of modeled fuel treatments on wildland fire behavior and severity in dry forests of the Wenatchee Mountains. MS thesis, Univ. of Washington, Seattle, WA. 144 p.
MAKELA, A., AND H.T. VALENTINE. 2006. Crown ratio influences allometric scaling in trees. Ecology 87:2967-2972.
MALTAMO, M., P. PACKALEN, X. YU, K. EERIKAINEN, J. HYYPPÄ, AND J. PITKÄNEN. 2004. Identifying and quantifying heterogeneous boreal forest structures using laser scanner data. P. 153-156 in Proc. of the ISPRS working group VIII/2 laser-scanners for forest and landscape assessment, Freiburg, Germany, Oct. 3-6, 2004. Thies, M., B. Koch, H. Spiecker, and H. Weinacker (eds.). Institute for Forest Growth, Institute for Remote Sensing and Landscape Information Systems, Albert Ludwigs University, Freiburg, Germany.
MANNING, G.H., M.R.C. MASSIE, AND J. RUDD. 1984. Metric single-tree weight tables for the Yukon Territory. Inf. Rep. BC-X-250, Can. For. Serv., Pacific Forest Research Centre, Victoria, BC, Canada. 60 p.
MEANS, J.E., A.H. HEATHER, J.K. GREG, B.A. PAUL, AND W.K. MARK. 1994. Software for computing plant biomass-BIOPAK users guide. US For. Serv. Gen. Tech. Rep. PNW-GTR-340, Pacific Northwest Res. Stn., Portland, OR. 180 p.
MOFFIET, T., K. MENGERSEN, C. WITTE, R. KING, AND R. DENHAM. 2005. Airborne laser scanning: Exploratory data analysis indicates potential variables for classification of individual trees or stands according to species. ISPRS J. Photogramm. Remote Sens. 59:289-309.
NÆSSET, E. 2002. Predicting forest stand characteristics with airborne laser using a practical two-stage procedure and field data. Remote Sens. Environ. 80:88-99.
NÆSSET, E. 2004. Practical large-scale forest stand inventory using a small-footprint airborne scanning laser. Scand. J. For. Res. 19:164-179.
NÆSSET, E., T. GOBAKKEN, J. HOLMGREN, H. HYYPPÄ, J. HYYPPÄ, M. MALTAMO, M. NILSSON, H. OLSSON, A. PERSSON, AND U. SODERMAN. 2004. Laser scanning of forest resources: The Nordic experience. Scand. J. For. Res. 19:482-499.
NÆSSET, E., O.M. BOLLANDSÅS, AND T. GOBAKKEN. 2005. Comparing regression methods in estimation of biophysical properties of forest stands from two different inventories using laser scanner data. Remote Sens. Environ. 94:541-553.
NI-MEISTER, W., D.B. JUPP, AND R. DUBAYAH. 2001. Modeling lidar waveforms in heterogeneous and discrete canopies. IEEE Trans. Geosci. Remote Sens. 39:1943-1958.
OLIVER, C.D., AND B.C. LARSON. 1996. Forest stand dynamics. McGraw-Hill, New York. 467 p.
PAINE, D.P., AND J.D. KISER. 2003. Aerial photography and image interpretation. Wiley, Hoboken, NJ. 632 p.
RAFTERY, A.E., AND S. RICHARDSON. 1996. Model selection for generalized linear models via GLIB, with application to epidemiology. P. 321-354 in Bayesian biostatistics, Berry, D.A., and D.K. Stangl (eds.). Marcel Dekker, New York.
RAFTERY, A.E., D. MADIGAN, AND J.A. HOETING. 1997. Bayesian model averaging for regression models. J. Am. Stat. Assoc. 92:179-191.
RAFTERY, A., I. PAINTER, AND C. VOLINSKY. 2005. BMA: An R package for Bayesian model averaging. R News 5:2-8.
SHAW, D.L. 1979. Biomass equations for Douglas-fir, western hemlock, and red cedar in Washington and Oregon. P. 763-781 in Proc. of the Forest Resource Inventories Workshop, Fort Collins, CO, July 23-26, 1979. Frayer, W.E. (ed.). Colorado State University, Fort Collins, CO.
SINGH, T. 1984. Biomass equations for six major tree species of the Northwest Territories. Inf. Rep. NOR-X-257, Can. For. Serv., Northern Forest Research Centre, Edmonton, AB, Canada. 21 p.
US FOREST SERVICE. 2003a. Field procedures for the coastal Alaska inventory. Available online at www.fs.fed.us/pnw/fia/local-resources/pdf/field_manuals/ak/2003_coak_field_manual.pdf; last accessed Apr. 3, 2008.
US FOREST SERVICE. 2003b. BIOPAK software package for biomass calculation. Available online at www.fsl.orst.edu/lter/data/tools/software/biopak.cfm?topnav=149; last accessed Apr. 2008.