
PERMAFROST AND PERIGLACIAL PROCESSES Permafrost and Periglac. Process. 15: 327–338 (2004) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/ppp.482

Generalized Linear Modelling in Periglacial Studies: Terrain Parameters and Patterned Ground
Miska Luoto1* and Jan Hjort2

1 CSIRO, Sustainable Ecosystems, Private Bag 5, Wembley 6914 WA, Australia
2 Department of Geography, University of Helsinki, PO Box 64, FIN-00014, Finland

ABSTRACT
Generalized linear models (GLMs) are mathematical extensions of linear models. GLMs are more flexible than classical Gaussian approaches such as least-squares regression and are better suited for analysing relationships in spatial data, which such approaches often represent poorly. This paper demonstrates GLM model-building procedures step by step for the distribution and abundance of active patterned ground in northern Finland. The exercise is based on data from an area of 200 km² (800 modelling squares of 0.25 km²). Both the distribution and abundance models clearly indicate an increasing activity of patterned ground with increasing (1) soil moisture and (2) proportion of concave topography. Activity decreases with increasing altitude. We conclude that GLM techniques combined with a geographic information system can play an important role in analysing and modelling periglacial data sets. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: patterned ground; generalized linear model; logistic regression; GIS; Finland

* Correspondence to: M. Luoto, Research Programme for Biodiversity, Finnish Environment Institute, PO Box 140, FIN-00251, Finland. E-mail: miska.luoto@ymparisto.fi
Received 6 May 2003; Revised 20 January 2004; Accepted 27 January 2004

INTRODUCTION
The analysis and modelling of geomorphological processes and landforms is a central issue in periglacial research. With the rise of statistical techniques and geographic information system (GIS) tools, the development of spatial models has increased rapidly in geomorphology (Walsh et al., 1998). For example, recent studies have produced models relating permafrost distribution and periglacial landforms and processes to terrain parameters within a GIS environment (Etzelmüller et al., 2001; Gruber and Hoelzle, 2001; Harris et al., 2001; Luoto and Seppälä, 2003). An important development has been the advance in regression analysis provided by generalized linear modelling (GLM) (Guisan, 2002). GLMs have been used in slope stability evaluation (Rowbotham and Dudycha, 1998; Dai and Lee, 2002) and fluvial process studies (Rice, 1998; Bledsoe and Watson, 2001). However, they have not yet been widely employed in periglacial studies (for exceptions, see Luoto and Seppälä, 2002b, 2003). In particular, GLM studies of patterned ground are largely lacking.

Patterned ground is one of the most characteristic features of periglacial landscapes in Finnish Lapland, especially above timberline in the zone of discontinuous permafrost (Atlas of Finland, 1986; Seppälä, 1997). The most common types are sorted and non-sorted nets. At a fine scale, local determinants such as topography, soil, hydrology and snow thickness are important (Washburn, 1979; French, 1996). At this scale, micro-topography and soil moisture control the distribution and formation of patterned ground (Matthews et al., 1998). However, despite recent progress in both theoretical and empirical understanding (Werner and Hallet, 1993; Matthews et al., 1998), the formation of patterned ground remains relatively poorly understood.

The scale between the fine and large scales, which can be called ‘the meso-scale’ (a grain size ranging from 100 × 100 m to 2 × 2 km), is poorly explored, yet it is the scale that often matches existing data sources such as geomorphological, geological and soil maps, and digital elevation models (DEMs) (Luoto and Seppälä, 2002b). Spatial modelling at the meso-scale has great potential for a wide range of periglacial studies, because it uses existing data sources and reduces the need for costly and time-consuming data gathering.

The aims of this paper are to (1) review GLM techniques, (2) demonstrate the GLM model-building procedure step by step for the distribution and abundance of active patterned ground, and (3) discuss the importance of GLM methods in periglacial research in general.

GLM
In the last few decades, one of the main modelling tools has been the ‘classical’ least-squares (LS) regression technique. However, LS models carry two implicit statistical assumptions. First, the LS approach is theoretically valid only when the response variable is normally distributed; second, the variance must not change as a function of the mean (homoscedasticity) (Trexler and Travis, 1993; Sokal and Rohlf, 1995). These assumptions are particularly problematic when modelling geographical data sets, which are often ‘zero inflated’ and non-normally distributed (Barry and Walsh, 2002).

GLMs are mathematical extensions of linear models (McCullagh and Nelder, 1989). GLMs handle nonlinear relationships and the different types of statistical distributions that characterize spatial data. They are technically closely related to traditional practices used in linear modelling and analysis of variance (ANOVA) (Guisan et al., 2002). GLMs do not force data into unnatural scales; they allow for non-linearity and non-constant variance structures in the data.
Therefore, GLMs are more flexible and better suited for analysing spatial relationships that can be poorly represented by classical Gaussian distributions (McCullagh and Nelder, 1989; Crawley, 1993). In GLMs, the combination of predictors, the linear predictor, is related to the mean of the response variable through a link function. The link function allows transformation to linearity and keeps the predictions within the range of coherent values for the response variable. By doing so, GLMs can handle distributions such as the Gaussian, Poisson, Binomial, or Gamma with respective link functions set, for example, to identity, logarithm, logit and inverse (Guisan and Zimmermann, 2000). The GLM has the form:

$$l = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k \qquad (1)$$

where $l$ is known as the linear predictor and is related to the expectation, $u$, through the link function. The link function has the form:

$$g(u) = l \qquad (2)$$

In summary, there are three main components to the GLM (Nicholls, 1991):
1) the response variables ($y_1, y_2, \dots, y_n$), which are assumed to be drawn from the same distribution (for example the Normal, Poisson or Binomial);
2) a set of parameters (in the example above, $a$ and $b_1$ to $b_k$) and a set of explanatory variables ($x_1$ to $x_k$);
3) a link function that relates the linear predictor to the expectation or the predicted value.

Formal descriptions of GLMs and the range of possible link functions can be found in McCullagh and Nelder (1989) and Crawley (1993). Examples of the use of GLMs in periglacial and permafrost studies can be found in Luoto and Seppälä (2002a, 2003).

Binary data (presence/absence, geographical distribution) are generally assessed using logistic regression methods (Collett, 1991), a form of the GLM in which the relationship is expressed as a probability surface. The expected error structure is binomial and a logit transform (logit link) is applied to the data (Rita and Ranta, 1993; Trexler and Travis, 1993; Sokal and Rohlf, 1995). This logit link means that the probability of obtaining a positive response is a logistic, s-shaped function when the linear predictor is a first-order polynomial; for second-order polynomials it will approximate a bell-shaped function (Crawley, 1993). Logistic regression has been shown to be a powerful tool, capable of analysing the effects of one or several independent variables on a binary variable (Pereira and Itami, 1991; Augustin et al., 1996; Brito et al., 1999).
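The pairing of error distributions with link functions, and the response shapes produced by the logit link, can be illustrated with a short sketch. This is a minimal illustration and not part of the original analysis: the function names and all coefficient values are hypothetical, chosen only to show that a logit link keeps predictions between 0 and 1, gives an s-shaped curve for a first-order linear predictor, and gives a bell-shaped curve for a second-order predictor with a negative quadratic term.

```python
import math

# Inverse link functions map the linear predictor back to the expected response.
# Pairings follow the text: Gaussian/identity, Poisson/log, Binomial/logit, Gamma/inverse.
INVERSE_LINKS = {
    "identity": lambda lp: lp,
    "log": lambda lp: math.exp(lp),
    "logit": lambda lp: 1.0 / (1.0 + math.exp(-lp)),
    "inverse": lambda lp: 1.0 / lp,
}


def linear_predictor(x, coefficients):
    """Polynomial linear predictor: c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x**power for power, c in enumerate(coefficients))


def predicted_probability(x, coefficients):
    """Expected response for a binomial error structure with a logit link."""
    return INVERSE_LINKS["logit"](linear_predictor(x, coefficients))


# Hypothetical coefficients, for illustration only.
first_order = (-2.0, 0.8)         # expected to give an s-shaped response
second_order = (-1.0, 2.0, -0.5)  # expected to give a bell shape peaking at x = 2

xs = [i * 0.5 for i in range(-8, 17)]  # predictor values from -4.0 to 8.0
s_curve = [predicted_probability(x, first_order) for x in xs]
bell_curve = [predicted_probability(x, second_order) for x in xs]

for x, p1, p2 in zip(xs, s_curve, bell_curve):
    print(f"x={x:5.1f}  first-order={p1:.3f}  second-order={p2:.3f}")
```

Printing the two columns side by side shows the first-order probabilities rising monotonically with the predictor, while the second-order probabilities rise to a single optimum and fall again, which is the kind of response one might expect along an environmental gradient.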
Logistic regression is also an appropriate and widely used method for the statistical analysis of distribution problems in biological and ecological studies (Pereira and Itami, 1991; Carroll et al., 1999). In a model that explains variations in binary data, residuals cannot be normally distributed, because there are only two possible values for the response variable: 0 for absence and 1 for presence. Thus, the statistical theory developed for ordinary regression models is not applicable to binomial distribution data (Hosmer and Lemeshow, 1989). In logistic regression, the binary nature of the response variable is the basis of parameter estimation, and thus logistic regression models will not produce inappropriate values ($\pi(x) > 1$ or $\pi(x) < 0$) for the probability of presence. Logistic regression has the form (Hosmer and Lemeshow, 1989):

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} \qquad (3)$$

where $\alpha$ is the constant and $\beta$ is the coefficient of the respective independent variables. The probability of presence, $\pi(x)$ (ranging from 0 to 1), is given as a function of the independent variables, and the linear form of the model becomes apparent after the logit transformation:

$$\ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \alpha + \beta x \qquad (4)$$

where ln denotes the natural logarithm (Sokal and Rohlf, 1995). A more technical and detailed review of logistic regression is presented by McCullagh and Nelder (1989) and Collett (1991).
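Equations (3) and (4) can be checked numerically, and the estimation of the coefficients from presence/absence data can be sketched in a few lines. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the constant and the coefficient are named `alpha` and `beta`, the presence/absence data are synthetic, and the fit uses plain Newton-Raphson maximization of the binomial log-likelihood for a single predictor.

```python
import math
import random


def pi(x, alpha, beta):
    """Probability of presence, Equation (3)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def logit(p):
    """Logit transformation, left-hand side of Equation (4)."""
    return math.log(p / (1.0 - p))


def fit_logistic(xs, ys, iterations=25):
    """Estimate alpha and beta by Newton-Raphson on the binomial log-likelihood."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = pi(x, alpha, beta)
            w = p * (1.0 - p)  # binomial variance function
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * step = g.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# Synthetic presence/absence data (hypothetical; x could stand for a wetness index).
rng = random.Random(42)
true_alpha, true_beta = -3.0, 0.5
xs = [rng.uniform(0.0, 12.0) for _ in range(800)]
ys = [1 if rng.random() < pi(x, true_alpha, true_beta) else 0 for x in xs]

alpha_hat, beta_hat = fit_logistic(xs, ys)
print(f"estimated alpha={alpha_hat:.2f}, beta={beta_hat:.2f}")

# The logit of Equation (3) returns the linear predictor of Equation (4).
print(logit(pi(2.0, true_alpha, true_beta)), true_alpha + true_beta * 2.0)
```

With 800 simulated squares the estimates should lie close to the generating values, and every fitted probability stays strictly between 0 and 1, in contrast to an ordinary least-squares fit to the same 0/1 response.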

STUDY REGION
The study region is located in the zone of discontinuous permafrost in the northern part of Finnish Lapland (Figure 1). It covers 200 km² (800 modelling squares of 0.25 km²). Elevations in the area range from 293 to 641 m a.s.l. The climate of the region is subarctic: the mean annual air temperature was −2.0 °C (mean annual min −39.0 °C and max +28.4 °C) during the period 1962–90, measured at the Kevo Meteorological Station (69°45′ N, 27°01′ E; 107 m a.s.l.), which is located 30 km northeast of the study area. Mean annual precipitation at Kevo was 395 mm in 1962–90 (Climatological Statistics in Finland 1961–1990, 1991). Botanically, the region lies north of the northern limit of continuous Scots pine (Pinus sylvestris), with birch (Betula pubescens ssp. czerepanovii) as the dominant tree species (Hustich, 1960; Ahti et al., 1968). Mires, especially palsa mires, cover ca. 6% of the study region, mostly occurring between 340 m and 390 m a.s.l. (Luoto and Seppälä, 2002b).

Figure 1 The location of the study area in northernmost Finland.

METHODS
Patterned Ground Data
Patterned ground was mapped in an area of 200 km² (SW corner 69°28′ N, 26°14′ E and NE corner 69°39′ N, 26°29′ E) in the Paistunturit fell area in northernmost Finnish Lapland. It was classified according to Washburn (1979). Results were digitized on ortho-rectified aerial photographs (Figure 2). For the statistical analysis, the study area was split into 800 equal-size (500 × 500 m) grid squares (Luoto and Seppälä, 2002b). The patterned ground was defined visually in the field during June–August 2002. Sorted patterned ground was classified as active if stones were not significantly covered with lichen and the central parts of the patterns were not vegetated (Figures 3 and 4). Visually defining the activity of non-sorted patterned ground is difficult. However, using observations of frost heaving and vegetation disturbance, non-sorted patterned ground was classified as either active or inactive. The extent of active patterned ground (hereafter termed ‘abundance’) was measured using the Arc/Info GIS (Esri, 1991). A binary variable indicating active or inactive patterned ground was produced for each grid square.

Figure 2 A fine-scale map of the distribution of active and inactive patterned ground in the northern part of the study area. Contours © National Land Survey of Finland, Licence number 49/MYY/03.

Figure 3 Active, non-vegetated patterned ground on the bottom of a seasonal pond. All photographs were taken by JH in the study area in summer 2002.

Figure 4 Inactive, vegetated patterned ground on the top of a fell. The vegetation is dominated by Betula nana, Empetrum nigrum and Juncus trifidus.

Terrain Parameters
Five terrain parameters were calculated for each 25-ha square from a DEM using Arc/Info GRID. The DEM is a regularly spaced matrix of altitude values with planimetric coordinates. The DEM, with a 20 m grid size, was created by linear interpolation from contour lines (elevation isolines) on 1:10 000 scale paper copies using Arc/Info’s TOPOGRID command (Esri, 1991). The quality of the DEM was evaluated visually and by calculating the root mean square error (RMSE) from height points not used in the interpolation; the vertical RMSE was 2.2 m. A slope map of the area was produced from the DEM with the SLOPE function in the GRID module. The terrain parameters were calculated for each 25-ha square with Arc/Info’s GRID. Mean altitude and mean slope angle were calculated directly from the DEM and the slope map with the ZONAL function. Soil moisture in the study area was estimated using the wetness index ($\omega$), calculated with the following formula:

$$\omega = \ln(A_s / \tan\beta) \qquad (5)$$

where $A_s$ represents the upslope contributing area and $\beta$ the slope angle (Beven and Kirkby, 1979). The wetness index was calculated from the pit-filled DEM. Sinks and peaks in a DEM are often errors caused by the resolution of the data or by rounding of elevations to the nearest integer value. They were filled up to the level of the lowest grid cell on the rim of the sink with a defined flow direction using Arc/Info’s FILL command (Esri, 1991). The proportions of flat topography (slope