J Geograph Syst (1999) 1:179–198 © Springer-Verlag 1999
Terrain complexity and reduction of topographic data

Yue-Hong Chou¹, Pin-Shuo Liu¹, Raymond J. Dezzani²

¹ Department of Earth Sciences, University of California, Riverside, CA 92521, USA (e-mail: [email protected])
² Department of Geography, Boston University, Boston, MA 02215, USA (e-mail: [email protected])
Abstract. Digital terrain data are useful for a variety of applications in mapping and spatial analysis. Most available terrain data are organized in a raster format, the most extensively used being the Digital Elevation Models (DEM) of the U.S. Geological Survey. A common problem with DEM for spatial analysis at the landscape scale is that the raster encoding of topography is subject to data redundancy and, as such, data volumes may become prohibitively large. To improve efficiency in both data storage and information processing, the redundancy of the terrain data must be minimized by eliminating unnecessary elements. The extent to which a set of terrain data can be reduced to improve storage and processing efficiency depends on the complexity of the terrain. In general, data elements for simpler, smoother surfaces can be substantially reduced without losing critical topographic information. For complex terrains, more data elements must be retained if the topography is to be adequately represented. In this paper, we present a measure of terrain complexity based on the behavior of selected data elements in representing the characteristics of a surface. The index of terrain complexity is derived from an estimated parameter which denotes the relationship between terrain representation (percentage surface representation) and relative data volume (percentage of DEM elements). The index can be used to assess the required volume of topographic data and determine the appropriate level of data reduction. Two quadrangles of distinct topographic characteristics were examined to illustrate the efficacy of the developed methodology.

Key words: Digital elevation models, terrain complexity, GIS

JEL classification: C0, C6, C8

1. Introduction

Accurate topographic surface or terrain modeling is a complex task. The fractal nature of landform prohibits exact reconstruction of the details of complicated elevation surfaces. As such, redundant elevation data elements
might be eliminated, if the reduced set adequately described the relief by providing sufficient information without raising the level of uncertainty in surface interpolation. Theoretically, if a reduced data set can generate a terrain surface that is statistically indistinguishable from the original, any redundant or unimportant data elements should be eliminated. This problem has been only partially addressed through the development of digital data structures such as the triangulated irregular network (TIN), which is a more efficient digital terrain representation model than the lattice or grid structure of digital elevation models (Mark 1975; Kumler 1994). Procedures for statistical selection of significant points have provided further improvements in data reduction (e.g., Chen and Guevara 1987).

Digital terrain data are useful for a variety of purposes, including the delineation of drainage networks for hydrological study (Mark 1984), vegetation mapping of large areas of inaccessible terrain (Shasby and Carneggie 1986; Talbot and Markon 1986), forest classification and inventory (Franklin et al. 1986), modeling the probability of wildfire distribution in southern California (Chou et al. 1990, 1993; Chou 1992a), deriving slope lines of steepest descent for surface analysis (Chou 1992b), detection of deformation in close-range photogrammetry (Karras and Petsa 1993), mapping wildlife habitat (Aspinall and Veitch 1993), correcting the surface area derived from planimetric maps or scanned images (Chou et al. 1995), predictive vegetation mapping (Franklin 1995), modeling the distribution of solar radiation on a terrain (Dubayah and Rich 1995), mapping ecological land systems (Gong et al. 1996), and soil drainage classification (Cialella et al. 1997). Moore et al. (1991) provide a helpful review of general applications of digital terrain models.

Most of the available digital terrain data are organized in a raster (grid) format where elements of elevation data are regularly spaced to completely cover the map area. Among the various sources of digital terrain data, the 7.5-min digital elevation models (DEM) of the U.S. Geological Survey (1987) have been employed most extensively. The DEM provide the most complete geographical coverage for topographic mapping in the United States, with a spatial resolution sufficiently high for most applications. Other sources of digital terrain data are not as widely adopted, either because their geographical coverage is limited or because their spatial resolution is not appropriate for general applications. For instance, the vector-based 1:100,000 digital line graphs (DLG) of the U.S. Geological Survey are only suitable for small-scale mapping due to their lower spatial resolution. Since no efforts are being made to develop a higher-quality, nationwide coverage of topographic data from other sources such as stereoscopic SPOT imagery, the DEM will remain the primary source of digital terrain data in the United States for the foreseeable future.

The DEM are organized with a grid size of 30 m and, as such, the common problem of data redundancy among raster data is inevitable and can be severe in many cases. Especially when the terrain contains large areas of low spatial frequency (i.e., relatively flat surfaces or slopes of a constant gradient), both the storage and processing of DEM become excessively inefficient due to the severe data redundancy. Regarding the redundancy and efficiency of the DEM, two premises are commonly accepted. First, complex terrains require more data elements for adequate representation than simple terrains. Second, the efficiency of data storage and data processing can be improved by reducing redundant elements. The primary objective of this study is to develop a quantitative method for efficient reduction of the DEM based on terrain complexity.
In the following sections, we first discuss the theoretical underpinnings of the evaluation of the relative importance of data elements and relate the importance measure to selection criteria in data reduction. We then present a measurement of terrain complexity based on the relationship between data volume and representation of the topography. The index of terrain complexity is designed for assessing the complexity of a terrain in terms of the amount of data needed to accurately represent the topography. The method developed in this study is illustrated using two empirical sets of DEM.

2. Basic concepts of data reduction

Two approaches are available for mitigating data redundancy in the DEM. The first is a raster approach, which maintains the original grid format while reducing the elevation data systematically. Usually, the systematic reduction is exogenously specified by rows and columns. In general, the method is straightforward and the resulting data set preserves the raster structure. The main advantage of this raster approach is that the regularity in spatial configuration remains unchanged. However, since data are reduced systematically by rows and columns, it is possible that data elements that are crucial to the terrain are removed while less significant elements remain abundant in the reduced set.

Ideally, data elements of the DEM must be reduced in such a way that critical elements are kept while less important elements are removed. To do so, data reduction cannot be specified by rows and columns, and the raster structure no longer holds. Such a vector approach requires two steps: every data element is first evaluated for significance, and then the less important elements are removed. Elements in the resulting data set thus become irregularly spaced. The implementation of this vector-based method involves two interrelated problems: (1) how to evaluate the importance of each data element, and (2) how many elements can be removed from a given DEM. These two questions form the central theme of this study.

The relative importance of data elements determines which elements should be kept and which should be eliminated. An explicitly expressed function is formulated in this paper for evaluating the relative importance of each element. This function is derived from the information content of the data element set and is formulated independently of any spatial structure such as TIN. Once every element is assigned a weight of relative importance, the next issue is to determine an acceptable level of data reduction. At present, common GIS procedures for data reduction require the analyst to exogenously specify the level of reduction, assuming that the analyst is responsible for, and capable of, determining the appropriate proportion of data elements to retain, while no formula for evaluating data efficiency is provided.

Efficiency of data reduction implies that, on the one hand, the amount of data must be minimized while, on the other hand, the surface should be represented sufficiently accurately by the reduced set. Therefore, it is evident that the amount of data needed to represent a topography depends on the complexity of the terrain. A simple, smooth surface, generally referred to as a surface of low spatial frequency, can be efficiently represented by a small number of data elements. The theoretical extreme is when the surface is
perfectly flat without any variation in relief. In this case, the entire surface can be efficiently represented by one single data element. The appropriate representation of more complicated terrains requires more data elements. As such, a prerequisite to determining the efficient level of data reduction is a measure of terrain complexity.

In GIS data processing, mitigation of data redundancy has focused on the efficient organization of raster data. Available methods of raster data organization, such as the chain code, the run-length code, the block code, and the quadtree structure, are designed for more efficient organization of raster data. In general, these methods of data organization are not intended for data reduction, since their general objective is to maintain the entire data set in a more efficient way. The reader is referred to Burrough (1987) and Clarke (1995) for a general discussion of the existing methods of raster data organization. Also, in computer science, operational algorithms have been developed for efficient compression of image data. There are two major groups of compression methods, lossless and lossy. Lossless algorithms compress a data set without losing any information when the compressed set is decompressed. Lossy algorithms compress the data set further, and the original data set cannot be fully recovered when the compressed data are decompressed. These compression algorithms are not developed for the same purpose as the reduction of elevation data. Data compression methods are designed for efficient storage and transfer of raster images, and thus the main consideration is to preserve the quality of the image for visual display and interpretation. As such, a lossy algorithm is considered efficient if human eyes cannot detect the distortion caused by the compression process. In our study, the main concern is to maintain the highest level of topographic representation, and thus the criterion is based on the computed surface area. In other words, while image data compression emphasizes visual effects and data transfer, reduction of topographic data is based on the criterion of maintaining the highest level of elevation accuracy for spatial analysis. For an overview of image data compression, the reader is referred to Arai (1990) and Arps and Truong (1994).

Mark (1975) and Kumler (1994) have addressed the relative efficiencies of TIN and DEM or lattice structures for digital terrain representation. These studies are concerned with the evaluation of appropriate data model structures or arrangements: most commonly, the regular grid expressed as a DEM or lattice versus the irregular structure imparted by TIN. Scale is held constant or uniform in these studies. The problem of scale and terrain representation is treated in Gallant and Hutchinson (1996). The method proposed in this paper requires that a measure of data efficiency, independent of the structure of the data, be evaluated for data reduction. Data efficiency can be achieved by minimizing the data retained for the construction of a terrain while maintaining the essential shape or gradient characteristics. Thus, there exists a functional relationship in proportional information content between the original data set and a reduced data set. The process of identifying the functional relationship and its parameters is assumed to be independent of the data structure employed.

Reduction of the DEM may take one of two forms, which we term conservative reduction and aggressive reduction.
In the conservative reduction, only those elements that are absolutely redundant are eliminated,
implying that the reduced set correctly represents the entire surface without losing any information about the terrain. The concept of conservative reduction is similar to that of lossless coding in the compression of image data (e.g., Arai 1990; Arps and Truong 1994). Alternatively, the aggressive reduction aims to achieve a higher level of data efficiency by eliminating not only the redundant elements but also the relatively trivial elements that do not contribute significantly to the topographic information set. In general, the aggressive reduction retains a minimal number of data elements while providing as much topographic information as possible. Although the conservative reduction can always be achieved, both theoretically and practically, the resulting data set may not be efficient for data storage and information processing. Thus, the efficiency of a data set can be considered as the ratio of its information content to its volume. A reduced data set is efficient if it provides a sufficiently large amount of topographic information with a relatively small number of data elements. Operationally, the conservative reduction involves only a straightforward compression procedure, since only those elements that are absolutely redundant are eliminated. The procedures for the aggressive reduction are much more complicated because the problems of element selection and amount of reduction must be considered simultaneously.

3. Data reduction for one-dimensional features

Figure 1 shows the typical raster encoding of a simple linear feature based on an ordered series of 12 data elements. Each element is a point location along the x axis with an elevation value recorded on the y axis. Connecting these elements in their order results in a linear feature with three straight line segments. In a GIS, each line is represented by a finite set of data elements in either a raster format or a vector format. The raster structure assumes a constant interval between adjacent elements. In the one-dimensional case, the data can be organized with the constant interval
ε defined along the x axis.
Fig. 1. Raster encoding of a line feature. The line connecting P1–P2–P3–P4 is organized by 12 elements regularly separated by an interval ε.
Figure 1 shows that some of the elements are redundant and, thus, the efficiency of the data set can be improved by removing those redundant elements. The same line can be more efficiently encoded in a vector format with 4 data elements (i.e., P1, P2, P3, and P4). With only 4 elements (i.e., a1, a4, a8, and a12), the reduced data set is more efficient than the original set since it represents the complete line in its original form without losing any information. Formally, the Euclidean length of the i-th segment is defined as:

$$d_i = \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}.$$
The length of the line, L, is the sum of the individual segments, such that

$$L = \sum_{i=1}^{n-1} d_i.$$
Let us define L′ as the length of the line derived from a reduced set of data elements, such that

$$L' = \sum_{i=1}^{m-1} d_i',$$
where m is the number of elements in the reduced set and dᵢ′ is the Euclidean length of the i-th segment of the derived line. In the one-dimensional case, the conservative reduction minimizes the number of elements while satisfying the constraint that L′ = L, whereas the aggressive reduction always maintains L ≥ L′. Define σ as the difference in total length between the original line feature and the line represented by the reduced set, such that σ = L − L′. Conceptually, the conservative reduction eliminates all the redundant elements while satisfying the constraint σ = L − L′ = 0, whereas the aggressive reduction eliminates as many elements as possible while keeping σ at a satisfactorily low level. For the example in Fig. 1, the conservative reduction removes all the elements from the data set except for those labeled 1, 4, 8, and 12 on the x axis. For aggressive reduction, additional elements are removed depending on the computed σ value for each candidate element. To do so, each of the remaining four elements must be evaluated for the value of σ. If element 1 (P1) is eliminated, the resulting line feature is represented by the straight-line connection P2–P3–P4 and the total length is reduced by an amount equal to the length of the segment P1–P2. If P2 is removed, the line feature becomes P1–P3–P4 and the length is reduced by the difference between P1–P2–P3 and P1–P3. In spatial information theory, one-dimensional features are measured by length. It is therefore appropriate to define the proportional change in information of the reduced set, φ, as the ratio of the length of the line derived from the reduced set to that of the original line, such that

$$\phi = \frac{L'}{L}.$$
Fig. 2. The φ curve represents the relationship between the amount of elements remaining in a data set (η) and the amount of information provided by the reduced set (φ).
In essence, this ratio represents the relative amount of information preserved in the reduced set. It is then possible to define the amount of actual data remaining in the reduced set relative to that in the original set, η, such that

$$\eta = \frac{m}{n},$$

where m is the number of data elements in the reduced set and n is the number of data elements in the original data set. The relationship between the information ratio, φ, and the ratio of retained data elements, η, can be expressed as a general curve, depicted in Fig. 2. When all the data elements are removed, the data set becomes empty and η = φ = 0. Thus, the curve relating proportional information to data retained has an origin of
(0, 0). Because the elements that are reduced first are of lesser significance and the elements reduced later are of greater importance, the left-hand side of the curve rises with a steeper slope and the gradient gradually levels off. Near the tail of the curve on the right-hand side, the elements involved in the process of reduction are the least important in the information set; thus, additional elements at this stage raise the curve only insignificantly. At the position κ, the data elements contained in the reduced set have already provided 100% of the topographic information, implying that additional data elements beyond this point are absolutely redundant. In other words, the point κ represents the proportion of remaining data elements under the conservative reduction. For the aggressive reduction, a point, θ, represents the curve inflection position where a relatively large number of elements have been eliminated while maintaining the curve as close as possible to the full information representation. The value of θ varies from case to case depending on the complexity of the line feature.

Figure 3 illustrates two lines of distinct characteristics in terms of the proportional information ratio curve φ. Line A is relatively simple and thus its raster encoding has a great deal of data redundancy. If this line is coded with 100 elements, then 96% of the data can be eliminated because 4% of the data (4 elements) describe the entire information set. As such, the slope of the curve is extremely steep on the left-hand side and the curve becomes flat when it reaches φ = 1 at η = 0.04. Line B is much more complicated, representing a
Fig. 3A, B. The φ curve of a simpler line (A) has a steep slope on the left-hand side, and the φ curve of a complicated line (B) has a gentler slope throughout the curve.
Fig. 4. This example illustrates the selection criterion for data reduction. Among the four points, P3 is the best candidate for elimination because its removal results in the minimal level of deviation from the original line.
cross-sectional profile of a terrain with numerous local variations in relief. Because many of the data elements cause significant change in φ, the curve has a gentler slope on the left-hand side, and the point where φ reaches its upper bound of 1 is located toward the tail on the right-hand side of the curve.

Conceptually, reduction of data elements proceeds progressively, i.e., at each iteration the element causing the minimal change in φ is identified and removed. Formally, the element to be removed at each stage satisfies the following conditions:

$$\min \ \sigma = L - L' \quad \text{such that} \quad \sigma \ge 0, \quad L \ge L'.$$
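To make the criterion concrete, the following Python sketch implements the progressive one-dimensional reduction and records the resulting φ curve. It is a minimal illustration under our reading of the method, not the authors' implementation; the function names and the sample line are hypothetical.

```python
import numpy as np

def segment_length(p, q):
    """Euclidean length d(p, q) of the segment joining two points."""
    return float(np.hypot(q[0] - p[0], q[1] - p[1]))

def polyline_length(pts):
    """L: the sum of the Euclidean lengths of consecutive segments."""
    return sum(segment_length(pts[i], pts[i + 1]) for i in range(len(pts) - 1))

def greedy_reduce(points, keep_fraction):
    """At each iteration delete the interior point whose removal causes the
    minimal sigma = L - L', until keep_fraction of the n points remain.
    Returns the reduced point list and the (eta, phi) trace of the curve."""
    pts = list(points)
    n = len(pts)
    L = polyline_length(pts)
    trace = [(1.0, 1.0)]                            # full set: eta = phi = 1
    while len(pts) > max(2, int(keep_fraction * n)):
        # sigma for each interior candidate; endpoints are always retained
        sigmas = []
        for i in range(1, len(pts) - 1):
            kept = segment_length(pts[i - 1], pts[i]) + segment_length(pts[i], pts[i + 1])
            shortcut = segment_length(pts[i - 1], pts[i + 1])
            sigmas.append((kept - shortcut, i))     # sigma >= 0 by the triangle inequality
        _, i = min(sigmas)
        del pts[i]
        trace.append((len(pts) / n, polyline_length(pts) / L))   # (eta, phi)
    return pts, trace

# A 12-element profile loosely echoing Fig. 1; collinear elements go first.
line = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 3), (7, 3),
        (8, 4), (9, 5), (10, 6), (11, 7)]
reduced, trace = greedy_reduce(line, keep_fraction=0.34)
print(len(reduced), trace[-1])   # 4 points retained; phi is still 1.0
```

For this hypothetical line, the eight collinear interior elements are removed first at σ = 0, reproducing the conservative reduction (κ = 4/12) before any aggressive removals would begin.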
Figure 4 gives an example to illustrate the selection criterion. The line is defined by 4 elements, P1, P2, P3, and P4. Accordingly,
$$L = d(P_1, P_2) + d(P_2, P_3) + d(P_3, P_4); \quad \text{in general:} \quad L = \sum_{i=1}^{n} d(P_i, P_{i+1}),$$
where n is the number of segments. If one of these four elements is to be removed, then 4 quantities must be evaluated, and the one yielding the minimal σ is selected for removal. Let σᵢ denote the value of σ if the i-th element is removed; the 4 quantities are then:

$$\begin{aligned}
\sigma_1 &= L - [d(P_2, P_3) + d(P_3, P_4)]\\
\sigma_2 &= L - [d(P_1, P_3) + d(P_3, P_4)]\\
\sigma_3 &= L - [d(P_1, P_2) + d(P_2, P_4)]\\
\sigma_4 &= L - [d(P_1, P_2) + d(P_2, P_3)].
\end{aligned}$$
In this example, it is evident that σ3 is the minimum among the 4 quantities. Thus, element P3 is removed and the line represented by the reduced set contains two segments, P1–P2 and P2–P4. Issues related to determining the number of data elements for removal are discussed later.

4. Data reduction for two-dimensional features

While the valid measure of one-dimensional features is length, the valid measure of two-dimensional features is area. The principles established in the one-dimensional case are now extended to the two-dimensional situation. Figure 5 shows a pyramid represented by a set of 9 data elements with elevation coded in a raster format identical to that of the DEM. In the case of two-dimensional encoding, the selection criterion is based on the quantity of area instead of the length used in the one-dimensional case. Among the nine elements in the data set, four are redundant, i.e., their removal does not alter the surface structure at all: elements 2, 4, 6, and 8. These four elements are the first candidates for data reduction since they contribute nothing to the information set of the topography. The elimination of these four elements results in a reduced set equivalent to that of the conservative reduction. The relative importance of elements 1, 3, 7, and 9 is identical because the removal of any of them causes an identical alteration of the surface structure (Fig. 5). The center element, element 5, is of the highest level of importance because its removal turns the entire structure into a flat rectangular surface represented by the base of the pyramid. In the two-dimensional case, the relative importance of each data element is evaluated based on the change in total surface area. The total surface area of a structure is expressed as:
$$A = \sum_{i=1}^{n-1} a_i, \qquad n \ge 3,$$
Fig. 5. Raster encoding of a pyramid. The nine data elements are of different levels of relative importance in representing the structure.
where A denotes the total surface area of the structure represented by the original data set and aᵢ represents the surface area of the i-th subset of data elements. Since a triangle is the smallest indivisible geometric unit of a surface, each subset of areal data contains three elements, in contrast to the two elements that define a segment in the one-dimensional case. The three elements of an areal subset constitute a triangular facet. In a planar configuration, a data set of n elements defines at most n − 1 triangular facets. Letting A′ denote the total surface area represented by the reduced set, the selection criterion is expressed as:

$$\min \ \omega = A - A' \quad \text{such that} \quad \omega \ge 0, \quad A \ge A'.$$
In Fig. 5, if one element is to be removed, then the removal of any of the redundant elements (2, 4, 6, and 8) results in the minimal ω = 0. The removal of the central element (element 5) results in the maximal change in surface area, i.e., ω is at its maximum. The removal of any of the corner elements (1, 3, 7, and 9) results in a positive value of ω less than the maximum.
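As an illustration of the area criterion, the sketch below computes the triangulated surface area A of a set of elevation points and the change ω caused by removing one element. This is a simplified reading of the method with hypothetical names: a Delaunay triangulation of the planar locations stands in for the TIN, and for degenerate regular-grid configurations its diagonal choices are arbitrary and may differ from those implied by the figures.

```python
import numpy as np
from scipy.spatial import Delaunay

def surface_area(xy, z):
    """Total 3-D area of the triangulated surface over points (xy, z),
    using a 2-D Delaunay triangulation of the planar locations as the TIN."""
    tri = Delaunay(xy)
    p = np.column_stack([xy, z])                 # 3-D coordinates of the points
    a, b, c = (p[tri.simplices[:, k]] for k in range(3))
    # each facet's area is half the norm of the cross product of two edges
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()

def omega(xy, z, i):
    """omega = A - A': loss in surface area if element i is removed.
    The paper's criterion assumes A >= A'; re-triangulation artifacts on
    degenerate grids can make this only approximately true."""
    keep = np.arange(len(z)) != i
    return surface_area(xy, z) - surface_area(xy[keep], z[keep])

# The 3x3 pyramid of Fig. 5 (indices 0..8 for the paper's elements 1..9).
xy = np.array([(x, y) for y in (0, 1, 2) for x in (0, 1, 2)], dtype=float)
z = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)
print(omega(xy, z, 1))   # an edge element (element 2): omega near 0
print(omega(xy, z, 4))   # the apex (element 5): the maximal omega
```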
Operationally, the selection criterion described above can be substituted by the minimal deviation from surrounding elements. Figure 6 illustrates two similar structures where h is much higher than g. The bases are composed of the same set of four elements (q, r, s, t). Since h is higher than g, the value of h deviates from its surrounding elements more than g does from its own. As such, the removal of h results in a larger value of ω than the removal of g. Mathematically, this is expressed as

$$\begin{aligned}
\omega_g &= a(\triangle gqr) + a(\triangle grs) + a(\triangle gst) + a(\triangle gtq) - a(qrst)\\
\omega_h &= a(\triangle hqr) + a(\triangle hrs) + a(\triangle hst) + a(\triangle htq) - a(qrst).
\end{aligned}$$
Fig. 6. The point h is more important than g in representing the surface because the removal of h results in a greater deviation of surface area from the original structure.
Since

$$a(\triangle hqr) > a(\triangle gqr), \quad a(\triangle hrs) > a(\triangle grs), \quad a(\triangle hst) > a(\triangle gst), \quad a(\triangle htq) > a(\triangle gtq),$$

it is clear that ω is larger for h than for g. To select data elements of a DEM for reduction based on the criterion of minimum deviation from surrounding elements, a 3 × 3 kernel can be applied to every element for the calculation of areal deviation. The computation of the deviation is described in Chen and Guevara (1987). The operational procedure is implemented as the VIP (Very Important Points) function in the ArcInfo GIS (ESRI 1988). As in the one-dimensional case, the amount of information provided by the reduced set is expressed as φ and the proportion of data elements remaining in the reduced set as η. While η remains the ratio of the number of elements in the reduced set to that of the original set, φ is now defined as the ratio of the surface area represented by the reduced set to that of the original set, such that

$$\phi = \frac{A'}{A}.$$
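A rough sketch of such a kernel-based significance measure is given below. It approximates the VIP idea by scoring each interior cell with the average vertical offset of its elevation from the chords joining its four diametrically opposed neighbours; the exact measure of Chen and Guevara (1987) may differ in detail, and the function name is hypothetical.

```python
import numpy as np

def vip_significance(Z):
    """Score each interior cell of the elevation grid Z by the mean vertical
    offset from the four chords through diametrically opposed neighbours
    (a VIP-style 3x3 kernel measure; border cells are left as NaN)."""
    rows, cols = Z.shape
    sig = np.full(Z.shape, np.nan)
    # opposite-neighbour pairs: N-S, E-W, NE-SW, NW-SE
    pairs = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
             ((-1, 1), (1, -1)), ((-1, -1), (1, 1))]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            offsets = [abs(Z[r, c] - 0.5 * (Z[r + d1[0], c + d1[1]] +
                                            Z[r + d2[0], c + d2[1]]))
                       for d1, d2 in pairs]
            sig[r, c] = float(np.mean(offsets))
    return sig
```

Cells lying on flat surfaces or constant gradients score zero, since each chord passes exactly through them, so they are the first candidates for removal; local peaks and pits score highest.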
The behavior of the φ curve against η is identical to that of the one-dimensional case (Fig. 2) discussed in the preceding section. In general, the curve has a steeper slope on the left-hand side for simpler surfaces and a gentler slope for complex terrains.

5. Reduction of DEM: Two empirical cases

Two 7.5-min DEM of distinctive topographic characteristics, the Fortuna SW quadrangle, Arizona, and the Idyllwild SW quadrangle, California, are selected for analysis. The Fortuna quadrangle is located in a desert environment with little variation in relief. The slopes are relatively gentle throughout the
Fig. 7. Contours generated from 1% DEM elements of the Fortuna SW quadrangle.
area, with elevations generally ranging between 100 and 200 m. This quadrangle is selected to represent a simple, smooth topography of low spatial frequency. The Idyllwild quadrangle in the San Jacinto Mountains contains relatively rugged terrain, with elevations ranging roughly between 300 and 700 m. The Idyllwild quadrangle is selected to represent a surface of complex terrain (i.e., high spatial frequency). Figure 7 shows the contour map generated from a 1% sample of the Fortuna SW DEM and Fig. 8 shows the contour map of comparable size from the Idyllwild SW DEM. The terrain in the Idyllwild quadrangle is significantly more complicated than that of the Fortuna quadrangle.

Both DEM are processed in the manner described in the preceding section. Table 1 lists the values of φ, A′, and η computed at different levels of reduction for both the Fortuna and Idyllwild quadrangles. The simple terrain of Fortuna is effectively represented by as few as 1% of the DEM elements (φ = 0.998). The φ value increases gradually to its upper bound at 6% of the DEM elements. In Idyllwild, 1% of the DEM elements generates a φ value of 0.993, which is still high although less than that of Fortuna. The φ value continues to rise and reaches its upper bound at about 8% of the DEM elements (Fig. 9). As a relatively smooth and flat terrain, Fortuna's surface can be effectively represented by fewer elements. As such, the vertical intercept of the curve is close to the upper bound and the φ level reaches the upper bound within a short range, implying that a small percentage of DEM elements is sufficient to represent the terrain. In Idyllwild, a low percentage of DEM elements still represents the surface quite well, confirming the previous notion that most DEM are subject to data redundancy. The φ curve of Idyllwild starts at an accuracy rate significantly lower than that of Fortuna because of terrain complexity.

According to Table 1, 6% of the DEM elements correctly represent the surface characteristics of Fortuna, while it takes 8% of the DEM elements to
Fig. 8. Contours generated from 1% DEM elements of the Idyllwild SW quadrangle.

Table 1. Computed quantities of surface representation of the entire DEM

η (%)   Fortuna A′    Fortuna φ   Idyllwild A′   Idyllwild φ
1       163416645     0.99833     167399745      0.99311
2       163525355     0.99900     167399745      0.99311
3       163609714     0.99951     167953182      0.99639
4       163645229     0.99973     168290793      0.99840
5       163669725     0.99988     168431823      0.99923
6       163682469     1.00000     168502253      0.99965
6.5ᵃ    163684431     1.00000     168526646      0.99980
7       163682364     1.00000     168542769      0.99989
8       163681487     1.00000     168567300      1.00000
9       163680603     1.00000     168593127      1.00000
10      163680777     1.00000     168604275      1.00000

ᵃ 6.5% is used for comparing surface representations in Fig. 10.
represent Idyllwild. According to the criterion of conservative reduction, additional data elements beyond 6% of the DEM are redundant for the Fortuna surface and should be removed to enhance the efficiency of data storage and processing. Likewise, the conservative reduction requires 8% of the DEM elements for Idyllwild. However, the efficiency of data reduction may be enhanced further by aggressive reduction. The relationship between data efficiency and aggressive reduction also holds for a 1% random sample of the original data set, as shown in Table 2. The above experiments confirm that efficient data reduction depends on terrain complexity. For illustrative purposes, we cut out the 1% sample area
Fig. 9. The φ curves of the Fortuna and Idyllwild quadrangles behave differently because of their different levels of terrain complexity.
Table 2. Computed quantities of surface representation of 1% sample areas

η (%)   Fortuna A′   Fortuna φ   Idyllwild A′   Idyllwild φ
0.1     1615159      0.99808     1585474        0.92852
0.5     1616425      0.99886     1653050        0.96810
1.0     1616993      0.99921     1701334        0.99638
4.0     1618434      1.00000     1714597        1.00000
6.0     1618712      1.00000     1727667        1.00000
8.0     1618594      1.00000     1753216        1.00000
10      1618541      1.00000     1753453        1.00000
from both quadrangles and show the differences in the spatial pattern of contours generated from the original complete set of DEM. Figure 10A shows the contours generated from 6% of the Fortuna DEM (η = 6%). When an additional 0.5% of the DEM data are added to the data set, the generated contours, shown in Fig. 10B, are identical to those in Fig. 10A. The result is consistent with the quantities in Table 1. Since the 6% reduction for Fortuna has reached 100% surface representation, the contours generated from 6% of the DEM elements are identical to those generated from 6.5%. However, the more complicated terrain of Idyllwild requires more elements for its representation. Figure 10C shows the contours of Idyllwild generated from 6% of the DEM elements, while Fig. 10D shows the contours from 6.5%. Table 1 indicates
Fig. 10. (A) Contours generated from 6% DEM elements of the 1% area coverage of the Fortuna quadrangle. (B) Contours generated from 6.5% DEM elements of the 1% area coverage of the Fortuna quadrangle. (C) Contours generated from 6% DEM elements of the 1% area coverage of the Idyllwild quadrangle. (D) Contours generated from 6.5% DEM elements of the 1% area coverage of the Idyllwild quadrangle.
that the additional 0.5% of the DEM elements contributes considerably, showing additional detail at the bottom of the map.

6. Evaluation of terrain complexity

In the preceding section we showed the relationship between the amount of topographic information represented by a reduced set (φ) and the proportion
of data elements remaining in the reduced set
(η). The relationship can be illustrated by a nonlinear curve, hereafter referred to as the φ curve, which increases from the origin to the upper bound where φ = 1. As the φ curve evidently behaves differently for different terrains, the complexity of a terrain can be evaluated by a parameter derived from the curve. For this purpose, we generalize the curve and convert it into a mathematical function.

The φ curve can be characterized as follows. The curve is monotonically increasing at a decreasing rate, due to the selection criterion that elements of least importance are eliminated first. The curve reaches its upper bound at a location where η = κ (Fig. 2). The value of κ denotes the level of conservative reduction, and it varies depending on the complexity of the topography. Simpler terrains with vast areas of flat surfaces or slopes of a constant gradient have a κ at low values of η, implying that less data are needed to correctly represent the topography. Rougher terrains with greater local variations in relief tend to have a κ closer to the tail on the right-hand side, indicating that more data are needed for complete representation of the surface. According to the above properties, the φ curve can be approximated by a simple exponential function expressed as:

$$\phi = 1 - e^{-\beta\eta},$$

where β is the parameter associated with the slope of the curve. When η = 0, φ = 0. As η increases, φ also increases, at a rate specified by the parameter β. The estimated parameter thus represents the level of complexity of a terrain. In general, the value of β is always positive and without bound. A small value of β denotes a more complicated surface, which requires a larger number of data points for appropriate representation. In this case, the slope of the curve is relatively gentle and it rises to the upper bound at a slow rate. A large value of β denotes a relatively simple, smooth surface, which may be sufficiently represented by a small percentage of the DEM. The corresponding φ curve has a sharp rise on the left-hand side and reaches the upper bound considerably faster. The theoretical extreme case is a surface that is perfectly flat, with no variation in relief at all; in this case, β is infinite and φ = 1 throughout the curve.

The behavior of the φ curve with respect to β is illustrated in Fig. 11, where three curves of different β values are depicted. The values of β range between 0.1 and 2. When β = 2, the curve has the steepest slope on the left-hand side and reaches the upper bound fastest. The curve of β = 0.1 represents a rough terrain: the slope of the curve is gentler and the curve reaches the upper bound significantly more slowly. Based on these properties, the parameter β can appropriately be defined as the coefficient of terrain complexity.

Using Marquardt's method (1963) of nonlinear regression, available in SAS (1982), the β coefficient can be estimated for assessing the complexity of a terrain represented by a DEM. Table 3 lists the estimated β indices and the asymptotic 95% confidence intervals for two analytical cases. In the first case, the 1% sample areas of both quadrangles were analyzed: the φ curve was generated and the β coefficient of terrain complexity derived from the DEM of both sample areas. The estimated β is 62.52 for the sample area of Fortuna and 24.78 for the sample area of Idyllwild. These values confirm the notion that smoother terrains are associated with a larger value of β.
Fig. 11. Three φ curves for different values of the β parameter.
Table 3. Results of the four analytical cases

DEM source       β        Asymptotic std. error   Asymptotic 95% confidence interval
Fortuna 1%       62.52    2.72                    57.05–67.98
Idyllwild 1%     24.78    2.04                    20.67–28.88
Fortuna 100%     470.91   5.45                    460.17–481.65
Idyllwild 100%   342.03   6.16                    329.88–354.17
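For readers who wish to reproduce this kind of fit, the sketch below estimates β from the full-DEM Fortuna values of Table 1, using SciPy's curve_fit in place of the SAS routine; both are Marquardt-type (Levenberg-Marquardt) least-squares procedures. The sketch and its names are illustrative rather than the authors' code, and the simple asymptotic interval shown is only an approximation of the SAS output.

```python
import numpy as np
from scipy.optimize import curve_fit

def phi_model(eta, beta):
    """The proposed terrain-complexity model: phi = 1 - exp(-beta * eta)."""
    return 1.0 - np.exp(-beta * eta)

# Full-DEM Fortuna values from Table 1; eta expressed as a fraction.
eta = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10])
phi = np.array([0.99833, 0.99900, 0.99951, 0.99973, 0.99988,
                1.0, 1.0, 1.0, 1.0, 1.0])

# Levenberg-Marquardt nonlinear least squares (cf. Marquardt 1963).
(beta_hat,), cov = curve_fit(phi_model, eta, phi, p0=[100.0], method='lm')
se = float(np.sqrt(cov[0, 0]))
print(f"beta = {beta_hat:.2f}, asymptotic 95% CI = "
      f"({beta_hat - 1.96 * se:.2f}, {beta_hat + 1.96 * se:.2f})")
```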
The asymptotic 95% confidence intervals indicate clearly that the difference in the β index is statistically significant between these two quadrangles, implying that the two areas are significantly different in topographic characteristics. In the second case, the entire DEM were processed for both quadrangles. Again, Fortuna generates a value of the β coefficient (470.91) considerably larger than that of Idyllwild (342.03). The asymptotic 95% confidence intervals also confirm that the difference in complexity between these two areas is significant.

7. Conclusions

This study deals with two questions concerning the reduction of topographic data for improving the efficiency of data storage and processing: how to
evaluate the importance of a data element, and how to determine the number of elements in a reduced data set. We show that the relative importance of an element can be evaluated by the degree of deviation in surface area caused by the removal of the element. The appropriate number of elements for removal depends on terrain complexity.

There are two types of data reduction, the conservative and the aggressive. In both cases, the DEM are processed through the following procedures. First, each data element in the DEM is evaluated for its relative importance in representing the terrain; in general, important elements are those that, when removed from the data set, cause a greater deviation in computed surface area. Second, the least significant elements are selected for removal in order to minimize the alteration of the topographic structure; at each step, a percentage of the existing elements can be specified and the corresponding data elements removed. Third, the quantity φ, which represents the proportion of surface information remaining in the reduced set, is computed; this quantity is the ratio of the surface area computed from the reduced set to the total surface area computed from the entire DEM. Fourth, once the φ values at all the desired levels of reduction
(η) are obtained, the curve connecting the φ values at each corresponding reduction level can be constructed. The aggressive reduction can be determined either from the listing of the φ values or from the φ curve. Finally, the φ curve is transformed into the exponential function and the β index of terrain complexity is derived. To determine the desired level of aggressive reduction, one may first process the entire DEM and obtain the computed surface area of the full set, then process the DEM starting from a low percentage (e.g., 1% of DEM elements) and gradually increase the data volume to a higher level (e.g., 10%). Once the φ curve is generated from the calculated quantities, the β coefficient can be estimated from the exponential function proposed in this paper. With the β parameter for the DEM in hand, the desired level of surface representation can be specified and the corresponding size of the reduced data set determined. A compact sketch of this pipeline is given below.
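The following Python sketch ties the steps together under the assumptions of the earlier sketches (surface_area and vip_significance as defined above); all names are illustrative, and the one-shot ranking used here is a cruder strategy than the one-at-a-time removal by minimal ω described in the text.

```python
import numpy as np

def reduction_curve(Z, levels=(0.01, 0.02, 0.04, 0.06, 0.08, 0.10)):
    """Return (eta, phi) pairs: keep the four grid corners plus the cells
    top-ranked by VIP-style significance, and compare triangulated surface
    areas of the reduced sets against that of the full DEM."""
    rows, cols = Z.shape
    xy = np.array([(c, r) for r in range(rows) for c in range(cols)], float)
    z = Z.ravel().astype(float)
    A = surface_area(xy, z)                              # full-DEM area
    sig = vip_significance(Z).ravel()
    sig = np.where(np.isnan(sig), 0.0, sig)              # border cells rank low
    order = np.argsort(-sig)                             # most significant first
    corners = np.array([0, cols - 1, (rows - 1) * cols, rows * cols - 1])
    curve = []
    for eta in levels:
        keep = np.unique(np.concatenate([corners, order[: int(eta * z.size)]]))
        curve.append((eta, surface_area(xy[keep], z[keep]) / A))
    return curve

# beta is then estimated from the curve as in Sect. 6, e.g.:
#   eta, phi = map(np.array, zip(*reduction_curve(Z)))
#   fit phi_model(eta, beta) with curve_fit as in the previous sketch
```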
The method, developed for the efficient organization of digital topographic data, is most useful for studies that require the processing of a large number of DEMs through a vector-based GIS. The procedures specified in this paper can be automated to pre-process the DEMs into a reduced set containing data elements that represent irregularly spaced point locations of critical elevation information. Triangulated irregular networks (TIN) can then be efficiently generated from the reduced set for spatial analysis and modeling.

The 7.5-min DEM have been, and will continue to be, the most important source of topographic information for GIS applications. The current level of spatial resolution, at 30 m, is appropriate for most environmental and geological applications at the landscape scale. However, in areas with large flat surfaces or slopes of a constant gradient, data redundancy unnecessarily increases storage size and slows down information processing. The raw DEM of one 7.5-min quadrangle requires over 1 MB of disk space, and a typical environmental study often deals with tens of quadrangles and sometimes requires hundreds. If the DEM data can be efficiently reduced, requirements for large storage capacity can be relaxed drastically and empirical applications of the DEM will be much enhanced. More importantly, the processing speed of an analysis depends on how efficiently the topographic data are organized. Efficiently reducing the volume of the DEM will therefore lead to significant improvement in the use of digital topographic data for both topographic mapping and spatial analysis.

Acknowledgments. The authors wish to thank Dr. Lewis Cohen, Dr. Doug Morton, and an anonymous reviewer for their helpful comments on an earlier draft.
References

Arai K (1990) Preliminary study on information lossy and loss-less coding data compression for the archiving of ADEOS data. IEEE Transactions on Geoscience and Remote Sensing 28(4):732–734

Arps RB, Truong TK (1994) Comparison of international standards for lossless still image compression. Proceedings of the IEEE 82(6):889–899

Aspinall R, Veitch N (1993) Habitat mapping from satellite imagery and wildlife survey data using a Bayesian modeling procedure in a GIS. Photogrammetric Engineering and Remote Sensing 59(4):537–543

Burrough PA (1987) Principles of Geographical Information Systems for Land Resources Assessment. Clarendon Press, Oxford

Clarke KC (1995) Analytical and Computer Cartography, 2nd edn. Prentice Hall, Englewood Cliffs, NJ

Chen Z, Guevara JA (1987) Systematic selection of very important points (VIP) from digital terrain model for constructing triangulated irregular networks. AUTO-CARTO 8 Proceedings, ASPRS-ACSM, 50–56

Chou YH, Minnich RA, Salazar LA, Power JD, Dezzani RJ (1990) Spatial autocorrelation of wildfire distribution in the Idyllwild quadrangle, San Jacinto Mountain, California. Photogrammetric Engineering and Remote Sensing 56:1507–1513

Chou YH (1992a) Management of wildfires with a geographical information system. International Journal of Geographic Information Systems 6:123–140

Chou YH (1992b) Slope-line detection in a vector-based GIS. Photogrammetric Engineering and Remote Sensing 58:227–233

Chou YH, Minnich RA, Chase RA (1993) Mapping probability of fire occurrence in the San Jacinto Mountains, California. Environmental Management 17(1):129–140

Chou YH, Dezzani RJ, Minnich RA, Chase RA (1995) Correction of surface area using digital elevation models. Geographical Systems 2:131–151

Cialella AT, Dubayah R, Lawrence W, Levine E (1997) Predicting soil drainage class using remotely sensed and digital elevation data. Photogrammetric Engineering and Remote Sensing 63(2):171–178

Dubayah R, Rich P (1995) Topographic solar radiation models for GIS. International Journal of Geographic Information Systems 9:405–419

Environmental Systems Research Institute (1988) TIN User's Guide: ARCINFO Surface Modeling and Display. Redlands, California

Franklin J, Logan JT, Woodcock CE, Strahler AH (1986) Coniferous forest classification and inventory using Landsat and digital terrain data. IEEE Transactions on Geoscience and Remote Sensing GE-24:139–149

Gallant J, Hutchinson MF (1996) Towards an understanding of landscape scale and structure. Third International Conference on the Integration of GIS and Environmental Modeling, Santa Fe. National Center for Geographic Information and Analysis, Santa Barbara, CA. CD-ROM and World Wide Web http://www.ncgia.ucsb.edu/conf/SANTA FE CD-ROM

Gong P, Pu R, Chen J (1996) Mapping ecological land systems and classification uncertainties from digital elevation and forest-cover data using neural networks. Photogrammetric Engineering and Remote Sensing 62(11):1249–1260

Karras GE, Petsa E (1993) DEM matching and detection of deformation in close-range photogrammetry without control. Photogrammetric Engineering and Remote Sensing 59(9):1419–1424

Kumler MP (1994) An intensive comparison of Triangulated Irregular Networks (TINs) and Digital Elevation Models (DEMs). Cartographica 31(2):1–99
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11:431–441

Mark DM (1984) Automated detection of drainage networks from digital elevation models. Cartographica 21:168–178

Mark DM (1975) Computer analysis of topography: a comparison of terrain storage methods. Geografiska Annaler 57A(3–4):179–188

Moore ID, Grayson RB, Ladson AR (1991) Digital terrain modeling: a review of hydrological, geomorphologic and biological applications. Hydrological Processes 5:3–30

SAS User's Guide: Statistics (1982 Edition) SAS Institute, Cary, North Carolina, pp 13–37

Shasby M, Carneggie D (1986) Vegetation and terrain mapping in Alaska using Landsat MSS and digital terrain data. Photogrammetric Engineering and Remote Sensing 52:779–786

Talbot SS, Markon CJ (1986) Vegetation mapping of Nowitna National Wildlife Refuge, Alaska, using Landsat MSS digital data. Photogrammetric Engineering and Remote Sensing 52:791–799

U.S. Geological Survey (1987) Digital Elevation Models, Data User's Guide, 5, 38