Probing the Relationship Between Classification Error and Class Similarity

Ola Ahlqvist and Mark Gahegan

Ola Ahlqvist is with the Department of Geography, The Ohio State University, 1036 Derby Hall, 154 North Oval Mall, Columbus, OH 43210 ([email protected]), and previously at the GeoVISTA Center, Department of Geography, The Pennsylvania State University. Mark Gahegan is with the GeoVISTA Center, Department of Geography, The Pennsylvania State University, 302 Walker Building, University Park, PA 16802 ([email protected]).

Photogrammetric Engineering & Remote Sensing, Vol. 71, No. 12, December 2005, pp. 1365-1373. © 2005 American Society for Photogrammetry and Remote Sensing.

Abstract

We present a rationale and method for representing the vagueness in taxonomic class definitions in cases where classes are described by a set of characteristics, such as those sometimes used as the basis for land-cover category discrimination. We further describe methods to estimate the semantic similarity between any two classes by calculating semantic similitude metrics based on such parameterized class definitions. Our working hypothesis is that a large similitude predicts categories that will be more prone to confusion, and hence to image or map misclassification. We use two different existing data sets to demonstrate and evaluate the method, and the results support our original hypothesis. Consequently, we argue that classification schemes based on parameterized definitions could be assessed for problematic categories during their construction using our approach, thus enabling the identification of a thematic vagueness component to supplement the more traditional statistical measures derived from the error matrix.

Introduction

Accuracy Assessment Background

Uncertainty is an inseparable companion of almost any type of information; its multi-faceted form is the result of many different factors, for example, empirical measurement error, inadequate computational methods, cognitive ambiguity, and socially constructed "truths" (Klir and Wierman, 1998; Couclelis, 2003). The last 20 years have seen a wealth of research in the modeling of uncertainty and studies of the way uncertainty propagates through data processing, analysis, and interpretation. Such research has already broken the previously tight link between uncertainty and probability theories, and there are today a number of frameworks and mathematical approaches to handle uncertainty representation and analysis for remote sensing and GIS (e.g., Klir and Wierman, 1998; Fisher, 1999; Foody and Atkinson, 2002; Zhang and Goodchild, 2002). One well-researched type of uncertainty is often termed spatial accuracy (Veregin, 1999), referring to the precision by which the position of objects is known.


Another type, thematic accuracy, usually measures the validity of categorical labels applied to data, such as when classifying land-cover or the ethnicity of an urban region (Veregin, 1999). The two types are related, since a thematic error will have spatial implications and vice versa (Heuvelink and Burrough, 1993; Ehlers and Shi, 1997; Leung and Yan, 1998). Our research is concerned with how semantic uncertainty, in the form of the vagueness that often inhabits the definitions of categories, affects reported thematic mapping accuracy measurements.

The de facto standard method to describe thematic uncertainty in geographic datasets is an empirical evaluation of the correspondence between the dataset and some reference dataset or so-called ground truth, with these site-specific measurements systematically entered into a square matrix called an error or confusion matrix (Card, 1982). This matrix becomes the source for many different measures of agreement between estimated class labels and reference values. These various global measures, such as producer's and consumer's accuracy, percentage classified correct, and the Kappa statistic, are effective ways to represent category uncertainty (Congalton and Green, 1999).

Scholars have recognized that many factors influence the usefulness of an error matrix. Spatial autocorrelation of the land-cover features may cause the classification accuracy to vary spatially (Congalton, 1988), the spatial units in the data may cause heterogeneous classes (Crapper, 1984), or the class definitions themselves may cause confusion (Leung, 1988). This last factor, arguably the least researched of the list, is the focus of this paper. A good deal of previous work has addressed category confusion, but from substantially different perspectives. Fisher (1996) demonstrates how randomized class assignments can be used to visualize confusion between specific class pairs. Gopal and Woodcock (1994) propose extensions to the error matrix to work with vague categories in an accuracy assessment. Gahegan and Brodaric (2002) demonstrate how to track the consistency with which categories are interpreted in the field over time. By contrast, here we develop a new method to produce estimates of potential confusion between categories based on their definitions.

This paper first reviews the idea of soft accuracy assessment to handle vague categories, followed by an outline of our idea to quantify category vagueness prior to data gathering, based on the classification methodology and category definitions. We initially describe how category definitions may be represented by a rough-fuzzy set approach applied to the defining characteristics, i.e., those properties that


people use to describe or define them. We then argue that the risk of confusing two categories can be estimated by calculating semantic similitude metrics based on the similarity of their respective collections of defining characteristics. Following Bouchon-Meunier et al. (1996), we use similitude as a general term for measures of satisfiability, resemblance, and inclusion defined to quantify the level of agreement, or common points, between descriptions of objects. The use of this term also helps us to distinguish between the numerical estimate (similitude) and its cognitive equivalent (similarity). Our hypothesis is that large similitude values predict category pairs that will be more prone to confusion, and that hence show up as off-diagonal entries in an error matrix based on empirical data. After describing the data used and the setup for our experiment, we evaluate whether the suggested concept similitude estimates can predict classes that are prone to misclassification, by comparing them with actual reports of misclassification from empirical accuracy measurements.

Different Types of Uncertainty Interact

Traditional error reporting in the form of an error matrix typically expresses the combined effect of a number of different aspects of uncertainty that may not be obvious, though each would ideally need to be addressed in a different way. For example, an entity can be classified in error because of a poor choice of classification algorithm or random mistakes by the interpreter; the entity can also be classified in error because of confusion resulting from intergrades or borderline cases between similar classes; or an entity can be a mixture of pure classes. A useful generalization of different uncertainty types, and of formal models to handle them, is realized by separating out the components of object measurement and concept definition. Using this separation, Figure 1 illustrates how we can imagine different combinations of uncertainties in different datasets. A simple example would be a land-cover map derived from satellite imagery, where conceptual definition uncertainty, such as vague land-cover classes, combines with object measurement uncertainty, such as classification error (Figure 1: B). In an accuracy assessment of a satellite-based forest and non-forest classification, for example, we need to deal with this combined uncertainty: both the question of how much tree cover is required for a patch of land to be classified as forest, and the question of how certain we can be that a specific spectral signature is in fact trees. This compound uncertainty makes it difficult to see whether low accuracy is due to poor measurements that could perhaps be improved, or whether the classification system itself is problematic due to significant confusion between class definitions.

Figure 1. Aspects of uncertainty in four geoinformation data sets (A, B, C, and D) and related formal models to handle concept definition and object measurement uncertainties.


Gopal and Woodcock (1994) and Congalton and Green (1999) have addressed this problem and developed methods that try to separate the uncertainty attributed to fuzziness in the definitions from measurement errors. Their soft accuracy assessment methods are based on object measurements where original classifications (poorly measured data, as shown in Figure 1: B) are compared with well-measured ground truth or reference data estimates based on fuzzy measures (Figure 1: D).

Following the soft accuracy assessment methodology described in Congalton and Green (1999), we can generate two error matrices. The first, a difference matrix, is the result of a standard procedure in which the fuzziness in both the map or image data and the reference data is unknown (Figure 2a). The second, an error matrix, is generated using methods equivalent to those described by Gopal and Woodcock (1994), in which linguistic expressions such as "absolutely wrong," "acceptable," or "probably right" are used in the accuracy assessment. In this way, off-diagonal entries in the difference matrix, which are normally counted as wrong, may be moved to the diagonal of the error matrix if they are judged to be "right," "probably right," or "acceptable." The overall accuracy typically increases using this method, for the simple reason that we remove errors attributed to fuzziness (Figure 2b). We may also produce a third matrix, termed a fuzziness matrix, by subtracting the error matrix from the difference matrix. The resulting matrices give users a better idea of why some reference sites fall off the diagonal, and an indication of weaknesses in the classification system itself, as opposed to the data or the classifier (Congalton and Green, 1999).

Some of these problems could be mitigated if they were discovered before effort and resources are devoted to the gathering of reference data. As an example, most people would probably agree that "coniferous forest" is more similar to "deciduous forest" than to "open water," so the two forest types would probably be harder to distinguish and would more often result in classification confusion. So, if we could formulate a systematic way of assessing the conceptual similarity between categories in a classification system, we could produce something similar to a confusion matrix with values of similitude between each pair of categories in the system (Figure 2a and 2c). This semantic similitude matrix, we argue, would enable classification schemes to be assessed for problematic categories during their construction, separating the thematic confusion or vagueness component from the compound uncertainty normally found in a traditional error matrix, which also includes errors in the data due to the performance of the classification algorithm.
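The bookkeeping behind the fuzziness matrix is simple to state in code. The following minimal sketch uses invented counts for a hypothetical three-class scheme (rows are map labels, columns are reference labels) and assumes only numpy:

```python
import numpy as np

# Hypothetical counts. The difference matrix records all disagreements
# with the reference data; the soft assessment then moves entries judged
# "acceptable" or "probably right" onto the diagonal, yielding the error
# matrix (row totals are preserved).
difference = np.array([[50,  8,  2],
                       [ 6, 40,  4],
                       [ 1,  3, 45]])
error = np.array([[56,  3,  1],
                  [ 2, 47,  1],
                  [ 1,  1, 47]])

# Off-diagonal cells of the fuzziness matrix count the misclassifications
# attributable to vague class definitions rather than to hard error.
fuzziness = difference - error
print(fuzziness)
```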

Methods

The issue of assessing and measuring the semantic similarity of categories is an emerging area of active research, still under development (Hahn and Chater, 1997; Jones et al., 2003). Our methodology is based on the cognitive theory of conceptual spaces (Gärdenfors, 2000). A conceptual space is a multidimensional attribute space constructed from a number of defining attribute domains, for example, temperature, shape, and location. A property is defined as a point or region in a low-dimensional subspace, for example, the interval of tree cover percentages that is used to separate a forest from a non-forest. Moreover, for any concept definition, each property of that concept is assigned a certain importance, or salience, in relation to the other properties of the concept. This enables us to declare certain properties as more important than others for defining a concept. We formally represent a conceptual space as a collection, or set, of property definitions, similar to the frame-based object models in Faucher (2001) and Mennis (2003).


Figure 2. Schematic overview of accuracy assessment methods and the way they differ in how measurement error and definition uncertainty (fuzziness) are manifested: (a) traditional accuracy assessment, (b) "soft" accuracy assessment, and (c) semantic similarity assessment.

A property definition is represented as a set of values from a certain domain, for example, the interval of tree crown cover values. To represent the semantic uncertainty typically found in conceptual definitions such as "forest," we use the idea of rough fuzzy sets (Dubois and Prade, 1990), which combine an explicit representation of both the indiscernibility and the vagueness of information. There are many prior examples of research describing fuzzy approaches to conceptual models for GIS and remote sensing (Usery, 1996; Cross and Firat, 2000; Yazici and Akkaya, 2000; Morris, 2003), and a recent overview of fuzzy set theory applications in GIS can be found in Robinson (2003). Moreover, rough sets and the concept of rough classification have demonstrated promising applications for handling uncertainty related to the granularity of geographic information (Schneider, 1995; Worboys, 1998; Ahlqvist et al., 2000). Fuzzy and rough set theories have also been further generalized by Dubois and Prade (1990) into rough fuzzy sets, a joint representation for vague and resolution-limited information; Ahlqvist et al. (2003) recently demonstrated a geographic application of rough and fuzzy data integration.

Describing Concept Uncertainty

A rough set-based approach (Pawlak, 1991) is often relevant for environmental data where classes impose a granularity on a domain (for example, {0 to 10 percent, 10 to 20 percent, . . . , 90 to 100 percent} for tree crown cover). At the same time, the semantics used to discern between a "forest" and "agriculture" may include "dense tree crown cover" and "open to sparse tree crown cover," which could be defined by a membership function over the same domain. Following Ahlqvist (2004), we therefore use a rough fuzzy set as a general representation for combined vague and granular semantic uncertainty. Thus, a category C is defined by three vectors (S_i, R_i, W_i), each holding a collection of rough fuzzy approximation spaces S_i with corresponding property values R_i given as rough fuzzy set definitions, and accompanying salience weights, W_i. A detailed description can be found in Ahlqvist (2004). In this way, for example, the land-cover classes "Conifer Forest" and "Blue Oak/Grey Pine Woodland" can be described by a collection of rough fuzzy sets defined over the domains "% tree crown cover," "Hardwood % of tree crown cover," "Grey Pine % of tree crown cover," and "Predominant species" (Figure 3, adapted from Congalton and Green, 1999).

Figure 3. Example collections of fuzzy sets describing the conditions under which a land unit is classified as "Conifer forest" (CF) or "Blue Oak/Grey Pine Woodland" (BOGP). μ(x) denotes membership to the row category as a function of the x-axis values, scaled to range from 0 = no membership to 1 = full membership. Of four available attribute domains, only two (shaded) are used in the classification of CF, whereas all four domains are used to classify BOGP.

Salience values are set to 1.0 (shaded functions in Figure 3) for those domains that are used in the classification of each category, and to 0.0 (not shaded) for the other domains. In these two examples, the "Conifer Forest" classification rule uses two domains and the "Blue Oak/Grey Pine Woodland" class uses four. It is still possible (and preferable) to define functions in the other domains even if they are not used to identify or define that specific class, since they help to evaluate the semantic similitude with other classes, as we demonstrate later. Note that we make only limited use of the full capabilities of a rough fuzzy representation to define graded transitions between categories, or granularity effects; its primary purpose is to enable the use of the concept similitude measures defined for fuzzy sets as described below.
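One possible encoding of such parameterized definitions, simplified to plain fuzzy membership functions with salience weights (the full rough fuzzy machinery is omitted, and the domain names and breakpoints below are illustrative, not taken from Figure 3), might look as follows:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Property:
    membership: Callable[[float], float]  # fuzzy membership over the domain
    salience: float                       # importance weight, 0.0 to 1.0

def trapezoid(a, b, c, d):
    """Trapezoidal membership: rises over [a, b], flat over [b, c],
    falls over [c, d]. Breakpoints must satisfy a < b <= c < d."""
    def f(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return f

# A category as a collection of property definitions keyed by domain name.
# Values are invented for illustration only.
conifer_forest: Dict[str, Property] = {
    "% tree crown cover":             Property(trapezoid(10, 25, 100, 101), 1.0),
    "hardwood % of tree crown cover": Property(trapezoid(-1, 0, 10, 25), 1.0),
}
```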


Estimating Semantic Similitude

Once the classification categories have been described, there are a number of ways to estimate their similitude; two common approaches use either the proportion of shared features (Tversky, 1977) or the psychological distance between related properties (e.g., Nosofsky, 1986). A number of formalizations of these and other similitude measures can be found in Bouchon-Meunier et al. (1996). We use one example from each of those two approaches, given by the following equations:

o(p_A, p_B) = \frac{\int \min\left(f_{p_A}(x), f_{p_B}(x)\right) dx}{\int f_{p_B}(x)\, dx},  (1)

O(C_A, C_B) = \sqrt{\sum_{i=1}^{|U|} W_{B_i}\, o(P_{A_i}, P_{B_i})^2}.  (2)

In Equation 1, the property overlap, o, is measured as the overlap of the fuzzy membership functions f_{p_A}(x) and f_{p_B}(x), each defining a property p for concepts A and B, respectively. The concept overlap metric, O (Equation 2), is a weighted measure of satisfiability (Bouchon-Meunier et al., 1996) following the shared-feature approach (Tversky, 1977). It evaluates how similar C_A is to C_B by summing the squared overlaps of all properties, P, from Equation 1. The overlap measure applies the perspective of concept B by using f_{p_B}(x) as the denominator in Equation 1 and by multiplying by the domain salience weights, W_B, in Equation 2, where \sum_i W_{B_i} = 1. Concepts with large overlaps will have values close or equal to 1, and non-overlapping concepts will have an overlap value of 0.

Figure 4. Examples showing overlap and similarity values for four pairs, A through D, of hypothetical crisp concepts defined by fuzzy membership functions (dashed and solid) over a single domain. μ(x) denotes membership as a function of the x-axis values, scaled to range from 0 = no membership to 1 = full membership.

The concept nearness metric follows the distance-based approach (Nosofsky, 1986); we employ a dissimilarity measure using a Euclidean distance metric, formalized as:

d(C_A, C_B) = \sqrt{\sum_{i=1}^{|U|} W_{B_i} \left(P_{A_i} - P_{B_i}\right)^2},  (3)

where we calculate the difference P_{A_i} - P_{B_i} using the fuzzy dissemblance index (Kaufman and Gupta, 1985), which measures the distance between two membership functions. Again, the weights are adjusted to sum to 1: \sum_i W_{B_i} = 1.

Nearness is then calculated as an exponentially decaying function of the distance (Shepard, 1987):

s = e^{-cd}.  (4)

In Equation 4, c is a general sensitivity parameter, and d is given by Equation 3. Similar concepts will have values close to 1, and very different concepts will have values approaching 0. Figure 4 illustrates how the nearness and overlap metrics vary for different combinations of four hypothetical, simple (single-domain) concept definitions. The two concepts in Figure 4a are fully overlapping with respect to the dashed concept function, but still not very similar; in fact, the dashed-line concept can be regarded as a sub-concept of the solid-line concept. The concepts in Figure 4b are very dissimilar: their overlap is zero and their nearness is very low. The concepts in Figure 4c are very similar, and both the overlap and nearness metrics are at or close to 1. The concepts in Figure 4d are also very similar but still disjoint, as indicated by a high nearness value and zero overlap.

Taking the more complex concept definitions from Figure 3, we get a weighted average of similitude metrics from each domain according to Equations 1 through 4. Crossing each of these two similitude metrics over the two categories from Figure 3 creates two matrices in which each cell takes the similitude value of the column concept to the row concept (Figure 5).
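Equations 1 through 4 are straightforward to prototype numerically. The sketch below evaluates memberships on a discrete grid per domain and reuses the Property structure from the earlier sketch; both concepts are assumed to define a membership function on every domain of the weighting concept. Note that the mean-absolute-difference distance is a simple stand-in for the dissemblance index, an assumption rather than the Kaufman and Gupta (1985) formulation:

```python
import numpy as np

def property_overlap(f_a, f_b, grid):
    """Equation 1: overlap of two fuzzy membership functions, taken from
    the perspective of concept B, whose area is the denominator."""
    fa = np.array([f_a(x) for x in grid])
    fb = np.array([f_b(x) for x in grid])
    denom = np.trapz(fb, grid)
    return float(np.trapz(np.minimum(fa, fb), grid) / denom) if denom > 0 else 0.0

def property_distance(f_a, f_b, grid):
    """Stand-in distance between two membership functions: mean absolute
    difference, normalized to [0, 1] (NOT the dissemblance index)."""
    fa = np.array([f_a(x) for x in grid])
    fb = np.array([f_b(x) for x in grid])
    return float(np.mean(np.abs(fa - fb)))

def _weights(cat_b, domains):
    """Concept B's salience weights, normalized to sum to 1."""
    w = np.array([cat_b[d].salience for d in domains], dtype=float)
    return w / w.sum()

def concept_overlap(cat_a, cat_b, grids):
    """Equation 2: root of the weighted sum of squared property overlaps."""
    domains = list(cat_b)
    w = _weights(cat_b, domains)
    o = np.array([property_overlap(cat_a[d].membership, cat_b[d].membership,
                                   grids[d]) for d in domains])
    return float(np.sqrt(np.sum(w * o ** 2)))

def concept_nearness(cat_a, cat_b, grids, c=1.0):
    """Equations 3 and 4: weighted Euclidean distance over the property
    distances, mapped to a similarity by Shepard's exponential decay."""
    domains = list(cat_b)
    w = _weights(cat_b, domains)
    dist = np.array([property_distance(cat_a[d].membership, cat_b[d].membership,
                                       grids[d]) for d in domains])
    return float(np.exp(-c * np.sqrt(np.sum(w * dist ** 2))))
```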

Figure 5. Example of similitude matrices, distance-based nearness (a) and feature-based overlap (b), for two land-cover classes: "Conifer forest" (CF) and "Blue Oak/Grey Pine Woodland" (BOGP).

The major diagonal will typically have similitude values equal to 1.0, since a category is absolutely similar to itself. The estimate of how similar Conifer Forest (CF) is to Blue Oak/Grey Pine Woodland (BOGP) (nearness = 0.37, overlap = 0.54) is not necessarily the same as the inverse, the similitude of BOGP to CF (nearness = 0.69, overlap = 0.71), because of the way the similitude measures are formulated. In this example, comparing CF to BOGP uses all four domains, whereas evaluating the similitude of BOGP to CF uses only the two domains used to classify CF (Figure 3).

Another important property of our parameterized concept definition is that it allows us to identify a semantic overlap even for classes that appear crisp and mutually exclusive in the classification schema. Often, an apparent semantic crispness of categories in a taxonomy hides a multitude of compromises and choices, such as the classes to use, the classifier, the training data, and the choice of parameters. Each of these choices may affect the definition of the categories used, and their separability. Exposing some of this confusion for scrutiny might help users to better understand the more traditional reporting of classification accuracy. In this sense, the nearness and overlap matrices can be regarded as measures of semantic similarity in the definitions of the categories (Figure 2c), derived independently from the empirically based difference and error matrices (Figure 2a and 2b). Following our original hypothesis, we will, in the following empirical study, evaluate the use of semantic similitude matrices to identify category pairs that will often be confused and lead to classification errors in a final data product that uses these categories.
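Given such metric functions, assembling the full, generally asymmetric matrices of Figure 5 reduces to a double loop. Following the figure's convention, each cell holds the similitude of the column category to the row category, so the row category supplies the salience weights. A sketch, assuming the functions above:

```python
def similitude_matrices(categories, grids, c=1.0):
    """Build nearness and overlap matrices over a taxonomy.
    categories: dict mapping class name -> parameterized definition.
    Cell [row, col] holds the similitude of the column class to the
    row class; the row class supplies the weights and denominators."""
    names = list(categories)
    n = len(names)
    nearness = np.zeros((n, n))
    overlap = np.zeros((n, n))
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            nearness[i, j] = concept_nearness(categories[col], categories[row],
                                              grids, c)
            overlap[i, j] = concept_overlap(categories[col], categories[row],
                                            grids)
    return names, nearness, overlap

# Example usage, with placeholder names for an assumed taxonomy dict and
# a shared evaluation grid per domain:
# grids = {d: np.linspace(0.0, 100.0, 201) for d in some_taxonomy["CF"]}
# names, near, over = similitude_matrices(some_taxonomy, grids)
```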


Hypothesis Evaluation

To evaluate our hypothesis, that a large semantic similitude value between categories predicts classification errors, we need to investigate the relationship between semantic similitude values and the number of classification errors. Because some of these variables are not normally distributed, and because the level of measurement of the similitude metrics is not entirely known (nearness = 0.8 does not necessarily mean twice as similar as nearness = 0.4), we use the nonparametric Spearman's Rank Correlation Coefficient (r) to evaluate the ordinal correlation between similitude and misclassification. We use only the off-diagonal entries from the matrices, so that the very large number of correct classifications with full similitude does not falsely indicate a correlation between the more subtle variation of off-diagonal similitude values and error counts. If the similitude metrics are indicative of misclassification due to semantic vagueness, then a high frequency of misclassification, that is, large off-diagonal entries in a fuzziness matrix (difference minus error, as shown in Figure 2), would correspond to high values of nearness and overlap for the same categories, resulting in a high positive correlation. On the other hand, we do not expect any correlation between error matrix values and the values of nearness and overlap.
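As a sketch of this test, assuming scipy is available and that similitude and fuzziness are square numpy arrays over the same categories in the same order (both names are placeholders):

```python
import numpy as np
from scipy.stats import spearmanr

def offdiagonal(matrix):
    """Flatten a square matrix, dropping the major diagonal."""
    m = np.asarray(matrix, dtype=float)
    return m[~np.eye(m.shape[0], dtype=bool)]

# similitude: a nearness or overlap matrix; fuzziness: difference - error.
rho, p_value = spearmanr(offdiagonal(similitude), offdiagonal(fuzziness))
```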

Data Description

In our evaluation, we sought to isolate the effect of fuzziness between classes from other error components. We therefore required the experimental data to have a well-documented accuracy assessment, including difference and error/fuzziness matrices. Furthermore, our method relies on a detailed description of the data categories to set up the parameterized class descriptions, so the experimental data must have well-documented class definitions. Based on these requirements and the availability of data, we selected two experimental datasets: the first describes landscape types over central Sweden (Ahlqvist, 2000), and the second depicts hardwood rangelands in California (Congalton and Green, 1999). Both datasets were produced using manual interpretation of remote sensing imagery.

The first dataset is an accuracy assessment including multiple interpretations of a Swedish topographic map sheet at 1:50 000 scale, classified into four landscape type classes: coastal district, urban/suburban district, agricultural district, and forest district. The spatial units were defined by dividing each map sheet into a 5 km by 5 km square grid (1 cm by 1 cm at the map scale). For each spatial unit, the areal coverage of certain land-cover/land-use types depicted by the map was estimated visually and recorded. Based on those estimates, the classification into landscape types proceeds in steps according to classification rules such as: class is "urban/suburban district" if at least one-eighth of the areal unit is covered by urban and suburban features (mirrored in the sketch following this section).

The second dataset is an accuracy assessment of a map product using reference data from aerial photo interpretation and field validation. The map was created using twelve land-cover categories to monitor change in California's hardwood rangelands (Pillsbury et al., 1991). The categories are: Blue Oak Woodland, Blue Oak/Grey Pine Woodland, Valley Oak Woodland, Coastal Oak Woodland, Montane Woodland, Potential Hardwood, Conifer, Shrub, Grass, Urban, Water, and Other. Classification follows rules such as: class is "Shrub or Herbaceous" if less than 10 percent of the areal unit is tree covered. Reference data for this dataset were collected through a combination of aerial photo interpretation and field control.

For both datasets the reference data were collected using the same classification schema as the original data, and either a stratified (Hardwood data) or systematic (Landscape data) random sampling process was employed to minimize errors caused by spatial autocorrelation.
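The stepwise rule logic of the landscape classification is easy to mirror in code. In the sketch below, only the urban/suburban threshold comes from the source; the remaining criteria, thresholds, and the default class are invented placeholders:

```python
def landscape_type(cover):
    """Classify a 5 km x 5 km unit from visually estimated areal fractions
    (values in [0, 1]). Only the urban/suburban rule is documented in the
    source; the other criteria are hypothetical stand-ins."""
    if cover.get("urban_suburban", 0.0) >= 1 / 8:
        return "urban/suburban district"
    if cover.get("sea", 0.0) > 0.0:            # assumed coastal criterion
        return "coastal district"
    if cover.get("agriculture", 0.0) >= 0.25:  # assumed threshold
        return "agricultural district"
    return "forest district"                   # assumed default class

print(landscape_type({"urban_suburban": 0.15}))  # -> urban/suburban district
```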


Accuracy Assessment Details

The most important common feature of the two accuracy assessments was the measures taken to handle the ambiguity of classes. In both studies, the landscape type and/or land-cover classes create crisp classes from continuous parameters, and the accuracy assessment used either fuzzy (Gopal and Woodcock, 1994) or rough (Ahlqvist et al., 2000) set approaches in an effort to produce error matrices that separate misclassifications due to category vagueness from misclassifications due to other errors, such as misidentification of tree species.

The landscape type dataset used the rough classification approach for the accuracy assessment (Ahlqvist et al., 2000), which produced an extended error matrix. This result could be separated into a difference matrix and an error matrix by separating the upper approximation entries from the lower approximation entries (Figure 7a and 7b). Following the previously outlined soft accuracy assessment methodology, a fuzziness matrix was produced by subtracting the error matrix from the difference matrix (Figure 7c).

The hardwood land-cover dataset used the fuzzy set approach for the accuracy assessment (Gopal and Woodcock, 1994), and for the purpose of this study, two error matrices from that assessment were chosen. The first, corresponding to a difference matrix, contained all occurrences of correctly and incorrectly classified pixels. The second, corresponding to an error matrix, included "acceptable" and "probably right" classifications on the major diagonal, following the assumption that this would remove most of the errors caused by fuzziness. But, as we shall return to in the discussion, this procedure does not manage to separate out all of the errors attributable to fuzziness. As before, a fuzziness matrix was produced by subtracting error from difference.


Parameterized Class Descriptions

The documentation of each classification system was used to set up formal parameterized definitions of each class using the methodology described previously. This process created two collections of parameterized concept descriptions: one for the landscape type dataset and one for the hardwood land-cover dataset. Figure 6 illustrates how parameterized concept descriptions have been defined for the hardwood land-cover dataset. Each row defines one land-cover class as a collection of fuzzy set membership functions defined over six different domains. For each class, different salience weights are applied to the domains. For example, the Valley Oak Woodland class (row 5) is defined as having at least 10 percent tree crown cover, at least 50 percent hardwoods in the tree crown cover, and Valley Oak as the dominant species. These three defining domains are given a salience weight of 1.0, indicated by the shading. For all domains, we can choose a value in the range 0.0 ≤ salience ≤ 1.0; for clarity in the following example, we set salience to 1.0 for the defining domains and 0.0 for all other domains.
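Continuing the earlier Property sketch, the Valley Oak Woodland row might be instantiated as below. The crisp thresholds are rendered as step memberships, and the categorical species domain is coded on an assumed integer scale; all of this is illustrative rather than the paper's actual parameterization:

```python
def step_up(threshold):
    """Crisp membership: 1.0 at or above the threshold, else 0.0."""
    return lambda x: 1.0 if x >= threshold else 0.0

# Assumed integer coding for the categorical species domain: 3 = valley oak.
VALLEY_OAK = 3

valley_oak_woodland = {
    "% tree crown cover":             Property(step_up(10), 1.0),
    "hardwood % of tree crown cover": Property(step_up(50), 1.0),
    "predominant species (coded)":    Property(
        lambda x: 1.0 if round(x) == VALLEY_OAK else 0.0, 1.0),
    # Non-defining domains get salience 0.0 but still carry a membership
    # function, so that comparisons from other classes' perspectives work.
    "grey pine % of tree crown cover": Property(lambda x: 1.0, 0.0),
}
```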

Similitude Assessment of Parameterized Concepts

With each dataset taxonomy described as a collection of parameterized concept descriptions, the two previously described similitude metrics, overlap and nearness, were calculated for all category combinations within each taxonomy. The metrics were then entered into one overlap matrix and one nearness matrix per taxonomy.


Figure 6. Parameterized concept descriptions of the Hardwood land-cover categories (adapted from Congalton and Green, 1999). Each row defines one category as a collection of fuzzy set membership functions on six different domains. μ(x) denotes membership to the row category as a function of the x-axis values, scaled to range from 0 = no membership to 1 = full membership. For each category, different salience weights are applied to the domains. Shaded functions have a salience weight of 1, and non-shaded functions have a weight of 0.

Figure 7. (a) Difference, (b) Error, and (c) Fuzziness matrices, generated from accuracy assessment of the landscape type data (Ahlqvist, 2000).

Figures 8 and 9 show the resulting semantic similitude matrices for the landscape type and the hardwood land-cover datasets, respectively. In these matrices, each cell shows the semantic similitude value of the column class compared with the row class, with respect to the row class definition.

The non-symmetric nature of these matrices is due to the asymmetry of the similitude metrics and to the way the parameterized concept definitions are set up to use different salience weights for the domains. For all similitude matrices, we also calculated a product matrix as overlap × nearness. The rationale for this approach is based on the different aspects of similitude that these two metrics evaluate (see Figure 4): two very similar concepts will have both a high nearness and a high overlap, as opposed to cases with very near but non-overlapping concepts. The product generates the highest values for the first situation, lower values for the other situations, and the lowest values when both nearness and overlap are low.


Figure 8. (a) Overlap and (b) Nearness matrices for the landscape type dataset. Each cell entry is an evaluation of (a) how much of a row category definition overlaps semantically with a column category, or (b) how semantically close a column category definition is to a row category.

Figure 9. (a) Overlap and (b) Nearness matrices for the Hardwood land-cover dataset. Each cell entry is an evaluation of (a) how much of a row category definition overlaps semantically with a column category, or (b) how semantically close a column category definition is to a row category. (NH = Non-Hardwood, BOGP = Blue Oak/Grey Pine Woodland, BOW = Blue Oak Woodland, COW = Coastal Oak Woodland, MH = Montane Hardwood Mix, VOW = Valley Oak Woodland.)

Figure 10. Correlation (Spearman’s Rank Correlation Coefficient) between semantic similitude metrics and the number of errors attributed to fuzziness for each combination of two landscape classes in the landscape type dataset.

Figure 11. Correlation (Spearman's Rank Correlation Coefficient) between semantic similitude metrics and the number of errors attributed to fuzziness for each combination of two land-cover classes in the hardwood dataset.

Our original hypothesis was that the off-diagonal similitude values should correlate with the off-diagonal entries in the accuracy assessment matrices. More specifically, the similitude metrics (for example, Figure 8) should correlate most strongly with the fuzziness matrix (Figure 7c), since the off-diagonal values in that matrix are supposed to be caused by fuzzy concept definitions. A general notion of whether such a correlation exists can be gained by plotting pairs of values from the similitude and error matrices in a simple scatter diagram (Figures 10 and 11); in these figures, the numbers of errors are taken from the fuzziness matrices. A full evaluation of the correlation can be found in Tables 1 and 2, where the Spearman's Rank Correlation Coefficients are given.

We can see that the similitude metrics have a high positive correlation with the empirically estimated fuzziness in the landscape dataset, whereas much lower (roughly half as large) correlation values are found between the similitude metrics and the error component. Among the similitude metrics, nearness has the highest correlation with the empirically estimated fuzziness. For the Hardwood land-cover dataset the result is slightly different: for the nearness metric, the correlations with the difference and error matrices are much lower relative to the correlation with the empirically estimated fuzziness, while the overlap metric seems to follow the same pattern we saw for the landscape dataset. Both overlap and nearness demonstrate the highest correlation with the fuzziness matrix and the lowest correlation with the error matrix.

TABLE 1. CORRELATION BETWEEN SIMILITUDE METRICS AND EMPIRICAL ACCURACY ESTIMATES FOR THE LANDSCAPE TYPE DATA USING SPEARMAN'S RANK CORRELATION COEFFICIENT, n = 30

Landscape Type Data    Difference Matrix      Error Matrix    Difference-Error Matrix
                       (error + fuzziness)    (error only)    (fuzziness only)
Nearness Matrix               0.83                0.45                0.86
Overlap Matrix                0.73                0.41                0.78
Nearness × Overlap            0.76                0.41                0.80

TABLE 2. CORRELATION BETWEEN SIMILITUDE METRICS AND EMPIRICAL ACCURACY ESTIMATES FOR THE HARDWOOD LAND-COVER DATA USING SPEARMAN'S RANK CORRELATION COEFFICIENT, n = 12

Hardwood Land-cover Data    Difference Matrix      Error Matrix    Difference-Error Matrix
                            (error + fuzziness)    (error only)    (fuzziness only)
Nearness Matrix                    0.27                0.12                0.49
Overlap Matrix                     0.52                0.34                0.55
Nearness × Overlap                 0.32                0.16                0.50

Discussion and Conclusions

The results support our original hypothesis that similitude metrics calculated from category definitions can predict those categories that are more prone to confusion in their application, and those that are not.


High semantic nearness and overlap values are positively correlated with the fuzziness-only component estimates from the empirical accuracy assessments. Equally supportive of our hypothesis is the fact that the similitude metrics show a much lower correlation with the error-only component estimates in all six cases, as summarized in Tables 1 and 2.

We acknowledge, however, that these results are influenced by a number of factors that are difficult to control in current practice. While we have tried to select datasets with accuracy assessments detailed enough to enable an assessment of both error and vagueness, it remains problematic to prove conclusively that the accuracy assessment methods used to separate errors from vagueness really succeed. In fact, Congalton and Green (1999) point out that the error matrix of the Hardwood dataset still contains some errors related to vagueness even after including the "acceptable" and "probably right" classifications on the diagonal. This may explain why there is a significant correlation between this error matrix and the overlap measure in our second experiment, and why the Hardwood dataset correlations were not as strong as those for the landscape data.

Another factor that significantly affects the outcome is the formalization of taxonomies into parameterized concept definitions. Our methodology requires a parameterization of the chosen classification system. In our experiment, we made a relatively crude and straightforward translation of category definitions into parameterized definitions, using a simplified salience weighting procedure to calculate the resulting similitude metrics. The strong correlations are therefore somewhat surprising but also encouraging, since they support the proposed method while placing low demands on model sophistication. Nevertheless, there is a demand placed on the investigator to supply such a translation, and depending on how well classes are specified and/or understood, this task might range in complexity from straightforward to quite demanding. Kavouras et al. (2003) demonstrated how methods for semantic information extraction from on-line dictionaries (Jensen and Binot, 1987; Vanderwende, 1995) could be used to extract formal semantic relations from textual definitions of geographic categories. Such structured approaches could prove useful in providing the necessary parameterization for our methodology. We would argue, moreover, that the potential benefits of more effective translation between scientific taxonomies are themselves a sound reason to pay careful attention to the creation of such taxonomies, so that they are more amenable to translation.

Another important benefit of this method is its service as an aid to analysts during taxonomy formation, as a means to test likely uncertainties before expensive field data gathering. Just as with an error matrix, we can, after evaluating the semantic similitude metrics, decide that two categories overlap too much and that using them both would lead to unacceptable error rates in the resulting maps. We can therefore either collapse the categories (semantic generalization) or consider how to strengthen the category definitions with additional attributes that make a firmer separation between them. In such analyses, the different metrics help to identify different types of similarity, as shown by Figure 4 and its accompanying explanation.
A final benefit, which we anticipate but have no evidence of as yet, is that semantic similitude measures should provide researchers and organizations with a finer set of definitions with which to understand the likely accuracy of a thematic map, supplementing the more traditional classification error matrix and associated statistics.

We are currently investigating the provision of a visual interface by which the system can help the analyst set up the parameterized definitions of categories more efficiently. In future work, we plan to expand our range of applications to geological taxonomies (which also differ across national boundaries) and to conduct an expanded set of experiments on taxonomies that relate to measures of climate vulnerability, in addition to land-use and land-cover.

Acknowledgments

This work was supported by the National Science Foundation under Grant No. BCS-9978052. We wish to thank the anonymous reviewers for many helpful comments and suggestions on the manuscript.

References

Ahlqvist, O., 2000. Context Sensitive Transformation of Geographic Information, Dissertation No. 16, The Department of Physical Geography, Stockholm University Dissertation Series, 141 p.
Ahlqvist, O., 2004. A parameterized representation of uncertain conceptual spaces, Transactions in GIS, 8(4):493-514.
Ahlqvist, O., J. Keukelaar, and K. Oukbir, 2000. Rough classification and accuracy assessment, International Journal of Geographic Information Science, 14(5):475-496.
Ahlqvist, O., J. Keukelaar, and K. Oukbir, 2003. Rough and fuzzy geographical data integration, International Journal of Geographic Information Science, 17(3):223-234.
Bouchon-Meunier, B., M. Rifqi, and S. Bothorel, 1996. Towards general measures of comparison of objects, Fuzzy Sets and Systems, 84:143-153.
Card, D.H., 1982. Using known map categorical marginal frequencies to improve estimates of thematic map accuracy, Photogrammetric Engineering & Remote Sensing, 48:431-439.
Congalton, R., 1988. Using spatial autocorrelation analysis to explore the errors in maps generated from remotely sensed data, Photogrammetric Engineering & Remote Sensing, 54:587-592.
Congalton, R.G., and K. Green, 1999. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, CRC Press, 137 p.
Couclelis, H., 2003. The certainty of uncertainty: GIS and the limits of geographic knowledge, Transactions in GIS, 7(2):165-175.
Crapper, P.F., 1984. An estimate of the number of boundary cells in a mapped landscape coded to grid cells, Photogrammetric Engineering & Remote Sensing, 50:1497-1503.
Cross, V., and A. Firat, 2000. Fuzzy objects for geographical information systems, Fuzzy Sets and Systems, 113:19-36.
Dubois, D., and H. Prade, 1990. Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems, 17:191-209.
Ehlers, M., and W. Shi, 1997. Error modelling for integrated GIS, Cartographica, 33(1):11-21.
Faucher, C., 2001. Approximate knowledge modeling and classification in a frame-based language: The system CAIN, International Journal of Intelligent Systems, 16:743-780.
Fisher, P.F., 1996. Visualization of the reliability in classified remotely sensed images, Photogrammetric Engineering & Remote Sensing, 60:905-910.
Fisher, P.F., 1999. Models of uncertainty in spatial data, Geographical Information Systems - Principles and Technical Issues, Second Edition (P.A. Longley, M.F. Goodchild, D.J. Maguire, and D.W. Rhind, editors), John Wiley & Sons, Inc., pp. 191-205.
Foody, G.M., and P.M. Atkinson (editors), 2002. Uncertainty in Remote Sensing and GIS, Wiley, Chichester, England, 307 p.
Gahegan, M., and B. Brodaric, 2002. Examining uncertainty in the definition and meaning of geographical categories, 5th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 10-12 July, Melbourne, pp. 290-299.
Gärdenfors, P., 2000. Conceptual Spaces: The Geometry of Thought, MIT Press, Cambridge, Massachusetts, 307 p.
Gopal, S., and C. Woodcock, 1994. Theory and methods for accuracy assessment of thematic maps using fuzzy sets, Photogrammetric Engineering & Remote Sensing, 60:181-188.

04-042.qxd

11/11/05

4:30 AM

Page 1373

Hahn, U., and N. Chater, 1997. Concepts and similarity, Knowledge, Concepts and Categories (K. Lamberts and D. Shanks, editors), Psychology Press, East Sussex, pp. 43-92.
Heuvelink, G.B.M., and P.A. Burrough, 1993. Error propagation in cartographic modelling using Boolean logic and continuous classification, International Journal of Geographical Information Systems, 7(3):231-246.
Jensen, K., and J.-L. Binot, 1987. Disambiguating prepositional phrase attachments by using on-line dictionary definitions, Computational Linguistics, 13(3-4):251-260.
Jones, C.B., H. Alani, and D. Tudhope, 2003. Geographical terminology servers - Closing the semantic divide, Foundations of Geographic Information Science (M. Duckham, M.F. Goodchild, and M.F. Worboys, editors), Taylor & Francis, London, pp. 205-222.
Kaufman, A., and M.M. Gupta, 1985. Introduction to Fuzzy Arithmetic, Van Nostrand Reinhold Company, New York, 351 p.
Kavouras, M., M. Kokla, and E. Tomai, 2003. Determination, visualization and interpretation of semantic similarity among different geographic ontologies, Proceedings of the 6th AGILE Conference on Geographic Information Science, 24-26 April, Lyon, France, pp. 51-56.
Klir, J.K., and M.J. Wierman, 1998. Uncertainty-Based Information: Elements of Generalized Information Theory, Physica-Verlag, Heidelberg, 168 p.
Leung, Y., 1988. Spatial Analysis and Planning Under Imprecision, Elsevier, Amsterdam, 375 p.
Leung, Y., and J. Yan, 1998. A locational error model for spatial features, International Journal of Geographical Information Science, 12(6):607-620.
Mennis, J.L., 2003. Derivation and implementation of a semantic GIS data model informed by principles of cognition, Computers, Environment and Urban Systems, 27:455-479.
Morris, A., 2003. A framework for modeling uncertainty in spatial databases, Transactions in GIS, 7(1):83-101.
Nosofsky, R.M., 1986. Attention, similarity, and the identification-categorization relationship, Journal of Experimental Psychology: General, 115:39-57.


Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht/Boston, 229 p.
Pillsbury, N., M. DeLasaux, R. Pryor, and W. Bremer, 1991. Mapping and GIS database development for California's hardwood resources, California Department of Forestry and Fire Protection, Forest and Rangeland Resources Assessment Program (FRRAP), Sacramento, California, 64 p.
Robinson, V.B., 2003. A perspective on the fundamentals of fuzzy sets and their use in geographic information systems, Transactions in GIS, 7(1):3-30.
Schneider, M., 1995. Spatial Data Types for Database Systems, Ph.D. Thesis, Fernuniversität, Hagen, Germany.
Shepard, R.N., 1987. Toward a universal law of generalization for psychological science, Science, 237:1317-1323.
Tversky, A., 1977. Features of similarity, Psychological Review, 84:327-352.
Usery, E.L., 1996. A conceptual framework and fuzzy set implementation for geographic features, Geographic Objects with Indeterminate Boundaries (P.A. Burrough and A.U. Frank, editors), Taylor & Francis, London, pp. 71-85.
Vanderwende, L., 1995. The Analysis of Noun Sequences Using Semantic Information Extracted from On-Line Dictionaries, Ph.D. Thesis, Faculty of the Graduate School of Arts and Sciences, Georgetown University, Washington, D.C.
Veregin, H., 1999. Data quality parameters, Geographical Information Systems - Principles and Technical Issues, Second Edition (P.A. Longley, M.F. Goodchild, D.J. Maguire, and D.W. Rhind, editors), John Wiley & Sons, Inc., pp. 177-189.
Worboys, M.F., 1998. Computation with imprecise geospatial data, Computers, Environment and Urban Systems, 22(2):85-106.
Yazici, A., and K. Akkaya, 2000. Conceptual modeling of geographic information systems applications, Recent Issues on Fuzzy Databases (G. Bordogna and G. Pasi, editors), Studies in Fuzziness and Soft Computing, Physica-Verlag, Heidelberg, New York, pp. 129-151.
Zhang, J., and M. Goodchild, 2002. Uncertainty in Geographical Information, Taylor & Francis, London, 266 p.

(Received 10 August 2004; accepted 02 November 2004; revised 02 December 2004)

