
Textural and Contextual Land-Cover Classification Using Single and Multiple Classifier Systems
Olivier Debeir, Isabelle Van den Steen, Patrice Latinne, Philippe Van Ham, and Eleonore Wolff

Abstract
The objective of this study was to improve the quality of digital land-cover and land-use classification from high-resolution (10 to 30 m) remote sensing data. Three classification techniques were compared, falling into two groups: single classifiers (a five-nearest-neighbor classifier and the C4.5 decision tree) and a multiple classifier system (BAGFS). Textural and contextual features (roads, hydrology, relief, etc.) were introduced during the classification process. Eleven land-cover categories in a varied Belgian landscape were analyzed and classified using Landsat Thematic Mapper data. Accuracy increased with the introduction of textural features and contextual data, the Kappa coefficient rising from 0.60 to 0.82. The best Kappa value was achieved using numerous textural and contextual features with the multiple classifier system (BAGFS).

O. Debeir and P. Van Ham (pavaha@ulb.ac.be) are with Information and Decision Systems, CP 165/57; I. Van den Steen and E. Wolff (ewolm ulb.ac.be) are with the Institute of Management of the Environment and Regional Development, CP 130/02; and P. Latinne is with the Artificial Intelligence Department; all at the Université Libre de Bruxelles, Avenue Franklin Roosevelt 50, 1050 Brussels, Belgium.

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

Introduction
Since the early 1970s, researchers have sought to interpret land cover or land use, a main component of the landscape, for land management (Anderson, 1971). Since the 1980s, improving automated land-cover interpretation has been an important research topic, the aim being to approach the accuracy and detail of manual interpretation (Campbell, 1981). Although numerical techniques have improved, operational land-cover mapping and inventory programs are still based on visual interpretation (e.g., CORINE Land Cover, an exhaustive European land-cover database established at 1:100,000 scale, and MURBANDY, a European database established at 1:25,000 scale for 21 cities). This choice may be explained by the number of classes in the classification scheme (44 in CORINE Land Cover and 52 in MURBANDY), the diversity of landscapes, the size of the area to be interpreted (Europe, Africa, etc.), and the poor results of automated techniques. An abundant literature (Argialas et al., 1990; Gurney et al., 1983; Haralick et al., 1973; Rosenfield et al., 1979; Ryherd et al., 1996; Tung Fung et al., 1994; Weiler et al., 1991) attests to important developments in image processing techniques exploiting the textural information of remote sensing images, i.e., the description of the spatial variability of tones, and their contextual information, i.e., the description of the spatial relationship of a pixel with the remainder of the scene (Gurney, 1983). However, these techniques remain poorly accessible, either because they are not adapted to large images or because their application may be limited in standard commercial image processing software (e.g., in the size and shape of the moving windows or the number of channels). Moreover, ancillary data, which are essential for interpreting land use, are far more difficult to take into account in automated interpretation (Richards, 1994). Integrating some of these parameters has been shown to significantly increase the level of detail and the quality of the interpretation (Mesev, 1998; Ricchetti, 2000). Besides the integration of textural and contextual features, another research topic concerns the improvement of classification techniques. Several classifiers have been applied to the numerical interpretation of land cover from remote sensing data, such as decision trees (Hansen et al., 1996; Friedl, 1997) and neural networks (Atkinson, 1997). In pattern recognition, a recent trend is to use multiple classifier systems to improve classification accuracy (Kittler et al., 2000). This paper aims to improve numerical remote sensing image processing for land-cover interpretation and mapping by using spatial information and ancillary data with a multiple classifier system, "BAGFS."

Study Area
The study area was chosen for its varied and complex landscapes mixing urban, industrial, rural, and wooded areas. It is located in the west of Belgium (Europe), more precisely between 3°54′ and 4°22′ east longitude and between 50°22′ and 50°41′ north latitude, and covers an area of 30 by 30 km (Figure 1). The zone includes the cities of Mons to the south-west, Nivelles to the north-east, Fontaine-l'Évêque to the south-east, and Soignies to the north-west.

Data
The data consist of Landsat Thematic Mapper imagery (seven bands; 199-floating 1990-05-01) and digital terrain models (DLMS, level 2), both provided by the National Geographic Institute, together with ancillary data such as road, motorway, hydrographic, and rail networks as well as villages (mainly provided by Tele-Atlas). The Landsat TM data were geometrically corrected in two dimensions (X and Y) by the National Geographic Institute of Belgium using the 1:250,000-scale database in the Belgian Lambert projection. The exhaustive, European-wide CORINE Land Cover database was used as the reference data (CEC, 1993). These land-cover data had been visually interpreted at 1:100,000 scale from various remote sensing images. For Belgium, the smallest mapping unit is 15 ha. Although these data are spatially highly generalized, they have become a standard at the European level.

Photogrammetric Engineering & Remote Sensing Vol. 68, No. 6, June 2002, pp. 597-605.
0099-1112/02/6806-597$3.00/0
© 2002 American Society for Photogrammetry and Remote Sensing

Figure 1. Study area.

Classification Scheme
The classification scheme was taken from the CORINE Land Cover program (CEC, 1993), whose inventory provides localized geographical information on land cover and land use in the Member States of the European Community. The methodology relies on the exploitation of satellite images together with other relevant documents. The inventory consists of mapping the land-cover data and storing them in a geographic information system (GIS). The classification scheme has three hierarchical levels. Although mixing morphologically and functionally oriented classification schemes, i.e., land cover and land use, is not recommended (Anderson, 1971), the CORINE Land Cover classification scheme does exactly that (CEC, 1993), which makes some classes harder to identify from remote sensing data. For this study, we worked with the third level of the classification scheme. Of the 44 classes in the original scheme, only 19 are present in our study area. Some classes were removed because of their very low accuracy in automated classification: their spectral, spatial, and relief characteristics mean that aerial photographs, viewed stereoscopically, are needed to interpret them. These classes are "mineral extraction sites," "dump sites," "construction sites," "sport and leisure facilities," "moors and heathland," and "transitional woodland-shrub." They are mainly land-use classes whose visual interpretation relies heavily on ancillary data; their automated interpretation is not considered in this paper. In all, 11 classes were used in this work, including continuous urban fabric, industrial or commercial units, non-irrigated arable land, and coniferous forest (see Table 1). Figure 1 shows the study area, and Plate 1a shows it mapped with the CORINE Land Cover classification scheme.

Training and Validation Sets
To train the classifiers, two types of supervised data sets were chosen: an "expert set" and a "random stratified set." Each was split into two subsets, one for training the classifiers and the other for the validation phase.

Expert Training and Validation Sets
A training set was defined and used to extract not only the spectral signatures but also the textural and contextual information. As is commonly done, training areas were selected by an expert using a Landsat TM color composite, CORINE Land Cover data, and topographic maps. Between 105 and 342 pixels per class were chosen as representative of each class, for a total of 2096 training pixels over all classes. These training areas were defined to be spectrally homogeneous and were therefore generally delimited far from region borders. They can consequently be considered prototypes of the class they belong to, but they do not necessarily reflect the spectral heterogeneity of each class. The validation set was selected by the same method and consisted of 2028 validation pixels, distinct from the training pixels.

Random Stratified Training and Validation Sets
A random selection without replacement was used to draw 200 pixels per class, as defined in the CORINE Land Cover database, for both the training set and the validation set: 2200 pixels (11 classes × 200) were used for training and another 2200 pixels for validation. The pixels were labeled using the CORINE Land Cover data. Such a random choice ensured that border pixels and heterogeneous areas were also selected for training and validation. These sets were therefore spectrally much more heterogeneous but more representative of intra-class variability, and they were expected to give better generalization of the classification results.
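The random stratified selection above can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: it assumes that the reference (CORINE) class of every candidate pixel is available as a flat label array and that training and validation pixels are drawn disjointly; `stratified_train_val_split` is a hypothetical name.

```python
import numpy as np


def stratified_train_val_split(labels, per_class=200, seed=0):
    """Return disjoint (train, val) pixel indices with `per_class` pixels per class each.

    `labels` holds the reference class of every candidate pixel (flattened image).
    Sampling is without replacement, so no pixel appears in both sets.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)          # all pixels of class c
        if idx.size < 2 * per_class:
            raise ValueError(f"class {c!r} has only {idx.size} pixels")
        chosen = rng.choice(idx, size=2 * per_class, replace=False)
        train.append(chosen[:per_class])           # first half -> training
        val.append(chosen[per_class:])             # second half -> validation
    return np.concatenate(train), np.concatenate(val)


# Usage: 11 classes x 200 pixels -> 2200 training and 2200 validation pixels.
labels = np.repeat(np.arange(11), 1000)
train_idx, val_idx = stratified_train_val_split(labels)
print(len(train_idx), len(val_idx))  # 2200 2200
```

Drawing both halves from one sample without replacement is simply a convenient way to guarantee that the two sets never share a pixel.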

Features
One hundred thirty-three numerical and ordinal features were computed from spectral, textural, and contextual (ancillary data) information to describe the different classes.

Spectral Information

Numerical land-cover classifications are usually based essentially on spectral information, on the assumption that different land covers have distinct spectral signatures. Using spectral information alone, however, confusions between land-use/land-cover classes are numerous. All seven Landsat Thematic Mapper spectral bands (TM1-TM7) were used, together with the first two principal components and the seven equalized band images.

TABLE 1. CLASSES USED, EXTRACTED FROM CORINE LAND COVER (CEC, 1993)
1. ARTIFICIAL SURFACES
1.1. Urban Surfaces
1.1.1. Continuous urban fabric: Most of the land is covered by structures and the transport network. Buildings, roads, and artificially surfaced areas cover more than 80 percent of the total surface. Non-linear areas of vegetation and bare soil are exceptional.
1.1.2. Discontinuous urban fabric: Most of the land is covered by structures. Buildings, roads, and artificially surfaced areas are associated with vegetated areas and bare soil, which occupy discontinuous but significant surfaces.
1.2. Industrial or commercial units and communications networks
1.2.1. Industrial or commercial units: Artificially surfaced areas (cement, asphalt, macadam, or stabilized, e.g., beaten earth) without vegetation occupy most of the area, which also contains buildings and/or vegetation.
1.2.2. Roads and rail networks and associated land
1.2.2.1. Road networks and associated land: Motorways, including associated installations. Minimum width for inclusion: 100 m.
1.2.2.2. Rail networks and associated land: Railways, including associated installations (stations, embankments). Minimum width for inclusion: 100 m.
2. AGRICULTURAL AREAS
2.1. Arable land
2.1.1. Non-irrigated arable land: Cereals, legumes, fodder crops, root crops, and fallow land. Includes flower and tree (nurseries) cultivation and vegetables, whether open field or under plastic or glass (includes market gardening). Includes aromatic, medicinal, and culinary plants. Does not include permanent pasture.
2.1.2. Arable land without vegetation: Without vegetation.
2.3. Pastures
2.3.1. Pastures: Dense grass cover, of floral composition, dominated by graminaceae, not under a rotation system. Mainly for grazing, but the fodder may be harvested mechanically. Includes areas with hedges (hedged farmland).
3. FOREST AND SEMI-NATURAL AREAS
3.1. Forests
3.1.1. Broad-leaved forest: Vegetation formation composed principally of trees, including shrub and bush under-stories, where broad-leaved species predominate.
3.1.2. Coniferous forest: Vegetation formation composed principally of trees, including shrub and bush under-stories, where coniferous species predominate.
5. WATER BODIES
5.1. Water surfaces
5.1.2. Water bodies: Natural or artificial stretches of water.

Textural Information

There is no unique definition of texture. Briefly, texture is the visual impression of coarseness or smoothness caused by the variability or uniformity of image tone and color (Emerson et al., 1999). Textures are homogeneous patterns or spatial arrangements of pixels that regional intensity or color alone do not sufficiently describe. Textural filters were developed and computed to introduce textural features into the classification process (Haralick et al., 1973). Classical statistical filters were applied to the circular region surrounding each pixel: statistics of the pixel-value distribution in a moving window (Parker, 1997), such as the mean, skewness, kurtosis, maximum, and minimum; an auto-correlation measure, in linear and rank-order versions, together with a related covariance measure and variance ratio (Harwood et al., 1995); and the scattering vector and Hurst coefficient (Russ, 1990).
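A minimal sketch of the first-order statistics over an isotropic moving window, written with NumPy only. It is not the authors' filter implementation: the border handling (mirror reflection), the use of excess kurtosis, and the function names `disk_offsets` and `circular_window_stats` are assumptions made for illustration. The auto-correlation, scattering-vector, and Hurst features are not shown.

```python
import numpy as np


def disk_offsets(radius):
    """(dy, dx) offsets of all pixels inside a circle of the given radius."""
    r = int(radius)
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]


def circular_window_stats(band, radius):
    """Per-pixel first-order statistics over an isotropic (circular) moving window.

    Borders are handled by mirror reflection, an assumption: the paper does not
    say how image edges were treated.
    """
    band = np.asarray(band, dtype=float)
    r = int(radius)
    h, w = band.shape
    padded = np.pad(band, r, mode="reflect")
    # One shifted copy of the image per offset: shape (n_offsets, h, w).
    stack = np.stack([padded[r + dy:r + dy + h, r + dx:r + dx + w]
                      for dy, dx in disk_offsets(radius)])
    mean = stack.mean(axis=0)
    dev = stack - mean
    var = (dev ** 2).mean(axis=0)
    flat = var == 0                        # uniform window: higher moments undefined
    safe_var = np.where(flat, 1.0, var)    # avoid dividing by zero
    skewness = np.where(flat, 0.0, (dev ** 3).mean(axis=0) / safe_var ** 1.5)
    kurtosis = np.where(flat, 0.0, (dev ** 4).mean(axis=0) / safe_var ** 2 - 3.0)
    return {"mean": mean, "std": np.sqrt(var), "skewness": skewness,
            "kurtosis": kurtosis,  # excess kurtosis (0 for a normal distribution)
            "min": stack.min(axis=0), "max": stack.max(axis=0)}


# Usage: the same statistic computed for several radii gives a multi-scale feature set.
image = np.random.default_rng(0).integers(0, 255, size=(64, 64))
features = [circular_window_stats(image, rad)["std"] for rad in (2, 5, 9)]
```

Each radius adds one layer per statistic, which is why the feature count grows quickly once several window sizes are combined.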

All these textural features were used in the classification process. Because objects in the image differ in size and orientation, the features were computed over isotropic (circular) windows of several sizes. To assess how the size of the moving window affects classification accuracy, several tests were run using four spectral bands and their standard deviations (used as a textural feature), computed with moving windows of increasing size (Figure 2). The radius of the circular window (i.e., the neighborhood parameter) determines the size of the smallest detail that the features can describe. Both the global accuracy and the per-class accuracies were assessed. A few classes, such as coniferous forest and water bodies, had the same Kappa coefficient for all window sizes; these classes are poorly represented in the study area, occurring only in rather small patches. For other classes, such as continuous urban fabric, the Kappa coefficient increased with window size, while for others still it decreased for small window sizes before increasing. For some well-represented classes, such as arable land, discontinuous urban fabric, and industrial units, the Kappa coefficient increased at a 5-pixel radius, which corresponds to the generalized spatial scale of the reference data. From these preliminary tests, we concluded that no single optimal window size could be adopted; an approach combining several window sizes was therefore followed.

Contextual Information

Contextual information can be divided into two types according to its source: context may be assessed with measures applied internally to a remote sensing image or by using ancillary data (Gurney, 1983). Methods have been developed to integrate quantitative ancillary data (e.g., altitude) into the numerical classification of remote sensing data (Strahler, 1980; Richards, 1982; Gong et al., 1992), but many data of a qualitative nature are generally used during visual interpretation of land cover (e.g., the presence of an industry or the relationship to a soil class) and should be integrated into numerical techniques as well. Only a few of the more complex systems are able to combine multi-sensor and multi-source data in a geographic information system and use qualitative data during the interpretation process (Richards, 1994). Attempts have been made to formalize interpreters' knowledge as rules in an expert system, but this knowledge is complex and therefore very difficult to formalize (McKeown et al., 1999; Zhu, 1997; Peddle, 1995; Richards, 1994; Gong et al., 1992; Mulder et al., 1991). The choice of the features used here derives from previous research (Wolff et al., 1999) that analyzed the use of ancillary data during visual interpretation. Proximity to relevant objects in the image (i.e., Euclidean distance) is commonly used during visual interpretation and reflects the interpreter's knowledge of the landscape when interpreting land use or land cover. In this study, the contextual features were derived from a vector topographical database (roads, railways, hydrological network, settlements, etc.) and from a digital terrain model (slope, orientation, etc.) for inclusion in the classification process.
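Proximity and terrain features of this kind can be sketched as below. The sketch assumes that the vector layers (e.g., roads or railways) have already been rasterized onto the 30 m Landsat TM grid and that the DTM has been resampled to the same grid; `distance_to_layer` and `slope_degrees` are hypothetical names, and the brute-force distance computation is for illustration only.

```python
import numpy as np


def distance_to_layer(mask, pixel_size=30.0):
    """Euclidean distance (metres) from every pixel to the nearest True pixel of `mask`.

    `mask` is a rasterized vector layer (e.g. road or rail pixels). Brute force,
    O(pixels x feature pixels): fine for small tiles; use a distance transform
    for full scenes.
    """
    mask = np.asarray(mask, dtype=bool)
    targets = np.argwhere(mask)                               # (M, 2) row/col
    if targets.size == 0:
        raise ValueError("layer contains no feature pixels")
    rows, cols = np.indices(mask.shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)   # (N, 2) row/col
    d2 = ((pixels[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1)).reshape(mask.shape) * pixel_size


def slope_degrees(dtm, pixel_size=30.0):
    """Slope (degrees) of a DTM from central-difference gradients."""
    dz_dy, dz_dx = np.gradient(np.asarray(dtm, dtype=float), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# Usage: a single rail pixel, and a DTM rising 30 m per 30 m pixel eastwards.
rail = np.zeros((5, 5), dtype=bool)
rail[2, 2] = True
print(distance_to_layer(rail)[0, 2])                               # 60.0
print(slope_degrees(np.tile(np.arange(5) * 30.0, (5, 1)))[2, 2])   # ~45.0
```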

Classifiers
We used a five-nearest-neighbor (5-NN) classifier and the C4.5 decision tree as single classification algorithms, and compared them with a multiple classifier system called "BAGFS."


Figure 2. Influence of the window radius on the Kappa coefficient for some classes (discontinuous urban fabric, industrial or commercial units, non-irrigated arable land, coniferous forest). Horizontal axis: window radius (in pixels); vertical axis: Kappa.

Single Classifiers

Two different single classifiers were used. The first is a simple and robust method, the five-nearest-neighbor (5-NN) classifier. It is one of the earliest general (non-parametric) methods and has been heavily investigated in pattern recognition (Duda et al., 2001). It does not require global dimensionality reduction of the training feature space to give accurate results (Fukunaga, 1990). To avoid scale effects due to the different ranges of the features, all feature values were linearly normalized to the interval [0, 1]. Being non-parametric, this method allows spectral, textural, and contextual features to be used simultaneously without any hypothesis about their distribution. The second is Ross Quinlan's decision-tree classifier, C4.5 Release 8 (see Quinlan, 1993), used with its default parameter values and its pruning method. C4.5 is based on a supervised inductive algorithm that works as follows. At each level of the tree, the observations are divided according to a decision of the form IF condition THEN first choice OTHERWISE second choice. Each branch (or node) is then divided into sub-branches and finally into leaves, which correspond to class labels. The C4.5 decision tree is also non-parametric (i.e., it requires no hypothesis on the data distribution). It selects the features that contribute most to the entropy gain, so that only features with high discriminant power are retained. As a useful consequence, it can process a large number of features at once, which matters here because the number of features grows quickly when textural features at several window sizes and contextual information are added to the spectral features. Finally, decision trees are unstable with respect to small modifications of the training examples and of the features. Nearest-neighbor methods are also unstable with respect to small modifications of the feature space, but not of the training set (Breiman, 1996).
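The two single-classifier ingredients described above can be sketched briefly: min-max normalization to [0, 1] followed by a 5-NN plurality vote, and the entropy-based split criterion used to grow decision trees. This is an illustration under stated assumptions, not C4.5 itself: C4.5 normalizes the information gain into a gain ratio and adds pruning, both omitted here, and all function names are hypothetical.

```python
import numpy as np
from collections import Counter


def fit_minmax(X):
    """Per-feature minimum and range, learned on the training set only."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # constant features map to 0
    return lo, span


def knn_predict(X_train, y_train, X_test, k=5):
    """k-nearest-neighbour plurality vote with Euclidean distance."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Counter keeps first-seen order, so ties go to the class whose first
    # vote comes from the closest neighbour.
    return np.array([Counter(y_train[row].tolist()).most_common(1)[0][0]
                     for row in nearest])


def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by the test `feature <= threshold`."""
    left = feature <= threshold
    n = len(labels)
    remainder = sum(m.sum() / n * entropy(labels[m]) for m in (left, ~left) if m.any())
    return entropy(labels) - remainder


# Usage: two well-separated classes, normalized with training-set statistics.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(10, 1, (20, 3))])
y_train = np.array([0] * 20 + [1] * 20)
lo, span = fit_minmax(X_train)
X_test = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
print(knn_predict((X_train - lo) / span, y_train, (X_test - lo) / span))  # [0 1]
print(information_gain(np.array([1, 2, 3, 4]), np.array([0, 0, 1, 1]), 2))  # 1.0
```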
The instability of decision trees and nearest-neighbor classifiers is exploited when combining several classifiers, as explained in the next section.

Multiple Classifier Systems
Many studies have shown that, in several applications, a multiple classifier system is an effective technique for reducing classification errors (Xu et al., 1992; Ho et al., 1994; Kittler, 1998; Kittler et al., 2000). The main way to design a multiple classifier system is to combine the outputs of different classifiers. Several efficient new multiple classifier systems rely on weakening techniques to create different classifiers. A "weak classifier" (Schapire, 1990; Ji et al., 1997) is a classifier whose capacity has been reduced so as to increase the diversity of its predictions: either its internal architecture is simple (e.g., a single-layer perceptron or a one-nearest-neighbor classifier), or it is prevented from using all the available information. Several ways of manipulating the training set to create a set of weak classifiers, such as Bootstrap Replicates (Breiman, 1996) and Random Subspaces (Ho, 1998), have been shown to improve prediction accuracy once the classifiers are combined. In this paper, we apply a multiple classifier system called "BAGFS" to remote sensing. BAGFS (Latinne et al., 2000) combines bootstrap aggregating ("Bagging"; Breiman, 1996) with Multiple Feature Subsets ("MFS"; Bay, 1998; see also "Random Subspaces," Ho, 1998). Bagging (Breiman, 1996) is a popular solution for classification problems; it consists of building bootstrap replicates of the original training set (i.e., sampling N training examples among N with replacement) and running a learning algorithm on each of them. Quinlan (1996) validated the bagging method with C4.5 decision trees. Once the classifiers have been built independently from the data (decision-tree building), their predictions on an independent test case are combined with a plurality voting rule. Breiman (1996) argues that bagging works mainly because of the instability of the chosen learning algorithm (e.g., decision trees or neural networks) with respect to the variations in the learning set introduced by bootstrapping.
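Bagging as described above can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is used as the base learner; this is a hypothetical stand-in for the C4.5 trees used in the paper, and the function names are illustrative.

```python
import numpy as np
from collections import Counter


def fit_nearest_centroid(X, y):
    """Toy base learner (stand-in for C4.5): assign each sample to the nearest class mean."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])

    def predict(Xq):
        d2 = ((Xq[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[d2.argmin(axis=1)]
    return predict


def plurality_vote(predictions):
    """(n_classifiers, n_samples) predictions -> most frequent label for each sample."""
    return np.array([Counter(col.tolist()).most_common(1)[0][0]
                     for col in np.asarray(predictions).T])


def bagging_predict(X, y, X_test, fit=fit_nearest_centroid, n_estimators=50, seed=0):
    """Bagging: train one model per bootstrap replicate, combine by plurality vote."""
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)   # N draws among N, with replacement
        model = fit(X[idx], y[idx])
        votes.append(model(X_test))
    return plurality_vote(votes)


# Usage
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(bagging_predict(X, y, np.array([[0.0, 0.0], [6.0, 6.0]])))  # [0 1]
```

A nearest-centroid learner is fairly stable, so bagging it brings little benefit; the gains discussed by Breiman (1996) depend on an unstable base learner such as a decision tree.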
The Multiple Feature Subsets (MFS) method (Bay, 1998; Ho, 1998) consists of training a given number of classifiers, each taking as input a given proportion of features picked randomly from the original feature set, with or without replacement. Ho (1998) proposed this approach for decision trees, while Bay (1998) studied MFS with nearest neighbors. Like bagging with training patterns, MFS thus exploits classifier instability (this time with respect to feature selection) to generate a set of classifiers with uncorrelated errors. To obtain the BAGFS architecture, B bootstrap replicates of the training set are generated (Bagging component). In each replicate, a subset of f' features, randomly selected without replacement among the f initial features, is sampled independently (MFS component). We denote by k = f'/f the proportion of features in these B subsets. We applied this algorithm to Quinlan's C4.5 decision tree with its default parameter values and its pruning method (all trees were pruned). The plurality voting rule is applied to combine the predictions of the resulting decision trees. The optimal value of k was obtained by ten-fold cross-validation on the training set: k = 40 percent of all the spectral features or of all the spectral and textural features, and k = 10 percent of all 133 features. As a good trade-off between accuracy and computation time, we combined B = 50 decision trees.
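The BAGFS combination of bootstrap replicates and random feature subsets can be sketched as below, again with a hypothetical nearest-centroid learner standing in for pruned C4.5 trees. `B` and `k` follow the notation of the text; rounding k·f to an integer number of features is an assumption, and the ten-fold cross-validation used to choose k is not shown.

```python
import numpy as np
from collections import Counter


def fit_nearest_centroid(X, y):
    """Toy base learner standing in for pruned C4.5 trees."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return lambda Xq: classes[((Xq[:, None, :] - centroids) ** 2).sum(axis=2).argmin(axis=1)]


def bagfs_predict(X, y, X_test, fit=fit_nearest_centroid, B=50, k=0.4, seed=0):
    """BAGFS sketch: bootstrap rows (Bagging) + random feature subset (MFS), then vote."""
    rng = np.random.default_rng(seed)
    n, f = X.shape
    f_sub = max(1, int(round(k * f)))                    # f' = k * f features per model
    votes = []
    for _ in range(B):
        rows = rng.integers(0, n, size=n)                # bootstrap replicate
        cols = rng.choice(f, size=f_sub, replace=False)  # feature subset, no replacement
        model = fit(X[rows][:, cols], y[rows])
        votes.append(model(X_test[:, cols]))
    votes = np.asarray(votes)                            # shape (B, n_test)
    return np.array([Counter(c.tolist()).most_common(1)[0][0] for c in votes.T])


# Usage: 133 features with k = 0.10, so each model sees round(13.3) = 13 features.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 133)), rng.normal(3, 1, (40, 133))])
y = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([np.zeros(133), np.full(133, 3.0)])
print(bagfs_predict(X, y, X_test, k=0.10))  # [0 1]
```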

Accuracy Measurement

Overall and Per-Class Accuracies (Kappa Degree-of-Agreement)

A quantitative measure of classification accuracy is used to assess the quality of the image classification. The Kappa degree-of-agreement, originally developed by Cohen (1960), is used here as the measure of classification accuracy (Rosenfield et al., 1979; Hudson, 1987). For each classification method, quality was assessed with the same validation set of pixels, which was used to compute the confusion matrix and the Kappa estimates (global and per-class).

Classification Comparison (McNemar Test of Significance)

In this paper, we used the non-parametric McNemar test (Siegel et al., 1988; Rosner, 1995; Salzberg, 1997; Dietterich, 1998) as a direct method for testing whether two sets of classifications differ significantly. Given two classifiers $C_1$ and $C_2$ (e.g., C4.5 versus BAGFS), this test compares the number of pixels misclassified by $C_1$ but not by $C_2$ ($M_{12}$) with the number of pixels misclassified by $C_2$ but not by $C_1$ ($M_{21}$). If $M_{12} + M_{21} \geq 20$, the $\chi^2$ statistic can be considered to follow a chi-square distribution with one degree of freedom:

$$\chi^2 = \frac{\left(|M_{12} - M_{21}| - 1\right)^2}{M_{12} + M_{21}}$$

If $\chi^2$ is greater than $\chi^2_{1,0.95} = 3.841459$ (p < 0.05), the two algorithms have significantly different levels of performance. We applied the McNemar test to each pair of compared algorithms. The case $M_{12} + M_{21} < 20$, for which the chi-square approximation should not be applied and the exact test should be used instead (as described in Rosner (1995)), never occurred in our experimental design.
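The accuracy measures of this section can be computed from a confusion matrix and from the predictions of two classifiers, as sketched below. Assumptions: rows of the confusion matrix hold the classified labels and columns the reference labels (the paper does not state its orientation), the per-class measure is the conditional Kappa, and the McNemar statistic uses the continuity-corrected form given by Dietterich (1998). Function names are hypothetical.

```python
import numpy as np


def kappa(cm):
    """Cohen's Kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


def conditional_kappa(cm):
    """Per-class (conditional) Kappa; rows are assumed to be the classified labels.

    Undefined (division by zero) for a class with an empty row or one that
    holds every reference pixel.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    rows, cols = cm.sum(axis=1), cm.sum(axis=0)
    return (n * np.diag(cm) - rows * cols) / (n * rows - rows * cols)


def mcnemar(y_true, pred_1, pred_2, threshold=3.841459):
    """Continuity-corrected McNemar test; returns (M12, M21, chi2, significant)."""
    y_true, pred_1, pred_2 = map(np.asarray, (y_true, pred_1, pred_2))
    wrong_1, wrong_2 = pred_1 != y_true, pred_2 != y_true
    m12 = int(np.sum(wrong_1 & ~wrong_2))   # misclassified by C1 only
    m21 = int(np.sum(~wrong_1 & wrong_2))   # misclassified by C2 only
    if m12 + m21 < 20:
        raise ValueError("M12 + M21 < 20: use the exact (binomial) test instead")
    chi2 = (abs(m12 - m21) - 1) ** 2 / (m12 + m21)
    return m12, m21, chi2, chi2 > threshold


# Usage
cm = [[45, 5], [10, 40]]
print(round(kappa(cm), 2))                   # 0.7
y_true = np.zeros(100, dtype=int)
pred_1 = y_true.copy()
pred_1[:30] = 1                              # C1 wrong on pixels 0-29
pred_2 = y_true.copy()
pred_2[25:35] = 1                            # C2 wrong on pixels 25-34
print(mcnemar(y_true, pred_1, pred_2)[:2])   # (25, 5)
```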

Results
The overall and per-class accuracies, expressed as Kappa coefficients, are presented in Table 2 for each classification method, using the stratified random learning and testing sets. The three classifiers (5-NN, C4.5, and BAGFS) were first applied to spectral features only, then to spectral and textural features, and finally to spectral, textural, and contextual features. Table 2 shows that, for each feature set, BAGFS always achieved the best overall accuracy, significantly so according to the McNemar test. The improvement obtained by using all the features was also significant, suggesting that these features are required to classify the whole image. Among the single classifiers, the five-nearest-neighbor classifier performed significantly better than C4.5 (McNemar test) whatever the feature set, although the results were rather similar and C4.5 was much more efficient in computation time. In the most accurate classification, most of the per-class kappas exceeded 0.80; the highest accuracies, all above 0.90, were reached for "continuous urban fabric," "rail networks," "coniferous forest," and "water bodies." Despite the considerable overall increase in accuracy, some classes remained poorly classified, notably "discontinuous urban fabric." This CORINE Land Cover class is known to be very heterogeneous because of its definition, which covers built-up areas with 20 to 80 percent artificial surfaces. This is why the class comprises both relatively dense village cores and their recent extensions along roads, where houses are often associated with roads, gardens, pastures, or arable land.
With every classification method, some classes that were poorly identified from spectral information alone saw their accuracy drastically improved by textural and contextual information (e.g., continuous urban fabric and rail networks). For others (discontinuous urban fabric), accuracy increased but remained low. Indeed, the CORINE Land Cover definition of discontinuous urban fabric is very heterogeneous: it includes 25 to 75 percent of built-up areas mixed with other land-cover types (gardens, streets, etc.). Nevertheless, BAGFS systematically gave the highest per-class accuracy. Once validated with the testing sets, the models trained on the stratified random learning set were applied to all image pixels to produce classified images. Some relevant results are presented in Plate 1. Plate 1b shows the results of the C4.5 decision tree applied to spectral features only, and Plate 1c those obtained with spectral, textural, and contextual features. Plate 1d is the classified image with the best overall accuracy, obtained with BAGFS on all spectral, textural, and contextual features.

TABLE 2. OVERALL ACCURACY AND PER-CLASS ACCURACY (KAPPA) FOR EACH CLASSIFIER FOR DIFFERENT SETS OF FEATURES USING THE RANDOM TRAINING AND VALIDATION SETS
Per-class rows: continuous urban fabric; discontinuous urban fabric; industrial or commercial units; road networks and associated land; rail networks and associated land; arable land: cultivated soil; arable land: without vegetation; pastures; broad-leaved forest; coniferous forest; water bodies.
Overall accuracy (McNemar significance rank in brackets):
Features                              5NN        C4.5       BAGFS
Spectral                              0.56 (2)   0.53 (3)   0.60 (1)
Spectral, textural                    0.73 (2)   0.66 (3)   0.80 (1)
Spectral, textural, and contextual    0.77 (2)   0.74 (3)   0.82 (1)

Plate 1. (a) CORINE Land Cover. (b) Decision-tree classifier C4.5 on spectral features. (c) Decision-tree classifier C4.5 on spectral, textural, and contextual features. (d) Multiple classifier system (BAGFS) on spectral, textural, and contextual features, using the stratified random sets.

Plate 2. Artifacts: (a) CORINE Land Cover. (b) Distance to the rail networks. (c) One of the decision trees used in BAGFS. (d) Decision-tree classifier C4.5. (e) Multiple classifier system: BAGFS.

TABLE 3. OVERALL ACCURACY (KAPPA) FOR THE DIFFERENT TRAINING AND TESTING SETS, AND WHETHER A METHOD IS SIGNIFICANTLY BETTER (RANK 1) OR NOT (RANK 2 OR 3) THAN ANOTHER ACCORDING TO McNEMAR TESTS (RANKS IN BRACKETS ON THE SECOND LINE)
Columns: spectral; spectral, textural; spectral, textural, and contextual features, each with 5NN, C4.5, and BAGFS.
Rows (training set / validation set): manual / manual; manual / CORINE; manual / random; random / random; random / CORINE.

Discussion

Manual and Random Stratified Training and Testing Sets

First, we validated our approach using the manual training and testing sets described earlier. We faced a paradox: overall kappas were high (Table 3), but the classified images did not satisfy the expert visual evaluation, and misclassifications were very frequent at the borders of image objects (Figure 3). These artifacts were generated by the increasing use of textural and contextual information. This visual assessment was confirmed when the classified images were compared with the whole CORINE Land Cover database: overall accuracy fell (Table 3). This clearly showed that manually selecting a validation set introduces a strong bias into the accuracy assessment. Indeed, the expert tended to select very pure areas (i.e., no borders, no transitions, no mixels), which were not representative of class diversity. To avoid this bias, training and testing sets were drawn at random from the CORINE Land Cover database (see the section on Random Stratified Training and Validation Sets). The stratified random training set was more representative of the heterogeneity of each class (borders, transitions, mixels), and the classified images were therefore closer to the abstraction level of the reference CORINE Land Cover image interpreted visually. This was confirmed by the results shown in Table 3: the global kappas of BAGFS obtained with the stratified random training set on all features were higher than those obtained with the manual training set when the classified images were assessed with the random testing set. The visual assessment by the expert was also better: artifacts no longer appeared in the classified images, and BAGFS was able to recognize object borders as parts of a specific class. This visual assessment was confirmed when the classified images were compared with the whole CORINE Land Cover database (Table 3). The drawback is that a stratified random training set is more difficult to implement in a standard ground survey, mainly because some areas may not be accessible. Stratified random training and testing sets can only be implemented if an exhaustive or sampled data source is available (e.g., for a map update), such as aerial photographs and topographic maps, which then constitutes a valuable alternative to a standard ground survey.

Textural and Contextual Contribution

Table 2 shows that global accuracy increased when textural and contextual features were introduced into the classification process. The improvement in kappa came mainly from the textural information.
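Both the kappa values compared in Tables 2 and 3 and the McNemar test used in this section can be computed from paired per-pixel labels. The McNemar test judges whether the improvement from adding features is significant. The following is a minimal Python sketch, not the authors' implementation. The function names and the integer class coding are illustrative assumptions. So is the choice of McNemar's continuity-corrected chi-square form over an exact binomial test.

```python
import math
from typing import List, Sequence, Tuple


def confusion_matrix(reference: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows index the reference class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for ref, pred in zip(reference, predicted):
        cm[ref][pred] += 1
    return cm


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if chance == 1.0:
        # Degenerate case: every pixel is in one class in both maps,
        # so agreement is necessarily perfect.
        return 1.0
    return (observed - chance) / (1.0 - chance)


def mcnemar(reference: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> Tuple[float, float]:
    """McNemar's test on two classifications of the same pixels.

    Only discordant pixels count: b = A right and B wrong,
    c = A wrong and B right. Returns the continuity-corrected
    chi-square statistic (1 degree of freedom) and its p-value.
    """
    b = sum(1 for r, a, p in zip(reference, pred_a, pred_b) if a == r and p != r)
    c = sum(1 for r, a, p in zip(reference, pred_a, pred_b) if a != r and p == r)
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    ref = [0, 0, 1, 1, 2, 2]
    pred = [0, 0, 1, 2, 2, 2]
    print(round(cohen_kappa(confusion_matrix(ref, pred, 3)), 4))  # 0.75
```

Kappa is a per-map summary. McNemar's test instead uses only the pixels on which the two classifications disagree, which is why it suits paired comparisons such as spectral-only versus spectral plus textural features on the same testing pixels.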

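[Figure 3 panels: Multiple Classifier System (BAGFS) on spectral, textural, and contextual features using the manual set; Multiple Classifier System (BAGFS) on spectral, textural, and contextual features using the random set. Legend: arable land (cultivated soils), arable land without vegetation, pastures, other classes. Scale bar: 0 to 1 km.]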

Figure 3. Artifacts created using a manual set with a lot of textural and contextual information that does not appear while using a stratified random training set.
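Figure 3 contrasts a manual set with a stratified random set. The random set brings border, transition, and mixel pixels into both the training and testing sets because every pixel of a class can be drawn. The sketch below is a minimal Python version, assuming the reference (e.g., a rasterized CORINE Land Cover layer) is available as a 2-D grid of integer class codes. The function name, the per-class sample sizes, and the disjoint train/test split are illustrative assumptions, not the sampling parameters used in the study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, int, int]  # (row, col, class label)


def stratified_random_sets(label_map: Sequence[Sequence[int]], n_train: int,
                           n_test: int, seed: int = 0
                           ) -> Tuple[List[Sample], List[Sample]]:
    """Draw disjoint random training and testing pixels within each class.

    Every pixel of a class, including borders, transitions, and mixels,
    has the same chance of being drawn, unlike a manual selection of
    pure areas. Hypothetical helper for illustration only.
    """
    if n_train < 0 or n_test < 0 or n_train + n_test == 0:
        raise ValueError("need a positive total sample size per class")
    rng = random.Random(seed)
    pixels_by_class: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for row, line in enumerate(label_map):
        for col, label in enumerate(line):
            pixels_by_class[label].append((row, col))

    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(pixels_by_class):
        pixels = pixels_by_class[label]
        k = min(len(pixels), n_train + n_test)
        drawn = rng.sample(pixels, k)  # sampling without replacement
        # Classes smaller than n_train + n_test are split in the same proportion.
        cut = round(k * n_train / (n_train + n_test))
        train += [(r, c, label) for r, c in drawn[:cut]]
        test += [(r, c, label) for r, c in drawn[cut:]]
    return train, test
```

Keeping the two sets disjoint avoids measuring accuracy on training pixels. The `seed` argument only makes the draw reproducible.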

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

Adding textural and contextual features produced an improvement that was significant for all classes according to the McNemar test. For some classes the kappa approached 100 percent, namely continuous urban fabric, rail networks, coniferous forest, and water bodies. For others the kappa increased but remained low. This was the case, for example, for discontinuous urban fabric and industrial or commercial zones.

This additional information improved the classification, but it also caused some misclassifications at the edges of homogeneous spectral zones. Indeed, most of the textural features were computed using a moving window, and several window sizes were considered because the scene objects vary in size. Most of these features highlighted the edges between homogeneous spectral zones, and those edges were not always relevant to the classification scheme. It can therefore be said that introducing textural features into the classification process may create misclassifications along the edges of homogeneous spectral zones.

Although the introduction of contextual information improved classification accuracy, it also induced artifacts. Plate 2 illustrates such an artifact induced by contextual information, namely the distance to the rail network, on several classification results. It shows the reference CORINE Land Cover database (Plate 2a), the image of the distance to the rail network (Plate 2b), and some classification results (Plates 2d and 2e) for a detailed area (Soignies area).

Compare C4.5 trained with a stratified random training set (Plate 2d) with BAGFS trained with the same training set (Plate 2e). For an equivalent prediction accuracy (see Table 2), the BAGFS results showed fewer artifacts than those obtained with C4.5 alone. Plate 2d (C4.5) shows a rectangular area that does not correspond to any actual shape in the visual interpretation. The corresponding decision tree contained a threshold on the "Distance to Rail" feature that predicted the class of the artifact pixels (Plate 2b), and this threshold defined the limits of the artifact. This problem arises when classifiers based on single-feature thresholds (such as decision trees) are coupled with ancillary data containing synthetic shapes.

This also occurs for the decision trees generated inside BAGFS (Plate 2c). Plate 2c shows the prediction of one of the weakened decision trees, built with 10 percent of the features, for the same area of the image. The previously described artifact was also present. In addition, the "Distance to Village" feature generated circular zones. However, because features and training examples were selected at random, BAGFS generated diverse predictions for the same area, probably with different artifacts. Plate 2e shows how BAGFS was finally able to avoid these artifacts by means of the plurality voting rule.

When using BAGFS, it is difficult to know exactly how, or whether, each single feature is used during the classification process. It is also difficult to evaluate the influence of individual features on the final classification result. A single decision tree classifier intrinsically performs feature selection based on the gain, derived, among others, from the information criterion (Quinlan, 1993). An ensemble of decision trees behaves in a completely different manner. When each weakened decision tree is built, only a small number of features are kept (MFS), so the tree may be forced to use redundant, noisy, or inefficient features. Consequently, the multiple classifier system used here is unsuited to assessing single features.

Conclusion

High-resolution remote sensing data, i.e., Landsat TM data, were classified according to a simplified CORINE Land Cover classification scheme. The visual interpretation was used as reference data to test the classification accuracy. Textural and contextual features were taken into consideration together with the spectral information during the classification. Several classifiers were tested. These were two single classifiers (five-nearest-neighbors and the C4.5 decision tree) and a multiple classifier system (BAGFS applied to C4.5).

Including textural and contextual information required changing the method of collecting the training and testing sets. Random stratified training and testing sets proved more relevant than manual expert sets. The random training set better accounted for class heterogeneity, i.e., border areas as well as transition zones, so it increased the level of generalization and the accuracy of the classification. The random testing set led to an accuracy assessment that was closer to the expert's visual assessment of classification quality.

At the same time, the introduction of textural and contextual features during the classification significantly increased the global and per-class kappa coefficients, mainly for classes that were not spectrally homogeneous. However, the introduction of such data led to artifacts in the classification. These artifacts could be removed by means of a multiple classifier system (BAGFS), which combined the predictions of weak single classifiers by plurality vote. These single classifiers are called "weak" because each was applied to only a portion of the features and examples and therefore had a low classification accuracy. Artifacts and misclassifications did not occur at the same place in every weak classification. Combining the weak classifications by plurality vote therefore produced a classification close to the visual interpretation, with a very high accuracy. The best classification was given by the multiple classifier system using the whole range of features with a stratified random set.

Acknowledgments

The study was funded by the Federal Office for Scientific, Technical and Cultural Affairs (Belgium) in the framework of the TELSAT 4 program. Helpful support was provided by the MARCH project funded by the Free University of Brussels. The authors give special thanks to the Belgian National Geographic Institute for the remote sensing and topographic data. Isabelle Van den Steen would like to thank the FRIA (Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture).

References

Anderson, J.R., 1971. Land-use classification schemes, Photogrammetric Engineering, 37(4):379-387.

Argialas, D.P., and C.A. Harlow, 1990. Computational image interpretation models: An overview and perspective, Photogrammetric Engineering & Remote Sensing, 56(6):871-886.

Atkinson, P.M., and A.R.L. Tatnall, 1997. Neural networks in remote sensing, International Journal of Remote Sensing, 18(4):699-709.

Bay, S.D., 1999. Nearest neighbour classification from multiple feature subsets, Intelligent Data Analysis, 3(3):191-209.

Breiman, L., 1996. Bagging predictors, Machine Learning, 24(2):123-140.

Campbell, J., 1981. Spatial correlation effects upon accuracy of supervised classification of land cover, Photogrammetric Engineering & Remote Sensing, 47(3):355-363.

CEC, 1993. CORINE Land-Cover: Guide Technique, Commission of the European Communities, Luxembourg, 144 p.

Cohen, J., 1960. A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20(1):37-46.

June

2002

Dietterich, T., 1998. Approximate statistical tests for comparing supervised learning algorithms, Neural Computation, 10:1895-1923.

Duda, R., P.E. Hart, and D.G. Stork, 2001. Pattern Classification and Scene Analysis, Second Edition, Wiley Interscience, New York, N.Y., 482 p.

Edwards, G., and K.E. Lowell, 1996. Modeling uncertainties in photointerpreted boundaries, Photogrammetric Engineering & Remote Sensing, 62(4):377-391.

Emerson, Ch., N. Siu-Ngan Lam, and D.A. Quattrochi, 1999. Multiscale fractal analysis of image texture and pattern, Photogrammetric Engineering & Remote Sensing, 65(1):51-61.

Friedl, M.A., and C.E. Brodley, 1997. Decision tree classification of land cover from remotely sensed data, Remote Sensing of Environment, 61:399-409.

Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition, Second Edition, Academic Press, New York, N.Y., 591 p.

Giacinto, G., F. Roli, and L. Bruzzone, 2000. Combination of neural and statistical algorithms for supervised classification of remote-sensing images, Pattern Recognition Letters, 21(5):385-397.

Gong, P., and P.J. Howarth, 1992. Frequency-based contextual classification and gray-level vector reduction for land-use identification, Photogrammetric Engineering & Remote Sensing, 58(4):423-437.

Gurney, M.C., and J.R.G. Townshend, 1983. The use of contextual information in the classification of remotely sensed data, Photogrammetric Engineering & Remote Sensing, 49(1):55-64.

Hansen, M., R. Dubayah, and R. DeFries, 1996. Classification trees: An alternative to traditional land cover classifiers, International Journal of Remote Sensing, 17(5):1075-1081.

Haralick, R.M., K. Shanmugam, and I. Dinstein, 1973. Textural features for image classification, Proceedings of the IEEE, SMC-3(6):610-619.

Harwood, D., T. Ojala, M. Pietikäinen, S. Kelman, and L. Davis, 1995. Texture classification by center-symmetric auto-correlation, using Kullback discrimination of distribution, Pattern Recognition Letters, 16(1):1-10.

Ho, T.K., 1998. The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:832-844.

Ho, T., J.J. Hull, and S.N. Srihari, 1994. Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:66-75.

Hudson, W., 1987. Correct formulation of the kappa coefficient, Photogrammetric Engineering & Remote Sensing, 53(4):421-422.

Ji, C., and S. Ma, 1997. Combinations of weak classifiers, IEEE Transactions on Neural Networks, 7(1):32-42.

Kittler, J., 1998. Combining classifiers: A theoretical framework, Pattern Analysis and Applications, 1:18-27.

Kittler, J., and F. Roli (editors), 2000. Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000), 21-23 June, Cagliari, Italy, LNCS 1857, Springer, 402 p.

Latinne, P., O. Debeir, and Ch. Decaestecker, 2000. Different ways of weakening decision trees and their impact on classification accuracy, Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000) (J. Kittler and F. Roli, editors), 21-23 June, Cagliari, Italy, LNCS 1857, Springer, pp. 200-209.

McKeown, D., S. Cochran, S. Ford, C. McGlone, J. Shufelt, and D. Yocum, 1999. Fusion of HYDICE hyperspectral data with panchromatic imagery for cartographic feature extraction, IEEE Transactions on Geoscience and Remote Sensing, Special Issue on Data Fusion, 37(3):1261-1277.

Mesev, V., 1998. The use of census data in urban image classification, Photogrammetric Engineering & Remote Sensing, 64(5):431-438.


Mulder, N.J., H. Middelkoop, and J.W. Miltenburg, 1991. Progress in knowledge engineering for image interpretation and classification, Journal of Photogrammetry and Remote Sensing, 46:161-171.

Parker, J.R., 1997. Algorithms for Image Processing and Computer Vision, Wiley Computer Pub., New York, N.Y., 417 p.

Peddle, D.R., 1995. Knowledge for supervised evidential classification, Photogrammetric Engineering & Remote Sensing, 61(4):409-418.

Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, California, 302 p.

Quinlan, J.R., 1996. Bagging, boosting and C4.5, Proceedings of the Thirteenth National Conference on Artificial Intelligence, 04-08 August, Cambridge, Massachusetts (AAAI Press/MIT Press), pp. 725-730.

Ricchetti, E., 2000. Multispectral satellite image and ancillary data integration for geological classification, Photogrammetric Engineering & Remote Sensing, 66(4):429-435.

Richards, J.A., 1994. Remote Sensing Digital Image Analysis: An Introduction, Springer-Verlag, Berlin, Germany, 340 p.

Richards, J.A., D.A. Landgrebe, and P.H. Swain, 1982. A means for utilising ancillary information in multispectral classification, Remote Sensing of Environment, 12:463-477.

Rosenfeld, A., and L.S. Davis, 1979. Image segmentation and image models, Proceedings of the IEEE, 67(5):764-772.

Rosner, B., 1995. Fundamentals of Biostatistics, Fourth Edition, Duxbury Press, Belmont, California, 682 p.

Russ, J.C., 1990. Surface characterisation: Fractal dimension, Hurst coefficient and frequency transform, Journal of Computer Assisted Microscopy, 2:249-257.

Ryherd, S., and C. Woodcock, 1996. Combining spectral and texture data in the segmentation of remotely sensed images, Photogrammetric Engineering & Remote Sensing, 62(2):181-194.

Salzberg, S., 1997. On comparing classifiers: Pitfalls to avoid and a recommended approach, Data Mining and Knowledge Discovery, 1:317-327.

Siegel, S., and N.J. Castellan, 1988. Non-Parametric Statistics for the Behavioral Sciences, Second Edition, McGraw-Hill, New York, N.Y., 399 p.

Strahler, A.H., 1980. The use of prior probabilities in maximum likelihood classification of remotely sensed data, Remote Sensing of Environment, 10:135-163.

Tung, Fung, and King-Chung Chan, 1994. Spatial composition of spectral classes: A structural approach for image analysis of heterogeneous land-use and land cover types, Photogrammetric Engineering & Remote Sensing, 60(2):173-180.

Weiler, R.A., and D.A. Stow, 1991. Spatial analysis of land cover patterns and corresponding remotely-sensed image brightness, International Journal of Remote Sensing, 12(11):2237-2257.

Wolff, E., P. Van Ham, M. Sintzoff, I. Van den Steen, O. Debeir, and M. Bouazza, 1999. Aide à la Reconnaissance et à l'Interprétation de l'Occupation du Sol (ARIOS), Final Report of TELSAT contract 4/DD/007, Earth Observation Program, Brussels, Belgium, 127 p., unpublished.

Xu, L., A. Krzyzak, and C.Y. Suen, 1992. Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Transactions on Systems, Man and Cybernetics, 22:418-435.

Zhu, A-Xing, 1997. Measuring uncertainty in class assignment for natural resource maps under fuzzy logic, Photogrammetric Engineering & Remote Sensing, 63(10):1195-1202.

(Received 20 February 2001; accepted 22 May 2001; revised November 2001)
