
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 41, NO. 11, NOVEMBER 2003

A Credit Assignment Approach to Fusing Classifiers of Multiseason Hyperspectral Imagery

Charles M. Bachmann, Member, IEEE, Michael H. Bettenhausen, Member, IEEE, Robert A. Fusina, Member, IEEE, Timothy F. Donato, Member, IEEE, Andrew L. Russ, Joseph W. Burke, Gia M. Lamela, W. Joseph Rhea, Barry R. Truitt, and John H. Porter

Abstract—A credit assignment approach to decision-based classifier fusion is developed and applied to the problem of land-cover classification from multiseason airborne hyperspectral imagery. For each input sample, the new method uses a smoothed estimated reliability measure (SERM) in the output domain of the classifiers. SERM requires no additional training beyond that needed to optimize the constituent classifiers in the pool, and its generalization (test) accuracy exceeds that of a number of other extant methods for classifier fusion. Hyperspectral imagery from HyMAP and PROBE2 acquired at three points in the growing season over Smith Island, VA, a barrier island in the Nature Conservancy's Virginia Coast Reserve, serves as the basis for comparing SERM with other approaches.

Index Terms—Barrier islands, decision-based classifier fusion, hyperspectral remote sensing, land-cover classification, maximum estimated reliability measure (MAXERM), multiple classifier systems, multiseason classification, smoothed estimated reliability measure (SERM), Virginia Coast Reserve.

Manuscript received September 20, 2002; revised August 5, 2003. This work was supported by the Office of Naval Research under Contracts N0001400WX40016, N0001401WX40009, and N0001402WX30017.
C. M. Bachmann, M. H. Bettenhausen, R. A. Fusina, T. F. Donato, G. M. Lamela, and W. J. Rhea are with the Remote Sensing Division, Naval Research Laboratory, Washington, DC 20375 USA (e-mail: [email protected]).
A. L. Russ is with the Department of Geography, University of Maryland, College Park, MD 20742 USA, and also with the USDA Agricultural Research Service, Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705 USA.
J. W. Burke is with the Department of Geography, University of Maryland, College Park, MD 20742 USA.
B. R. Truitt is with The Nature Conservancy, Virginia Coast Reserve, Nassawadox, VA 23413 USA (e-mail: [email protected]).
J. H. Porter is with the Department of Environmental Sciences, University of Virginia, Charlottesville, VA 22904-4123 USA.
Digital Object Identifier 10.1109/TGRS.2003.818537

I. INTRODUCTION

IT IS WELL KNOWN that the accuracy of land-cover classification can be improved by the use of multitemporal or multisource data [5], [17], [27], [32], [35]. This is particularly true in remote sensing of coastal land cover, where there are many sources of variability, such as inundation in beach zones and tidally influenced wetlands, atmospheric water vapor, and seasonal variations in vegetation. Combining the results of classifiers obtained at different points in the growing cycle or tidal stage can reduce noise and achieve better classification accuracy. Airborne hyperspectral imagery provides a powerful means of discriminating coastal land-cover types with fine detail [2], [12], but yields large data volumes, especially when multiseason data are used. At the same time, many real-world applications demand reasonable "turnaround" time to be of practical utility in a production mode, especially when a large volume of data must be processed. Even in a research environment, large data volumes may limit the practical utility of algorithms that are too slow. In this study, for example, the three-season hyperspectral imagery of Smith Island amounts to 6 GB of data, and Smith Island is just one of six islands for which we have acquired multiseason imagery in our study area. Thus, we seek approaches that are either: 1) as accurate as existing methods but significantly faster, or 2) more accurate than existing methods without dramatically sacrificing processing speed.

This paper focuses on a new approach to decision-based fusion of classifiers, building on the work of a number of authors [5], [6], [31], [35], [36], [39], [42] who have addressed the problem of classifier fusion in remote sensing and other applications. The new method of classifier fusion that we develop here is as accurate as or better than many competing options but is significantly faster, because it does not require additional training beyond the training of the constituent classifiers. SERM also scales better as the number of classifiers in the pool increases. Results of the new approach are evaluated using multiseason hyperspectral imagery of land cover on a barrier island that was previously studied using single-season hyperspectral imagery [2]. Ultimately, the new approach, SERM, achieved the highest test accuracy of all methods that we explored in this study.

The rest of our paper is organized as follows. In Section II, after providing a historical context and perspective for our work, we present the new approach. In Section III, we briefly describe the hyperspectral data and ground truth used in our experiments. In Section IV, we present results comparing the new approach to the performance of single-season classifiers and other multiclassifier fusion algorithms, and in Section V, we summarize the results and draw conclusions.

II. APPROACH AND METHODS

A. Combining Classifiers: A Historical Perspective

A variety of approaches to the problem of combining classifiers have been proposed over the years. These include the Borda count [19], Bayesian classifiers, Dempster–Shafer theory [16], [40], and optimal weighted averaging with linear or log pools [6], [14]. In some instances, the problem of credit assignment has also been directly addressed. The problem of credit assignment is fundamental to the successful fusion of classifiers: by some means, the composite output should produce a


result that is more robust than the performance of any single constituent classifier. This can be achieved in several ways: by partitioning the problem into credit assignment zones in the input domain [21]; by smoothing in either the input domain [36], [39] or the output domain [15], [28], [29], [40] to reduce noise; or by hierarchical methods [6], [24]. In some instances, smoothing and hierarchical processing are combined [6]. For high-dimensional data sources such as hyperspectral imagery, output domain approaches that focus on the classifier posterior probabilities may be preferable to input domain approaches, because output domain methods involve lower dimensional information and thus avoid problems related to the curse of dimensionality [4] and multidimensional scaling [9]; output domain approaches are also usually less computationally intensive in these circumstances.

Multiple classifier fusion has been viewed from the perspective of the well-known bias-variance dilemma [13], [37]. Classifier error can be decomposed into two terms: one is the classifier bias, and the other is the variance of the classifier estimate. Many multiclassifier algorithms fuse classifiers with approximately the same bias but different error distributions; combining these classifiers then reduces the variance term in the error made in approximating the true mapping [37]. There are several ways to produce uncorrelated errors in the pool of classifiers to be fused [37]; these include: 1) varying the architecture of the classifier (e.g., the number of nodes and free parameters in a neural network); 2) varying the choice of algorithm (e.g., see [36]); or 3) varying the training data, as is done in "bagging" or "boosting" [8]. In remote sensing applications, variance reduction also can be achieved by using multisensor or multitemporal data to produce a pool of classifiers with decorrelated error distributions. In this paper, we use hyperspectral data from three different seasons to train a set of classifiers that will produce statistically decorrelated errors. We also combine different algorithms in a second set of experiments and examine the robustness of fusion algorithms when weak classifiers are added to the pool.

B. Overview of Competing Approaches to Fusing Classifiers: Experimental Design

In Section IV, we compare multiseason classification results produced by classifier fusion using two local reliability estimates: one that we call the maximum estimated reliability measure (MAXERM), an averaged version of a measure found in [35], and a smoothed version of this measure (SERM). These are compared with several other methods for fusing classifiers. The algorithms evaluated also included the generalized ensemble model (GEM) [31], a decoupled GEM (DGEM), the majority vote, and a simple composite approach. Each of these algorithms was used to produce a multiseason classification using the outputs of three single-season classifiers trained on data from the HyMAP May 2000 imagery, the PROBE2 August 2001 imagery, or the PROBE2 October 2001 imagery. Each multiseason classification was also compared against the performance of individual classifiers applied to single-source data.

1 HyMAP: Analytical Imaging and Geophysics, LLC. See http://www.aigllc.com.
2 PROBE2: ESSI. See http://www.earthsearch.com.


In the first set of experiments, the single-season constituent classifiers were optimized using the backward propagation neural network [34] with a cross-entropy cost function [33] and an adaptive sampling algorithm known as the adaptive resampling strategy for error-prone exemplars (ARESEPE) [1]. Each of the classifier fusion approaches was applied to this constituent pool of three single-season BPCE-ARESEPE (BPCE using the ARESEPE adaptive sampling strategy) classifiers. In a second set of experiments, we added three additional single-season constituent classifiers to the pool; the goal in these experiments was to test the robustness of classifier fusion algorithms to the presence of weak, suboptimal classifiers. The three added single-season classifiers consisted of either: 1) principal components analysis (PCA) [43] followed by a distance-weighted K-nearest neighbor (DWKNN) classifier [10], or 2) DWKNN by itself. In the PCA-DWKNN case, each suboptimal classifier consisted of a two-stage process in which spectra were projected using PCA and then classified using DWKNN (a sketch appears at the end of this subsection). For the PCA analysis, 30 components were retained for each season, since this explained 99.99% of the variance. DWKNN test accuracy was not very sensitive to K, so we used an intermediate value. The primary reason for using PCA was to decrease the dimensionality of the DWKNN reference codebook (the training set). In Sections II-C–II-I, we briefly describe the other single-season and multiseason classifier fusion algorithms.
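The two-stage weak classifier can be summarized in a few lines of NumPy. The sketch below is illustrative rather than a reproduction of our implementation: the function names are hypothetical, K is left as a free parameter (as noted above, results were not very sensitive to it), and the PCA basis is truncated to 30 components as in the experiments.

```python
import numpy as np

def fit_pca(X, n_components=30):
    """Fit a PCA projection on training spectra X (n_samples x n_bands)."""
    mu = X.mean(axis=0)
    # Principal axes from the SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def dwknn_predict(Z_train, y_train, z, k):
    """Dudani's distance-weighted KNN vote for one projected spectrum z."""
    dist = np.linalg.norm(Z_train - z, axis=1)
    nn = np.argsort(dist)[:k]                 # k nearest reference spectra
    d = dist[nn]
    span = d[-1] - d[0]
    # Dudani weights: nearest neighbor gets 1, kth neighbor gets 0.
    w = (d[-1] - d) / span if span > 0 else np.ones(k)
    votes = np.bincount(y_train[nn], weights=w)
    return int(np.argmax(votes))

# Two-stage weak classifier: project with PCA, then classify with DWKNN.
# mu, basis = fit_pca(X_train); z = basis @ (x - mu)   (hypothetical arrays)
```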

C. BPCE Classifier

In the first set of experiments described in Section IV, single-season classifiers were all developed using the backward propagation of error classifier [34] with a cross-entropy cost function (BPCE) [33]. The BPCE cost function is

$$E_{\mathrm{BPCE}} = -\sum_{\mu}\sum_{i}\left[d_i^{(L),\mu}\,\ln f\!\left(u_i^{(L),\mu}\right) + \left(1 - d_i^{(L),\mu}\right)\ln\!\left(1 - f\!\left(u_i^{(L),\mu}\right)\right)\right] \tag{1}$$

where the superscript $L$ is the last layer in an $L$-layer classifier, $d_i^{(L),\mu}$ is the desired output, either 0 or 1, for category node $i$ at the output of the classifier, $f(u_i^{(L),\mu})$ is the actual response of the output node to a particular input pattern $\mu$ propagated forward through the classifier, and $u_i^{(L),\mu} = \mathbf{w}_i^{(L)} \cdot \mathbf{x}^{(L-1),\mu} + \theta_i^{(L)}$ is the net input to that node, with $\mathbf{w}_i^{(L)}$ the weight vector and $\theta_i^{(L)}$ the offset. We use the cross-entropy cost function because it is less prone to local minima than the originally proposed least mean square (LMS) error, owing to the form of the gradient used in the stochastic gradient descent [2], [33]. Specifically, the presence of the logarithms in (1) eliminates terms in the last-layer gradient that are present when an LMS cost function is used:

$$\delta_i^{(L),\mathrm{BPCE}} = \beta\left(d_i^{(L)} - f\!\left(u_i^{(L)}\right)\right) \tag{2}$$

$$\delta_i^{(L),\mathrm{BPLMS}} = \beta\left(d_i^{(L)} - f\!\left(u_i^{(L)}\right)\right) f\!\left(u_i^{(L)}\right)\left(1 - f\!\left(u_i^{(L)}\right)\right) \tag{3}$$


Fig. 1. Schematic diagrams for (left) the simple composite and (right) SERM. The simple composite uses the same algorithm in the back-end classifier that was used in the constituent single-season classifiers, BPCE with ARESEPE. SERM computes a smoothed estimated reliability measure without further training. Smoothing is done at both the classifier and pool levels.

Here $\delta_i^{(L)}$ is the last-layer "error signal" [34], which is the negative partial derivative of the cost function with respect to the node net input $u_i^{(L)}$, and $\beta$ is the gain of the sigmoidal transfer function $f$. The gradient with respect to the weight vector is just the product of the error signal and the input vector from the previous layer, $\partial E / \partial \mathbf{w}_i^{(L)} = -\delta_i^{(L)}\,\mathbf{x}^{(L-1)}$, and, as usual, the weight update rule is $\Delta\mathbf{w}_i^{(L)}(t) = \eta(t)\,\delta_i^{(L)}\,\mathbf{x}^{(L-1)} + \alpha\,\Delta\mathbf{w}_i^{(L)}(t-1)$, where $\alpha$ (set to 0.6 in the experiments described here) is the "momentum" term and $\eta(t)$ is the time-dependent learning rate, which decreases (in our case) logarithmically with time [18]. Comparing (2) and (3), it is obvious that BPLMS has two local minima: when $f(u_i^{(L)}) = 0$ while $d_i^{(L)} = 1$, and likewise when $f(u_i^{(L)}) = 1$ while $d_i^{(L)} = 0$; BPCE does not have these local minima. Thus, BPCE is less likely to be trapped near a local minimum in which the desired pattern response is at one extreme while the actual response is antipodal. Note that because of the backward propagation of the error signal [34], these local minima can affect all layers in BPLMS, whereas this effect is eliminated for all layers when BPCE is used. One additional feature of our classifier implementation was the use of a resampling buffer, which detects the presence of patterns causing classifier output errors and resamples these more frequently during the optimization cycle. A more thorough description of the ARESEPE algorithm is given in [1]; we give a short overview in Section II-D.
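The practical consequence of (2) and (3) can be seen numerically. The sketch below uses the forms of the error signals as reconstructed above; the function names are ours.

```python
import numpy as np

def delta_bpce(d, o, beta=1.0):
    # Eq. (2): for the cross-entropy cost the sigmoid derivative cancels,
    # leaving a full-strength corrective signal everywhere.
    return beta * (d - o)

def delta_bplms(d, o, beta=1.0):
    # Eq. (3): the LMS cost keeps the factor o(1 - o), which vanishes at
    # o = 0 or o = 1 regardless of the desired response d.
    return beta * (d - o) * o * (1.0 - o)

d, o = 1.0, 1e-4           # desired response at one extreme, actual antipodal
print(delta_bpce(d, o))    # ~1.0: BPCE still pushes the node toward d
print(delta_bplms(d, o))   # ~1e-4: BPLMS is nearly stationary (local minimum)
```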

D. ARESEPE

Active sampling training strategies have been studied in the context of a variety of pattern recognition applications [20], [30], [41]. These strategies aim to significantly reduce the amount of training time required to optimize classifiers and in some cases also produce more accurate results. These methods focus optimization sampling on pattern boundaries between categories. ARESEPE [1] achieves this by using a resampling buffer that is part of the total input stream, along with the regular data stream that comprises the training set. Entry into the resampling buffer is determined for each sample as it is processed, a category response determined, and updates performed. The buffer entry criterion is proportional to the degree of misclassification that occurs. Patterns that do not produce error should not enter the buffer, while those that cause the most error should be the most likely to enter the resampling buffer. ARESEPE uses a misclassification measure that was first defined as an alternative $N$-category discriminant function [22]

$$d_k(\mathbf{x}) = -g_k(\mathbf{x}) + \left[\frac{1}{N-1}\sum_{j\neq k} g_j(\mathbf{x})^{\eta}\right]^{1/\eta} \tag{4}$$

where index $k$ is the true category associated with input sample vector $\mathbf{x}$. The asymptotic limit as $\eta \to \infty$ is just

$$d_k(\mathbf{x}) = -g_k(\mathbf{x}) + g_{j^*}(\mathbf{x}) \tag{5}$$

where $g_{j^*}(\mathbf{x})$ is the maximum responding discriminant function not equal to the true discriminant function. Thus, a positive value of $d_k(\mathbf{x})$ represents the amount by which the winning discriminant function was larger than the true discriminant function. Let

$$\epsilon_j(\mathbf{x}) = g_j(\mathbf{x}) - g_k(\mathbf{x}) \tag{6}$$

for each category node $j$, where, as before, $k$ is the index of the true category, and then compute

$$d_k(\mathbf{x}) = \max_{j \neq k}\, \epsilon_j(\mathbf{x}) \tag{7}$$

The sign of (7) then indicates whether the pattern was misclassified and automatically determines the quantity $d_k(\mathbf{x})$; when a pattern is misclassified, entry to the resampling buffer is proportional to $d_k(\mathbf{x})$. Further implementation details are in [1]; however, the main reason for using ARESEPE is that convergence time for this hyperspectral data is improved by one or two orders of magnitude, depending on the rate $r$ of resampling from the buffer [1]. In this paper, $r = 0.5$ for all experiments, but a complete analysis of this parameter and the buffer size appears in [1].
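A minimal sketch of the buffer-entry rule follows, assuming the reconstructed forms of (5)-(7); the probability scaling and the storage of the pattern itself are simplifications of the actual buffer management described in [1].

```python
import numpy as np

def misclassification_measure(g, k):
    """Eqs. (5)-(7): best rival discriminant response minus the true one."""
    rival = np.max(np.delete(g, k))   # g_{j*}: largest response with j != k
    return rival - g[k]               # positive iff the pattern is misclassified

def maybe_buffer(pattern, g, k, buffer, scale=1.0, rng=np.random.default_rng()):
    """Admit a misclassified pattern to the resampling buffer with
    probability proportional to the degree of misclassification."""
    d_k = misclassification_measure(g, k)
    if d_k > 0 and rng.random() < min(1.0, scale * d_k):
        buffer.append((pattern, k))
    return d_k
```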

TABLE I
SUMMARY OF ALGORITHM ACRONYMS

E. Majority Vote

Consensus voting schemes have been widely used in many applications [5], [14]. As implemented here, when a majority of classifiers in the pool respond with the same classification label, that label becomes the classification of the composite. If a tie occurs, then the label is chosen randomly from the tied categories. We are actually describing a plurality voting rule [15]: a majority in the sense of [15] requires that more than half of the classifiers agree on the same label; however, in the results below, plurality and majority are equivalent in the three-classifier pools. Consensus votes are known to improve results provided the constituent classifiers generalize well and the distribution of errors is somewhat decorrelated [15].
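For completeness, the plurality rule with random tie-breaking is only a few lines over the pool's predicted labels (a sketch; the tie-breaking RNG is our choice).

```python
import numpy as np

def plurality_vote(labels, rng=np.random.default_rng()):
    """Fuse a pool of predicted labels by plurality; ties broken at random."""
    counts = np.bincount(labels)
    winners = np.flatnonzero(counts == counts.max())
    return int(rng.choice(winners))

print(plurality_vote(np.array([2, 2, 5])))   # -> 2 (majority of the pool)
```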

F. GEM

The generalized ensemble model [31] is a general framework for combining the results of classifiers, independent of the underlying form of the algorithm or the data distribution to be modeled. However, it is a linear pool, which is known to have limitations [6]. Given a set of $K$ classifiers with outputs $f_i(\mathbf{x})$, the output probability of the ensemble estimate is

$$\bar{f}(\mathbf{x}) = \sum_{i=1}^{K} w_i\, f_i(\mathbf{x}), \qquad \sum_{i=1}^{K} w_i = 1 \tag{8}$$

The principal weakness of GEM is that it does not perform credit assignment on a local basis; it achieves an optimal solution only in an average sense by inverting a conglomerate error covariance matrix $C_{ij} = E\left[\varepsilon_i(\mathbf{x})\,\varepsilon_j(\mathbf{x})\right]$, where $\varepsilon_i(\mathbf{x})$ is the error of the $i$th classifier:

$$w_i = \frac{\sum_{j}\left(C^{-1}\right)_{ij}}{\sum_{k}\sum_{j}\left(C^{-1}\right)_{kj}} \tag{9}$$

As a result, it will be biased toward the dominant categories in the training set.
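The GEM weights of (9) reduce to a few lines once the error covariance has been estimated on the training set. This sketch assumes signed per-sample errors are available for each classifier and that the pool is small enough for $C$ to be inverted directly.

```python
import numpy as np

def gem_weights(errors):
    """GEM mixing weights from eq. (9); `errors` is (K, n_samples),
    the signed training-set errors of the K constituent classifiers."""
    C = errors @ errors.T / errors.shape[1]   # conglomerate error covariance
    Cinv = np.linalg.inv(C)                   # assumes C is well conditioned
    return Cinv.sum(axis=1) / Cinv.sum()      # normalized so the weights sum to 1

def gem_estimate(w, outputs):
    """Eq. (8): a single weighted linear pool of the classifier outputs."""
    return np.tensordot(w, outputs, axes=1)   # outputs: (K, ...) array
```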


G. Decoupled GEM

In Section IV, we also include a decoupled version of GEM (DGEM) in which the ensemble estimate over all categories is replaced by separate estimates and error covariance matrices, one associated with each category $c$:

$$\bar{f}^{c}(\mathbf{x}) = \sum_{i=1}^{K} w_i^{c}\, f_i^{c}(\mathbf{x}) \tag{10}$$

$$w_i^{c} = \frac{\sum_{j}\left[(C^{c})^{-1}\right]_{ij}}{\sum_{k}\sum_{j}\left[(C^{c})^{-1}\right]_{kj}} \tag{11}$$

$$C_{ij}^{c} = E\left[\varepsilon_i^{c}(\mathbf{x})\,\varepsilon_j^{c}(\mathbf{x})\right] \tag{12}$$

DGEM primarily addresses the problem of sparsely represented categories but not the problem of complex category distributions: each category node in the fused classifier output still has only one associated error covariance matrix for all input vectors. Section IV shows that even DGEM cannot compete with truly local credit-assignment classifier fusion algorithms such as SERM.

H. Simple Composite Classifier

One of the simplest decision-based fusion options is to use the same algorithm [BPCE (with ARESEPE)] that was used to optimize the single-season classifiers to develop a simple composite classifier. In this approach, we use the outputs of the single-season classifiers as inputs to a back-end BPCE/ARESEPE classifier; subsequently, we refer to this approach as the simple composite (Fig. 1). Compared to GEM and DGEM, this approach has the advantage that it provides additional decision hyperplanes, rather than a single decision hyperplane across the entire data space. The principal disadvantages of the simple composite are the need for additional training of the back-end classifier and poor scaling with increasing pool size. The latter stems from the fact that constituent classifier output vectors are concatenated as inputs to the back-end classifier, leading to back-end classifiers with many more parameters to optimize and slower feedforward propagation.

I. MAXERM and SERM

The output vector elements $o_j^{i}(\mathbf{x})$ for the $i$th classifier are proportional to the posterior probabilities that the classifier assigns to each category. Therefore, choosing the category of the output node with the largest response is equivalent to choosing the category with the maximum a posteriori (MAP) probability as the predicted category label. For classifier $i$, this is

$$c_i(\mathbf{x}) = \arg\max_{j}\, o_j^{i}(\mathbf{x}) \tag{13}$$

In the first experiment of Section IV, our constituent classifiers are neural networks optimized using BPCE with ARESEPE. For many neural networks, including BPCE, each output category response must be divided by the sum of these responses to obtain posterior probabilities on a per-sample basis:

$$p_j^{i}(\mathbf{x}) = \frac{o_j^{i}(\mathbf{x})}{\sum_{k} o_k^{i}(\mathbf{x})} \tag{14}$$

The output domain neighborhood around a particular classifier's response to input vector $\mathbf{x}$ can be described by the set of output domain responses obtained for perturbations to $\mathbf{x}$ according to

$$\mathbf{o}^{i}(\mathbf{x} + \delta\mathbf{x}) = \mathbf{o}^{i}(\mathbf{x}) + \frac{\partial \mathbf{o}^{i}}{\partial \mathbf{x}}\,\delta\mathbf{x} + O\!\left(\|\delta\mathbf{x}\|^{2}\right) \tag{15}$$

When $\mathbf{x}$ is low dimensional, the approaches described in [36] and [39] effectively sample the input domain subset described by (15) at the available sample points, although the distance-weighted KNN approach used in [36] probably does a better job of ensuring that the neighborhoods remain relatively small. However, for very high dimensional vectors, multidimensional scaling problems can distort the concept of proximity in the input space when simple distance measures are used. Thus, we focus on the output domain in deriving classifier reliability measures. A zeroth-order approximation to (15) is to replace the Taylor series by the first term, namely

$$\mathbf{o}^{i}(\mathbf{x} + \delta\mathbf{x}) \approx \mathbf{o}^{i}(\mathbf{x}) \tag{16}$$

In this approximation, the probabilities described in (14) can be thought of as representing a zeroth-order sample of the output domain neighborhood around the sample point $\mathbf{x}$, and in particular are related to the probability that another class label would be returned from a nearby sample. Thus, if the $i$th classifier's predicted category label $j^{*} = c_i(\mathbf{x})$ is given by (13), then a measure of the classifier's self-reported reliability $R^{i}(\mathbf{x})$ can be written as

$$R^{i}(\mathbf{x}) = \frac{1}{N-1}\sum_{j \neq j^{*}} r_j^{i}(\mathbf{x}) \tag{17}$$

We note that this is a sum over a set of local category reliability measures originally defined in [35]

$$r_j^{i}(\mathbf{x}) = p_{j^{*}}^{i}(\mathbf{x}) - p_j^{i}(\mathbf{x}) \tag{18}$$

In [35], (18) appeared as a local reliability estimate in a penalty term of a log-likelihood function that was used to model temporal transition probabilities in multidate SAR and Landsat Thematic Mapper imagery. In the present work, we use a sum over local reliability estimates for each category to derive a reliability (17) for the classifier label returned by (13). Equation (17) is the first of two reliability expressions that will be evaluated in Section IV. Specifically, the approach that we call MAXERM assigns the fused class label for a particular sample to that of the classifier with the maximum reliability determined by (17):

$$c_{\mathrm{MAXERM}}(\mathbf{x}) = c_{i^{*}}(\mathbf{x}), \qquad i^{*} = \arg\max_{i}\, R^{i}(\mathbf{x}) \tag{19}$$

When all of the category probabilities are the same, the self-reported local reliability estimate in (17) is zero; however, when the output response is unity for one category and zero for all other categories, then the classifier reports perfect reliability. In order to minimize the risk of using the self-reported reliabilities, we can smooth the reliability estimates over the set of classifiers. To achieve this, we compute the predicted class labels from (13) for each classifier and the corresponding self-reported reliability from (17), and calculate the sum of the reliabilities for each category over the set of classifiers.
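In code, the self-reported reliability of (14), (17), and (18) and the MAXERM selection rule of (19) are only a few lines (a sketch, using the reconstructed forms of the equations above; the function names are ours).

```python
import numpy as np

def reliability(o):
    """Eqs. (14), (17), (18): mean margin between the winning posterior
    and every other posterior; 0 for a flat response, 1 for a one-hot one."""
    p = o / o.sum()                             # eq. (14): per-sample posteriors
    j_star = int(np.argmax(p))                  # eq. (13): MAP label
    margins = p[j_star] - np.delete(p, j_star)  # eq. (18), for all j != j*
    return j_star, margins.mean()               # eq. (17)

def maxerm(outputs):
    """Eq. (19): adopt the label of the single most reliable classifier."""
    labels, rels = zip(*(reliability(o) for o in outputs))
    return labels[int(np.argmax(rels))]
```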

Fig. 2. (Top) Three-season airborne hyperspectral imagery of Smith Island. RGB composites: (left) May HyMAP scene (R = 650.5 nm, G = 542.8 nm, B = 452.1 nm); (middle) August PROBE2 scene (R = 645.9 nm, G = 552.8 nm, B = 446.2 nm); (right) October PROBE2 scene (RGB channels same as August). (Middle) Enlargements, southern end of Smith Island, showing seasonal variations, especially in swale vegetation. (Bottom) Ground photographs of a dominant swale grass, Distichlis spicata, seen in the middle row: (left) May 2001, (middle) August 2001, and (right) October 2001 (nearby Hog Island).


TABLE II

For each possible output category label $c$, we take the sum over the classifier reliabilities

$$S_c(\mathbf{x}) = \sum_{i\,:\,c_i(\mathbf{x}) = c} R^{i}(\mathbf{x}) \tag{20}$$

The final predicted category of the composite classification is the category with the largest $S_c(\mathbf{x})$:

$$c_{\mathrm{SERM}}(\mathbf{x}) = \arg\max_{c}\, S_c(\mathbf{x}) \tag{21}$$

Equations (13), (17), (20), and (21) define the SERM algorithm, the second new reliability method for fusing classifiers (Fig. 1).

J. Summary of Algorithms

Table I contains a summary of the algorithms used in this paper. The acronym for each algorithm is followed by a brief description of the algorithm, the type of approach it represents, and references to the algorithm in the literature. We have first listed the classifiers used in single-season models, as well as ARESEPE, the active sampling strategy described earlier. The second half of the table consists entirely of multiple-classifier systems (MCS) [36]. Under the heading "Type of Approach," the MCS algorithms are characterized using the taxonomy defined in [36], where algorithms are defined as performing either "classifier selection" or "classifier fusion." "Classifier selection" implies that the label of the best classifier in the MCS pool, by some measure of fitness, becomes the response of the MCS, while "classifier fusion" implies that all classifiers in the MCS pool play a role in the final answer. Note that, given this taxonomy, MAXERM [see (19)] is a "classifier selection" algorithm, while SERM [see (20) and (21)] is a "classifier fusion" algorithm. All but one of the MCS algorithms are "Type III" [36], meaning that they use some measures of the constituent classifier outputs to determine the MCS output, while the majority vote is "Type I" [36], indicating that it uses only the labels themselves in determining the MCS output. Before describing the data and results, it is worth emphasizing that DGEM, MAXERM, and SERM are novel algorithms in this paper. Although MAXERM uses a reliability measure first defined in [35], the use of this reliability measure for classifier selection is novel to this paper; in [35], the reliability measure was used to model temporal transition probabilities in multitemporal data, but not for classifier selection. Likewise, SERM is novel in a second way, because it performs smoothing by summing these reliabilities over all classifiers represented in the pool.
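The SERM rule of (20) and (21) then pools these reliabilities by predicted label; the sketch below reuses the reliability() helper from the MAXERM sketch above.

```python
import numpy as np

def serm(outputs, n_categories):
    """Eqs. (20)-(21): sum each classifier's self-reported reliability into
    the bin of the label it predicts, then return the best-supported label."""
    support = np.zeros(n_categories)    # S_c(x) of eq. (20)
    for o in outputs:                   # one output vector per classifier
        label, rel = reliability(o)
        support[label] += rel
    return int(np.argmax(support))      # eq. (21)

# Example: a three-classifier pool over four categories (hypothetical data)
# outputs = [np.array([0.7, 0.1, 0.1, 0.1]), ...]; fused = serm(outputs, 4)
```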

Rather than attempt an exhaustive comparison, we chose a subset of algorithms that are in common usage or representative of extant algorithms for relatively fast, decision-based classifier fusion to compare against the new algorithms SERM and MAXERM. The majority vote is, of course, widely used. Likewise, GEM was chosen because it is popular in the neural network community and optimizes quite quickly; but, as we will see, GEM generalization accuracy is suboptimal because it does not perform credit assignment on a local basis. DGEM was an attempt to make GEM a little more local by decoupling the covariance matrices, but its performance on the database used in this paper was not significantly different from that of GEM. The simple composite architecture is included because it is the simplest approach to classifier fusion; it is also widely used and achieves good results. However, the simple composite does require further optimization once the constituent pool is optimized. MAXERM and SERM were developed to avoid the simple composite's requirement of further optimization while still achieving good classification results.

III. DATA AND STUDY AREA

A. Virginia Coast Reserve

The present study builds upon earlier research [2], in which land-cover classification of Smith Island, a barrier island in the Virginia Coast Reserve (VCR), was investigated using a HyMAP image acquired in May 2000. In this paper, multiseason classifications are compared against single-season classifications of airborne hyperspectral imagery acquired at three points in the growing cycle. HyMAP [25] imagery of Smith Island was acquired on May 8, 2000, and PROBE2 imagery was subsequently acquired on August 22 and October 18, 2001. RGB composites derived from these data are shown in Fig. 2. PROBE2 and HyMAP are similar hyperspectral sensors that cover the spectral range between 440 and 2500 nm with a spectral FWHM of typically 15–20 nm. Both sensors have 128 spectral channels, but for these collections, there were 124 usable channels for PROBE2 and 126 for HyMAP. Scene dimensions are approximately 16.1 km × 2.5 km (HyMAP), 12.4 km × 2.6 km (PROBE2, August), and 12.2 km × 2.5 km (PROBE2, October). Atmospheric corrections for the HyMAP scene were applied using ATREM/EFFORT by AIG prior to delivery. For the PROBE2 data, we applied an algorithm described in [11], commonly known as the 6S model, to the radiance data, which we then polished using the EFFORT algorithm [7].

3 Web site for the University of Virginia's Long-Term Ecological Research Program. See http://www.vcrlter.virginia.edu.

TABLE III
PERCENTAGE ACCURACY FOR TRAINING AND TEST SETS. OPTIMIZATION TIMES FOR ARESEPE EXPERIMENTS, r = 0.5, ARE FOR AN ATHLON XP 1800+ PROCESSOR

The May HyMAP imagery and the August PROBE2 imagery were acquired near high tide, while the October PROBE2 imagery was acquired near low tide. The early May scene is taken in a period during which the vegetation will typically be a mixture of new growth and senescent vegetation from the previous season; the senescent vegetation may partly obscure the emerging new growth beneath it. The mid-August scene was acquired during the peak of the growing season, while the mid-October scene was acquired as the vegetation had begun to senesce. In the latter case, some vegetation will show more deeply contrasting colors, and tonal changes in the visible part of the spectrum may provide better contrast for discrimination purposes.

B. Land-Cover Categories and Ground Data

Our supervised classification category maps have been validated with in situ observations made by us during a series of field surveys with global positioning system (GPS) and differential GPS (DGPS) conducted on Smith Island, as described in [2]. Surveyed regions were used to create spectral libraries from the georectified and coregistered HyMAP and PROBE2 imagery. These spectra were divided into a training set (3632 pixels), a cross-validation test set (1971 pixels) used to stop optimization, and a second sequestered test set (2834 pixels) that served as an independent assessment of expected performance. The DGPS ground data also were used to improve georectification of the imagery. The categories used in this study appear in Table II.

IV. RESULTS

Ten trials were performed for each single-season classifier and the corresponding three-season fused classification algorithms, which used these as inputs.

Mean and standard deviation of the accuracies of all classification approaches are shown in Table III. It is not surprising that the single-season May HyMAP classification results are significantly lower than those obtained for the August and October PROBE2 imagery; we expect this because of the seasonal variations in vegetation described in Section III-A. For the first multiclassifier experiments, using the three single-season BPCE-ARESEPE classifiers as input, Table III shows that the best results for the test data sets were achieved by the SERM and simple composite algorithms; SERM mean accuracy for the sequestered test set exceeded simple composite accuracy by roughly one standard deviation. All other competing techniques produced less accurate generalization to test data. In the second set of experiments, in which the weak classifiers based on PCA-DWKNN or DWKNN were added to the pool, we compared only the simple composite, MAXERM, and SERM and found that SERM and MAXERM accuracy did not change by a statistically significant amount, while the simple composite performance dropped considerably. Looking at the results for the three-classifier pools, we see that the smoothing in SERM, achieved by the two sets of sums in (17) and (20), which effectively averages at both the classifier and pool levels, is apparently more robust than averaging at the classifier level alone, as is done in MAXERM. SERM test accuracy exceeded that of the simple composite approach despite the large number of free parameters available in the simple composite, and SERM performance was achieved without the need for back-end optimization. Likewise, as the pool size and complexity grew, optimization time for the simple composite grew considerably (Table III). SERM also achieved


Fig. 3. Example land-cover classification subsets from the southern end of Smith Island, showing single-season classification results for (upper left) HyMAP, 5/8/00, (upper right) PROBE2, 8/22/01, (lower left) PROBE2, 10/18/01, and (lower right) the SERM fusion based on these three models as inputs.

the highest overall classification accuracy on the sequestered test set. Compared to the best single-season classifications, the SERM result is an improvement in average accuracy of 6.7% for the sequestered test set and 5.2% for the cross-validation test set.

Most of the improvements in the three-season composite classifications occurred in a few specific categories, principally marsh categories. For the sequestered test set, the largest improvements in three-season ensemble classification accuracies were in categories such as Distichlis spicata, one of the dominant swale grasses, Juncus roemerianus, Wrack, and Spartina alterniflora, the dominant salt marsh vegetation. In the single-season classifications, some categories were more accurately identified in a particular season. For example, among the single-season classifications, Iva frutescens was most readily identified by the August classifiers; the May classifiers were less accurate because the leaves have typically not emerged this early in the growing season; likewise, in October the leaves may have senesced to a great degree or fallen off completely. Algorithms such as SERM and the simple composite are able to select the best performance in one category, such as Iva frutescens, from a particular classifier, so that it is included in the composite classification. We see this, for example, in Fig. 3. Likewise, artifacts that appeared in specific single-scene classifiers, e.g., water regions mislabeled because of glint, have been corrected in the fused classification. Furthermore, the SERM classification in Fig. 3 also shows a dramatic reduction in false alarms for the invasive plant Phragmites australis. Looking beyond the test set accuracy, we know from our surveys that most of the SERM improvement in this example came from the removal of false positives that occurred in the center of swales and on the western edge of the back-dune vegetation in the single-season classifications. These successes are due to the local nature of the credit assignment achieved in SERM.

V. SUMMARY AND CONCLUSION

We have introduced two new approaches to fusing classifiers that rely on single-sample estimated reliability measures: one based on the maximum classifier reliability (MAXERM) and another based on a smoothed version (SERM), which averages the reliability of all predicted category labels across all classifiers in the pool. The reliability measures are directly available from the outputs of the pool of trained classifiers to be fused, without further optimization, and they do not depend on the specific type of classifiers in the pool. A statistical argument was advanced to show that these reliability measures directly estimate a zeroth-order approximation to the output domain neighborhood of the classifier posterior probability. For the initial set of illustrative hyperspectral land-cover classification experiments, SERM was superior to a variety of different approaches to fusing classifiers, including GEM, DGEM, and the majority vote, and about one standard deviation better than the simple composite. Likewise, once the classifier pool was assembled, SERM required no optimization, while the simple composite did require further optimization. The simple composite also scales poorly in terms of optimization time and feedforward complexity (number of free parameters) as the pool size grows, while SERM scales well, relying only on simple formulas that are functions of the classifier outputs to classify novel inputs. SERM was also the most robust classifier fusion algorithm as the pool size was increased: when weak classifiers were added to the initial pool of three single-season hyperspectral classifiers, SERM accuracy was essentially unchanged, while the simple composite performance degraded significantly.


ACKNOWLEDGMENT

The authors acknowledge computing resources provided by the DOD High Performance Computing (HPC) Modernization Program, including SMDC, the Maui High Performance Computing Center (MHPCC), and the Army Research Laboratory's Major Shared Resource Center (ARLMSRC).

REFERENCES

[1] C. M. Bachmann, "Improving the performance of classifiers in high-dimensional remote sensing applications: An adaptive resampling strategy for error-prone exemplars (ARESEPE)," IEEE Trans. Geosci. Remote Sensing, vol. 41, pp. 2101–2112, Sept. 2003.
[2] C. M. Bachmann, T. F. Donato, G. M. Lamela, W. J. Rhea, M. H. Bettenhausen, R. A. Fusina, K. DuBois, J. H. Porter, and B. R. Truitt, "Automatic classification of land-cover on Smith Island, VA using HYMAP imagery," IEEE Trans. Geosci. Remote Sensing, vol. 40, pp. 2313–2330, Oct. 2002.
[3] G. D. Bailey, S. Raghavan, N. Gupta, B. Lambird, and D. Lavine, "InFuse—An integrated expert neural network for intelligent sensor fusion," in Proc. IEEE/ACM Int. Conf. Developing and Managing Expert System Programs, 1991, pp. 196–201.
[4] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton Univ. Press, 1961.
[5] J. A. Benediktsson and P. H. Swain, "Consensus theoretic classification methods," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 688–704, Apr. 1992.
[6] J. A. Benediktsson and I. Kanellopoulos, "Classification of multisource and hyperspectral data based on decision fusion," IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 1367–1377, May 1999.
[7] J. Boardman, "Post-ATREM polishing of AVIRIS apparent reflectance data using EFFORT: A lesson in accuracy versus precision," in Summaries of the 7th Annu. JPL Airborne Geoscience Workshop, Pasadena, CA, 1998.
[8] G. J. Briem, J. A. Benediktsson, and J. R. Sveinsson, "Multiple classifiers applied to multisource remote sensing data," IEEE Trans. Geosci. Remote Sensing, vol. 40, pp. 2291–2299, Oct. 2002.
[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[10] S. A. Dudani, "The distance-weighted k-nearest-neighbor rule," IEEE Trans. Syst., Man, Cybern., vol. SMC-6, pp. 325–327, 1976.
[11] B. Gao and C. O. Davis, "Development of a line-by-line atmosphere removal algorithm for airborne and spaceborne imaging spectrometers," Proc. SPIE, vol. 3118, pp. 132–141, 1997.
[12] M. Garcia and S. L. Ustin, "Detection of interannual vegetation responses to climatic variability using AVIRIS data in a coastal savanna in California," IEEE Trans. Geosci. Remote Sensing, vol. 39, pp. 1480–1490, July 2001.
[13] S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Comput., vol. 4, pp. 1–58, 1992.
[14] C. Genest and J. V. Zidek, "Combining probability distributions: A critique and an annotated bibliography," Stat. Sci., vol. 1, no. 1, pp. 114–148, 1986.
[15] L. K. Hansen and P. Salamon, "Neural network ensembles," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 993–1001, Oct. 1990.
[16] S. Le Hegarat-Mascle, I. Bloch, and D. Vidal-Madjar, "Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing," IEEE Trans. Geosci. Remote Sensing, vol. 32, pp. 768–778, July 1994.
[17] S. Le Hegarat-Mascle, A. Quesney, D. Vidal-Madjar, and O. Taconet, "Land cover discrimination from multitemporal ERS images and multispectral Landsat images: A study case in an agricultural area in France," Int. J. Remote Sens., vol. 21, no. 3, pp. 435–456, 2000.
[18] T. M. Heskes, E. T. P. Slijpen, and B. Kappen, "Cooling schedules for learning in neural networks," Phys. Rev. E, vol. 47, no. 6, pp. 4457–4464, 1993.
[19] T. K. Ho, J. J. Hull, and S. N. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 66–75, Jan. 1994.
[20] J.-N. Hwang, J. J. Choi, S. Oh, and R. J. Marks, II, "Query-based learning applied to partially trained multilayer perceptrons," IEEE Trans. Neural Networks, vol. 2, pp. 131–136, Jan. 1991.
[21] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Comput., vol. 3, pp. 79–87, 1991.
[22] J. H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40, pp. 3043–3054, Dec. 1992.
[23] L. Kanal and S. Raghavan, "Hybrid systems—A key to intelligent pattern recognition," in Proc. Int. Joint Conf. Neural Networks, vol. IV, 1992, pp. 177–183.
[24] S. Kumar, J. Ghosh, and M. Crawford, "Hierarchical fusion of multiple classifiers for hyperspectral data analysis," Pattern Anal. Applicat., vol. 5, pp. 210–220, 2002.
[25] F. A. Kruse, J. W. Boardman, A. B. Lefkoff, J. M. Young, and K. S. Kierein-Young, "The 1999 AIG/HyVista HyMap group shoot: Commercial hyperspectral sensing is here," in Proc. SPIE Int. Symp. AeroSense, Orlando, FL, 2000.
[26] P. Loonis, E.-H. Zahzah, and J.-P. Bonnefoy, "Multi-classifiers neural network fusion versus Dempster-Shafer's orthogonal rule," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, 1995, pp. 2162–2165.
[27] F. Melgani and S. Serpico, "A statistical approach to the fusion of spectral and spatio-temporal contextual information for the classification of remote sensing images," Pattern Recognit. Lett., vol. 23, pp. 1053–1061, 2002.
[28] C. J. Merz, "Using correspondence analysis to combine classifiers," Mach. Learn., vol. 36, no. 1–2, pp. 33–58, July 1999.
[29] C. J. Merz and M. J. Pazzani, "A principal components approach to combining regression estimates," Mach. Learn., vol. 36, no. 1–2, pp. 9–32, July 1999.
[30] J.-M. Park and Y. H. Hu, "Adaptive on-line learning of optimal decision boundary using active sampling," in Proc. 1996 Workshop Neural Networks for Signal Processing VI, S. Usui, Y. Tohkura, S. Katagiri, and E. Wilson, Eds. Kyoto, Japan: IEEE, 1996, pp. 253–262.
[31] M. P. Perrone and L. N. Cooper, "When networks disagree: Ensemble methods for hybrid neural networks," in Artificial Neural Networks for Speech and Vision, R. J. Mammone, Ed. New York: Chapman Hall, 1993.
[32] L. E. Pierce, K. M. Bergen, M. C. Dobson, and F. T. Ulaby, "Multitemporal land-cover classification using SIR-C/X-SAR imagery," Remote Sens. Environ., vol. 64, pp. 20–33, 1998.
[33] M. D. Richard and R. P. Lippman, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Comput., vol. 3, pp. 461–483, 1991.
[34] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, vol. 1, Foundations, pp. 318–362.
[35] A. H. Schistad Solberg, A. K. Jain, and T. Taxt, "Multisource classification of remotely sensed data: Fusion of Landsat TM and SAR images," IEEE Trans. Geosci. Remote Sensing, vol. 32, pp. 768–778, July 1994.
[36] P. C. Smits, "Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection," IEEE Trans. Geosci. Remote Sensing, vol. 40, pp. 801–813, Apr. 2002.
[37] A. J. C. Sharkey, "Multi-net systems," in Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Berlin, Germany: Springer-Verlag, 1999.
[38] J. T. Tou and R. C. Gonzales, Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
[39] K. Woods, W. P. Kegelmeyer Jr., and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 405–410, Apr. 1997.
[40] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 418–435, Mar. 1992.
[41] K. Yamauchi, N. Yamaguchi, and N. Ishii, "An incremental learning method with relearning of recalled interfered patterns," in Proc. 1996 Workshop Neural Networks for Signal Processing VI, S. Usui, Y. Tohkura, S. Katagiri, and E. Wilson, Eds. Piscataway, NJ: IEEE, 1996, pp. 243–252.
[42] A. Verikas, A. Lipnickas, K. Malmqvist, M. Bacauskiene, and A. Gelzinis, "Soft combination of neural classifiers: A comparative study," Pattern Recognit. Lett., pp. 429–444, 1999.
[43] J. H. Wilkinson and C. Reinsch, Handbook for Automatic Computation, vol. 2, Linear Algebra, 1971.

Charles M. Bachmann (M’92) received the A.B. degree from Princeton University, Princeton, NJ, in 1984, and the Sc.M. and Ph.D. degrees from Brown University, Providence, RI, in 1986 and 1990, respectively, all in physics. While at Brown University, he participated in interdisciplinary research in the Center for Neural Science, investigating adaptive models related to neurobiology and to statistical pattern recognition systems for applications such as speech recognition. In 1990, he joined the Naval Research Laboratory (NRL), Washington, DC, as a Research Physicist in the Radar Division, serving as a Section Head in the Airborne Radar Branch from 1994 to 1996. In 1997, he moved to the Remote Sensing Division, where he is currently Head of the Coastal Science and Interpretation Section of the new Coastal and Ocean Remote Sensing Branch. He has been a Principal Investigator for projects funded by the Office of Naval Research, and more recently for an internal NRL project that focused on coastal land-cover from hyperspectral and multisensor imagery. His research interests include image and signal processing techniques and adaptive statistical pattern recognition methods and the instantiation of these methods in software. His research also focuses on specific application areas such as multispectral and hyperspectral imagery, field spectrometry, SAR, and multisensor data as these apply to environmental remote sensing, especially wetlands and coastal environments. Dr. Bachmann is a member of the American Geophysical Union, the Society of Wetland Scientists, and the Sigma Xi Scientific Research Society. He is the recipient of two NRL Alan Berman Publication Awards (1994 and 1996) and an Interactive Session Paper Prize at IGARSS ’96.

Michael H. Bettenhausen (S'93–M'95) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Wisconsin, Madison, in 1983, 1990, and 1995, respectively. His graduate research focused on theoretical and computational studies of radio frequency heating in plasmas. He did software development and algorithm research for particle simulation while with the Mission Research Corporation, Santa Barbara, CA, from 1997 to 2000. In 2000, he joined Integrated Management Services, Inc., Arlington, VA, where he worked on projects for the analysis and processing of hyperspectral remote sensing data and inverse synthetic aperture radar data. He is currently with the Remote Sensing Division of the Naval Research Laboratory, Washington, DC. His research interests include analysis of hyperspectral remote sensing data, high-performance computing, and passive microwave remote sensing.

Robert A. Fusina (M'01) received the B.S. degree from Manhattan College, NY, and the M.S. and Ph.D. degrees from the State University of New York, Albany, all in physics. He has been with the Remote Sensing Division, Naval Research Laboratory, Washington, DC, since 1993. His current research involves land-cover classification, hyperspectral remote sensing, and data fusion. His previous work included calculation of radar scattering from ocean waves.

Timothy F. Donato (M'99) was born in Washington, DC, on August 8, 1961. He received the B.S. degree in biology from Christopher Newport College, Newport News, VA, in 1986, and the M.S. degree in physical oceanography from North Carolina State University (NCSU), Raleigh, in 1994. He is currently pursuing the Ph.D. degree in physical ocean sciences and engineering at the College of Marine Studies, University of Delaware, Newark. He has conducted research on active microwave observations of the Gulf Stream frontal region at low grazing angles in support of a Naval Research Laboratory (NRL), Washington, DC, Advanced Research Initiative. He is currently a Geophysicist with the Remote Sensing Division, NRL. He has been with NRL since 1995, and his current research involves quantitative interpretation and analysis of moderate- to high-resolution (spatial and spectral) satellite imagery (hyperspectral, multispectral, and synthetic aperture radar imagery) in coastal environments, continental shelf plankton dynamics, hydrodynamic modeling of the coastal ocean, remote sensing data fusion, and the integration of hydrodynamic models and landscape/ecosystems analysis of coastal wetlands. Concurrently with his M.S. work at NCSU, he worked for Science Applications International Corporation (SAIC), Raleigh, NC, as a Satellite Oceanographer. While at SAIC, he conducted work on a variety of coastal and open-ocean environmental projects for Mobil Oil, the Minerals Management Service, and the Environmental Protection Agency, Washington, DC. In 1993, he joined Allied Signal Technical Services (now Honeywell) as a Research Scientist, performing work for the Remote Sensing Division at NRL on the analysis of active microwave backscatter from open-ocean environments.


Andrew L. Russ received the B.S. degree in biological sciences and the M.A. degree in geography from the University of Maryland, College Park, in 1993 and 2003, respectively. He is currently with the USDA Agricultural Research Service, Beltsville, MD. His research involves hyperspectral remote sensing data analysis for the retrieval of plant biophysical parameters.

Joseph W. Burke received the B.S. degree from Louisiana State University, Baton Rouge, in 2000, and the M.A. degree from the University of Maryland, College Park, in 2003, both in geography. His undergraduate work focused on coastal geomorphology and coastal marsh processes. The focus of his graduate research was on coastal remote sensing and mapping.


Barry R. Truitt was born on October 8, 1948, in Norfolk, VA. He received the B.S. degree in biology from Old Dominion University, Norfolk, VA, in 1971. He has been with The Nature Conservancy at the Virginia Coast Reserve since 1976 and is currently Chief Conservation Scientist, responsible for the design and implementation of site conservation plans, research, and biological monitoring. His main professional interests include island biogeography, landscape ecology, conservation science, and marine and migratory bird conservation. He conducts and coordinates with other partners a 28-year-long colonial waterbird and shorebird monitoring program on the seaside. He is also involved in efforts to restore eelgrass and oyster reefs in the coastal bays. His interest in landscape ecology and barrier island history led to the publication, with Miles Barnes, of Seashore Chronicles: Three Centuries of the Virginia Barrier Islands (Charlottesville, VA: University of Virginia Press, 1999).

Gia M. Lamela received the B.S. degree (with honors) in biological sciences from the University of Maryland, Baltimore County, in 2000. She has been with the Naval Research Laboratory, Washington, DC, since 1989 and joined the Optical Sensing Section in 1996.

W. Joseph Rhea received the B.S. degree in oceanography from the University of Washington, Seattle, WA, in 1986. From 1985 to 1988, he was an Assistant Scientist for the Oceanographic and Meteorological Science Group of Envirosphere Company, Bellevue, WA. From 1988 to 1994, he was with the Biological and Polar Oceanography Group, Jet Propulsion Laboratory, Pasadena, CA. Since 1994, he has been with the Optical Sensing Section, Naval Research Laboratory, Washington, DC.

John H. Porter received the A.A. degree from Montgomery College, in 1974, the B.S. degree from Dickinson College, Carlisle, PA, in 1976, and the M.S. and Ph.D. degrees from the University of Virginia, Charlottesville, in 1980 and 1988, respectively. He is currently a Research Assistant Professor in the Department of Environmental Sciences, University of Virginia. He is the Information Manager and one of the three lead Principal Investigators of the Virginia Coast Reserve Long-Term Ecological Research Project.
