Pattern Recognition Letters 48 (2014) 24–33
A rule-based classification methodology to handle uncertainty in habitat mapping employing evidential reasoning and fuzzy logic

Zisis I. Petrou, Vasiliki Kosmidou, Ioannis Manakos, Tania Stathaki, Maria Adamo, Cristina Tarantino, Valeria Tomaselli, Palma Blonda, Maria Petrou

Information Technologies Institute, Centre for Research and Technology Hellas, P.O. Box 60361, 6th km Xarilaou – Thermi, 57001 Thessaloniki, Greece
Department of Electrical and Electronic Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
Institute for Studies on Intelligent System for Automation (ISSIA), National Research Council (CNR), Via Amendola 122/D-O, 70126 Bari, Italy
Institute of Plant Genetics (IGV), National Research Council (CNR), Via G. Amendola 165/A, 70126 Bari, Italy
Article history: Available online 15 November 2013

Keywords: Dempster–Shafer theory; Fuzzy rule-based object-oriented classification; Remote sensing; Habitat mapping; Uncertainty handling; Biodiversity
Abstract

Habitat mapping is a core element in numerous tasks related to sustainability management, conservation planning and biodiversity monitoring. Land cover classifications, extracted in a timely and area-extensive manner through remote sensing data, can be employed to derive habitat maps, through the use of domain expert knowledge and ancillary information. However, complete information to fully discriminate habitat classes is rarely available, while expert knowledge may suffer from uncertainty and inaccuracies. In this study, a rule-based classification methodology for habitat mapping through the use of a pre-existing land cover map and remote sensing data is proposed to deal with uncertainty, missing information, noise afflicted data and inaccurate rule thresholds. The use of the Dempster–Shafer theory of evidence is introduced in land cover to habitat mapping, in combination with fuzzy logic. The framework is able to handle lack of information, by considering composite classes when necessary data for the discrimination of the constituting single classes are missing, and to deal with uncertainty expressed in domain expert knowledge. In addition, a number of fuzzification schemes are proposed to be incorporated in the methodology in order to increase its performance and robustness towards noise afflicted data or inaccurate rule thresholds. Comparison with reference data reveals the improved performance of the methodology and the efficient handling of uncertainty in expert rules. The further scope is to provide a robust methodology readily transferable and applicable to similar sites in different geographic regions and environments. Although developed for habitat mapping, the proposed rule-based methodology is flexible and generic and may well be extended and applied in various classification tasks, aiming at handling uncertainty, missing information and inaccuracies in data or expert rules.
1. Introduction

Habitat mapping is mainly performed through either in situ or remote sensing observations, the latter being increasingly popular due to their advantages in large area coverage, time and cost efficiency (Nagendra, 2001). Land cover (LC) maps, extracted from remote sensing data in a more straightforward way, are often used as proxies for habitat map extraction through the use of ancillary information, since they describe observable characteristics of a landscape. Habitat changes constitute significant indicators for biodiversity monitoring, ecosystem preservation and sustainability management, thus their mapping attracts the interest of various
organizations and authorities worldwide (Bunce et al., 2013; Schmeller, 2008). Based on the generalized trend of international, national and regional authorities in producing LC maps, partially due to legal obligations (Tomaselli et al., 2013), an efficient framework for the conversion of LC into habitat classes is largely beneficial for sustainability management, conservation planning and biodiversity monitoring. A large variety of classification approaches has been employed and evaluated to perform habitat or LC mapping using remote sensing data. They include supervised (Chan et al., 2012; Walker et al., 2010; Féret and Asner, 2012; Longépé et al., 2011; Vyas et al., 2011) or unsupervised (Muad and Foody, 2012; Mwita et al., 2013) classification techniques. In cases where prior expert knowledge is available in the form of explicit rules, rule-based approaches may be employed to incorporate such information in the classification process (Kumar and Patnaik, 2013; Lucas et al., 2011a; Evans et al., 2010). Although efficient in a number of classification tasks, such methods may prove inadequate in handling uncertainty, due
to missing information, and inaccuracies, caused by noisy data or vague rules. Such problems are common in remote sensing applications, where, on the one hand, necessary sensor or ancillary data may be unavailable for certain landscapes, and, on the other hand, noise affliction may be introduced to the data, during acquisition and processing, such as registration, quantization and topographic and atmospheric correction. Despite various attempts in LC to habitat conversion (Adamo et al., 2013; Tomaselli et al., 2013), no previous framework has been suggested employing evidential reasoning for handling uncertainty and missing information. Dempster–Shafer (DS) theory, a mathematical theory of evidence, has been broadly used for information fusion and handling uncertainty and missing data (Saffiotti, 1994; Yager, 1987). In pattern recognition, DS theory has been used, principally, to combine results generated from multiple classifiers, even resulting in different classes (Ahmadzadeh and Petrou, 2003), and, less frequently, as the core of individual rule-based inference engines. Combining the results of different classifiers, DS has been applied in numerous fields, including three-dimensional object reconstruction (Díaz-Más et al., 2010), industrial parts inspection (Osman et al., 2011; Basir and Yuan, 2007; Kaftandjian et al., 2003), desertification risk and water quality assessment (Ahmadzadeh and Petrou, 2001; Aminravan et al., 2011), medical imaging (Bloch, 1996), road extraction from satellite images (Cleynenbreugel and Osinga, 1991), speaker identification (Chen et al., 1997) and optical character recognition (Rogova, 1994), providing results outperforming those derived from the individual classifiers. Fuzzy logic has been incorporated in various classifier fusion tasks using DS theory, in order to deal with data and rule vagueness (Deng et al., 2011; Zhang et al., 2011; Deng et al., 2010; Wu, 2009). In landscape characterization, fuzzy DS frameworks have been used to combine sensor data (Sarkar et al., 2005; Pinz et al., 1996) or contextual information (Laha et al., 2006) to improve classification. Individual rule-based classifiers have also been designed based on the combination of DS theory and fuzzy logic (Liu et al., 2004; Parikh et al., 2001; Yager, 1992). In landscape monitoring, DS theory with fuzzy sets has been used for LC classification in agricultural (Lein, 2003) and complex landscapes (Cayuela et al., 2006). The ability of the theory to incorporate multiple sources of information has been demonstrated by Franklin et al. (2002), where a classifier based on DS was compared with a conventional maximum likelihood classifier unable to incorporate all available ancillary information, thus resulting in significantly lower accuracy in discriminating habitat classes, compared with the DS-based classifier. The contribution of this study lies in the proposal of a robust methodology based on DS theory and incorporating fuzzy logic for habitat mapping using remote sensing data. The proposed methodology builds on a pre-existing LC map and converts it into a habitat map, incorporating domain expert rules and additional information. Different fuzzification approaches are proposed and introduced to the DS theory framework to deal with inaccurate rules provided by domain experts or noise afflicted data. The objective is to increase the framework robustness and make it readily applicable and transferable to similar landscapes in different locations. 
The flexibility of the framework in handling composite classes when adequate information for the discrimination of single classes is missing is also studied.
2. Application field and methods

2.1. Land cover to habitat mapping

The application field of the proposed fuzzy evidential reasoning classification approach lies in the area of ecological monitoring,
biodiversity assessment and ecosystem preservation. In particular, the developed classification framework deals with the fusion of diverse information and rules provided by domain experts for habitat mapping, based on an LC map and through the use of remote sensing data. The employed LC map is expressed in the Land Cover Classification System (LCCS) taxonomy, proposed by the United Nations Food and Agriculture Organization (FAO) (di Gregorio and Jansen, 2005). LCCS classes are organized in eight main categories, depending on whether the area element of interest is vegetated or not, aquatic or terrestrial, and managed, artificial or (semi-)natural. Classes are further refined with the inclusion of additional information, such as life form (e.g., woody or herbaceous vegetation), vegetation coverage, leaf type and phenology (e.g., broadleaved, evergreen, deciduous), canopy height, soil type and lithology. The LCCS taxonomy has been proposed as a generic framework able to adequately describe any LC class globally, while it has recently been recognized as the most appropriate LC taxonomy to serve as a basis for habitat mapping (Tomaselli et al., 2013). Habitat classification in this study is expressed in a recently developed taxonomy, the General Habitat Categories (GHC), based on life and non-life forms (Bunce et al., 2008). GHC classes are organized in five main categories, namely: (i) urban, (ii) cultivated, (iii) sparsely vegetated, (iv) trees and shrubs and (v) herbaceous vegetation. Various classes belong to each category, based on the life or non-life forms present in a studied area element, leaf properties, height of canopy, etc. The classes were initially defined to link in situ and remote sensing observations, thus facilitating their extraction through data derived from satellite or airborne sensors. Based on an LCCS map, information from remote sensing and ancillary data is combined using evidential reasoning to perform habitat classification of a study area, using expert decision rules (Kosmidou et al., 2014; Adamo et al., 2013). DS theory is employed to handle uncertainty and multiple classes, when adequate information for the discrimination among single classes is unavailable, and to provide a framework for embedding fuzzy logic to counteract noisy data and increase framework robustness and transferability.

2.2. Dempster–Shafer theory principles

DS theory, introduced by Dempster (1967) and Shafer (1976), is a mathematical theory of evidence, considered as a generalised form of the Bayesian theory of subjective probability. It is popular in rule-based expert systems, mainly because of its ability to handle uncertainty, lack of information and vague rules leading to composite events (Ahmadzadeh and Petrou, 2003). To each individual event, or set of events, belief and plausibility values are assigned, defining a belief interval. Belief in an event expresses the degree of confidence that the event holds, based on supporting evidence. Its plausibility value reflects the highest confidence in the event if all missing information were to support its validity. The difference between the plausibility and belief of a single or composite event expresses its uncertainty. When no uncertainty exists, plausibility and belief values coincide. One of the principal concepts in DS theory is the basic probability assignment function, m, describing the degree to which an event, A, from the set of all possible events, or frame of discernment Θ, is supported by evidence.
A can be a single event or a set of two or more single events; m values assigned to the latter indicate lack of adequate evidence to distinguish among the single events. The m values assigned to all subsets of Θ sum up to 1. The belief function, bel: 2^Θ → [0, 1], of a set A ⊆ Θ, is defined as the summation of the m values of all subsets of A, i.e.,

bel(A) = Σ_{X ⊆ A} m(X),   for all A ⊆ Θ.   (1)
bel(A) is the total belief in set A and all its subsets X. The support on each subset of A is included in the belief value of A. Therefore, bel(A) expresses the confidence that at least one of the subsets of A is true. The plausibility of event A is defined as

pls(A) = 1 − bel(Ā),   (2)

where Ā stands for the complement set of A with regard to Θ. pls(A) expresses how high the confidence that A is true would be if all missing evidence were in favor of A. In other words, plausibility expresses the maximum belief that can potentially be assigned to the set A. Having in mind that bel(A) expresses the belief in A based on the existing evidence, it can be inferred that the true probability of A lies in the interval [bel(A), pls(A)], i.e., the belief interval. While in the Bayesian theory belief in one set A implies disbelief in its complement Ā, thus no uncertainty is considered, in DS theory no such conjecture is necessarily true. Additionally, the DS approach provides the flexibility of assigning m values to composite events, in case not enough information is available for the discrimination of the constituting single events, and of naturally updating them, in case new information arises that supports single events. m values assigned to sets from different frames of discernment, e.g., values concerning different variables in a rule-based classifier, can be multiplied to assess the support for the combined event that all individual sets hold simultaneously. Concerning the final class selection, while in the Bayesian probability framework decisions are made according to the highest probability rule, in Dempster–Shafer theory decision making is customizable and not straightforward. At the end of the process, each potential single or composite event is accompanied by a belief and a plausibility value. The final classification decision may be made based on maximum belief, maximum plausibility or a combination of the two. An additional consideration is related to the acceptable complexity of the selected event, i.e., whether a composite event with high belief, consisting of a large number of non-distinguishable single events, is preferable to a single event with lower belief, and to what degree. The Dempster–Shafer theory allows for flexibility in the final decision criteria after the extraction of belief intervals for all potential single or composite events, which is highly dependent on user preferences and the specific application.
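To make the definitions above concrete, the following minimal Python sketch (not part of the original study) computes belief and plausibility values from a basic probability assignment over a small frame of discernment, following Eqs. (1) and (2); the GHC-like labels and mass values are purely illustrative.

```python
# Minimal Dempster-Shafer helpers: belief and plausibility from a basic
# probability assignment.  Labels and masses are illustrative only.
THETA = frozenset({"TPH", "FPH", "GPH", "TRE"})   # frame of discernment

# Basic probability assignment m: maps focal sets (subsets of THETA) to masses
# summing to 1.  Mass on the full set models evidence that cannot separate
# the single classes.
m = {
    frozenset({"FPH", "TRE"}): 0.8,
    THETA: 0.2,
}

def bel(A, m):
    """Eq. (1): belief of A is the sum of the masses of all focal sets X contained in A."""
    return sum(v for X, v in m.items() if X <= A)

def pls(A, m):
    """Eq. (2): plausibility of A is 1 minus the belief of its complement."""
    return 1.0 - bel(THETA - A, m)

A = frozenset({"FPH", "TRE"})
print(bel(A, m), pls(A, m))          # 0.8 1.0 -> belief interval [0.8, 1.0]
print(bel(THETA, m), pls(THETA, m))  # 1.0 1.0 -> no uncertainty about the full frame
```

Representing events as frozensets keeps the subset tests and complements required by Eqs. (1) and (2) as direct set operations.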
2.3. Proposed fuzzification approaches

The DS framework provides large flexibility in incorporating fuzzy logic in the classification process. Fuzzy logic has been extensively used to moderate the drawbacks of hard classification approaches and counteract inaccuracies in the process, caused mainly by noise afflicted data (Klir, 2001). The classification task described in this study may be sensitive to inaccurate rules provided by the domain experts or noise introduced during data acquisition and processing. Furthermore, slight changes in rule thresholds, e.g., when applying the method in similar sites located in different areas, may similarly decrease significantly the accuracy achieved by a hard classifier. A number of linear fuzzification approaches are proposed and evaluated, applied to the numerical features used for the classification. Within each approach, appropriate membership functions are defined to provide a confidence degree that a numerical feature value lies within the region set by the expert rule under consideration. Membership functions for a particular numerical feature are either defined by considering all thresholds included in the expert rules regarding the specific numerical feature and remain constant throughout the classification process, or defined individually for each rule checked.

For a specific numerical feature, the first proposed approach takes into consideration all thresholds given by the experts in the rules involving the particular feature and splits the field of values of the feature into a respective number of regions. The method is schematically depicted in Fig. 1(a), where, for instance, function (1), p1(t), describes the probability that the observed feature value, t, is smaller than the threshold t1, p1(t) = p(t < t1), while function (2), p2(t), describes the respective probability of belonging in the region [t1, t2), p2(t) = p(t1 ≤ t < t2). The membership functions are defined to represent the probability of a feature value to belong in the respective region. When the observed feature value equals a threshold, e.g., t2, the probabilities to belong to either the region on the left (p(t ≤ t2)) or on the right of the threshold value (p(t ≥ t2)) are defined to be equal (p(t ≤ t2) = p(t ≥ t2) = 0.5). In addition, the probability that a feature value equal to the mean of two threshold values belongs to the region defined by those values is set equal to 1, e.g., if t = (t2 + t3)/2, then p(t2 ≤ t ≤ t3) = 1. For a region larger than one defined by two consecutive threshold values, e.g., [t2, t4], the probability of the observed value to belong in it is calculated as p(t2 ≤ t ≤ t4) = p(t2 ≤ t ≤ t3) + p(t3 < t ≤ t4). Three versions of this approach are considered: (i) the slope of the linear functions is equal to 1/d, where d represents the minimum distance between two consecutive thresholds for a specific feature, in order to avoid overlaps of the defined membership functions (F1); (ii) the slope is equal to 1/s, where s stands for the standard deviation of the observed values of the specific feature considering all area elements, to link the membership functions with statistical characteristics of the feature (F2); or (iii) the maximum slope of the two previous approaches is considered (F3). Additionally, another linear approach (F4) is proposed, similar to the previous one, with the difference that the slopes of the membership functions are not constant but depend on the distance between two consecutive thresholds, in such a way that the probability that an observed feature value falls in a region between two consecutive thresholds equals 1 only if the feature value is equal to the mean of the two thresholds, and is smaller than 1 elsewhere. This approach tends to discourage large membership function value changes when small feature value changes occur, due to the smaller slopes of its membership functions compared with the previous approach. The approach is schematically represented in Fig. 1(b).
Fig. 1. The proposed fuzzification approaches: multiple membership functions with (a) constant and (b) variable slopes and individual membership functions based on the standard deviation of the observed values of the segments with membership value (c) 1 in the middle of the region of interest and (d) 0.5 in the edge points.
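To illustrate the threshold-based family of membership functions (F1–F3), a minimal sketch is given below. It reflects one possible reading of Fig. 1(a), in which each region's membership is a symmetric trapezoid with value 0.5 at the bounding thresholds and ramps of slope 1/w; the function name, the clipping and the numerical thresholds (taken from the example in Section 3.3) are illustrative assumptions.

```python
def region_membership(t, t_low, t_high, w):
    """Membership of feature value t in the region [t_low, t_high), assumed
    F1-style form: 0.5 exactly at each bounding threshold, linear ramps of
    slope 1/w around the thresholds, clipped to [0, 1].  With w = d, the
    minimum distance between consecutive thresholds (F1), the value reaches 1
    at the midpoint of every region; w = s gives F2, and w = min(d, s) gives F3."""
    rising = 0.5 + (t - t_low) / w    # ramp centred on the lower threshold
    falling = 0.5 + (t_high - t) / w  # ramp centred on the upper threshold
    return max(0.0, min(1.0, rising, falling))

# Hypothetical vegetation-height thresholds (m), as in the example of Section 3.3.
thresholds = [0.6, 2.0, 5.0, 40.0]
d = min(b - a for a, b in zip(thresholds, thresholds[1:]))   # = 1.4

print(region_membership(7.0, 5.0, 40.0, d))   # 1.0: t = 7 m lies well inside [5, 40)
print(region_membership(5.0, 5.0, 40.0, d))   # 0.5: exactly on a threshold
print(region_membership(4.5, 5.0, 40.0, d))   # below 0.5: just outside the region
```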
In order to give emphasis to the statistical nature of the observed feature values, two more fuzzification approaches are proposed, considering solely the standard deviation of the values of a particular feature and ignoring the threshold values induced by the expert rules. In both approaches, the slope of the membership functions equals 1/s, where s stands for the standard deviation of the observed values of a certain feature. Their difference lies in the way the probability of a feature value to belong in a region between two threshold values, t1 and t2, is calculated. The first approach (F5) dictates that if an observed feature value, t, is equal to the mean of the two threshold values, its probability to belong in the region equals one, p(t1 ≤ t ≤ t2) = 1 (Fig. 1(c)). The second approach (F6) requires that the probability of belonging in the region when the observed feature value is equal to one of the threshold values is equal to 0.5, e.g., if t = t2, then p(t1 ≤ t ≤ t2) = 0.5 (Fig. 1(d)).
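Similarly, a minimal sketch of an F6-style membership function is given below; the exact functional form is an assumption based on the description above (value 0.5 anchored at the two rule thresholds, linear with slope 1/s, clipped to [0, 1]), the F5 variant would differ only in how its ramps are anchored, and the sample heights are hypothetical.

```python
import statistics

def f6_membership(t, t_low, t_high, s):
    """F6-style membership in the rule region [t_low, t_high] (assumed form):
    equal to 0.5 exactly at the two thresholds, linear with slope 1/s towards
    the interior and exterior of the region, clipped to [0, 1]."""
    return max(0.0, min(1.0, 0.5 + (t - t_low) / s, 0.5 + (t_high - t) / s))

# Hypothetical object-level vegetation heights (m); their standard deviation
# links the membership slope to the statistics of the observed feature values.
heights = [1.2, 3.5, 6.8, 7.0, 12.4, 18.9, 25.0]
s = statistics.stdev(heights)

# Rule region [5 m, 40 m] from the example in Section 3.3: the smaller s is,
# the steeper the ramps and the larger the membership of t = 7 m.
print(round(f6_membership(7.0, 5.0, 40.0, s), 3))
print(round(f6_membership(7.0, 5.0, 40.0, s / 2.0), 3))   # halving s increases it
```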
3. Experimental process

3.1. Study area and available data

The study area where the developed framework is applied, the Le Cesine site, is located in the Apulia region, south-eastern Italy (Fig. 2). It is one of the oldest protected sites in the region, belonging to the Natura 2000 network and covering an area of approximately 3.48 km². It consists of a variety of habitats within agricultural, semi-natural and natural areas, including two extended coastal lagoons, numerous channels, marshes and humid grasslands. Wildfire spreading, increase of the salinity of the lagoon water, agricultural practices and illegal urban development cause changes and increasing pressure on the ecosystem equilibrium.

An LC map of Le Cesine, validated through field campaigns in 2008–2009, with overall accuracy 95% and error tolerance 2%, and expressed in the LCCS taxonomy using expert knowledge (Lucas et al., 2011b), was used as the basis for the classification of habitats. In addition, two multispectral satellite images of 2 m spatial resolution, a Quickbird image acquired in June 2009 and a Worldview-2 image from October 2010, were used in the classification process, after being ortho-rectified, co-registered and calibrated to Top of Atmosphere reflectance values. Additionally, an Object Height Model (OHM), extracted from Light Detection And Ranging (LiDAR) data acquired in spring 2009, during the peak of vegetation productivity, was used to further disambiguate habitat classes based on their canopy height.

3.2. Classification process and implementations
An object-oriented approach is followed during the process. Objects represent area elements of a particular habitat on the landscape and segments on the satellite imagery. They are derived from the source LC map. A two-stage approach is followed for the habitat classification. During the first stage, solely the LCCS classes are used to extract all potential GHC classes for each object, based on expert knowledge (Kosmidou et al., 2014). In general, more than one GHC class corresponds to each specific LCCS class, therefore this process results in composite GHC classes. As an example, a vegetated area indicated as trees in semi-natural terrestrial vegetated area in LCCS may correspond to either tall (TPH), forest (FPH) or giant (GPH) phanerophytes or urban woody vegetation (TRE), and additional information on vegetation height and adjacency to urban areas is needed to dissolve the composite GHC class (‘TPH or FPH or GPH or TRE’) into its constituting single classes (‘TPH’, ‘FPH’, ‘GPH’, ‘TRE’).
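As an illustration of the first classification stage, the sketch below maps an LCCS class to its set of candidate GHC classes; only the trees in semi-natural terrestrial vegetated area entry follows the example in the text, while the remaining entries and the dictionary keys are hypothetical placeholders.

```python
# First-stage sketch: each LCCS class maps to its set of candidate GHC classes;
# sets with more than one member are composite GHC classes that the second
# stage (expert rules plus DS reasoning) tries to refine.  Only the first
# entry follows the example in the text; the others are invented placeholders.
LCCS_TO_GHC = {
    "trees, semi-natural terrestrial vegetated": {"TPH", "FPH", "GPH", "TRE"},
    "herbaceous, semi-natural terrestrial vegetated": {"THE", "GEO"},
    "artificial surfaces": {"ART"},
}

def first_stage(lccs_class):
    """Return the (possibly composite) candidate GHC class of an object."""
    return frozenset(LCCS_TO_GHC[lccs_class])

print(sorted(first_stage("trees, semi-natural terrestrial vegetated")))
# ['FPH', 'GPH', 'TPH', 'TRE'] -> the composite class 'TPH or FPH or GPH or TRE'
```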
Fig. 2. Worldview-2 image of Le Cesine from October 2010. The solid dark lines depict the area segments derived from the thematic land cover map of the site.
After the first-stage classification, a number of numerical features are calculated for each object, to be used in the main classification process, namely: object shape characteristics, i.e., the area of the object and the ratio of its actual perimeter to the perimeter of its smallest enclosing rectangle; spectral characteristics, i.e., the ratios of reflectances in the green to red and blue to near-infrared bands for both satellite images; topological characteristics, i.e., the degree of adjacency of each object to urban artificial (ART) or non-vegetated (NON) elements and to submerged hydrophytes (SHY), derived from the first-stage classification process; and structural characteristics, i.e., the mean vegetation height through the use of the OHM. The second stage of the classification is performed by embedding in the framework knowledge provided by domain experts in the form of if-then rules (Adamo et al., 2013). For each LCCS class present, domain experts provide the conditions through which the composite GHC class of each object, acquired in the previous stage, can be refined into a single class or a composite one with fewer candidate single classes. The rules take into account the LCCS class of each object and a number of values of its aforementioned extracted numerical features. The rules are examined sequentially for all objects. The classification process is conducted both using the fuzzy approaches discussed in Section 2.3 and using hard classification employing no fuzzy logic. In the latter case, a binary decision is made on the validity of each examined rule, which may either hold or not. In the former cases, a degree of support is assigned to each rule based on the confidence in each individual condition, derived through the fuzzification approaches and propagated using the DS principles. Following these approaches, multiple single or composite GHC classes may result for each object, after the consideration of all rules, with a belief interval assigned to each one of them. As a note, composite classes may result following the hard classification too, when inadequate information exists to discriminate among certain GHC classes, a typical example being herbaceous therophytes (THE) and geophytes (GEO).

Two further considerations are made in order to enhance the classifier flexibility and applicability in various landscapes, resulting in different classification implementations. In the aforementioned implementation, the rules provided by the experts are characterized by certainty in the final outcome, in case the described conditions hold; e.g., valid rules may be of the form "If condition 1 and condition 2 hold, then the object is classified as class A", where A may refer to either a single or a composite class. However, uncertainty may exist in the rules provided by the experts, therefore rules leading to uncertain outcomes are also included, e.g., of the form "If condition 1 and condition 2 hold, then the object is classified as class A with confidence 80%, or anything with 20%". A second consideration is related to the restricted availability or acquisition difficulty of LiDAR data for various sites and landscapes. A robust method with extensive applicability should be developed in a way to be readily applicable under conditions of lack of information.
Therefore, an implementation of the rules without considering the LiDAR data set, which is used for the disambiguation of several classes based on their canopy height, is also developed, to test the applicability of the approach under such conditions. As a result of the aforementioned considerations, four different implementations are developed, testing all four combinations of rules with definite or uncertain outcome and availability or lack of LiDAR information. In particular, the main implementations
are characterized by (A) absence of LiDAR data and rules expressing certainty (A1) or uncertainty in the outcome (A2), or (B) presence of LiDAR data and definite (B1) or uncertain (B2) outcome rules. For each implementation, all fuzzy logic approaches of Section 2.3 are tested, in addition to hard classification (F0). F0 is based on the strict application of the expert knowledge and the crisp rule thresholds provided, generated through in situ campaigns, and is used as reference. Therefore, in total, 28 different classifier versions (Fig. 3) are tested for habitat mapping, based on the principles of DS theory. Each classification process may result in one or multiple single or composite classes for an object, each class being associated with a belief interval. Composite classes tend to have larger belief values than the single classes they include, since, based on (1), the belief in a composite event is calculated by aggregating the support to each of its subsets. In the present study, a trade-off between favouring single class selection and high belief is attempted as a rational solution, in order to achieve meaningful and informative final classes with acceptably high belief values. Thus, for the selection of the final single or composite class of an object, starting from the event with the highest belief, simpler events are searched under the condition of having a belief larger than a predefined threshold, e.g., 0.6 for a single event, or another, e.g., 0.75, for a composite one.

3.3. Example

A simple example of an object recognized as trees in semi-natural terrestrial vegetated area in LCCS is given, to demonstrate the way an expert rule is handled by the different proposed implementations and fuzzification approaches. At the first classification stage, the object is classified in the composite GHC class ‘TPH or FPH or GPH or TRE’. During the second classification stage, following implementation B1, where LiDAR data and expert certainty in the outcome are considered, a rule of the form "if vegetation height is between 5 m and 40 m, then the object is FPH or TRE" is examined. The calculated mean height value of the object, averaging the height values of its pixels based on the OHM, is 7 m. Following the hard classification (F0), the rule is valid and the composite class is replaced by the simpler ‘FPH or TRE’ with probability equal to 1. In the F1, F2, F3 and F4 fuzzification approaches, all rule thresholds concerning the vegetation height numerical feature need to be considered. It is assumed that thresholds t1, t2, t3 and t4 in Fig. 1(a) and (b) correspond to values 0.6 m, 2 m, 5 m and 40 m, respectively, used by the experts to discriminate among different habitats based on their vegetation height. Using Fig. 1(a), it can be inferred that for the F1 and F3 approaches the membership of t = 7 m in the region [t3, t4) is 1, since the small distance between t1 and t2 results in a steep slope of the membership functions. The membership value derived from the F2 approach will depend on the standard deviation of the observed vegetation height values of all objects and it might be less than 1. In a similar manner, from Fig. 1(b), it is expected that the membership value for t = 7 m will be significantly lower than 1, i.e., around 0.2, since it becomes 1 at the middle point of the region [t3, t4), i.e., for t = 22.5 m.
In this case, the ‘FPH or TRE’ class will be assigned to the object with a basic probability assignment function value, m, equal to 0.2; additional m values and possible classes may be added from the application of following rules, to arrive at the final class decision. As far as fuzzification approaches F5 and F6 are concerned, considering t1 = 5 m and t2 = 40 m in Fig. 1(c) and (d), it is inferred that the smaller the variation, or standard deviation, of the observed values of vegetation height within the objects, the steeper the slope of the membership functions and the larger the membership value at t = 7 m.
Fig. 3. The 28 classifier versions, resulting from the combination of the four implementations (Section 3.2) and the seven approaches, including the hard classification and the six fuzzification approaches (Section 2.3).
Expressing expert uncertainty in the outcome (implementation B2), the rule may be expressed as: "if vegetation height is between 5 m and 40 m, then the object may be FPH or TRE by 80%, or ‘TPH or FPH or GPH or TRE’ by 20%". In such a case, the m value assigned to the validity of the rule, calculated from the fuzzification functions, is multiplied with the confidence in the final outcome, in order to get a total m value for each potential class. For instance, in case m_R = 0.9 is assigned to the rule, then m1 = 0.9 × 0.8 = 0.72 is assigned to the ‘FPH or TRE’ class and m2 = 0.18 to the composite ‘TPH or FPH or GPH or TRE’ class. In case LiDAR data are considered absent (implementations A1, A2), the rule will not be applied and the final class will remain the same, ‘TPH or FPH or GPH or TRE’. To demonstrate the final class decision process, it is assumed that an object is classified as ‘FPH or TRE’ with a value m1 = 0.8 and as ‘TPH or FPH or GPH or TRE’ with the remaining m2 = 0.2, as the m values sum up to 1 (Section 2.2). From (1) and (2), it is calculated that for ‘FPH or TRE’ it is bel1 = m1 = 0.8 and pls1 = 1, while for ‘TPH or FPH or GPH or TRE’ it is bel2 = m1 + m2 = 1, since ‘FPH or TRE’ is a subset of ‘TPH or FPH or GPH or TRE’, and pls2 = 1. The decision for the final class is flexible and depends on the specific application and user needs. In case a belief value of 0.8 is considered satisfactory, or in case the composite class ‘TPH or FPH or GPH or TRE’ is considered inadequately informative, the ‘FPH or TRE’ class is assigned; otherwise, the ‘TPH or FPH or GPH or TRE’ class is selected.
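The worked example can be reproduced with the short sketch below, which combines the fuzzified rule support with the expert confidence in the rule outcome (implementation B2) and then derives belief, plausibility and a final class; the numbers follow the example above, whereas the helper names and the use of the 0.6 and 0.75 decision thresholds mentioned in Section 3.2 are illustrative.

```python
THETA = frozenset({"TPH", "FPH", "GPH", "TRE"})   # composite class from the first stage

def bel(A, m):
    """Eq. (1): sum of the masses of all focal sets contained in A."""
    return sum(v for X, v in m.items() if X <= A)

def pls(A, m):
    """Eq. (2): one minus the belief of the complement of A."""
    return 1.0 - bel(THETA - A, m)

# Implementation B2: the fuzzified rule support m_R = 0.9 is split between the
# rule outcome (80%) and 'anything' (20%), giving 0.72 and 0.18 as in the text.
m_R, outcome_conf = 0.9, 0.8
print(round(m_R * outcome_conf, 2), round(m_R * (1 - outcome_conf), 2))   # 0.72 0.18

# Final-decision demonstration from the text: 'FPH or TRE' carries m1 = 0.8 and
# the full composite class the remaining m2 = 0.2.
m = {frozenset({"FPH", "TRE"}): 0.8, THETA: 0.2}

def decide(m, t_single=0.6, t_composite=0.75):
    """Prefer the simplest event whose belief exceeds the relevant threshold
    (illustrative thresholds, cf. Section 3.2); otherwise keep the event with
    the highest belief."""
    for A in sorted(m, key=len):                    # simpler events first
        if bel(A, m) >= (t_single if len(A) == 1 else t_composite):
            return A
    return max(m, key=lambda A: bel(A, m))

A = frozenset({"FPH", "TRE"})
print(bel(A, m), pls(A, m))   # 0.8 1.0 -> belief interval [0.8, 1.0]
print(sorted(decide(m)))      # ['FPH', 'TRE']: its belief of 0.8 exceeds 0.75
```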
4. Results

A total of 583 objects were formed in the study site and each one was classified into a GHC single or composite class with all different classifier versions. 250 points, derived from recently conducted field campaigns, were used for validation purposes, identifying 14 different GHC classes, including 13 single and one composite class (‘THE or GEO’). An object was considered as correctly classified when the class of its corresponding validation point coincided with the class of the object, in the case of a single final class, or was among the multiple selected classes, in the case of a composite final class. Fig. 4 presents the overall accuracy achieved by each classifier, as the ratio of the correctly classified objects to the total objects checked using the validation points.
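The validation criterion can be summarised in a few lines of code; the sketch below is illustrative only, with hypothetical predictions and reference labels, and counts an object as correct when its reference class is the predicted single class or a member of the predicted composite class.

```python
def is_correct(predicted_ghc, reference_class):
    """An object counts as correct if the validation class is the predicted
    single class or one of the members of the predicted composite class."""
    return reference_class in predicted_ghc

# Hypothetical predictions (sets of GHC labels) and reference labels.
predictions = [{"FPH", "TRE"}, {"THE", "GEO"}, {"CHE"}, {"MPH"}]
references = ["FPH", "GEO", "CHE", "LPH"]

accuracy = sum(is_correct(p, r) for p, r in zip(predictions, references)) / len(references)
print(accuracy)   # 0.75 for this toy set
```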
4.1. Classification per habitat class

In Table 1, the results obtained by the hard classification approaches, for each main implementation, are used as reference for the comparison and performance evaluation of the fuzzy approaches, for each individual habitat. The relative performance accuracy is calculated as the ratio of the correctly classified segments belonging to the selected GHC class by the specified fuzzy approach to the ones correctly classified by the hard classification approach, for the same main implementation framework. Then, the highest performing fuzzification approaches are reported, with the respective relative accuracy in parentheses. When a fuzzy classifier outperforms the hard one, accuracies larger than 1 appear.

4.2. Class disambiguation with LiDAR

In order to test the classification performance of the DS framework under missing information and the handling of composite events, LiDAR data are not considered in two out of the four main implementations of the classifier. Inclusion of the LiDAR data is expected to discriminate among similar classes using canopy height information and dissolve composite classes into single ones. To assess the impact of the inclusion of LiDAR data on the resulting classes, Table 2 reports the number of different classes resulting from the F1 classification approach for the four different implementations, A1, A2, B1 and B2. The F1 approach was selected as an indicative example, with the results from the rest of the fuzzification approaches being similar. The table reports the number of single classes (e.g., ‘CHE’), composite classes consisting of two single ones (e.g., ‘THE or GEO’) and composite classes with more than two single classes (e.g., ‘TPH or FPH or GPH or TRE’), resulting over the entire site from each implementation.

4.3. Robustness and transferability tests

In order to examine the robustness of the classification framework under varying conditions, a series of simulations with modified data and rule thresholds were performed. Such simulations may be used as an indication of the robustness of the classifier to noise in data or inaccurate expert rules and of its transferability to sites similar to Le Cesine, where the respective data and the expert rules are expected to vary.
Fig. 4. Overall classification accuracy of the suggested approaches using the DS framework, for the different implementations (Section 3.2), with the hard classification (F0) and the proposed fuzzification approaches (Section 2.3).
Table 1 Best performing fuzzification approaches for each GHC habitat class present in the site and each implementation. GHC acronyms are reported in Bunce et al. (2011). The relative accuracy of the best performing fuzzification approach, for each case, compared with the respective hard classification, is presented in parenthesis, calculated as the ratio of the accuracy of the former to the latter. ‘All’ indicates that all fuzzy classifiers achieved the same accuracy for the specific class.
GHC class   | A1                | A2                | B1                   | B2
CRO         | All (1)           | All (1)           | All (1)              | All (1)
WOC         | F2, F5, F6 (2.13) | F2, F5, F6 (2.13) | F2, F5, F6 (2.13)    | F2, F5, F6 (2.13)
HEL         | F1, F4 (1.22)     | F1, F4 (1.22)     | F1, F4 (1.22)        | F1, F4 (1.22)
CHE         | All (1.33)        | All (1.33)        | All (1.33)           | All (1.33)
THE or GEO  | All (1)           | All (1)           | All (1)              | All (1)
SHY         | All (1.14)        | All (1.14)        | All (1.14)           | All (1.14)
EHY         | F1, F4 (1.08)     | F1, F4 (1.17)     | F1, F4 (1.08)        | F1, F4 (1.08)
SCH         | All (1)           | All (1)           | F2, F4, F5, F6 (1.5) | All (1.5)
LPH         | All (1)           | All (1)           | F2, F5, F6 (1.5)     | F2, F5, F6 (1.5)
MPH         | All (1.03)        | All (1.03)        | F5, F6 (1.13)        | F2, F5, F6 (1.13)
TPH         | All (1)           | All (1)           | F2, F5, F6 (1.07)    | F2, F5, F6 (1.07)
FPH         | All (1)           | All (1)           | All (1)              | All (1)
ART         | All (1)           | All (1)           | All (1)              | All (1)
NON         | All (1)           | All (1)           | All (1)              | All (1)
Table 2 Number of resulting classes for F1 implementations, for the entire Le Cesine site.
Class type       | A1 | A2 | B1 | B2
Single classes   | 11 | 11 | 16 | 17
Double classes   |  5 |  5 |  5 |  5
Multiple classes |  6 |  6 |  5 |  5
Rule thresholds were modified by adding random noise drawn from a zero-mean normal distribution. The standard deviation of the noise applied to a threshold was set as a percentage of its original value defined by the experts. Fig. 5 depicts the classification accuracies achieved when rule thresholds were modified by noise with standard deviation equal to 5%, 10% and 50% of the original values. Additional experiments were performed using noise afflicted data. In realistic conditions, noise may be added to remote sensing data during different steps of data processing, from its capture by the sensor to its atmospheric and topographic correction, orthorectification or co-registration. Following the notion that, according to the central limit theorem, different sources of noise result in an approximately Gaussian overall noise, the remote sensing data used in the classification, i.e., the Quickbird and Worldview-2 images and the OHM, were afflicted by additive homogeneous noise drawn from a zero-mean Gaussian probability density function (Petrou and Petrou, 2010). For the OHM and each band of the multispectral images, the mean value of the active pixels, i.e., the pixels falling in the Le Cesine site, was calculated. The standard deviation of the noise added to an image was calculated as a percentage of the mean value of its pixels. In Fig. 6, the overall accuracies achieved by the classifiers using data afflicted by noise with standard deviations equal to 5%, 10% and 20% of the mean values of the data are compared with the accuracies achieved using the original data.
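A minimal sketch of the perturbation scheme is given below (illustrative only; the threshold values, pixel values and random seed are placeholders): zero-mean Gaussian noise is added with standard deviation set as a percentage of a threshold's original value, or of the mean pixel value of an image band, before the classification is repeated.

```python
import random

random.seed(0)  # reproducible toy run

def perturb_thresholds(thresholds, rel_std):
    """Add zero-mean Gaussian noise to each rule threshold, with standard
    deviation equal to rel_std times the original threshold value."""
    return [t + random.gauss(0.0, rel_std * abs(t)) for t in thresholds]

def perturb_band(pixels, rel_std):
    """Add zero-mean Gaussian noise to each pixel, with standard deviation
    equal to rel_std times the mean of the active pixel values."""
    mean_value = sum(pixels) / len(pixels)
    return [p + random.gauss(0.0, rel_std * mean_value) for p in pixels]

# Hypothetical vegetation-height thresholds (m) and a toy image band.
thresholds = [0.6, 2.0, 5.0, 40.0]
pixels = [0.12, 0.15, 0.11, 0.18, 0.14]

for rel_std in (0.05, 0.10, 0.50):   # noise levels used for the thresholds in Fig. 5
    print(rel_std, perturb_thresholds(thresholds, rel_std))

# Object features are means over many pixels, so independent zero-mean pixel
# noise largely averages out, consistent with the stability observed in Fig. 6.
print(perturb_band(pixels, 0.10))
```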
5. Discussion

5.1. Overall performance

The efficiency of the DS classification framework is compared with the reference study of Adamo et al. (2013), in which a hard classification methodology was developed and LiDAR data were employed to convert LCCS to GHC classes in the same study area. The classification accuracy achieved there reached 70% with error tolerance 5%. That methodology has a close correspondence with the B1 hard classification (F0) implementation (78.4% classification accuracy), where disjoint classes arise from the application of the rules and the ability of DS to handle composite classes is not practically exploited. The advantages offered by the suggested fuzzification approaches may be observed in the rest of the classifiers under the B1 implementation, which outperform the hard classification one. In certain cases, an improvement of up to 12% compared with the hard classification is achieved, e.g., with the classifiers embodying fuzzification approaches F5 and F6, where accuracy reaches 88.4%. Additionally, further improvements due to the use of the DS framework may be seen by comparing these outcomes with the results of implementation B2, where, for all fuzzy classifiers, slight accuracy increments of up to 2% are achieved. It is evidenced that the inclusion of uncertainty in the rule outcome (B2), successfully handled by the DS framework, and the incorporation of fuzzification approaches lead to a significant increase in the classification accuracy, compared with the hard classifiers of the B1 implementation and the validated approach of Adamo et al. (2013).
Fig. 5. Comparison of the overall classification accuracies embodying the original and the modified rule thresholds. The classification accuracies with the original thresholds are depicted in bars, while the ones with the thresholds afflicted by additive zero-mean noise with standard deviations equal to 5%, 10% and 50% of the threshold values with green, yellow and blue stems, respectively. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)
Fig. 6. Comparison of the overall classification accuracies embodying the original and the noise afflicted data. The classification accuracies using the original data are depicted in bars, while the ones using the data afflicted by additive zero-mean noise with standard deviations equal to 5%, 10% and 20% of the data mean values with green, yellow and blue stems, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
This demonstrates the ability of the DS framework to handle uncertainty in expert rules and the flexibility that the framework provides to express such uncertainty when it exists and to employ fuzzy logic. In all implementations of the proposed DS framework, the fuzzy approaches outperformed the hard classification, as shown in Fig. 4, reaching accuracies from 80% to almost 90% for 14 habitat classes. Among the proposed fuzzy approaches, the ones applying fuzzification based solely on information derived from the standard deviation of the observed values, rather than being influenced by the rule thresholds, i.e., F5 and F6, showed the highest performance. The F2 approach, considering generic membership functions with slope based on the standard deviation of the observed values for a specific feature, performed similarly well. Approaches designing generic membership functions based mainly on the rule thresholds, i.e., F1, F3 and F4, performed worse. In particular, approach F3, adjusting the slope of the membership functions according to the minimum of the two quantities, namely the minimum distance between two consecutive thresholds and the standard deviation of the observed values of the specific feature, was the weakest. This seems to be caused by the fact that using the minimum of the two values maximizes the slope of the membership functions (Section 2.3), thus approaching the hard classification, which can be represented by membership functions with slopes as steep as possible, i.e., vertical.

5.2. Contribution of LiDAR to performance

As far as the main implementations are concerned, the ones without LiDAR data, i.e., A1 and A2, show better accuracy than B1 and B2, where LiDAR data were used. Having the accuracy reduced with the incorporation of additional information may seem contradictory. However, it results from the way accuracy is calculated. With the addition of LiDAR data, composite classes that
include the correct single ones, based on the validation, tend to be refined and split into simpler or single classes. This offers more informative results with less ambiguous and mixed classes, although it may introduce some error, when erroneously removing the correct single class from the previous composite ones. In the former case, the object would be considered as correctly classified, even though with largely ambiguous classes, while in the latter it would be considered as erroneously classified. Therefore, an interesting trade-off between high classification accuracy and user satisfaction by unambiguous classes is created. In fact, the influence of the use of LiDAR data in the classification process can be observed in Table 2 from the changes in the number of classes between implementations A1 and A2 and implementations B1 and B2. Use of LiDAR data in implementations B1 and B2 results in the discrimination of 5 and 6 more single GHC classes, respectively, compared with implementations A1 and A2, while reducing the number of ambiguous composite classes by one in the case of composite classes with more than two single ones. Table 2 provides a reasonable explanation for the decrement in accuracy of implementations B1 and B2 compared with their A1 and A2 counterparts, observed in Fig. 4, since the use of LiDAR might have led to over-refinement of composite classes appearing in A1 and A2. The number of single classes is significantly increased in B1 and B2 in comparison with implementations A1 and A2 (resulting in four single classes absent from the validation set), which seems to reveal the large influence of LiDAR data in refining composite classes, although at the expense of an increase in incorrect classifications.

5.3. Performance per habitat class

Table 1 provides information on the classification accuracies of the different fuzzification approaches in correctly mapping each separate GHC class identified through in situ campaigns, by listing
the best performing fuzzification approaches for each implementation and habitat class. In accordance with the inferences made based on Fig. 4, fuzzification approaches F5 and F6 seem to provide the best overall accuracy, being the ones appearing most frequently in the cells of the table. Analogous observations may be made for the classification accuracies of the rest of the fuzzification approaches, where the observed data comply with the ones in Fig. 4, indicating F3 as the weakest classifier, appearing less frequently than all others, as well as the comparable performance of the rest of the classifiers. It may also be observed that, in any case, the best fuzzification approaches perform equally well or outperform the hard classification, having a relative accuracy equal to or larger than 1. In many cases, all different approaches, including the hard classification, perform equally well, implying that certain classes, e.g., urban artificial (ART) or cultivated herbaceous crops (CRO), are more clearly distinguished than others. An interesting observation may involve the tree and shrub categories of shrubby chamaephytes (SCH), low (LPH), mid (MPH) and tall (TPH) phanerophytes, whose main criterion of discrimination is height. In implementations A1 and A2, where LiDAR data are not employed, all approaches perform similarly well, in many cases not being able to distinguish among those classes, resulting in multiple ones. In B1 and B2, where LiDAR data are used, discrimination of these classes is performed up to a certain degree and the performance of the different approaches varies, with the best fuzzy one outperforming the hard classification.

5.4. Robustness to noise and transferability potential

Fig. 5 depicts the performance of the classifier versions with rule thresholds modified by random additive noise, in order to investigate the robustness of the methodology to inaccurate expert rules or its transferability to similar sites, where slightly different rule thresholds may apply. As observed in the figure, almost all classifiers are negatively influenced by the changes in thresholds. More precisely, as a general trend, the larger the degree of noise added to the thresholds, the larger the degradation in the performance of the classifiers, for all four implementations. The classifiers with the high performing fuzzification approaches F2, F5 and F6 seem to be unaffected by small changes in the thresholds, when the standard deviation of the added noise is 5% of the threshold value. The performance of the rest of the classifiers deteriorates by up to 3% in some cases. An interesting remark is the slight increase in the performance of the hard classifiers (F0), which reveals that an even better fine-tuning of the thresholds provided by the experts may be performed for the Le Cesine site; however, such fine-tuning may lead to over-fitting effects and reduce the applicability of the methodology to similar sites. When the degree of noise increases to standard deviations of 10% and 50% of the threshold values, further decrements in classification accuracy are observed for all classifiers, including the hard and the top performing fuzzy ones. However, the reduction is not significantly large in absolute terms and the accuracies of the classifiers remain over 75%, revealing the robustness of the proposed framework to inaccurate rule thresholds and its potential for effective application in similar sites.
The results of the simulations to examine the robustness of the framework to noise in the data used for the classification are shown in Fig. 6. Different levels of added noise were tested. As seen in the figure, the performance of all classifiers remains almost unaffected by the noise affliction of the used data, regardless of the degree of the added noise. This may be explained by the fact that the classification performed is object based, rather than pixel based. Therefore, the features of the objects extracted from the remote sensing data are calculated by averaging the values of their pixels. Since the noise is zero-mean and added to each pixel independently and in a random way, the average of the pixel values for
each object, especially the large objects, is expected to be insignificantly affected. Thus, the performance of the classifiers, especially the ones employing the top performing fuzzification approaches, is expected to remain almost intact.

6. Conclusion

In this study, a rule-based classification methodology, built on DS theory principles and employing fuzzy sets, has been proposed for habitat mapping, using remote sensing data. A pre-existing LC map was considered and its conversion into a habitat map was attempted, incorporating domain expert rules and additional information. No previous attempt has been made so far to involve evidential reasoning in an LC to habitat classification task, providing an efficient framework for handling uncertainty and missing information. Comparison with reference data proved the advantageous characteristics of the introduction of DS theory and fuzzy logic in a rule-based classification framework. Fuzzification approaches embodied in the framework significantly improved the results provided by hard classification. The DS principles further enhanced the classifier performance, compared with its counterparts using definite, strict expert rules, by efficiently handling uncertainty in expert rules and allowing the introduction of such uncertainty in the process. A variety of implementations was generated to evaluate the performance of the classifier in handling uncertainty in rules and missing information. Analysis indicated the ability of the method to deal with uncertainty in different steps of the process and counteract lack of data by considering composite events. When additional information, i.e., LiDAR data, was added to the framework, single classes could be discriminated and composite events were able to be refined into simpler ones. A set of linear fuzzification approaches was proposed and embodied in the classification process. Improvements in the classification accuracy of up to 12% have been observed, compared with the hard classification. The ability of the methodology to counteract changes in rule thresholds and noise afflicted data was tested. The performance of the framework did not decrease significantly, verifying (a) its robustness to noise and its potential transferability to similar landscapes and different geographical regions, and (b) its ability to adapt and be applied to new conditions, where modified rules and thresholds would normally hold. Further steps may include the application of the framework to real data from different sites, as well as the adoption and evaluation of non-linear fuzzy approaches. The proposed methodology provided an efficient framework for the conversion of land cover into habitat classes using remote sensing data, largely beneficial for sustainability management, conservation planning, and biodiversity monitoring. The framework was able to successfully handle uncertainty and missing information, was flexible enough to receive and embody new information when available, and could cope with inaccurate rules or noise afflicted data. Although developed for habitat mapping, the proposed rule-based methodology is flexible and generic and may well be extended and applied in various classification tasks aiming at handling uncertainty, missing information and inaccuracies in data or expert rules.

Acknowledgements

The work presented was inspired by and partly conducted under the guidance of the authors' colleague and supervisor, Maria Petrou, to whom it is gratefully dedicated.
In particular, the idea of employing the Dempster–Shafer theory was hers and her support
and guidance throughout the conceptual design and implementation of the framework was constant. Unfortunately, she did not have the opportunity to review and approve the results, conclusions and the final form of the classification framework presented herein. The work was supported by the European Union Seventh Framework Programme (FP7/2007-2013), SPA.2010.1.1-04: "Stimulating the development of GMES services in specific area", under grant agreement 263435, project BIO_SOS: BIOdiversity Multi-Source Monitoring System: from Space To Species, coordinated by CNR-ISSIA, Bari, Italy. LiDAR data were provided by Dr. S. Costabile from the Geoportale Nazionale – Ministero dell'Ambiente e della Tutela del Territorio e del Mare.

References

Adamo, M., Tarantino, C., Kosmidou, V., Petrou, Z., Manakos, I., Lucas, R.M., Tomaselli, V., Mücher, C.A., Blonda, P., 2013. Land cover to habitat map translation: disambiguation rules based on earth observation data. In: Int. Geoscience and Remote Sensing Symp. IEEE, Melbourne, pp. 3817–3820.
Ahmadzadeh, M.R., Petrou, M., 2001. An expert system with uncertain rules based on Dempster–Shafer theory. In: Int. Geoscience and Remote Sensing Symp. IEEE, Sydney, pp. 861–863.
Ahmadzadeh, M.R., Petrou, M., 2003. Use of Dempster–Shafer theory to combine classifiers which use different class boundaries. Pattern Anal. Appl. 6, 41–46.
Aminravan, F., Sadiq, R., Hoorfar, M., Rodriguez, M.J., Francisque, A., Najjaran, H., 2011. Evidential reasoning using extended fuzzy Dempster–Shafer theory for handling various facets of information deficiency. Int. J. Intell. Syst. 26, 731–758.
Basir, O., Yuan, X., 2007. Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory. Inf. Fusion 8, 379–386.
Bloch, I., 1996. Some aspects of Dempster–Shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account. Pattern Recogn. Lett. 17, 905–919.
Bunce, R.G.H., Metzger, M., Jongman, R.H.G., Brandt, J., de Blust, G., Elena-Rossello, R., Groom, G., Halada, L., Hofer, G., Howard, D., Kovář, P., Mücher, C.A., Padoa-Schioppa, E., Paelinx, D., Palo, A., Perez-Soba, M., Ramos, I., Roche, P., Skånes, H., Wrbka, T., 2008. A standardized procedure for surveillance and monitoring European habitats and provision of spatial data. Landscape Ecol. 23, 11–25.
Bunce, R., Bogers, M., Roche, P., Walczak, M., Geijzendorffer, I., Jongman, R., 2011. Manual for habitat and vegetation surveillance and monitoring: temperate, Mediterranean and desert biomes. Technical Report 2154. Alterra.
Bunce, R.G.H., Bogers, M.M.B., Evans, D., Halada, L., Jongman, R.H.G., Mücher, C.A., Bauch, B., de Blust, G., Parr, T.W., Olsvig-Whittaker, L., 2013. The significance of habitats as indicators of biodiversity and their link to species. Ecol. Indic. 33, 19–25.
Cayuela, L., Golicher, J.D., Rey, J.S., Benayas, J.M.R., 2006. Classification of a complex landscape using Dempster–Shafer theory of evidence. Int. J. Remote Sens. 27, 1951–1971.
Chan, J.C.W., Beckers, P., Spanhove, T., Borre, J.V., 2012. An evaluation of ensemble classifiers for mapping Natura 2000 heathland in Belgium using spaceborne angular hyperspectral (CHRIS/Proba) imagery. Int. J. Appl. Earth Obs. 18, 13–22.
Chen, K., Wang, L., Chi, H., 1997. Methods of combining multiple classifiers with different features and their applications to text-independent speaker identification. Int. J. Pattern Recogn. 11, 417–445.
Cleynenbreugel, J.V., Osinga, S.A., 1991. Road extraction from multi-temporal satellite images by an evidential reasoning approach. Pattern Recogn. Lett. 12, 371–380.
Dempster, A.P., 1967. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. 38, 325–339.
Deng, Y., Su, X., Wang, D., Li, Q., 2010. Target recognition based on fuzzy Dempster data fusion method. Defence Sci. J. 60, 525–530.
Deng, Y., Sadiq, R., Jiang, W., Tesfamariam, S., 2011. Risk analysis in a linguistic environment: a fuzzy evidential reasoning-based approach. Expert Syst. Appl. 38, 15438–15446.
Díaz-Más, L., Muñoz Salinas, R., Madrid-Cuevas, F., Medina-Carnicer, R., 2010. Shape from silhouette using Dempster–Shafer theory. Pattern Recogn. 43, 2119–2131.
di Gregorio, A., Jansen, L.J.M., 2005. Land cover classification system (LCCS): classification concepts and user manual for software version 2. Technical Report 8. FAO Environment and Natural Resources Service Series, Rome.
Evans, T.L., Costa, M., Telmer, K., Silva, T.S.F., 2010. Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal. IEEE J. Sel. Top. Appl. 3, 560–575.
Féret, J.B., Asner, G.P., 2012. Tree species discrimination in tropical forests using airborne imaging spectroscopy. IEEE Trans. Geosci. Remote Sens. 51, 73–84.
Franklin, S.E., Peddle, D.R., Dechka, J.A., Stenhouse, G.B., 2002. Evidential reasoning with Landsat TM, DEM and GIS data for landcover classification in support of grizzly bear habitat mapping. Int. J. Remote Sens. 23, 4633–4652.
Kaftandjian, V., Dupuis, O., Babot, D., Min Zhu, Y., 2003. Uncertainty modelling using Dempster–Shafer theory for improving detection of weld defects. Pattern Recogn. Lett. 24, 547–564.
Klir, G.J., 2001. Foundations of fuzzy set theory and fuzzy logic: a historical overview. Int. J. Gen. Syst. 2, 91–132.
Kosmidou, V., Petrou, Z.I., Bunce, R.G.H., Mücher, C.A., Jongman, R.H.G., Bogers, M.M., Lucas, R.M., Tomaselli, V., Blonda, P., Padoa-Schioppa, E., Manakos, I., Petrou, M., 2014. Harmonization of the land cover classification system (LCCS) with the general habitat categories (GHC) classification system. Ecol. Indic. 36, 290–300.
Kumar, T., Patnaik, C., 2013. Discrimination of mangrove forests and characterization of adjoining land cover classes using temporal C-band synthetic aperture radar data: a case study of Sundarbans. Int. J. Appl. Earth Obs. 23, 119–131.
Laha, A., Pal, N.R., Das, J., 2006. Land cover classification using fuzzy rules and aggregation of contextual information through evidence theory. IEEE Trans. Geosci. Remote Sens. 44, 1633–1641.
Lein, J.K., 2003. Applying evidential reasoning methods to agricultural land cover classification. Int. J. Remote Sens. 24, 4161–4180.
Liu, J., Yang, J.B., Wang, J., Sii, H.S., Wang, Y.M., 2004. Fuzzy rule-based evidential reasoning approach for safety analysis. Int. J. Gen. Syst. 33, 183–204.
Longépé, N., Rakwatin, P., Isoguchi, O., Shimada, M., 2011. Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines. IEEE Trans. Geosci. Remote Sens. 49, 2135–2150.
Lucas, R., Medcalf, K., Brown, A., Bunting, P., Breyer, J., Clewley, D., Keyworth, S., Blackmore, P., 2011a. Updating the Phase 1 habitat map of Wales, UK, using satellite sensor data. ISPRS J. Photogramm. 66, 81–102.
Lucas, R.M., Baraldi, A., Arvor, D., Durieux, L., Kosmidou, V., Tomaselli, V., Adamo, M., Tarantino, C., Biagi, B., Lovergine, F., Blonda, P., 2011b. Habitat maps. BIO_SOS: BIOdiversity Multi-Source Monitoring System: from Space To Species. Deliverable 5.1.
Muad, A.M., Foody, G.M., 2012. Super-resolution mapping of lakes from imagery with a coarse spatial and fine temporal resolution. Int. J. Appl. Earth Obs. 15, 79–91.
Mwita, E., Menz, G., Misana, S., Becker, M., Kisanga, D., Boehme, B., 2013. Mapping small wetlands of Kenya and Tanzania using remote sensing techniques. Int. J. Appl. Earth Obs. 21, 173–183.
Nagendra, H., 2001. Using remote sensing to assess biodiversity. Int. J. Remote Sens. 22, 2377–2400.
Osman, A., Kaftandjian, V., Hassler, U., 2011. Improvement of X-ray castings inspection reliability by using Dempster–Shafer data fusion theory. Pattern Recogn. Lett. 32, 168–180.
Parikh, C.R., Pont, M.J., Barrie Jones, N., 2001. Application of Dempster–Shafer theory in condition monitoring applications: a case study. Pattern Recogn. Lett. 22, 777–785.
Petrou, M., Petrou, C., 2010. Image Processing: The Fundamentals, second ed. John Wiley & Sons Ltd., Chichester, UK.
Pinz, A., Prantl, M., Ganster, H., Kopp-Borotschnig, H., 1996. Active fusion – a new method applied to remote sensing image interpretation. Pattern Recogn. Lett. 17, 1349–1359.
Rogova, G., 1994. Combining the results of several neural network classifiers. Neural Networks 7, 777–781.
Saffiotti, A., 1994. Issues of knowledge representation in Dempster–Shafer's theory. In: Yager, R.R., Fedrizzi, M., Kacprzyk, J. (Eds.), Advances in the Dempster–Shafer Theory of Evidence. John Wiley & Sons Inc., New York, NY, pp. 415–440 (Chapter 19).
Sarkar, A., Banerjee, A., Banerjee, N., Brahma, S., Kartikeyan, B., Chakraborty, M., Majumder, K.L., 2005. Landcover classification in MRF context using Dempster–Shafer fusion for multisensor imagery. IEEE Trans. Image Process. 14, 634–645.
Schmeller, D.S., 2008. European species and habitat monitoring: where are we now? Biodivers. Conserv. 17, 3321–3326.
Shafer, G., 1976. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
Tomaselli, V., Dimopoulos, P., Marangi, C., Kallimanis, A.S., Adamo, M., Tarantino, C., Panitsa, M., Terzi, M., Veronico, G., Lovergine, F., Nagendra, H., Lucas, R., Mairota, P., Mücher, C.A., Blonda, P., 2013. Translating land cover/land use classifications to habitat taxonomies for landscape monitoring: a Mediterranean assessment. Landscape Ecol. 28, 905–930.
Vyas, D., Krishnayya, N.S.R., Manjunath, K.R., Ray, S.S., Panigrahy, S., 2011. Evaluation of classifiers for processing Hyperion (EO-1) data of tropical vegetation. Int. J. Appl. Earth Obs. 13, 228–235.
Walker, W.S., Stickler, C.M., Kellndorfer, J.M., Kirsch, K.M., Nepstad, D.C., 2010. Large-area classification and mapping of forest and land cover in the Brazilian Amazon: a comparative analysis of ALOS/PALSAR and Landsat data sources. IEEE J. Sel. Top. Appl. 3, 594–604.
Wu, D., 2009. Supplier selection in a fuzzy group setting: a method using grey related analysis and Dempster–Shafer theory. Expert Syst. Appl. 36, 8892–8899.
Yager, R.R., 1987. On the Dempster–Shafer framework and new combination rules. Inform. Sci. 41, 93–137.
Yager, R.R., 1992. Decision making under Dempster–Shafer uncertainties. Int. J. Gen. Syst. 20, 233–245.
Zhang, Y.J., Deng, X.Y., Kang, B.Y., Wu, J.Y., Sun, X.H., Deng, Y., 2011. Developing environmental indices based on fuzzy set theory and evidential reasoning. In: Chinese Control and Decision Conference. IEEE, Mianyang, pp. 270–273.