International Journal of Geographical Information Science
ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: http://www.tandfonline.com/loi/tgis20
A flexible multi-source spatial-data fusion system for environmental status assessment at continental scale
P. Carrara, G. Bordogna, M. Boschetti, P. A. Brivio, A. Nelson & D. Stroppiana
To cite this article: P. Carrara, G. Bordogna, M. Boschetti, P. A. Brivio, A. Nelson & D. Stroppiana (2008) A flexible multi-source spatial-data fusion system for environmental status assessment at continental scale, International Journal of Geographical Information Science, 22:7, 781-799, DOI: 10.1080/13658810701703183
To link to this article: http://dx.doi.org/10.1080/13658810701703183
Published online: 07 May 2008.
International Journal of Geographical Information Science Vol. 22, No. 7, July 2008, 781–799
Research Article
Downloaded by [Universiteit Twente.] at 04:47 03 May 2016
A flexible multi-source spatial-data fusion system for environmental status assessment at continental scale
P. CARRARA*†, G. BORDOGNA‡, M. BOSCHETTI†, P. A. BRIVIO†, A. NELSON§ and D. STROPPIANA†
†IREA CNR, Institute for Electromagnetic Sensing of the Environment, Via Bassini 15, I20133, Milan, Italy
‡IDPA CNR, Institute for the Dynamics of Environmental Processes, c/o POINT, Via Pasubio 5, I24044 Dalmine (Bg), Italy
§European Commission–DG Joint Research Centre, Institute for Environment and Sustainability, I-21020 Ispra (VA), Italy
(Received 11 January 2007; in final form 18 August 2007)
The monitoring of the environment’s status at continental scale involves the integration of information derived from the analysis of multiple, complex, multidisciplinary, and large-scale phenomena. Thus, there is a need to define synthetic Environmental Indicators (EIs) that concisely represent these phenomena in a manner suitable for decision-making. This research proposes a flexible system to define EIs based on a soft fusion of contributing environmental factors derived from multi-source spatial data (mainly Earth Observation data). The flexibility is twofold: the EI can be customized based on the available data, and the system is able to cope with a lack of expert knowledge. The proposal allows a soft quantifier-guided fusion strategy to be defined, as specified by the user through a linguistic quantifier such as ‘most of’. The linguistic quantifiers are implemented as Ordered Weighted Averaging operators. The proposed approach is applied in a case study to demonstrate the periodic computation of anomaly indicators of the environmental status of Africa, based on a 7-year time series of dekadal Earth Observation datasets. Different experiments have been carried out on the same data to demonstrate the flexibility and robustness of the proposed method.
Keywords: Environmental indicator; Continental scale; Fuzzy integration; OWA; Satellite data
1. Introduction
Operational environmental monitoring techniques aim to provide decision-makers with reliable information that helps them to evaluate the effectiveness of current practices and to identify ongoing alarming conditions. However, environmental status assessment relies on the integration of multi-source information, a challenging activity, especially when the assessment is at a continental or even global scale. Environmental indicators (EIs) are, indeed, a means to reduce a large quantity of data to a simpler form while retaining the essential meaning (Ott 1978), for synthesizing the status, response, and development of important aspects of the environment. EIs describe complex phenomena, and they are often defined based on the contribution of several factors that contain complementary evidence of the status of the environment. Continental and global scale assessments of the environment require the fusion of spatial data from multiple sources, typically Earth Observation (EO) data, coupled with a modelling framework to allow interpretation (Lenz et al. 2000). In this context, we adopt the general definition of data fusion given by Wald (1999), in which ‘data fusion is a formal framework in which are expressed the means and tools for the alliance of data originating from different sources to the aims of obtaining information of greater quality’. Nevertheless, considerable difficulties can arise if data fusion is based on classical decision-making approaches that cannot deal with the uncertainty and imprecision/vagueness inherent in environmental data. The main source of this uncertainty is incomplete and approximate knowledge of the phenomenon: in most cases, the contributing factors affecting the status of the environment, their interrelationships, and their influence on the phenomenon can be defined only approximately, not precisely. For example, we can roughly say that ‘a large decrease in rainfall will have a negative impact on the vegetation cover of a region’, but it is very hard to identify a crisp rainfall threshold that would trigger an alarm. The choice of the appropriate datasets from which to derive the contributing factors is also affected by uncertainty. For example, ‘the increase of seasonal rainfall’ could be determined from the analysis of daily, weekly, or even monthly rainfall data. Other difficulties relate to the lack of knowledge of the relative importance of the contributing factors, and to the incompleteness of the input data.
*Corresponding author. Email: [email protected]
International Journal of Geographical Information Science ISSN 1365-8816 print/ISSN 1362-3087 online © 2008 Taylor & Francis http://www.tandf.co.uk/journals DOI: 10.1080/13658810701703183
An operational system for environmental assessment should be able to cope with limited knowledge of the importance of factors, and it should provide useful and reliable results in cases of incomplete or missing data. Several mathematical theories have been proposed to model the fusion of spatial data so as to cope with uncertainty and/or incomplete knowledge in geographic data analysis (Burrough and McDonnel 1998, Jeon and Landgrebe 1999, Valet et al. 2001, Malczewski 2006, Smith and Singh 2006). Among these theories, fuzzy logic has been successfully applied, since it allows the modelling of the subjective nature of uncertainty and of the imprecision/vagueness of our knowledge of environmental phenomena (Silvert 1997, Solaiman 1999, Morris and Jankowski 2001, Tran et al. 2002, Robinson 2003). Soft fusion techniques can be flexibly defined by taking advantage of the rich variety of fuzzy aggregation operators (Dubois and Prade 1994, Bloch and Maître 1996, Yager 2004). These operators help to create realistic, human-centred representations of environmental phenomena, avoiding oversimplifications based on crisp transitions from one state to another.
1.1 Research aims and system motivations and requirements
In this paper, we propose an architecture for a flexible system to define and compute an EI based on a data-fusion approach (Wald 1999, Chen and Meer 2005). Its aim is to assess environmental status at a continental or global scale. The target user of the system is an expert in the application field, familiar with spatial-data analysis tools such as image-processing systems and Geographical Information Systems (GIS). The system is designed to enable the modelling of two classes of EI:
• Anomaly Indicator: this is intended for the periodic monitoring of the environment at a continental scale, with the objective of identifying alarming conditions. Alarming conditions are defined as the occurrence of changes (not necessarily negative ones) that are under way or are likely to occur in the near future. This indicator is based on the analysis of anomalous conditions of a set of contributing factors.
• Specific Ecosystem Status Indicator: this aims to provide information on the status of the environment for a specific ecosystem. To build this kind of indicator, the specific aspect under analysis must be defined with respect to the broader ‘status of the environment’. For example, an indicator of vegetation regrowth after the fire season in semi-arid ecosystems can provide information on vegetation growth and vigour that can be useful for ecological studies or grazing-management activities. The indicators belonging to this class are specifically tuned to different ecosystems.
The system allows an expert to define a soft fusion strategy capable of integrating data representing heterogeneous information, from values of physical properties to decision judgements (Dasarathy 1997), to compute an overall indicator. The strategy is expressed by a linguistic quantifier, such as ‘few’, ‘most of’, or ‘at least 30%’ (Zadeh 1983, Yager 2004, Malczewski and Rinner 2005). In our approach, the linguistic quantifier is implemented by an Ordered Weighted Averaging (OWA) operator expressing the semantics of a fuzzy majority, which aggregates a set of contributing factor scores, possibly weighted by their relative importance, into the synthetic overall indicator (Yager 1988, 1996). Factor scores are interpreted as sources of partial evidence of the phenomenon under study. The use of OWA operators in spatial-data analysis is not completely new (see Chanussot et al. 1999, Jiang and Eastman 2000, Bone et al. 2005, Malczewski and Rinner 2005, for earlier examples). The novelty of our proposal is that we provide a flexible modelling approach for environmental assessment at continental scale, where data and especially models are generally lacking. This flexibility is evident in several aspects:
• The contribution of factors towards the final indicator can be derived with a completely data-driven method, relieving the expert of the difficult task of specifying it when there is little guidance or expert knowledge available (Robinson 2003); with a partially data-driven method, where the expert can complement observations with some knowledge of the phenomenon; or even with a manually driven approach that relies completely on expert knowledge. The completely data-driven approach is fundamental at continental scale, where commonly accepted models are often lacking.
• The expert can specify the soft fusion strategy for aggregating the factor scores. This soft fusion is achieved by defining the semantics of the fuzzy majority concept used to evaluate the EI (Kacprzyk 1986). The expert can also define the importance of each factor.
• The user can choose, change, update, or modify the set of contributing factors without the need to explicitly redefine the semantics of the fusion criterion (i.e. the fusion strategy is independent of the factors).
• Finally, in the phase of computation of the overall indicator, the system yields a result that copes with missing data. This is achieved by dynamically generating the OWA operator associated with the specified fuzzy majority, taking into account the number of available contributing factors (Yager 1996).
The next section provides a brief overview of spatial-data fusion methods, which leads to the motivation and requirements of a continental scale data-fusion system. This is followed by a description of the proposed system architecture and a definition of its components. Finally, we illustrate the application of the system in a pre-operational case study for Africa, with some experiments demonstrating the capabilities of the system. The proposed system is exemplified with a real case study within the Observatory for Landcover and Forest change (OLF) of the GeoLand Project. One of the main objectives of the OLF is the periodic assessment of the environmental status of the African continent, with particular reference to vegetation cover status and change, in order to highlight anomalous situations on the basis of EO and non-EO data.
2. Spatial-data fusion
Spatial-data fusion is a controversial term with slightly different meanings in different application contexts, such as measurement fusion, attribute fusion, and rule or decision fusion. Classical applications are found in the military field, for the detection, tracking, and identification of targets; in remote sensing, for the classification and interpretation of images; and in spatial-data analysis, for decision processes (Dasarathy 1997, Valet et al. 2001). In the sensor fusion community, the term refers to raw data that have undergone at most some preliminary processing, such as filtering. In defence applications, data fusion spans raw data as well as processed data that are input to higher-level decisions. Several characterizations of a fusion framework have been proposed in the literature based on the level of detail of the information in the inputs and outputs: a common taxonomy considers fusion at the data level, feature level, and decision level (Dasarathy 1997, Hall 1997). The US Department of Defence (DOD) developed a multi-level data-fusion functional model, called JDL (Joint Directors of Laboratories), which is widely used in the military community to determine the identity of individual objects and to assess situations and threats (US Department of Defence 1991). The JDL defines data fusion as a ‘multilevel, multifaceted process dealing with the automatic detection, association, correlation, estimation, and combination of data and information from single and multiple sources’ (US Department of Defence 1991). More recently, data fusion has been defined as a framework that allies available data from multiple sources, possibly characterized by distinct levels of detail, to generate spatial data of ‘higher quality’, containing information that is not available in any individual source (Wald 1999, Chen and Meer 2005). This is the interpretation that we adopt in this paper.
Essentially, ‘higher quality’ depends on the objective of the application: it can mean a better description of a spatial feature, a better signal, or a better decision. In our context, we adopt this last interpretation, so data of ‘higher quality’ means data at a higher level of abstraction than the individual input spatial-data layers, used to support decisions or assessments. Thus, EI maps are sources of information that can support experts and decision-makers in environmental planning and management, for identifying alarming conditions or the possible occurrence of environmental disasters, e.g. desertification and erosion.
The role of a mathematical theory for formalizing spatial-data fusion is first to represent the input data in a consistent way, and then to define how these data are combined to generate the EI. The first step is necessary when the input data are heterogeneous. Specifically, spatial data can be characterized by different resolutions, extents, temporal coverages, measurement errors, and ranges of values. These are all sources of uncertainty and imprecision that must be dealt with appropriately when fusing spatial information. Extensive surveys of the multicriteria fusion literature can be found in previous studies (Bloch and Maître 1996, DeCETI 2000, Valet et al. 2001, Malczewski 2006, Smith and Singh 2006). Probability theory is the most commonly adopted mathematical theory, in which the input data are merged by applying Bayesian rules (Jeon and Landgrebe 1999, Kam 1999). The main drawback of this approach is that the source data are assumed to be independent, an assumption that is rarely true in the analysis of environmental variables such as rainfall, vegetation vigour, temperature, and seasonality. This approach also models a very strict fusion in which all of the factors must contribute to some degree. The Dempster–Shafer theory of evidence (Dempster 1968, Shafer 1976) constrains the data fusion to the Dempster–Shafer combination rule, while flexibility is incorporated in the definition of the mass functions (i.e. the degree of belief assigned to a solution). The Dempster–Shafer rule produces counterintuitive results when the degree of conflict among the individual sources of evidence supporting each of the considered hypotheses becomes high (Zadeh 1979, Wu et al. 2002, Corgne et al. 2003).
This approach is too rigid to model the kind of applications we are considering, since our aim is to build a robust fusion system able to generate outputs even when the sources of evidence for a phenomenon and the fusion criteria are both ill-defined. Approaches based on neural networks are particularly useful for modelling complex processes such as those involved in pattern recognition (Wan and Fraser 1999). Their applicability is limited by the need for large training data sets, which are scarce in many real applications and certainly lacking in the applications targeted here. Fuzzy set theory has been applied in data fusion to flexibly model the expert’s knowledge of the fusion strategy by means of fuzzy aggregation operators, which can be defined with a severe, compromise, or indulgent behaviour, corresponding to the modelling of a risk-taking, risk trade-off, or risk-averse decision attitude, respectively (Robinson 2003, Yager 2004). If fuzzy set theory is coupled with possibility theory (which represents both the possibility and the necessity degrees of the occurrence of an event), then uncertainty in the information can be represented and managed (Dubois and Prade 1994). A combination of these two approaches is appealing, since it can be defined to model a variety of real situations in a flexible way. In this context, we formalize our proposal of a flexible system to support the data analyst in defining a model for the estimation of an EI at continental scale. The approach enables the definition of an EI based on a soft quantifier-guided fusion strategy that aggregates factor scores. These scores are computed by evaluating soft constraints on the contributing factors. The soft fusion strategy is defined by a linguistic quantifier corresponding to a fuzzy majority and is implemented by an OWA operator (Yager 1988).
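The conflict problem of the Dempster–Shafer rule mentioned above is usually illustrated with Zadeh’s well-known counterexample of two experts who almost completely disagree. The sketch below implements Dempster’s rule of combination for mass functions over subsets of a frame of discernment (represented as Python frozensets). The diagnostic labels are those of the classic textbook example, not data from this study, and the function name is ours.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns (combined masses, conflict K).
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Mass falling on the empty set measures the conflict K.
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalization by 1 - K redistributes the conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


M, C, T = "meningitis", "concussion", "tumour"
expert1 = {frozenset({M}): 0.99, frozenset({T}): 0.01}
expert2 = {frozenset({C}): 0.99, frozenset({T}): 0.01}

fused, k = dempster_combine(expert1, expert2)
print(f"conflict K = {k:.4f}")
for s, v in fused.items():
    print(sorted(s), round(v, 4))
```

Both experts consider a tumour very unlikely (mass 0.01), yet the normalized combination assigns it a mass of 1 (up to floating-point rounding), because almost all of the joint mass (K ≈ 0.9999) is conflicting and is discarded by the normalization.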
Jiang and Eastman (2000) outlined the usefulness of modelling soft fusion strategies by means of fuzzy measures that aggregate factor scores obtained by nonlinear scaling. They show how, in some real applications, it can be useful to normalize the factor scores to the real interval [0,1] by means of a nonlinear function, and how these scores can then be aggregated with fuzzy measures so as to reflect the decision-maker’s attitude towards risk. We build upon this work by providing a system to guide the definition and computation of EIs.
3. Flexible system for EI modelling and generation
The proposed system is highly interactive and allows the user to define and tune the EI, depending on the spatial and temporal scale of analysis. Figure 1 gives an overall representation of the system architecture, which is structured into two distinct levels. The lower rectangle, called the active expert’s knowledge base, depicts the building blocks of the expert’s knowledge (often incomplete and vague) that are necessary to define the EI. The upper space represents the main functional components of the system with their input and output data, and the control information coming from the expert’s active knowledge base. The analyst interacts with the system at several stages: working on the input data, producing intermediate output data, and generating the final results, i.e. the EI values, which are generally presented as maps. The two levels communicate through flows of knowledge and control information, represented by thick grey vertical arrows in figure 1. The system applies the user’s knowledge to drive the upper functional modules, thus creating new products which, in turn, can be exploited by the user to improve and refine their knowledge. Several feedback cycles may be necessary to properly tune the final EI model. In particular, the expert can revise the list of contributing factors, and the semantics of both the score functions defining the soft constraints and the fusion criterion.
3.1 Logic elements of the active expert’s knowledge-base
3.1.1 Environmental indicator. This logical element identifies the piece of knowledge that describes the general meaning of the user-defined EI. Specifically, an EI is represented by a quintuple (EI-name, (N, M), SR, TR, geo), where EI-name is a string identifying the EI and (N, M) is a pair of values specifying the dimensions of the output EI maps in spatial units (e.g. pixels). SR and TR are the desired spatial and temporal resolutions of the output maps, respectively; they are conditioned by the resolutions of the available input data. geo is the geo-referencing system of the output EI map. The input spatial data are assumed to share the same resolution and geo-referencing system as the synthetic EI output map. If this is not the case, they have to be converted into grids (raster representation) of dimension N×M and resolutions SR and TR. EI values lie in the range [0,1].
3.1.2 Contributing factors. The expert must identify the factors that are thought to influence the EI. Any factor deemed relevant can be included, even when the information is redundant. For efficiency reasons, redundancy can be limited to reduce the amount of data processing, although the inclusion of redundant factors can be useful to cope with missing data sources, thus improving the robustness of the system.
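The EI quintuple of section 3.1.1 maps naturally onto a small record type. The sketch below is one possible encoding, assuming NumPy arrays for the raster layers; the class name `EISpec`, its field names, and the helper methods are hypothetical, and the example values are placeholders rather than the settings of the case study.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass(frozen=True)
class EISpec:
    """Hypothetical record for the EI quintuple (EI-name, (N, M), SR, TR, geo)."""

    name: str                  # EI-name
    shape: Tuple[int, int]     # (N, M): size of the output map in pixels
    spatial_res: float         # SR, in the units of `geo` (an assumption)
    temporal_res: str          # TR, e.g. "dekad"
    geo: str                   # geo-referencing system, e.g. an EPSG code

    def check_layer(self, layer: np.ndarray) -> np.ndarray:
        """Accept an input layer only if it is already on the N x M EI grid.

        Resampling and reprojection are not shown: a layer on a different
        grid must be converted before it is used.
        """
        if layer.shape != self.shape:
            raise ValueError(f"layer shape {layer.shape} != EI grid {self.shape}")
        return layer

    @staticmethod
    def check_values(ei_map: np.ndarray) -> bool:
        """True if every non-NaN EI value lies in [0, 1]."""
        valid = ei_map[~np.isnan(ei_map)]
        return bool(np.all((valid >= 0.0) & (valid <= 1.0)))


# Placeholder values, not the settings of the case study.
spec = EISpec("anomaly_indicator", (4, 5), 0.01, "dekad", "EPSG:4326")
spec.check_layer(np.zeros((4, 5)))
print(EISpec.check_values(np.array([[0.2, np.nan], [1.0, 0.0]])))
```

Keeping the specification immutable (`frozen=True`) is a design choice of this sketch: every factor map and the final EI map can be validated against the same grid definition.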
Figure 1. Functional components of the flexible system for supporting the definition and application of multi-source spatial data fusion.
The input data often require further spatial or statistical analysis to generate a contributing factor. This analysis can include:
• a temporal synthesis of each spatial unit (e.g. the maximum and minimum temperatures over a given period);
• a spatial synthesis over an area surrounding each spatial unit (e.g. the difference in value of the indicator with respect to the average value within the same land cover class);
• a combination of the previous two.
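The three kinds of analysis listed above can be sketched with NumPy on a co-registered stack of rasters with shape (T, N, M). This is a minimal illustration under stated assumptions: the function names, the choice of the maximum as the temporal statistic, and the toy land-cover map are ours, not the operations used in the case study.

```python
import numpy as np


def temporal_synthesis(stack, stat=np.nanmax):
    """Collapse a (T, N, M) time series into one (N, M) map, pixel by pixel."""
    return stat(stack, axis=0)


def deviation_from_class_mean(layer, land_cover):
    """Spatial synthesis: each pixel minus the mean of its land-cover class."""
    out = np.full(layer.shape, np.nan)
    for cls in np.unique(land_cover):
        mask = land_cover == cls
        out[mask] = layer[mask] - np.nanmean(layer[mask])
    return out


# Toy data: 3 dates on a 2 x 3 grid, and two land-cover classes.
stack = np.array([
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[2.0, 1.0, 4.0], [3.0, 7.0, 5.0]],
    [[0.0, 3.0, 2.0], [5.0, 6.0, 4.0]],
])
land_cover = np.array([[1, 1, 2], [2, 2, 1]])

t_max = temporal_synthesis(stack)                         # temporal synthesis
spatial = deviation_from_class_mean(stack[0], land_cover) # spatial synthesis
combined = deviation_from_class_mean(t_max, land_cover)   # combination of both
print(t_max)
print(combined)
```

Using `np.nanmax` and `np.nanmean` lets no-data pixels (NaN) in individual dates be skipped rather than propagated, which is one way to handle gaps in dekadal series.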
The piece of knowledge defining a set of k contributing factors is represented by k quintuples:
$$(F_i, \mathrm{Dom}_i, D_i, \mathrm{OP}_i, I_i), \qquad i = 1, \ldots, k \tag{1}$$
where $F_i$ is the name of the ith contributing factor; $\mathrm{Dom}_i$ is the domain of the values of $F_i$; $D_i$ is the set of input data from which the factor $F_i$ can be defined; $\mathrm{OP}_i$ is a spatio-temporal operation used to generate, from the input data in $D_i$, the map of dimension (N, M) and resolution SR representing the factor $F_i$; and $I_i$ is the relative importance of the contributing factor $F_i$ in determining the overall EI value. $I_i$ can simply be an index on an ordinal scale {1, …, k}, where k is the number of defined contributing factors, so that the greater the importance (i.e. the influence) of a factor, the smaller its ordinal index $I_i$. The operation $\mathrm{OP}_i$ may be either a simple identity, associating a single data layer with $F_i$, or a complex operation involving several transformations that convert the input data into maps of the desired spatial and temporal resolutions.
3.1.3 Factor score functions. The factor values generated from the available input data do not contribute directly to the computation of the EI value. First, we apply a constraint to the values of the contributing factor; only the factor values that satisfy the constraint are interpreted as partial evidence of the EI, and thus contribute to the overall EI value. However, since the expert’s knowledge is often vague in this respect, we represent the possible influence of a factor $F_i$ by means of soft constraints. Soft constraints, hereafter called factor score functions, are defined as fuzzy subsets of the contributing factor domain $\mathrm{Dom}_i$:
$$\mu_{F_i} : \mathrm{Dom}_i \to [0,1] \tag{2}$$
where $\mu_{F_i}$ is the factor score function of the ith contributing factor. For example, in the case of an indicator of a climatic anomaly, a contributing factor can be defined as the difference $\Delta$ between the current temperature and the long-term average temperature for a time period. A factor score function $\mu_\Delta$ can be defined as shown in figure 2(a). The value of $\mu_\Delta$ tends to zero as $\Delta$ approaches zero, i.e. as the current temperature tends towards the long-term average temperature.
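The quintuple of eq. (1) can be held in a small record whose $\mathrm{OP}_i$ is a callable. The sketch below is a hypothetical encoding (the class `Factor` and its field names are ours), using the temperature difference $\Delta$ from the example above as a non-identity $\mathrm{OP}_i$ and a rainfall layer as an identity $\mathrm{OP}_i$; the domains and values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

import numpy as np

Layers = Dict[str, np.ndarray]


@dataclass
class Factor:
    """Hypothetical record for the quintuple (F_i, Dom_i, D_i, OP_i, I_i) of eq. (1)."""

    name: str                           # F_i
    domain: Tuple[float, float]         # Dom_i, here a closed interval
    inputs: Tuple[str, ...]             # D_i: names of the required input layers
    op: Callable[[Layers], np.ndarray]  # OP_i: builds the factor map from D_i
    importance: int                     # I_i: ordinal index, 1 = most important

    def compute(self, data: Layers) -> np.ndarray:
        """Apply OP_i to the input layers listed in D_i."""
        return self.op({key: data[key] for key in self.inputs})


# Non-identity OP_i: Delta = current temperature - long-term mean.
delta_t = Factor(
    name="delta_temperature",
    domain=(-15.0, 15.0),               # assumed range, degrees C
    inputs=("t_current", "t_longterm_mean"),
    op=lambda d: d["t_current"] - d["t_longterm_mean"],
    importance=1,
)
# Identity OP_i: the factor is a single input layer.
rain = Factor("rain_anomaly", (-500.0, 500.0), ("rain_diff",),
              lambda d: d["rain_diff"], importance=2)

data = {
    "t_current": np.array([[25.0, 31.0]]),
    "t_longterm_mean": np.array([[24.0, 26.0]]),
    "rain_diff": np.array([[-40.0, 10.0]]),
}
factors = sorted([rain, delta_t], key=lambda f: f.importance)
maps = {f.name: f.compute(data) for f in factors}
print([f.name for f in factors])   # ['delta_temperature', 'rain_anomaly']
print(maps["delta_temperature"])   # [[1. 5.]]
```

Sorting by the ordinal index reflects the convention stated above that a smaller $I_i$ means a more important factor.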
Figure 2. Forms of score functions. Completely data-driven (a): the factor score function (black line) derived as the complement of the curve (dotted line) interpolating the histogram (circle markers). Partially data-driven (b): the shape of the function is chosen by the analyst and the threshold value is derived from the analysis of data (e.g. $K_1$ could be the maximum historical value of the contributing factor).
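The two forms in figure 2 can be approximated in a few lines of NumPy. In this sketch, the ‘interpolation’ of the histogram in form (a) is a linear interpolation of the bin frequencies, rescaled so that the most frequent bin has normality 1; form (b) uses a linear ramp in the absolute value of the factor that reaches 1 at a threshold $K_1$ taken as the maximum historical absolute value. The rescaling, the ramp, the bin count, and the synthetic history are assumptions made for illustration.

```python
import numpy as np


def data_driven_score(history, bins=20):
    """Form (a): score = 1 - normality, with normality estimated from a histogram."""
    counts, edges = np.histogram(history, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    normality = counts / counts.max()          # most frequent values -> 1

    def score(x):
        # Linear interpolation of the normality curve; values outside the
        # historical range get normality 0, i.e. score 1 (fully anomalous).
        nrm = np.interp(x, centres, normality, left=0.0, right=0.0)
        return 1.0 - nrm
    return score


def partially_data_driven_score(history):
    """Form (b): linear ramp in |x| that saturates at K1 = max historical |x|."""
    k1 = np.max(np.abs(history))

    def score(x):
        return np.clip(np.abs(x) / k1, 0.0, 1.0)
    return score


rng = np.random.default_rng(0)
history = rng.normal(0.0, 1.0, size=5000)      # synthetic factor history, e.g. Delta

mu_a = data_driven_score(history)
mu_b = partially_data_driven_score(history)
x = np.array([0.0, 1.5, 10.0])
print(mu_a(x))   # low near 0, 1 far outside the historical range
print(mu_b(x))
```

In both forms the score is close to zero when the factor is close to its usual values and grows towards 1 as the factor departs from them, which is the behaviour described for $\mu_\Delta$.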
There are several advantages of using soft constraints:
• Soft constraints are a means to represent the vagueness of the expert’s knowledge when generating an EI.
• They admit degrees of satisfaction in the continuous [0,1] range, which is a more adequate representation of reality, since transitions from one environmental status to another are rarely abrupt but rather characterized by gradual changes (Robinson 2003).
• When fusing multiple factors, it is necessary to normalize them to the same domain so as to achieve consistency and comparability. Since the contributing factors have distinct domains, computing the degrees of satisfaction of the respective soft constraints in the [0,1] range also serves the purpose of standardization (Jiang and Eastman 2000).
• Nonlinear scaling of the factor values is possible (Malczewski 2006), since a soft constraint can be defined with any membership function that is deemed meaningful for the application.
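As an example of nonlinear scaling into [0,1], the sketch below uses a monotonically increasing sigmoidal (sine-squared) membership function with two control points a and b; the choice of this particular curve is ours. It shows that two hypothetical factors with very different domains become directly comparable once scored, and contrasts the gradual soft constraint with an abrupt crisp threshold.

```python
import numpy as np


def sigmoidal_increasing(x, a, b):
    """S-shaped score: 0 for x <= a, 1 for x >= b, sin^2 ramp in between."""
    t = np.clip((np.asarray(x, dtype=float) - a) / (b - a), 0.0, 1.0)
    return np.sin(t * np.pi / 2.0) ** 2


def crisp(x, threshold):
    """Boolean constraint, for comparison: 1 where x >= threshold, else 0."""
    return (np.asarray(x, dtype=float) >= threshold).astype(float)


# Two hypothetical factors with very different domains.
rain_deficit_mm = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
ndvi_drop = np.array([0.00, 0.05, 0.10, 0.15, 0.20])

p_rain = sigmoidal_increasing(rain_deficit_mm, a=0.0, b=200.0)
p_ndvi = sigmoidal_increasing(ndvi_drop, a=0.0, b=0.2)
print(np.round(p_rain, 3))            # approx. 0, 0.146, 0.5, 0.854, 1
print(np.allclose(p_rain, p_ndvi))    # both factors now share the [0, 1] scale
print(crisp(rain_deficit_mm, 100.0))  # [0. 0. 1. 1. 1.]
```

The crisp constraint jumps from 0 to 1 at the threshold, whereas the soft constraint assigns intermediate degrees of satisfaction to intermediate deficits.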
3.1.4 Fusion criterion. The two most commonly applied procedures for aggregating standardized factors in a GIS are Boolean combination and weighted averaging. These approaches suffer from several drawbacks: for example, they are too rigid, and they do not allow a choice of the trade-off level among the criteria, where the trade-off level qualifies the decision risk implicit in the fusion strategy (Jiang and Eastman 2000). The analyst’s attitude in fusing contributing factor scores is seldom founded on quantitative criteria (especially at the continental or global level), and it entails a specific decision risk (Yager 2004). For this reason, the proposed system makes it possible to implement fusion strategies characterized by distinct decision risks, without the need to state precisely the relationships between the contributing factors and the modelled EI. Rather, the fusion strategy can be defined to resemble the expert’s decision attitude, based on incomplete knowledge of the phenomenon, thus implementing the mutual reinforcement of partial, complementary, and redundant pieces of evidence. The fusion operation is conceived as a quantifier-guided aggregation that computes an overall synthetic EI value by evaluating a fuzzy majority of the distinct contributing factor scores, which can be weighted by their importance. A fuzzy majority can be expressed by a relative monotone non-decreasing linguistic quantifier Q, such as ‘most of’, defined as a fuzzy set:
$$Q : [0,1] \to [0,1] \tag{3}$$
Experts confident in their knowledge of the phenomenon are likely to select a crisp majority of contributing factors, for example by specifying the minimum proportion of factors needed to establish evidence for the EI. Conversely, experts wishing to take into account the vagueness of their knowledge can specify a fuzzy majority so as to compute degrees of satisfaction of the EI. The use of relative quantifiers Q makes it possible to implement a robust fusion strategy, independent of the actual number of available factors (Yager 1992, 1994). The corresponding fusion function is implemented by an OWA operator that is automatically and dynamically generated from the definition of the relative linguistic quantifier, taking into account the available factors.
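The dynamic generation of the OWA weights from a relative quantifier can be sketched with Yager’s (1988) construction, $w_j = Q(j/k) - Q((j-1)/k)$, where k is the number of factor scores actually available. The piecewise-linear form of ‘most of’ with parameters a = 0.3 and b = 0.8 is a common textbook choice (cf. Kacprzyk 1986) and is an assumption here, as is the crisp ‘at least 60%’ quantifier used for comparison.

```python
import numpy as np


def linear_quantifier(a, b):
    """Relative monotone non-decreasing quantifier: 0 up to a, 1 from b, linear between."""
    def q(r):
        return float(np.clip((r - a) / (b - a), 0.0, 1.0))
    return q


def crisp_at_least(fraction):
    """Crisp quantifier 'at least <fraction>': jumps from 0 to 1 at that proportion."""
    def q(r):
        return 1.0 if r >= fraction - 1e-12 else 0.0   # tolerance for float ratios
    return q


def owa_weights(q, k):
    """Yager's construction: w_j = Q(j/k) - Q((j-1)/k), j = 1..k."""
    return np.array([q(j / k) - q((j - 1) / k) for j in range(1, k + 1)])


most_of = linear_quantifier(0.3, 0.8)     # assumed parameters for 'most of'
for k in (5, 4):                          # five factors, then one missing
    w = owa_weights(most_of, k)
    print(k, np.round(w, 3), np.isclose(w.sum(), 1.0))

print(owa_weights(crisp_at_least(0.6), 5))
```

With k = 5 the weights are (0, 0.2, 0.4, 0.4, 0); with k = 4 they become (0, 0.4, 0.5, 0.1), so the same criterion ‘most of’ is applied unchanged to whichever factors are available. The crisp ‘at least 60%’ quantifier puts all the weight on the third-ranked of five scores, i.e. the EI equals the degree to which at least three factors are satisfied.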
3.2 Functional modules of the flexible system
3.2.1 Contributing factors generation module. This module implements a highly interactive operating environment that allows the user to create contributing factors according to the definitions in the knowledge base. In the simplest case, a contributing factor is simply associated with the values of one input data layer (when the operation $\mathrm{OP}_i$ is an identity function). If $\mathrm{OP}_i$ corresponds to several spatio-temporal operations, this module offers data-processing facilities to perform them easily. It also allows the user to specify the relative importance $I_i$ of the selected factor on an ordinal scale. While it is often difficult to specify importance values as numbers, it is much more natural to specify them either in linguistic form, using ordered qualifiers (for example very important, important, fairly important, not important), or with the aid of a graphic representation, such as moving a cursor on a bar or selecting a grey level on a scale. These ordinal values are automatically converted into degrees of importance $d_i \in [0,1]$. Notice that the importance degrees are not required to sum to 1. In some cases, the degrees of importance can be directly computed from statistical or multivariate analysis.
3.2.2 Score functions definition module. This module supports the definition of the soft constraints, i.e. the factor score functions on the domains $\mathrm{Dom}_i$ of the contributing factors $F_i$. To this end, it offers some GIS functionality and tools for statistical analysis. The system has three options for defining the soft constraints according to the expert’s knowledge (Robinson 2003): (a) completely data-driven, (b) partially data-driven, and (c) user-driven. In case (a), the factor score functions are defined based on the statistical analysis of a set of values of the contributing factor.
For example, if the contributing factor is the difference between the current temperature and its long-term average, its frequency histogram describes how the factor behaves on average, and hence which range of values can be considered normal, i.e. not anomalous. The interpolation of the histogram provides the normality function (the dotted curve in figure 2(a)), where values generating scores close to 1 may be considered fully normal, while values close to 0 indicate an anomaly. The complement of this function (black curve in figure 2(a)) quantifies the observed phenomenon’s deviation from normality. In the case where a priori or expert’s knowledge is available to the analyst, the data-driven score function can be integrated/modified by introducing a simple model (partially data-driven approach). An example of this class of factor score functions is represented in figure 2(b), where the relationship between data and evidence supporting the analysis of the observed phenomenon is described by a simple linear function, while analysis of data is used to support the definition of the threshold. In these cases, the membership function shape is suggested by the analyst, while the function parameters are derived from the analysis of historical data. Finally, the user-driven factor score functions are completely defined based on information available to the analyst (heuristic knowledge). This is the case in which there is an accepted model that quantifies the influence of the contributing factor to the phenomenon under observation. This is, however, a very rare condition in the assessment of EIs at continental/global scale. Though heuristic knowledge could greatly improve the accuracy of the results, the proposed methodology does not rely on it. That is, if no information is available or
cannot be easily formalized by the analyst, then a purely statistical analysis can be performed. However, any previous studies and experience that provide models and values can easily be included in the methodology by defining crisp functions taking values in the set {0,1}.

3.2.3 Factor score computation module. This module evaluates the soft constraints $\mu_{F_i}$ on each value of the contributing factor map generated by the 'Contributing factors generation module'. The results are the factor scores $p_i \in [0,1]$, i.e. the degrees of satisfaction of the soft constraints $\mu_{F_i}$, evaluated according to the following expression:
$$p_i(m, n) = \mu_{F_i}\big(F_i(m, n)\big) \tag{4}$$
where (m, n) are the coordinates of a pixel in both maps. Both this module and the following one provide facilities for generating the spatio-temporal syntheses of results that help improve the experts' analysis.

3.2.4 Soft fusion module. The soft fusion module aggregates the factor scores $p_i$, possibly weighted by their numeric importance degrees $d_i$, by applying the fusion function defined in the 'Soft fusion criterion definition module'. The fusion function, represented by a monotone non-decreasing relative quantifier Q, is implemented through an OWA operator (Yager 1988). An OWA operator of dimension k is a function $\mathrm{OWA}: [0,1]^k \rightarrow [0,1]$ with an associated weighting vector $W = (w_1, \ldots, w_k)$, such that $w_i \in [0,1]$ and $\sum_{i=1}^{k} w_i = 1$, which aggregates a set of factor scores $\{p_1, \ldots, p_k\}$ according to the following expression:

$$\mathrm{OWA}(p_1, \ldots, p_k) = \sum_{j=1}^{k} w_j \, p_{\sigma(j)} \tag{5}$$
where $\sigma: \{1, \ldots, k\} \rightarrow \{1, \ldots, k\}$ is a permutation such that $p_{\sigma(j)} \geq p_{\sigma(j+1)}$, $\forall j = 1, \ldots, k-1$, i.e. $p_{\sigma(j)}$ is the jth highest value in the set $\{p_1, \ldots, p_k\}$. Through the permutation σ, the elements $\{p_1, \ldots, p_k\}$ are sorted in decreasing order, and each weight $w_j$ of the OWA is associated with an ordered position. Notice the different semantics of the importance degrees $d_1, \ldots, d_k$ and of the weighting vector $W = (w_1, \ldots, w_k)$. Each $d_j$ is manually specified by the expert and is uniquely associated with the jth factor score, whereas W is derived automatically from the quantifier Q: weight $w_j$ is associated not with the jth factor score but with the jth element in the list of factor scores ranked from the greatest to the smallest. Hence $w_j$ is associated with a different factor score depending on the values of all the factor scores being aggregated. Different aggregation semantics can be modelled by changing the weighting vector W. For example, an $\mathrm{OWA}^*$ with $W^* = (1, 0, \ldots, 0)$ selects the maximum of the factor scores, while with $W^* = (0, \ldots, 0, 1)$ it selects the minimum. The average is modelled by an $\mathrm{OWA}_{ave}$ with $W_{ave} = (1/k, 1/k, \ldots, 1/k)$.

3.2.5 Soft fusion criterion definition module. This module allows the fusion criterion to be defined as a relative quantifier-guided aggregation function. The user can either select a predefined linguistic quantifier, such as most of or at least x%, from an available list, or define a quantifier by specifying a membership function $Q: [0,1] \rightarrow [0,1]$ (Yager 1996). To this end, a parametric Q function is used as a prototype from which the desired quantifier can easily be derived (figure 3).
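The OWA operator of equation (5) can be sketched in a few lines, which also makes explicit that its weights belong to rank positions rather than to particular factors. This is a minimal illustration in Python, not the authors' implementation; the function name `owa` and the sample scores are hypothetical. It reproduces the three special cases of section 3.2.4: maximum, minimum, and average.

```python
from typing import Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging, equation (5).

    weights[0] multiplies the largest score and weights[-1] the smallest,
    whatever factor those scores come from.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same dimension k")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # The permutation sigma: scores sorted in decreasing order.
    ordered = sorted(scores, reverse=True)
    return sum(w * p for w, p in zip(weights, ordered))


if __name__ == "__main__":
    p = [0.2, 0.9, 0.5, 0.4]  # hypothetical factor scores p_1..p_4
    k = len(p)
    print(owa(p, [1.0] + [0.0] * (k - 1)))  # W* = (1, 0, ..., 0): maximum, 0.9
    print(owa(p, [0.0] * (k - 1) + [1.0]))  # W* = (0, ..., 0, 1): minimum, 0.2
    print(owa(p, [1.0 / k] * k))            # W_ave: arithmetic mean
```

Sorting once and zipping with the weights is exactly the role of σ: the same weight vector applied to a different set of scores can end up weighting a different factor.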
Figure 3. Semantics of a relative monotone non-decreasing quantifier Q.
The distance between a and b in figure 3 describes the gap in satisfaction from having one fewer non-null factor score, while the size of the range of values greater than b is related to the percentage of non-null factor scores that contribute to the final score. If c = 1, the function between a and b is linear (a pure average), while if c > 1 the function is concave, and if c < 1 it is convex. The values of the k-dimensional $\mathrm{OWA}_Q$ weighting vector $W_Q$, which correspond to a fuzzy majority, can be computed directly from the definition of the relative quantifier Q that expresses the semantics of the fuzzy majority, as follows:

$$w_i = Q\left(\frac{i}{k}\right) - Q\left(\frac{i-1}{k}\right), \quad i = 1, \ldots, k \tag{6}$$

Each $w_i$ represents the increment in satisfaction from having i non-null contributing scores out of k with respect to having only i − 1. Since Q is a relative quantifier, it can be used to compute the corresponding $W_Q$ for any desired dimension k. This dynamic computation of the OWA weighting vector W copes with potentially missing data and thus makes the soft fusion module robust. When different numeric importance degrees $d_1, \ldots, d_k$ are associated with the contributing factors, the weighting vector $W_Q$ is computed as follows:

$$w_i = Q\left(\frac{1}{e}\sum_{j=1}^{i} e_j\right) - Q\left(\frac{1}{e}\sum_{j=1}^{i-1} e_j\right) \tag{7}$$

in which $e_j$ is the importance degree of the factor with the jth largest of the k factor scores, and e is defined as:

$$e = \sum_{j=1}^{k} e_j = \sum_{i=1}^{k} d_i \tag{8}$$

In this way, the increment in satisfaction from having j non-null factor scores with respect to j − 1 increases with the importance $e_j$, and factors with zero importance play no role. The output of this module constitutes the input of the 'Soft fusion module', which applies the aggregation defined by $\mathrm{OWA}_Q$ to the factor scores. When the degrees $d_i$ are normalized to sum to 1, equation (7) simplifies to:

$$w_i = Q\left(\sum_{j=1}^{i} e_j\right) - Q\left(\sum_{j=1}^{i-1} e_j\right) \tag{9}$$
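Equations (6) to (8) can be sketched as follows. Because figure 3 is not reproduced in the text, the parametric form of Q used here (0 below a, 1 above b, and ((x − a)/(b − a))^c in between) is an assumption, chosen only to be consistent with the description of a, b, and c; the function names are hypothetical. Sorting the (score, importance) pairs together implements the rule that $e_j$ is the importance of the factor ranked jth by score.

```python
from typing import List, Sequence


def quantifier(x: float, a: float, b: float, c: float) -> float:
    """Hypothetical parametric relative quantifier Q: [0, 1] -> [0, 1].

    ASSUMED shape: 0 for x <= a, 1 for x >= b, ((x - a) / (b - a)) ** c between.
    With a = 0 and b = c = 1 this is Q(x) = x.
    """
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return ((x - a) / (b - a)) ** c


def owa_weights(k: int, a: float, b: float, c: float) -> List[float]:
    """Equation (6): w_i = Q(i/k) - Q((i-1)/k), for i = 1..k."""
    return [quantifier(i / k, a, b, c) - quantifier((i - 1) / k, a, b, c)
            for i in range(1, k + 1)]


def owa_q(scores: Sequence[float], importance: Sequence[float],
          a: float, b: float, c: float) -> float:
    """Importance-weighted OWA_Q using the weights of equations (7) and (8)."""
    e = sum(importance)  # equation (8)
    if e <= 0:
        raise ValueError("at least one factor needs a positive importance degree")
    # Rank (score, importance) pairs by decreasing score, so the importance
    # travels with its factor: e_j is the importance of the jth largest score.
    ranked = sorted(zip(scores, importance), key=lambda pair: pair[0], reverse=True)
    total, cumulative = 0.0, 0.0
    for p, e_j in ranked:
        previous = cumulative
        cumulative += e_j
        w = quantifier(cumulative / e, a, b, c) - quantifier(previous / e, a, b, c)
        total += w * p
    return total


if __name__ == "__main__":
    print(owa_weights(4, a=0.0, b=1.0, c=1.0))  # equal weights: a plain average
    print(owa_weights(4, a=0.0, b=1.0, c=4.0))  # weights grow toward low-ranked scores
    # Hypothetical scores and importance degrees (they need not sum to 1).
    print(owa_q([0.2, 0.9, 0.5, 0.4], [1.0, 2.0, 1.0, 0.0], a=0.0, b=1.0, c=1.0))
```

Because the weights telescope, they sum to Q(1) − Q(0) = 1 (up to rounding) whenever a ≥ 0 and b ≤ 1, and a factor with zero importance always receives zero weight, as stated for equation (7).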
4. Case study: An anomaly indicator for Africa
The approach described in the previous sections has been applied to define and compute an Anomaly Indicator (AI) of the environmental status of the whole of Africa. The indicator aims to highlight anomalous conditions and changes in the vegetation component of African ecosystems at continental scale. Although the term is commonly used with a negative meaning, an anomaly (i.e. a change) does not necessarily have a negative connotation (Lambin et al. 2003).
4.1 Description of the available data
The spatial data available for this application are vegetation phenology and rainfall. Vegetation phenology is described by the date of the start of the season, the peak of the season (represented by the maximum value of the Normalized Difference Vegetation Index, NDVI), and the length of greenness (number of dekads). It is derived from the analysis of the time series of NOAA Advanced Very High Resolution Radiometer (AVHRR) Global Area Coverage (GAC) satellite images. Specifically, the dataset is composed of 10-day NDVI syntheses at 8 km spatial resolution for the period 1990–2002 over the African continent. The continental 10-day rainfall estimates (mm) of the Famine Early Warning System Network (FEWSNET) Meteosat Rainfall Estimation (RFE) dataset were used to derive 30-day cumulative estimates for each dekad between 1996 and 2002, at 8 km spatial resolution. The output AI maps have the same spatial resolution (SR = 8 km) and a temporal resolution TR = 1 month. All the data were geo-referenced to a common system (geographic projection, datum WGS84).

4.2 Contributing factors defined for the Anomaly Indicator
The input data were processed to generate four contributing factors. Each factor corresponds to an available dataset and is labelled after the associated input layer, i.e. F1 = start of greenness, F2 = peak of greenness, F3 = length of greenness, and F4 = rainfall. Since we are interested in detecting anomalous behaviour, each contributing factor was derived from the corresponding input data layer as the difference between the current value and the long-term average (LTA), which was calculated on the available historical datasets (1996–2002). Therefore, the generation operator $OP_i$, applied pixel by pixel, was formalized as follows:

$$OP_i(m, n) = \Delta F_i^k(m, n) = F_i^k(m, n) - \mathrm{LTA}_{F_i}^k(m, n) \tag{10}$$
where $\Delta F_i^k(m, n)$ is the delta (i.e. anomaly) of factor $F_i$ in pixel (m, n) for dekad $t_k$; $F_i^k(m, n)$ is the value of factor $F_i$ in pixel (m, n) for dekad $t_k$; and $\mathrm{LTA}_{F_i}^k(m, n)$ is the long-term average of factor $F_i$ in pixel (m, n) for dekad $t_k$. The resulting values are then accumulated over 30 days.

4.3 Factor score functions definition
The soft constraints used to evaluate the factor scores were derived with a completely data-driven approach for factors F1, F3, and F4, and with a partially data-driven approach for F2. Since ecosystems are characterized by specific meteorological and seasonal behaviour, the Global Land Cover 2000 (GLC2000) map (Bartholomé and Belward 2005) was used as a stratification criterion to characterize the behaviour of each factor for each land-cover class.
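The completely data-driven score functions of section 3.2.2, applied to the deltas of equation (10) and stratified by land-cover class as described above, can be sketched as follows. This is a simplification under stated assumptions, not the authors' code: the Gaussian is fitted with the sample mean and standard deviation of the historical deltas (a moment fit) instead of being fitted to the frequency histogram, it is scaled to a peak of 1, and all names and sample values are hypothetical. The `only_negative` flag mimics the partially data-driven rule adopted for F2 in equations (11) and (12).

```python
import math
import statistics
from typing import Dict, List, Tuple


def delta(current: float, history: List[float]) -> float:
    """Equation (10): current value minus the long-term average (LTA)."""
    return current - statistics.fmean(history)


def fit_normality(deltas: List[float]) -> Tuple[float, float]:
    """Gaussian 'normal behaviour' for one land-cover class and dekad.

    Simplification (assumption): moment fit with the sample mean and
    standard deviation instead of a fit to the frequency histogram.
    """
    mu = statistics.fmean(deltas)
    sigma = statistics.stdev(deltas)  # needs at least two values
    if sigma == 0.0:
        raise ValueError("all historical deltas are equal; cannot fit a Gaussian")
    return mu, sigma


def anomaly_score(d: float, mu: float, sigma: float,
                  only_negative: bool = False) -> float:
    """Complement to 1 of a unit-peak Gaussian normality function.

    only_negative=True treats non-negative deltas as normal (score 0),
    mimicking equations (11) and (12) for the peak of greenness F2.
    """
    if only_negative and d >= 0.0:
        return 0.0
    normality = math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2))
    return 1.0 - normality


if __name__ == "__main__":
    # Hypothetical historical deltas of one factor for one dekad, per class.
    history: Dict[str, List[float]] = {
        "GLC_1": [-3.0, -1.5, 0.0, 1.0, 2.5, 4.0],   # wide spread
        "GLC_13": [-0.6, -0.3, 0.0, 0.2, 0.4, 0.5],  # more homogeneous
    }
    d = delta(12.0, [10.0, 11.0, 12.0])  # current value vs. its LTA
    for land_cover, values in history.items():
        mu, sigma = fit_normality(values)
        # The same delta scores higher in the more homogeneous class.
        print(land_cover, round(anomaly_score(d, mu, sigma), 3))
```

The demo mirrors the qualitative point made with figure 4: a narrower class distribution turns the same absolute change into a stronger anomaly.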
For each land-cover class and dekad, the score functions defining the soft constraints of F1, F3, and F4 were derived as the complement to 1 of the Gaussian function that best fits the frequency histogram of the delta values computed from the historical dataset (figure 2(a)). The best Gaussian fit represents normal behaviour, under the assumption that the most frequent cases are the normal ones. The anomaly scores derived from these score functions take values in [0,1]. Figure 4 shows an example of the results for factor F1 (i.e. start of greenness) in a specific dekad for two GLC2000 land-cover classes. The example shows that the grassland/savanna class (GLC_13) is characterized by a more homogeneous start of greenness than evergreen forest (GLC_1), so small changes in the date of the start of greenness in GLC_13 will probably receive a higher anomaly score. The score function of F2 (peak of greenness) was interpreted differently, according to a consolidated conceptual model in which NDVI peak values higher than the LTA are not anomalous; the partially data-driven function is therefore formalized as follows:

$$A_{F_2}^k(m, n) = 1 - \text{Gaussian fit} \quad \text{for } \Delta F_2^k(m, n) < 0 \tag{11}$$

$$A_{F_2}^k(m, n) = 0 \quad \text{for } \Delta F_2^k(m, n) \geq 0 \tag{12}$$
where $A_{F_2}^k(m, n)$ is the anomaly score of factor F2 in pixel (m, n) for dekad $t_k$.

4.4 Soft fusion of contributing factors
The anomaly scores of the four contributing factors were computed for each dekad of the period 1996–2002 in the 'Factor score computation module' and averaged to obtain monthly values. The synthetic AI was computed by aggregating all the monthly factor anomaly scores with an $\mathrm{OWA}_Q$ operator. This $\mathrm{OWA}_Q$ operator is a weighted average of the four monthly factor scores, with the function Q defined by the parameters a = 0 and b = c = 1 (figure 3). This function defines a quantifier-guided fusion reflecting a neutral attitude. Since it is not possible to state the relative importance of the factors at continental scale, we decided to set the importance degrees according to the different levels of uncertainty associated with each contributing factor, quantified by the Relative Mean Squared Error (RMSE): the higher the uncertainty, the lower the importance of the factor in the synthetic indicator. The corresponding weighting vector $W_Q$ was computed according to equation (1), where $d_i = 1/\mathrm{RMSE}_i$, for i = 1, …, 4, is the importance value of the ith factor. These importance values, normalized to sum to 1, are 0.22, 0.37, 0.30, and 0.11 for F1, F2, F3, and F4, respectively.

Figure 4. Frequency distribution of factor F1 (i.e. start of greenness delta) for two GLC2000 classes (a): the grey curve represents the behaviour of evergreen forest (GLC_1), shown as grey pixels in map (b), while the black curve corresponds to the behaviour of grassland/savanna (GLC_13) (black pixels in map (b)).

Figure 5 shows the monthly AI maps for 1998. The colour scale represents increasing anomaly from green (no anomaly) through yellow (somewhat anomalous) to red (strong anomaly), whereas white areas are oceans, inland water bodies, and deserts. The AI maps highlight high inter-annual variability, and anomalous patterns can be clearly observed. In particular, strong anomalies (orange and red areas) were detected for the entire year in Eastern Africa (Sudan–Ethiopia–Kenya), and in southern Africa (Namibia) after June. Some areas show somewhat anomalous conditions for the whole year, such as central Africa (Congo) and the sub-Saharan regions (Nigeria–Chad); the latter also show stronger anomalies at the end of the year. Nicholson and Entekhabi (1986) state that El Niño can be considered the dominant perturbation responsible for inter-annual climate variability over southern and eastern Africa. The 1997–1998 El Niño event is recognized as the strongest on record, and the anomalies identified in our AI maps agree with the conditions experienced in that period. Validating the maps against independent datasets is difficult, owing to both the continental scale of the analysis and the synthetic character of the AI. Nevertheless, the agreement with documented climatic anomalies related to El Niño events, which are widely accepted to affect vegetation (Anyamba et al. 2002), supports the reliability of our AI results.
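A short derivation shows why this $\mathrm{OWA}_Q$ reduces to a weighted average. It assumes, following the description of figure 3, that the neutral parameters a = 0 and b = c = 1 give the linear quantifier Q(x) = x; since the four importance values sum to 1, equation (9) applies. Here $p_i$ denotes the monthly anomaly score of factor $F_i$:

```latex
\begin{aligned}
Q(x) &= x, \qquad e = \sum_{i=1}^{4} d_i = 0.22 + 0.37 + 0.30 + 0.11 = 1,\\
w_i &= Q\Big(\sum_{j=1}^{i} e_j\Big) - Q\Big(\sum_{j=1}^{i-1} e_j\Big)
     = \sum_{j=1}^{i} e_j - \sum_{j=1}^{i-1} e_j = e_i,\\
\mathrm{AI} = \mathrm{OWA}_Q(p_1, \ldots, p_4)
  &= \sum_{i=1}^{4} e_i \, p_{\sigma(i)} = \sum_{i=1}^{4} d_i \, p_i
   = 0.22\,p_1 + 0.37\,p_2 + 0.30\,p_3 + 0.11\,p_4 .
\end{aligned}
```

The ranking σ drops out because each $e_i$ travels with the score it multiplies. For non-linear quantifiers (c ≠ 1) the weights depend on the rank, so in general the $\mathrm{OWA}_Q$ no longer equals this weighted average.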
Figure 5. Monthly maps of the Anomaly Indicator for Africa for the year 1998.
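Section 3.2.5 states that computing the weighting vector on the fly lets the fusion cope with missing data. The sketch below illustrates this with the case-study importance degrees and the neutral quantifier: when a factor is absent, e in equation (8) is simply re-summed over the factors that remain. It is a minimal Python illustration, not the system's code; the factor scores are hypothetical, and Q(x) = x is assumed for the neutral parameters a = 0, b = c = 1.

```python
from typing import Dict, List, Tuple

# Normalized importance degrees reported for the case study (section 4.4).
IMPORTANCE: Dict[str, float] = {"F1": 0.22, "F2": 0.37, "F3": 0.30, "F4": 0.11}


def neutral_q(x: float) -> float:
    """Assumed neutral quantifier Q(x) = x, clamped to [0, 1] against rounding."""
    return min(max(x, 0.0), 1.0)


def weights_for(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Equation (7) weights, recomputed for whichever factors are present."""
    e = sum(IMPORTANCE[name] for name in scores)  # equation (8), re-summed
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    weights: List[Tuple[str, float]] = []
    cumulative = 0.0
    for name in ranked:
        previous = cumulative
        cumulative += IMPORTANCE[name]
        weights.append((name, neutral_q(cumulative / e) - neutral_q(previous / e)))
    return weights


def anomaly_indicator(scores: Dict[str, float]) -> float:
    """Importance-weighted OWA_Q of the available factor scores."""
    return sum(w * scores[name] for name, w in weights_for(scores))


if __name__ == "__main__":
    # Hypothetical monthly anomaly scores of one pixel.
    four = {"F1": 0.7, "F2": 0.9, "F3": 0.6, "F4": 0.8}
    three = {name: p for name, p in four.items() if name != "F2"}  # F2 missing
    print(round(anomaly_indicator(four), 3))   # all four factors
    print(round(anomaly_indicator(three), 3))  # F2 dropped; nothing else changes
```

Neither the score functions nor the remaining importance degrees need to be edited when a factor is dropped; only the normalization term e changes.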
4.5 Further experiments with different fusion strategies and with missing data
To test the flexibility and robustness of the system, we carried out two experiments to analyse the effect on the AI of other aggregation criteria, by using different quantifiers, and of missing data, by excluding one of the input factors. These experiments were performed on a region of interest (ROI) of 500 pixels located in Eastern Africa (at the border between Kenya, Ethiopia, and Somalia). Figure 6 shows the results of the experiments. In the first experiment, all four contributing factors F1, F2, F3, and F4 were considered. Figure 6(a) depicts the temporal trends of the anomaly scores of each of the four factors. These were then aggregated using three different criteria (representing cautious, neutral, and alarming attitudes) by changing the parameters of the Q function. The resulting AI temporal trends are shown in figure 6(b) (the three curves labelled 4F). The upper curve in the legend (parameters a = 0, b = c = 1) corresponds to a neutral attitude; the second (a = c = 0.25, b = 0.75) and the third (a = 0, b = 1, c = 4) represent an alarming and a cautious attitude, respectively. As expected, when all the factors are used with an alarming fusion criterion, the AI values are generally higher than with the neutral attitude, and always higher than with the cautious fusion. In general, the neutral approach requires consensual evidence from all the factors in order to produce high AI scores. The AI trend produced by the cautious fusion criterion was strongly influenced by the F2 profile, owing to the high importance of this factor. It is interesting to observe that high anomalous values (around 0.8) are indicated by all approaches for the year 1998. This supports the earlier finding that anomalous conditions were experienced in that period, whatever attitude is adopted. These results also suggest that comparing AI maps generated with distinct soft fusion criteria may be useful to identify areas with stable AI values, thus reinforcing the accumulation-of-evidence approach. Figure 6(b) also shows the temporal trend of the AI in the second experiment, where we excluded the second factor F2 (peak of greenness) and aggregated the remaining factors with a neutral attitude (the curve labelled 3F in the legend). No modification of the score functions was necessary, and the importance degrees were automatically re-normalized by the system. The results show that the system produces an output whose trend is similar to that of the neutral 4F output; in particular, it still highlights the strong anomaly that occurred in 1998.

Figure 6. Anomaly score profiles for the available factors (a), and resulting AI trends (b) obtained from the four factors (4F) with three different soft fusion criteria (neutral, alarming, and cautious) and from three factors (3F) with a neutral criterion, extracted from Eastern Africa for the period 1996–2002. The year 1998, whose maps are shown in figure 5, is highlighted by vertical dotted lines.

5. Conclusion
This paper proposes the architecture of a flexible system that implements a soft approach to modelling EIs at continental and global scale and produces indicator maps. The design of this system addressed the following requirements for generating environmental indicator maps:
• the need to cope with the lack of established continent-wide models of the indicators;
• the scarcity and poor quality of datasets (which must also be based on EO records); and
• the need to encourage experts' active participation in modelling the indicators.
These requirements led to the definition of a flexible system that is capable of coping with uncertainty and imprecision. The proposed system has been tested by generating an anomaly indicator for Africa. The soft fusion model defined to create the AI maps proved easy to implement and required no in-depth background information or knowledge beyond the usual expertise of climate and environment analysts. The results were consistent with known climatological events on the continent. The proposed system was therefore able to produce results at continental scale without relying on previously established models. It allowed users to participate in the choice of the contributing factors, the assignment of their relative importance, and the definition of the functions used to evaluate factor scores in different ways (either totally or partially driven by data, or based on existing models). The flexibility and robustness of the system were demonstrated by implementing different fusion strategies and by omitting input factors while still achieving plausible and consistent results.

Acknowledgements

This work has been carried out within the Observatory for Land cover and Forest change (OLF) of the GeoLand project (2004–2006). GeoLand (http://www.gmesgeoland.info) is an Integrated Project of the European Union 6th Framework Programme focusing on the GMES (Global Monitoring for Environment and Security) priorities 'Land cover change & environmental stress in Europe' and 'Global vegetation monitoring'. Particular thanks go to Bruno Combal (JRC-EC), who provided time series of vegetation phenology data within the GeoLand-OLF activities. The authors also wish to thank the anonymous referees for their helpful comments.
References

ANYAMBA, A., TUCKER, C.J. and MAHONEY, R., 2002, From El Niño to La Niña: Vegetation response pattern over East and Southern Africa during the 1997–2000 period. Journal of Climate, 15, pp. 3096–3103.
BARTHOLOMÉ, E. and BELWARD, A., 2005, GLC2000: a new approach to global land cover mapping from Earth observation data. International Journal of Remote Sensing, 26, pp. 1959–1977.
BLOCH, I. and MAÎTRE, H., 1996, Information combination operators for data fusion: A comparative review with classification. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 26, pp. 52–67.
BONE, C., DRAGICEVIC, S. and ROBERTS, A., 2005, Integrating high resolution remote sensing, GIS and fuzzy set theory for identifying susceptibility areas of forest insect infestations. International Journal of Remote Sensing, 26, pp. 4809–4828.
BURROUGH, P.A. and MCDONNEL, R.A., 1998, Principles of Geographical Information Systems (Oxford: Oxford University Press).
CHANUSSOT, J., MAURIS, G. and LAMBERT, P., 1999, Fuzzy fusion techniques for linear features detection in multitemporal SAR images. IEEE Transactions on Geoscience and Remote Sensing, 37, pp. 1292–1305.
CHEN, H. and MEER, P., 2005, Robust fusion of uncertain information. IEEE Transactions on Systems, Man and Cybernetics, Part B, 35, pp. 578–586.
CORGNE, S., HUBERT-MOY, L., DEZERT, J. and MERCIER, G., 2003, Land cover change prediction with a new theory of plausible and paradoxical reasoning. In Proceedings of the 6th International Conference of Information Fusion, 2003. Fusion 2003 Conference, 8–11 July 2003, Cairns, Australia, pp. 1141–1148.
DASARATHY, B.V., 1997, Sensor fusion potential exploitation-innovative architectures and illustrative applications. Proceedings of the IEEE, 85, pp. 24–38.
DECETI PROJECT 2000, Multi-sources information fusion for satellite image classification. Electronic report of the DeCETI Project, Leonardo da Vinci Programme, Strand II, Measure II.1.1.C, Contract No. GR/1996/II/0953/PI/II.1.1.c/FPC. Available online at: http://www.survey.ntua.gr/main/labs/rsens/DeCETI/IRIT/MSI-FUSION/ (accessed 20 December 2006).
DEMPSTER, P., 1968, A generalization of the Bayesian inference. Journal of Royal Statistical Society, 30, pp. 205–447.
DUBOIS, D. and PRADE, H., 1994, Possibility theory and data fusion in poorly informed environments. Control Engineering Practice, 2, pp. 811–823.
HALL, D., 1997, An Introduction to multisensor data fusion. Proceedings of the IEEE, 85, pp. 6–23.
JEON, B. and LANDGREBE, D.A., 1999, Decision fusion approach for multitemporal classification. IEEE Transactions on Geoscience and Remote Sensing, 37, pp. 1227–1233.
JIANG, H. and EASTMAN, J.R., 2000, Application of fuzzy measures in multi-criteria evaluation in GIS. International Journal of Geographical Information Science, 14, pp. 173–184.
KACPRZYK, J., 1986, Group decision making with a fuzzy linguistic majority. Fuzzy Sets and Systems, 18, pp. 105–118.
KAM, M., 1999, Performance and geometric interpretation for decision fusion with memory. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 29, pp. 52–62.
LAMBIN, E., GEIST, H.J. and LEPERS, E., 2003, Dynamics of land-use and land-cover change in tropical regions. Annual Review of Environmental Resources, 28, pp. 205–241.
LENZ, R., MALKINA-PYKH, I.G. and PYKH, Y., 2000, Introduction and overview. Ecological Modelling, 130, pp. 1–11.
MALCZEWSKI, J., 2006, GIS-based multicriteria decision analysis: a survey of the literature. International Journal of Geographical Information Science, 20, pp. 703–726.
MALCZEWSKI, J. and RINNER, C., 2005, Exploring multicriteria decision strategies in GIS with linguistic quantifiers: A case study of residential quality evaluation. Journal of Geographical Systems, 7, pp. 249–268.
MORRIS, A. and JANKOWSKI, P., 2001, Fuzzy techniques for multiple criteria decision making in GIS. In Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 25–28 July 2001, Vancouver, pp. 2446–2451 (CD-ROM proceedings).
NICHOLSON, S.E. and ENTEKHABI, D., 1986, The quasi-periodic behavior of rainfall variability in Africa and its relationship to the Southern Oscillation. Journal of Climate and Applied Meteorology, 34, pp. 331–348.
OTT, W.R., 1978, Environmental Indices: Theory and Practice (Ann Arbor, MI: Ann Arbor Science).
ROBINSON, P.B., 2003, A perspective on the fundamentals of fuzzy sets and their use in Geographic Information Systems. Transactions in GIS, 7, pp. 3–30.
SHAFER, G., 1976, A Mathematical Theory of Evidence (Princeton, NJ: Princeton University Press).
SILVERT, W., 1997, Ecological impact classification with fuzzy sets. Ecological Modelling, 96, pp. 1–10.
SMITH, D. and SINGH, S., 2006, Approaches to multisensor data fusion in target tracking: a survey. IEEE Transactions on Knowledge and Data Engineering, 18, pp. 1696–1710.
SOLAIMAN, B., 1999, Multisensor data fusion using fuzzy concepts: application to land-cover classification using ERS-1/JERS-1 SAR composites. IEEE Transactions on Geoscience and Remote Sensing, 37, pp. 1316–1326.
TRAN, L.T., KNIGHT, C.G., O'NEILL, R.V., SMITH, E.R., RIITTERS, K.H. and WICKHAM, J., 2002, Environmental assessment, fuzzy decision analysis of integrated environmental vulnerability assessment of the Mid-Atlantic region. Environmental Monitoring, 29, pp. 845–859.
US DEPARTMENT OF DEFENCE 1991, Data fusion subpanel of the Joint Directors of Laboratories, Tech. Panel for C3, 'Data fusion lexicon'.
VALET, L., MAURIS, G. and BOLON, P., 2001, A statistical overview of recent literature in information fusion. IEEE AESS Systems Magazine, 1, pp. 7–14.
WALD, L., 1999, Some terms of reference in data fusion. IEEE Transactions on Geoscience and Remote Sensing, 37, pp. 1190–1193.
WAN, W. and FRASER, D., 1999, Multisource data fusion with multiple self-organizing maps. IEEE Transactions on Geoscience and Remote Sensing, 37, pp. 1344–1349.
WU, H., SIEGEL, M., STIEFELHAGEN, R. and YANG, J., 2002, Sensor fusion using Dempster–Shafer Theory. In IEEE Instrumentation and Measurement Technology Conference, 21–23 May 2002, Anchorage, AK, pp. 7–12.
YAGER, R.R., 1988, On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transactions on Systems, Man and Cybernetics, 18, pp. 183–190.
YAGER, R.R., 1992, Applications and extensions of OWA Aggregations. International Journal of Man–Machine Studies, 37, pp. 103–122.
YAGER, R.R., 1994, Interpreting linguistically quantified propositions. International Journal of Intelligent Systems, 9, pp. 541–569.
YAGER, R.R., 1996, Quantifier guided aggregation using OWA operators. International Journal of Intelligent Systems, 11, pp. 49–73.
YAGER, R.R., 2004, A framework for multi-source data fusion. Information Sciences, 163, pp. 175–200.
ZADEH, L.A., 1979, On the validity of Dempster's rule of combination of evidence, Memo M79/24, University of California, Berkeley.
ZADEH, L.A., 1983, A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications, 9, pp. 149–184.