6 MERGING FUZZY INFORMATION

Didier Dubois, Henri Prade, Ronald Yager
Abstract: This chapter deals with an important issue pertaining to intelligent information processing systems, that of managing information coming from several sources. Possibility theory and the body of aggregation operations from fuzzy set theory provide some tools to address this problem. The fusion of imprecise information is carefully distinguished from the estimation problem. The approach to fusion is set-theoretic, and the choice of conjunctive versus disjunctive fusion modes depends on assumptions on whether all sources are reliable or not. Quantified, prioritized and weighted fusion rules are described. Fuzzy extensions of estimation processes are also discussed. The approach, based on conflict analysis, applies to sensor fusion and the aggregation of expert opinions as well as the merging of databases.
6.1 INTRODUCTION
The problem of combining pieces of evidence issued from several sources of information can be encountered in various fields of application, particularly in i) sensor fusion, i.e., when pieces of information coming from different sensors are to be aggregated, ii) multiple source interrogation systems where several databases can provide precise, imprecise or uncertain information about values of interest, iii) expert opinion pooling, when different individual statements have to be synthesized.
APPROXIMATE REASONING AND INFORMATION SYSTEMS
This problem is rather new, one that has typically appeared with the emergence of smart computers in the age of information. Our capacity for sensing, storing and retrieving information has tremendously increased, and the need for merging information has become very strong. When this information is not reliable, the fusion problem comes close to an old question that bothered 17th century scholars in probability theory (Shafer, 1978, 1986): how to model and combine the testimonies of unreliable witnesses in a law court? Nowadays human witnesses are replaced by robot sensors (Abidi and Gonzalez, 1992), intelligent databases (Baral et al., 1992; Cholvy, 1998), or cameras (Bloch and Maitre, 1997); expert opinions about the probability of unlikely events are also collected and exploited for system safety assessment when statistics are missing, as in nuclear engineering (Cooke, 1988, 1991) or environmental problems (Bardossy et al., 1993). Traditionally, probability theory had little to say about information fusion except under very restrictive assumptions. Classical logic or fuzzy logic provide fusion connectives but give no recipe as to what is the good combination mode in a given situation. The typical fusion situation is when several possibly dependent sources of various natures supply heterogeneous, incomplete, not fully reliable data about some question of interest, about which no prior information is available, nor is there information about which source may be erroneous. In a robotics application, sensors may be very different from one another. In surveillance problems some information is provided by cameras or radars, and other information may come from humans in a linguistic format. This chapter gives an overview of what fuzzy set theory can offer to address the information fusion problem, especially when the imprecision of the pieces of information can be modelled by possibility distributions.
Our basic claim is that there cannot be a unique mode of combination which would be satisfactory in any situation, even when the framework for representing information is chosen, here possibility theory. First we clarify what we mean by merging information, a problem which is different from the aggregation of preferences, and which can be understood either as a form of filtering and estimation, or as a kind of inconsistency analysis that extracts the most reliable information out of imprecise data, what we call here fusion. Then probability theory is discussed as regards its relevance and its limitations for the information fusion problem. Section 6.3 briefly recalls the basics of possibility theory and points out situations where information comes in the form of possibility distributions. Section 6.4 presents the basic modes of information combination in the possibilistic framework. This setting is interesting due to its flexibility and expressiveness for the modelling of combination rules under various assumptions. The next section deals with extensions of the basic modes, namely accounting for the relative importance of the sources, and making the combination mode dependent on the level of conflict between sources. Section 6.6 presents fuzzy extensions of estimation techniques, and considers introducing constraints that increase the precision of the result. Section 6.7 deals with the case when a priori information is available and must be combined with several inputs. Lastly, section 6.8 discusses
the syntactic counterparts of the combination rules introduced so far, for the purpose of merging prioritized knowledge bases in possibilistic logic. This paper borrows from and elaborates on previous papers by the authors, especially (Dubois and Prade, 1992a, b), (Dubois and Prade, 1994), (Benferhat et al., 1997b), (Yager, 1997).

6.2 INFORMATION FUSION
In this section we first clarify what we mean by information fusion as opposed to preference aggregation and estimation problems. Then we examine the contribution of probability theory to the fusion problem. We show that probability theory is a rather constraining framework, more adapted to estimation problems than to the fusion of imprecise information. On the contrary, possibility theory offers a framework that is more faithful to the actual contents of imprecise information, simple enough, and more flexible as to the choice of combination operations.

6.2.1 Characteristics of the Information Fusion Problem
The fusion problem considered here can be summarized as follows: consider a question whose answer is unknown and that comes down to finding the real value of a parameter x (generally a numerical value, but also possibly a Boolean one if the question is of the yes/no type; possibly several such parameters). Let U denote the referential where x takes its values. A set S of sources is supposed to provide some estimations of the value of x. These pieces of information can be very poor, not necessarily point-values. They can take the form of tolerance intervals, or plausibility distributions of some sort. In the limit they can be not informative at all, as when some source expresses its ignorance as to the value of x. The fusion process aims at finding the most plausible values of the parameter x on the basis of the available pieces of information. As already said, by a source we mean many different things: it can be a human (an expert), a sensor, or a database, hence a potential heterogeneity of the pieces of information. In the case of an expert, information can be linguistic, or can be expressed as a set of intervals or subsets weighted by levels of confidence. Information extracted from a database can be a histogram (for a statistical database) or a logical formula (for a deductive database). For a sensor, information is a number with an error interval, or a likelihood function induced by a set of chance densities. Our view of information to be combined is thus rather generic. A first issue to be clarified is the nature of the items that are supposed to be merged. Major distinctions are to be made between: i) preference aggregation versus information aggregation; ii) the combination of information coming from parallel sources versus the revision of already available information; iii) estimation versus fusion. In this paper, we are interested in merging information, not preferences. Preference merging is addressed in another volume of the Handbook of Fuzzy Sets (Grabisch et
al., 1998; Kacprzyk and Nurmi, 1998). In the preference aggregation problem, it makes sense to find the opinion of the "average man" in a homogeneous group of individuals, and to look for trade-offs between preferences expressing different points of view. On the contrary, the information merging problem is a matter of truth and reliability: what is required is to make the best of the available information, by discarding the wrong data when possible and keeping the right information. Logical combinations are natural candidates as fusion operators. Conjunctive combinations apply when all the sources are reliable, while disjunctive combinations deal with the case of unreliable sources hidden in a group of reliable ones. Obviously, prioritized logical combinations may be considered, in particular when the sources are not equally reliable. Averaging operations in information aggregation can be justified when the set of sources can be viewed as a single random source producing different inputs. In that case, indeed, the set of data to be fused can be interpreted as standard statistics. For instance, several successive measurements from a single sensor can be viewed as the result of a random experiment. Then the discrepancies between the sources can be explained in terms of random variability and the fusion comes close to a filtering problem. However in the case of unique measurements issued from heterogeneous sensors, or in the case of expert knowledge or databases, it is not clear that averaging combination modes make sense. Besides, we oppose the case of merging information from parallel sources to the problem of belief revision, where sources do not play a symmetrical role. In the first situation, all the sources provide information simultaneously, while in the revision process there is a chronological ordering between the source which represents the prior state of belief, and the source which issues the new information.
In each case the pooling obeys different requirements; for instance belief revision is generally not commutative. Kalman filtering proceeds by repeated sequences of a prediction step followed by a revision step. In the case of parallel sources one must distinguish the estimation problem from the fusion problem. In the case of estimation, the problem is to find a representative value for a numerical parameter x, on the basis of point-value estimates of x, say, a set D = {u1, …, un} on a referential U, which can be an interval, or a Cartesian product thereof. Several kinds of estimators exist in the statistical literature, like the mean value, most prominently, but also the median value, or the mode of the distribution induced by the set {u1, …, un}. In the fusion problem, the data is generally imprecise: intervals, fuzzy intervals, uncertainty distributions or even linguistic terms are present in D. These pieces of information may be on different referential sets, and a common space must be built for all of them. Moreover it is assumed that all pieces of information pertain to the same matter, but this assumption is not always verified in practice. The latter problem of information correspondence is an area of research of its own and is not addressed here (see Tang and Lee, 1992, for instance). Once sources pertaining to the same matter are clustered and the common referential is found, the fusion problem is then one of finding zones of agreement and disagreement between the pieces of information so as to extract something reliable
enough. Hence the use of set-theoretic operations and logical analysis that are more akin to inconsistency management techniques for logical databases (see Benferhat et al., 1997a, for a survey) than to estimation processes. This point of view is confirmed by the following remark: information fusion problems have been encountered in the literature on expert systems for the purpose of combining uncertain facts obtained by triggering uncertainty-weighted production rules. A famous example of this kind of attempt is the MYCIN system (Buchanan and Shortliffe, 1984), for which a combination rule was devised. The failure of this operator as a universal combination technique has motivated the development of assumption-based truth maintenance systems (De Kleer, 1986) and the emergence of non-monotonic reasoning. In the following, the advocated approach to data fusion also insists on the management of assumptions regarding the validity and dependence of sources. The role of fuzzy sets in merging information can be understood in two ways: i) either as a tool for extending estimation techniques to fuzzy data; this is done by applying the extension principle to classical estimators, and methods of fuzzy arithmetic (see Dubois, Kerre, Mesiar and Prade, 1998, in the Handbooks of Fuzzy Sets Series, for a recent survey); ii) or as a tool for combining possibility distributions that represent imprecise pieces of information; then fuzzy set-theoretic operations are instrumental for this purpose (see Fodor and Yager, 1998, in the Handbooks of Fuzzy Sets Series, for a recent survey). In view of this dichotomy, the role of standard aggregation operations like the arithmetic mean is twofold. It is a basic operation for estimation. Yet it is also a fuzzy set-theoretic connective.
Hence it can be used for the fusion of imprecise information, by taking the average of membership grades. For each possible value of x, this enables the number of sources asserting this value to be accounted for in the result of the fusion, and is very different from computing the average of values in the referential set U.

6.2.2 Some Limitations of a Pure Probabilistic Approach to Fusion
It is possible to envisage the fusion problem from a purely probabilistic perspective, although the aim of probabilistic methods is more oriented towards estimation.

Bayesian Fusion and Estimation. Let x be a variable ranging over a finite set U. Let u1 and u2 be elements of U representing two observations of the value of x. A likelihood function P(u1, u2 | x) is supposed to describe the probability that source 1 observes u1, and source 2 observes u2, given the "real value" of x. Some a priori information about the value of x is also supposed to exist. Let P be the prior probability of the actual value of x. If these pieces of information are available, then
a posterior probability distribution that merges the prior probability and the likelihood functions can be derived by Bayes theorem:

P(x | u1, u2) = P(u1, u2 | x) · P(x) / P(u1, u2)    (6.1)

where P(u1, u2) = ∑u∈U P(u1, u2 | u) · P(u). The fusion itself does not explicitly appear in this formula; its result is the likelihood function P(u1, u2 | x). The fusion problem is precisely to reconstruct P(u1, u2 | x) from information pertaining to single sources, P(u1 | x) and P(u2 | x), and possibly extra information about the dependence between sources or their reliability. Known dependencies between sources can be summarized by correlation coefficients, which are enough on numerical spaces with Gaussian models. In the case of independence between sources, it is assumed that the pieces of information u1 and u2 are conditionally independent given x, so that

P(u1, u2 | x) = P(u1 | x) · P(u2 | x)    (6.2)

This gives the most usual Bayesian fusion formula

P(u | u1, u2) = P(u1 | u) · P(u2 | u) · P(u) / ∑u'∈U P(u1 | u') · P(u2 | u') · P(u')    (6.3)
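The independent Bayesian fusion formula (6.3) is straightforward to compute on a finite universe. The sketch below is a minimal illustration; the three-valued universe, the likelihood tables and the uniform prior are made-up assumptions, not data from this chapter.

```python
# Minimal sketch of Bayesian fusion under conditional independence,
# i.e., formulas (6.2)-(6.3). All numbers below are illustrative.

def bayes_fuse(prior, lik1, lik2):
    """Posterior P(u | u1, u2) from a prior P(u) and single-source
    likelihoods P(u1 | u), P(u2 | u), assuming the two observations are
    conditionally independent given u (eq. 6.2)."""
    joint = {u: lik1[u] * lik2[u] * prior[u] for u in prior}
    z = sum(joint.values())          # normalization term of (6.3)
    return {u: v / z for u, v in joint.items()}

# Three candidate values for x, a uniform prior, and two noisy sources.
prior = {"low": 1/3, "medium": 1/3, "high": 1/3}
lik1 = {"low": 0.7, "medium": 0.2, "high": 0.1}   # P(u1 | u)
lik2 = {"low": 0.6, "medium": 0.3, "high": 0.1}   # P(u2 | u)

posterior = bayes_fuse(prior, lik1, lik2)
# With a uniform prior, Bayesian estimation and maximum likelihood
# coincide: the chosen value maximizes lik1 * lik2.
x_hat = max(posterior, key=posterior.get)
```

Note how, with a uniform prior, the prior cancels out of (6.3), which is the coincidence between Bayesian and maximum-likelihood estimation mentioned below.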
The possibility of biased sources is expressed by changing u1 and u2 into f1(u1), f2(u2) ∈ U, where f1 and f2 capture systematic biases. It is patent that this way of modelling biases can be used regardless of the involved theory of uncertainty. The Bayesian method is applied to sensor fusion and also to expert opinion pooling. The a priori opinion of the analyst about the true value of x is revised on the basis of expert opinions, expressed as point-values. The credibility of experts, from the standpoint of the analyst, is modelled by conditional probabilities of what an expert will claim the true value of x to be, given this true value. Once the expert point-values are known, the a priori probability distribution of x, as possessed by the analyst, is revised through Bayes theorem. The model by Mosleh and Apostolakis (1984) tries to account for the dependence between experts via a correlation coefficient. Many Bayesian approaches to expert opinion pooling exist; they are reviewed by Cooke (1991) and French (1985). The next step in probabilistic approaches is to propose a representative value of x. This is estimation. In the above approach the chosen value is the most probable one according to the posterior probability. This is called Bayesian estimation. When prior probabilities are not available, only the likelihood function P(u1, u2 | x) is used, and the representative value is then taken as the one of maximum likelihood.
It is easy to see that both estimation methods coincide if the Bayesian prior is uniformly distributed. Other methods of estimation totally obviate the fusion step, such as least squares or linear estimation. Yet they can be interpreted as particular cases of Bayesian estimation, as explained by Brown et al. (1992). The least squares method for numerical values justifies taking the arithmetic mean of u1 and u2 as the representative value of x. It coincides with Bayesian estimation for independent identically distributed noisy measurements with uniform priors. In terms of the underlying fusion, these are very strong assumptions. Kalman filtering is a linear estimation method (the value of x is a linear combination of u1 and u2). It is a Bayesian estimation process that presupposes Gaussian priors, and independent noisy Gaussian numerical measurements. Then all estimation methods coincide. This is an even more constrained framework from a fusion point of view. Moreover, in the case of Kalman filtering, the pieces of information coming from sources are taken in sequence, one after the other at different points in time, by means of successive revision steps; the process takes place in a dynamic setting, whereby the current estimate at each time point is used to produce a predicted value at the next time point, and the revision operates on this predicted value. This framework is thus far from the idea of information fusion from parallel sources.

The Consensus Method. In problems of expert opinion pooling (French, 1985; Cooke, 1991) there are two main approaches to the pooling of probability distributions: the consensus method justified by Wagner and Lehrer (1981) and used by Cooke (1988), and the Bayesian approach, exemplified by the works of Mosleh and Apostolakis (1984) (see also Wu et al., 1990).
In the consensus method each expert is asked about the value of some parameter x by specifying fractiles of a probability distribution function (pdf), for instance the 5% and 95% fractiles. In other words, the expert supplies values xe and xu such that P(x ≤ xe) = 0.05 and P(x ≤ xu) = 0.95 respectively. Moreover some information about the mode, the mean, or else the median of the distribution is often asked for. Based on these values, and on the choice of a parametrized family of distribution functions (for instance a beta distribution), a given distribution function is chosen that supposedly best represents the available information. The nature of the probabilities appearing during the elicitation is sometimes controversial. Some authors call them "subjective probabilities". This term is ambiguous insofar as it may mean the numerical estimate of a feeling of certainty, or a subjectively assessed objective frequency. In reliability engineering, the second interpretation sounds more instrumental. When several experts supply this kind of information, their responses are pooled so as to derive a single distribution that reflects the opinion of the group. However it is clear that the opinion of reliable experts should carry more weight than that of unreliable ones. In the consensus method each expert Ei supplies a pdf pi, and the resulting distribution is a weighted average p = ∑i wi pi where the weights wi reflect the reliability of the experts. Cooke (1988) has developed a theory of weights that act as scoring rules; they tend to force experts to be calibrated and informative.
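The linear pool p = ∑i wi pi can be sketched on discretized distributions. The two expert pdfs and the equal weights below are illustrative assumptions (in Cooke's method the weights would come from calibration scores); the example also exhibits the voting-like flaw criticized in the next section.

```python
# Sketch of the consensus (linear opinion pool) method on discrete pdfs.
# The expert distributions and weights are made up for illustration.

def consensus(pdfs, weights):
    """Weighted average p = sum_i w_i * p_i of discrete pdfs,
    each given as a {value: probability} dictionary."""
    support = set().union(*pdfs)
    return {u: sum(w * p.get(u, 0.0) for w, p in zip(weights, pdfs))
            for u in support}

# Expert 1 believes x is small, expert 2 believes x is large.
p1 = {1: 0.6, 2: 0.3, 3: 0.1}
p2 = {3: 0.1, 4: 0.3, 5: 0.6}
pooled = consensus([p1, p2], weights=[0.5, 0.5])

# The pooled mean sits at a medium value that neither expert
# considers plausible -- the flaw discussed in the Critique below.
mean = sum(u * p for u, p in pooled.items())
```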
Critique. The probabilistic approaches can be criticized for several reasons:
• First, the identification of a probability distribution requires more information than what is sometimes actually available. For instance, there are many distribution functions corresponding to given 0.05 and 0.95 fractiles and a prescribed mode given by an expert. The choice of a parametrized family of distribution functions is basically a matter of making calculations simple. As a consequence the faithfulness of the modeling of uncertain information by a single probability function can be questioned.
• Human experts more readily supply intervals than point-values, because their knowledge is not only of limited reliability but also tainted with imprecision. Probability theory is not concerned with the modeling of imprecision but rather captures the notion of random variability, or in the case of subjective probabilities, betting odds. Betting odds do reflect beliefs induced by available information, but this relation is not one to one. For instance, uniform betting rates resulting in a uniformly distributed probability are in order when the expert believes that the occurrence of the event is governed by randomness, as well as when this expert knows nothing. Ignorance cannot be properly distinguished from the certainty that a process is governed by randomness, by means of a single probability function.
• The consensus method has a basic flaw in the context of information fusion: it is a voting-like procedure. Indeed if two reliable sources supply conflicting information about the value of x, such that one gives a small value to x, and the other gives a high value to x, the consensus method will supply a probability distribution whose mean value is medium, i.e., estimation results in a value on which both sources agree as not being the true one.
What is needed is a method which, in the best case, guesses the true value and discards the wrong source, or, in the worst case, proposes a cautious response that fits the available data (e.g., x is either small or large, but certainly not medium). The weighted average method sounds more natural when expert opinions express preferences, but does not seem to be adapted when a true answer (what is known about the actual value of x) is to be determined instead of a preferred one.
• Another interpretation of the consensus method comes down to considering the pieces of information as stemming from a single random source, each weight reflecting the probability of getting the corresponding piece of information. The underlying homogeneity assumption is questionable in the case of experts, some of whom are wrong, or heterogeneous sensors, some of which are erratic or erroneous. Moreover the weighted average method may affect the variance, in the sense that the variance of the result becomes smaller than that of any of the input distributions. This phenomenon is acceptable only under the assumption of independent sources, which is very restrictive.
• The main drawback of the Bayesian method seems to be, as usual, the need for a priori knowledge about the value of x. In expert opinion pooling, the analyst who looks for expert advice must be an expert himself. However in many cases the
analyst has no idea about the value of x and all that he may learn is about the reliability of experts, by a technique such as the one by Cooke (1988). The Bayesian method cannot update from a state of complete ignorance (i.e., without priors). Prior probabilities are often not available in measurement problems: many statisticians use independent likelihood functions only. The possibilistic approach that is proposed in this paper tries to cope with some of the difficulties faced by the probabilistic approach on the problem of pooling imprecise information coming from heterogeneous sources; its main features are: faithfulness of the representation of subjective data, no need for a priori knowledge, and a variety of fusion methods whose choice depends on the reliability of experts or sources and the level of conflict between their opinions. The possibilistic approach is not built in opposition to the probabilistic one, though; both approaches are related, complementary, and shed light on each other. Moreover the possibilistic approach is more concerned with fusion than with estimation; the latter problem is more closely related to the issue of defuzzification (Yager and Filev, 1994).

6.3 THE POSSIBILISTIC REPRESENTATION OF INCOMPLETE DATA
Possibility theory offers a flexible tool for representing uncertain information that is incomplete or imprecise, such as that expressed by humans. It can be connected to probability theory if possibility distributions are understood as likelihood functions or as encoding a special kind of imprecise probability. However it can also be viewed as reflecting a more qualitative view of uncertainty. The recent detailed account by the authors (Dubois and Prade, 1998) clearly distinguishes between the quantitative and qualitative sides of the theory. For an extensive survey of references in possibility theory, in connection with probability theory, see Dubois, Nguyen and Prade (1999). Mathematical aspects of possibility theory are studied by De Cooman (1997).

6.3.1 Possibility Theory: A Refresher
A possibility distribution πx (Zadeh, 1978) can be viewed as the membership function of the fuzzy set of possible values of a quantity x. These values are assumed to be mutually exclusive, since x takes on only one value (its true value), which belongs to a set U. This set is often considered to be a closed, bounded real interval. The actual value of x is unknown. But, since one of the elements of U is the true value of x, πx(u*) = 1 for at least one value u* ∈ U. This is the normalization condition, which states that at least one value is viewed as totally possible. When unique, u* is the most plausible value of x. If u and u' are such that πx(u) > πx(u'), u is considered a more plausible value than u'. When πx(u) = 0, then x cannot take on value u. Knowing a possibility distribution, the likelihood of events can be described by means of two set-functions: the possibility measure and the necessity measure
(Dubois and Prade, 1988b), denoted Π and N respectively. When π is the membership function of a crisp set E, it expresses that the value of parameter x certainly belongs to E. Then, an event B is said to be possible if and only if E ∩ B ≠ Ø, and certain if and only if E ⊆ B; by definition, let Π(B) = 1 and N(B) = 1 in these respective situations. When Π(B) = 1, there is a value u of x in B considered as totally possible (since B and E intersect). When N(B) = 1 no possible value of x lies outside B and it is thus certain that x lies in B. In the general case where πx is the membership function of a fuzzy set, the possibility and necessity measures are defined as follows:

Π(B) = supu∈B πx(u)    (6.4)
N(B) = infu∉B (1 – πx(u)) = 1 – Π(Bc)    (6.5)
where Bc is the complement of B with respect to U. This duality expresses that B is all the more certain as Bc is impossible. These evaluations only exploit the ordinal part of the information, not its quantitative contents, since only the plausibility ordering is involved in (6.4) and (6.5). A third set-function can be devised on purely ordinal grounds, namely a measure of 'guaranteed possibility' (Dubois and Prade, 1998):

∆(B) = infu∈B πx(u)

which estimates to what extent all the values in B are actually possible for x according to what is known, i.e., each value in B is possible for x at least to the degree ∆(B). Clearly ∆ is a stronger measure than Π, i.e., ∆ ≤ Π, since Π only estimates the existence of at least one value in B compatible with the available knowledge, while the evaluation provided by ∆ concerns all the values in B. Note also that ∆ and N are unrelated. Revision in qualitative possibility theory is accomplished by means of a conditioning device similar to the one in probability theory. The notion of conditional possibility measure goes back to Hisdal (1978) who introduced the set function Π(· | A) through the equality

∀B, B ∩ A ≠ Ø, Π(A ∩ B) = min(Π(B | A), Π(A)).    (6.6)
This equation has more than one solution. Dubois and Prade (1988b) have proposed to select the least specific one, that is (for Π(A) > 0):

Π(B | A) = 1 if Π(A ∩ B) = Π(A)
         = Π(A ∩ B) otherwise.    (6.7)
The conditional necessity function is defined as N(B|A) = 1 – Π(Bc| A) by duality. Note that N(B | A) > 0 ⇔ Π(A ∩ B) > Π(A ∩ Bc), which expresses that B
is an accepted belief in the context A if and only if B is more plausible than Bc when A is true. The associated possibility distribution is given by

π(u | A) = 1 if π(u) = Π(A), u ∈ A
         = π(u) if π(u) < Π(A), u ∈ A
         = 0 if u ∉ A.    (6.8)
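On a finite universe the set functions (6.4)-(6.5), the guaranteed possibility ∆, and the min-based conditioning (6.7)-(6.8) can be computed directly. The distribution and the events below are a minimal made-up illustration.

```python
# Sketch of Pi (6.4), N (6.5), Delta, and min-based conditioning (6.8)
# on a finite universe; the distribution pi below is illustrative.

def poss(pi, B):
    """Possibility measure Pi(B) = sup_{u in B} pi(u), eq. (6.4)."""
    return max((pi[u] for u in B), default=0.0)

def nec(pi, B):
    """Necessity measure N(B) = 1 - Pi(B complement), eq. (6.5)."""
    Bc = [u for u in pi if u not in B]
    return 1.0 - poss(pi, Bc)

def delta(pi, B):
    """Guaranteed possibility Delta(B) = inf_{u in B} pi(u)."""
    return min(pi[u] for u in B)

def condition_min(pi, A):
    """Least specific solution of (6.6), i.e., eq. (6.8): values of A
    reaching Pi(A) are raised to 1, others keep their degree."""
    pA = poss(pi, A)
    return {u: (1.0 if u in A and pi[u] == pA else
                pi[u] if u in A else 0.0) for u in pi}

pi = {"a": 0.3, "b": 1.0, "c": 0.6, "d": 0.0}   # normalized: pi("b") = 1
B = {"a", "c"}
# Pi(B) = 0.6; N(B) = 1 - Pi({"b", "d"}) = 0; Delta(B) = 0.3 <= Pi(B)
pi_cond = condition_min(pi, A={"a", "c"})       # "c" is raised to 1
```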
The above definitions of conditioning only require a purely ordinal view of possibility theory and a finite setting. In the case of a numerical scale, the product should be used instead of min in the conditioning equation (6.6) so as to avoid the discontinuity induced by (6.8). It leads to

∀B, B ∩ A ≠ Ø, Π(B | A) = Π(A ∩ B) / Π(A)    (6.9)
provided that Π(A) ≠ 0. This is Dempster's rule of conditioning, specialized to possibility measures, i.e., the consonant plausibility measures of Shafer (1976). The corresponding conditional possibility distribution is

π(u | A) = π(u) / Π(A), ∀u ∈ A
         = 0 otherwise.    (6.10)

6.3.2 Building Possibility Distributions
The simplest form of a possibility distribution on a numerical interval U is the characteristic function of a sub-interval I of U, i.e., πx(u) = 1 if u ∈ I, 0 otherwise. This type of possibility distribution is naturally obtained from experts claiming that "x lies between a and b". This way of expressing knowledge is more natural than giving a point-value u* for x right away, because it allows for some imprecision: the true value of x is more likely to lie between a and b than to be equal to u*. Allowing for imprecision reduces uncertainty. However this representation is not entirely satisfactory for two reasons. First, claiming that πx(u) = 0 for some u means that x = u is impossible. This is too strong for an expert, who is then tempted to give wide uninformative intervals. Moreover, it is more satisfactory to ask the expert for several intervals with various levels of confidence, and sometimes to admit that even the widest, safest interval does not rule out some residual possibility that the value of x lies outside it.

Fuzzy Intervals. A possibility distribution can be seen as a qualitative, approximate representation of unspecificity, where some values are more plausible than others. Then a linear representation on a numerical scale is the simplest, namely an interval A = [a, b] taken as the support of the distribution and a most plausible value u*. What is obtained is a triangular fuzzy number M by linear interpolation between the core and the support ends. The core {u*} can be relaxed
into a sub-interval A' of A of values with possibility 1. This view is also useful to model linguistic information pertaining to linguistic scales, following Zadeh (1978) whereby such imprecise natural language statements of the form "x is F" (x is tall, great, small, etc.) translate into possibility distributions that equate the membership function of the fuzzy set F. Confidence Intervals. More generally, a possibility distribution π x can represent a finite family of nested confidence ("focal") subsets {A1 , A2 ,…, Am} where Ai ⊂ Ai+1 , i = 1,m – 1 assuming that the set of possibility values {π(u) | u ∈ U} is finite. Each confidence subset Ai is attached a positive confidence level λ i . The links between the confidence levels λ i 's and the degrees of possibility are defined by postulating λ i = N(Ai ) the degree of necessity of Ai (Dubois and Prade, 1988b, 1992c). This entails that λ 1 ≤… ≤ λ m due to the monotonicity of N. The possibility distribution equivalent to the family {(A1 , λ 1 ), (A2 , λ 2 ),…, (Am, λ m)} is defined as the least specific (see, e.g., Yager, 1992) possibility distribution π that obeys the constraints λ i = N(Ai ), i = 1,n. It comes down to maximizing the degrees of possibility π(u) for all u in U, subject to these constraints. The solution is unique and is ∀u, π x (u) = min i max(1 – λ i , Ai (u)) 1 if u ∈ A1 = mini: u∉Ai (1 – λ i ) otherwise
(6.11)
where Ai(·) is the characteristic function of Ai. This solution is the least committed one with respect to the available data, since by allowing the greatest possibility degrees in agreement with the constraints, it defines the least restrictive possibility distribution. Conversely, the family {(A1, λ1), (A2, λ2), …, (Am, λm)} of confidence intervals can be reconstructed from the possibility distribution πx. Namely it can be proved that if the set of possibility values πx(u) is {α1 = 1 ≥ α2 ≥ … ≥ αm}, and letting αm+1 = 0:

Ai = {u | πx(u) ≥ αi}; λi = 1 – αi+1, ∀i = 1, m.
(6.12)
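A minimal sketch of (6.11) and (6.12) on a small discrete universe may help; the focal sets and confidence levels below are illustrative, not taken from the text.

```python
# Sketch of (6.11)-(6.12); the sets A_i and levels lam_i are made up.
U = list(range(1, 11))
A = [set(range(4, 7)), set(range(3, 9)), set(U)]   # A1 ⊂ A2 ⊂ A3 = U
lam = [0.25, 0.75, 1.0]                            # lam1 <= lam2 <= lam3

def pi(u):
    """(6.11): least specific pi(u) = min_i max(1 - lam_i, A_i(u))."""
    return min(max(1.0 - l, 1.0 if u in Ai else 0.0) for Ai, l in zip(A, lam))

# (6.12): recover the family {(A_i, lam_i)} from the distribution pi
alphas = sorted({pi(u) for u in U}, reverse=True)          # alpha1 = 1 > ...
cuts = [{u for u in U if pi(u) >= a} for a in alphas]      # A_i as level cuts
levels = [1.0 - (alphas[i + 1] if i + 1 < len(alphas) else 0.0)
          for i in range(len(alphas))]

assert cuts == A and levels == lam                          # exact round trip
```

The round trip recovers exactly the family {(Ai, λi)} that generated π, illustrating that (6.12) inverts (6.11).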
In particular λm = 1 and Am is the subset which for sure contains x; moreover, Am = U if no strict subset of U can be ascertained as including x. This analysis extends to an infinite nested set of confidence intervals (Aα, λ(α)).
Random Sets. Note that there is a set of weights p1, p2, …, pm summing to one, such that (Dubois and Prade, 1988a)

∀u, πx(u) = ∑i: u∈Ai pi.
(6.13)
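The one-point coverage identity (6.13), together with the possibility and necessity of events computed from the focal sets, can be sketched as follows; the nested focal sets and masses below are illustrative.

```python
# Nested random set reading of a possibility distribution; values made up.
A = [{4, 5}, {3, 4, 5, 6}, set(range(1, 9))]       # nested focal sets
p = [0.5, 0.25, 0.25]                              # masses, summing to one

def pi(u):
    # (6.13): one-point coverage, the total mass of focal sets containing u
    return sum(m for Ai, m in zip(A, p) if u in Ai)

def Pi(B):
    # expected possibility: mass of focal sets that intersect B
    return sum(m for Ai, m in zip(A, p) if Ai & B)

def N(B):
    # expected necessity: mass of focal sets included in B
    return sum(m for Ai, m in zip(A, p) if Ai <= B)

assert pi(4) == 1.0 and pi(3) == 0.5 and pi(1) == 0.25
assert Pi({4}) == 1.0 and N({3, 4, 5, 6}) == 0.75
assert N({3, 4, 5, 6}) == 1.0 - Pi({1, 2, 7, 8})   # duality on the complement
```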
Namely pi = αi – αi+1, ∀i = 1, m. Hence the possibility distribution can be cast in the setting of random sets, and more precisely in Shafer's (1976) theory of evidence. From a mathematical point of view, the information can be viewed as a nested random set {(Ai, pi), i = 1, m}, which allows for imprecision (the size of the Ai's) and uncertainty (the pi's). And pi is the probability that the source supplies exactly Ai as a faithful representation of the available knowledge of x (it is not the probability that x belongs to Ai). In such a context Π(B) and N(B) are expected possibility and certainty degrees in the usual probabilistic sense, since:

Π(B) = ∑i=1,m pi · Πi(B)
N(B) = ∑i=1,m pi · Ni(B)

where Πi and Ni are the {0,1}-valued possibility and necessity measures induced by the set Ai (i.e., Πi(B) = 1 iff Ai ∩ B ≠ Ø, Ni(B) = 1 iff Ai ⊆ B). The random set view of possibility theory is developed in greater detail in (Gebhardt and Kruse, 1998).
Imprecise Probability. The level of confidence λi can also be conveniently interpreted as a lower bound on the probability that the true value of x hits Ai. Then the possibility distribution is viewed as the family of probability measures P = {P, P(Ai) ≥ λi, i = 1, m}, and Π(B) coincides with the upper probability P*(B) = sup {P(B), P ∈ P} while the necessity measure N(B) = inf {P(B), P ∈ P} is the lower probability (Dubois and Prade, 1992c). In reliability engineering, an expert is supposed to be capable of supplying several nested intervals A1, …, Am directly, together with levels of confidence λ1, …, λm (e.g., from the point of view of the expert, the best lower bound on the proportion of cases where x ∈ Ai in his experience). In (Sandri et al., 1995), three intervals only have been kept and the confidence levels have been predefined: A1 with λ1 = 0.05, A2 with λ2 = 0.5, and A3 with λ3 = 0.95.
A1 corresponds to the "usual values" of x, and there is a 0.05 probability that x misses A3, i.e., the residual uncertainty of the conservative evaluation. Finally, the focal subset Am = A4 (with λ4 = 1) is always U itself, due to this residual uncertainty. The data supplied by one expert is thus made of the three intervals A1, A2, A3. They correspond to specific questions asked of experts (see Kalfsbeek, 1989; Dubois and Kalfsbeek, 1990). Although no exact counterparts of the intervals [a1, b1], [a2, b2], [a3, b3] in Figure 6.1 are actually used in the probabilistic approaches, these intervals can be interpreted in terms of fractiles of a probability distribution, e.g., [a1, b1] corresponds to the range between the 2.5% and the 97.5% fractiles. In terms of fuzzy sets, [a3, b3] corresponds to the core of the fuzzy set with membership function πx since ∀u ∈ [a3, b3], πx(u) = 1. The obtained possibility distribution is
pictured in Figure 6.1. See Coolen (1994) for the use of imprecise probabilities in the modeling of expert opinions.
Probabilistic Confidence Intervals. Possibility distributions may also come from the transformation of a probability distribution (Dubois et al., 1993). Namely, given a unimodal probability density on the real line, a (nested) set of confidence intervals can be encoded as a possibility distribution. For any interval [a, b], let π(a) = π(b) = 1 – P([a, b]). Choosing the confidence intervals centered on the mode of P and obtained by level-cutting the density, π is the most specific possibility distribution compatible with P (the inequality Π ≥ P holds). Starting from a uniform probability distribution on [a, b], one can justify the symmetric triangular fuzzy number of support [a, b] as a best possibilistic approximation of the uniform density in the above sense. Moreover, as claimed in Lasserre et al. (1998), if Π is the possibility measure obtained from the symmetric triangular fuzzy number of support [a, b], then the inequality Π ≥ P holds for any symmetric normal density of support [a, b]. This suggests that triangular fuzzy numbers are genuine counterparts of uniform probability densities.
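A sketch of this transformation for the uniform density on [0, 1] (so a = 0, b = 1): level-cutting yields intervals centered on the midpoint, and setting π at their endpoints to 1 – P gives the symmetric triangular fuzzy number. The dominance Π ≥ P is checked below on a few arbitrary intervals.

```python
# Probability-to-possibility transformation sketch; intervals are arbitrary.
def pi(u):                     # triangular fuzzy number, support [0,1], mode 0.5
    return max(0.0, 1.0 - 2.0 * abs(u - 0.5))

def P(a, b):                   # uniform probability of [a, b] ∩ [0, 1]
    return max(0.0, min(b, 1.0) - max(a, 0.0))

def Pi(a, b, steps=1000):      # Pi([a, b]) = sup of pi over the interval
    return max(pi(a + (b - a) * k / steps) for k in range(steps + 1))

for a, b in [(0.0, 0.2), (0.3, 0.6), (0.5, 0.9), (0.1, 1.0)]:
    assert Pi(a, b) >= P(a, b) - 1e-9       # dominance Pi >= P
```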
Figure 6.1. Expert-originated possibility distribution
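The staircase shape of Figure 6.1 can be rebuilt from (6.11): π = 1 on the core [a3, b3], then 1 – 0.05 = 0.95 on A2, 1 – 0.5 = 0.5 on A3, and the residual level 1 – 0.95 = 0.05 outside A3 (since A4 = U, λ4 = 1). The interval endpoints below are placeholders; only the levels 0.05, 0.5, 0.95 come from the elicitation scheme.

```python
# Figure 6.1 staircase from (6.11); endpoints are illustrative placeholders.
a3, b3 = 4.0, 6.0      # A1 = [a3, b3]: "usual values", lam = 0.05
a2, b2 = 3.0, 7.0      # A2 = [a2, b2], lam = 0.5
a1, b1 = 2.0, 8.0      # A3 = [a1, b1]: 2.5%-97.5% range, lam = 0.95
nested = [((a3, b3), 0.05), ((a2, b2), 0.5), ((a1, b1), 0.95)]

def pi(u):
    # (6.11) with A4 = U (lam4 = 1), leaving a residual level of 0.05
    return min(max(1.0 - lam, 1.0 if lo <= u <= hi else 0.0)
               for (lo, hi), lam in nested)

assert abs(pi(5.0) - 1.0) < 1e-12    # core
assert abs(pi(3.5) - 0.95) < 1e-12   # inside A2, outside the core
assert abs(pi(2.5) - 0.5) < 1e-12    # inside A3, outside A2
assert abs(pi(9.0) - 0.05) < 1e-12   # outside A3: residual possibility
```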
Likelihood Functions. In sensor modelling, likelihood functions often model some aspects of measurement errors (Parratt, 1961). Yet another interpretation of the possibility distribution π consists in viewing it as a likelihood function, that is, identifying π(u) with P(um | u), the probability that the source indicates the measured value um when the actual value of x is u. Indeed suppose only P(um | u) is known, ∀u ∈ U. The probability P(um | u) is understood as the likelihood of x = u, in the sense that the greater P(um | u), the more plausible x = u. In particular, note that

∀A ⊆ U, minu∈A P(um | u) ≤ P(um | A) ≤ maxu∈A P(um | u)    (6.14)

Identifying π(u) with P(um | u), it is clear that Π(A) is the upper bound of the probability P(um | A) (see Dubois et al., 1997). The lower bound in (6.14) corresponds to the degree of "guaranteed possibility" ∆(A) defined above. Hence there is a strong similarity between maximum likelihood reasoning and possibilistic
reasoning; in the finite case, the degree of possibility that x = u in the face of the measurement um can be defined as πm(u) = πx(u | um) = P(um | u). In the continuous case πx(u | um) must be extrapolated from the density, whose values may be greater than 1. Note that this view is somewhat restrictive because in general, the available information may be weaker than the probability P(um | u). We may only have limited knowledge of this probability, under the form of some bounds. For instance all that may be known is that if x = u then um ∈ M for some interval M. Then define π(um | u) = 1 if um ∈ M and 0 otherwise. More generally the conditional possibility distribution may take values between 0 and 1. In general supu π(um | u) ≤ 1 only, since in this approach the normalisation with respect to u is not warranted. Yet, it is natural to assume that maxu π(um | u) = 1. It means there is at least one value x = u that makes the observation xm = um completely possible; in other words um is a completely relevant observation for the set of assumptions U. Indeed, if for instance maxu π(um | u) = 0, it would mean that it is impossible to observe xm = um for any value x = u ∈ U. This was called the postulate of observation relevance by Dubois and Prade (1992d). A counterpart of Bayes theorem holds for possibilistic conditioning, namely

π(um | u) ∗ πx(u) = πx(u | um) ∗ π(um)
(6.15)
where πx(u) is the a priori knowledge about x, π(um | u) the possibility distribution describing the range of observed values when x = u, and π(um) = maxu π(um | u) ∗ πx(u). Operation ∗ denotes min or product according to whether the setting is ordinal or numerical. Assume no a priori knowledge about x, that is, πx(u) = 1, ∀u. Then (6.15) becomes π(um | u) = πx(u | um) ∗ maxu π(um | u). If ∗ = min, one may let πx(u | um) = π(um | u) in accordance with the use of likelihood functions for abductive reasoning; a qualitative normalization is possible, which consists in moving the greatest possibility degrees to 1 and leaving the other degrees unchanged. If ∗ = product, one gets

πx(u | um) = π(um | u) / maxu π(um | u)
which also corresponds to some existing practice in statistics, whereby the likelihood function is renormalized via a proportional rescaling (Edwards, 1972). Note that the latter cannot be retrieved using Bayes theorem and an unknown prior when a probabilistic likelihood function P(um | u) appears instead of π(um | u) in the above expression, since the supremum of P(u | um), when letting the prior probability free in Bayes theorem, is always 1. It just points out that maximum likelihood reasoning is not in agreement with a sensitivity analysis on Bayesian
reasoning. Under the postulate of observation relevance, πx(u | um) = π(um | u) is again recovered.
Ordinal possibility distributions. When the source is a logical database, possibility distributions may stem from the presence of exception-tainted rules. In accordance with ordinal possibilistic conditioning, an uncertain rule A → B is modelled by the conditional necessity statement N(B | A) > 0, which translates into the constraint Π(A ∩ B) > Π(A ∩ Bc) on an unknown possibility distribution over interpretations of the language (see Benferhat et al., 1997c). Here possibility degrees take values on an arbitrary totally ordered scale L. A logical database is then viewed as a collection of such rules, hence a family of possibility distributions which possesses a unique greatest element, the minimally specific one, that represents the semantics of the logical database. Each interpretation, which represents a state of the world, is thus ordinally weighted by a degree of plausibility, or normality. The above discussion suggests that the setting of possibility theory is common to the representation of imprecise information in many problems of fusion, ranging from expert opinion pooling in reliability engineering, to sensor fusion and database merging. This framework is consistent with maximum likelihood reasoning rather than Bayesian inference. 6.4
THE FUSION OF POSSIBILITY DISTRIBUTIONS
The pooling of possibility distributions is based on a set-theoretic, or in other words logical, view of the combination of uncertain information, as advocated in (Dubois and Prade, 1988a). The general ideas of the possibilistic approach to the fusion of information issued from several distinct sources are, first, that there is no unique combination mode, and, second, that the choice of the combination mode depends on an assumption about the reliability of sources, as formulated by some analyst. No a priori knowledge about the inquired parameter is needed, and the sources, viewed in parallel, are to be combined in a symmetric way if equally reliable. There are basically two extreme modes of symmetric combination: the conjunctive mode when all sources agree and are considered as reliable; and the disjunctive mode when sources disagree and at least one of them is wrong. These modes are implemented in terms of fuzzy set intersections and fuzzy set unions respectively. Indeed, in order to clarify the fusion problem, it is useful to look at it from such a set-theoretic (or logical) point of view, not from the idea of averaging. Especially, consider the case of heterogeneous parallel sources, so that possible conflicts between the pieces of information are not the result of random variations only. Set-theoretic operations such as unions and intersections (expressing disjunction and conjunction) are the basic natural consensus rules in set theory and logic. Consider the case of two sources of information giving evidence about the value of some variable x under the form of a set. The fusion problem is simply the following one:
Source 1: x ∈ E1 ⊆ U
Source 2: x ∈ E2 ⊆ U
What should be said about x? We claim that the choice of a type of pooling operator that solves this problem is a matter of context. There is no theory capable of prescribing a universal pooling method for these two pieces of information, that would apply to all situations. The chosen pooling scheme depends not only on formal properties the method should fulfil, but also on how much the sources agree, and on what is known about their reliability. It means that a panoply of combination schemes, together with their underlying assumptions, must be laid bare and used in an adaptive way. In the above case, if E1 ∩ E2 is not empty, the default assumption that both sources are reliable makes sense, and the combination rule can be the intersection, i.e., x ∈ E1 ∩ E2. Otherwise, if E1 ∩ E2 is empty, at least one of the sources is necessarily wrong, and a more natural combination rule is the union, i.e., x ∈ E1 ∪ E2, which only assumes that at least one of the sources is right. This operator does not require the identification of the right source. This view, introduced by Dubois and Prade (1987c), has been applied to multiple source interrogation systems (Sandri et al., 1989), to expert opinion pooling (Sandri et al., 1995), to data fusion in computer vision (Deveughèle and Dubuisson, 1993), and to database merging (Benferhat et al., 1997b). The pooling mode depends upon the available knowledge on source reliabilities and the extent to which the sources agree with one another. 6.4.1
Basic Symmetric Combination Modes in the Possibilistic Setting
Since a particular case of possibility distribution is the characteristic function of a set, the basic combination modes in the possibilistic setting are also conjunctions and disjunctions. There exist several possible choices among pointwise fuzzy set-theoretic operations for defining conjunctive and disjunctive combinations of possibility distributions. Let πi be the possibility distribution supplied by source i, i = 1, n. Define

∀u ∈ U, π∧(u) = ∗i=1,n πi(u) (fuzzy set intersection)
∀u ∈ U, π∨(u) = ⊥i=1,n πi(u) (fuzzy set union)
(6.16) (6.17)
where ∗ and ⊥ are [0,1]-valued operations defined on [0,1] × [0,1] which enjoy the duality relation α ⊥ β = 1 – (1 – α) ∗ (1 − β) in order to express De Morgan's law. Candidates for conjunctive and disjunctive fusion operations are the so-called triangular norms and co-norms; i.e., the conjunctive operation ∗ is associative, commutative, increasing (in the wide sense) in both places and such that α ∗ 1 = α and 0 ∗ 0 = 0. The main continuous solutions are α ∗ β = min(α, β), α ∗ β = α ⋅ β (product) and α ∗ β = max(0, α + β – 1) (linear product), which, by duality, lead to the disjunctive operations α ⊥ β = max(α, β), α ⊥ β = α + β − α ⋅ β (probabilistic sum), and α ⊥ β = min(1, α + β); see (Schweizer and Sklar, 1983; Fodor and Yager, 1998) for details.
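These three t-norm/t-conorm pairs and the De Morgan duality can be checked numerically; a small sketch (the sample arguments are arbitrary):

```python
# The three basic continuous t-norms and their De Morgan dual t-conorms.
t_min  = min
t_prod = lambda a, b: a * b
t_lin  = lambda a, b: max(0.0, a + b - 1.0)        # linear product

def dual(t):
    # De Morgan dual: a ⊥ b = 1 - (1 - a) * (1 - b), with * the given t-norm
    return lambda a, b: 1.0 - t(1.0 - a, 1.0 - b)

a, b = 0.3, 0.8
assert dual(t_min)(a, b) == max(a, b)                       # maximum
assert abs(dual(t_prod)(a, b) - (a + b - a * b)) < 1e-12    # probabilistic sum
assert abs(dual(t_lin)(a, b) - min(1.0, a + b)) < 1e-12     # bounded sum
```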
Interpreting Basic Fusion Rules. As already said, the conjunctive aggregation makes sense if all the sources are considered as equally and fully reliable, while the disjunctive aggregation corresponds to a weaker reliability hypothesis, namely, when in the group of sources there is at least one reliable source for sure, but it is not known which one. In the conjunctive case, the min operation corresponds to a purely logical view of the combination process: the source which assigns the least possibility degree to a given value is considered as the best-informed with respect to this value. Note that with min, when all sources perfectly agree (∀i ≠ j, πi = πj), there is no reinforcement effect. The idempotence of min deals with the possible redundancy of sources due to common background and duplicated information. On the contrary, with ∗ = product, if all the sources agree that a value x = u is not fully possible, then this value will receive a possibility degree strictly smaller than mini=1,n πi(u), i.e., the lack of complete possibility is reinforced; a necessary condition for choosing such an operation is the independence of the sources. This assumption may be better suited to sensor fusion problems than to expert opinion pooling. It is possible to use the linear product max(α + β – 1, 0) instead of the product in (6.16); this is a very drastic combination rule that discards elements that both sources consider as little possible. It has been proposed by Boldrin and Sossai (1995) in a possibilistic logic setting (up to normalization), and by Mundici (1992) with a different terminology. The drastic reinforcement effect is explained in the latter reference by assuming that in the set of sources, a given number of them lie, but they are not identified. For instance, assume n independent sources, k of which lie. Assume source i says x ∈ Ei ⊆ U. Any value that is rejected by more than k < n sources is truthfully rejected.
Values rejected by k sources or less cannot be discarded. The information supplied by each source is thus of the form πi(u) = 1 if u ∈ Ei and k/(k + 1) otherwise. Besides, the more sources agree on a value, the more reliable this value. It can be checked that π∧(u) = max(0, ∑i=1,n πi(u) – n + 1) = 0 if and only if x = u is rejected by more than k sources, and π∧(u) ∈ (0, 1] then reflects the plausibility of x = u, namely the number of sources that consider x = u as possible, when this number is greater than n – k. Similar comments can be made for disjunctive fusion rules. Redundancy tolerance is achieved by the maximum operation. In the case of independent sources the probabilistic sum loses more information than the maximum, and the bounded sum even more so. More generally, if the aim of the fusion is just to visualize the proportion of sources that claim that some value is possible, one may choose a trade-off approach to fusion by preferring the arithmetic mean, or some other mean of possibility values:

πAM(u) = (1/n) ∑i=1,n πi(u).
This operator is consistent with an approach to the pooling of imprecise information of the form x ∈ Ei ⊆ U for each source i, by building a random set where each set-valued outcome Ei ⊆ U has probability 1/n. The one-point coverage function is πAM(u) = k/n if k sources claim that x = u is possible. This fusion mode is also coherent with the so-called probabilistic sets (Hirota, 1981; Czogala and Hirota, 1986), that is, fuzzy sets with random membership grades. This is a way of putting together fuzzy sets supplied by several sources, viewed as a single random one. Then πAM supplies the expected membership function. This statistical approach to the aggregation of fuzzy sets has been extended to interval-valued membership functions by Kaufmann (1988). Using a geometric mean instead of the arithmetic mean preserves the conjunctive nature of the operator, since a value considered impossible by a source is also ruled out by the fusion process.
Comparison with Probabilistic Fusion. It is interesting to compare the above setting with the probabilistic modes of fusion. The use of the product operator exactly matches current practice in maximum likelihood reasoning, where independence is often assumed, given the similarity between possibility distributions and likelihood functions, and corresponds to equation (6.2). On the contrary, the minimum operator applied to likelihoods corresponds to a strong dependence between sources that expresses redundancy. In terms of likelihoods, viewing πi as modelling P(ui | ·), one must understand the equality π = min(π1, π2) as expressing the inequality P(u1, u2 | u) ≤ min(P(u1 | u), P(u2 | u)), because a possibility distribution represents the least specific representation of the available information, which is always open to improvement (i.e., π may become π' < π upon the arrival of new information). Clearly the above inequality expresses ignorance about dependence rather than independence.
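Returning to the "k liars among n sources" reading of the linear product discussed above, a sketch with made-up reports Ei:

```python
# Each source gives pi_i(u) = 1 if u in E_i, else k/(k+1); the n-ary linear
# product max(0, sum_i pi_i(u) - n + 1) vanishes exactly on the values
# rejected by more than k sources. Reports are illustrative.
U = {1, 2, 3, 4, 5}
E = [{1, 2}, {1, 2, 3}, {2, 4}]
n, k = len(E), 1                                  # at most one lying source

def pi_i(Ei, u):
    return 1.0 if u in Ei else k / (k + 1)

def pi(u):
    return max(0.0, sum(pi_i(Ei, u) for Ei in E) - n + 1)

assert pi(2) == 1.0                               # accepted by every source
assert pi(1) == 0.5                               # rejected by exactly k sources
assert {u for u in U if pi(u) == 0.0} == {3, 4, 5}  # rejected by > k sources
```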
Using the linear product max(0, a + b – 1) instead of the product in a probabilistic likelihood setting accounts for strong negative correlation. More generally, (6.2) can be weakened into a separability condition P(u1, u2 | u) = F(P(u1 | u), P(u2 | u)) where max(0, a + b – 1) ≤ F(a, b) ≤ min(a, b). It was proved by Dubois (1986) that in the probabilistic setting, the only possible form of the function F is a Frank t-norm (see Fodor and Yager, 1999). Although the Bayesian setting may appear to be restrictive because the inequality P(u1, u2 | u) ≤ min(P(u1 | u), P(u2 | u)) enforces a conjunctive behavior on fusion operators, nothing prevents one from introducing other fusion modes and defining, for instance, a disjunctive probabilistic operator. In that case one does not consider
P(u1, u2 | u) = P(({u1} × U) ∩ (U × {u2}) | u) but its dual P(u1 or u2 | u) = P(({u1} × U) ∪ (U × {u2}) | u) = P(u1 | u) + P(u2 | u) – P(u1, u2 | u) instead (Meizel and Piat, 1997). It justifies the use of the probabilistic sum in (6.17) for independent sources, as proposed by Hirota et al. (1991). Hence the main limitations of the Bayesian approach are the necessity of a prior probability and of the normalization of resulting distributions, not the impossibility of implementing various combination modes. However the combination mode must be expressible by a logical expression involving the sources, and the fusion modes that can be expressed are nevertheless limited by the set-inclusion monotonicity property of probabilities, which forbids averaging modes since the conditional probability P(u1, u2 | u) necessarily lies outside the interval [min(P(u1 | u), P(u2 | u)), max(P(u1 | u), P(u2 | u))]. The min and the product operations can also be justified in the setting of upper and lower probabilities, as best possibilistic approximations of intersections of probability families (Dubois and Prade, 1992c; Dubois et al., 1997). In the random set setting, the three basic t-norms can be recovered from the intersection of random sets, of which the πi's are the one-point coverage functions (Goodman, 1982). The product operator coincides with the unnormalized Dempster rule of combination for the plausibility of singletons (Smets, 1988). See Dubois and Prade (1989b), Dubois and Yager (1992), Gebhardt and Kruse (1998) for more justifications in the random set setting. Other fusion modes such as disjunctive and averaging ones can be justified likewise. A disjunctive fusion corresponds to a union of probability families, or of random sets. A trade-off operation like the arithmetic mean is very similar to the consensus method of convex mixing of probabilities, or to the convex mixing of random sets.
Normalized Conjunctive Fusion.
An important issue with conjunctive combination as defined by (6.16) is the fact that the result may be subnormalized, i.e., it may happen that ∀u, π∧(u) < 1. In that case it expresses a conflict between the sources. Clearly the conjunctive mode makes sense if all the πi's significantly overlap, i.e., ∃u, ∀i, πi(u) = 1, expressing that there is at least one value of x that all sources consider as completely possible. If ∀u, π∧(u) is significantly smaller than 1, this mode of combination is debatable since, in that case, at least one of the sources or experts is likely to be wrong, and a disjunctive combination might be more advisable. When (6.16) provides subnormal results (sup π∧ < 1), one may think of renormalizing π∧, thus leading to the new operator (here written for n = 2 for simplicity)

∀u ∈ U, π(u) = π∧(u) / h(π1, π2)    (6.18)
where π∧(u) = π1(u) ∗ π2(u) and h(π1, π2) is the height of the intersection of π1 and π2, defined by

h(π1, π2) = supu∈U π1(u) ∗ π2(u) = sup π∧
(6.19)
h(π1, π2) is a natural measure of overlap between two possibility distributions. It is called a consistency index. Of course, (6.18) requires that h(π1, π2) be strictly positive, that is, the two sources must not be in total conflict. The normalized conjunction enforces the assumption of full reliability of the sources, since all values that are considered impossible by one source, even if considered possible by the others, are rejected. It presupposes that sources are reliable even if very conflicting, so that the values that remain after pooling contain the true one, however little possible they may be. This normalized operator is very analogous to Dempster's rule of combination of Shafer (1976)'s belief functions, which, from a random set point of view, is nothing but a normalized random set intersection under a stochastic independence assumption. Moreover for ∗ = product, (6.18) coincides (up to the normalization factor) with Dempster's rule applied to consonant belief functions (which are mathematically the same as necessity measures), for the computation of the plausibility of singletons. It has also been pointed out (Dubois and Prade, 1988a) that on a binary frame, i.e., a 2-element set U = {u, ¬u}, the normalized conjunctive operator (6.18), with ∗ = product, coincides with the rule of combination used in the famous MYCIN expert system (Buchanan and Shortliffe, 1984), interpreting the MYCIN certainty factor CF(u) as π(u) – π(¬u). Moreover for ∗ = product, the normalized rule of combination (6.18) is still associative; otherwise it may not be so. For instance, the normalized minimum is not associative, only quasi-associative (see Section 6.4.2). These remarks indicate that most if not all pooling operations commonly used in the literature are conjunctive. Yet another normalisation procedure consists in letting π(u) = π∧(u) + 1 – h(π1, π2) (Yager, 1985); it comes down to turning the level of conflict into a level of uncertainty on the set U.
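A small numerical sketch of (6.18), (6.19) and Yager's variant, with illustrative distributions in partial conflict:

```python
# Normalized conjunction with * = min; the two distributions are made up.
pi1 = {1: 1.0, 2: 0.5, 3: 0.25}
pi2 = {1: 0.25, 2: 0.75, 3: 1.0}

pi_and = {u: min(pi1[u], pi2[u]) for u in pi1}        # unnormalized conjunction
h = max(pi_and.values())                               # consistency index (6.19)
assert h == 0.5                                        # partial conflict

renorm = {u: v / h for u, v in pi_and.items()}         # (6.18)
yager = {u: v + 1.0 - h for u, v in pi_and.items()}    # conflict -> uncertainty

assert max(renorm.values()) == 1.0                     # renormalized to height 1
assert yager[2] == 1.0 and yager[1] == 0.75
```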
Ordinal Fusion. Note that the operations min and max, when applied to possibility distributions, can be used even if the possibility degrees are not numerical, but simply belong to a totally ordered scale. They however assume that the levels of possibility used by the various sources are commensurate. Combining several comparative possibility distributions is impossible as such, due to Arrow's theorem on the impossibility of combining ordering relations. On the contrary, the use of product and its De Morgan dual assumes that possibility degrees are genuine numbers. In any case, the results of the pooling should be interpreted in an ordinal way, i.e., as a ranking of the values of the parameter x under study, in terms of their respective plausibility. It is sometimes difficult to interpret the resulting degrees of possibility in a frequentist way. However ordinal information is often sufficient for practical purposes. Of course the normalized conjunctive operator (6.18) makes no sense in the ordinal setting. A qualitative normalization exists
which consists in raising to 1 the possibility level of the values that already have the greatest possibility degree, leaving the other degrees unchanged, thus generalizing the qualitative conditioning (6.8).
When Fusion is not Appropriate. Assume that two pieces of information describing a situation are to be merged, one that stems from general considerations on similar situations, while the other is a piece of direct evidence on the situation. Then insofar as these pieces of evidence conflict, combination must be avoided and priority should be given to the most specific information. This idea has been around since the mid-eighties in the artificial intelligence literature (e.g., Poole, 1985), in reaction to problems encountered in uncertainty-handling rule-based systems such as MYCIN (Buchanan and Shortliffe, 1984), where fusion rules were sometimes blindly applied to combine conflicting inferences. For instance, if Tweety is a bird and most birds fly, then probably Tweety flies; but if we also hear that Tweety is a penguin, then, because penguins do not fly, we have another argument claiming that Tweety does not fly. Viewing this case as one of fusion from two sources is strange, because it will result in uncertainty about whether Tweety flies or not. However it is clear that Tweety does not fly, because penguins are an exceptionally small class of birds and the generic information about birds is not relevant when information about the more specific class of penguins is available. In most fusion problems it is hard to tell whether a source is more reliable than another on the basis of specificity. However, when such specificity-driven reasoning can be carried out about sources, blind combination can be avoided. This is typically the case when the pieces of information to be fused come as conclusions obtained from a knowledge base. See Dubois and Prade (1988c, d) for discussions about when combination is licit or not in the framework of fuzzy expert systems.
This is also the case, for instance, when forecasting the life duration of a building. We may possess some generic information about the life duration of a class of buildings similar to the one under consideration, with respect to a limited set of attributes. We may also have some specific extra information about the particular building under study, that contradicts the generic information. Again the latter seems to be more reliable, and should have priority. The combination of a default fuzzy value with a specific fuzzy value is handled via a form of prioritized conjunction described in Section 6.5.2. 6.4.2
Expected Formal Properties of Combination Rules
Several authors (e.g., Cheng and Kashyap, 1989; Hajek, 1985; Bardossy et al., 1993; Bloch, 1996; Yager, 1997), not necessarily in the framework of fuzzy set and possibility theory, have discussed fusion operations in terms of required algebraic properties. Indeed, a combination law looks all the better if it possesses such nice algebraic properties. However this is sometimes at the expense of expressive power. Let us discuss them briefly in view of the above basic operators.
Closure. The closure property is one that is often used without being explicitly stated. It says that if some representation framework is used then the result of the
combination should also belong to that framework. For instance, any proponent of probability theory would assume that pooling two probability measures should produce a probability measure. Similarly, in proposing fuzzy set-theoretic operations, Zadeh (1965) took the natural requirement that the intersection or the union of two fuzzy sets is still a fuzzy set. This kind of closure assumption is natural once we wish to stay within a given mathematical framework. Note that the closure property can be expressed at two levels in the possibilistic framework, namely pointwise on the possibility distributions (equivalently, on the corresponding fuzzy sets) or eventwise on the possibility (or equivalently on the necessity) measures. These closure properties are not equivalent, as shown by the following result (Dubois and Prade, 1990). The only way of combining possibility measures Π1, …, Πn into a possibility measure Π in an eventwise manner (i.e., ∀A, Π(A) is a function of Π1(A), …, Πn(A) only) is a max-combination of the form

∀A, Π(A) = max(f1(Π1(A)), …, fn(Πn(A)))
(6.20)
where fi is a monotonically increasing function such that fi(0) = 0, ∀i, and ∃j, fj(1) = 1. Function fi modifies the shape of the possibility distribution πi underlying Πi (since πi(u) = Πi({u})). An example of an eventwise possibility consensus function is the weighted maximum operation, i.e.

∀A, Π(A) = maxj=1,n min(λj, Πj(A))
(6.21)
with maxj=1,n λj = 1, where λj represents the relative importance of the source yielding Πj; see (Dubois and Prade, 1986) on weighted maximum and minimum operations. In (6.21), the minimum can moreover be changed into a product, or into the linear product max(0, α + β – 1), and more generally into any operation ∗ increasing in both places and such that 1 ∗ 1 = 1, 0 ∗ 1 = 0 = 1 ∗ 0. Note that the eventwise closure property sanctions the pointwise disjunctive rule of combination between possibility distributions but may invalidate useful combination modes such as the conjunctive mode, since in the above elementary example only the disjunctive operator is coherent with expression (6.20). Here only the pointwise closure property is required. Note that the eventwise closure assumption in probability theory not only justifies, but also enforces, the consensus method based on linear convex sums (Cooke, 1991).
Non-Decreasingness. It is clear that when degrees of confidence α1, …, αn are combined in a fusion process by a function f, increasing one degree of confidence should not decrease the overall confidence degree f(α1, …, αn). Hence the aggregation function should be non-decreasing. The basic fusion modes discussed above use non-decreasing functions.
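The weighted maximum (6.21) can be sketched as follows; the importance weights and possibility values are illustrative.

```python
# Weighted maximum (6.21): Pi(A) = max_j min(lam_j, Pi_j(A)),
# with max_j lam_j = 1. Weights and values are made up.
lam = [1.0, 0.5]                                    # source 2 less important

def fused(values):                                  # values[j] = Pi_j(A)
    return max(min(l, v) for l, v in zip(lam, values))

assert fused([0.25, 1.0]) == 0.5    # the low-importance source is capped at lam_2
assert fused([0.75, 0.25]) == 0.75  # dominated by the fully important source
assert fused([1.0, 1.0]) == 1.0
```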
Commutativity and Associativity. Commutativity is good when sources of information are exchangeable. Associativity is not absolutely required. For instance, the averaging combination rule underlying the Kalman filtering technique is not associative as such, since it accounts for an ordering of the sources. A weaker property such as quasi-associativity is often sufficient (Yager, 1987): a combination operation f is quasi-associative if and only if there is an associative operation ∗ and an invertible function φ such that f(α1, α2, …, αn) = φ(α1 ∗ α2 ∗ … ∗ αn). Then the main advantage of associativity, i.e., modularity of the combination when integrating the information from an (n + 1)th source, remains. Namely, if α∗ = f(α1, α2, …, αn) then f(α1, …, αn, αn+1) = φ(φ–1(α∗) ∗ αn+1). This is what happens in Kalman filtering-based fusion, where pieces of information arrive in sequence and are absorbed one after the other. By definition, the triangular norms and conorms used in the previous section are commutative and associative.

Continuity. The continuity requirement is natural on continuous referentials. However, conjunctive fusion operations are often discontinuous in the presence of conflicting pieces of information. It means that in the neighborhood of conflict, normalized conjunctive operations are numerically unstable and should be avoided. On the contrary, disjunctive operations remain continuous. As pointed out in (Dubois and Prade, 1988a), the normalized operator may be very sensitive to rather small variations of possibility degrees around 0; indeed the operator is not continuous in the vicinity of the total conflict expressed by h(π1, π2) = 0. This sensitivity was first noticed for Dempster's rule of combination in the belief function setting by Zadeh (1984).
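The quasi-associativity property above can be sketched with the arithmetic mean, which is not associative but decomposes through an associative sum (a minimal illustration, not tied to any particular fusion rule of this chapter):

```python
# Quasi-associativity sketch: the arithmetic mean is not associative, but
# f(a1,...,an) = phi_n(a1 + ... + an) with phi_n(x) = x / n, where + is
# associative. A new input can then be absorbed modularly:
# f(a1,...,an,b) = phi_{n+1}(phi_n^{-1}(mean) + b).

def mean(values):
    return sum(values) / len(values)

def absorb(current_mean, n, new_value):
    """Update the mean of n values with one more input, without re-reading them."""
    total = current_mean * n              # phi_n^{-1}: recover the running sum
    return (total + new_value) / (n + 1)  # phi_{n+1}

data = [0.2, 0.8, 0.5]
m3 = mean(data)
assert abs(absorb(m3, 3, 0.9) - mean(data + [0.9])) < 1e-12
```

This modularity is exactly what sequential fusion schemes such as Kalman-style updating exploit: the full history need not be stored.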
Dubois and Prade (1992a) point out that this sensitivity problem also occurs in probability theory when merging likelihood functions in Bayes theorem with a conditional independence assumption. Example: Assume U = {u1, u2, u3}, a uniform prior, and consider the fusion of the likelihoods P(x1m = u1 | x = u) = L(u1 | u) and P(x2m = u3 | x = u) = L(u3 | u), using (6.3) (where xim is the measured value given by a source si). Assume α » 1 – α (» means "much larger than") and
L(u1 | u1) = α, L(u1 | u2) = 1 – α, L(u1 | u3) = 0,
L(u3 | u1) = 0, L(u3 | u2) = 1 – α, L(u3 | u3) = α.
In other words, when source s1 says x = u1 it is almost surely right, but maybe the true value is u2; as for source s2, when it says x = u3 it is almost surely right, but maybe the true value is u2. Hence P(u2 | u1, u3) = 1 is obtained from Bayes theorem under uniform priors and (6.2). If the pair (0, α) is changed into (ε, α – ε) in the above specifications, while 1 – α remains the same and ε « 1 – α, we obtain for α = 0.9, ε = 0.01:
P(u1 | u1, u3) = P(u3 | u1, u3) ≈ 0.32 and P(u2 | u1, u3) ≈ 0.36. For α = 0.9, ε = 0.001, we get: P(u1 | u1, u3) = P(u3 | u1, u3) ≈ 0.075 and P(u2 | u1, u3) ≈ 0.85. This shows that in some cases Bayes rule is not robust to small variations of the numerical inputs either.

Idempotence. This means that if two sources supply the same data, the result of the combination should not alter these pieces of data. This property is not always to be accepted. Indeed, if each source regards an alternative as surprising for independent reasons, it seems natural to conclude that the alternative should be very surprising, since there are different reasons for considering it as such. When information about independence is not available, an idempotent operator appears to be more cautious. Indeed, an idempotent operation can even cope with redundant information. In conclusion, adopting idempotence is really a matter of context. It looks natural in expert opinion pooling, where experts often share a common background. It is also a natural property for estimation processes, since the best representative of a set of identical values is precisely this value (Yager, 1997). Note that an idempotent non-decreasing combination lies between min and max. In the scope of fusion, the two latter appear as the most natural idempotent operations, because means proper are debatable for this purpose. Kosko (1986a) also requires idempotency in his analysis of fuzzy knowledge combination operations. His view is to reject the arithmetic mean and the median, because he claims that the combination operation should become cautious as the gap between the least and the greatest membership estimate increases.

Hybrid Behavior. Cheng and Kashyap (1989) have defended the auto-duality property, namely operations t that satisfy (in set-theoretic notation) E1 t E2 = (E1c t E2c)c, that is

π1(u) t π2(u) = 1 – (1 – π1(u)) t (1 – π2(u))   (6.22)
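Returning to the Bayes-rule example above, its sensitivity can be checked numerically (a small sketch under the stated assumptions: uniform prior, conditionally independent sources; the function name is illustrative):

```python
# Numerical check of the Bayes-rule sensitivity example above:
# uniform prior on U = {u1, u2, u3}, conditionally independent sources,
# so the posterior is the renormalized pointwise product of likelihoods.

def posterior(l1, l2):
    """Bayes posterior under a uniform prior: normalized product of likelihoods."""
    raw = [a * b for a, b in zip(l1, l2)]
    s = sum(raw)
    return [r / s for r in raw]

alpha, eps = 0.9, 0.01
# L(u1 | u) for u = u1, u2, u3, with the pair (0, alpha) perturbed to (eps, alpha - eps):
L1 = [alpha - eps, 1 - alpha, eps]
# L(u3 | u):
L2 = [eps, 1 - alpha, alpha - eps]
p = posterior(L1, L2)
# p[0] = p[2] ≈ 0.32 and p[1] ≈ 0.36, as in the text;
# rerunning with eps = 0.001 pushes p[1] up to ≈ 0.85.
```

The tiny perturbation ε thus swings the posterior of u2 from 1 down to about one third, illustrating the lack of robustness.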
The meaning of (6.22) is a neutrality property with respect to the membership scale that contains the membership grades π1(u) and π2(u). As a consequence, it implicitly rejects conjunctive and disjunctive modes of combination, since intersection and union exchange under De Morgan's law and (6.22) expresses an invariance property under this law. Hence auto-duality cannot be used as a universal property. It looks natural if the numbers π1(u) and π2(u) reflect preference intensities, in the scope of social choice (Dubois and Koning, 1991). Such operations, when commutative and increasing, are called symmetric sums. A symmetric sum (Silvert, 1979) is always of the form
α t β = f(α, β) / (f(α, β) + f(1 – α, 1 – β))
for some function f such that f(0, 0) = 0. Bloch (1996) has advocated such kinds of fusion operations because their behavior (conjunctive, disjunctive, compromise) sometimes depends on the values that are combined. While f(α, β) = α + β corresponds to the arithmetic mean, f(α, β) = α·β corresponds to an associative operation on (0, 1), namely α t β = α·β / (α·β + (1 – α)·(1 – β)), that displays such a hybrid behavior: α t β ≥ max(α, β) when α > 1/2, β > 1/2 (disjunctive consensus); α t β ∈ [α, β] when α ≥ 1/2 ≥ β (trade-off); α t β ≤ min(α, β) when α < 1/2, β < 1/2 (conjunctive consensus). Moreover, it is discontinuous when α = 0, β = 1 (total conflict), as soon as f(0, 1) = 0. This is not surprising since the denominator is a kind of normalization factor. Normalized conjunctive operations also display such hybrid behaviors and such discontinuities, but are not necessarily autodual. Kuncheva and Krishnapuram (1996) have modelled such types of hybrid behaviors by embedding more standard aggregation operations into a more elaborate fusion scheme where the degree of consensus between sources intervenes and is compared to a threshold, above which consensual opinions reinforce each other, and below which they attenuate each other.

Transformation Invariance. This property requires that the result of the fusion should not depend on the membership scale. Namely, changing the possibility distributions πi using some modifier m, a continuous, monotonically increasing function, by turning πi into m(πi), the fusion operator ƒ should be such that m(ƒ(π1,…, πn)) = ƒ(m(π1),…, m(πn)). This property is very demanding and forces the merging operator to be ordinal in nature, namely such that ƒ(π1(u),…, πn(u)) ∈ {π1(u),…, πn(u)} (Ovchinnikov, 1998). So only the minimum and the maximum remain, as well as order statistics (which select the kth greatest figure in {π1(u),…, πn(u)}).

Convergence.
Some authors suggest that as the number of sources increases the result of the fusion should tend to deliver the "true" value of the variable of interest. Assuming perfectly reliable sources, Park and Lee (1993) combine fuzzy numbers in such a way that the fusion of the modal values of the fuzzy numbers converges to the true value of x (which presupposes that the fusion operator ƒ be convex), and that the imprecision of the result eventually vanishes. However these conditions are very drastic and seem to be more natural requirements for an estimation method than for a fusion process whose aim is to extract valid and faithful information from what sources deliver, rather than to supply a representative value.
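Returning to the symmetric sums discussed above, the hybrid behavior of the case f(α, β) = α·β can be checked directly (a minimal sketch with illustrative values):

```python
# Symmetric sum with f(a, b) = a*b (Silvert, 1979): an autodual, associative
# operation on (0, 1) whose behavior depends on where the inputs lie
# relative to 1/2.

def sym_sum(a, b):
    return (a * b) / (a * b + (1 - a) * (1 - b))

# Disjunctive when both inputs exceed 1/2:
assert sym_sum(0.8, 0.7) > max(0.8, 0.7)
# Trade-off when the inputs straddle 1/2:
assert 0.3 <= sym_sum(0.8, 0.3) <= 0.8
# Conjunctive when both inputs are below 1/2:
assert sym_sum(0.3, 0.2) < min(0.3, 0.2)
# Autoduality (6.22): t(a, b) = 1 - t(1 - a, 1 - b)
assert abs(sym_sum(0.8, 0.3) - (1 - sym_sum(0.2, 0.7))) < 1e-12
```

The discontinuity at total conflict is also visible: sym_sum(0, 1) raises a division by zero, since f(0, 1) = 0 annihilates the normalization factor.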
Adaptiveness. The 2-source elementary example at the beginning of this section points out that the overlap between the pieces of information affects the choice of the proper combination rule, since the absence of overlap prevents the use of conjunctive operators. In particular, when E1 ∩ E2 ≠ Ø, the assumption that both sources are reliable becomes less and less plausible as the overlap between E1 and E2 becomes smaller. Hence one might request that such an overlap be evaluated and be used as a source of information in its own right. The advantage is to obtain an adaptive combination operator that gradually turns from a conjunctive combination into a disjunctive one as the disagreement between the sources increases. This practical requirement may be incompatible with formal ones, such as associativity for instance. It seems more suitable for sources with a common background, whose near-disagreement suggests that one of them is wrong. When sources refer to different backgrounds, the reasons why source 1 supplies E1 and source 2 supplies E2 might be distinct; computing E1 ∩ E2 then legitimately produces a significant improvement of the precision of the information. Adaptive rules are considered in the following section. Another view of adaptiveness is developed by Kosko (1986a), using a "leniency" condition and the suggestion that the result of the combination should depend on E1 ∩ E2 and E1 ∪ E2.

6.5 REFINED FUSION MODES

The basic symmetric fusion modes are generally not sufficient in practice, for several reasons. A first reason may be that some sources are known to be more reliable than others. In order to take this information into account, weighted or prioritized fusion schemes must be envisaged. Even if the reliability of sources is unknown, one may suspect that not all of them are reliable and one may have some idea about the proportion of good sources. This leads to quantified fusion.
Lastly, even if nothing is known about the sources, one may get an idea of how many of them are good by studying the conflicts among them. This leads to adaptive fusion modes.

6.5.1 Assessment of the Reliability of Sources
In this subsection, it is assumed that something is known about the reliability of the sources. The problem is how to model this information. In the standard Bayesian setting, for instance, this issue seems to be embedded in the standard model: since each source supplies a point value, the reliability of the source is described by the shape of the likelihood function; the less reliable the source, the larger the standard deviation of the distribution. However, one may attach to each source a weight describing how confident we are about it. In the probabilistic consensus method of Cooke (1991), such weights are attached to the experts supplying the information, and these weights are used to build a weighted convex sum of the distributions supplied by the experts. A first question is how to obtain such a weighting of sources.

Rating the Performance of Experts. In expert opinion pooling, experts are asked questions whose answers are known, and they are rated on the basis of
their replies. The questions pertain to the true values of a series x1, x2,…, xn of "seed" variables; the values of these parameters are either known by the analyst and not known by the experts, or, more often, can be determined afterwards by means of physical experiments or other means. In order to build a meaningful rating system, one must first identify the types of deficiencies experts may be prone to, and then define indices that enable the true answer and the expert answer to be compared so as to detect these deficiencies. This program has been carried out in the probabilistic setting by Cooke (1991). Note that the true value of a seed variable may be ill-known itself, sometimes because the state of the art in the field does not allow for its precise evaluation, or because the available information consists of some histogram. Suppose experts supply possibility distributions in the form of confidence intervals. Experts can be deficient with regard to three aspects:
• inaccuracy: values given by the experts are inconsistent with the actual information about the seed variables. For instance, they always underestimate the true value. Such experts are then said to be inaccurate;
• overcautiousness: the experts are too cautious, because the intervals they supply are too large to be informative, although they are not inaccurate. Such experts are said to be underconfident;
• overconfidence: the value of the seed variable is not precisely known to date, but the experts supply intervals that are too narrow (or point values). Such experts are said to be overconfident.
Sandri et al. (1995) propose an approach to build scoring indices that reflect these issues. Consider a seed variable x whose value u* is precisely known, and let E be the fuzzy set supplied by the expert in order to describe his knowledge about x. Let E(·) be the membership function of E (so that E(·) = πx). In this situation, overconfidence does not appear.
It is easy to see that
– the greater E(u*), the more accurate the expert. Indeed, if E(u*) = 0, E totally misses u*, while if E(u*) = 1, u* is acknowledged as a usual value of x. Hence E(u*) is a natural measure of accurateness;
– if E is a crisp interval [a, b], the wider E, the more imprecise (hence underconfident) the expert. The width of E is then |E| = b – a. When E is fuzzy, the width of E is generalized by
|E| = ∫U E(u) du.   (6.23)
This is a generalized fuzzy cardinality (where cardinality is changed into the Lebesgue measure). Other extended cardinalities exist for evaluating imprecision (see Klir and Folger, 1988). In our situation, where E is a finite nested random set, the equality |E| = ∑i=1,m |Ai|·pi can be established, where the Ai's denote the level cuts of E (as in Section 6.3.2). This evaluation must be rescaled so as to account for the
ever-present residual uncertainty e = pm (see Figure 6.1), and so that it rates 1 when a = b (precise response) and 0 when a = u0, b = um, where U = [u0, um] (empty response). A reasonable underconfidence index is then

f(|E|) = (|U| – |E|) / ((1 – e) · |U|)   (6.24)
On the whole, the overall rating regarding a single seed variable can be defined as

Q(E, x) = E(u*) · f(|E|)   (6.25)
that requires the expert to be both accurate and informative in order to score high. One may admit by convention that Q(E, x) = 0 if u* ∉ [a, b], instead of e · f(|E|), if we do not want to account for the residual uncertainty in the calibration index. When the seed variable is not precisely known, the index Q(E, x) can be extended as shown in Sandri et al. (1995). In the probabilistic approach (Cooke, 1991), the expert data is modelled by means of a probability distribution function, and the rating of the expert combines a degree of calibration and a degree of informativeness. The latter is based on entropy instead of cardinality. Probabilistic calibration is quite different from accurateness, because it is a conjoint evaluation pertaining to several seed variables simultaneously, while accurateness deals with single seed variables. Calibratedness can go along with significant inaccurateness (see Sandri et al., 1995). Overall, an expert j is rated by the set {Q(Eji, xi) | i = 1, n} of evaluations. A ranking of experts can be obtained based on the average rating R(j) of each expert. The standard deviation is also useful to check the significance of the gaps between the average ratings of experts. Based on these evaluations, the set of experts can be divided into groups of equally reliable ones. However, it is not clear that the coefficients R(j) have a probabilistic significance such as the probability that the expert is right. Rather, these coefficients indicate a ranking of experts, suggesting who seem to be the most credible ones in the concerned domain.

Sensor Reliability. In the case of sensors, one may take into account the possibility that a sensor is out of order. When the sensor works, it delivers some value um with some imprecision, resulting in a possibility distribution (π(u) = P(um | u), for instance, or the transform of a probability distribution). When it is broken, the reading means nothing (π(u) = 1, ∀u).
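As a rough sketch of the rating index (6.25) above, consider a crisp interval response E = [a, b] with residual uncertainty e = 0 (the function name and the numerical referential are illustrative, not part of the original proposal):

```python
# Sketch of the rating index Q(E, x) = E(u*) * f(|E|) of (6.25), for a
# crisp interval response E = [a, b] and residual uncertainty e = 0, so
# that f(|E|) = (|U| - |E|) / |U|.

def rating(E, u_star, U_width):
    """Score an expert's crisp interval E against the known true value u_star."""
    a, b = E
    accuracy = 1.0 if a <= u_star <= b else 0.0    # E(u*) for a crisp interval
    imprecision = b - a                            # |E|
    informativeness = (U_width - imprecision) / U_width
    return accuracy * informativeness

# Referential U = [0, 10]; true value of the seed variable is 4.
assert rating((3, 5), 4, 10) == 0.8    # accurate and fairly precise
assert rating((0, 10), 4, 10) == 0.0   # vacuous response: no information
assert rating((6, 9), 4, 10) == 0.0    # inaccurate: the interval misses u*
```

The index thus penalizes both overcautiousness (wide intervals) and inaccuracy (intervals missing u*), as required.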
The weight can be viewed as the level of certainty that the sensor is working. Then the actual weighted possibility distribution is obtained via a form of discounting. In possibility theory, this issue is closely related to the problem of certainty qualification of fuzzy statements, namely how to represent qualified statements like "it is α-certain that x is E" where E may be a fuzzy set understood as acting as a fuzzy restriction on the possible values of x. See Chapter 1 of this volume by Bouchon et al. (1999).
If the degree of certainty that a given source is reliable is known, say α, then it is possible to account for this information by changing the possibility distribution π provided by the source into (Yager, 1981b; Prade, 1985; Kosko, 1986a; Dubois and Prade, 1988b):

π' = max(π, 1 – α).   (6.26)
When α = 1 (fully reliable source), π' = π, and when α = 0 (absolutely unreliable source), then ∀u, π'(u) = 1 (total ignorance). Note that α = 0 does not mean that the source lies, but that it is impossible to know whether its advice is good or not. This view makes sense on any bounded ordered scale of plausibility values. There exist other proposals for certainty qualification that behave similarly in limiting cases. Yager (1984) suggested an expression of the form

π' = α ∗ π + 1 – α   (6.27)
by analogy with the discounting of focal elements (which are ordinary subsets corresponding to probable imprecise localizations of the value of x in U) in Shafer (1976)'s approach to belief functions. Here the focal elements are π and U itself, weighted by α and 1 – α respectively, where π is possibly a fuzzy membership function. Operation ∗ is a conjunction, and the following particular cases can be pointed out: for ∗ = min, we get π'(u) = min(1, 1 – α + π(u)), which enlarges the core of π; for ∗ = product, we get π'(u) = α·π(u) + 1 – α; for a ∗ b = max(0, a + b – 1), we recover (6.26). The expression α·π(u) + 1 – α can be directly obtained by applying the discounting procedure available in Shafer (1976)'s belief function framework to the consonant focal elements {A1, A2,…, Am}, where Ai ⊂ Ai+1 for i = 1,…, m – 1, which, together with the set of weights p1, p2, …, pm summing to one, describe the possibility distribution π. Discounting attaches weight α·pi to Ai, ∀i, 1 – α being added to the weight of U. When the information provided by the source is given under the form of an ordinary subset of U, all the above expressions coincide and yield a simple support belief function (Shafer, 1976) focusing on this subset. They are in accordance with Mundici (1992)'s representation of unreliable information when a fixed number of sources may lie, as explained above. Note that each discounting formula also corresponds to a combination of π and α by means of a many-valued implication (Dienes-Rescher for (6.26), the two others being the Reichenbach and Lukasiewicz implications). Then π'(u) can be viewed as evaluating the truth of the following statement: "if the source is reliable (at degree α) then u is restricted by π", in some many-valued logic.
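The three discounting variants of (6.26)-(6.27) can be sketched as follows (finite referential and numerical values for illustration only):

```python
# The discounting operations of (6.26)-(6.27) for a source believed
# reliable with certainty alpha; pi is a possibility distribution on a
# finite U, represented as a dict.

def discount_lukasiewicz(pi, alpha):
    """a * b = max(0, a + b - 1): recovers (6.26), pi' = max(pi, 1 - alpha)."""
    return {u: max(p, 1 - alpha) for u, p in pi.items()}

def discount_min(pi, alpha):
    """* = min: pi'(u) = min(1, 1 - alpha + pi(u)), enlarging the core."""
    return {u: min(1.0, (1 - alpha) + p) for u, p in pi.items()}

def discount_product(pi, alpha):
    """* = product: pi'(u) = alpha * pi(u) + 1 - alpha."""
    return {u: alpha * p + (1 - alpha) for u, p in pi.items()}

pi = {"u1": 1.0, "u2": 0.3, "u3": 0.0}
# alpha = 1: fully reliable source, pi is left untouched.
assert discount_lukasiewicz(pi, 1.0) == pi
# alpha = 0: total ignorance, pi'(u) = 1 everywhere.
assert all(v == 1.0 for v in discount_min(pi, 0.0).values())
# On a crisp subset (degrees 0 or 1) all three expressions coincide:
crisp = {"u1": 1.0, "u2": 0.0}
assert (discount_lukasiewicz(crisp, 0.7) == discount_min(crisp, 0.7)
        == discount_product(crisp, 0.7))
```

The last assertion illustrates the remark above: on an ordinary subset of U, the three discounting formulas yield the same simple support structure.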
6.5.2 Fusion with Unequally Reliable Sources
When sources have unequal reliabilities, the symmetric fusion modes can be extended by allocating priorities. When these values are not available, a priority ranking of sources can be dealt with via a consistency analysis, whereby the information coming from a low-priority source is allowed to modify the information coming from a higher-priority source only if these pieces of information are consistent.

Prioritized Fusion. The above discounting techniques lead to generalized conjunctive and disjunctive fusion rules which, in the idempotent case, are of the form:

π∧α = minj=1,n (αj ∗ πj + 1 – αj)   (6.28)
π∨α = maxj=1,n (αj – αj ∗ (1 − πj))   (6.29)

where operation ∗ stands for minimum, product, or the linear product max(0, a + b – 1). They remain dual to each other. These forms of "weighting" still apply when the minimum and maximum are changed into the product and the probabilistic sum, respectively. For instance, Hirota et al. (1991) use a weighted probabilistic sum for sensor fusion, with ∗ = product. The prioritized max and min combinations are the counterpart in possibility theory of the linear convex combination in probability theory; when a ∗ b = max(0, a + b – 1) they take the more familiar forms:

π∧α = minj=1,n max(πj, 1 – αj)   (6.30)
π∨α = maxj=1,n min(αj, πj)   (6.31)
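The familiar forms (6.30)-(6.31) can be sketched directly (a minimal illustration on a finite referential, with illustrative priorities):

```python
# Prioritized fusion (6.30)-(6.31): sources j supply distributions pi_j
# with priorities alpha_j (max_j alpha_j = 1), on a finite referential U.

def prioritized_min(pis, alphas):
    """pi_and(u) = min_j max(pi_j(u), 1 - alpha_j)  -- conjunctive (6.30)."""
    U = pis[0].keys()
    return {u: min(max(pi[u], 1 - a) for pi, a in zip(pis, alphas)) for u in U}

def prioritized_max(pis, alphas):
    """pi_or(u) = max_j min(alpha_j, pi_j(u))  -- disjunctive (6.31)."""
    U = pis[0].keys()
    return {u: max(min(a, pi[u]) for pi, a in zip(pis, alphas)) for u in U}

pi1 = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
pi2 = {"u1": 0.0, "u2": 1.0, "u3": 0.6}
# Source 1 fully reliable, source 2 reliable only with certainty 0.4:
fused = prioritized_min([pi1, pi2], [1.0, 0.4])
# The low-priority source 2 can only push degrees down to 1 - 0.4 = 0.6:
# fused == {"u1": 0.6, "u2": 0.5, "u3": 0.0}
```

Note how the conjunctive rule lets the fully reliable source veto values (u3), while the discounted source can only attenuate possibility degrees down to the floor 1 – αj.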
These prioritized min- and max-combinations can be interpreted in the possibilistic framework as a kind of integral, just as the convex combination can be interpreted in terms of probabilistic expectation. They are of the form of a Sugeno integral (Sugeno, 1974, 1977)

Sg(u) = maxj=1,m min(πσ(j)(u), g({σ(1),…, σ(j)})) = minj=1,m max(πσ(j)(u), g({σ(1),…, σ(j – 1)}))   (6.32)

where σ is a permutation such that πσ(1)(u) ≥ πσ(2)(u) ≥ … ≥ πσ(m)(u), and g is a fuzzy measure on {1,…, m}, i.e., such that if T is a subset of T' then g(T) ≤ g(T'). To see it, use the fact that the prioritized min and max are special cases of Sugeno integrals where g is respectively a necessity measure and a possibility measure according to any distribution ρ on {1,…, m} such that
αj = maxi=1,…,j ρ(σ(i)) = g({σ(1),…, σ(j)}) for π∨α,
αj = 1 – maxi=j,…,m ρ(σ(i)) = g({σ(1),…, σ(j – 1)}) for π∧α.
Sg(u) is actually the median of the 2m – 1 terms {π1(u),…, πm(u), g({σ(1)}),…, g({σ(1),…, σ(m – 1)})} (see for instance Dubois and Prade, 1980; Grabisch et al., 1992). Tahani and Keller (1990) have applied fuzzy integrals to computer vision for the purpose of information fusion. They used the so-called λ-fuzzy measures of Sugeno (1974), which are decomposable measures such that g(A ∪ B) = g(A) + g(B) + λ·g(A)·g(B), for some λ > – 1. Further on, Yan and Keller (1996) propose to use possibility measures instead, while generalizing the Sugeno integral (6.32) into more general prioritized schemes, called normed and conormed possibility integrals, respectively:

S∨(u) = maxj=1,m πσ(j)(u) ∗ Π({σ(1),…, σ(j)}),
S∧(u) = minj=1,m πσ(j)(u) ⊥ Π({σ(1),…, σ(j)}).

These expressions can be simplified into a form similar to (6.30)-(6.31), provided that min is changed into the t-norm ∗ in (6.31) with αj = Π({j}), and max into the conorm ⊥ in (6.30) with 1 – αj = Π({j}). Bardossy et al. (1993) have also used several weighted pooling techniques, such as a prioritized disjunctive fusion of the form (6.29) for ∗ = product, that is, π∨α = maxj=1,n αj · πj, of which they take the convex hull. See Arbuckle et al. (1995) for the use of Choquet integrals in a weighted information fusion problem.

Consistency-Driven Prioritized Fusion. Sometimes the priorities of the sources are just an indication of which source is reliable and which source is not, as in the case of expert opinion pooling. Then one approach to prioritized fusion is to accept the conjunctive merging of the information from a reliable source s1 and from a less reliable one s2 as soon as the data coming from the latter is consistent with the former. In case of inconsistency, the information given by the less reliable source is simply discarded. The information from source s2 is thus used to refine the information from s1 insofar as the two sources do not conflict.
If π1 is obtained from s1 and π2 from s2, the degree of consistency of π1 and π2 is defined as

h(π1, π2) = supu min(π1(u), π2(u))   (6.33)
and the following combination rule has been proposed (Dubois and Prade, 1988c; Yager, 1991):

π1–2 = min(π1, max(π2, 1 – h(π1, π2)))   (6.34)

Note that when h(π1, π2) = 0, s2 contradicts s1 and only the opinion of s1 is retained (π1–2 = π1), while if h(π1, π2) = 1 then π1–2 = min(π1, π2). Here π2 is viewed as a specific piece of information, while π1 is viewed as a fuzzy default value. The prioritized conjunctive fusion rule is easily interpreted as the conjunctive
combination of the information supplied by source s1 and the information supplied by source s2, the latter being discounted by a certainty coefficient h(π1, π2), such that the degree of possibility that source s2 is wrong is taken as 1 – h(π1, π2). π1–2(u) evaluates, in the sense of fuzzy logic, the expression "x = u is possible for source s1, and either x = u is possible for source s2 or source s2 is wrong (with certainty h(π1, π2) that source s2 is not wrong)". As suggested by Figure 6.2, π1–2 is subnormalized when it differs from min(π1, π2). Hence a renormalization, as in (6.18), can be used. The disjunctive counterpart of this prioritized operator has been proposed by Dubois and Prade (1988d) and Yager (1991):

π1–2 = max(π1, min(π2, h(π1, π2)))   (6.35)
The effect of this operator is to truncate the information supplied by the lower-priority source when it is disjunctively combined with source s1. Again, if the two sources disagree (h(π1, π2) = 0) then π1–2 = π1; if h(π1, π2) = 1 then π1–2 = max(π1, π2).
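The consistency-driven rules (6.33)-(6.35) can be sketched as follows (finite referential and illustrative values):

```python
# Consistency-driven prioritized fusion (6.33)-(6.35): the less reliable
# source s2 refines the more reliable s1 only insofar as they are consistent.

def consistency(pi1, pi2):
    """h(pi1, pi2) = sup_u min(pi1(u), pi2(u)) on a finite U (6.33)."""
    return max(min(pi1[u], pi2[u]) for u in pi1)

def prioritized_conj(pi1, pi2):
    """(6.34): pi_{1-2} = min(pi1, max(pi2, 1 - h))."""
    h = consistency(pi1, pi2)
    return {u: min(pi1[u], max(pi2[u], 1 - h)) for u in pi1}

def prioritized_disj(pi1, pi2):
    """(6.35): pi_{1-2} = max(pi1, min(pi2, h))."""
    h = consistency(pi1, pi2)
    return {u: max(pi1[u], min(pi2[u], h)) for u in pi1}

pi1 = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
# Total conflict (h = 0): s2 is simply discarded by (6.34).
pi2 = {"u1": 0.0, "u2": 0.0, "u3": 1.0}
assert prioritized_conj(pi1, pi2) == pi1
# Full consistency (h = 1): (6.34) reduces to min(pi1, pi3).
pi3 = {"u1": 1.0, "u2": 0.8, "u3": 0.2}
assert prioritized_conj(pi1, pi3) == {"u1": 1.0, "u2": 0.5, "u3": 0.0}
```

The two assertions check the limiting behaviors stated in the text: the low-priority source is ignored under total conflict and merged conjunctively under full consistency.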
Figure 6.2. Prioritized conjunctive and disjunctive fusion rules
Quantified Fusion. An intermediary mode of pooling the πi's, ranging between the conjunctive and the disjunctive modes, consists in assuming that j sources out of m = |M| are reliable. The pooling method then consists in selecting a subset J ⊆ M of sources such that |J| = j, assuming that they are reliable, and combining their opinions conjunctively. Then, considering that it is not known which of these subsets J contains the reliable sources, the intermediary results are combined disjunctively. The following formula is obtained (Dubois, Prade and Testemale, 1988):

π(j)(u) = maxJ⊆M, |J|=j mini∈J πi(u).   (6.36)
Clearly, π(m) = π∧ (for ∗ = min) and π(1) = π∨ (for ⊥ = max), i.e., this mode of aggregation subsumes the conjunctive and disjunctive ones. The above combination rule is equivalent to some quantified aggregation functions also proposed by Yager (1985), and can be easily calculated as follows, for each value of u: i) rank-order the πi(u) such that πσ(1)(u) ≥ πσ(2)(u) ≥ … ≥ πσ(m)(u); ii) then π(j)(u) = πσ(j)(u). What is obtained is an order statistic. This scheme can be extended to fuzzy quantifiers, in order to model assumptions such as "most sources are reliable", "approximately j sources are reliable", etc. (Dubois et al., 1988; Yager, 1985). By "delocalizing" the priority levels αi in (6.30), a (fuzzily) quantified conjunction is obtained, corresponding to the assumption that 'most' sources in M are reliable (rather than 'all' of them). This can be done in the following way, using the same permutation as above. An absolute fuzzy quantifier is viewed as a fuzzy subset Q of the set of integers {0, 1, 2, …, m}. Fuzzy quantifiers such as most are modelled by Q(0) = 0, Q(i) ≤ Q(i + 1). For instance, the requirement that "at least k" sources are reliable will be modelled by Q(i) = 0 if 0 ≤ i < k, Q(i) = 1 for i ≥ k. More generally, the value 1 – Q(i – 1) is going to be used as the priority associated with the i-th most satisfied constraint in a conjunctive aggregation. The quantified conjunctive aggregation operation is then defined by

πQ∧(u) = mini=1,m max(Q(i – 1), πσ(i)(u)).   (6.37)
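A minimal sketch of the quantified fusion rules (6.36)-(6.37), using the order-statistic computation described above (illustrative degrees and quantifier):

```python
# Quantified fusion: "j out of m sources are reliable" (6.36) amounts to
# the j-th order statistic of the pi_i(u); the fuzzily quantified form
# (6.37) turns a quantifier Q on {0,...,m} into priorities on the ranked
# degrees.

def quantified_crisp(degrees, j):
    """pi_(j)(u): j-th largest of the pi_i(u), as in (6.36)."""
    return sorted(degrees, reverse=True)[j - 1]

def quantified_fuzzy(degrees, Q):
    """pi_Q(u) = min_i max(Q(i - 1), pi_sigma(i)(u)), as in (6.37).

    Q is given as a list [Q(0), Q(1), ..., Q(m)]."""
    ranked = sorted(degrees, reverse=True)
    return min(max(Q[i], ranked[i]) for i in range(len(ranked)))

degrees = [0.2, 0.9, 0.6]                     # pi_1(u), pi_2(u), pi_3(u)
assert quantified_crisp(degrees, 1) == 0.9    # "at least one reliable" = max
assert quantified_crisp(degrees, 3) == 0.2    # "all reliable" = min
# Crisp quantifier "at least 2": Q(0) = Q(1) = 0, Q(2) = Q(3) = 1.
Q = [0.0, 0.0, 1.0, 1.0]
assert quantified_fuzzy(degrees, Q) == 0.6    # the 2nd order statistic
```

The last assertion checks the special case noted below: a crisp "at least k" quantifier reduces (6.37) to the k-th order statistic.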
When Q(0) = 0 and Q(i) = 1 for i ≥ 1, it means at least one and πQ∧ = πσ(1). When Q(i) = 0 for i < m and Q(m) = 1, it means all and πQ∧ = πσ(m). When Q(i) = 0 if 0 ≤ i < k and Q(i) = 1 for i ≥ k, (6.37) reduces to πQ∧(u) = πσ(k)(u), as expected. It has been proved that it comes down to computing the median of the set of numbers made of the πσ(i)(u)'s and the Q(i – 1)'s for i = 1, m. Indeed it is once more a fuzzy integral of the form (6.32), letting g({σ(1),…, σ(i – 1)}) = Q(i – 1). What has been computed in (6.37) is an ordered weighted minimum operation (OWmin) (similar to the ordered weighted averages of Yager (1988)); strictly speaking, one should speak of an ordered prioritized min. Similarly, the disjunctive counterpart, called the OWmax operation, can be used: πOWmax(u) = maxi=1,…,m min(αi, πσ(i)(u)). Using the two forms of the Sugeno integral in (6.32), the disjunctive and the conjunctive ones, any OWmax operation can be modelled by an OWmin using suitable
weights (if αi = Q(i), πOWmax(u) = πQ∧(u)). The expression (6.37) can be easily modified to accommodate relative quantifiers Q like 'most', by changing Q(i) into Q(i/m) for i = 0, m – 1 and Q(1) = 1, where Q is increasing (a required proportion of at least k/m amounts to having k non-zero weights among m). The most general form of prioritized conjunctive fusion operations is obtained by attaching a priority level g(T), expressing reliability, to each subset T of sources, where g is a fuzzy measure. The increasingness of g is understood as the fact that if T and T' are two sets of sources and T' contains T, then T' contains more reliable sources than T. In other words, T' is more representative of the set of sources than T. Assuming the sources in T are reliable, the information is conjunctively aggregated over T. We thus obtain an extension of (6.36) of the form:

πgmaxmin(u) = maxJ⊆M min(g(J), mini∈J πi(u)).

This form of quantified fusion has been proposed by Yager and Kelman (1996), and also by Marichal (1998) in a mathematical context. Marichal has proved that πgmaxmin(u), which he calls a "weighted max-min function", is equal to Sg(u), the Sugeno integral (6.32). Moreover, considering the dual prioritized min-max form

πgminmax(u) = minJ⊆M max(g(Jc), maxi∈J πi(u))

Marichal shows that πgmaxmin(u) = πgminmax(u). Since these fusion operations generalize the prioritized min, the prioritized max, and the OWmax and OWmin, this property is directly applicable to make OWmin and OWmax equal as well.

6.5.3 Adaptive Fusion
The renormalization (6.18) applied to conjunctive operators erases the conflict between the sources pointed out by the subnormalization, and creates numerical instability problems in case of strong conflicts. Even if it is good that the result of the combination focuses on the values on which all the sources partially agree (in the sense that none of them gave these values a possibility equal to 0), it would be better if the result also kept track of the conflict in some way. A natural idea for keeping track of a partial conflict is to discount the result given by (6.18) by a priority level corresponding to the lack of normalization, i.e., 1 – h(π1, π2). Namely, 1 – h(π1, π2) is viewed as the degree of possibility that both sources are wrong, since when h(π1, π2) = 1 it is useful and reasonable to suppose that the two sources are right. An example of such a discounted conjunctive combination rule is:

π'(u) = max(π(u), 1 – h(π1, π2)) = max( (π1(u) ∗ π2(u)) / h(π1, π2), 1 – h(π1, π2) )   (6.38)
The amount of conflict 1 – h(π1, π2) induces a uniform level of possibility for all values outside the ones emerging in the subnormalized intersection of π1 and π2, i.e., the result of the combination (6.18) is not fully certain. Clearly, limh(π1,π2)→0 π'(u) = 1, so that the discontinuity effect is coped with, resulting in total ignorance. When h(π1, π2) is low, one could be more optimistic and assume that the discrepancy between the sources is due to one of them being wrong, not both. Then, instead of transferring the amount of conflict as a uniform degree of possibility over the whole referential U, one may restrict it to the union of E1 and E2 (π1 = E1(·), π2 = E2(·)), considering that there is no reason to put a non-zero possibility degree on values that both sources agree to consider as impossible. Hence the more elaborate adaptive operator (Dubois and Prade, 1992b):

πAD(u) = max( min(π1(u), π2(u)) / h(π1, π2), min(max(π1(u), π2(u)), 1 – h(π1, π2)) ).
πAD(u) evaluates at x = u the fuzzy logic expression: "the two sources are reliable and claim that x = u, OR (they are not both reliable AND one of them is reliable and claims that x = u)", where h(π1,π2) evaluates to what extent both sources are reliable, AND and OR being expressed by min and max. The kind of result obtained by this adaptive operator is given on Figure 6.3. The pooling rule tries to keep the assumption that both sources are reliable as long as possible, i.e., the area where the two sources agree is always preferred. In order to avoid the complicated shape and the sudden discontinuity when h(π1,π2) goes to zero, one might use the convex hull of the result of the disjunctive fusion in the second term of the expression of πAD(u). This operator can be extended to prioritized fusion. Rather than choosing between the conjunctive and disjunctive prioritized operators (6.34) and (6.35), it seems more promising to integrate these expressions in the adaptive operator πAD in place of the symmetrical conjunctive and disjunctive combinations.
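The two-source adaptive operator can be sketched on a discretized universe (our naming; min-based conjunction as in the formula above):

```python
def height(p1, p2):
    """h(pi1, pi2) = sup_u min(pi1(u), pi2(u))."""
    return max(min(a, b) for a, b in zip(p1, p2))

def adaptive_fusion(p1, p2):
    """pi_AD(u) = max(min(pi1, pi2)(u)/h, min(max(pi1, pi2)(u), 1 - h))."""
    h = height(p1, p2)
    out = []
    for a, b in zip(p1, p2):
        conjunctive = min(a, b) / h if h > 0 else 0.0
        disjunctive = min(max(a, b), 1 - h)
        out.append(max(conjunctive, disjunctive))
    return out

# fully agreeing sources: h = 1, pi_AD reduces to the conjunction min
print(adaptive_fusion([0.0, 1.0, 0.5], [0.0, 1.0, 0.5]))  # [0.0, 1.0, 0.5]
# totally conflicting sources: h = 0, pi_AD reduces to the disjunction max
print(adaptive_fusion([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # [1.0, 0.0, 1.0]
```

In between these two limit cases the operator trades off both terms, as illustrated on Figure 6.3.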
Figure 6.3. Symmetric adaptive fusion of two sources (π1, π2 and πAD, with the levels h and 1 – h)
The extension of the adaptive operator to more than two sources is not obvious because of its lack of associativity, and even of quasi-associativity. This is the price paid for adaptiveness. An obvious extension to k > 2 sources would mean changing min(π1,π2) and max(π1,π2) into min(π1,…,πk) and max(π1,…,πk) respectively in the expression of πAD. But this extension, though natural, will not be effective, because it only considers the two assumptions "all sources are right" and "one source is right", between which it builds a trade-off. Clearly, the more sources, the more likely they will supply scattered information, so that most of the time the agreement index h(π1,…,πk) = supu mini πi(u) = 0, while maxi πi will yield a very uninformative result, i.e., the operator will behave disjunctively. Other intermediary assumptions about the sources can be envisaged, as considered in the sequel. A first idea to cope with this situation is to assume a given number of reliable sources and use quantified fusion. The choice of the number j of supposedly reliable sources can be guided by the assessment step for expert opinion pooling, or by prior knowledge about sensors. However, guessing the value of j is not easy. A more interesting idea is to extend the adaptive operator on the basis of a consistency analysis, and to derive two values j– and j+ representing a pessimistic and an optimistic evaluation of the number of reliable sources (Dubois and Prade, 1994). Let T ⊆ S be a subset of sources and h(T) = supu mini∈T πi(u) be the index of agreement among the sources in T. Then define

j– = sup{|T|, h(T) = 1} ;  j+ = sup{|T|, h(T) > 0}.

Then it is plausible to assume that at least j– sources in S are reliable (since there is a fully agreeing subset of j– sources) and at most j+ are reliable (since there is a total conflict within all groups of more than j+ sources). It can be shown that
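The consistency indices j– and j+ can be computed by brute force over subsets of sources; a sketch with our own naming, over a common discretized universe:

```python
from itertools import combinations

def group_height(pis, group):
    """h(T) = sup_u min_{i in T} pi_i(u)."""
    return max(min(pis[i][u] for i in group) for u in range(len(pis[0])))

def j_bounds(pis):
    """j- = largest |T| with h(T) = 1; j+ = largest |T| with h(T) > 0."""
    n = len(pis)
    sizes = range(1, n + 1)
    j_minus = max((k for k in sizes if any(group_height(pis, T) == 1
                   for T in combinations(range(n), k))), default=0)
    j_plus = max((k for k in sizes if any(group_height(pis, T) > 0
                  for T in combinations(range(n), k))), default=0)
    return j_minus, j_plus

# three sources: p1 and p2 fully agree somewhere; all three only weakly overlap
p1 = [1.0, 1.0, 0.3, 0.0, 0.0]
p2 = [0.0, 1.0, 1.0, 0.3, 0.0]
p3 = [0.0, 0.2, 0.5, 1.0, 1.0]
print(j_bounds([p1, p2, p3]))  # (2, 3)
```

The exhaustive search over subsets is exponential in the number of sources; it is only meant to make the definitions concrete.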
generally j– < j+. The optimistic (resp. pessimistic) fusion is achieved by the quantified operator π(j+) (resp. π(j–)). So, a natural extension of the adaptive operator πAD to the case of k > 2 sources is

πAD(u) = max( π(j+)(u) / h(j+) , min(π(j–)(u), 1 – h(j+)) )    (6.39)
where h(j+) = max{h(T), |T| = j+}. It can be checked that when |S| = 2, (6.39) coincides with the 2-source expression of πAD. Indeed, if h(π1,π2) = 1, then j+ = j– = 2 and πAD = min(π1,π2). If h(π1,π2) ∈ (0,1), then j– = 1, j+ = 2, π(1) = max(π1,π2) and π(2) = min(π1,π2). For instance, consider the three possibility distributions on Figure 6.4. It can easily be checked that j– = 2, j+ = 3, π(3) = mini=1,3 πi, π(2) = max(min(π1,π2), min(π2,π3), min(π3,π1)). The resulting distribution may have a complex shape; however, it highlights which values are most plausible for the parameter. Moreover, one may again simplify the shape of the resulting distribution by using convex hulls of π(3) and π(2). On Figure 6.4, one immediately sees that the most plausible values are the ones where all sources partially agree (on [u2, u3]) and, to a lesser extent, the ones where two sources strongly agree (on [u4, u5]). It is also instructive to consider the fusion of crisp intervals using the adaptive operator. It is easy to check that then j+ = j– is the maximal number of sources that do not conflict with one another. One computes the maximal subsets of non-conflicting sources in the sense of cardinality. For each such subset the intersection of the overlapping intervals is performed, and then the result is the union of such intersections. This remark points out that the adaptive fusion rule is very close to techniques that cope with inconsistency in logical settings, where maximal consistent subbases of knowledge bases are calculated (see Rescher and Manor, 1970; Benferhat et al., 1997a). In the above fusion rule one might look for maximal consistent subsets of sources in the sense of inclusion, and not in the sense of cardinality. Indeed the adaptive rule (6.39) is no longer idempotent, because duplicating the sources affects the computation of j+ and j–. Using maximality for inclusion would restore the idempotence of the adaptive fusion rule. 
Clearly, elaboration along these lines is a matter of further research.
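The crisp-interval case described above can be sketched directly: find the maximal-cardinality subsets of mutually consistent sources, intersect each, and take the union of the results (function names are ours):

```python
from itertools import combinations

def interval_adaptive_fusion(intervals):
    """Union of intersections of the maximal-cardinality consistent subsets."""
    def common(group):
        lo = max(a for a, _ in group)
        hi = min(b for _, b in group)
        return (lo, hi) if lo <= hi else None  # empty intersection -> None
    for k in range(len(intervals), 0, -1):
        pieces = {common(g) for g in combinations(intervals, k)} - {None}
        if pieces:  # largest k with at least one consistent subset
            return sorted(pieces)
    return []

# two sources agree on [3, 5]; the third is isolated and discarded
print(interval_adaptive_fusion([(0, 5), (3, 8), (10, 12)]))  # [(3, 5)]
```

Duplicating an interval in the input changes the subset cardinalities, which illustrates the loss of idempotence noted above.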
Figure 6.4. The adaptive operator with 3 sources (π1, π2, π3, the quantified fusions π(2) and π(3), the levels h(3) and 1 – h(3), and the resulting distribution over u1,…, u5)
The proposed adaptive fusion rule remains partially ad hoc, and is in need of formal justification, for instance on the basis of a logical conflict analysis on level cuts of the possibility distributions. Variants of the operator (6.39) can also be thought of, where minimum is changed into another t-norm and the discounting method is different, although still of the form of equation (6.27). Some authors have noticed that the values of j+ and j– are not robust to small changes in the distributions. For instance, when going from triangular to trapezoidal distributions, j– may be increased, and the resulting distribution may drastically change (see Pannerec et al., 1998). This defect can be mended by changing j+ and j– into fuzzy integers and using fuzzy quantified fusion to compute substitutes of π(j+) and π(j–). The adaptive rule has been tested in this form or using variants by Deveughele and Dubuisson (1993) in artificial vision, and by Pannerec et al. (1998) in robotics. The concept of possibilistic conflict analysis embedded in the combination rule (6.39) has been exploited by Nifle and Reynaud (1997) in a problem of multisource event-based recognition of trajectories, in the setting of possibility theory. Another problem is to compute the adaptive fusion. Clearly such computation is not so easy, less easy than the updating step of the Kalman filtering technique. See Deveughele and Dubuisson (1995) for computational considerations and Delmotte and Borne (1998) for alternative proposals in the case of n sources, involving adaptive weighted averages of prioritized conjunctive and disjunctive fusion rules.
6.6 FUZZY ESTIMATION
Of considerable interest in many domains is the estimation problem. Assume there are multiple readings u1,…, un for some attribute, stemming from different sources (typically, sensors). The problem of estimation is that of combining these readings to obtain one representative value compatible with the observations. This problem can be solved directly, or indirectly via a fusion step. The latter approach is typical of the probabilistic setting, where observed point-values are cast within a mathematical model of the uncertainty pervading the sources. Each value ui is interpreted as a likelihood function, and these likelihood functions are fused under some assumption. Then a representative value is selected. In the Bayesian approach an expected value is computed; alternatively, the maximum likelihood value is chosen. In the possibilistic setting the selection step is often called defuzzification. In this section, we consider the direct estimation method and its extensions via fuzzy set-based concepts. Fuzzy set theory offers its body of aggregation operations, especially the generalized mean family, to perform direct estimation techniques. Moreover, these techniques can be directly applied to fuzzy observations. Direct estimation differs from fusion in the sense that the aggregation is performed on the measurement scale (horizontal view), while in the fusion process the aggregation is performed on the scale of membership values, in practice degrees of plausibility (vertical view). A bridge between these two views of merging information is provided by the concept of constrained fusion.
6.6.1 Fuzzy Extension of Estimation Techniques
In order to perform an estimation task, some function F, called the estimation function, is used. It takes multiple readings u1,…, un of a parameter x and provides a representative single value. Thus, F(u1,…, un) = u, where u is called the estimated value. Typically the arithmetic mean is chosen, as in least squares methods. Interestingly, the Bayesian approach is capable of justifying this estimation method on the basis of a fusion + selection approach and some additional assumptions (Brown et al. (1992), for instance). More generally, the function F can be a median, a geometric mean, etc. While the choice of the function F is very situation-dependent, intuitively one can require certain basic properties as being almost always necessary for F (Yager, 1997). One property required of the function F is idempotency: if all the ui are the same, then the estimated value should be this common value. A second property often associated with estimation operators is commutativity: the indexing of the arguments does not affect the final aggregated value. In situations in which the underlying space U is the real line, a monotonicity condition is generally required: if ui ≥ vi for all i, then F(u1, …, un) ≥ F(v1, …, vn). An estimation operator F having these three properties is called a mean operator (see for instance Dubois and Prade, 1985; Fodor and Yager, 1998 for a full discussion of mean operators). The prototypical mean operator is the simple averaging operator. While prototypical, the simple average is one among many possible candidate mean operators. Indeed these three properties are sufficient to ensure that F(u1, …, un) lies between min(u1, …,
un) and max(u1, …, un). Other examples of mean operators are the median, the mode and the ordered weighted averaging operators (OWA, see Yager, 1988). Max and min are limit cases of mean operators and are not usually considered as such. Namely, an operator F is a genuine mean if min < F < max. In many environments in which estimation is required, the information provided may be imprecise. For example, a witness may say the object is located at about 2 miles. Furthermore, information provided by human sources often comes in a linguistic format, with values such as close, nearby and far. A natural framework for representing this kind of information is a fuzzy subset, and very often a fuzzy interval. One option is then to use a fusion approach to the merging, as in the previous sections, whereby the degrees of plausibility are combined. This is the vertical merging, while estimation as viewed here is a horizontal merging. The other option is to extend the estimation functions to fuzzy operands. Note that this generalized estimation procedure is not in the spirit of fusion as understood in Section 6.4, since fusion aims at guessing where the potentially true values are, while estimation searches for a representative value only. Namely, the direct estimation method assumes that none of the values is utterly wrong, while fusion aims at discarding such values. In order to address fuzzy estimation, one must use the extension principle from fuzzy set theory (Zadeh, 1965; Dubois and Prade, 1988b; Dubois, Kerre, Mesiar and Prade, 1999). Let F be an ordinary estimation function defined as F: U1 × U2 × … × Un → V, that is, F takes points in the Cartesian space U1 × U2 × … × Un and delivers a value in the space V. In the purely numeric environment all spaces Ui and V are the real line ℝ.
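The OWA family mentioned above contains min, max and the simple average as special cases, and any OWA satisfies the three mean-operator properties; a small sketch (our naming):

```python
def owa(weights, values):
    """Ordered weighted average (Yager, 1988): the weights, summing to 1,
       are applied to the values re-sorted in decreasing order."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

readings = [2.0, 7.0, 4.0]
print(owa([1.0, 0.0, 0.0], readings))   # weight on the largest -> max = 7.0
print(owa([0.0, 0.0, 1.0], readings))   # weight on the smallest -> min = 2.0
print(owa([1/3, 1/3, 1/3], readings))   # uniform weights -> arithmetic mean
```

Because the weights are applied after sorting, the operator is commutative; idempotency and monotonicity follow from the weights being nonnegative and summing to 1.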
Assume A1, …, An are fuzzy observations and Ai(ui) denotes the membership degree of ui in Ai; then we obtain the estimate of these observations, F(A1, …, An) = B, where:

B(v) = sup{mini Ai(ui) : F(u1, …, un) = v}.    (6.40)
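Equation (6.40) can be brute-forced on a finite grid of candidate readings; a sketch with triangular memberships (the grid, names and membership shapes are ours):

```python
from itertools import product

def tri(a, b, c):
    """Triangular membership function with support [a, c] and core {b}."""
    return lambda u: max(0.0, min((u - a) / (b - a), (c - u) / (c - b)))

def extend_estimator(F, observations, grid):
    """B(v) = sup{min_i A_i(u_i) : F(u_1,...,u_n) = v}  -- equation (6.40)."""
    B = {}
    for tup in product(grid, repeat=len(observations)):
        v = F(tup)
        mu = min(A(u) for A, u in zip(observations, tup))
        B[v] = max(B.get(v, 0.0), mu)
    return B

mean = lambda tup: sum(tup) / len(tup)
B = extend_estimator(mean, [tri(0, 2, 4), tri(2, 4, 6)], grid=range(0, 7))
print(B[3.0])  # the cores 2 and 4 average to 3 with full membership -> 1.0
```

The exhaustive enumeration is exponential in the number of observations; for the simple average, the fuzzy-arithmetic shortcut described next is far cheaper.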
Consider the special case where the estimation function F is the simple average, F(u1, …, un) = (u1 + u2 + … + un)/n. To obtain the average of n fuzzy observations we get

F(A1, …, An)[v] = sup{mini Ai(ui) : (u1 + … + un)/n = v} = B(v)

and thus B = (A1 ⊕ A2 ⊕ … ⊕ An)/n, using the fuzzy addition ⊕ (Dubois and Prade, 1987a; Dubois, Kerre, Mesiar and Prade, 1998). Other estimators can be fuzzified in a similar way. Quite another view of fuzzy estimation from fuzzy data is suggested by Yang and Liu (1998). They adopt a maximum likelihood approach in the possibilistic setting where a joint likelihood pertaining to n fuzzy numbers is obtained via the minimum operator applied to parameterized possibility distributions of similar shapes (e.g., Gaussian). Note that it might be useful to complete the fuzzy estimation by a defuzzification step, so as to extract a representative precise value from the process. Many defuzzification methods are available (see Yager and Filev, 1994). However, it might be natural to request that the result be the same if the imprecise readings were defuzzified and then merged via F. Let sel(M) be the precise value representing the fuzzy number M. It comes down to requiring that sel(F(A1, …, An)) = F(sel(A1),…, sel(An)). For the usual arithmetic mean estimator, most existing defuzzification methods violate this property. An important one that respects it is the middle point of the mean interval (Yager, 1981a):

sel(M) = ∫₀¹ (inf Mα + sup Mα)/2 dα
where Mα is the level cut {u, M(u) ≥ α}. This is a Choquet integral, the generalization of the expected value (see Dubois and Prade, 1987b; Fortemps and Roubens, 1996).
6.6.2 Constrained Merging
Yager (1997) considered the issue of including an "intelligent" component in the merging process to address conflicts in the data to be fused, and he used the idea of a combinability function, expressing compatibility, to encode knowledge that helps implement a type of intelligent fusion. First, consider the problem of merging ordinary numbers, F(u1, …, un) = u. One problem that arises in blindly using an estimation operator is that the merged value may not be really satisfactory, because it does not reflect any of the components in the aggregation. For example, if there are two estimates of a person's age, u1 = 60 and u2 = 20, the average is u = 40. This is a value that is not very compatible with either of the components being aggregated. The reason for this lack of compatibility is the attempt to aggregate disparate values. In constructing intelligent fusion systems one may desire to avoid the aggregation of items that are very dissimilar, or, at the very least, provide information that the aggregated value is based upon an aggregation of values that are essentially incompatible. Note that in the fusion processes described in the previous sections, the conjunctive mode is undefined when applied to non-overlapping possibility distributions. In the following we discuss the idea of a combinability relationship, which can be used to convey information about the allowability of fusing elements; more details can be found in Yager (1997) and Yager and Kelman
(1996). This approach leads to bridging the gap between fusion and estimation, and we call it constrained merging. Assume some aggregation function F has been selected, the average for example. A combinability relationship R is a mapping R: U × U → [0,1], such that for any two values u and u', R(u,u') indicates the degree to which it is acceptable to merge the values u and u' under F. The actual form of this relationship will be problem-dependent and as such can be seen as synthesizing a knowledge base which contains meta-information about the problem involving the aggregation operation. For example, when merging ages, 20 and 60 would be very incompatible, and R(20, 60) = 0. While we shall not pursue the issue here, the possibility exists for using fuzzy modeling (Yager and Filev, 1994) to build these combinability functions. While the form of the combinability relation R is very problem-dependent, some problem-independent properties can be associated with R. It is natural to require that R(u,u) = 1 for all u ∈ U. Furthermore, as the distance between u and v increases, R(u,v) should not get larger, that is, if Distance(u, v) ≥ Distance(u, v') then R(u, v) ≤ R(u, v'). We also assume that R is symmetric, R(u, v) = R(v, u). It should be noted that this combinability relationship is close in spirit to the idea of a similarity relationship introduced by Zadeh (1971). On the real line, one may assume that R depends on Distance(u, v) = |u – v| only, and is a decreasing function of |u – v|. Hence there exists a fuzzy interval M, with M(x) = M(–x) (symmetry of R) and M(0) = 1 (reflexivity of R), such that R(u, v) = M(u – v). M is thus a fuzzy number centered in 0. Such a modelling of proximity relations by fuzzy intervals was proposed by Dubois and Prade (1988b). In particular, the image of a fuzzy interval A via R is simply A ô R = A ⊕ M (where (R ô A)[v] = supu {min(A(u), R(u,v))}). 
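The identity A ô R = A ⊕ M can be checked numerically by sup–min composition on a grid (a sketch; step size and names are ours). With A = (1, 6, 11) and M = (–5, 0, 5), fuzzy addition gives A ⊕ M = (–4, 6, 16):

```python
def tri(a, b, c):
    """Triangular membership function with support [a, c] and core {b}."""
    return lambda u: max(0.0, min((u - a) / (b - a), (c - u) / (c - b)))

def compose(A, M, v, grid):
    """(A o R)(v) = sup_u min(A(u), R(u, v)) with R(u, v) = M(u - v)."""
    return max(min(A(u), M(u - v)) for u in grid)

A = tri(1, 6, 11)        # a triangular fuzzy interval (1, 6, 11)
M = tri(-5, 0, 5)        # proximity kernel: R(u, v) = M(u - v)
AplusM = tri(-4, 6, 16)  # A + M by fuzzy addition
grid = [x / 2 for x in range(-20, 42)]  # half-unit steps on [-10, 20.5]
for v in (1, 6, 12):
    assert abs(compose(A, M, v, grid) - AplusM(v)) < 1e-9
print("sup-min composition matches fuzzy addition on the grid")
```

The half-unit grid is chosen so that the crossing points of the piecewise-linear memberships fall exactly on grid points; on a coarser grid the sup is only approximated from below.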
One may alternatively use a relative measure of proximity that depends on min(u/v, v/u) for positive numbers, and get R ô A = A ⊗ M, where ⊗ denotes the product of fuzzy numbers and M is a fuzzy number between 0 and 1, whose membership function is non-decreasing and whose core contains 1 (Dubois and Prade, 1989a). While R has been defined as a binary relationship on U, it can be naturally extended to act on n-tuples from U as follows:

R*(u1, …, un) = minij R(ui, uj).    (6.41)
Let us now turn to the problem of merging fuzzy subsets and see how one can use the combinability relation to generalize the estimation process; more details are given by Yager (1997), Yager and Kelman (1996). Assume A1 , A2 , …, An are a collection of fuzzy subsets which are the basis of the estimation. Referring to (6.40), for each v, B(v) indicates the possibility of v being the estimated value under F in the face of the observations. Introducing a combinability function R imposes the condition that the resulting merging process be one that only allows the aggregation of points that are combinable. This additional requirement leads to a
merged value that must satisfy both (A1, …, An) and R*. Hence, in the situation in which there is a combinability function R, one must modify the estimation process and calculate the merged value FR as

B(v) = FR(A1, …, An)[v] = sup{min(mini Ai(ui), R*(u1, …, un)) : F(u1,…, un) = v}

where again B denotes the fuzzy subset of U corresponding to the merged value. In the above expression, the effect of the combinability relationship R is essentially to reduce the membership grades associated with tuples that are incompatible, and to only allow aggregation between compatible values. The scalar value β = height(B) = maxu B(u) is an indication of the overall compatibility of this aggregation. When trying to merge fuzzy subsets which are far away from each other, it can be checked that β is low. Yager (1997) looked at the properties of this constrained approach to merging information for some notable combinability relationships; let us summarize these results. In the case where it is acceptable to merge any values, R = R+ such that R+(u, v) = 1 for all u and v, we get FR(A1, …, An) = F(A1, …, An), the extension of estimation to fuzzy values described in the previous section. Thus the ordinary estimation function is a special case of constrained merging when all aggregations are completely acceptable. We now consider the opposite extreme, where an element can only be merged with itself, i.e., R = R– such that R–(u, u) = 1 and R–(u, v) = 0 for u ≠ v. In this case it can be shown that the aggregation reduces to the intersection of the arguments: FR(A1, …, An) = A1 ∩ A2 ∩ … ∩ An. This can easily be seen by noticing that, in this case, FR computes the supremum of min(A1(u1), …, An(un)) under the constraints F(u1, …, un) = v and u1 = u2 = … = un, and since F is a generalized mean, F(u, …, u) = u. These results show that constrained merging scans a range between conjunctive fusion and estimation. 
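The two limit cases can be checked by brute force over a grid (a sketch; names are ours): with R+ constrained merging reduces to the extended mean, while with R– only diagonal tuples survive, giving the intersection.

```python
from itertools import product

def constrained_merge(F, observations, R, grid):
    """B(v) = sup{min(min_i A_i(u_i), R*(u_1..u_n)) : F(u_1..u_n) = v}."""
    B = {}
    for tup in product(grid, repeat=len(observations)):
        r_star = min((R(x, y) for i, x in enumerate(tup) for y in tup[i + 1:]),
                     default=1.0)
        mu = min(min(A(u) for A, u in zip(observations, tup)), r_star)
        B[F(tup)] = max(B.get(F(tup), 0.0), mu)
    return B

mean = lambda tup: sum(tup) / len(tup)
A1 = lambda u: 1.0 if u in (1, 2) else 0.0     # crisp set {1, 2}
A2 = lambda u: 1.0 if u in (2, 3) else 0.0     # crisp set {2, 3}
R_plus = lambda u, v: 1.0                      # everything combinable
R_minus = lambda u, v: 1.0 if u == v else 0.0  # only identical values

B_plus = constrained_merge(mean, [A1, A2], R_plus, grid=[1, 2, 3])
B_minus = constrained_merge(mean, [A1, A2], R_minus, grid=[1, 2, 3])
print(B_plus[1.5], B_minus[1.5])  # 1.0 0.0  (R- kills the disparate pair 1, 2)
print(B_minus[2.0])               # 1.0      (the intersection {2})
```

The height of B_minus equals 1 here because the crisp sets overlap; for disjoint observations it drops to 0, which is the low-β signal of incompatibility mentioned above.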
The more restrictive R, the more precise the merged value. It is easy to see that for any combinability function R, R ⊂ R+ and R– ⊂ R. From this it follows that if B is the merged value using a combinability relation R and an idempotent operation F, then

A1 ∩ A2 ∩ … ∩ An ⊆ FR(A1, …, An) ⊆ F(A1, …, An).    (6.43)
Thus the unconstrained merging process always leads to the most imprecise fuzzy set while the totally constrained one leads to the most precise one. Techniques that can help in the calculations are required to implement these ideas. Some results along this line appear in Yager and Kelman (1996), and Dubois, Prade and Yager (1998). Say u, v are two ill-known quantities modelled by fuzzy numbers A and B, and R is a proximity relation (reflexive, symmetrical) on the real line,
acting as a combinability function. The problem is to compute an expression like C = A ⊕R B where C(z) = supu,v {min(R(u,v), A(u), B(v)) : z = u + v}. This is a typical problem of addition in interactive fuzzy arithmetic (Dubois and Prade, 1981). In order to make the computation easy, a general result is first proved. As above, A ô R denotes the image of A via R based on the sup-min composition. The following properties are proved by Dubois et al. (1998):
• A ⊕R B ⊆ [A ∩ (B ô R)] ⊕ [B ∩ (A ô R)], where ⊕ is the standard fuzzy arithmetic operation. This result holds for operations other than ⊕, namely for any extended 2-place function.
• If A = [a,a'] and B = [b,b'] are intervals and R is a crisp proximity relation {(u,v): |u – v| ≤ e} for some positive e, and if a > b' + e or b > a' + e, then A ⊕R B is empty and the above inclusion holds with equality: A ⊕R B = [A ∩ (B ô R)] ⊕ [B ∩ (A ô R)].
• In the fuzzy case and for extended addition ⊕, if R(u,v) = M(u – v) for a symmetric fuzzy number M with modal value 0, this equality also holds. Moreover, Height(A ⊕R B) = Height(A ∩ (B ô R)) = Height(B ∩ (A ô R)).
Since R ô A = A ⊕ M when R(u,v) = M(u – v) for a symmetric fuzzy number with modal value 0, A ⊕R B is easily computed as [A ∩ (B ⊕ M)] ⊕ [B ∩ (A ⊕ M)] via standard fuzzy arithmetic on possibly subnormal fuzzy numbers with the same height. Note that {A(u), R(u,v), B(v)} is a fuzzy constraint network (Dubois, Fargier and Prade, 1996), and the network {A ∩ (B ô R), R, B ∩ (A ô R)} is arc-consistent, albeit in the fuzzy sense (the tightest fuzzy constraints on u and v are attained). In particular, it does not hold in general that R ∩ (A × B) = [A ∩ (B ô R)] × [B ∩ (A ô R)]. The inclusion R ∩ (A × B) ⊂ [A ∩ (B ô R)] × [B ∩ (A ô R)] is strict.
Example. Let us denote by (a, b, c) a triangular fuzzy number with core b and support [a,c]. We use F(u,v) = (u + v)/2.
1. Fusion of two close estimates. A1 = (1, 6, 11); A2 = (4, 9, 14); see Figure 6.1. R is defined by M = (–5, 0, +5). 
So A1 ô R = (–4, 6, 16) and A2 ô R = (–1, 9, 19). Note that the intersection point of the straight lines going respectively through the points of coordinates (a,0), (b,1) and (c,0), (d,1) has coordinates (u*, µ) with

u* = (bc – ad)/(b + c – a – d)  and  µ = (c – a)/(b + c – a – d).
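This breakpoint formula can be checked on the example: the ascending side of A2 ô R passes through (–1, 0) and (9, 1), the descending side of A1 through (11, 0) and (6, 1); a small sketch (names are ours):

```python
def crossing(a, b, c, d):
    """Intersection of the line through (a,0),(b,1) with the line through
       (c,0),(d,1): u* = (bc - ad)/(b+c-a-d), mu = (c - a)/(b+c-a-d)."""
    den = b + c - a - d
    return (b * c - a * d) / den, (c - a) / den

# ascending side of A2 o R vs descending side of A1
u_star, mu = crossing(-1, 9, 11, 6)
print(u_star, mu)  # 7.0 0.8 -> the core {7} at height 0.8 found below
```

The same function reproduces the second example further on: crossing(10, 30, 14, 9) yields (13.2, 0.16).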
APPROXIMATE REASONING AND INFORMATION SYSTEMS
1 B
0.8
A1
A2
0.4
0
1
2.5
4 4.5
6
7.5
9
10.5 11
12.5
14
distance (m)
Figure 6.5. Constrained fusion of close fuzzy numbers
Hence the breakpoint level cuts of A1 ∩ (A2 ô R) are [1, 11] (support); [3, 9] (µ = .4); {7} (µ = .8) (core). The breakpoint level cuts of A2 ∩ (A1 ô R) are [4, 14] (support); [6, 12] (µ = .4); {8} (µ = .8) (core). The breakpoint level cuts of the result are [2.5, 12.5] (support); [4.5, 10.5] (µ = .4); {7.5} (µ = .8) (core). See Figure 6.5.
Figure 6.6. Constrained fusion of remote fuzzy numbers (A2, A3 and the result B; horizontal axis: distance (m))
2. Fusion of two estimates far from each other. A2 = (4, 9, 14); A3 = (25, 30, 35); see Figure 6.2. R' is defined by M' = (–15, 0, 15). Then A2 ô R' = (–11, 9, 29) and A3 ô R' = (10, 30, 50). The breakpoint level cuts of A2 ∩ (A3 ô R') are [10, 14] (support); {13.2} (µ = 0.16) (core). The breakpoint level cuts of A3 ∩ (A2 ô R') are [25, 29] (support); {25.8} (µ = 0.16) (core).
The result is then obtained by computing [(A2 ∩ (A3 ô R')) ⊕ (A3 ∩ (A2 ô R'))]/2, i.e., the triangular fuzzy set with support [17.5, 21.5] and core {19.5} reached at height µ = 0.16. See Figure 6.6. Although the fuzzy numbers used in the example are symmetric, this is not at all required by the approach.
The n fuzzy number case is trickier for n ≥ 3. Some results for the case n = 2 can be generalized to n fuzzy numbers by the following result, which only provides an upper bound in the general case:

⊕R{Ai : i = 1,n} ⊆ ∑i=1,n Ai ∩ (R* ô (×j≠i Aj))

where R* is the n-ary relation R*(u1, …, un) = mini≠j R(ui, uj) and × denotes the fuzzy Cartesian product. In order to get a good upper approximation of ⊕R{Ai, i = 1,n}, it is natural to make the constraint network {(Ai(ui), Aj(uj), R(ui, uj)) | i ≠ j = 1,n} arc-consistent. Computing A*i = Ai ∩ (∩j≠i (Aj ⊕ M)) for all i is enough. Indeed, because R is reflexive, only one propagation step is needed: Ai ⊂ Ai ô R = Ai ⊕ M ⊂ Ai ô R ô R = Ai ⊕ M ⊕ M, etc. Hence A*i ∩ (∩j≠i (A*j ⊕ M)) = A*i. Then one should apply standard fuzzy arithmetic again. More details can be found in (Dubois et al., 1998). The above results remain valid for operations different from addition: for all continuous monotonic operations, hence all fusion operations as defined in previous sections, all proofs apply because the interval arithmetic property F([a,a'], [b,b']) = [F(a,b), F(a',b')] holds.
6.7 POSSIBILISTIC FUSION UNDER A PRIORI KNOWLEDGE
So far, we have not dealt with unsymmetric combination processes such as the revision of uncertain knowledge in the light of new information (Domotor, 1985; Dubois, Moral and Prade, 1998). This type of combination process always assumes that some a priori knowledge is available and that it is updated so as to minimize the change of belief while incorporating the new evidence. Bayes' theorem leads to a revision rule of that kind. Dissymmetric and symmetric combination methods correspond to different problems and generally yield different results, as briefly exemplified in the following in the possibilistic case. The Bayesian approach can be applied to update prior information (obtained from a source not considered in the combination process), taking into account two new pieces of information and dealing with the two corresponding sources in a symmetric manner. A possibilistic counterpart to this combination scheme is also examined here. It generalizes possibilistic fusion to the case where some a priori information is available.
6.7.1 Revision with Fuzzy Inputs
The revision problem noticeably differs from the fusion problem. Revision is intrinsically unsymmetric in the sense that the information possessed by one of the two sources is considered as a reference, and the information possessed by the other source is considered as an input. In the usual Bayesian setting, the former is the a priori probability P and the latter takes the form of a set-valued piece of information of the form x ∈ A. In this setting revision can be viewed as determining a new probability measure P' such that P'(A) = 1 and P' is as close as possible to the prior
P. Various justifications exist for choosing P' = P(· | A), the conditional probability (see Dubois, Moral and Prade, 1998). When the input information is itself probabilistic, the input is supposed to be in the form of a partition {A1, A2, …, An} of U, with a probability αi attached to each Ai. The result of the revision is then

P'(B) = P(B | {(Ai, αi)}i=1,n) = ∑i=1,n αi P(B | Ai)    (6.44)

where ∑ αi = 1 and P(Ai) > 0 for each i. This is Jeffrey's rule, which considers the input information as a set of constraints of the form P'(Ai) = αi (Jeffrey, 1965). In the possibilistic case an unsymmetric combination mode is obtained using conditional possibility when the input information is of the form x ∈ A. Let us denote by π' the resulting belief state after a change has occurred on π. Then π' = π(· | A) as given in Section 6.3.1. Note that the normalized conjunctive fusion mode (6.18) is a symmetric extension of the possibilistic conditioning rule based on product. To see it, just assume that π2 is the characteristic function of A in (6.18). Possibilistic conditioning can be extended to the case of uncertain inputs of the form of a proposition A with a certainty level α. The main question, as in the probabilistic case, is how to interpret such an uncertain input. Two interpretations of the unsymmetric fusion make sense, according to whether the new information is viewed as a constraint or as some unreliable input (Dubois and Prade, 1997).
Enforced Uncertainty. Assume (A, α) is taken as a constraint N'(A) = α that the new cognitive state must satisfy; it means that if π' is obtained by revising π with the information (A, α), the resulting necessity measure N' must be such that N'(A) = α; this is in the spirit of Jeffrey's rule. Clearly, absorbing the input (A, 1) should coincide with conditioning, π' = π(· | A), while (A, 0) will enforce a loss of information about A. 
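Jeffrey's rule (6.44) above is easily sketched for a finite universe (the dictionary encoding and names are ours):

```python
def jeffrey_revision(prior, partition, alphas):
    """P'(u) = sum_i alpha_i * P(u | A_i)  -- Jeffrey's rule (6.44),
       with prior a dict outcome -> probability and partition a list of sets."""
    def conditional(u, A):
        p_A = sum(prior[x] for x in A)  # requires P(A_i) > 0
        return prior[u] / p_A if u in A else 0.0
    return {u: sum(a * conditional(u, A) for A, a in zip(partition, alphas))
            for u in prior}

prior = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
revised = jeffrey_revision(prior, [{1, 2}, {3, 4}], [0.7, 0.3])
print(revised[1], revised[3])  # revised[1] = 0.35, revised[3] = 0.15
```

The revised measure satisfies the constraints P'({1, 2}) = 0.7 and P'({3, 4}) = 0.3 while preserving the conditional probabilities inside each block of the partition.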
The uncertain input (A, α) is expressed as a fuzzy set F defined by:

F(u) = 1 if u ∈ A;  F(u) = 1 – α otherwise.

As for Jeffrey's rule, the input information (A, α) is described on the partition (A, Ac) of U as the pair {(A, α), (Ac, 0)}, the weights referring to necessity degrees. So, (A, α) is interpreted as forcing the resulting belief state to satisfy the constraint N'(A) = α, which is, when α > 0, equivalent to Π'(A) = 1 and Π'(Ac) = 1 – α, and the following Jeffrey-like belief change rule respects these constraints (Dubois and Prade, 1992e):

π(u | (A, α)) = π(u | A) if u ∈ A;  π(u | (A, α)) = (1 – α) ∗ π(u | Ac) if u ∈ Ac
where ∗ = min or product according to whether π(u | A) is the qualitative or Bayesian-like conditional possibility distribution. This rule is very similar to Jeffrey's rule for probabilities, changing the convex sum into a qualitative mixture (Dubois, Fodor et al., 1996) max(a ∗ λ, b ∗ µ), with max(λ, µ) = 1. Namely, the fusion rule writes: π(u | (A, α)) = max(π(u | A), (1 – α) ∗ π(u | Ac)). Note that when α = 1, π(u | (A, α)) = π(u | A), but when α = 0 we obtain a possibility distribution less specific than π, such that N(A) = N(Ac) = 0. When α > 0, π(u | (A, α)) exactly coincides with what Williams (1994) calls an "adjustment": the most plausible worlds in A become fully plausible, the most plausible situations in Ac are forced to level 1 – α, and all situations that were originally more plausible than 1 – α, if any, are forced to level 1 – α as well. For ∗ = product we obtain

π(u | (A, α)) = π(u)/Π(A) if u ∈ A;  π(u | (A, α)) = (1 – α) π(u)/Π(Ac) if u ∉ A.

This operation minimizes the changes to the possibility levels of situations needed to accommodate the constraint N'(A) = α. More generally, any belief change operation that accommodates the input in such a way that Π'(A) = 1 and Π'(Ac) = 1 – α is called a transmutation by Williams (1994), who discusses such change operations in terms of so-called ordinal epistemic entrenchments, which are necessity measures valued on ordinals. The above unsymmetric fusion under uncertain inputs is naturally extended to a set of input constraints Π(Ai) = λi, i = 1,n, where {Ai, i = 1,n} forms a partition of U, such that maxi=1,n λi = 1 (normalisation). It gives the following Jeffrey-like rule:

π(u | {(Ai, λi)}) = maxi: u∈Ai λi ∗ π(u | Ai)    (6.45)

where ∗ = minimum or product according to whether π(u | Ai) is qualitative or numerical. In the limit case when Ai = {ui}, ∀i, the input is equivalent to a fuzzy input F with F(ui) = λi. 
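The product-based revision by an uncertain input (A, α) can be sketched as follows (the dictionary encoding is ours); the result satisfies the constraints Π'(A) = 1 and Π'(Ac) = 1 – α:

```python
def revise(pi, A, alpha):
    """pi(u | (A, alpha)) = pi(u)/Pi(A) on A, (1 - alpha)*pi(u)/Pi(Ac) outside."""
    Pi_A = max(pi[u] for u in pi if u in A)
    Pi_Ac = max((pi[u] for u in pi if u not in A), default=0.0)
    return {u: p / Pi_A if u in A else
               ((1 - alpha) * p / Pi_Ac if Pi_Ac > 0 else 0.0)
            for u, p in pi.items()}

pi = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.2}
out = revise(pi, A={3, 4}, alpha=0.5)
print(out[3])               # 1.0: the best element of A becomes fully plausible
print(max(out[1], out[2]))  # 0.5 = 1 - alpha: Pi'(Ac) meets the constraint
```

With α = 1 the rule collapses to ordinary product-based conditioning, all elements outside A dropping to 0.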
And the above unsymmetric fusion rule reduces to a simple substitution of π by F, just as for Jeffrey's rule in the probabilistic setting. This unsymmetric fusion rule has been proposed in another setting by Spohn (1988), who uses the integers as a possibility scale rather than [0,1], integers being viewed as levels of impossibility. More detailed comments on this unsymmetric fusion rule and its links with the literature on belief change appear in (Dubois, Moral and Prade, 1998).
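To make rule (6.45) concrete, here is a minimal Python sketch over a finite domain, with ∗ = min; the dictionary encoding and the name `jeffrey_poss` are ours, not the chapter's:

```python
def jeffrey_poss(pi, constraints):
    """Rule (6.45) with * = min: pi(u | {(A_i, lam_i)}) = max over the i with
    u in A_i of min(lam_i, pi(u | A_i)), where the A_i partition the domain
    and max_i lam_i = 1 (normalisation)."""
    out = {}
    for A, lam in constraints:
        height = max(pi[u] for u in A)          # Pi(A_i)
        for u in A:
            # min-based conditioning: the most plausible elements of A_i
            # are raised to 1, the others keep their prior level
            cond = 1.0 if pi[u] == height else pi[u]
            out[u] = min(lam, cond)             # lam_i * pi(u | A_i)
    return out

pi = {'a': 1.0, 'b': 0.6, 'c': 0.3}
revised = jeffrey_poss(pi, [({'a'}, 0.2), ({'b', 'c'}, 1.0)])
# the input constraints Pi'({a}) = 0.2 and Pi'({b, c}) = 1.0 now hold
```

In the limit case where the Ai are singletons, the rule returns the λi's themselves, i.e., π is simply replaced by the fuzzy input F, as noted above.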
Unreliable Input. (A, α) is now interpreted as an extra piece of information that may or may not be useful for enhancing the current information; in that case α is viewed as a degree of strength or priority of information A, and in some cases this input information can be discarded. In contrast with the previous fusion mode, where information can be lost, the present approach never leads to forgetting information: if α is too low, the input information is simply discarded as not informative enough. The unreliable input (A,α) is again expressed as the fuzzy set F defined above. However, instead of viewing F as a prioritized partition, it is represented in terms of its level cuts. The fuzzy set F can be regarded as the weighted nested pair of subsets {(A,1), (U, 1 – α)} with α > 0, where the weight 1 – α denotes the degree of possibility that the information is vacuous (F = U). The weight α should be viewed as a degree of priority of input A, and reflects the willingness of the agent to accept it. Note that this view of the unreliable input differs from the one in the probabilistic setting, where the idea of a degree of priority does not seem easy to capture. More generally, letting Fλ = {u | F(u) ≥ λ} for any λ ∈ V, each Fλ is viewed as the (non-fuzzy) actual input information underlying F, with plausibility λ. Then F is equivalent to the nested sequence of sets F1, F2, …, Fn corresponding to possibility levels λ1, λ2, …, λn. The revised information after absorbing the input F, denoted π(· || F), is defined by Dubois and Prade (1991), again by formal analogy with Jeffrey's rule, as
π(u || F) = maxi=1,n λi ∗ π(u | Fi)     (6.46)
where the convex mixture is changed into the prioritized maximum and ∗ is again min or product. The term λi ∗ π(u | Fi) truncates the conditional possibility so as to prevent degrees of possibility from rising above λi. The maximum operation is a disjunction that expresses the various possible understandings of the uncertain input. When F = {(A,1), (U, 1 – α)}, it gives, for α > 0:
π(u || F) = π(u | A) if u ∈ A
= π(u) ∗ (1 – α) if u ∈ Ac.
When α = 0, the formula does not apply, since in this case F = U and π(u | F) = π(u). Notice the difference with the fusion operator (6.45): here, no conditioning arises if u ∈ Ac. Moreover π(u || F) ≤ F(u) = max(A(u), 1 – α), that is, N(A || F) ≥ α, where N(· || F) is the necessity function based on π(· || F). However, contrary to (6.45), the equality N(A || F) = α is not warranted, since N(A || F) = N(A) whenever N(A) > α. Lastly, if F = U, the input is completely vacuous and it is not taken into account. This behavior is very different from the case when the uncertain input is taken as a constraint. To conclude, the belief change rules (6.45) and (6.46) are formally analogous to Jeffrey's rule. However, in the constraint case (6.45) the sets Ai form a partition,
while in the case of an ill-informed input (6.46) the sets Fi are nested. Only the first approach is coherent with Jeffrey's rule. The behavior of (6.46) is more opportunistic and accepts only informative inputs. Possibilistic conditioning rules can also be used for an evolving system, when events are dated and pieces of information arrive in sequence. Namely, one may possess a prediction function f such that f(ut) = ut+1, where ut is the state at time t and f(ut) is the resulting state at time t + 1. Knowing the possibility distribution πt on the system state at time t, the prevision (forecast distribution) at t + 1 is given in u by Πt(f–1(u)). Supposing that the possibly fuzzy input information F refers to time t + 1, the updated information at time t + 1 can be computed using possibilistic conditioning as πt+1(u) = Πt(f–1(u) | F). This type of updating, decomposed into a prediction step followed by a revision step, is at the basis of well-known updating techniques such as Kalman filtering in systems engineering (Bar-Shalom and Fortmann, 1988). This kind of possibilistic fusion has been used by Delplanque et al. (1997) in a problem of underwater robotics. Nifle and Reynaud (1997) also proposed a possibilistic counterpart of Kalman filtering in a problem of recognition of fuzzily described temporal scenarios.
Compatibility-Driven Weighted Average. Another unsymmetric fusion mode that distinguishes between a priori and new information is proposed by Lotan (1997), inspired by the prioritized conjunctive and disjunctive fusion modes of Section 6.5.2. Denoting again π the a priori possibility distribution on a numerical scale, and F the fuzzy input information, Lotan studies the following extended weighted average operator that updates π in the face of F: U(π, F) = comp(π, F)π ⊕ (1 – comp(π, F))F, where comp(π, F) is a degree of compatibility between the a priori information and the input information.
For instance, one may take comp(π, F) = h(π, F), the height of the fuzzy intersection, also equal to Π(F), the possibility of the fuzzy input F. The following properties can be observed:
• If π and F are fully consistent (Π(F) = 1) then U(π, F) = π, that is, the input is not used.
• If π and F are fully inconsistent (Π(F) = 0) then U(π, F) = F, that is, the prior knowledge is erased.
• If Π(F) = 1/2, then U(π, F) is the fuzzy arithmetic mean of the a priori knowledge and the input information.
Clearly this behavior is debatable, since when Π(F) = 1 one might expect that the input information may refine the prior knowledge. A more natural condition might be that if F is less specific than π then U(π, F) = π. This leads to taking comp(π, F) as a degree of inclusion of π in F, for instance N(F) = 1 – Π(Fc), where Fc = 1 – F. Lotan (1997) envisages such a compatibility index along with some others.
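A minimal discrete sketch of this operator, with comp taken as the height h(π, F) and, as a simplification of the fuzzy arithmetic ⊕, a pointwise weighted mean of membership grades (the encoding and function names are ours, not Lotan's):

```python
def height(pi, F):
    """comp(pi, F) = h(pi, F) = Pi(F): height of the fuzzy intersection."""
    return max(min(pi[u], F[u]) for u in pi)

def lotan_update(pi, F, comp=height):
    """U(pi, F) = c.pi (+) (1 - c).F, with (+) simplified here to a
    pointwise weighted mean (Lotan combines fuzzy quantities arithmetically)."""
    c = comp(pi, F)
    return {u: c * pi[u] + (1 - c) * F[u] for u in pi}

pi = {1: 1.0, 2: 0.5, 3: 0.0}
F_consistent = {1: 1.0, 2: 0.0, 3: 0.0}    # Pi(F) = 1: input is not used
F_conflicting = {1: 0.0, 2: 0.0, 3: 1.0}   # Pi(F) = 0: prior is erased
assert lotan_update(pi, F_consistent) == pi
assert lotan_update(pi, F_conflicting) == F_conflicting
```

Swapping the `comp` argument for an inclusion-based index changes the behavior exactly as discussed in the text.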
However U(π, F) = F as soon as N(F) = 0, which means that the prior information is erased quite often (as soon as a value considered fully plausible by the input is considered impossible a priori). It seems that choosing comp(π, F) as the area-based inclusion index I(π ⊆ F) = ∫U min(π(u), F(u))du / ∫U π(u)du (Dubois and Prade, 1980; Kosko, 1986b) leads to a fusion operation U(π, F) with a more sensible behavior, since U(π, F) = π if F does not bring anything new and U(π, F) = F if the input totally contradicts the prior information. A systematic study of this kind of unsymmetric fusion remains to be done so as to better understand what fusion modes it captures. 6.7.2
Multisource Fusion under a Priori Knowledge
Even in the case when precise prior probabilities on the value of the parameter of interest are not available, a priori information may be supplied in terms of upper probability functions that can be approximated by possibility distributions. Thus it may be interesting to apply a Bayesian-like approach in fusion problems with uncertainty representations other than probabilities, such as possibility measures. It is a combination of fusion and revision. Let the conditional possibility distribution π(u1 | u) be known. It represents the (causal) possibility of observing xm = u1 when the actual value is x = u. Conditional possibility measures obey a counterpart of Bayes' theorem, as described by equation (6.15): π(u1, u2 | u) ∗ πx(u) = πx(u | u1, u2) ∗ π(u1, u2). The quantity π(u1, u2) can be computed from the a priori knowledge about x, represented by the possibility distribution πx on U: π(u1, u2) = supu∈U π(u1, u2 | u) ∗ πx(u). If the observations u1, u2 are non-interactive (i.e., no assumption about the lack or the presence of a relation linking them is made), the following decomposability property holds: π(u1, u2 | u) = min(π(u1 | u), π(u2 | u)). Thus it holds that πx(u | u1, u2) ∗ π(u1, u2) = πx(u) ∗ min(π(u1 | u), π(u2 | u)), where ∗ stands for the min operation or the product. Assume that the operation ∗ is the product. Then we get
πx(u | u1, u2) = πx(u) ∗ min(π(u1 | u), π(u2 | u)) / supu'∈U [πx(u') ∗ min(π(u1 | u'), π(u2 | u'))]     (6.47)
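On a finite domain, equation (6.47) with ∗ = product can be sketched as follows; the dictionary encoding is ours, and `pi1`, `pi2` stand for the functions u ↦ π(u1 | u) and u ↦ π(u2 | u) for the two fixed observations:

```python
def fuse_with_prior(pi_x, pi1, pi2):
    """Eq. (6.47) with * = product: combine the prior pi_x with the two
    conditional distributions, then renormalise by the supremum."""
    raw = {u: pi_x[u] * min(pi1[u], pi2[u]) for u in pi_x}
    h = max(raw.values())            # sup over u' of the numerator
    return {u: v / h for u, v in raw.items()}

prior = {'a': 1.0, 'b': 1.0, 'c': 0.5}     # partial a priori knowledge
pi1 = {'a': 0.8, 'b': 0.4, 'c': 1.0}
pi2 = {'a': 0.5, 'b': 1.0, 'c': 1.0}
print(fuse_with_prior(prior, pi1, pi2))    # {'a': 1.0, 'b': 0.8, 'c': 1.0}
```

With a vacuous prior (πx ≡ 1), the function reduces to the normalized conjunctive combination discussed next.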
This is the possibilistic counterpart of the Bayesian conjunctive fusion operator (6.3), under a conditional independence assumption. If there is no a priori knowledge, the a priori possibility distribution is vacuous, i.e., πx(u) = 1, ∀u ∈ U. Under the postulate of observation relevance, supu π(ui | u) = 1 holds for i = 1, 2. However this does not prevent the case when π(u1, u2) = supu min(π(u1 | u), π(u2 | u)) < 1, in the case of conflicting sources. Then (6.47) simplifies into:
πx(u | u1, u2) = min(π(u1 | u), π(u2 | u)) / supu'∈U min(π(u1 | u'), π(u2 | u'))
Thus, the normalized version of the possibilistic conjunctive combination operator can be viewed as the counterpart of the Bayesian pooling formula (6.3) in the case of vacuous a priori information. Assuming a decomposability property based on the product rather than on the minimum (which corresponds to an assumption of independent measurements) would lead to using the product instead of min in (6.47). The fusion rule (6.47) with the product instead of min was first proposed by Smets (1982) in terms of likelihoods. More generally, the decomposition π(u1, u2 | u) = min(π(u1 | u), π(u2 | u)) can be changed into any possibilistic combination rule, including the disjunctive and adaptive rules. A much more general setting for using fuzzy information in fusion processes involving a priori knowledge has been developed by Mahler (1995). This author exploits an extension of Dempster's rule of combination to the integration of a priori knowledge. He proposes a general representation of imperfect information in the form of random fuzzy sets. The random fuzzy sets stemming from several sources are fused with prior information, thus generalizing both Dempster's rule of combination and Bayesian fusion to the case of random fuzzy observations. 6.8
SYNTACTIC COMBINATION OF LOGICAL DATABASES IN POSSIBILISTIC LOGIC
So far, the combination techniques presented sound natural for combining expert opinions, sensor measurements and the like. However we claimed at the beginning that fusion is also a relevant issue for databases. In the following, we show that deductive databases are amenable to the same approaches as above. In order to bridge the gap with the fusion of possibility distributions, one must explain to what extent a logical (deductive) database can be viewed as a
possibility distribution. In fact we shall consider prioritized databases, where priorities induce preferences among interpretations. Then one must explain how to implement the possibilistic fusion modes at the syntactic level. This section is based on recent works by Benferhat et al. (1997b). 6.8.1
From Prioritized Logical Databases to Possibility Distributions
A prioritized logical database is a set of logical formulas equipped with a complete preordering structure. To each formula φi, a priority level αi in a totally ordered set L is assigned. This level is viewed as a degree of certainty or priority, and is modelled as a lower bound on the degree of necessity N(φi). Such a logical database is thus made of a finite set of weighted formulas ∑ = {(φi αi), i = 1,n}, which is called a possibilistic database. Inference and consequencehood in possibilistic logic are explained at length in (Dubois et al., 1994a,b); see also Chapter 2 of this volume, by Novak (1999). Here we focus on the possibilistic semantics of a prioritized logical database. In the following, the referential set U is the set of interpretations of the language in which the prioritized logical database is expressed, and u ⊨ φ denotes the statement that φ is true in interpretation u. A possibilistic database made of one formula {(φ α)} is represented by the possibility distribution:
∀u ∈ U, π{(φ α)}(u) = 1 if u ⊨ φ
= 1 – α otherwise,
where 1 – α is a notation for the order-reversing map on L, which can be taken as the unit interval for simplicity. Clearly, the higher the priority or certainty of φ, the less possible are the interpretations where φ is false. More formally, the constraint N(φ) ≥ α is equivalent to 1 – max{π(u) : u ⊨ ¬φ} ≥ α. Of course, in general, there are several possibility distributions compatible with this constraint. The natural way to select one is the minimum specificity principle, which allocates the greatest possibility degrees in agreement with the constraint N(φ) ≥ α, and thus does not restrict the plausibility levels of the possible worlds more than necessary. This is precisely π{(φ α)}. The minimum specificity principle w.r.t.
the constraints induced by ∑ generates a possibility distribution on interpretations, say π∑, the least specific possibility distribution which satisfies the set of constraints N(φi) ≥ αi, i = 1,n. This possibility distribution always exists and is defined by (e.g., Dubois et al., 1994a,b)
∀u ∈ U, π∑(u) = mini=1,n {1 – αi : u ⊨ ¬φi} (with π∑(u) = 1 if u satisfies every φi). Thus, π∑ can be viewed as the result of the conjunctive fusion of the π{(φi αi)}'s using the min operator, that is, a fuzzy intersection. The possibility distribution π∑ is not necessarily normal, and Inc(∑) = 1 – maxu∈U π∑(u) is called the degree of inconsistency of the logical database ∑. Lastly, note that several syntactically different possibilistic databases may have the same possibility distribution as a semantic counterpart. In such a case, it can be shown that these logical databases are equivalent in the following sense: their α-cuts, which are classical logical databases, are logically equivalent in the usual sense, where the α-cut of a possibilistic database ∑ is the set of classical formulas whose level of certainty is greater than or equal to α. 6.8.2
Syntactic Combination Modes
In the previous sections, several fusion modes have been described for possibility distributions. This section is concerned with the combination of n possibilistic databases ∑i provided by n sources. Each possibilistic database ∑i is associated with a possibility distribution πi, which is its semantic counterpart. Here we describe the syntactic encoding of these fusion modes. More formally, given a semantic combination rule C, we look for a syntactic combination C such that: C(π∑1, …, π∑n) = πC(∑1,…,∑n), where ∑C = C(∑1,…,∑n) is the result of merging ∑1,…,∑n. It is assumed here that the results of the combination by C of ∑i's which are semantically equivalent are still semantically equivalent. Before introducing the syntactic counterpart of a fusion mode, let us first define two parameterized functions which operate on a logical database ∑:
– Truncate(∑, α), which consists in removing from the possibilistic database ∑ the formulas whose certainty degrees are strictly less than α, and
– Discount(∑, α), which consists in decreasing to level α the certainty degree of the formulas of ∑ whose certainty is higher than α.
More formally:
Truncate(∑, α) = {(φ β) | (φ β) ∈ ∑ and β ≥ α} ∪ {(⊥ α)}
Discount(∑, α) = {(φ α) | (φ β) ∈ ∑ and β ≥ α} ∪ {(φ β) | (φ β) ∈ ∑ and β < α}.
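A minimal sketch of these two functions, together with the least specific distribution π∑ of Section 6.8.1; the encoding of formulas as Boolean functions over interpretations is ours:

```python
from itertools import product

def pi_sigma(base, worlds):
    """Least specific pi for Sigma = {(phi_i, alpha_i)}:
    pi(u) = min{1 - alpha_i : u falsifies phi_i}, and 1 if u falsifies none."""
    return [min([1 - a for (phi, a) in base if not phi(u)], default=1.0)
            for u in worlds]

def truncate(base, alpha, bottom=lambda u: False):
    """Truncate: keep formulas with beta >= alpha and add (bottom, alpha)."""
    return [(phi, b) for (phi, b) in base if b >= alpha] + [(bottom, alpha)]

def discount(base, alpha):
    """Discount: cap every certainty level at alpha."""
    return [(phi, min(b, alpha)) for (phi, b) in base]

# Sigma = {(p, 0.75), (q, 0.5)} over the four interpretations of atoms p, q
base = [(lambda u: u['p'], 0.75), (lambda u: u['q'], 0.5)]
worlds = [dict(zip('pq', bits)) for bits in product([True, False], repeat=2)]
print(pi_sigma(base, worlds))                   # [1.0, 0.5, 0.25, 0.25]
print(pi_sigma(discount(base, 0.5), worlds))    # [1.0, 0.5, 0.5, 0.5]
```

One can check on this example that discounting at level α amounts, semantically, to taking max(π, 1 – α) pointwise.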
The following result (Benferhat et al., 1997b) lays bare the semantic counterparts of these two functions. Let ∑ be a possibilistic database, and π its associated possibility distribution. Then:
– Truncate(∑, α) is associated with π' = min(π, 1 – α),
– Discount(∑, α) is associated with π' = max(π, 1 – α).
For the sake of simplicity, we only consider the case of two possibility distributions in the following. The min-based conjunction mode simply amounts to taking the union of ∑1 and ∑2 at the syntactic level, namely, if C = min, ∑min = C(∑1, ∑2) = ∑1 ∪ ∑2. It corresponds to the adventurous attitude (i.e., to the fuzzy set intersection at the distribution level, and to a union for the possibilistic databases). Of course, C(∑1, ∑2) may be inconsistent, and the handling of inconsistency is simply achieved by using the possibilistic entailment; see (Dubois et al., 1994a,b). Namely, only conclusions induced by formulas whose priority is above the inconsistency level are considered. Note that Inc(∑) = 1 – h(π∑) in general, denoting h(π∑) = maxu π∑(u) the height (consistency level) of π∑. The syntactic cautious fusion mode corresponds to finding a possibilistic database ∑max associated with max(π∑1, π∑2). We first consider the case of two
possibilistic databases ∑1 = {(φ α)} and ∑2 = {(ψ β)} which contain exactly one formula. Then it is easy to check that ∑max is defined by ∑max = {(φ∨ψ min(α, β))}. Assume now that the ∑i's are general sets of possibilistic formulas, namely ∑1 = {(φi αi) | i∈I} and ∑2 = {(ψj βj) | j∈J}. Then it can be shown that: ∑max = {(φi ∨ ψj min(αi, βj)) | (φi αi) ∈ ∑1 and (ψj βj) ∈ ∑2}. Note that ∑max is always consistent (provided that ∑1 or ∑2 is consistent). The fuzzy set of possibilistic consequences of ∑max can also be viewed as the fuzzy intersection of the fuzzy sets of possibilistic consequences of ∑1 and of ∑2. The result of this fuzzy intersection of deductively closed fuzzy sets of formulas (in the sense of Biacino and Gerla (1999), chapter 3 of this book) is deductively closed in the sense of possibilistic logic. This result extends the classical result that the intersection of deductively closed sets of formulas is deductively closed. It is also due to the closure property of the disjunctive fusion
rule (6.17) with respect to possibility measures (see Section 6.4.2). It expresses that what can be "logically" inferred from a set of distinct logical databases is only the common part which can be inferred from each of them, taking into account the relative reliability of the formulas in each base (Dubois et al., 1992). Indeed x ∈ E1 ∪ E2 is a valid conclusion which can be derived both from a source asserting x ∈ E1 and from a source asserting x ∈ E2 (E1 ∪ E2 is even the smallest set which contains both E1 and E2). In the framework of substructural logics, Boldrin (1995) (see also Boldrin and Sossai, 1995) has proposed an extension of possibilistic logic where a second "and" connective, based on the linear product (Lukasiewicz t-norm), is introduced for combining information from distinct independent sources. At the syntactic level, this new conjunction applied to two one-formula logical databases ∑1 = {(φ α)} and ∑2 = {(ψ β)} results in three possibilistic formulas: ∑LP = {(¬φ ∨ ψ β), (φ ∨ ¬ψ α), (φ ∨ ψ min(1, α + β))} (semantically equivalent to ∑1 ∪ ∑2 ∪ {(φ ∨ ψ min(1, α + β))}), which expresses the adventurous conjunctive fusion mode explained in terms of sources making a limited number of lies (Section 6.4.1). This result can be generalized to the case of general possibilistic databases and to any t-norms and t-conorms. Let ∑1 = {(φi αi) | i∈I} and ∑2 = {(ψj βj) | j∈J}. Namely, the following result holds. Let πtn and πct be the results of the fusion based on the t-norm ∗ and the t-conorm ⊥, where ∗ is the t-norm dual to the t-conorm ⊥ (α ⊥ β = 1 − (1 − α) ∗ (1 − β)). Then πtn and πct are respectively associated with the following logical databases:
∑tn = ∑1 ∪ ∑2 ∪ {(φi∨ψj αi⊥βj) : (φi αi)∈∑1 and (ψj βj)∈∑2} (conjunctive mode)
∑ct = {(φi∨ψj αi∗βj) : (φi αi)∈∑1 and (ψj βj)∈∑2} (disjunctive mode)
The effect of the normalization can be captured at the syntactic level.
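The result above can be sketched with the product t-norm and its dual t-conorm; formulas are kept as plain strings, and all names are illustrative, not from the chapter:

```python
def t_norm(a, b):
    return a * b                         # product t-norm

def t_conorm(a, b):
    return 1 - (1 - a) * (1 - b)         # dual t-conorm (alpha ⊥ beta)

def sigma_tn(base1, base2):
    """Conjunctive mode: Sigma1 ∪ Sigma2 ∪ {(phi_i v psi_j, a_i ⊥ b_j)}."""
    return base1 + base2 + [(f'({p} v {q})', t_conorm(a, b))
                            for (p, a) in base1 for (q, b) in base2]

def sigma_ct(base1, base2):
    """Disjunctive mode: {(phi_i v psi_j, a_i * b_j)}."""
    return [(f'({p} v {q})', t_norm(a, b))
            for (p, a) in base1 for (q, b) in base2]

b1, b2 = [('phi', 0.75)], [('psi', 0.5)]
print(sigma_tn(b1, b2))  # [('phi', 0.75), ('psi', 0.5), ('(phi v psi)', 0.875)]
print(sigma_ct(b1, b2))  # [('(phi v psi)', 0.375)]
```

As expected, the disjunctive base retains only the common disjunctive consequences, with weakened certainty levels.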
Let π be a sub-normalized possibility distribution obtained by combining π1 and π2 using a conjunction operator C. Let ∑ be the possibilistic database associated with π and built via syntactic fusion. Let h(π) = maxu π(u). Then:
• π' = π / h(π) is associated with
∑' = {(φi 1 – (1 – αi)/h(π)) : (φi αi) ∈ ∑ and αi > 1 – h(π)};
• π'(u) = 1 if π(u) = h(π), π'(u) = π(u) otherwise, is associated with
∑' = {(φi αi) | (φi αi) ∈ ∑ and αi > 1 – h(π)};
• π' = π + 1 – h(π) is associated with
∑' = {(φi αi – (1 – h(π))) | (φi αi) ∈ ∑ and αi > 1 – h(π)}.
All the normalization procedures maintain all the formulas of ∑ whose certainty degrees are higher than the inconsistency degree. The qualitative normalization (the second one) just forgets the presence of conflicts and does not modify the certainty degrees of the pieces of information encoded by π. The two other normalization procedures modify the certainty degrees of the formulas retained in ∑. Let us observe that all the normalization modes diminish the certainty levels of formulas in the wide sense. This diminution is more important with the third method than with the first one. The corresponding syntactic counterparts of the other (quantified, weighted, adaptive) fusion modes are immediate using the above results. 6.9
CONCLUSION
The possibilistic framework proposed in this chapter for data fusion makes sense when it seems difficult to represent the information supplied by the sources by means of probability distributions, due to imprecision and/or lack of statistical evidence. The conjunctive and disjunctive pooling modes that possibility theory provides are particularly suitable when sources are heterogeneous, i.e., cannot be viewed as instances of a single random source and supply information under several formats, such as numerical and linguistic. All of them, if they pertain to a numerical scale, can fit in the fuzzy/possibilistic representation framework. The proposed fusion rules are optimistic in the sense that they always try to assume that as many sources as possible are reliable, as usual conjunctive operators do. However, when the assumption that all sources tell the truth is not tenable, it is possible to express that a certain proportion of the sources are faithful, without being forced to point them out. The disjunctive operator, for instance, systematically assumes that at least one source is right. This optimistic prejudice in the fusion modes tries to extract the most useful part of the available information, by remaining as informative as possible while avoiding the systematic use of a conjunctive operator. An adaptive operator has been proposed, and extended to multiple sources, in order to achieve a balance between the conjunctive and the disjunctive fusion modes, the trade-off being driven by the amount of consistency between the sources. The possibilistic setting looks more versatile than the probabilistic setting for the expression of various fusion modes. Possibilistic data fusion is also much related to logical approaches which cope with inconsistency in multiple source databases using notions of maximal consistent subbases (see Baral et al., 1992; Benferhat et al., 1997a; Dubois, Lang and Prade, 1992; Cholvy, 1993, 1998, among others).
For instance, the adaptive operator of Section 6.5.2 involves maximal subsets of coherent sources. Moreover this chapter has not discussed the application
of approximate reasoning techniques in data fusion. Several authors have used fuzzy rule-based systems to specify the behavior of a fusion machinery, for instance, Fukuda et al. (1993) for machining robots, and Mauris et al. (1997) in color sensing. Bouchon-Meunier (1997) offers a collection of papers showing the use of fuzzy set theory in various aggregation problems including data fusion. The fuzzy and possibilistic approaches have been applied in several fields where merging information is needed:
-) nuclear reliability engineering, for expert opinion pooling (Sandri, 1991; Sandri et al., 1995);
-) medical image processing (Bloch et al., 1997; Bezdek and Sutton, 1999), face recognition (Arbuckle et al., 1995), and remote sensing (Roux and Desachy, 1997); see also the companion volume of the Handbook of Fuzzy Sets by Bezdek et al. (1999), Section 4.9, for a survey of fuzzy data fusion techniques in pattern recognition and image processing;
-) recognition of the temporal behavior of missiles (Nifle and Reynaud, 1997);
-) robotics, for the purpose of building the map of an environment via exploration (Poloni, 1994; Poloni et al., 1995; Lopez-Sanchez et al., 1997).
References
Abidi M. A. and Gonzalez R. C., eds. (1992). Data Fusion in Robotics and Machine Intelligence, Academic Press, New York.
Arbuckle T. D., Lange E., Iwamoto T., Otsu N. and Kyuma K. (1995). Fuzzy information fusion in a face recognition system, Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 3, 217-246.
Baral C., Kraus S., Minker J. and Subrahmanian (1992). Combining knowledge bases consisting of first order theories, Computational Intelligence, 8(1), 45-71.
Bardossy A., Duckstein L. and Bogardi I. (1993). Combination of fuzzy numbers representing expert opinions, Fuzzy Sets and Systems, 57, 173-181.
Bar-Shalom Y. and Fortmann T. E. (1988). Tracking and Data Association, Academic Press, New York.
Benferhat S., Dubois D. and Prade H. (1997a).
Some syntactic approaches to the handling of inconsistent knowledge bases — Part 1: The flat case, Studia Logica, 58, 17-45.
Benferhat S., Dubois D. and Prade H. (1997b). From semantic to syntactic approaches to information combination in possibilistic logic, Aggregation and Fusion of Imperfect Information (Bouchon-Meunier B., ed.), Physica-Verlag, Heidelberg, Germany, 141-161.
Benferhat S., Dubois D. and Prade H. (1997c). Nonmonotonic reasoning, conditional objects and possibility theory, Artificial Intelligence, 92(1/2), 259-276.
Bezdek J., Keller J., Krishnapuram R. and Pal N. R. (1999). Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of Fuzzy Sets Series, Kluwer Academic Publishers, Dordrecht, The Netherlands.
Bezdek J. and Sutton M. A. (1999). Image processing in medicine, Applications of Fuzzy Sets (Zimmermann H.-J., ed.), The Handbooks of Fuzzy Sets Series, Kluwer Academic Publishers, Dordrecht, The Netherlands.
Biacino L. and Gerla G. (1999). Closure operators in fuzzy set theory. This volume, chapter 3.
Bloch I. (1996). Information combination operators for data fusion: A comparative review with classification, IEEE Trans. on Systems, Man and Cybernetics, 26, 52-67.
Bloch I. and Maitre H. (1997). Fusion of image information under imprecision, Aggregation and Fusion of Imperfect Information (Bouchon-Meunier B., ed.), Physica-Verlag, Heidelberg, Germany, 141-161.
Bloch I., Sureda F., Pellot C. and Herment A. (1997). Fuzzy modelling and fuzzy mathematical morphology applied to 3D reconstruction of blood vessels by multimodality data fusion, Fuzzy Information Engineering (Dubois D., Prade H. and Yager R. R., eds.), John Wiley, New York, 93-110.
Boldrin L. (1995). A substructural connective for possibilistic logic, Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Proc. of Europ. Conf. ECSQARU'95) (Froidevaux C. and Kohlas J., eds.), Springer Verlag, Fribourg, 60-68.
Boldrin L. and Sossai C. (1995). An algebraic semantics for possibilistic logic, Proc. of the 11th Conf. on Uncertainty in Artificial Intelligence (Besnard P. and Hanks S., eds.), Morgan Kaufmann, San Francisco, CA, 27-35.
Bouchon-Meunier B., ed. (1997). Aggregation and Fusion of Imperfect Information, Physica-Verlag, Heidelberg, Germany.
Bouchon-Meunier B., Dubois D., Godo L. and Prade H. (1999). Fuzzy sets and possibility theory in approximate and plausible reasoning. Chapter 1, this volume.
Buchanan B. G. and Shortliffe E. H. (1984).
Rule-Based Expert Systems — The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading.
Brown C., Durrant-Whyte H., Rao L. B. and Steer B. (1992). Distributed data fusion using Kalman filtering: a robotics application, Data Fusion in Robotics and Machine Intelligence (Abidi M. A. and Gonzalez R. C., eds.), Academic Press, New York, 267-310.
Cheng Y. Z. and Kashyap R. L. (1989). A study of associative evidential reasoning, IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(6), 623-631.
Cholvy L. (1993). Proving theorems in a multi-source environment, Proc. of the 13th Inter. Joint Conf. on Artificial Intelligence (IJCAI'93), Chambéry, France, Aug. 28-Sept. 3, 66-71.
Cholvy L. (1998). Reasoning about merged information, Belief Change (Dubois D. and Prade H., eds.), Vol. 3 in the Handbook of Defeasible Reasoning and Uncertainty Management Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 233-265.
Cooke R. M. (1988). Uncertainty in risk assessment: A probabilist's manifesto, Reliability Engineering and Systems Safety, 23, 277-283.
Cooke R. M. (1991). Experts in Uncertainty, Oxford University Press, Oxford, UK.
Coolen F. P. A. (1994). Statistical Modeling of Expert Opinions Using Imprecise Probabilities, PhD Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands.
Czogala E. and Hirota K. (1986). Probabilistic Sets: Fuzzy and Stochastic Approach to Decision, Control and Recognition Processes, ISR 91, Verlag TÜV Rheinland, Köln, Germany.
De Cooman G. (1997). Possibility theory — Part I: Measure- and integral-theoretic groundwork; Part II: Conditional possibility; Part III: Possibilistic independence, Int. J. of General Systems, 25(4), 291-371.
De Kleer J. (1986). An assumption-based TMS, Artificial Intelligence, 28, 127-162.
Delmotte F. and Borne P. (1998). Modeling of reliability with possibility theory, IEEE Trans. on Systems, Man and Cybernetics, Part A, 28, 78-88.
Delplanque M., Desodt-Jolly A. M., Jolly D. and Jamin J. (1997). Fusion dissymétrique d'informations incomplètes pour la classification d'objets sous-marins, Traitement du Signal, 14, 511-522.
Deveughèle S. and Dubuisson B. (1993). Using possibility theory in perception: An application in artificial vision, Proc. of the 2nd IEEE Inter. Conf. on Fuzzy Systems (FUZZ-IEEE'93), San Francisco, CA, March 28-April 1st, 821-826.
Deveughèle S. and Dubuisson B. (1995). Adaptive aggregation: decomposing before combining, Proc. of the 4th IEEE Inter. Conf. on Fuzzy Systems (FUZZ-IEEE'95), Yokohama, Japan, 1589-1596.
Domotor Z. (1985). Probability kinematics, conditionals and entropy principle, Synthèse, 63, 75-114.
Dubois D. (1986). Generalized probabilistic independence, and its implications for utility, Operations Research Letters, 5, 255-260.
Dubois D., Fargier H. and Prade H.
(1996). Possibility theory in constraint satisfaction problems: Handling priority, preference and uncertainty, Applied Intelligence, 6, 287-309.
Dubois D., Fodor J., Prade H. and Roubens M. (1996). Aggregation of decomposable measures with application to utility theory, Theory and Decision, 41, 59-95.
Dubois D. and Kalfsbeek H. W. (1990). Elicitation, assessment and pooling of expert judgments using possibility theory, Proc. of the 8th Inter. Congress of Cybernetics and Systems (Manikopoulos C. N., ed.), New York, June 11-15, 360-367. Published by New Jersey Institute of Technology Press.
Dubois D., Kerre E., Mesiar R. and Prade H. (1999). Fuzzy intervals, Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets Series, Kluwer Academic Publ., Dordrecht, to appear.
Dubois D., Nguyen H. T. and Prade H. (1999). Possibility theory, probability and fuzzy sets: Misunderstandings, bridges and gaps. Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets Series, Kluwer Academic Publ., Dordrecht, to appear.
Dubois D. and Koning J. L. (1991). Social choice axioms for fuzzy set aggregation, Fuzzy Sets and Systems, 49(3), 257-274.
Dubois D., Lang J. and Prade H. (1992). Dealing with multi-source information in possibilistic logic, Proc. of the 10th Europ. Conf. on Artificial Intelligence (ECAI'92) (Neumann B., ed.), Vienna, Austria, Aug. 3-7, 38-42. Wiley, New York.
Dubois D., Lang J. and Prade H. (1994a). Automated reasoning using possibilistic logic: Semantics, belief revision and variable certainty weights, IEEE Trans. on Knowledge and Data Engineering, 6(1), 64-71.
Dubois D., Lang J. and Prade H. (1994b). Possibilistic logic, Handbook of Logic in Artificial Intelligence and Logic Programming — Vol. 3: Nonmonotonic Reasoning and Uncertain Reasoning (Gabbay D. M., Hogger C. J., Robinson J. A. and Nute D., eds.), Oxford Univ. Press, 439-513.
Dubois D., Moral S. and Prade H. (1997). A semantics for possibility theory based on likelihoods, J. of Mathematical Analysis and Applications, 205, 359-380.
Dubois D., Moral S. and Prade H. (1998). Belief change rules in ordinal and numerical uncertainty theories, Belief Change (Dubois D. and Prade H., eds.), Vol. 3 in the Handbook of Defeasible Reasoning and Uncertainty Management Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 311-392.
Dubois D. and Prade H. (1980). Fuzzy Sets and Systems — Theory and Applications, Academic Press, New York.
Dubois D. and Prade H. (1981). Additions of interactive fuzzy numbers, IEEE Trans. on Automatic Control, 26, 926-936.
Dubois D. and Prade H. (1985). A review of fuzzy set aggregation connectives, Information Sciences, 36, 85-121.
Dubois D. and Prade H. (1986). Weighted minimum and maximum operations in fuzzy set theory, Information Sciences, 39, 205-210.
Dubois D. and Prade H. (1987a). Fuzzy Numbers: An Overview, Analysis of Fuzzy Information, Vol. I (Bezdek J., ed.), CRC Press, Boca Raton, FL, 3-39.
Dubois D. and Prade H. (1987b).
The mean value of a fuzzy number, Fuzzy Sets and Systems, 24, 279-300. Dubois D. and Prade H. (1987c). Une approche ensembliste de la combination d'informations incertaines, Revue d'Intelligence Artificielle, 1(4), 23-42. Dubois D. and Prade H. (1988a). Representation and combination of uncertainty with belief functions and possibility measures, Computational Intelligence, 4, 244-264. Dubois D. and Prade H. (1988b). Possibility Theory – An Approach to the Computerized Processing of Uncertainty, Plenum Press, New York. Dubois D. and Prade H. (1988c). Default reasoning and possibility theory, Artificial Intelligence, 35, 243-257. Dubois D. and Prade H. (1988d). On the combination of uncertain or imprecise pieces of information in rule-based systems — A discussion in the framework of possibility theory, Int. J. of Approximate Reasoning, 2, 65-87. Dubois D. and Prade H. (1989a), Order-of-magnitude reasoning with fuzzy relations, Revue d'Intelligence Artificielle, 3, 69-94. Dubois D. and Prade H. (1989b). Fuzzy sets, probability and measurement, Europ. J. of Operations Research, 40, 135-154.
Dubois D. and Prade H. (1990). Aggregation of possibility measures, Multiperson Decision Making Using Fuzzy Sets and Possibility Theory (Kacprzyk J. and Fedrizzi M., eds.), Kluwer, Dordrecht, 55-63.
Dubois D. and Prade H. (1991). Updating with belief functions, ordinal conditional functions and possibility measures, Uncertainty in Artificial Intelligence 6 (Bonissone P.P., Henrion M., Kanal L.N. and Lemmer J.F., eds.), North-Holland, Amsterdam, 311-329.
Dubois D. and Prade H. (1992a). On the relevance of non-standard theories of uncertainty in modeling and pooling expert opinions, Reliability Engineering and Systems Safety, 36, 95-107.
Dubois D. and Prade H. (1992b). Combination of fuzzy information in the framework of possibility theory, Data Fusion in Robotics and Machine Intelligence (Abidi M.A. and Gonzalez R.C., eds.), Academic Press, New York, 481-505.
Dubois D. and Prade H. (1992c). When upper probabilities are possibility measures, Fuzzy Sets and Systems, 49, 65-74.
Dubois D. and Prade H. (1992d). On the combination of evidence in various mathematical frameworks, Reliability Data Collection and Analysis (Flamm J. and Luisi T., eds.), Kluwer Academic Publishers, Dordrecht, The Netherlands, 213-242.
Dubois D. and Prade H. (1992e). Belief change and possibility theory, Belief Revision (Gärdenfors P., ed.), Cambridge University Press, Cambridge, UK, 142-182.
Dubois D. and Prade H. (1994). Possibility theory and data fusion in poorly informed environments, Control Engineering Practice, 2(5), 811-823.
Dubois D. and Prade H. (1997). A synthetic view of belief revision with uncertain inputs in the framework of possibility theory, Int. J. of Approximate Reasoning, 17(2/3), 295-324.
Dubois D. and Prade H. (1998). Possibility theory: Qualitative and quantitative aspects, Handbook of Defeasible Reasoning and Uncertainty Management Systems — Volume 1: Quantified Representation of Uncertainty and Imprecision (Smets P., ed.), Kluwer Academic Publ., Dordrecht, The Netherlands.
Dubois D., Prade H. and Nguyen H.T. (1999). Possibility theory, probability and fuzzy sets: Misunderstandings, bridges and gaps, Fundamentals of Fuzzy Sets (Dubois D. and Prade H., eds.), The Handbook of Fuzzy Sets Series, Kluwer Academic Publ., Dordrecht, to appear.
Dubois D., Prade H. and Sandri S. (1993). On possibility/probability transformations, Fuzzy Logic: State of the Art (Lowen R. and Lowen M., eds.), Kluwer Academic Publ., 103-112.
Dubois D., Prade H. and Testemale C. (1988). Weighted fuzzy pattern matching, Fuzzy Sets and Systems, 28, 313-331.
Dubois D., Prade H. and Yager R. R. (1998). Computation of intelligent fusion operations based on constrained fuzzy arithmetics, Proc. IEEE Int. Conf. on Fuzzy Systems, Anchorage, Alaska, 767-772.
Dubois D. and Yager R. R. (1992). Fuzzy set connectives as combinations of belief structures, Information Sciences, 66, 245-275.
Edwards W. F. (1972). Likelihood, Cambridge University Press, Cambridge, UK.
Fodor J. and Yager R. (1999). Fuzzy set-theoretic operators and quantifiers, Fundamentals of Fuzzy Sets (Dubois D. and Prade H., eds.), The Handbook of Fuzzy Sets Series, Kluwer Academic Publ., Dordrecht, to appear.
Fortemps P. and Roubens M. (1996). Ranking and defuzzification methods based on area compensation, Fuzzy Sets and Systems, 82, 319-330.
French S. (1985). Group consensus probability distributions: A critical survey, Bayesian Statistics (Bernardo J. et al., eds.), Elsevier, The Netherlands, 183-201.
Fukuda T., Shimojida K., Fumihito A. and Matsuura H. (1993). Multisensor integration systems based on fuzzy inference and neural network, Information Sciences, 71, 27-41.
Gebhardt J. and Kruse R. (1998). Parallel combination of information sources, Belief Change (Dubois D. and Prade H., eds.), Vol. 3 in the Handbook of Defeasible Reasoning and Uncertainty Management Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 393-440.
Goodman I. R. (1982). Fuzzy sets as equivalence classes of random sets, Fuzzy Sets and Possibility Theory (Yager R., ed.), Pergamon Press, Oxford, 327-342.
Grabisch M., Murofushi T. and Sugeno M. (1992). Fuzzy measure of fuzzy events defined by fuzzy integrals, Fuzzy Sets and Systems, 50, 293-313.
Grabisch M., Orlovski S. and Yager R. R. (1998). Fuzzy aggregations of numerical preferences, Fuzzy Sets in Decision Analysis, Operations Research and Statistics (Slowinski R., Ed.), The Handbooks of Fuzzy Sets Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 31-68.
Hajek P. (1985). Combining functions for certainty degrees in consulting systems, Int. J. of Man-Machine Studies, 22, 59-76.
Hirota K. (1981). Concepts of probabilistic sets, Fuzzy Sets and Systems, 5, 31-46.
Hirota K., Pedrycz W. and Yuda M. (1991). Fuzzy set-based models of sensor fusion, Proc. International Fuzzy Engineering Symposium, Yokohama, Japan, World Scientific, Singapore, 623-633.
Hisdal E. (1978). Conditional possibilities — Independence and non-interactivity, Fuzzy Sets and Systems, 1, 283-297.
Jeffrey R. (1965). The Logic of Decision, McGraw-Hill, New York.
Kacprzyk J. and Nurmi H. (1998). Group decision-making under fuzziness, Fuzzy Sets in Decision Analysis, Operations Research and Statistics (Slowinski R., Ed.), The Handbooks of Fuzzy Sets Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 103-136.
Kalfsbeek H. (1989). Elicitation, assessment and pooling of expert judgment using possibility theory, Working Paper PER1829/89 (revised version), Joint Research Center of the EEC, Ispra, Italy.
Kaufmann A. (1988). Theory of expertons and fuzzy logic, Fuzzy Sets and Systems, 28, 295-304.
Klir G. J. and Folger T. (1988). Fuzzy Sets, Uncertainty and Information, Prentice Hall, Englewood Cliffs, NJ.
Kosko B. (1986a). Fuzzy knowledge combination, Int. J. Intelligent Systems, 1, 293-330.
Kosko B. (1986b). Fuzzy entropy and conditioning, Information Sciences, 40, 165-174.
Kuncheva L. and Krishnapuram R. (1996). A fuzzy consensus operator, Fuzzy Sets and Systems, 79, 347-356.
Lasserre V., Mauris G. and Foulloy L. (1998). A simple modelisation of measurement uncertainty: The truncated triangular possibility distribution, Proc. 7th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'98), Paris, Editions Medicales et Scientifiques, 10-17.
Lopez-Sanchez M., Lopez de Mantaras R. and Sierra C. Incremental map generation by low-cost robots based on possibility/necessity grids, Proc. of the 13th Conf. on Uncertainty in Artificial Intelligence (Geiger D. and Shenoy P., eds.), Morgan Kaufmann, San Francisco, CA, 351-357.
Lotan T. (1997). Integration of fuzzy numbers corresponding to static knowledge and dynamic information, Fuzzy Sets and Systems, 86, 335-344.
Mahler R. P. (1995). Combining ambiguous evidence with respect to ambiguous a priori knowledge — Part II: Fuzzy logic, Fuzzy Sets and Systems, 75, 319-354.
Marichal J. L. (1998). On Sugeno integral as an aggregation function, Tech. Rep. 9710, GEMME, Fac. d'Economie, Université de Liège, Belgium. Also in Proceedings of EUFIT'98 (Vol. I), Aachen, Germany, September 7-10, 540-544.
Mauris G., Benoit E. and Foulloy L. Fuzzy linguistic methods for the aggregation of complementary sensor information, Aggregation and Fusion of Imperfect Information (Bouchon-Meunier B., Ed.), Physica-Verlag, Heidelberg, Germany, 214-230.
Meizel D. and Piat E. (1997). Proposition d'un cadre probabiliste de fusion de croyances, Traitement du Signal, 14, 485-498.
Mosleh A. and Apostolakis G. (1984). Models for the use of expert opinions, Low Probability/High Consequence Risk Analysis (Waller R.A. and Covello V.T., eds.), Plenum Press, New York.
Mundici D. (1992). The logic of Ulam games with lies, Knowledge, Belief and Strategic Interaction (Bicchieri C. and Dalla Chiara M., eds.), Cambridge University Press, 275-284.
Nifle A. and Reynaud R. (1997). Classification de comportements fondée sur l'occurrence d'événements en théorie des possibilités, Traitement du Signal, 14, 523-534.
Novak V. (1999). Weighted inference systems, Chapter 2 in this volume.
Ovchinnikov S. (1998). An analytic characterization of some aggregation operators, Int. J. Intell. Systems, 13, 3-10.
Pannerec T., Oussalah M., Maaref H. and Barret C. (1998). Absolute localization of a miniature mobile robot using heterogeneous sensors: Comparison between Kalman filter and possibility theory method, Proc. IEEE Symposium on Robotics and Cybernetics (CESA'98), Tunis.
Park S. and Lee C. S. G. (1993). Uncertainty fusion of sensory information using fuzzy numbers, Proc. 5th IFSA Congress, Seoul, Korea, 1001-1004.
Parratt L. G. (1961). Probability and Experimental Errors in Science, Dover, New York.
Poloni M. (1994). Sensor data fusion using fuzzy decision-making techniques, Proc. 2nd Europ. Congress on Intelligent Techniques and Soft Computing (EUFIT'94), Aachen, Germany, 35-41.
Poloni M., Ulivi G. and Vendittelli M. (1995). Fuzzy logic and autonomous vehicles: Experiments in data fusion, Fuzzy Sets and Systems, 69, 15-27.
Poole D. L. (1985). On the comparison of theories: Preferring the most specific explanation, Proc. 9th Int. Joint Conf. on Artificial Intelligence, Los Angeles, 465-474.
Prade H. (1985). Reasoning with fuzzy default values, Proc. of the 15th Inter. Symp. on Multiple-Valued Logic, Kingston, Ont., May 28-30, 191-197.
Rescher N. and Manor R. (1970). On inference from inconsistent premises, Theory and Decision, 1, 179-219.
Roux L. and Desachy J. (1997). Multisources information fusion application for satellite image classification, Fuzzy Information Engineering (Dubois D., Prade H. and Yager R.R., eds.), John Wiley, New York, 111-121.
Sandri S. (1991). La combinaison de l'information incertaine et ses aspects algorithmiques, Thèse de Doctorat de l'Université Paul Sabatier, Toulouse, France.
Sandri S., Besi A., Dubois D., Mancini G., Prade H. and Testemale C. (1989). Data fusion problems in an intelligent data base interface, Reliability Data Collection and Use in Risk and Availability Assessment (Colombari U., Ed.), Springer Verlag, Berlin, 655-670.
Sandri S., Dubois D. and Kalfsbeek H. (1995). Elicitation, assessment and pooling of expert judgement using possibility theory, IEEE Trans. on Fuzzy Systems, 3, 313-335.
Schweizer B. and Sklar A. (1983). Probabilistic Metric Spaces, North-Holland, New York.
Shafer G. (1976). A Mathematical Theory of Evidence, Princeton Univ. Press, Princeton, NJ.
Shafer G. (1978). Non-additive probabilities in the works of Bernoulli and Lambert, Archives for the History of Exact Sciences, 19, 309-370.
Shafer G. (1986). The combination of evidence, Int. J. Intelligent Systems, 1, 155-179.
Silvert W. (1979). Symmetric summation: A class of operations on fuzzy sets, IEEE Trans. on Systems, Man and Cybernetics, 9(10), 657-659.
Smets P. (1982). Possibilistic inference from statistical data, Proc. of the 2nd World Conf. on Math. at the Service of Man, Las Palmas, Spain, June 28-July 3, 611-613.
Smets P. (1988). Belief functions, Non-Standard Logics for Automated Reasoning (Smets P., Mamdani A., Dubois D. and Prade H., eds.), Academic Press, London, 253-286.
Spohn W. (1988). Ordinal conditional functions: A dynamic theory of epistemic states, Causation in Decision, Belief Change, and Statistics, Vol. 2 (Harper W.L. and Skyrms B., eds.), D. Reidel, Dordrecht, 105-134.
Sugeno M. (1974). Theory of fuzzy integrals and its applications, Doctoral Thesis, Tokyo Inst. of Technology.
Sugeno M. (1977). Fuzzy measures and fuzzy integrals — A survey, Fuzzy Automata and Decision Processes (Gupta M.M., Saridis G.N. and Gaines B.R., eds.), North-Holland, Amsterdam, 89-102.
Tahani H. and Keller J. (1990). Information fusion in computer vision using fuzzy integrals, IEEE Trans. on Systems, Man and Cybernetics, 20, 733-741.
Tang Y. C. and Lee C. S. G. (1992). Optimal strategic recognition of objects based on candidate discriminating graphs with coordinated sensors, IEEE Trans. on Systems, Man and Cybernetics, 22, 647-661.
Wagner C. and Lehrer K. (1981). Rational Consensus in Science and Society, D. Reidel, Dordrecht.
Williams M. A. (1994). Transmutations of knowledge systems, Proc. of the 4th Inter. Conf. on Principles of Knowledge Representation and Reasoning (KR'94) (Doyle J., Sandewall E. and Torasso P., eds.), Bonn, Germany, May 24-27, 1994, Morgan Kaufmann, San Mateo, CA, 619-629.
Wu J. S., Apostolakis G. E. and Okrent D. (1990). Uncertainties in system analysis: Probabilistic versus nonprobabilistic theories, Reliability Engineering and Systems Safety, 30, 163-181.
Yager R. R. (1981a). A procedure for ordering fuzzy subsets of the unit interval, Information Sciences, 24, 143-161.
Yager R. R. (1981b). A new methodology for ordinal multiobjective decisions based on fuzzy sets, Decision Sciences, 12, 589-600.
Yager R. R. (1984). Approximate reasoning as a basis for rule-based expert systems, IEEE Trans. on Systems, Man and Cybernetics, 14, 636-643.
Yager R. R. (1985). Aggregating evidence using quantified statements, Information Sciences, 36, 179-206.
Yager R. R. (1987). Quasi-associative operations in the combination of evidence, Kybernetes, 16, 37-41.
Yager R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. on Systems, Man and Cybernetics, 18, 183-190.
Yager R. R. (1991). Non-monotonic set-theoretic operators, Fuzzy Sets and Systems, 42, 173-190.
Yager R. R. (1992). On the specificity of a possibility distribution, Fuzzy Sets and Systems, 50, 279-292.
Yager R. R. (1997). A general approach to the fusion of imprecise information, Int. J. of Intelligent Systems, 12, 1-29.
Yager R. R. and Filev D.P. (1994). Essentials of Fuzzy Modeling and Control, John Wiley, New York.
Yager R. R. and Kelman A. (1996). Fusion of fuzzy information with considerations for compatibility, partial aggregation and reinforcement, Int. J. of Approximate Reasoning, 15, 93-122.
Yan B. and Keller J. (1996). A heuristic, simulated annealing algorithm of learning possibility measures for multisource decision-making, Fuzzy Sets and Systems, 77, 87-109.
Yang M. S. and Liu M. C. (1998). On possibility analysis of fuzzy data, Fuzzy Sets and Systems, 94, 171-183.
Zadeh L. A. (1965). Fuzzy sets, Information and Control, 8, 338-353.
Zadeh L. A. (1971). Similarity relations and fuzzy orderings, Information Sciences, 3, 177-200.
Zadeh L. A. (1978). Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1, 3-28.
Zadeh L. A. (1984). Review of "A Mathematical Theory of Evidence" (G. Shafer), The AI Magazine, 5(3), 81-83.