6th International Symposium on Imprecise Probability: Theories and Applications, Durham, United Kingdom, 2009
Consistent approximations of belief functions
Fabio Cuzzolin
Oxford Brookes University, Oxford, UK
[email protected]
Abstract
Consistent belief functions represent collections of coherent or non-contradictory pieces of evidence. As most operators used to update or elicit evidence do not preserve consistency, the use of consistent transformations cs[·] in a reasoning process to guarantee coherence can be desirable. Such transformations are in turn linked to the problem of approximating an arbitrary belief function with a consistent one. We study here the consistent approximation problem in the case in which distances are measured using classical Lp norms. We show that, for each choice of the element we want the approximation to focus on, the partial approximations determined by the L1 and L2 norms coincide, and can be interpreted as classical focused consistent transformations. Global L1 and L2 solutions do not in general coincide, however, nor are they associated with the highest plausibility element.
Keywords. Consistent belief function, simplicial complex, approximation, Lp norms.
1 The consistent approximation problem
Belief functions (b.f.s) [19] are complex objects, in which different and sometimes contradictory bodies of evidence may coexist, as they mathematically describe the fusion of possibly conflicting expert opinions and/or imprecise or corrupted measurements. Making decisions based on such objects can then be misleading. This is a well known problem in classical logics, where the application of inference rules to inconsistent sets of assumptions or "knowledge bases" may lead to incompatible conclusions, depending on the subset of assumptions we start our reasoning from. Consistent belief functions (cs.b.f.s), i.e. belief functions whose non-zero mass events or "focal elements" have non-empty intersection or "core", are then particularly interesting, as they represent collections of coherent or non-contradictory pieces of evidence. In some situations it may then be desirable to design a method which, given an arbitrary belief function b, generates a consistent or non-contradictory belief function cs[b]: we call this a consistent transformation. Such a transformation is all the more valuable as several important operators used to update or elicit evidence represented as belief measures, like Dempster's sum [8] and the conjunctive rule of combination [21], do not preserve consistency. To guarantee the consistency of a state of belief we may want to adopt a scheme in which, each time new evidence is combined to yield a new b.f., the consistent transformation cs[·] is applied to reduce it to a coherent knowledge state.

Now, consistent transformations can be built by solving a minimization problem of the form $cs[b] = \arg\min_{cs \in \mathcal{CS}} \mathrm{dist}(b, cs)$, where dist is some distance measure between belief functions, and CS denotes the collection of all consistent b.f.s. We call this the consistent approximation problem. By plugging in different distance functions we get different consistent transformations. In this paper, in particular, we study what happens when using classical Lp norms. Indeed, consistent belief functions correspond to possibility distributions (Section 2), which are in turn inherently related to the L∞ norm. Besides, the region of all cs.b.f.s is geometrically the set of belief functions for which the L∞ norm of the plausibility distribution is equal to 1. We can then conjecture that Lp consistent approximations will be meaningful in terms of degrees of belief. This is indeed the case.

From a technical point of view, consistent b.f.s do not live in a single linear space, but in a collection of higher-dimensional triangles or simplices, called a "simplicial complex" [11]. A partial solution has then to be found separately for each maximal simplex CS^x of the consistent complex CS, i.e., the set of cs.b.f.s whose core includes the element x. These partial solutions are later compared to determine the global optimal solution. We will prove here that the partial approximations determined by both the L1 and the L2 norms are unique and coincide. We will also prove that the L1/L2 consistent approximation onto each component CS^x of CS indeed generates the consistent transformation focused on x [10, 1], i.e. a new belief function whose focal elements have the form A' = A ∪ {x}, where A is a focal element of the original b.f. b. As we will see, though, the associated global L1/L2 solutions do not in general lie on the same component of the consistent complex.
1.1 Paper outline
After recalling the notions of consistent and consonant belief functions, we will discuss their semantics and stress why it can be desirable to transform a generic belief function into a consistent one (Section 2). As we pose the approximation problem in a geometric framework, we briefly recall in Section 3 the geometry of consistent b.f.s. As the latter form a complex, we need to solve the approximation problem separately for each maximal simplicial component of this complex (Section 4). After gaining some insight from the analysis of the binary case (Section 5), we proceed to solve the L1 and L2 consistent approximation problems in the general case in Section 6. We will finally comment on and interpret our results.
2 Semantics of consistent belief functions

2.1 Consistent belief functions

We first recall the basic notions of the theory of evidence, and the definition of consistent belief functions in particular, to later discuss their semantics [19].

Definition 1 A basic probability assignment (b.p.a.) on a finite set (frame of discernment [19]) Θ is a set function $m_b : 2^\Theta \to [0,1]$ on $2^\Theta \doteq \{A \subseteq \Theta\}$ s.t.
$$m_b(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m_b(A) = 1, \qquad m_b(A) \geq 0 \;\; \forall A \subseteq \Theta.$$

Subsets of Θ associated with non-zero values of m_b are called focal elements (f.e.s), and their intersection the core:
$$\mathcal{C}_b \doteq \bigcap_{A \subseteq \Theta:\, m_b(A) \neq 0} A.$$

Definition 2 The belief function (b.f.) $b : 2^\Theta \to [0,1]$ associated with a basic probability assignment $m_b$ on Θ is defined as
$$b(A) = \sum_{B \subseteq A} m_b(B).$$

A dual mathematical representation of the evidence encoded by a belief function b is the plausibility function (pl.f.) $pl_b : 2^\Theta \to [0,1]$, $A \mapsto pl_b(A)$, where
$$pl_b(A) \doteq 1 - b(A^c) = 1 - \sum_{B \subseteq A^c} m_b(B)$$
expresses the amount of evidence not against A.

In the theory of evidence a probability function is simply a special belief function assigning non-zero masses to singletons only (Bayesian b.f.): $m_b(A) = 0$ for $|A| > 1$. Consonant belief functions are b.f.s whose focal elements $A_1 \subset \cdots \subset A_m$ are nested. Consonant b.f.s always have a non-empty core, namely their smallest focal element $A_1$. However, not all b.f.s whose core is non-empty are consonant.

Definition 3 A belief function is said to be consistent if its core is non-empty.

2.2 Semantics of consistent belief functions

Consistent belief functions (cs.b.f.s) form a significant class of b.f.s, for several reasons. On one side, they correspond to possibility distributions, and therefore form, together with consonant b.f.s, the link between evidence and possibility theory. More importantly, though, they are the analogues of consistent, non-contradictory sets of propositions ("knowledge bases") in logics. As maintaining coherence along an inference process is highly desirable, the utility of an operator which maps arbitrary belief functions to consistent ones emerges. This is all the more valuable as several evidence combination rules, like Dempster's sum [8] and the conjunctive rule of combination [21], do not preserve consistency. To guarantee the consistency of the knowledge state, a scheme like the following (where we use ⊕ to denote a valid combination rule) can be brought forward:
$$b_1, b_2 \;\to\; \begin{array}{c} b_1 \oplus b_2 \\ \downarrow \\ cs[b_1 \oplus b_2],\, b_3 \end{array} \;\to\; \begin{array}{c} cs[b_1 \oplus b_2] \oplus b_3 \\ \downarrow \\ cs[cs[b_1 \oplus b_2] \oplus b_3] \end{array} \qquad (1)$$
in which, whenever new evidence is combined to yield a new belief state, the consistent transformation cs[·] is applied to ensure coherence.
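To make Definitions 1-3 concrete, the following minimal Python sketch (the dictionary-of-frozensets representation and the helper names are our own illustration, not part of the paper) computes belief, plausibility and the core of a b.p.a., and tests consistency:

```python
# A b.p.a. (Definition 1) as a dictionary mapping focal elements
# (frozensets of elements of the frame Theta) to their masses.
theta = frozenset({'x', 'y', 'z'})
m = {
    frozenset({'x', 'y'}): 0.5,
    frozenset({'y', 'z'}): 0.3,
    frozenset({'y'}):      0.2,
}
assert abs(sum(m.values()) - 1.0) < 1e-9 and all(v >= 0 for v in m.values())

def belief(m, A):
    """Definition 2: b(A) = sum of the masses of the focal elements inside A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    """pl(A) = 1 - b(A^c) = total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def core(m):
    """Intersection of all focal elements."""
    focal = [B for B, v in m.items() if v > 0]
    C = focal[0]
    for B in focal[1:]:
        C = C & B
    return C

def is_consistent(m):
    """Definition 3: a belief function is consistent iff its core is non-empty."""
    return len(core(m)) > 0

print(belief(m, frozenset({'x', 'y'})))   # 0.7
print(plausibility(m, frozenset({'x'})))  # 0.5
print(core(m), is_consistent(m))          # frozenset({'y'}) True
```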
2.3 Consistent b.f.s and possibility distributions

In possibility theory [9, 14], subjective probability is mathematically described by possibility measures, i.e. functions $Pos : 2^\Theta \to [0,1]$ such that $Pos(\emptyset) = 0$, $Pos(\Theta) = 1$ and
$$Pos\Big(\bigcup_i A_i\Big) = \sup_i Pos(A_i)$$
for any family of subsets $\{A_i \,|\, A_i \in 2^\Theta,\, i \in I\}$, where I is an arbitrary index set. Each measure Pos is uniquely characterized by a possibility distribution $\pi : \Theta \to [0,1]$, $\pi(x) \doteq Pos(\{x\})$, via the formula $Pos(A) = \sup_{x \in A} \pi(x)$.

A central role in the connection between possibility and evidence theory [20, 18, 14, 12, 23, 3] is played by consonant and consistent belief functions. On one side,

Proposition 1 The plausibility function $pl_b$ associated with a b.f. b is a possibility measure iff b is consonant.

On the other, after calling plausibility assignment $\bar{pl}_b$ the restriction of the plausibility function to singletons, $\bar{pl}_b(x) = pl_b(\{x\})$, it can be proven that [13, 5]

Proposition 2 The plausibility assignment $\bar{pl}_b$ associated with a belief function b is the admissible possibility distribution of a possibility measure iff the b.f. b is consistent.

Consistent b.f.s are then the counterparts of possibility distributions in the theory of evidence. A different, powerful semantics comes in terms of consistent knowledge bases.
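Proposition 2 is easy to check numerically; the sketch below (our own code, reusing the dictionary representation introduced above, with a hypothetical helper `pl_assignment`) shows that the plausibility assignment of a consistent b.f. reaches 1 on some singleton, i.e. it is normalised as a possibility distribution, while that of a non-consistent b.f. is not:

```python
def pl_assignment(m, theta):
    """Plausibility assignment: restriction of pl_b to the singletons of Theta."""
    return {x: sum(v for B, v in m.items() if x in B) for x in theta}

theta = {'x', 'y', 'z'}

# A consistent b.f. (both focal elements contain 'y', so the core is {y}) ...
m_consistent = {frozenset({'x', 'y'}): 0.6, frozenset({'y', 'z'}): 0.4}
# ... and a non-consistent one (disjoint focal elements, empty core).
m_conflicting = {frozenset({'x'}): 0.6, frozenset({'z'}): 0.4}

for m in (m_consistent, m_conflicting):
    pl = pl_assignment(m, theta)
    # An admissible possibility distribution must reach 1 somewhere on Theta.
    print(pl, 'admissible:', abs(max(pl.values()) - 1.0) < 1e-9)
```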
2.4 Consistent b.f.s as collections of coherent pieces of evidence
Belief functions are complex objects, in which sometimes contradictory bodies of evidence may coexist, as they may result from the fusion of possibly conflicting expert opinions and/or imprecise or corrupted measurements. In formal logics, the application of inference rules to inconsistent sets of assumptions or "knowledge bases" may lead to incompatible conclusions, depending on the subset of assumptions we start from. A variety of approaches to this problem have been proposed. These include fragmenting the knowledge base into maximally consistent subsets, limiting the power of the formalism, or adopting non-classical semantics [17, 2]. Paris, for his part, tackles the problem by not assuming each proposition in the knowledge base to be a fact, but by attributing to it a certain degree of belief [16]. This leads to something similar to a belief function. A mechanism able to obtain a consistent knowledge base from an inconsistent one is therefore desirable.
In the theory of evidence such a mechanism can be described as an operator
$$cs : \mathcal{B} \to \mathcal{CS}, \qquad b \mapsto cs[b],$$
where B and CS denote, respectively, the set of all b.f.s and that of all cs.b.f.s.

2.5 Consistent belief functions and combination rules
Such a transformation acquires even more importance when we notice that most operators used to update or elicit evidence in the theory of evidence do not preserve consistency.

Definition 4 The orthogonal sum or Dempster's sum of two belief functions b1, b2 is a new belief function b1 ⊕ b2 with b.p.a.
$$m_{b_1 \oplus b_2}(A) = \frac{\sum_{B \cap C = A} m_{b_1}(B)\, m_{b_2}(C)}{\sum_{B \cap C \neq \emptyset} m_{b_1}(B)\, m_{b_2}(C)},$$
where $m_{b_i}$ denotes the b.p.a. associated with $b_i$. Their conjunctive combination is a new belief function b1 ∩ b2 with b.p.a.
$$m_{b_1 \cap b_2}(A) = \sum_{B \cap C = A} m_{b_1}(B)\, m_{b_2}(C).$$
Their disjunctive combination is instead the b.f. b1 ∪ b2 with b.p.a.
$$m_{b_1 \cup b_2}(A) = \sum_{B \cup C = A} m_{b_1}(B)\, m_{b_2}(C).$$
Now, it is not difficult to prove that:

Proposition 3 If b1, b2 are consistent then b1 ∪ b2 is also consistent. On the other hand, if b1, b2 are consistent and their cores $\mathcal{C}_{b_1}$, $\mathcal{C}_{b_2}$ have non-empty intersection, then both b1 ⊕ b2 and b1 ∩ b2 are consistent, with core $\mathcal{C}_{b_1 \cap b_2} = \mathcal{C}_{b_1} \cap \mathcal{C}_{b_2}$. Finally, if $\mathcal{C}_{b_1} \cap \mathcal{C}_{b_2} = \emptyset$ then b1 ⊕ b2 and b1 ∩ b2 are not consistent.

In other words, consistency is preserved by the disjunctive rule, the price to pay being increasing uncertainty as new evidence is combined, since the core of the belief state tends to Θ (complete ignorance). On the other side, both Dempster's rule and the conjunctive combination preserve consistency only when the collection of focal elements of b1 and b2 is already consistent (i.e. any intersection A ∩ B of a f.e. A of b1 and a f.e. B of b2 is non-empty). As long as the new evidence is consistent with the existing one, uncertainty is reduced; the price to pay is the loss of consistency in most cases. The use of a consistent transformation in a reasoning process (1) would then guarantee consistency, while allowing the degree of uncertainty affecting our knowledge of the problem to decrease with time.
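The three rules of Definition 4 and the behaviour described by Proposition 3 can be illustrated with a short Python sketch (our own code and example masses, built on the dictionary representation used earlier):

```python
from collections import defaultdict

def combine(m1, m2, op):
    """Combine two b.p.a.s by merging pairs of focal elements with 'op'."""
    out = defaultdict(float)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            out[op(B, C)] += v1 * v2
    return dict(out)

def conjunctive(m1, m2):
    return combine(m1, m2, frozenset.intersection)

def disjunctive(m1, m2):
    return combine(m1, m2, frozenset.union)

def dempster(m1, m2):
    m = conjunctive(m1, m2)
    conflict = m.pop(frozenset(), 0.0)          # mass falling on the empty set
    return {A: v / (1.0 - conflict) for A, v in m.items()}

def consistent(m):
    """True iff the intersection of the focal elements (the core) is non-empty."""
    focal = [B for B, v in m.items() if v > 1e-12]
    core = focal[0]
    for B in focal[1:]:
        core &= B
    return bool(core)

# b1 has core {x}, b2 has core {y, z}: the two cores are disjoint.
m1 = {frozenset({'x', 'y'}): 0.5, frozenset({'x', 'z'}): 0.5}
m2 = {frozenset({'y', 'z'}): 1.0}
print(consistent(disjunctive(m1, m2)))   # True : the union-based rule keeps a non-empty core
print(consistent(conjunctive(m1, m2)))   # False: focal elements {y} and {z} do not intersect
print(consistent(dempster(m1, m2)))      # False
```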
2.6 Making a belief function consistent

Consistent transformations can be built by solving a minimization problem of the form
$$cs[b] = \arg\min_{cs \in \mathcal{CS}} \mathrm{dist}(b, cs), \qquad (2)$$
where dist is some distance measure between belief functions, and CS denotes again the collection of all consistent b.f.s. We call (2) the consistent approximation problem. Plugging different distance functions into (2), we get different consistent transformations. In this paper we study what happens when using classical Lp norms in the approximation problem. As possibility measures are inherently related to the L∞ norm (see above), cs.b.f.s live in a space linked to such a norm (Section 3). This leads us to conjecture that Lp-based approximations may indeed generate meaningful consistent transformations.
3 The simplicial complex of consistent belief functions
To solve the consistent approximation problem (2) we need to understand the structure of the space in which consistent belief functions live. We can then move forward and find the projection of b onto this space by minimizing the chosen distance.
3.1 The consistent complex

A belief function is determined by its $N - 2$ belief values $\{b(A),\ \emptyset \subsetneq A \subsetneq \Theta\}$, $N = 2^{|\Theta|}$ (since $b(\emptyset) = 0$, $b(\Theta) = 1$ for all b.f.s). It can then be thought of as a vector of $\mathbb{R}^{N-2}$. The collection B of points of $\mathbb{R}^{N-2}$ which are b.f.s is a "simplex" (roughly speaking, a higher-dimensional triangle), which we call the belief space. B is the convex closure
$$\mathcal{B} = Cl(b_A,\ \emptyset \subsetneq A \subseteq \Theta)$$
of the ("categorical") belief functions $b_A$ assigning all the mass to a single event A: $m_{b_A}(A) = 1$, $m_{b_A}(B) = 0$ for all $B \neq A$. Here Cl denotes the convex closure operator:
$$Cl(b_1, \ldots, b_k) = \Big\{ b \in \mathcal{B} : b = \alpha_1 b_1 + \cdots + \alpha_k b_k,\ \sum_i \alpha_i = 1,\ \alpha_i \geq 0\ \forall i \Big\}.$$
In the belief space the vector $b \in \mathcal{B}$ which represents a belief function is the convex combination
$$b = \sum_{\emptyset \subsetneq A \subseteq \Theta} m_b(A)\, b_A \qquad (3)$$
of the vectors $b_A$ representing all the categorical belief functions.

The geometry of consistent belief functions can then be described as a structured collection of simplices, a simplicial complex [7]. More precisely, CS is the union
$$\mathcal{CS} = \bigcup_{x \in \Theta} Cl(b_A,\ A \ni x)$$
of the maximal simplices $Cl(b_A, A \ni x)$ formed by all the b.f.s whose core contains a given element x of Θ.

3.2 Example: the binary case

As an example let us consider a frame of discernment formed by just two elements, $\Theta_2 = \{x, y\}$. In this very simple case each belief function $b : 2^{\Theta_2} \to [0,1]$ is completely determined by its belief values b(x), b(y), as $b(\Theta) = 1$, $b(\emptyset) = 0$ for all $b \in \mathcal{B}$. We can then represent each b.f. b as the vector $[b(x) = m_b(x),\ b(y) = m_b(y)]'$ of $\mathbb{R}^{N-2} = \mathbb{R}^2$ (since $N = 2^2 = 4$). Since $m_b(x) \geq 0$, $m_b(y) \geq 0$ and $m_b(x) + m_b(y) \leq 1$, the set $\mathcal{B}_2$ of all the possible belief functions on $\Theta_2$ is the triangle of Figure 1, whose vertices are the points
$$b_\Theta = [0,0]', \qquad b_x = [1,0]', \qquad b_y = [0,1]',$$
which correspond respectively to the vacuous belief function $b_\Theta$ ($m_{b_\Theta}(\Theta) = 1$), the Bayesian b.f. $b_x$ with $m_{b_x}(x) = 1$, and the Bayesian b.f. $b_y$ with $m_{b_y}(y) = 1$. The region $\mathcal{P}_2$ of all the Bayesian b.f.s on $\Theta_2$ is the segment $Cl(b_x, b_y)$. In the binary case consistent belief functions can have as list of focal elements either $\{\{x\}, \Theta_2\}$ or $\{\{y\}, \Theta_2\}$. Therefore the space of cs.b.f.s $\mathcal{CS}_2$ is the union of two one-dimensional simplices (line segments):
$$\mathcal{CS}_2 = \mathcal{CS}^x \cup \mathcal{CS}^y = Cl(b_\Theta, b_x) \cup Cl(b_\Theta, b_y).$$

Figure 1: The belief space B for a binary frame is a triangle of R^2 whose vertices are the categorical b.f.s focused on {x}, {y} and Θ. The probability region is the segment Cl(b_x, b_y), while all consistent b.f.s live in the union of the two segments CS^x = Cl(b_Θ, b_x) and CS^y = Cl(b_Θ, b_y).
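A small sketch of the vector embedding described above (our own helpers, assuming the dictionary representation of the earlier snippets): a b.f. is mapped to the vector of its belief values on all events other than ∅ and Θ, which in the binary case reduces to the pair [b(x), b(y)] = [m_b(x), m_b(y)] of Figure 1.

```python
from itertools import chain, combinations

def events(theta):
    """All events A with emptyset != A != Theta, in a fixed order."""
    theta = sorted(theta)
    return [frozenset(A) for A in
            chain.from_iterable(combinations(theta, r) for r in range(1, len(theta)))]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def to_vector(m, theta):
    """Coordinates of b in R^(N-2): one belief value per proper non-empty event."""
    return [belief(m, A) for A in events(theta)]

# Binary frame: b is determined by the pair [b(x), b(y)] = [m(x), m(y)].
theta2 = {'x', 'y'}
m2 = {frozenset({'x'}): 0.2, frozenset({'y'}): 0.3, frozenset({'x', 'y'}): 0.5}
print(events(theta2))         # [frozenset({'x'}), frozenset({'y'})]
print(to_vector(m2, theta2))  # [0.2, 0.3]
```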
4 The Lp consistent approximation problem

4.1 Using norms of the Lp family

The geometry of the binary case hints at a close relation between consistent belief functions and Lp norms. As the plausibility of all the elements of their core is
$$pl_b(x) = \sum_{A \supseteq \{x\}} m_b(A) = 1 \qquad \forall x \in \mathcal{C}_b,$$
the region of consistent b.f.s
$$\mathcal{CS} = \Big\{ b : \max_{x \in \Theta} pl_b(x) = 1 \Big\} = \Big\{ b : \|\bar{pl}_b\|_{L_\infty} = 1 \Big\}$$
is the set of b.f.s for which the L∞ norm of the plausibility distribution is equal to 1. This reinforces the observation that cs.b.f.s correspond to possibility distributions (Section 2), which are in turn inherently related to L∞. It then makes sense to conjecture that the consistent transformation we obtain by picking as distance function in the approximation problem (2) one of the classical Lp norms
$$\|b - b'\|_{L_1} = \sum_{A \subseteq \Theta} |b(A) - b'(A)|, \qquad \|b - b'\|_{L_2} = \sqrt{\sum_{A \subseteq \Theta} (b(A) - b'(A))^2}, \qquad \|b - b'\|_{L_\infty} = \max_{A \subseteq \Theta} |b(A) - b'(A)|$$
will be meaningful. When looking for a probabilistic approximation $p[b] = \arg\min_{p \in \mathcal{P}} \mathrm{dist}(b, p)$, the use of Lp norms leads indeed to quite interesting results. The L2 approximation produces the so-called "orthogonal projection" of b onto P [6], while, at least in the binary case, the set of L1/L∞ probabilistic approximations of b coincides with the set of probabilities dominating b:
$$\mathcal{P}[b] \doteq \{ p \in \mathcal{P} : p(A) \geq b(A) \;\; \forall A \subseteq \Theta \}.$$

4.2 Approximation on a complex

As the consistent complex CS is a collection of linear spaces (better, of simplices, each generating a linear space), solving the problem (2) involves finding a number of partial solutions
$$cs^x_{L_p}[b] = \arg\min_{cs \in \mathcal{CS}^x} \|b - cs\|_{L_p} \qquad (4)$$
(see Figure 2). Then the distance of b from all such partial solutions has to be assessed in order to select a global optimal approximation. In the rest of the paper we will apply this scheme to the approximation problems associated with L1 and L2, respectively.

Figure 2: To minimize the distance of a point from a simplicial complex, we need to find all partial solutions (4) for all maximal simplices in the complex (empty circles), and later compare these partial solutions to select the global optimum (black circle).

5 Approximation in the binary case

To get some insight on how to proceed in the general case, we first consider the case study of a binary frame (Figure 3), and discuss how to approximate a belief function $b \in \mathcal{B}_2$ with a Bayesian or a consistent b.f. using an Lp norm. We will denote by
$$p_{L_p}[b] \doteq \arg\min_{p \in \mathcal{P}} \|b - p\|_{L_p}$$
the probability which minimizes the Lp distance from b. Analogously, we will use the notation
$$cs_{L_p}[b] \doteq \arg\min_{cs \in \mathcal{CS}} \|b - cs\|_{L_p}$$
for Lp consistent approximations. In the Bayesian case we get
$$p_{L_2}[b] = \Big[ m_b(x) + \frac{m_b(\Theta)}{2},\; m_b(y) + \frac{m_b(\Theta)}{2} \Big]';$$
this probability is called the orthogonal projection π[b] of b onto P [6], and coincides with the pignistic function BetP[b] [22, 4] in the binary case. The L1 solution $p_{L_1}[b]$, instead, is the whole set of probabilities "dominating" b [15], i.e.,
$$p_{L_1}[b] = \mathcal{P}[b] \doteq \{ p \in \mathcal{P} : p(A) \geq b(A) \;\; \forall A \subseteq \Theta \}. \qquad (5)$$
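The three norms above can be computed directly from the vector embedding of Section 3; a minimal sketch (ours, with the same hypothetical helpers `events` and `belief` repeated so that the snippet is self-contained):

```python
from itertools import chain, combinations
from math import sqrt

def events(theta):
    theta = sorted(theta)
    return [frozenset(A) for A in
            chain.from_iterable(combinations(theta, r) for r in range(1, len(theta)))]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def lp_distances(m1, m2, theta):
    """L1, L2 and Linf distances between two b.f.s, computed over all events."""
    diffs = [belief(m1, A) - belief(m2, A) for A in events(theta)]
    return (sum(abs(d) for d in diffs),
            sqrt(sum(d * d for d in diffs)),
            max(abs(d) for d in diffs))

theta = {'x', 'y', 'z'}
m  = {frozenset({'y'}): 0.4, frozenset({'x', 'z'}): 0.6}   # not consistent
cs = {frozenset({'x'}): 0.4, frozenset({'x', 'z'}): 0.6}   # consistent, core {x}
print(lp_distances(m, cs, theta))                          # ~ (1.6, 0.8, 0.4)
```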
Figure 3 illustrates the geometry of all Lp Bayesian and consistent approximations of a belief function b in the binary frame. We can notice that:
1. the solution of the L∞ approximation problem determines an entire set CS[b] of consistent b.f.s;

2. on the other hand, the L1/L2 approximations on the same component CS^x of CS are point-wise and coincide;

3. the corresponding consistent transformation $cs^x_{L_2}[b]$ maps the original belief function b to a new b.f. with a focal element A ∪ {x} whenever A is a f.e. of b. The resulting b.p.a. is
$$m_{cs^x_{L_2}[b]}(x) = \sum_{A : A \cup \{x\} = \{x\}} m_b(A) = m_b(x), \qquad m_{cs^x_{L_2}[b]}(\Theta) = \sum_{A : A \cup \{x\} = \Theta} m_b(A) = m_b(y) + m_b(\Theta);$$

4. finally, the global L1/L2 consistent transformations also coincide, as they belong to the same component of the consistent complex (CS^x in the figure).

Figure 3: The dual behavior of the Bayesian $p_{L_i}[b]$ and consistent $cs_{L_i}[b]$ approximations of a b.f. b associated with the norms L1, L2, L∞, shown in the binary case.

These facts (except the last point, which turns out to be an artifact of binary frames) are valid in the general case. Here we are going to focus on L1/L2 approximations.
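Item 3 can be sanity-checked with a brute-force search (our own sketch, not the paper's method): scanning the segment CS^x of the binary belief space confirms that the closest consistent b.f. in both the L1 and L2 senses carries m(x) = m_b(x) and m(Θ) = m_b(y) + m_b(Θ).

```python
# Binary frame Theta = {x, y}: a b.f. is the point (b(x), b(y)) = (m(x), m(y)),
# and CS^x = Cl(b_Theta, b_x) is the segment {(t, 0) : 0 <= t <= 1}.
mx, my = 0.3, 0.5                         # example b.p.a.: m(x), m(y), m(Theta) = 0.2

def l1(t):
    return abs(mx - t) + abs(my - 0.0)

def l2(t):
    return ((mx - t) ** 2 + (my - 0.0) ** 2) ** 0.5

grid = [i / 1000 for i in range(1001)]    # candidate points m(x) = t on CS^x
print(min(grid, key=l1), min(grid, key=l2))   # both 0.3 = m_b(x)
# The minimiser carries m(x) = m_b(x) = 0.3 and m(Theta) = 0.7 = m_b(y) + m_b(Theta).
```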
6 Consistent L1/L2 approximations

6.1 Reducing the approximation problem to a linear system

In the case of an arbitrary frame, a cs.b.f. cs ∈ CS^x is a solution of the L2 partial approximation problem if b − cs is orthogonal to all the generators $b_B - b_\Theta$ of the simplex $\mathcal{CS}^x = Cl(b_B, B \supseteq \{x\})$:
$$\langle b - cs,\, b_B - b_\Theta \rangle = \langle b - cs,\, b_B \rangle = 0 \qquad \forall B \supseteq \{x\}$$
(as $b_\Theta = 0$ is the origin of $\mathbb{R}^{N-2}$; see the binary example). We denote by $\alpha(A) \doteq m_{cs}(A)$ the b.p.a. of cs, so that we can write each consistent belief function whose core contains {x} as
$$cs = \sum_{A \supseteq \{x\}} \alpha(A)\, b_A$$
(by Equation (3)). After introducing the notation $\beta(A) \doteq m_b(A) - \alpha(A)$ we can write $b - cs = \sum_{A \subsetneq \Theta} \beta(A)\, b_A$, and the orthogonality condition reads as
$$\Big\langle \sum_{A \subsetneq \Theta} \beta(A)\, b_A,\; b_B \Big\rangle = 0 \qquad \forall B \supseteq \{x\},$$
i.e. (still for all B ⊇ {x})
$$\sum_{A \supseteq \{x\}} \beta(A)\, \langle b_A, b_B \rangle + \sum_{A \not\supseteq \{x\}} m_b(A)\, \langle b_A, b_B \rangle = 0. \qquad (6)$$
The L1 minimization problem reads instead as
$$\arg\min_{\vec{\alpha}} \sum_{A \supseteq \{x\}} \Big| \sum_{B \subseteq A} m_b(B) - \sum_{B \subseteq A,\, B \supseteq \{x\}} \alpha(B) \Big| \;=\; \arg\min_{\vec{\beta}} \sum_{A \supseteq \{x\}} \Big| \sum_{B \subseteq A,\, B \supseteq \{x\}} \beta(B) + \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) \Big|,$$
which is clearly solved by setting all the addenda to zero, obtaining the linear system
$$\sum_{B \subseteq A,\, B \supseteq \{x\}} \beta(B) + \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) = 0 \qquad \forall A \supseteq \{x\}. \qquad (7)$$
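The linear system (7) can be set up and solved numerically; the sketch below (our own construction, using numpy, with made-up example masses) does so for a small frame and a chosen element x, recovering the masses α(A) = m_b(A) − β(A) that anticipate the closed form derived in Section 6.3.

```python
import numpy as np
from itertools import chain, combinations

theta = frozenset({'x', 'y', 'z'})
x = 'x'
m = {frozenset({'y'}): 0.3, frozenset({'y', 'z'}): 0.3, frozenset({'x', 'z'}): 0.4}

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# One unknown beta(A) and one equation per event A with {x} <= A < Theta.
index = [A | {x} for A in subsets(theta - {x}) if (A | {x}) != theta]

# Left-hand side of (7): sum of beta(B) over the B <= A that contain x.
M = np.array([[1.0 if B <= A else 0.0 for B in index] for A in index])
# Right-hand side: minus the total mass of the subsets of A not containing x.
rhs = np.array([-sum(v for B, v in m.items() if B <= A and x not in B) for A in index])

beta = np.linalg.solve(M, rhs)
alpha = {A: m.get(A, 0.0) - b for A, b in zip(index, beta)}
for A, a in alpha.items():
    print(sorted(A), round(a, 6))
# Each alpha(A) comes out as m_b(A) + m_b(A \ {x}), the closed form of Section 6.3.
```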
6.2 Linear transformation
We are going to show here that the two minimization problems associated with the linear systems (6) and (7) coincide. The solution is indeed conserved, due to the fact that the second linear system is obtained from the first one through a linear transformation.

Lemma 1
$$\sum_{B \supseteq A} \langle b_B, b_C \rangle (-1)^{|B \setminus A|} = \begin{cases} 1 & \text{if } C \subseteq A, \\ 0 & \text{otherwise.} \end{cases}$$

Corollary 1 The linear system (6) can be reduced to the system (7) through a linear transformation of rows:
$$\mathrm{row}_A \mapsto \sum_{B \supseteq A} \mathrm{row}_B\, (-1)^{|B \setminus A|}. \qquad (8)$$

Proof. If we apply the linear transformation (8) to the system (6) we get, for all A ⊇ {x},
$$\sum_{B \supseteq A} \Big[ \sum_{C \supseteq \{x\}} \beta(C) \langle b_B, b_C \rangle + \sum_{C \not\supseteq \{x\}} m_b(C) \langle b_B, b_C \rangle \Big] (-1)^{|B \setminus A|} = \sum_{C \supseteq \{x\}} \beta(C) \sum_{B \supseteq A} \langle b_B, b_C \rangle (-1)^{|B \setminus A|} + \sum_{C \not\supseteq \{x\}} m_b(C) \sum_{B \supseteq A} \langle b_B, b_C \rangle (-1)^{|B \setminus A|}.$$
Therefore by Lemma 1 we get
$$\sum_{C \supseteq \{x\},\, C \subseteq A} \beta(C) + \sum_{C \not\supseteq \{x\},\, C \subseteq A} m_b(C) = 0 \qquad \forall A \supseteq \{x\},$$
i.e. the system of equations (7).

6.3 Form of the solution

To obtain both the L2 and the L1 consistent approximations of b it then suffices to solve the system (7) associated with the L1 norm.

Theorem 1 The unique solution of the linear system (7) is given by $\beta(A) = -m_b(A \setminus \{x\})$.

Proof. We can prove it by substitution. System (7) becomes
$$-\sum_{B \subseteq A,\, B \supseteq \{x\}} m_b(B \setminus \{x\}) + \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) = -\sum_{C \subseteq A \setminus \{x\}} m_b(C) + \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) = -\sum_{C \subseteq A \setminus \{x\}} m_b(C) + \sum_{C \subseteq A \setminus \{x\}} m_b(C) = 0,$$
as B ⊉ {x} iff B = A \ {x} for A = B ∪ {x}.

Therefore, according to the discussion in Section 4, the partial L1/L2 consistent approximations of b on the maximal component CS^x of the consistent complex have b.p.a.
$$m_{cs^x_{L_1}}(A) = m_{cs^x_{L_2}}(A) = \alpha(A) = m_b(A) - \beta(A) = m_b(A) + m_b(A \setminus \{x\})$$
for all events A such that $\{x\} \subseteq A \subsetneq \Theta$. The value of α(Θ) can be obtained by normalization:
$$\alpha(\Theta) = 1 - \sum_{\{x\} \subseteq A \subsetneq \Theta} \alpha(A) = 1 - \sum_{\{x\} \subseteq A \subsetneq \Theta} \big[ m_b(A) + m_b(A \setminus \{x\}) \big] = 1 - \sum_{\{x\} \subseteq A \subsetneq \Theta} m_b(A) - \sum_{\{x\} \subseteq A \subsetneq \Theta} m_b(A \setminus \{x\}) = 1 - \sum_{A \neq \Theta,\, \{x\}^c} m_b(A) = m_b(\{x\}^c) + m_b(\Theta).$$
Summarizing:

Corollary 2 The partial L1 and L2 consistent approximations of a belief function b with b.p.a. m_b onto the component CS^x of the consistent complex coincide. They have b.p.a.
$$m_{cs^x_{L_1}}(A) = m_{cs^x_{L_2}}(A) = m_b(A) + m_b(A \setminus \{x\})$$
for all x ∈ Θ, and for all A s.t. {x} ⊆ A ⊆ Θ.

6.4 Partial solutions as focused consistent transformations

The basic probability assignment of the L1/L2 consistent approximation of b has an elegant expression, and a straightforward interpretation: to get a consistent b.f. focused on a singleton x, the mass contribution of all the events B such that B ∪ {x} = A is assigned to A. But there are just two such events: A itself, and A \ {x}.

As an example, the partial consistent approximation with core {x} of a belief function on a frame Θ = {x, y, z, w} is illustrated in Figure 4. The b.f. with focal elements {y}, {y, z}, and {x, z, w} is transformed by the map
$$\{y\} \mapsto \{x\} \cup \{y\} = \{x, y\}, \qquad \{y, z\} \mapsto \{x\} \cup \{y, z\} = \{x, y, z\}, \qquad \{x, z, w\} \mapsto \{x\} \cup \{x, z, w\} = \{x, z, w\}$$
into the consistent b.f. with focal elements {x, y}, {x, y, z}, and {x, z, w} and the same b.p.a.

Figure 4: A belief function (left) and its L1/L2 consistent approximation with core {x} (right).

Partial solutions to the L1/L2 consistent approximation problem turn out to be related to classical inner consonant approximations of a belief function b, i.e. the set of consonant b.f.s c such that $c(A) \geq b(A)$ for all A ⊆ Θ (or, equivalently, $pl_c(A) \leq pl_b(A)$ for all A). Dubois and Prade [10] proved indeed that such an approximation exists iff b is consistent. However, when b is not consistent a "focused consistent transformation" can be applied to get a new belief function b' such that
$$m'(A \cup \{x_i\}) = m(A) \qquad \forall A \subseteq \Theta,$$
where $x_i$ is the element of Θ with highest plausibility. Theorem 1 and Corollary 2 state that the L1/L2 consistent approximation onto each component CS^x of CS generates the consistent transformation focused on x.
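A direct implementation sketch (ours) of the partial L1/L2 approximation of Corollary 2, i.e. the consistent transformation focused on x, applied to the focal elements of the Figure 4 example (the masses 0.2/0.3/0.5 are made up for illustration, since the paper specifies only the focal elements):

```python
from collections import defaultdict

def focused_transformation(m, x):
    """Partial L1/L2 consistent approximation on CS^x:
    every focal element A hands its mass to A | {x}
    (equivalently, m'(A) = m(A) + m(A \\ {x}) for every A containing x)."""
    out = defaultdict(float)
    for A, v in m.items():
        out[A | {x}] += v
    return dict(out)

# Example of Figure 4: Theta = {x, y, z, w}, focal elements {y}, {y,z}, {x,z,w}.
m = {frozenset({'y'}): 0.2, frozenset({'y', 'z'}): 0.3, frozenset({'x', 'z', 'w'}): 0.5}
cs = focused_transformation(m, 'x')
for A, v in cs.items():
    print(sorted(A), v)
# {x,y}: 0.2, {x,y,z}: 0.3, {x,z,w}: 0.5 -- same masses, all focal elements now contain x.
```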
6.5 Global optimal solution for L1

To find the global consistent approximation of b we need to work out which of the partial approximations $cs^x_{L_{1/2}}[b]$ has minimal distance from b, i.e., we need to find $\arg\min_x \|b - cs^x_{L_{1/2}}[b]\|$.

The L1 distance of b from CS^x can be computed as
$$\|b - cs^x_{L_1}[b]\|_{L_1} = \sum_{A \subseteq \Theta} |b(A) - cs^x_{L_1}[b](A)| = \sum_{A \not\supseteq \{x\}} |b(A) - 0| + \sum_{A \supseteq \{x\}} \Big| \sum_{B \subseteq A} m_b(B) - \sum_{B \subseteq A,\, B \supseteq \{x\}} \big[ m_b(B) + m_b(B \setminus \{x\}) \big] \Big|$$
$$= \sum_{A \not\supseteq \{x\}} b(A) + \sum_{A \supseteq \{x\}} \Big| \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) - \sum_{B \subseteq A,\, B \supseteq \{x\}} m_b(B \setminus \{x\}) \Big| = \sum_{A \not\supseteq \{x\}} b(A) + \sum_{A \supseteq \{x\}} \Big| \sum_{C \subseteq A \setminus \{x\}} m_b(C) - \sum_{C \subseteq A \setminus \{x\}} m_b(C) \Big|$$
$$= \sum_{A \not\supseteq \{x\}} b(A) = \sum_{A \subseteq \{x\}^c} b(A). \qquad (9)$$
Immediately,

Theorem 2 The global optimal L1 consistent approximation of any belief function b is given by
$$cs_{L_1}[b] \doteq \arg\min_{x \in \Theta} \|b - cs^x_{L_1}[b]\|_{L_1} = cs^{\hat{x}}_{L_1}[b],$$
i.e. the partial approximation associated with the element $\hat{x}$ which minimizes (9):
$$\hat{x} = \arg\min \Big\{ \sum_{A \subseteq \{x\}^c} b(A),\; x \in \Theta \Big\}.$$

6.6 A counterexample

In the binary case (Figure 3) the condition of Theorem 2 reduces to
$$\hat{x} = \arg\min_x \sum_{A \subseteq \{x\}^c} b(A) = \arg\min_x m_b(\{x\}^c) = \arg\max_x pl_b(x),$$
and the global approximation falls on the component of the consistent complex associated with the element of maximal plausibility. Unfortunately, this is not generally the case for arbitrary frames of discernment Θ. Let us see this in a simple counterexample. Let us first write
$$\sum_{A \subseteq \{x\}^c} b(A) = \sum_{A \subseteq \{x\}^c} \sum_{B \subseteq A} m_b(B) = \sum_{B \subseteq \{x\}^c} m_b(B) \cdot |\{ A \subseteq \{x\}^c : A \supseteq B \}| = \sum_{B \subseteq \{x\}^c} m_b(B) \cdot 2^{|\{x\}^c| - |B|}. \qquad (10)$$
Now, consider a belief function on a frame $\Theta = \{x_1, \ldots, x_n\}$ of cardinality n, with just two focal elements:
$$m_b(\{x_1\}) = m_x, \qquad m_b(\{x_1\}^c) = m_b(\{x_2, \ldots, x_n\}) = 1 - m_x.$$
If $m_x < 1/2$ all $y \neq x_1$ have maximal plausibility, as $pl_b(x_1) = 1 - b(\{x_1\}^c) = m_x$, while $pl_b(y) = 1 - m_x$ for all $y \neq x_1$. However, according to (10),
$$\|b - cs^{x_1}_{L_1}[b]\|_{L_1} = \sum_{A \subseteq \{x_1\}^c} b(A) = (1 - m_x)\, 2^{(n-1)-(n-1)} = 1 - m_x,$$
where n = |Θ|, while
$$\|b - cs^{y}_{L_1}[b]\|_{L_1} = \sum_{A \subseteq \{y\}^c} b(A) = m_x\, 2^{(n-1)-1} = m_x\, 2^{n-2} \qquad \forall y \neq x_1.$$
But when
$$m_x\, 2^{n-2} \geq 1 - m_x, \quad \text{i.e.} \quad n \geq 2 + \log_2 \frac{1 - m_x}{m_x},$$
we have that
$$\|b - cs^{x_1}_{L_1}[b]\|_{L_1} \leq \|b - cs^{y}_{L_1}[b]\|_{L_1} \qquad \forall y \neq x_1,$$
and therefore the global L1 consistent approximation can fall on a component not associated with the maximal plausibility element.
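The criterion of Theorem 2 and the counterexample above can be checked numerically; in the sketch below (our own code) the global L1 component is computed for the two-focal-element b.f. with m_x = 0.4 on a frame of n = 5 elements, and it indeed falls on the component of x_1 rather than on a maximal-plausibility element.

```python
from itertools import chain, combinations

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def global_l1_component(m, theta):
    """Theorem 2: pick the element x minimising the sum of b(A) over A inside {x}^c."""
    score = {x: sum(belief(m, A) for A in subsets(theta - {x})) for x in theta}
    return min(score, key=score.get), score

# Counterexample of Section 6.6: n = 5, m_x = 0.4 < 1/2.
theta = frozenset({'x1', 'x2', 'x3', 'x4', 'x5'})
mx = 0.4
m = {frozenset({'x1'}): mx, theta - {'x1'}: 1 - mx}

x_hat, score = global_l1_component(m, theta)
pl = {x: sum(v for B, v in m.items() if x in B) for x in theta}
print('global L1 component:', x_hat)          # x1
print('max-plausibility elements:', [x for x in sorted(theta) if pl[x] == max(pl.values())])
# x2..x5 have plausibility 0.6 > 0.4 = pl(x1), yet the global L1 solution focuses on x1.
```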
6.7 Global optimal solution for L2

In the L2 case we get
$$\|b - cs^x_{L_2}[b]\|^2 = \sum_{A \subseteq \Theta} \big( b(A) - cs^x_{L_2}[b](A) \big)^2 = \sum_{A \subseteq \Theta} \Big[ \sum_{B \subseteq A} m_b(B) - \sum_{B \subseteq A,\, B \supseteq \{x\}} \alpha(B) \Big]^2$$
$$= \sum_{A \not\supseteq \{x\}} \big( b(A) \big)^2 + \sum_{A \supseteq \{x\}} \Big[ \sum_{B \subseteq A,\, B \not\supseteq \{x\}} m_b(B) - \sum_{B \subseteq A,\, B \supseteq \{x\}} m_b(B \setminus \{x\}) \Big]^2 = \sum_{A \not\supseteq \{x\}} \big( b(A) \big)^2 + \sum_{A \supseteq \{x\}} \Big[ \sum_{C \subseteq A \setminus \{x\}} m_b(C) - \sum_{C \subseteq A \setminus \{x\}} m_b(C) \Big]^2,$$
so that, in analogy with the L1 case,
$$\|b - cs^x_{L_2}[b]\|^2 = \sum_{A \subseteq \{x\}^c} (b(A))^2.$$

Theorem 3 The global optimal L2 consistent approximation of any belief function b is given by
$$cs_{L_2}[b] \doteq \arg\min_{x \in \Theta} \|b - cs^x_{L_2}[b]\| = cs^{\hat{x}}_{L_2}[b],$$
i.e. the partial approximation associated with the element
$$\hat{x} = \arg\min \Big\{ \sum_{A \subseteq \{x\}^c} (b(A))^2,\; x \in \Theta \Big\}.$$

Other simple counterexamples show that the global L2 consistent approximation can fall on a component not associated with the maximal plausibility element.
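The L2 criterion of Theorem 3 differs from the L1 one only by the squaring of the belief values; a minimal sketch of ours, reusing the same representation:

```python
from itertools import chain, combinations

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def belief(m, A):
    return sum(v for B, v in m.items() if B <= A)

def global_l2_component(m, theta):
    """Theorem 3: pick x minimising the sum of squared belief values over {x}^c."""
    score = {x: sum(belief(m, A) ** 2 for A in subsets(theta - {x})) for x in theta}
    return min(score, key=score.get)

theta = frozenset({'x', 'y', 'z', 'w'})
m = {frozenset({'y'}): 0.2, frozenset({'y', 'z'}): 0.3, frozenset({'x', 'z', 'w'}): 0.5}
print(global_l2_component(m, theta))   # 'z': the component hosting the global L2 approximation
```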
7 Comments and conclusions

Consistent belief functions represent coherent knowledge bases in the theory of evidence. As consistency is not preserved by most operators used to update or elicit evidence, the use of a consistent transformation in conjunction with those combination rules can be desirable. Consistent transformations are strictly related to the problem of approximating a generic belief function with a consistent one. In this paper we solved the instance of the consistent approximation problem obtained when measuring distances between uncertainty measures by means of the classical Lp norms. This makes sense, as cs.b.f.s live in a simplicial complex defined in terms of the L∞ norm, and correspond to possibility distributions. A partial approximation has to be found for each component of the complex. The conclusions of this study are the following:

1. partial L1/L2 approximations coincide on each component of the consistent complex;

2. each such partial approximation turns out to be the consistent transformation focused on the corresponding element of the frame;

3. the corresponding global solutions do not in general have the maximal plausibility element as their core, and the global L1 and L2 solutions may in general lie on different components of CS.

The interpretation of the polytope of all L∞ solutions deserves to be fully investigated in the near future, in the light of the intuition provided by the binary case. In particular, its clear analogy with the polytope of consistent probabilities will be an interesting matter of study. A natural continuation of this line of research is obviously the solution of the Lp approximation problem for consonant belief functions, as counterparts of possibility measures in the theory of evidence. That will complete our understanding of the relation between geometric norms and evidence consistency.

Proof of Lemma 1

We first note that, by definition of the categorical belief function $b_A$ (Section 3),
$$\langle b_B, b_C \rangle = \sum_{D \supseteq B, C;\; D \neq \Theta} 1 = 2^{|(B \cup C)^c|} - 1.$$
Hence
$$\sum_{B \supseteq A} \langle b_B, b_C \rangle (-1)^{|B \setminus A|} = \sum_{B \supseteq A} \big( 2^{|(B \cup C)^c|} - 1 \big)(-1)^{|B \setminus A|} = \sum_{B \supseteq A} 2^{|(B \cup C)^c|}(-1)^{|B \setminus A|} - \sum_{B \supseteq A} (-1)^{|B \setminus A|} = \sum_{B \supseteq A} 2^{|(B \cup C)^c|}(-1)^{|B \setminus A|},$$
as
$$\sum_{B \supseteq A} (-1)^{|B \setminus A|} = \sum_{k=0}^{|A^c|} \binom{|A^c|}{k}\, 1^{|A^c| - k} (-1)^k = 0$$
by Newton's binomial:
$$\sum_{k=0}^{n} \binom{n}{k}\, p^k q^{n-k} = (p + q)^n. \qquad (11)$$
Now, as B ⊇ A, the set B can be decomposed into the disjoint sum B = A + B' + B'', where ∅ ⊆ B' ⊆ C \ A and ∅ ⊆ B'' ⊆ (C ∪ A)^c (see Figure 5), so that the above quantity can be written as
$$\sum_{\emptyset \subseteq B' \subseteq C \setminus A} \; \sum_{\emptyset \subseteq B'' \subseteq (C \cup A)^c} 2^{|(A \cup C)^c| - |B''|} (-1)^{|B'| + |B''|} = \sum_{\emptyset \subseteq B' \subseteq C \setminus A} (-1)^{|B'|} \sum_{\emptyset \subseteq B'' \subseteq (C \cup A)^c} (-1)^{|B''|}\, 2^{|(A \cup C)^c| - |B''|},$$
where
$$\sum_{\emptyset \subseteq B'' \subseteq (C \cup A)^c} (-1)^{|B''|}\, 2^{|(A \cup C)^c| - |B''|} = [2 + (-1)]^{|(A \cup C)^c|} = 1^{|(A \cup C)^c|} = 1,$$
again by Newton's binomial (11). The desired quantity becomes
$$\sum_{\emptyset \subseteq B' \subseteq C \setminus A} (-1)^{|B'|},$$
which is nil for C \ A ≠ ∅, and equal to 1 when C \ A = ∅, i.e. C ⊆ A.

Figure 5: Decomposition of B into A + B' + B'' in the proof of Lemma 1.
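Both the scalar product formula and Lemma 1 can be verified by brute force on a small frame; a sketch of our own, in which the vectors b_A are built exactly as in Section 3.1 (one coordinate per event ∅ ⊊ E ⊊ Θ):

```python
from itertools import chain, combinations

theta = frozenset({'1', '2', '3', '4'})

def proper_subsets(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(1, len(s)))]

EVENTS = proper_subsets(theta)            # all events E with 0 < |E| < |Theta|

def b_vec(A):
    """Categorical b.f. b_A as a vector of R^(N-2): b_A(E) = 1 iff E contains A."""
    return [1.0 if A <= E else 0.0 for E in EVENTS]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def supersets(A):
    rest = sorted(theta - A)
    return [A | frozenset(c) for c in
            chain.from_iterable(combinations(rest, r) for r in range(len(rest) + 1))]

# Scalar product formula: <b_B, b_C> = 2^{|(B u C)^c|} - 1.
for B in EVENTS:
    for C in EVENTS:
        assert dot(b_vec(B), b_vec(C)) == 2 ** len(theta - (B | C)) - 1

# Lemma 1: the alternating sum over B >= A equals 1 if C <= A, and 0 otherwise.
for A in EVENTS:
    for C in EVENTS:
        s = sum(dot(b_vec(B), b_vec(C)) * (-1) ** len(B - A) for B in supersets(A))
        assert s == (1 if C <= A else 0)

print('Scalar product formula and Lemma 1 verified on a 4-element frame.')
```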
References

[1] P. Baroni, Extending consonant approximations to capacities, Proc. of IPMU, 2004, pp. 1127-1134.
[2] D. Batens, C. Mortensen, and G. Priest, Frontiers of paraconsistent logic, Studies in Logic and Computation (J.P. Van Bendegem, ed.), vol. 8, Research Studies Press, 2000.
[3] C. Lucas and B. N. Araabi, Generalization of the Dempster-Shafer theory: a fuzzy-valued measure, IEEE Transactions on Fuzzy Systems 7 (1999), 255-270.
[4] B.R. Cobb and P.P. Shenoy, A comparison of Bayesian and belief function reasoning, Information Systems Frontiers 5 (2003), no. 4, 345-358.
[5] F. Cuzzolin, An interpretation of consistent belief functions in terms of simplicial complexes, submitted to Information Sciences (2007).
[6] F. Cuzzolin, Two new Bayesian approximations of belief functions based on convex geometry, IEEE Trans. on Systems, Man, and Cybernetics - Part B 37 (2007), no. 4, 993-1008.
[7] F. Cuzzolin, An interpretation of consistent belief functions in terms of simplicial complexes, Proc. of ISAIM'08, 2008.
[8] A.P. Dempster, A generalization of Bayesian inference, Journal of the Royal Statistical Society, Series B 30 (1968), 205-247.
[9] D. Dubois and H. Prade, Possibility theory, Plenum Press, New York, 1988.
[10] D. Dubois and H. Prade, Consonant approximations of belief functions, International Journal of Approximate Reasoning 4 (1990), 419-449.
[11] B.A. Dubrovin, S.P. Novikov, and A.T. Fomenko, Sovremennaja geometrija. Metody i prilozenija, Nauka, Moscow, 1986.
[12] S. Heilpern, Representation and application of fuzzy numbers, Fuzzy Sets and Systems 91 (1997), 259-268.
[13] C. Joslyn, Towards an empirical semantics of possibility through maximum uncertainty, Proc. IFSA 1991 (R. Lowen and M. Roubens, eds.), vol. A, 1991, pp. 86-89.
[14] G.J. Klir, W. Zhenyuan, and D. Harmanec, Constructing fuzzy measures in expert systems, Fuzzy Sets and Systems 92 (1997), 251-264.
[15] H. Kyburg, Bayesian and non-Bayesian evidential updating, Artificial Intelligence 31 (1987), no. 3, 271-294.
[16] J.B. Paris, D. Picado-Muino, and M. Rosefield, Information from inconsistent knowledge: A probability logic approach, Interval/Probabilistic Uncertainty and Non-classical Logics, Advances in Soft Computing, vol. 46, Springer-Verlag, Berlin-Heidelberg, 2008.
[17] G. Priest, R. Routley, and J. Norman, Paraconsistent logic: Essays on the inconsistent, Philosophia Verlag, 1989.
[18] C. Roemer and A. Kandel, Applicability analysis of fuzzy inference by means of generalized Dempster-Shafer theory, IEEE Transactions on Fuzzy Systems 3 (1995), no. 4, 448-453.
[19] G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976.
[20] Ph. Smets, The transferable belief model and possibility theory, Proc. of NAFIPS-90 (Y. Kodratoff, ed.), 1990, pp. 215-218.
[21] Ph. Smets, Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem, International Journal of Approximate Reasoning 9 (1993), 1-35.
[22] Ph. Smets and R. Kennes, The transferable belief model, Artificial Intelligence 66 (1994), 191-234.
[23] R.R. Yager, Class of fuzzy measures generated from a Dempster-Shafer belief structure, International Journal of Intelligent Systems 14 (1999), 1239-1247.