Modelling Information Evaluation In Fusion

1 downloads 0 Views 178KB Size Report
reported item of information does not conflict with the previously reported behaviour pattern of the target, the item may be classified as “possibly true” and given a ...
Modelling Information Evaluation In Fusion L. Cholvy ONERA Centre de Toulouse 2 avenue Ed. Belin, 31055 Toulouse, France [email protected] Abstract This paper studies the notion of information evaluation in fusion1 . It starts from some recommendations defined by NATO relatively to information evaluation in intelligence. It shows that these recommendations, even if imprecise and ambiguous, point out three notions which should be taken into account in order to define information evaluation. It proposes a model of information evaluation, which takes into account these notions and an associated fusion process as well. Keywords: Information evaluation, information fusion, logic.

1

Introduction

An important step in the intelligence gathering process is the fusion of information provided by several sources. The objective of this process is to build an up-to-date and correct view of the current situation with the overall available information in order to make adequate decisions. In a context of interoperability, NATO has defined a Standardization Agreements in order to define a common information evaluation system. The aim of such an evaluation system is to indicate the degree of confidence that may be placed in any item of information which has been obtained for intelligence. This is achieved by adopting an alphanumeric system of rating which combines a measurement of the reliability of the source of information with a measurement of the credibility of that information when examined in the light of existing knowledge. This standard is given in the annex. In previous works [3], [4], we have analysed these informal recommendations and we have shown that the the three main notions which underline an evaluation system are : the number of independent sources that support an information, their reliability and the fact that the information tends to conflict or conflicts with some available information. Furthermore, in [4], we have defined a model of evaluation and an associated fusion method which takes into account the number of the independent sources and their reliability but which does not make a distinction between “tending to conflict” and “conflicting”. In that model, the unique notion of conflict is represented by the property of inconsistency. 1 This

work was supported by ONERA.

The present paper aims at extending this work to take into account a notion of degree of conflict. More precisely, it will be shown that, in the case when a hierarchy of concepts is available, then we can define an obvious notion of degree of conflict between information our previous work implicitly takes into account. This paper is organized as follows. Section 2 presents the recommendations of NATO about information evaluation. Section 3 discusses this informal definition. It also discusses the moment when this evaluation must be taken into account. Section 4 recalls the formal approach we have previously taken to evaluation. Section 5 addresses the notion of degrees of conflicts in the case when information is described by using a hierarchy of concepts. Finally, section 6 is devoted to a discussion.

2

Recommendations of NATO

This standard [1], [2], explicitly mentions that the aim of information evaluation is to indicate the degree of confidence that may be placed in any item of information which has been obtained for intelligence. (...) This is achieved by adopting an alphanumeric system of rating which combines a measurement of the reliability of the source of information with a measurement of the credibility of that information when examined in the light of existing knowledge. Reliability of the source is designated by a letter between A and F signifying various degrees of confidence as indicated below. • a source is evaluated A if it is completely reliable. It refers to a tried and trusted source which can be depended upon with confidence • a source is evaluated B if it is usually reliable. It refers to a source which has been successfully used in the past but for which there is still some element of doubt in particular cases. • a source is evaluated C if it is fairly reliable. It refers to a source which has occasionally been used in the past and upon which some degree of confidence can be based. • a source is evaluated D if it is not usually reliable. It refers to a source which has been used in the past but has proved more often than not unreliable.

• a source is evaluated E if it is unreliable. It refers to a source which has been used in the past and has proved unworthy of any confidence. • a source is evaluated F if its reliability cannot be judged. It refers to a source which has not been used in the past. Credibility of information is designated by a number between 1 and 6 signifying varying degrees of confidence as indicated below. • If it can be stated with certainty that the reported information originates from another source than the already existing information on the same subject, then it is classified as “confirmed by other sources” and rated 1. • If the independence of the source of any item of information cannot be guaranteed, but if, from the quantity and quality of previous reports, its likelihood is nevertheless regarded as sufficiently established, then the information should be classified as “probably true” and given a rating of 2. • If, despite there being insufficient confirmation to establish any higher degree of likelihood, a freshly reported item of information does not conflict with the previously reported behaviour pattern of the target, the item may be classified as “possibly true” and given a rating of 3. • An item of information which tends to conflict with the previously reported or established behaviour pattern of an intelligence target should be classified as “doubtful” and given a rating of 4 • An item of information which positively contradicts previously reported information or conflicts with the established behaviour pattern of an intelligence target in a marked degree should be classified as “improbable” and given a rating of 5 • An item of information is given a rating of 6 if its truth cannot be judged.

3 3.1

Discussions about these recommendations How to define evaluation ?

The previous definition can be discussed because, since they are given in natural language, they are quite imprecise and ambiguous. Indeed, according to the previous recommendations, the reliability of a source is defined in reference to its use in the past. It can be measured for example, as the ratio of the number of times it gave a true information to the number of times it gave information. However, this definition does not take into account the actual environment of use of the information source. For instance, even if it is known to be reliable, an infra-red sensor loose reliability when it rains.

Concerning the credibility of information, it seems that the rating defined according to the recommendations does not qualify an unique property. For instance, how should we note an item of information supported by several sources of information which is also conflictual with some already registered information ? According to these definitions, this item should be given a rating of 1 and should also be given a rating of 5. Furthermore, according to the recommendations, a rating of 6 should be given to an item whose truth cannot be judged. This supposes that the other ratings (1...5) concern the evaluation of the truth of information. If so, rating of 1 should be given to a true information. But, as it is defined, rating of 1 should be given to an item supported by at least two sources. This is questionable since, the different sources (even if they agree) may emit false information. However, even if the previous recommendations can be discussed, one can see that three main notions underline the evaluation system : • the number of independent sources that support an information • their reliability • the fact that the information tends to conflict or conflicts with some available information These three characteristics are independent. For instance, two sources, one being fully reliable and the other not usually reliable may have emitted the same piece of information. Furthermore, this piece of information may be contradictory with another piece of information. Our claim is that a model for information evaluation should take into account these three notions for any piece of information.

3.2

When should evaluation be taken into account ?

Before proposing a formal model for modelling information evaluation, let us discuss the moment when information evaluation must be taken into account. Due to the fact that it depends on the sources that support an information and also depends on other information, we can see that information evaluation changes over time. There are two main different approaches to take information evaluation changing into account. • In the first approach we call “reasoning when updating” each information stored in the database is associated with a valid evaluation, i.e some attributes which, at any time, represent the number of the sources that support the piece of information, their reliability and the degree of conflicts between the piece of information and the other information. Following this approach, each time a new piece information arrives (database updates time), the

evaluation of some other information must be updated. In particular the evaluation of the information with which it is or it tends to be in conflict must be updated. However, when an information is needed (database querying time), no more computing is required and answering the required information and its evaluation is obvious. • In the second approach we call “reasoning when querying”, each information is stored in the database with the minimal meta-information required, i.e the name of the source that supports it. Following this approach, each time a new piece information arrives (database updates time), the only thing is to check that the same information with the same meta-information has not already been stored (that would mean that the source is repeating). No other update is made. But, when an information is needed (database querying time), some computing is required, in particular in order to compute its evaluation in respect with the evaluation of the other pieces of information. The main disadvantage of the first approach is that the evaluation of a piece of information may change many times before the information is needed. So we suggest to adopt the second approach.

4

Formal approach to evaluation

4.1

Information representation

Let us first consider a propositional language L, used to express information. Information will then be formulas of L. M od(F ) will denote the set of models of the formula F . We consider several sources of information which deliver information about an observed situation. Each source of information S is associated with its degree of reliability r(S), which is a letter belonging to {A, ..., E} depending on the fact that this source is judged more or less reliable2 . Definition. Each information emitted by a source S is stored in a database DB with a single attribute which is S. We call the pair (F, S) a quoted information. This way of storing information allows ones to express that “the source S supports information F ”. the database is a set of quoted information. Definition The database DB, is a set of quoted information DB = {(F1 , S1 ), ...(Fn , Sn )} which satisfies: ∀i∀j if (i 6= j) then Fi 6= Fj 3 or Si 6= Sj The previous condition is introduced in order not to take into account a source which would repeat a given information. Thus, adding a new information in the base is made as follows: 2 Notice

that this definition could be refined by defining the degree of reliability of a source according to the probability that this source gives true information in the current context of use. Refining this definition is not necessary for the following. 3 In order to be more rigorous, we should write “F is not i equivalent to Fj ”

Let us consider an information F emitted by a source S. • If (F, S) is not yet stored in DB then DB become DB ∪ {(F, S)} • else DB is not updated. Besides DB there may exist some constraints, IC, stating what is true for certain in the real world. For instance knowledge like a plane is not an helicopter, an Airbus plane is not a Boeing plane, etc. are stored in IC.

4.2

Reasoning when querying DB

The problem is now to answer a query addressed to DB. Let Q be a formula expressing this query. We denote ||Q||DB,IC the answer to Q addressed to DB under the constraints IC. In the following, we show that given IC, DB can be associated with an unique consistent set of formulas of L, denoted merge(DB, IC) and we will define ||Q||DB,IC according to the satisfaction of Q in this set. merge(DB, IC) is here defined by extending a majority merging operator by taking into account weighted sum instead of sum. The weights here will be numbers (5, 4, 3, 2, 1) which intend to represent, on a numerical scale, the different degrees of reliability of the sources (A, B, C, D, E). We recall here some definitions introduced by Konieczny and Pino-P´erez [5] [6] and adapt them for our purpose. Hamming distance between two interpretations Let w and w0 be two interpretations. The Hamming distance between w and w0 , denoted here dh(w, w0 ) is the number of propositional letters whose valuation differs from w to w0 . Hamming distance between two sets of interpretations Let S and S 0 be two sets of interpretations. The Hamming distance between S and S 0 , denoted here dh(S, S 0 ) is defined by : dh(S, S 0 ) =

min

w∈S,w0 ∈S 0

dh(w, w0 )

Definition of ∆W Σ Let db1 ...dbn be n sets of formulas of L and IC a set of integrity constraints. Konieczny and Pino-P´erez defined a majority merging operator, denoted ∆Σ,IC , such that the set of models of the information source which is obtained from merging db1 ... dbn with this operator, is semantically characterized by: M od(∆Σ,IC ([db1 , ..., dbn ])) =

min

≤Σ [db

(Mod(IC))

1 ...dbn ]

≤Σ [db1 ...dbn ] is a total pre-order on M od(IC) defined by: 0 0 w ≤Σ [db1 ...dbn ] w iff dΣ (w, [db1 ...dbn ]) ≤ dΣ (w , [db1 ...dbn ])

with

dΣ (w, [db1 ...dbn ]) =

n X

dh(w0 , mod(dbi ))

i=1

In other words, when merging db1 ...dbn with the operator ∆Σ,IC , the result is semantically characterized by the models of IC which are minimal according to the pre-order ≤Σ [db1 ,...,dbn ] . As Konieczny and Pino-P´erez explicitly mentionned it in conclusion of [5], this merging operator ∆Σ,IC can be extended by taking into account a weighted sum instead a simple sum However, the properties of this operator remain to be studied. In the following, we suggest such an extension, which comes to the weighted model-fitting approach of Revesz [8] to the revision of a (possibly inconsistent) weighted knowledge-base by a formula. We assume now that any dbi is associated with an integer denoted r(dbi ). We define ∆W Σ,IC by :

We consider four information sources named One, T wo, T hree and F our, whose reliability are respectively A, B, C and D. We consider the following flow of emissions: One emits b T wo emits a T hree emits a F our emits c One emits d. At this moment, we have DB = {< b, One >, < a, T wo >, < a, T hree >, < c, F our >, < d, One >} The questions that raise are: what is the observed object i.e is a true ? is b true ? is c true ? is d true ?. By the previous definitions, we can define: db1 = {b} and r(db1 ) = 5 db2 = {a} and r(db2 ) = 4 db3 = {a} and r(db3 ) = 3 db4 = {c} and r(db4 ) = 2 db5 = {d} and r(db5 ) = 5

M od(∆W Σ,IC ([db1 , ..., dbn ])) = min≤W Σ (M od(IC))

IC has got three models which are: w1 = {a, ¬b, ¬c, ¬d}, w2 = {¬a, b, c, ¬d}, w3 = {¬a, b, ¬c, d}. Then we can compute the following distances:

Σ ≤W [db1 ...dbn ] is a total pre-order on M od(IC) defined by:

dW Σ (w1 , [db1 ...db5 ]) = 12 dW Σ (w2 , [db1 ...db5 ]) = 12 dW Σ (w3 , [db1 ...db5 ]) = 9

[db1 ...dbn ]

Σ 0 w ≤W [db1 ...dbn ] w iff dW Σ (w, [db1 ...dbn ]) ≤ dW Σ (w0 , [db1 ...dbn ])

with dW Σ (w, [db1 ...dbn ]) =

Pn

i=1

r(dbi ).dh(w, M od(dbi )

Definition of merge(DB, IC). Consider DB = {< F1 , S1 > .... < Fn , Sn >}. We associate DB with db1 ...dbn defined as follows : • ∀i = 1...n, dbi = {Fi } • ∀j = 1...n, r(dbi ) = 5 (resp, 4, 3, 2, 1) if f reliability of Si is A (resp, B, C, D, E) merge(DB, IC) is the consistent set of formulas of L so that: M od(merge(DB, IC)) = ∆W Σ,IC ([db11 ...dbpnn ]) Definition of the answer to Q The answer to query Q is then defined by: ||Q||DB,IC = Y ES if f |= merge(DB, IC) → Q ||Q||DB,IC = N O if f |= merge(DB, IC) → ¬Q ||Q||DB,IC =? else

4.3

An example

We consider a propositional language with letters a, b, c and d. a means the observed object is an helicopter, b means the observed object is a plane, c means the observed object is an Airbus plane and d means the observed object is a Boeing plane. Consider the following set of constraints IC = {a ∨ b, ¬a ∨ ¬b, b ↔ (c ∨ d), ¬c ∨ ¬d, }.

Thus, M erge(DB, IC) = w2 = {¬a, b, ¬c, d}. Then finally, ||a||DB,IC = N O, ||b||DB,IC = Y ES, ||c||DB,IC = N O and ||d||DB,IC = Y ES. I.e, given what has been registered in DB until now by the different sources and given their reliability, the most plausible answer is that the observed object is a Boeing plane (thus it is not an helicopter nor a Airbus plane).

5

Degrees of conflicts

The formal model and the associated fusion process previously presented obviously take into account the first two notions which underline the recommendations which are the number of independent sources that support a piece of information and their reliability degree as well. In this section, we aim at showing that this model implicitly takes into account the third important notion, which is the fact that a piece of information is in conflict with or tends to be in conflict with another one. The question is how to distinguish “tending to conflict” and “conflicting” ? In this work, we address this question in the case when the information sources to be merged agree on a common hierarchy of concepts, i.e, information to be collected are described by using a hierarchy of concepts as, for instance, the J3.2, used for Air Track identification [7]. We claim that a hierarchy of concepts implicitly induces a distance between concepts which can be used to define a degree of conflicts between pieces of information.

5.1

Illustrative example

Let us consider the following hierarchy of concepts: • The two main concepts are A (Helicopters) and B (Planes)4 . • Concept A has no subconcept. • Concept B has two subconcepts which are : C (Airbus planes) and D (Boeing planes). This means that the objects we are able to talk about are planes and helicopters. More specifically, we can distinguish Airbus planes and Boeing planes. We claim that such a hierarchy of concepts implicitly defines a notion of importance of conflicts. For instance we think that the conflict between the sentences “the object to be identified is an Airbus plane” and “the object to be identified is a Boeing plane” is less important than the conflict between “the object to be identified is an Airbus plane” and “the object to be identified is an helicopter”. The reason why the first conflict is less important, given the hierarchy, is that in this hierarchy, the nodes D and C are “closer” than the nodes D and A. The following sections formally define this notion of distance between nodes and then formally define the notion of degree of conflicts.

5.2 5.2.1

Formalization Definitions

The following definitions aim at defining the first common ancestor of two nodes in a hierarchy of concepts. Intuitively, the first common ancestor of nodes n1 and n2 , denoted f ca(n1 , n2 ) is a node which is an ancestor of n1 and also an ancestor of n2 , such that any ancestor of n1 and n2 is also an ancestor of f ca(n1 , n2 ). Definition. Let H be a hierarchy of concepts. It induces a relation order between nodes by: n1 ≤H n2 if and only if n1 is an ancestor of n2 in H. Definition. Let H be a hierarchy of concepts. Let n1 and n2 be two nodes in H. We define Anc(n1 , n2 ) = {n ∈ H, n ≤H n1 and n ≤H n2 } Definition. Let H be a hierarchy of concepts. Let n1 and n2 be two nodes in H. The first common ancestor of n1 and n2 denoted F ca(n1 , n2 ) is defined by F ca(n1 , n2 ) = M in(Anc(n1 , n2 ), ≤H ). Thus, F ca(n1 , n2 ) is the minimal element for the order relation, ≤H of Anc(n1 , n2 ). The following definitions aim at defining the degree of conflict between two nodes in a hierarchy of concepts. Such a degree is defined by means of a distance. Definition (case of two leaves). Let n1 and n2 be two leaves in a hierarchy of concepts H. We define the distance between n1 and n2 , dH (n1 , n2 ) by the number of arcs between n1 and F ca(n1 , n2 ) plus the number of arcs between n2 and F ca(n1 , n2 ). Definition (general case). Let n1 and n2 be two nodes in a hierarchy of concepts H. We define the 4 The

root concept is not needed

distance between n1 and n2 , denoted dH (n1 , n2 ), by the smallest distance dH (n01 , n02 ) where n01 (resp, n02 ) are all the leaves nodes whose n1 (resp, n2 ) is ancestor. Definition (degree of conflicts between two nodes). Let n1 and n2 be two nodes of a hierarchy of concepts H. The degree of conflicts between n1 and n2 is defined by: Conf lictH (n1 , n2 ) = dH (n1 , n2 ) Proposition. It can be proved that Conf lictH is a distance i.e., (1) Conf lictH (n, n) = 0, (2) Conf lictH (n1 , n2 ) = Conf lictH (n2 , n1 ) (3) Conf lictH (n1 , n2 ) ≤ Conf lictH (n1 , n3 ) + Conf lictH (n3 , n2 ) Example (continued). We can show that: Conf lictH (C, D) = 1 + 1 = 2, Conf lictH (C, A) = 2 + 1 = 3 5.2.2

Reformulation in logic

These previous definitions can be reformulated in logic as follows. Modelling a hierarchy of concepts in logic First of all, it must be stated that a hierarchy of concepts can be modeled by a set of formulas of a propositional language. For doing so, we define a propositional language whose propositional atom symbols are the hierarchy node symbols. Definition. Let H be a hierarchy of concepts. From H, we can define a propositional language whose atoms symbols are isomorphic to nodes symbols (for convenience, we will use the same symbol to denote a node and its isomorphic atom). Then H is modeled by the set of formulas, denoted LH made of formulas: P ↔ C1 ∨ ... ∨ Cn and ¬(Ci ∧ Cj ), i 6= j, for any non-leave node P whose children nodes are C1 ...Cn . It is easy to check that LH has as many models as H has leaves. Furthermore, each node n in H corresponds to a formula, denoted LH (n). We call models of node n, the models of LH (n). Notice that if n is a leave node, then LH (n) has one and only one model. If n is a non-leave node, then LH (n) has several models. Example (continued). The hierarchy is modeled by the set of formulas: {A ∨ B, ¬(A ∧ B), B ↔ C ∨ D, ¬(C ∧ D)} This set has exactly 3 models which are : mA = {A, ¬B, ¬C, ¬D}, mC = {¬A, B, C, ¬D}, mD = {¬A, B, ¬C, D}. 5.2.3

Degrees of conflict in logic

Proposition. Let n1 and n2 be two nodes in a hierarchy of concepts H and let LH (n1 ) and LH (n2 ) be the two formulas they correspond to. Then, Conf lictH (n1 , n2 ) = dh(mod(LH (n1 )), mod(LH (n2 ))) This result means that the degree of conflict between two nodes n1 and n2 of hierarchy H, corresponds to the Hamming distance between the models of their associated formulas LH (n1 ) and LH (n2 ).

5.2.4

Consequence on our formal model

Because the Hamming distance corresponds to some notion of degree of conflicts, we can conclude that the fusion process presented in section 3, and which is based on the Hamming distance, takes into account a kind of degree of conflicts between information. Indeed, the core definition of our model is the one of dW Σ . And we can see that dW Σ (w, [db1 ...dbn ]) can be reformulated as : P n dW Σ (w, [db1 ...dbn ]) = i=1 r(dbi ).Conf lictH (nw , dbi ) where nw is the leave in H such that w is the unique model of nw i.e such that mod(LH (nw )) = {w} This shows that our model implicitly takes into account the notion of degree of conflicts. Thus, our model takes into account the three main notions that underline the recommendations about information evaluation presented in the beginning. Not only it takes into account the number of independent sources that support an information, their reliability but it also takes into account a kind of degree of conflict (which can be justified if a hierarchy of concepts is expressed).

6

Discussion

This works intended to formalize some informal recommendations about information evaluation in information fusion. The main notions underlying the recommendations have been given a formal interpretation. In particular we have shown that our model takes into account a notion of degree of conflict, which, it must be pointed out, is dependent from the definition of a hierarchy of concepts. Changing such a hierarchy implies a modification of theses degrees. Thus has an incidence on the result of the fusion process. However, some of the notions underlying the recommendations have not yet been taken into account here. For instance, a point which has been left aside in this present work, is the total ignorance about the reliability of a given source, which was a case foreseen in the recommendations. In the present formalization, it seems difficult to represent that. Indeed, which number can be attached to a source whose reliability is not known ? That is an open question. However, this formalization shows that the second attribute in the evaluation of an information is not necessary. Thus, the evaluation of an information could be simplified and memorizing only the names of the sources that emitted it and their reliability is enough. As for the method of fusion, we have chosen a weighted majority because it takes into account the number of the sources and their reliability. If the number of the sources had not been so important, we could have chosen a method taking into account the reliability of the sources only (see [9], [10], [11] for instance). As for arbitration merging operator as characterized in [5] or [6], it does not consider as essential the number of sources which support an information. As for implementation issues, in order to automatically compute the answers to queries, we suggest to

use one the query-evaluator defined in [12]. Finally, we must point out the fact that the choice of the numerical scale 1,2,...5 for encoding degrees of reliability is arbitrary and another scale could be used, such as, for instance an exponential one. The study of the admissible scales and their impact on the result contitute an extension to this present work. Acknowledgment: We would like to thank the anonymous referees for their helpful comments.

References [1] Annex to STANAG 2022 (Edition 8) North Atlantic Treaty Organization (NATO) Information Handling Services, DODSID, Issue DW9705 [2] AAP-6-2006, North Atlantic Treaty Organization NATO Standardization Agency (NSA), 2006, (www.nato.int/docu/stanag/aap006/AAP6-2006). [3] V Nimier and L. Cholvy. Information Evaluation: discussion about STANAG 2022 recommendations. IST Symposium “Military data and information fusion”. Prague, October 2003. [4] L. Cholvy. Information Evaluation: Formalization of informal recommendations. Proc. of IPMU’04, Perugia, 2004. [5] S. Konieczny and R. Pino-P´erez. On the logic of merging. Proc of KR’98, Trento, 1998. [6] S. Konieczny and R. Pino-P´erez. Merging with Integrity Constraints Proc. of ESCQARU’99, London, 1999. [7] Annex to STANAG 5516 (Edition 3) North Atlantic Treaty Organization (NATO) [8] P. Z. Revesz. On the semantics of arbitration. Journal of Algebra and Computation, 7(2), 1997. [9] C. Baral and S. Kraus and J. Minker and V.S. Subrahmanian. Combining multiple knowledge bases. IEEE TKDE, 3(2), 1991 [10] L. Cholvy. Proving theorems in a multi-sources environment. Proc. of IJCAI, Chambry, 1993 [11] C. Liau. A conservative Approach to Distributed Belief Fusion Proc. of FUSION, Paris, 2000 [12] L. Cholvy Querying Contradictory Databases by taking into account their reliability and their number Flexible Databases Supporting Imprecision and Uncertainty G Bordogna and G Psaila editors, Springer Verlag, 2006