Novel Methods for Fusing Bayesian Network Knowledge Fragments in D'Brain

Gee Wah Ng, Khin Hua Ng, Kheng Hwee Tan, Chong Hock K. Goh
DSO National Laboratories
20 Science Park Drive
Singapore 118230
Email: {ngeewah, nkhinhua, tkhenghw, gchongho}@dso.org.sg

Abstract - In this paper, we present two novel methods, which we term N-Combinator and N-Clone, for fusing multiple Bayesian Network knowledge fragments. At DSO National Laboratories, we have developed a cognition-based dynamic reasoning machine called D'Brain that is capable of performing high-level data fusion. Knowledge is encapsulated in D'Brain as Bayesian Network knowledge fragments. D'Brain's reasoning mechanism is dynamic and resembles human reasoning: the knowledge structure is ever evolving with the different sources of observable inputs. N-Combinator and N-Clone are the methods used in this dynamic reasoning mechanism. Experiments have shown the good performance of these two methods.
Keywords: Data Fusion, Bayesian Network, Dynamic Reasoning.

1 Introduction

Today's military commanders operate in a highly complex and dynamic battlefield due to the deployment of advanced sensor and weapon systems and newer concepts of operations such as network-centric warfare. Commanders must therefore leverage all available information and knowledge to meet the challenges of modern military operations, and it is critical that they are supported with intelligent and robust decision aids. However, most existing decision support solutions are based on predefined templates or rule-based models, which are inadequate for complex military operations. At DSO National Laboratories, we have developed a cognition-based dynamic reasoning machine called D'Brain [1] (Dynamic Bayesian Reasoning & Advanced Intelligent Network). D'Brain is a high-level data fusion system designed to provide enhanced situation awareness, overcome the limitations of human processing power, and provide consistent, good-quality decision support. In our research, we adopted a cognitive perspective in designing the architecture of D'Brain so that it bears some resemblance to the human cognitive process [2], which we strongly believe is a key element in building the next generation of truly intelligent cognitive aids for commanders. D'Brain's dynamic reasoning mechanism works on a library of knowledge fragments, which are encapsulated mental models of the commanders. These knowledge fragments are stored in a manner akin to human long-term memory. We adopted Endsley's concept [3, 4] whereby, based on the observed evidence, the mental models are instantiated as situation models represented in a short-term memory. This is the space where the reasoning process is carried out. The reasoning is dynamic because the instantiated situation network is not precompiled and evolves as new evidence is received. Potential applications for D'Brain include intent inference, plan recognition, hypothesis management, and any knowledge- or data-based inference system.
Fig. 1: Architecture of D'Brain

In D'Brain, we use Bayesian Networks (BNs) to model the knowledge fragments. An advantage of our architecture (Fig. 1) is that the implementation of mental models, and hence of situation and intent models, is not restricted to a particular knowledge representation. Using BNs is just one viable approach; we could also use other approaches such as neural networks, fuzzy rules, or templates. The architecture allows flexibility in implementation and can be updated through modifications of its modules. Since the focus of this paper is not on the architecture of D'Brain, we refer interested readers to [1] for more details.

A BN is fully specified by two parts: the qualitative part, a directed acyclic graph that represents the conditional relationships among nodes, and the quantitative part, which defines the conditional probability tables (CPTs) for the nodes in the BN [5]. Most BN-based decision support solutions build a complete and static BN for the inference process [6, 7]. The networks are usually constructed prior to a mission and are expected to accurately reflect the operational scenario, which is unrealistic in the real world. The disadvantage of such static modelling is that as the events and activities in the battlefield deviate from what was expected, the network will not be able to fully represent the new situation or adapt sufficiently well to it. To adapt to dynamic situations, we model situations by fusing BN knowledge fragments together according to the sensor inputs received. BN knowledge fragments form our knowledge base of a certain domain. Each fragment models a BN with a network structure that generally holds true, and each fragment behaves like a standard BN. As evidence is received, knowledge fragments are merged together to form a Situation Specific Reasoning Network (Fig. 2). In other words, we maintain a library of subnetwork fragments that encode a smaller set of basic relationships. These fragments are then fused at run-time, based on the evidence present, to form a full network that is constantly updated to reflect the current situation.
Fig. 2: Dynamic Reasoning

In this paper, we will introduce two novel methods for fusing BN knowledge fragments as implemented in D'Brain. The rest of the paper is organised as follows. Section 2 presents the problem statement and related work. Sections 3 and 4 present the N-Combinator and N-Clone methods respectively, with experiments to highlight their performance. We then conclude in Section 5.

2 BN Knowledge Fragments Fusion

The problem of BN knowledge fragments fusion is depicted in the example given in Fig. 3. In this example we wish to fuse fragments 1 and 2, encoded with the parameters P(A|B) and P(A|C) respectively, to form the combined fragment in which node A now has multiple parents, namely B and C. This fusion process is needed in D'Brain as it allows the situation specific reasoning network to be created from the library of knowledge fragments.

Fig. 3: Graphical Representation of Problem Statement

One approach is to compute the higher-dimension CPT P(A|B,C) from the lower-dimension CPTs P(A|B) and P(A|C). There exists a widely known method of CPT combination called Noisy-OR, applicable when all the nodes in the model are binary [5]. Noisy-OR is usually used to describe the interaction between n causes x1, x2, …, xn (parent nodes) and their common effect y (child node). The causes xi are each assumed to be sufficient to cause y in the absence of other causes, and their ability to cause y is assumed independent of the presence of other causes. Here, we do not consider leak probabilities, which model the situation where the effect node y can be active even when all its causes are inactive (i.e. the model does not capture all possible causes of y). The probability that the effect node y is active, given the subset x_p of its active parents, is

P(y | x_p) = 1 − ∏_{i: x_i ∈ x_p} (1 − p_i)    (1)

where p_i is defined, in abbreviated notation, as

p_i = P(y | x̄_1, x̄_2, …, x̄_{i−1}, x_i, x̄_{i+1}, …, x̄_{n−1}, x̄_n) = P(y | x_i)    (2)

which is the probability that the effect node y is active given that only the parent x_i is active. Note that x̄_i denotes that x_i is not active.

While Noisy-OR is a widely accepted CPT combination method, it is limited to modelling binary nodes only. Diez [8] extended Noisy-OR to a Noisy-Max model which allows computation of full CPTs for multi-state nodes, but on the condition that the nodes (the effect node and all its parents) must be graded. This means that if a node has n states, its nth state must be greater in its effect than its (n−1)th state, which in turn must be greater in its effect than its (n−2)th state, and so on. For example, if the node is Illness, its graded states can be None, Mild, Serious, and Critical. On the other hand, if the node is Color with states Red, Green, and Blue, then Color is not graded, since we cannot say that the effect of Red > Green > Blue.

Noisy-OR is not a suitable method for fusing BN knowledge fragments because of the restriction that only binary nodes may be used in the knowledge fragments; modelling complex military operations with binary nodes alone is inadequate. Although Noisy-Max allows us to model nodes with more than two states, it requires the nodes to be graded, and it is not always possible to model military operations with graded nodes. Hence, Noisy-Max is also not suitable. The absence of an appropriate method for fusing BN knowledge fragments over generic nodes (multi-state nodes with no restriction on their states) motivated us to propose two novel methods: N-Combinator and N-Clone.
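To make Eqns 1 and 2 concrete before moving on, the Noisy-OR combination for a binary effect node can be sketched as follows. This is a minimal Python sketch; the link probabilities in p are illustrative values, not taken from the paper.

```python
from itertools import product

def noisy_or(p, active):
    """Eqn 1 (no leak probability): probability that the binary effect
    node y is active given the set of active parents x_p.
    p[i] = P(y | only parent x_i active), i.e. Eqn 2."""
    inactive = 1.0
    for i in active:
        inactive *= 1.0 - p[i]
    return 1.0 - inactive

p = [0.8, 0.6]  # illustrative link probabilities (assumed values)
# Full CPT column for every configuration of the two binary parents
for config in product([False, True], repeat=len(p)):
    active = [i for i, on in enumerate(config) if on]
    print(config, round(noisy_or(p, active), 3))
```

With both parents active this yields 1 − (1 − 0.8)(1 − 0.6) = 0.92, matching the product form of Eqn 1.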
3 N-Combinator

The N-Combinator computes the joint CPT entries for a child node with multiple parent nodes without the constraint that the nodes have to be graded. There are three steps to the N-Combinator method.

Step 1: Model each multi-state child node and its multi-state parent nodes as multiple binary nodes.
Step 2: Combine the binary nodes using Noisy-OR.
Step 3: Re-model the binary nodes back into their original multi-state nodes by normalization.

Fig. 4: Modelling a multi-state node with multiple binary nodes

Consider the node X in Fig. 4 with n states {x1, …, xn}. In Step 1 we model it as n binary nodes, where each binary node Xi has an active and a non-active state represented by x_i and x̄_i respectively. Then we have

(X_i = x_i) ≡ (x̄_1, …, x̄_{i−1}, x_i, x̄_{i+1}, …, x̄_n)    (3)

To illustrate how Steps 2 and 3 are carried out, we use the example of computing the joint CPT entries for a child node with two parent nodes, shown in Fig. 5. In this example, we model the multi-state nodes A, B, and C (each with 3 states) as the binary nodes a1, a2, a3, b1, b2, b3, c1, c2, c3 (Step 1). Next, these binary nodes are combined via Noisy-OR (Step 2) using P(A|B) (stored in fragment 1) and P(A|C) (stored in fragment 2), and the Noisy-OR conditional probabilities are normalized (Step 3) to yield the full CPT P(A|B,C).

Fig. 5: Example of N-Combinator method

For example (refer to Fig. 6 for illustration),

P(A = a3 | B = b1, C = c2) ≡ α P(a3 | b1, b2, b3, c1, c2, c3)
                           = α [1 − (1 − P(a3 | b1)) (1 − P(a3 | c2))]    (using Eqns 1 and 2)

where α = 1 / Σ_{i=1}^{3} P(a_i | b1, b2, b3, c1, c2, c3) is the normalizing factor.

Fig. 6: Illustration of obtaining P(A=a3 | B=b1, C=c2)

The general equation for computing the joint CPT of a child node Y that has n parent nodes X1, X2, …, Xn (Fig. 7) is

P(Y = y_j | X1 = x1, X2 = x2, …, Xn = xn) = α [1 − ∏_{i=1}^{n} (1 − P(y_j | x_i))]    (4)

where

α = 1 / Σ_{j=1}^{m} P(Y = y_j | X1 = x1, X2 = x2, …, Xn = xn)    (5)

is the normalizing factor and m is the number of states of Y.

Fig. 7: Child node Y with n parent nodes X1, X2, …, Xn
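The three steps and Eqns 4 and 5 can be sketched as follows. This is a minimal Python sketch of the N-Combinator as we read it from the equations; the fragment CPTs p_a_given_b and p_a_given_c are illustrative numbers, not values from the paper.

```python
from itertools import product
from math import prod

def n_combinator(cpts, parent_state_counts):
    """Joint CPT of a child Y with n parents via Eqns 4 and 5.
    cpts[k][j][x] = P(Y = y_j | X_k = x), taken from fragment k.
    Returns {parent-state tuple: normalized distribution over Y}."""
    m = len(cpts[0])  # number of child states
    joint = {}
    for xs in product(*[range(s) for s in parent_state_counts]):
        # Eqn 4 before normalization: 1 - prod_i (1 - P(y_j | x_i))
        unnorm = [1.0 - prod(1.0 - cpts[k][j][x] for k, x in enumerate(xs))
                  for j in range(m)]
        z = sum(unnorm)  # Eqn 5: alpha = 1 / z
        joint[xs] = [u / z for u in unnorm]
    return joint

# Illustrative 3-state example in the style of Fig. 5 (numbers are made up)
p_a_given_b = [[0.7, 0.2, 0.1],   # row j: P(A = a_{j+1} | B = b1..b3)
               [0.2, 0.6, 0.3],
               [0.1, 0.2, 0.6]]
p_a_given_c = [[0.5, 0.3, 0.2],
               [0.3, 0.5, 0.2],
               [0.2, 0.2, 0.6]]
joint = n_combinator([p_a_given_b, p_a_given_c], [3, 3])
print(joint[(0, 1)])  # P(A | B=b1, C=c2): a normalized 3-state distribution
```

Each returned distribution sums to 1 by construction of the α normalizer.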
3.1 Experiments on N-Combinator
The goal of the experiments in this section is to investigate the performance of the N-Combinator in inferring the intention of a target of interest based on inputs about its observable activity. We use the ground truth model shown in Fig. 8, which serves as the model for generating the activities of the target of interest. The mental model of BN fragments created in D'Brain for the intent inference process is shown in Fig. 9. We compare the inference accuracy of the situation specific reasoning networks generated by N-Combinator and by Noisy-Max. The performance measure is defined as the Euclidean error on the posterior probability of the node 'Intent'.

For Experiments 1 and 2, we use the ground truth model shown in Fig. 8, with node 'Intent' having states A, B, and C; all observable activity nodes (Act AB, Act AC, Act BC) having 3 states; and the intermediate nodes Indicator A, B, and C having 4, 3, and 3 states respectively. Each experiment consists of 20 experimental runs. For each run, we generate the CPT of each node with a peaky distribution in the following manner: for each combination of the node's parents' states, we randomly choose one state of the node to take a dominant probability ProbDominant drawn from [0.5, 1], and randomly generate and normalize the remaining state probabilities such that their sum is 1 − ProbDominant. The CPTs for the BN fragments are then obtained by marginalizing from this ground truth model. An exhaustive set of 63 test cases Dtest per experimental run on the observable activity nodes, where Dtest = {D1, D2, …, D63} with D1 = {Act AB = state 1}, D2 = {Act AB = state 2}, …, D63 = {Act AB = state 3, Act AC = state 3, Act BC = state 3}, was used as observation inputs to D'Brain. For each experiment, we plot the mean Euclidean error for each experimental run.
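The peaky CPT generation described above can be sketched as follows. This is one possible reading of the procedure; the function name peaky_column and the per-column interpretation are our assumptions, and at least two states are assumed.

```python
import random

def peaky_column(num_states):
    """One column of a peaky CPT: a randomly chosen dominant state gets
    probability ProbDominant drawn from [0.5, 1]; the remaining states get
    random masses rescaled so the column sums to 1 (num_states >= 2)."""
    dominant = random.randrange(num_states)
    p_dom = random.uniform(0.5, 1.0)
    rest = [random.random() for _ in range(num_states - 1)]
    scale = (1.0 - p_dom) / sum(rest)
    it = iter(rest)
    return [p_dom if s == dominant else next(it) * scale
            for s in range(num_states)]

col = peaky_column(4)
print(col, sum(col))  # one dominant state >= 0.5, column sums to 1
```

Such a column would be generated for every combination of parent states of every node in the ground truth model.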
Fig. 8: Ground Truth Model for Experiments 1 & 2

Fig. 9: Mental Model Library of BN Fragments
3.1.1 Experiment 1
Fig. 10: Result of Experiment 1 (N-Combinator)

From the result shown in Fig. 10, the N-Combinator clearly outperformed Noisy-Max for most of the experimental runs. Also note that the N-Combinator yielded relatively low errors, indicating that it closely matched the posterior probabilities of the node 'Intent' in the ground truth model. The poorer performance of Noisy-Max was expected, since it requires the nodes to be graded, which was not the case in this experiment.
3.1.2 Experiment 2

For a fair comparison between N-Combinator and Noisy-Max, we repeated Experiment 1 with the underlying ground truth model's CPT distributions now being graded.
Fig. 11: Result of Experiment 2 (N-Combinator)

We see from Fig. 11 that in this case the Noisy-Max method did achieve smaller errors, which came as no surprise since the distributions were indeed graded. However, it is interesting to note that the N-Combinator still performed comparably to Noisy-Max: out of the 20 experimental runs, the N-Combinator had the smaller error in 7 runs, compared to 13 runs for the Noisy-Max method. When the underlying CPT distribution is graded, the preferred method is clearly Noisy-Max. In reality, however, the ground truth distributions are often unknown, especially in the military domain. Hence, the N-Combinator is still a better choice overall than Noisy-Max.

Fig. 13: Mental Model Library of BN Fragments

Fig. 14: Result of Experiment 3 (N-Combinator)
3.1.3 Experiment 3

As the number of parents to be combined for a child node increases, the number of parameters to be computed to obtain the full combined CPT increases exponentially. We would therefore like to investigate whether the increase in the number of parents affects the comparison results. In this experiment, we use the ground truth model shown in Fig. 12 and the corresponding fragments shown in Fig. 13. The observable nodes Act ABC1, ABC2, and ABC3 are each set to 3 states, with the rest of the nodes having the same states as before. CPTs are generated for all nodes as in Experiment 1.
Fig. 12: Ground Truth Model for Experiment 3

From Fig. 14, we see that the N-Combinator again achieved better results than Noisy-Max. Out of the 20 experimental runs, the N-Combinator had the smaller mean Euclidean error in 15 runs, compared to only 5 for Noisy-Max. However, we also observe that as we increase the number of parents to be fused, the resulting CPT converges to a uniform distribution. This effect can be explained by looking at Equations 4 and 5. Note that as n → ∞,

∏_{i=1}^{n} (1 − P(y_j | x_i)) → 0,

since each term in the product is ≤ 1, so every unnormalized CPT entry tends to 1. Therefore, after normalization,

P(Y = y_j | X1 = x1, X2 = x2, …, Xn = xn) → 1/m as n → ∞.

Nonetheless, we also note that Noisy-OR tends to a skewed distribution as we increase the number of parents to be fused. From Equation 1,

P(y | x_p) = 1 − ∏_{i: x_i ∈ x_p} (1 − p_i) → 1 as the number of active parents in x_p increases,

i.e. the more active causes there are, the more likely the effect node is to be active. Noisy-Max, the generalisation of Noisy-OR, suffers from the same problem, as can be seen from the equations in [9]. Hence, when the number of parents to be fused is substantial, the performance of the N-Combinator will suffer. This prompted us to propose an alternative approach to BN fragment fusion, called N-Clone.
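The convergence of the fused CPT toward a uniform distribution can be checked numerically. The sketch below assumes, purely for illustration, that every parent contributes the same single-parent distribution, so the product in Eqn 4 becomes a simple power.

```python
def fused_dist(single_parent_dist, n):
    """Eqns 4-5 in the special case where each of the n parents
    contributes the same single-parent distribution P(Y | x_i)."""
    unnorm = [1.0 - (1.0 - p) ** n for p in single_parent_dist]
    z = sum(unnorm)  # Eqn 5 normalizer
    return [u / z for u in unnorm]

base = [0.7, 0.2, 0.1]  # a clearly skewed single-parent distribution
for n in (1, 5, 50):
    print(n, [round(p, 3) for p in fused_dist(base, n)])
# the fused distribution flattens toward uniform (1/m = 1/3) as n grows
```

With n = 1 the skew survives intact, while by n = 50 all three entries are within about 0.002 of 1/3, illustrating the loss of discriminating power discussed above.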
4 N-Clone
Recall that a BN is fully specified by its network structure and the corresponding network parameters (CPTs). In the N-Combinator approach to BN fragment fusion, we effectively fuse the network structures of all fragments into one, which requires the higher-dimension joint CPT to be computed. If, on the other hand, we can somehow maintain the network structure of the individual fragments in the fused network, then there is no need to compute new CPT parameters. This is the approach adopted in N-Clone: the child node is cloned, so we do not need to generate the joint CPT entries, and we thereby avoid the problem of poor inference results arising from poorly generated joint CPT entries.

Suppose we have two fragments, Fragment 1 with structure A → B → D and Fragment 2 with structure A → C → D. Instead of combining node D's parents, as in the N-Combinator, to form a single network in which D has both B and C as parents, we now clone node D, giving a fused network in which B points to D1 and C points to D2, where D1 and D2 are the clone nodes. They are kept as separate nodes in D'Brain; however, in the graphical user interface the user does not see the clone nodes. To the user, all the clone nodes are displayed as one single node, as though the two nodes were merged into a single node. Again, note that we do not need to generate joint CPTs with this method: P(D1|B) is obtained from P(D|B) in Fragment 1, and P(D2|C) is obtained from P(D|C) in Fragment 2.

Fragment 1 gives us some information about node D (now cloned as D1), and Fragment 2 also gives us some information about node D (now cloned as D2). The posterior of node D is computed by combining the posteriors of nodes D1 and D2. For example, if A is in state 1, we have a posterior probability for node D1 and a posterior probability for node D2. Suppose their posteriors are as follows:

Prob(D1 = State 1) = 0.3    Prob(D1 = State 2) = 0.7
Prob(D2 = State 1) = 0.4    Prob(D2 = State 2) = 0.6

We propose to combine these two posterior probabilities by averaging:

Prob(D = State 1) = (0.3 + 0.4)/2 = 0.35
Prob(D = State 2) = (0.7 + 0.6)/2 = 0.65

This is the posterior probability that the user sees if he queries node D (i.e., given evidence that node A is in state 1, the probability that node D is in state 1 is 0.35).
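The posterior-averaging rule above can be sketched as follows. This is a minimal Python sketch; the function name n_clone_posterior is ours, while the 0.3/0.7 and 0.4/0.6 posteriors are the worked example from the text.

```python
def n_clone_posterior(clone_posteriors):
    """Combine the posteriors of the clone nodes D1..Dk into the posterior
    of the original node D by state-wise averaging (the rule proposed here)."""
    k = len(clone_posteriors)
    num_states = len(clone_posteriors[0])
    return [sum(post[j] for post in clone_posteriors) / k
            for j in range(num_states)]

# Worked example from the text: posteriors of D1 and D2 given A = state 1
d1 = [0.3, 0.7]
d2 = [0.4, 0.6]
print(n_clone_posterior([d1, d2]))  # averages to (0.35, 0.65) up to rounding
```

Since each clone posterior sums to 1, the average is itself a valid distribution, so no extra normalization step is needed.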
4.1 Experiments on N-Clone
We now compare the performance of N-Clone to that of N-Combinator. We repeated Experiments 1 and 2 with nodes Act AB, Act AC, and Act BC being clone nodes instead of nodes fused via the N-Combinator. The results are shown in Fig. 15 and Fig. 16 respectively. N-Clone matched the performance of N-Combinator in Experiment 1 and clearly outperformed N-Combinator in Experiment 2. In fact, N-Clone also outperformed Noisy-Max in Experiment 2, which was unexpected since the underlying truth distributions were graded. This showcases the good performance of N-Clone and is worth further investigation in future research. Lastly, we repeated Experiment 3 with nodes Act ABC1, Act ABC2, and Act ABC3 being cloned. The result is shown in Fig. 17, and it again shows on-par performance by the two fusion methods.
Fig. 15: Result of Experiment 1 (N-Clone)
Fig. 16: Result of Experiment 2 (N-Clone)

Fig. 17: Result of Experiment 3 (N-Clone)

5 Conclusion

In this paper, we have presented the N-Combinator and N-Clone methods for fusing BN knowledge fragments, as used in the dynamic reasoning machine D'Brain that we have developed. These two methods are essential to the Situation-specific Intent Model Construction module in our D'Brain architecture (Fig. 1), which requires algorithms to fuse various Bayesian Network knowledge fragments together. In the N-Combinator approach to BN fragment fusion, we effectively fuse the network structures of all fragments into one and then compute a higher-dimension joint CPT, whereas the N-Clone method avoids computing joint CPT entries by maintaining the network structure of the individual fragments in the fused network. We have shown the good performance of these two methods through several experiments on intent inference problems. The N-Combinator suffers from a loss of discerning distributions in the fused network when the number of parent nodes is substantial, as its computed CPTs converge to uniform distributions. The alternative is to use the N-Clone method. However, N-Clone has a higher implementation complexity as the number of fragments to be fused increases, since it must explicitly handle the cloned network structure to maintain a single fused network for reasoning. Hence there is a trade-off between accuracy and complexity when choosing between N-Combinator and N-Clone.

Acknowledgements

This research was funded by the Directorate of Research and Development (DRD), Defence Science and Technology Agency (DSTA), Singapore.

References

[1] G. W. Ng, K. H. Ng, K. H. Tan, and C. H. K. Goh, The Ultimate Challenge of Commander's Decision Aids: The Cognition Based Dynamic Reasoning Machine. In Proc. of the 25th Army Science Conference, Orlando, Florida, Nov 2006.
[2] G. W. Ng, Intelligent Systems - Fusion, Tracking and Control. Research Studies Press and Institute of Physics Publishing, 240 pp.
[3] M. R. Endsley, Theoretical Underpinnings of Situation Awareness: A Critical Review. In Situation Awareness Analysis and Measurement, LEA Inc, pp 3-32, 2000.
[4] M. R. Endsley, Situation Models: An Avenue to the Modelling of Mental Models. In Proc. of the 14th Triennial Congress of the International Ergonomics Association and the 44th Annual Meeting of the Human Factors and Ergonomics Society, Santa Monica, CA, pp 61-64.
[5] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 552 pp, 1988.
[6] E. Wright, S. Mahoney, K. Laskey, and M. Takikawa, Multi-Entity Bayesian Networks for Situation Assessment. In Proc. of the 5th International Conference on Information Fusion, 2002, vol. 2, pp 804-811.
[7] S. Das, R. Grey, and P. Gonsalves, Situation Assessment via Bayesian Networks. In Proc. of the 5th International Conference on Information Fusion, 2002, vol. 1, pp 664-671.
[8] F. J. Diez, Parameter Adjustment in Bayes Networks: The Generalized Noisy OR-Gate. In Proc. of the 9th Annual Conference on Uncertainty in Artificial Intelligence, San Mateo, CA, 1993, pp 99-105.
[9] F. J. Diez and S. F. Galan, Efficient Computation for the Noisy-Max. International Journal of Intelligent Systems, 2003, pp 165-177.