
A New Research Tool for Hybrid Bayesian Networks using Script Language

Wei Sun, Cheol Young Park, and Rommel Carvalho
The Sensor Fusion Lab, Department of Systems Engineering and Operations Research
George Mason University, Fairfax, VA 22030

ABSTRACT

While continuous variables are increasingly unavoidable in Bayesian networks that model real-life applications in complex systems, few software tools support them. Popular commercial Bayesian network tools such as Hugin and Netica are either expensive or have to discretize continuous variables. In addition, free programs described in the literature, such as BNT and GeNie/SMILE, each have their own advantages and disadvantages. In this paper, we introduce a newly developed Java tool for model construction and inference in hybrid Bayesian networks. Using the representational power of a script language, the tool builds the hybrid model automatically from a well-defined string that follows a specific grammar. Furthermore, it implements several inference algorithms that can accommodate hybrid Bayesian networks, including the Junction Tree algorithm (JT) for conditional linear Gaussian (CLG) models and Direct Message Passing (DMP) for general hybrid Bayesian networks with CLG structure. We believe this tool will be useful for researchers in the field.

Keywords: Java software tool, hybrid Bayesian networks, script language

1. INTRODUCTION

A Bayesian network (BN) [1–4] is a directed acyclic graph (DAG) consisting of nodes and arrows, in which each node represents a random variable and each arrow represents a dependence relationship between the connected nodes, which may be probabilistic, deterministic, or functional. Each node in a BN has a specified conditional probability distribution (CPD), and together the CPDs parameterize the model. Over the past few decades, BNs have served as a powerful probabilistic knowledge base for decision support under uncertainty, with numerous applications such as classification, medical diagnosis, bioinformatics, and speech recognition. One of the most important features of a BN is that it factorizes the joint probability space, so that conditional independence can be exploited to reduce both modeling effort and computation. However, a BN model is only useful when combined with efficient inference algorithms, and both model construction and probabilistic inference have to be implemented in a software package with some form of user interface.

Several popular commercial software packages are currently available, with different features and prices, such as Hugin [5] and Netica [6]. Hugin offers a full line of Bayesian network products that can build BN models and that provide the most commonly used inference algorithms, including Junction Tree. More importantly, Hugin provides full density estimation for CLG hybrid BN models. In contrast, Netica has to discretize the continuous variables during model construction and can therefore return only discretized distribution estimates. Both packages provide good graphical user interfaces and adequate documentation. Regarding price, Hugin is expensive in both its commercial and academic versions.
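For reference, the factorization of the joint distribution that a BN encodes can be written in the standard chain-rule form below. Here X_1, ..., X_n are the nodes of the network and Pa(X_i) denotes the parents of X_i in the DAG; this notation is introduced only for illustration and is not used elsewhere in the paper.

```latex
\[
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
```

Each factor is one node's CPD, so specifying the model reduces to specifying one CPD per node.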
In addition to the commercial BN tools described above, there are free packages developed by academic groups, including GeNie/SMILE from the University of Pittsburgh [7] and the Matlab BN Toolbox (BNT) developed by Kevin Murphy et al. [8, 9]. The latter is open-source Matlab code, so researchers can easily extend it and explore new ideas. GeNie/SMILE provides a graphical interface and is fast in inference thanks to its C++ implementation. They also provide APIs for users to call the built-in functions.

Send correspondence to Wei Sun: [email protected], Telephone: 1 703 993 5538
Signal Processing, Sensor Fusion, and Target Recognition XX, edited by Ivan Kadar, Proc. of SPIE Vol. 8050, 80501Q · © 2011 SPIE · CCC code: 0277-786X/11/$18 · doi: 10.1117/12.884056

For academic research, we would like the flexibility to modify and extend the software, and we need the software to handle complicated cases such as large networks with continuous variables. At the same time, we prefer it to be fast at run time. Further, to be user friendly, it should have a basic graphical user interface. These are the primary reasons why we developed this new research tool in Java, in which a script language is used to define CPDs so that functional relationships are easily specified for continuous variables.

Scripts are in fact commonly used in BN software, including Hugin and Netica mentioned above; the tools differ only in grammar and/or form of representation. For a network with CLG topology, the CPD of a discrete random variable can always be represented as a table/matrix, which is why it is usually called a Conditional Probability Table (CPT). For continuous variables whose CPDs involve complicated functional relationships and/or arbitrary densities, on the other hand, scripts are needed in most cases to define the relationships and specify the parameters, because of their flexibility and expressive power.

For example, let us look at how to specify a hybrid CPD for one Gaussian variable X with one discrete parent D (binary with states d1, d2) and one continuous parent U in Hugin, Netica, and BNT, respectively. Let us assume that the CPD of X is the following:

\[
P(X \mid D, U) =
\begin{cases}
N(x;\, 0.5U + 1,\, 1), & D = d_1 \\
N(x;\, U - 2,\, 2), & D = d_2
\end{cases}
\]

where N(x; u, σ²) represents a Gaussian density with mean u and variance σ². In Hugin, this CPD can be defined by filling numbers into a form via its GUI, as shown in Figure 1(a) (from the free version, Hugin Lite 7.4).
In Netica, by contrast, the same CPD is specified with a script equation, as shown in Figure 1(b) (from Netica v4.16). In BNT, this CPD is specified by the code below:

    bnet.CPD{X} = gaussian_CPD(bnet, X, 'mean', [1 3], 'cov', [0.5 2], 'weights', [0.5 1]);

where gaussian_CPD is a BNT function that parameterizes a Gaussian CPD.

Scripts with a well-defined grammar are the main language of the Java tool we developed, and we therefore name the tool Script Bayesian Network (SBN) in this paper. As an example, SBN specifies the same CPD using the following script:

    defineNode(X, DescriptionOfX);
    {
      defineState(Continuous);
      P(X | D, U) = if (D == d1) { 0.5*U + NormalDist(1, 1) }
                    else if (D == d2) { U + NormalDist(-2, 2) }
    }

In general, for dependence relationships that have no fixed form, a script language may be the best and most flexible means to define a model, describe evidence, and then choose an inference engine to call. However, the script needs to follow a well-defined grammar, and a script parser is needed to serve as the interpreter between the input scripts and the core program.

The remainder of this paper is organized as follows. Section 2 describes the SBN architecture. Section 3 then introduces some implementation details of the inference algorithms in SBN. Section 4 presents several BN models we used for testing the robustness of SBN. Finally, Section 4.1 summarizes our work.

Figure 1. Defining the hybrid CPD in Hugin and Netica. (a) Hugin specification: the intercept is the constant in the linear function, and the numbers corresponding to a variable (in this case U) are its coefficients in the linear function. (b) Netica script: using conditional equations to define the hybrid CPD.

2. SBN ARCHITECTURE

As shown in Figure 2, SBN consists of several main modules: the script parser, the model database, and the inference engine pool. These modules interact with each other to complete a task. First, when a well-defined script is given as input to the program, the script parser reads it and interprets it into Java code that the other programs in SBN can understand. Based on the task it reads, the script parser can build the BN model and save it in the model database. The script parser can also call the particular inference engine specified in the input script and write the inference results back to the model database. Finally, the posterior distributions of the hidden variables of interest can be read from the model database as the output of SBN.

We use ANTLR [10] (ANother Tool for Language Recognition), a multi-purpose language compiling framework that provides an API for users to develop customized text compilers/parsers. ANTLR provides excellent support for grammar development, including data structures, tokens, and parsers. With the help of ANTLR, we have developed the set of SBN grammars and our customized script parser. An Abstract Syntax Tree (AST) is first generated from the script; the nodes of the tree are then plugged into the right places in the program according to the tree translation protocol. For online reference on ANTLR, please visit http://www.antlr.org/.

To illustrate the SBN grammar, we show an example script that defines a continuous node W with one discrete parent A and two continuous parents X and Y, excerpted from our testing models.

    defineNode(A, DescriptionOfA);
    {
      defineState(Discrete, a1, a2);
      P(A)={a1:0.8; a2:0.2}
    }

    defineNode(X, DescriptionOfX);
    {
      defineState(Continuous);
      P(X)=NormalDist(-3, 1);
    }
    defineNode(Y, DescriptionOfY);
    {
      defineState(Continuous);
      P(Y)=NormalDist(3, 0.5);
    }
    defineNode(W, DescriptionOfW);
    {
      defineState(Continuous);
      P(W | A,X,Y)=if(A==a1) { X - 0.5*Y + NormalDist(2, 1); }
                   else if(A==a2) { X + Y + NormalDist(-1, 0.5); }
    }

Figure 2. Primary modules in SBN. [Diagram: input scripts (node definitions followed by commands such as defineInferenceEngine( DMP ); and defineEvidence( W, 2.1 );) are read by the parser, which populates the model database; the inference engines (DMP, LW, etc.) perform belief updates, and the posterior distributions are returned.]

The model database stores all constructed BN models and accordingly updates any evidence, as well as the beliefs of the variables in the model, in real time. Belief changes are sent from the inference engine module, which has a hierarchical structure that holds all implemented inference algorithms. To date, DMP and LW (Likelihood Weighting) are available as inference engines for users to call, and it is easy to plug other inference algorithms into SBN. The Inference Engine Java class serves as the superclass for every particular inference algorithm and already encapsulates the internal interface to the SBN architecture. Any other algorithm can therefore inherit from the Inference Engine superclass and provide its own implementation separately. In the same manner, any inference engine can be called using run('name of inference algorithm') and can access model data in the same way.
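To make the plug-in idea and the role of an inference engine concrete, the following is a minimal Python sketch, not SBN's Java implementation. It applies likelihood weighting to the example model for W, with evidence W = 2.1 as in Figure 2, and estimates P(A = a1 | W = 2.1). The names InferenceEngine, LikelihoodWeighting, ENGINES, and run are hypothetical stand-ins for the architecture described above. The sketch also assumes that NormalDist(m, v) takes a mean and a variance, which is how the introduction's N(x; u, σ²) example maps onto the script.

```python
import math
import random
from abc import ABC, abstractmethod


def normal_pdf(x, mean, var):
    """Gaussian density N(x; mean, var), where var is the variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class InferenceEngine(ABC):
    """Hypothetical common superclass that every engine inherits from."""

    @abstractmethod
    def run(self, evidence_w, n_samples=100_000, seed=0):
        """Return an estimate of P(A = a1 | W = evidence_w)."""


class LikelihoodWeighting(InferenceEngine):
    def run(self, evidence_w, n_samples=100_000, seed=0):
        rng = random.Random(seed)
        weight_a1 = 0.0
        weight_total = 0.0
        for _ in range(n_samples):
            # Sample the unobserved nodes A, X, Y from their priors.
            a = "a1" if rng.random() < 0.8 else "a2"
            x = rng.gauss(-3.0, math.sqrt(1.0))   # assumed variance 1
            y = rng.gauss(3.0, math.sqrt(0.5))    # assumed variance 0.5
            # Weight the sample by the likelihood of the evidence on W.
            if a == "a1":
                w = normal_pdf(evidence_w, x - 0.5 * y + 2.0, 1.0)
                weight_a1 += w
            else:
                w = normal_pdf(evidence_w, x + y - 1.0, 0.5)
            weight_total += w
        return weight_a1 / weight_total


# Hypothetical registry: new engines plug in by adding one entry.
ENGINES = {"LW": LikelihoodWeighting}


def run(name, evidence_w, **kwargs):
    """Look up an engine by name and run it, mirroring run('name of algorithm')."""
    return ENGINES[name]().run(evidence_w, **kwargs)


def exact_posterior_a1(evidence_w):
    """Closed form for this CLG model: W given A is Gaussian once X, Y are integrated out."""
    like_a1 = 0.8 * normal_pdf(evidence_w, -3.0 - 0.5 * 3.0 + 2.0, 1.0 + 0.25 * 0.5 + 1.0)
    like_a2 = 0.2 * normal_pdf(evidence_w, -3.0 + 3.0 - 1.0, 1.0 + 0.5 + 0.5)
    return like_a1 / (like_a1 + like_a2)


estimate = run("LW", 2.1, n_samples=100_000, seed=1)
exact = exact_posterior_a1(2.1)
print("LW estimate:", estimate)
print("exact value:", exact)
```

Because the model is conditional linear Gaussian, the exact posterior is available in closed form and serves as a check on the sampled estimate. A new engine would subclass InferenceEngine and register itself in ENGINES without changing the calling code.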

3. IMPLEMENTATION OF INFERENCE ALGORITHM

In implementing the inference algorithms, several important functions are worth describing in more detail, such as the functional inverse and the unscented transformation. These computations are required by the DMP algorithm but are also very useful in general. For example, the functional inverse can be used anywhere an equation needs to be solved, and the unscented transformation can easily be extended to an unscented filter for state estimation in dynamic systems. Next, we review how they are implemented in SBN.

3.1 Functional inverse using equation tree

When computing λ messages sent from a continuous node to its continuous parents, the DMP algorithm needs the inverse of the original function defined in the CPD, which is assumed to be invertible with a unique inverse. We use the equation tree method [11] to find the inverse of any invertible function in which the target independent variable appears in only one place. Pseudo code for the equation tree method is summarized in Table 1. First, the math function is decomposed into a tree structure. Without loss of generality, we assume that all math operators are binary, with two operands. In the equation tree, the math operator is always the root node, and the two subtrees are the two parts of the original function connected by that operator. An example equation tree for the right-hand side of an equation is shown in Figure 3. By following the equation tree method, we obtain the corresponding inverse function and its tree, shown in Figure 4.


Table 1. Equation Tree Method to Find the Functional Inverse

Input: any invertible math function (linear or nonlinear) Y = f(X, T), where Y is the source variable, X is the target independent variable of interest for the inverse, and T represents the other variables in the function.
Assumption: X shows up at only one place in the original function.
Output: the inverse function of the original function.

1. Structure the original function into a tree, otree = {V, R}, where V is the set of nodes (math operators, variables, or constants) and R is the set of connecting edges between nodes.
2. Initialize a new tree itree with the source variable Y as the only node.
3. ctree = otree
4. while ctree != NULL
       ttree = ctree
       rNode = root node of ttree
       stree_left = left subtree of rNode
       stree_right = right subtree of rNode
       if X ⊆ stree_left
           replace stree_left with itree in ttree
           ctree = stree_left
       else if X ⊆ stree_right
           replace stree_right with itree in ttree
           ctree = stree_right
       endif
       change rNode to the opposite operator
       (Note: if changing from '+' to '−', or from '∗' to '/', reverse the order of operands as well.)
       itree = ttree
   endwhile
5. Output itree as the inverse function for X.

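Figure 3. An example equation tree for the math function Y = R × log_e(X) − C. [Diagram annotations: every math equation can be structured as a tree; the root node is a math operator such as "+", "-", "log", "cos", etc.; a leaf node is a variable or constant.]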
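The equation tree method of Table 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the SBN code. Trees are nested tuples (operator, left, right) with strings or numbers as leaves. The "log" node is read as "logarithm with base left of right", which is our interpretation of Figure 3. Instead of the single "opposite operator" rule, the sketch spells out the left-operand and right-operand cases for each operator. The function names contains, invert, and evaluate are hypothetical.

```python
import math


def contains(tree, var):
    """True if the variable name occurs anywhere in the tree."""
    if isinstance(tree, tuple):
        return contains(tree[1], var) or contains(tree[2], var)
    return tree == var


def invert(tree, target, source="Y"):
    """Solve source = tree for target, assuming target occurs exactly once."""
    inverse = source
    node = tree
    while node != target:
        if not isinstance(node, tuple):
            raise ValueError("target variable not found in the tree")
        op, left, right = node
        on_left = contains(left, target)
        if not on_left and not contains(right, target):
            raise ValueError("target variable not found in the tree")
        if op == "+":      # Y = L + R
            inverse = ("-", inverse, right if on_left else left)
        elif op == "-":    # Y = L - R
            inverse = ("+", inverse, right) if on_left else ("-", left, inverse)
        elif op == "*":    # Y = L * R
            inverse = ("/", inverse, right if on_left else left)
        elif op == "/":    # Y = L / R
            inverse = ("*", inverse, right) if on_left else ("/", left, inverse)
        elif op == "^":    # Y = L ^ R
            inverse = ("^", inverse, ("/", 1, right)) if on_left else ("log", left, inverse)
        elif op == "log":  # Y = log_L(R)
            inverse = ("^", right, ("/", 1, inverse)) if on_left else ("^", left, inverse)
        else:
            raise ValueError("unsupported operator: " + str(op))
        # Descend into the subtree that still contains the target.
        node = left if on_left else right
    return inverse


def evaluate(tree, env):
    """Numerically evaluate a tree given values for its variables."""
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, env), evaluate(right, env)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        if op == "/":
            return a / b
        if op == "^":
            return a ** b
        if op == "log":
            return math.log(b, a)  # logarithm of b with base a
        raise ValueError("unsupported operator: " + str(op))
    if isinstance(tree, str):
        return env[tree]
    return tree


# The example of Figure 3: Y = R * log_e(X) - C.
original = ("-", ("*", "R", ("log", "e", "X")), "C")
inverse_tree = invert(original, "X")
print(inverse_tree)
```

For the Figure 3 example, the returned tree corresponds to e ^ ((Y + C) / R). Evaluating the original tree and then the inverse tree with the resulting Y recovers the starting value of X, which gives a simple numerical check.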

Figure 4. The inverse function of Y = R × log_e(X) − C, i.e., X = e^((Y + C)/R).

3.2 Unscented transformation

The Unscented Transformation (UT) [12] is a deterministic sampling method that provides accurate estimates of the first two moments of a continuous random variable that has undergone a functional transformation from an initial Gaussian density. We implemented UT in SBN as a Java method for the DMP inference engine and made it ready for future extension in case an unscented filter is needed for estimating state space models. Let us demonstrate the unscented transformation with a simple two-dimensional Gaussian example. Let x = [x1 x2] with mean and covariance matrix given as

\[
\bar{x} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad
P_x = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}.
\]

In order to show the robustness of the unscented transformation, we choose a set of functions with severe nonlinearity, shown below:

\[
y_1 = \log(x_1^2)\,\cos(x_2), \qquad
y_2 = \sqrt{\exp(x_2)}\,\sin(x_1 x_2).
\]

The true posterior statistics are approximated very closely by brute-force Monte Carlo simulation, using 100,000 sample points drawn from the prior distribution and propagated through the nonlinear mapping. We compare them with the estimates calculated by the unscented transformation using only 5 sigma points, which are deterministic sample points. Figure 5 shows that the mean calculated from the transformed sigma points is very close to the true mean, and that the posterior covariance appears consistent and efficient, because the sigma-point covariance ellipse is larger than, but still tight around, the true posterior covariance ellipse.
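The sigma-point computation can be illustrated with a short Python sketch. It uses the basic (unscaled) unscented transformation with a single parameter kappa, which for a two-dimensional input yields 2n + 1 = 5 sigma points. The paper cites the scaled variant [12] and does not state its parameter values, so kappa = 1 and the helper names cholesky, unscented_transform, and paper_example are assumptions made here for illustration. The reading of the second test function as sqrt(exp(x2)) · sin(x1 x2) is also an assumption.

```python
import math


def cholesky(matrix):
    """Lower-triangular L with L * L^T equal to a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def unscented_transform(mean, cov, func, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through func using 2n + 1 sigma points."""
    n = len(mean)
    scaled = [[(n + kappa) * cov[i][j] for j in range(n)] for i in range(n)]
    root = cholesky(scaled)
    # Sigma points: the mean, plus and minus each column of the matrix square root.
    sigma_points = [list(mean)]
    for j in range(n):
        column = [root[i][j] for i in range(n)]
        sigma_points.append([m + c for m, c in zip(mean, column)])
        sigma_points.append([m - c for m, c in zip(mean, column)])
    weights = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)
    outputs = [func(p) for p in sigma_points]
    dim = len(outputs[0])
    # Weighted sample mean and covariance of the transformed points.
    y_mean = [sum(w * y[i] for w, y in zip(weights, outputs)) for i in range(dim)]
    y_cov = [
        [
            sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(weights, outputs))
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return y_mean, y_cov


def paper_example(x):
    """The nonlinear mapping of Section 3.2 (second component as assumed above)."""
    x1, x2 = x
    return [
        math.log(x1 ** 2) * math.cos(x2),
        math.sqrt(math.exp(x2)) * math.sin(x1 * x2),
    ]


x_mean = [3.0, 1.0]
x_cov = [[1.0, -1.0], [-1.0, 2.0]]
ut_mean, ut_cov = unscented_transform(x_mean, x_cov, paper_example)
print("UT mean:", ut_mean)
print("UT covariance:", ut_cov)
```

A useful sanity check on any such implementation is that it reproduces the exact mean and covariance when the mapping is linear, since the sigma points match the first two moments of the prior by construction.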

Figure 5. Demonstration of the Unscented Transformation: (a) prior to the functional transformation (panel title "Prior Distribution"); (b) post estimates (panel title "After Transformation"). [Legend entries include sample points, true mean, and sigma points.]

4. TESTING MODELS

We have used various BN models to test the robustness of SBN in different settings, including:
• one continuous node with two discrete parents and two continuous parents, as shown in Figure 6(a);
• one continuous node with three discrete parents, as shown in Figure 6(b);
• a CLG polytree model, as shown in Figure 7;
• a nonlinear hybrid BN model, as shown in Figure 8.

For all the testing models except the one with nonlinear functional relationships, the inference results returned by SBN have been compared with the gold standard provided by Hugin. As is well known, there is no exact solution in the nonlinear case. However, we implemented DMP in Matlab BNT as well, and the results from BNT and SBN are consistent with each other. Furthermore, by running brute-force Likelihood Weighting with a huge number of random samples, we also verified the accuracy of the inference results from SBN in the nonlinear case.

Figure 6. Polytree hybrid testing BN models, where a yellow double ellipse represents a continuous variable. (a) One continuous node with 2 discrete and 2 continuous parents (hybrid model with 2d2c). (b) One continuous node with 3 discrete parents (hybrid model with 3d).

Figure 7. A polytree CLG BN model, where a yellow double ellipse represents a continuous variable.

    defineNode(MCancer, DescriptionOfMCancer);
    { defineState(Continuous); P(MCancer) = NormalDist(10, 1); }
    defineNode(SC, DescriptionOfSC);
    { defineState(Continuous); P(SC | MCancer) = log(MCancer) + NormalDist(5, 1); }
    defineNode(BT, DescriptionOfBT);
    { defineState(Continuous); P(BT | MCancer) = exp(sqrt(MCancer)) + NormalDist(0, 1); }
    defineNode(Coma, DescriptionOfComa);
    { defineState(Continuous); P(Coma | SC, BT) = SC + sqrt(BT) + NormalDist(0, 1); }
    defineNode(Headaches, DescriptionOfHeadaches);
    { defineState(Continuous); P(Headaches | BT) = 0.5*BT + NormalDist(0, 0.5); }

Figure 8. A BN model consisting of continuous nodes with nonlinear functional relationships.

4.1 Summary

To model continuous variables in Bayesian networks and to implement hybrid inference algorithms such as DMP, we have developed a Java tool called Script Bayesian Network (SBN) that uses a script language. SBN has the following features:
• SBN can model both discrete and continuous variables;
• SBN can specify any functional relationship between variables, including nonlinear functions;
• SBN provides discretization as an option when building a hybrid model;
• SBN has multiple inference engines implemented, currently including DMP and LW;
• SBN is an open-source software package that is easy to use and extend.

Currently, SBN has a basic user interface and is easy to use. However, more documentation and online download/support would help users obtain and learn the tool. In addition, improving the graphical user interface would make the tool much more user friendly.

REFERENCES

[1] Charniak, E., “Bayesian networks without tears,” AI Magazine 12(4), 50–63 (1991).
[2] Pearl, J., [Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference], Morgan Kaufmann (1988).
[3] Jensen, F. V., [Introduction to Bayesian Networks], Springer, 1 ed. (Aug. 1996).
[4] Neapolitan, R. E., [Probabilistic Reasoning in Expert Systems: Theory and Algorithms], Wiley-Interscience (Apr. 1990).
[5] [http://www.hugin.com/], Hugin Expert, Aalborg, Denmark (website).
[6] [http://www.norsys.com/], Norsys Software Corp., Vancouver, Canada (website).
[7] [http://genie.sis.pitt.edu/], Decision Systems Laboratory, University of Pittsburgh (website).
[8] Murphy, K. P., “The Bayes Net Toolbox for Matlab,” Computing Science and Statistics 33 (2001).
[9] [http://code.google.com/p/bnt/], Kevin Murphy, et al., Open source software (website).
[10] Parr, T., [The Definitive ANTLR Reference: Building Domain-Specific Languages], Pragmatic Bookshelf, 1 ed. (May 2007).
[11] Aho, A. V., Sethi, R., and Ullman, J. D., [Compilers: Principles, Techniques, and Tools], Addison Wesley, US ed. (Jan. 1986).
[12] Julier, S. J. and Uhlmann, J. K., “The scaled unscented transformation,” in [Proceedings of the American Control Conference], 6, 4555–4559 (2002).
