Software Comparison Dealing with Bayesian Networks

Mohamed Ali Mahjoub (1) and Karim Kalti (2)
(1) Preparatory Institute of Engineer of Monastir (IPEIM), 5019 Monastir, Tunisia, [email protected]
(2) University of Monastir (FSM), Monastir, Tunisia, [email protected]
Abstract. This paper presents a comparative study of software tools for Bayesian networks, mathematical models that are increasingly used in decision support and artificial intelligence. The study focuses on the inference and learning methods implemented by each tool and gives a state of the art of the field.

Keywords: Bayesian network, software, comparison.
1 Introduction

Bayesian networks belong to the family of graphical models [1],[3]. They combine graph theory and probability theory in a single formalism to provide effective and intuitive tools for representing a probability distribution over a set of random variables. Several libraries and software packages have been developed to handle Bayesian networks and to run the associated algorithms. The purpose of this paper is to study these tools: what each of them provides, the algorithms it implements, the data types it supports and its development interface. We first present the formalism of Bayesian networks, covering the fundamental notions and the associated areas of application. We then review some of the more popular and/or recent software packages for dealing with graphical models.
2 Formalism of Bayesian Networks

One of the key issues in Artificial Intelligence research is the ability to design and develop dynamic and evolving systems. Such systems must be equipped with intelligent behaviors so that they can learn and reason. In most cases, however, the knowledge acquired is not sufficient for the system to take the most appropriate decision. Several methodologies have been proposed to address this issue, and probabilistic approaches are particularly well suited, both for reasoning with uncertain knowledge and beliefs and for structuring the representation of knowledge. Bayesian networks are such a probabilistic approach. They are a compact representation of a joint probability distribution over a set of variables, based on the concept of conditional independence. In addition, they facilitate the description of a collection of beliefs by making explicit the causality and conditional independence relationships among these beliefs, and they provide an efficient way to update the strength of beliefs when new evidence is observed.

D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 168–177, 2011. © Springer-Verlag Berlin Heidelberg 2011

2.1 Basic Concepts

We first recall Bayes' theorem. Given two events A and B with P(B) > 0, Bayes' theorem can be stated as follows:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{1} \]
where P(A) is the prior probability of A, P(B|A) is the likelihood of A, and P(B) is the prior probability of B. P(A|B) is then the posterior probability of A given B.

A Bayesian network is a directed acyclic graph, that is, a directed graph that contains no circuit. Each node of a Bayesian network carries a label that is an attribute of the problem. In the simplest case these attributes are binary and take the value TRUE or FALSE with some probability, which means that a random variable is associated with each attribute. A Bayesian network is defined by:

(i) a directed graph without circuit G = (V, E), where V is the set of nodes of G and E is the set of arcs of G;
(ii) a finite probability space (Ω, Z, p);
(iii) a set of random variables associated with the nodes of the graph and defined on (Ω, Z, p), such that

\[ p(V_1, V_2, \ldots, V_n) = \prod_{i=1}^{n} p\big(V_i \mid P(V_i)\big) \tag{2} \]

where P(V_i) is the set of causes (parents) of V_i in the graph G.

A Bayesian network is thus composed of two components: a causal directed acyclic graph (the qualitative representation of knowledge) and a set of local probability distributions¹. After defining the different values that each characteristic can take, the expert must indicate the relationships between these characteristics. Finally, the probabilities must be specified to complete the network: each value (state) of a given node must be assigned a probability of occurrence.
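The two ingredients recalled in this section, the product of local distributions conditioned on the parents and Bayes' theorem (1), can be illustrated with a minimal Python sketch. The three-node network (Rain and Sprinkler as parents of WetGrass), its variable names and all numerical values are assumptions chosen for illustration; they do not come from any of the tools studied in this paper.

```python
from itertools import product

# Hypothetical three-node network: Rain -> WetGrass <- Sprinkler.
# All numbers below are illustrative assumptions.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler)
P_WET_TRUE = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of the local distributions."""
    p_wet = P_WET_TRUE[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet():
    """P(Rain = True | WetGrass = True), obtained with Bayes' theorem."""
    # P(B): probability of the evidence, summing out Rain and Sprinkler.
    p_wet = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
    # P(B|A) * P(A): joint probability of Rain = True and the evidence.
    p_rain_and_wet = sum(joint(True, s, True) for s in [True, False])
    return p_rain_and_wet / p_wet


if __name__ == "__main__":
    total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
    print(f"sum of the joint distribution = {total:.6f}")
    print(f"P(Rain | WetGrass) = {posterior_rain_given_wet():.4f}")
```

Enumerating the joint distribution in this way is only practical for very small networks; the inference algorithms discussed for the tools in Section 4 exist precisely to avoid this exhaustive sum.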
¹ Quantitative representation of knowledge.

3 Problems Associated with Bayesian Networks

The use of Bayesian networks raises several problems; we cite the main ones [1]. First, the correspondence between the graphical structure and the associated probabilistic structure makes it possible to reduce inference problems to problems in graph theory. However, these problems are relatively complex and give rise to much research. The second difficulty lies precisely in the transposition of the causal graph into a probabilistic representation. Even if the only probability tables needed to specify the entire probability distribution are those of each node conditioned on its parents, defining these tables is not always easy for an expert. A third problem is the automatic learning of the network structure, which remains a rather complex task.
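To make the elicitation difficulty concrete, the following Python sketch counts the entries of the conditional probability table of a single node as a function of its parents. The function names are hypothetical and the choice of binary parents is an assumption made for illustration.

```python
from math import prod


def cpt_size(node_states, parent_states):
    """Number of probabilities in the table of one node.

    node_states: number of states of the node itself.
    parent_states: list with the number of states of each parent.
    """
    return node_states * prod(parent_states)


def free_parameters(node_states, parent_states):
    """Independent values to specify: each conditional distribution sums to one."""
    return (node_states - 1) * prod(parent_states)


if __name__ == "__main__":
    for k in range(6):
        parents = [2] * k  # k binary parents (illustrative assumption)
        print(k, cpt_size(2, parents), free_parameters(2, parents))
```

The table size doubles with every additional binary parent, which is why an expert can rarely fill in the tables of densely connected nodes by hand.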
4 Study on Applications of Learning Bayesian Networks

We present a study of the tools that manipulate Bayesian networks: what each of them provides, which inference and learning algorithms it implements, and which exchange formats it supports for interaction with other programs. A tabular comparison is then given to make the tools easier to compare. Several libraries and software packages have been developed for handling Bayesian networks and running the associated algorithms. We describe below the tools on which the most research has been conducted.

BAYES NET TOOLBOX (BNT)
BNT is an open-source library for Matlab² and is now supported by many researchers, since it integrates several algorithms for inference and learning in Bayesian networks. However, this library remains difficult for non-specialists because it requires some knowledge of both Matlab and BNT: networks are manipulated by writing code in the Matlab text editor. BNT offers several inference algorithms for discrete, Gaussian or mixed (conditional Gaussian) Bayesian networks: variable elimination, junction tree, Quickscore, Pearl's algorithm (exact for polytrees, approximate otherwise), and sampling-based approximate algorithms (likelihood weighting and Gibbs sampling). For parameter learning, BNT provides maximum likelihood or maximum a posteriori estimation for complete data, and the EM algorithm for incomplete data. For structure learning, BNT uses several scoring functions, such as the BIC criterion. The only save format supported by BNT is the Matlab format with the extension ".m".

BAYESIALAB
BayesiaLab is a product of Bayesia (www.bayesia.com), a French company dedicated to decision support and learning methods from artificial intelligence and to their operational applications (industry, services, finance, etc.). BayesiaLab is designed as a complete laboratory for manipulating and studying Bayesian networks.
It is developed in Java and is currently available in French, English and Japanese. BayesiaLab can handle the entire chain of modeling a system with a Bayesian network. Learning in BayesiaLab uses a text file or an ODBC link describing the set of cases. The application uses exact and approximate inference, but the algorithms are not specified. The save formats supported by BayesiaLab are "XBL", "BIF", "DNE" and "NET".

NETICA
Netica is presented as the most widely distributed Bayesian network software in the world³. It is used for diagnosis, prevention or simulation in finance, environment, medicine, industry and other fields. Netica is very widespread among users of Bayesian networks, but it uses only one inference algorithm, the junction tree. On the other hand, Netica offers a graphical interface that is easy to operate. It can easily transform a Bayesian network and explore the relationships between the variables of a model learned from data by reversing links or absorbing nodes, while keeping the overall probability distribution of the network unchanged. Learning in Netica uses a text file, tab-delimited CSV files or an ODBC link describing the set of cases. The save formats supported by Netica are "DNE" and "NETA". The application can also import a Bayesian network in the formats "DSC", "DXP" and "NET".

² Following the work of Kevin Murphy.
³ It has been developed since 1992 by the company Norsys, which offers a free version of the software limited to 15 variables and to samples of 1000 cases for learning from data.

HUGIN
Hugin is one of the most widely used tools for Bayesian networks. It is a commercial product with functionality similar to that of BNT, and it was one of the first packages for DAG models. The tool provides a graphical environment and a development environment that help define and build knowledge bases based on Bayesian networks. Hugin supports a single inference algorithm, the junction tree. The junction tree can be displayed and the triangulation method can be changed. Structure and parameter learning can be driven from a Java, C or Visual Basic program. Parameter learning in Hugin uses the EM algorithm, while structure learning relies on two algorithms, PC and NPC. The save formats supported by Hugin are "OOBN", "HKB" and "NET".

JAVABAYES
JavaBayes is a set of Java tools for creating and manipulating Bayesian networks⁴. The system consists of a graphical editor, an inference core and a set of parsers. The graphical editor allows the user to create and modify Bayesian networks. The methods in JavaBayes are not commented and many variables do not have meaningful names, which makes the code difficult to understand.
In addition, JavaBayes still has many bugs and shortcomings in its interface. Some operations are difficult, such as saving a network or importing it into another program. The inference algorithms implemented in this system are variable elimination and the junction tree. JavaBayes can use models with sets of distributions to compute intervals of posterior distributions or intervals of expectations, but it offers no algorithms for parameter or structure learning. The save formats supported by JavaBayes are "BIF" and "XML".

⁴ It was developed by Fabio Gagliardi Cozman in 1998 and is distributed under the GNU General Public License.

GENIE
Released in 1998 by the decision systems group of Druzdzel, GeNIe (Graphical Network Interface) is a development environment for building decision models and Bayesian networks, characterized by its inference engine SMILE (Structural Modeling, Inference, and Learning Engine). GeNIe offers several inference algorithms as well as several save formats for exchanging networks between applications. It essentially uses the junction tree (clustering) algorithm and the polytree algorithm for inference, together with several approximate algorithms that can be used when networks become too large for clustering (logic sampling, likelihood weighting, self-importance and heuristic importance sampling, backward sampling). Parameter learning and structure learning are supported. The save formats supported by GeNIe are "xDSL", "DSL", "NET", "DNE", "DXP", "ERG" and "DSC".

BNJ
BNJ (Bayesian Network tools in Java) is a set of Java tools for research and development on Bayesian networks. The project was developed within the KDD laboratory at Kansas State University. It is an open-source project licensed under the GNU⁵ license, and its latest version was published in April 2006. In BNJ, two types of probability distribution can be defined for the nodes: discrete tables and continuous distributions. The Bayesian networks created are stored in XML files. BNJ provides two categories of inference algorithms: exact inference (junction tree, variable elimination with optimization) and approximate inference, part of which builds on the exact algorithms. Some approximate methods are based on sampling, such as Adaptive Importance Sampling (AIS), Logic Sampling and Forward Sampling; others apply an exact inference algorithm to a selection of the arcs of the graph, such as "KruskalPolytree", "BCS" and "PTReduction". On the other hand, a review of all the source files of this toolkit reveals no implementation of parameter or structure learning algorithms. The save formats supported by BNJ are "NET", "Hugin" and "XML".

MSBNX
MSBNX is a component-based Windows application for creating and evaluating Bayesian networks. It uses the junction tree algorithm for inference. It provides no learning algorithm for parameters or structure. The save formats supported by MSBNX are "XBN", "DSC" and "XML" (the XML format of this application has a design specific to MSBNX, different from the XML formats of the other applications).
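Likelihood weighting, listed above among the approximate algorithms of BNT and GeNIe, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any of these tools: the network (Rain and Sprinkler as parents of WetGrass), the probabilities and the function name are hypothetical.

```python
import random

# Hypothetical network Rain -> WetGrass <- Sprinkler with illustrative numbers.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.80, (False, False): 0.05}


def likelihood_weighting(n_samples, wet_observed=True, seed=0):
    """Estimate P(Rain = True | WetGrass = wet_observed)."""
    rng = random.Random(seed)
    weight_rain = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        # Non-evidence nodes are sampled from their own distribution,
        # in topological order (parents before children).
        rain = rng.random() < P_RAIN
        sprinkler = rng.random() < P_SPRINKLER
        # The evidence node is not sampled: it is fixed to its observed value
        # and the sample is weighted by the probability of that observation.
        p_wet = P_WET[(rain, sprinkler)]
        weight = p_wet if wet_observed else 1.0 - p_wet
        weight_total += weight
        if rain:
            weight_rain += weight
    return weight_rain / weight_total


if __name__ == "__main__":
    print(f"P(Rain | WetGrass) ~ {likelihood_weighting(50_000):.3f}")
```

Unlike logic sampling, no sample is rejected: samples that make the evidence unlikely simply receive a small weight, which is what keeps the method usable on networks too large for the junction tree.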
SAMIAM
SamIam has two main components: a graphical interface and an inference engine. The graphical interface allows users to develop Bayesian network models and save them in a variety of formats. The inference engine supports many tasks: classical inference, parameter estimation, time-space tradeoffs, sensitivity analysis, and explanation generation based on MAP and MPE. SamIam is free software that offers several inference variants, all based on a single algorithm, the junction tree. It supports three implementations of the junction tree algorithm: the Hugin architecture, the Shenoy-Shafer architecture, and a new architecture that combines the best of the previous two. SamIam uses the EM (Expectation Maximization) algorithm for estimating the parameters of the network from data. It adopts the Hugin case file format for specifying data as a set of cases⁶. The save formats supported by SamIam are "NET" and "HUGIN".

⁵ GNU: General Public License.
⁶ It also includes utilities to generate data randomly from a given network and to store these data in files.
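The EM algorithm, mentioned above for parameter learning from incomplete data in BNT, Hugin and SamIam, alternates between completing the missing values in expectation and re-estimating the parameters. The Python sketch below illustrates this on a hypothetical two-node binary network A -> B in which some values of A are missing; the function name, the data encoding and the starting values are assumptions, and none of the tools is claimed to implement EM in this way.

```python
def em_two_nodes(cases, n_iter=50):
    """EM for a hypothetical binary network A -> B.

    cases: list of (a, b) pairs where b is 0 or 1 and a is 0, 1 or None
    (None marks a missing value of A). Returns (p_a, p_b_given_a) with
    p_a = P(A=1) and p_b_given_a[a] = P(B=1 | A=a).
    """
    p_a = 0.5
    p_b_given_a = [0.4, 0.6]  # asymmetric start so that the two states differ
    for _ in range(n_iter):
        # E-step: expected counts, using the current parameters.
        n_a = [0.0, 0.0]   # expected number of cases with A = a
        n_ab = [0.0, 0.0]  # expected number of cases with A = a and B = 1
        for a, b in cases:
            if a is None:
                # Posterior P(A=1 | B=b) by Bayes' theorem.
                like1 = p_b_given_a[1] if b else 1.0 - p_b_given_a[1]
                like0 = p_b_given_a[0] if b else 1.0 - p_b_given_a[0]
                w1 = like1 * p_a / (like1 * p_a + like0 * (1.0 - p_a))
            else:
                w1 = float(a)
            for state, w in ((0, 1.0 - w1), (1, w1)):
                n_a[state] += w
                n_ab[state] += w * b
        # M-step: maximum likelihood estimates from the expected counts.
        p_a = n_a[1] / len(cases)
        p_b_given_a = [n_ab[s] / n_a[s] if n_a[s] > 0 else 0.5 for s in (0, 1)]
    return p_a, p_b_given_a


if __name__ == "__main__":
    complete = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 5
    incomplete = complete + [(None, 1)] * 2 + [(None, 0)] * 2
    print(em_two_nodes(complete))
    print(em_two_nodes(incomplete))
```

When no value is missing, the E-step produces ordinary counts and the procedure reduces to the maximum likelihood estimate used for complete data.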
UNBBAYES
UnBBayes is an open-source tool for modeling, learning and reasoning with probabilistic networks. It offers a variety of algorithms for inference and learning, but bugs often occur when the user makes a handling mistake: the software enters an infinite loop that crashes the system, because it does not raise exceptions on errors. UnBBayes uses the junction tree, likelihood weighting and Correct Classification Review algorithms for inference, and the K2, B, V and incremental learning algorithms for learning. The save formats supported by UnBBayes are "NET", "UBF" and "XML". The application can also import a Bayesian network in the "OWL" format.

PROBT
ProBT is a C++ library. Its commercial and industrial exploitation has been granted on an exclusive basis to the company Probayes. ProBT is a very powerful tool that offers a free version for research purposes; it does not provide the source code, only a description of the classes on which it is built. For exact inference, ProBT uses the Successive Restrictions Algorithm (SRA). For approximate inference, it uses several approximation schemes, such as Monte Carlo methods and simultaneous estimation and maximization. ProBT also provides algorithms for parameter learning: an algorithm based on maximum likelihood for complete data, and an algorithm based on the EM principle for incomplete data. On the other hand, ProBT contains no implementation of Bayesian network structure learning⁷. The save format supported by ProBT is an "XML" format specific to this application. The application can also import an Excel file in which the data have been arranged in tables with graphs.

ANALYTICA
Analytica is shareware for creating, analyzing and modeling graphical models such as Bayesian networks.
A limited trial version of this software is available: the user fills out a form on the vendor's website, including a telephone number, and is later called for an opinion on the software. The software offers a single inference algorithm that is specific to the application and cannot be compared with the inference algorithms known in the Bayesian network literature: Analytica uses its own inference engine, ADE (the Analytica Decision Engine). There is no learning algorithm for parameters or structure. The only save format supported by Analytica is "ANA".

BNET BUILDER
BNet.Builder is a Bayesian network tool developed in Java that is very easy to use. It provides an embedded inference engine, but the inference algorithm used is not specified. Neither the documentation and help files nor the use of the software gives any indication that it performs structure or parameter learning. The save formats supported by BNet.Builder are "XBN", "DNE" and "NET".

⁷ Recent projects are, however, under development, such as the MWST and K2 algorithms for learning structure from complete data.
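Score-based structure learning, as mentioned for BNT with the BIC criterion, compares candidate graphs by a score that rewards fit to the data and penalizes the number of parameters. The Python sketch below computes a BIC score for discrete, complete data. It is an illustration under stated assumptions: the function name and data layout are hypothetical, all variables are assumed to have the same number of states, and the search over graphs performed by algorithms such as K2 is not shown.

```python
from collections import Counter
from math import log


def bic_score(data, parents, n_states=2):
    """BIC score of a discrete network structure on complete data.

    data: list of dicts mapping variable name -> state (0 .. n_states - 1).
    parents: dict mapping each variable to a tuple of its parents.
    """
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        joint_counts = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximized log-likelihood of the local distribution of `var`.
        for (pa_state, _), count in joint_counts.items():
            score += count * log(count / parent_counts[pa_state])
        # Penalty: number of free parameters times log(n) / 2.
        n_params = (n_states - 1) * n_states ** len(pa)
        score -= 0.5 * log(n) * n_params
    return score


if __name__ == "__main__":
    # Illustrative data in which B always copies A.
    data = [{"A": 0, "B": 0}] * 50 + [{"A": 1, "B": 1}] * 50
    print(bic_score(data, {"A": (), "B": ()}))      # no arc
    print(bic_score(data, {"A": (), "B": ("A",)}))  # A -> B
```

On data where the two variables are independent, the penalty term makes the graph without an arc score higher, which is how the criterion guards against over-connected structures.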
PNL
The open-source Probabilistic Networks Library (PNL) is a tool for working with graphical models. It supports directed and undirected models, discrete and continuous variables, and various inference and learning algorithms.

BAYES BUILDER
BayesBuilder is free software for the construction of Bayesian networks. It is only available on Windows, because the inference engine is written in C++ and has so far been compiled only for Windows⁸. The software does not specify the inference algorithm used by its engine, which makes it a black box from an algorithmic perspective. It supports neither parameter learning nor structure learning. The save format supported by BayesBuilder is "bbnet". The application can also import a Bayesian network in the formats "NET", "DNET", "DSC" and "BIF".

XBAIES
XBAIES provides dialogs for inference and learning in Bayesian networks. It has no graphical interface for entering the network, which must instead be entered as rules; this makes it difficult to use, since the new generation of expert system applications is based on graphical models rather than on rule-based systems. It uses a single inference algorithm, the junction tree. Parameter and structure learning are supported. XBAIES does not provide a format for saving a Bayesian network; a network is entered manually using the buttons of the dialogs.

OPENBUGS
BUGS is software for Bayesian networks that does not display the networks built: it offers a limited command-line interface similar to that of Matlab, without a graphical display of the Bayesian network. It belongs to the older generation of expert systems, because it is based on rules and not on graphical models. The inference algorithm used in BUGS is Gibbs sampling. BUGS performs parameter learning but not structure learning. The save format supported by OpenBUGS is "BUGS".
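Gibbs sampling, the inference algorithm reported above for BUGS and also offered by BNT, repeatedly resamples each unobserved variable from its distribution given the current values of all the others. The following Python sketch illustrates the idea on a hypothetical four-node network; the network, the probabilities and the function names are assumptions made for illustration and do not reflect how BUGS is implemented.

```python
import random

# Hypothetical network: Cloudy -> Sprinkler, Cloudy -> Rain,
# (Sprinkler, Rain) -> WetGrass. All numbers are illustrative assumptions.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}  # P(Sprinkler = True | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}       # P(Rain = True | Cloudy)
# P(WetGrass = True | Sprinkler, Rain), keyed by (sprinkler, rain)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(state):
    """Joint probability of a full assignment, as a product of local terms."""
    c, s, r, w = state["cloudy"], state["sprinkler"], state["rain"], state["wet"]
    return (bern(P_CLOUDY, c) * bern(P_SPRINKLER[c], s)
            * bern(P_RAIN[c], r) * bern(P_WET[(s, r)], w))


def gibbs_rain_given_wet(n_samples, burn_in=1000, seed=0):
    """Estimate P(Rain = True | WetGrass = True) by Gibbs sampling."""
    rng = random.Random(seed)
    state = {"cloudy": True, "sprinkler": False, "rain": True, "wet": True}
    hidden = ["cloudy", "sprinkler", "rain"]  # WetGrass stays fixed (evidence)
    rain_count = 0
    for step in range(burn_in + n_samples):
        for var in hidden:
            # Full conditional of `var`: proportional to the joint probability
            # with every other variable held at its current value.
            p_true = joint({**state, var: True})
            p_false = joint({**state, var: False})
            state[var] = rng.random() < p_true / (p_true + p_false)
        if step >= burn_in and state["rain"]:
            rain_count += 1
    return rain_count / n_samples


if __name__ == "__main__":
    print(f"P(Rain | WetGrass) ~ {gibbs_rain_given_wet(20_000):.3f}")
```

For clarity the sketch evaluates the whole joint probability at each step; a practical sampler would only use the terms involving the resampled variable, its parents and its children.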
BAYESIAN KNOWLEDGE DISCOVERER / BAYESWARE DISCOVERER
Bayesian Knowledge Discoverer is free software that has been replaced by a commercial version, Bayesware Discoverer, which offers a 30-day trial version. The software provides a graphical interface with powerful visualization options and is available for Windows, Unix and Macintosh. It uses a single inference algorithm, the junction tree. Bayesware Discoverer learns structure and parameters from both complete and incomplete data; the algorithm used is the approximate 'bound and collapse' method. The save format supported by BKD/BD is "DBN". The application can also import a Bayesian network in the "DSC" format.
⁸ The GUI program is written in Java and is easy to use.
VIBES
VIBES (Variational Inference for Bayesian Networks) is a Java program for Bayesian networks. It uses its own inference engine, based on variational inference, rather than the inference algorithms commonly cited in the Bayesian network literature. Parameter learning is supported, while structure learning is not. The save format supported by VIBES is an "XML" format specific to this application.
5 Comparison of Tools Manipulating Bayesian Networks

To better understand the tools manipulating Bayesian networks, we propose in this section a comparative analysis summarized in two tables. The first table compares the tools by type, availability of the source code and development language, software license, and official web site. The second table compares them by the type of inference algorithms used, the kind of learning supported, the file formats supported and the types of variables handled. All tools handle discrete variables. A more extensive comparison can be found in [2].

Table 1. Comparison of tools manipulating Bayesian networks

Tool          | Type     | Source   | License | Web site
--------------+----------+----------+---------+---------------------------------------------------------
BNT           | Library  | Matlab/C | Free    | http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
BayesiaLab    | Library  | No       | PLEV    | http://www.bayesia.com
NETICA        | Software | No       | PLEV    | http://www.norsys.com
HUGIN         | Software | No       | PLEV    | http://www.hugin.com
JavaBayes     | Software | Java     | Free    | http://www.cs.cmu.edu/~javabayes/Home
GeNIe         | Software | No       | Free    | http://genie.sis.pitt.edu
BNJ           | Software | Java     | Free    | http://bnj.sourceforge.net
MSBNX         | Software | No       | Free    | http://research.microsoft.com/en-us/um/redmond/groups/adapt/msbnx
SamIam        | Software | Java     | Free    | http://reasoning.cs.ucla.edu/samiam
UnBBayes      | Software | Java     | Free    | http://unbbayes.sourceforge.net
ProBT         | Library  | C++      | Free    | http://www.probayes.com
Analytica     | Software | No       | PLEV    | http://www.lumina.com
BNet Builder  | Software | No       | PLEV    | http://www.cra.com
Bayes builder | Software | No       | PLEV    | http://www.snn.ru.nl
OpenBugs      | Software | No       | Free    | http://www.mrc-bsu.cam.ac.uk/bugs
BKD/BD        | Software | No       | PLEV    | http://bayesware.com
PNL           | Library  | Yes      | Free    | http://sourceforge.net/projects/openpnl
VIBES         | Software | Java     | Free    | http://www.johnwinn.org/ and http://vibes.sourceforge.net/

PLEV: Pay-limited evaluation version

Table 2. Technical comparison of tools manipulating Bayesian networks
Tool          | Inference | Learning | Supported formats                          | Variables
--------------+-----------+----------+--------------------------------------------+----------
BNT           | E/A       | P/S      | m                                          | C/D
BayesiaLab    | E/A       | P/S      | BIF, xbl, dne, net                         | C/D
NETICA        | Exact     | P        | dne, neta, dsc, dxp, net                   | C/D
HUGIN         | Exact     | P/S      | oobn, hkb, net                             | C/D
JavaBayes     | Exact     | -        | BIF, XML                                   | Discrete
GeNIe         | E/A       | P/S      | xdsl, dsl, net, dne, dxp, erg, dsc         | Discrete
BNJ           | E/A       | -        | XML, net, hugin                            | Discrete
MSBNX         | Exact     | -        | xbn, dsc, XML                              | Discrete
SamIam        | Exact     | P        | net, hugin, dsl, xdsl, dsc, dne, dnet, erg | C/D
UnBBayes      | Exact     | S        | XML, net, ubf                              | Discrete
ProBT         | E/A       | P/S      | xml                                        | C/D
Analytica     | Exact     | -        | ana                                        | C/D
AgenaRisk     | E/A       | P        | cmp                                        | C/D
BNet Builder  | Exact     | -        | xbn, dne, net                              | C/D
Bayes builder | E/A       | -        | bbnet, net, dnet, dsc, BIF                 | Discrete
XBAIES        | Exact     | P/S      | -                                          | C/D
OpenBugs      | E/A       | P        | bugs                                       | C/D
BKD/BD        | E/A       | P/S      | dbn, dsc                                   | C/D
VIBES         | E/A       | P        | xml                                        | C/D

P/S: parameter/structure; C/D: continuous and discrete; E/A: exact/approximate
6 Conclusion

Graphical models are a way to represent conditional independence assumptions by using graphs, and Bayesian networks are the most popular of them. From this study, we conclude that the BNT library is the most useful tool for Bayesian networks. We also note that several of the software packages are commercial products offering only a limited evaluation version, whereas most of the libraries are free.
References

1. Naïm, P., Wuillemin, P.-H., Leray, P., Pourret, O., Becker, A.: Réseaux Bayésiens, 3rd edn. Eyrolles
2. Murphy, K.: Software for Graphical Models: A Review. ISBA Bulletin (December 2007)
3. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2000)