A Novel Bayesian Belief Network Structure Learning Algorithm Based on Bio-Inspired Monkey Search Meta-Heuristic

Sangeeta Mittal1, Krishna Gopal2
1 Deptt. of CS&IT, 2 Dean, A&R
Jaypee Institute of Information Technology, Noida, UP, India
1 [email protected], 2 krishna.gopal@jiit.ac.in

S.L. Maskara
G-2W, Soura Niloy Housing Complex, 1 - Kailash Ghosh Road, Kolkata – 700008, India
[email protected]

Abstract—Bayesian Belief Networks (BBN) combine available statistics and expert knowledge to provide a succinct representation of domain knowledge under uncertainty. Learning BBN structure from data is an NP-hard problem due to the enormity of the search space. In the recent past, heuristic-based methods have simplified the search space to find an optimal BBN structure (based on certain scores) in reasonable time. However, slow convergence and suboptimal solutions are common problems with these methods. In this paper, a novel searching algorithm based on the bio-inspired monkey search meta-heuristic is proposed. The climb, watch-jump and somersault sub-processes are designed to give a globally optimal solution with fast convergence. The proposed method, Monkey Search Structure Learner (MS2L), is evaluated against five popular BBN structure learning approaches on model construction time and classification accuracy. The results obtained prove the superiority of our proposed algorithm on all metrics.

Keywords—Bayesian Belief Networks, Structure Learning, Monkey Search, Probabilistic Classifiers

I. INTRODUCTION

Bayesian Belief Networks (BBNs) are stochastic models that describe and quantify probabilistically the relationship between one or more feature variables and a class variable [1]. Numerous applications of Bayesian network classifiers exist in data mining, outlier detection, early warning systems, and autonomous monitoring and tracking systems [2]. In these applications, Bayesian Belief Networks can represent the whole archival knowledge of a domain in a compact and visual form. A BBN is a directed acyclic graph (DAG) together with a table of conditional probabilities, chosen so that it best represents the relationships exhibited by past and future data of the variables of the underlying domain [3]. The structure and the conditional probabilities can be determined either by domain experts or can be learnt algorithmically in a supervised manner from existing data. The first option is often infeasible because of problems such as unavailability of experts and uncertainty due to differences of opinion. For ease of explanation, the complete BBN learning task is seen as two subtasks: learning the graphical structure of dependencies among variables, and learning the corresponding conditional probabilities. Given the network topology and example datasets, it is trivial to find the numerical probability parameters using methods such as maximum likelihood estimation. A sizeable amount of BBN research therefore focuses on approaches for BBN structure learning. Collectively, the proposed BBN structure generating methods fall into two categories of approaches: constraint-based, and score-and-search [4]. The first set of methods learns conditional independence relations from the dataset and obtains a structure conforming to a large number of these relations. The other set uses a structure scoring metric and searches the structure space for the score-maximizing DAG. While there have been a few studies on defining an appropriate score, most of the methods concentrate on ways of exploring the huge DAG space so that the algorithm converges quickly to maximally optimal results. In this paper, we present a novel search paradigm for locating the optimal DAG using the recently designed bio-inspired Monkey Search (MS) meta-heuristic [5]. Relevant algorithms have been designed to adapt this heuristic to the DAG-finding problem at hand. Our MS-based method has been compared with five representative score-and-search approaches on diverse medical datasets. Comparable results were obtained using standard DAG evaluation metrics. Given the simplicity of our method compared to other heuristic approaches, it can be used as a search method. The paper is organized as follows. In the next section, the problem of BBN structure learning and related existing methods are discussed. In the third section our proposed MS heuristic algorithm is detailed. The results of the performance evaluation, using scores and classification accuracy, are presented and discussed in the fourth section. The paper is concluded in the last section.

II. PROBLEM OF BAYESIAN BELIEF NETWORK STRUCTURE LEARNING AND ITS RELATED METHODS

Bayesian Belief Networks are graphical models that represent quantitatively the dependence relationships between the variables that describe a problem domain. Let U = {x1, ..., xk}, k > 1, be the set consisting of the feature variables and one class variable that describe a problem domain. A Bayesian Belief Network B over this set of variables U is a network structure BBN_S, which is a Directed Acyclic Graph (DAG) over U, together with a set of probability tables BBN_P, where

BBN_P = { p(x_i | pa(x_i)) } for all i = 1 to k    (1)

and pa(x_i) is the set of parents of x_i in BBN_S [4]. A Bayesian Belief Network also represents the joint probability distribution over the whole set of variables U as

P(U) = ∏_{i=1}^{k} p(x_i | pa(x_i))    (2)

For a given problem, BBNs can be obtained from experts of that domain. Alternatively, in the absence of an expert, algorithms making use of statistics, graph algorithms and information theory are used to autonomously find the most probable BBN given an experiential dataset D of that domain. Many algorithms combining principles from all three fields have been proposed in the literature for learning Bayesian Belief Networks automatically from data. These can be subdivided into at least two categories: methods based on conditional independence tests, and methods based on a scoring function and a search procedure [4]. An exhaustive search procedure for finding the best BBN DAG is to evaluate all possible graphs and keep the maximum-scoring one. This is a gigantic task, as the number of possible structures for a BBN of n variables, No_of_DAGs(n), is given by the recursive equation (3) [6]:

No_of_DAGs(n) = Σ_{i=1}^{n} (−1)^{i+1} C(n, i) 2^{i(n−i)} No_of_DAGs(n − i),  with No_of_DAGs(0) = 1    (3)
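Equation (3) is easy to check numerically. The following minimal Python sketch (ours, not part of the original paper) evaluates the recurrence directly:

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def no_of_dags(n):
    """Number of DAGs over n labelled variables, via the recurrence in Eq. (3)."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * no_of_dags(n - i)
               for i in range(1, n + 1))

print(no_of_dags(10))   # 4175098976430598143, i.e. about 4.2 x 10^18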

From (3), if n = 10, No_of_DAGs(10) ≈ 4.2 x 10^18. Because of this exponentially growing DAG space, exhaustive search is infeasible, and algorithms in the first category face the problem of 'memory crash' with graphs of more than about ten nodes, owing to the large number of independence-test calculations among variables. Specifically, the popular PC algorithm [7] and the Bayesian Network Power Constructor (BNPC) [8] were evaluated by us on varying dataset sizes. The algorithms mentioned below are chosen as representative samples from the search-and-score category, each based on its own heuristic.

Naïve Bayesian Structure (A1) [9]: A naïve Bayesian structure is a static structure representing the problem domain. It is assumed that the class variable is the root and that all other variables depend only on it and are independent of each other.

Maximum Weighted Spanning Tree Structure (A2) [10]: This method associates a weight with each edge in an initial DAG. Mutual information between variable pairs, and sometimes Bayesian information scores, are used as weights. The classical MWST algorithms of Kruskal or Prim can be used to find trees exhibiting maximum scores.

K2 Algorithm: Another metric for reducing the search space was devised by Cooper et al. in [11]. The method requires an initial node order to be provided. Three types of orders have been tried here: the MWST DAG order (A3), the reversed MWST order (A4) and a randomly ordered DAG (A5). With respect to each of these orders, the search space is explored to maximize the probability of the structure given the data, that is, to maximize the 'K2 score' of DAGs [11]:

max_{BBN_S} P(BBN_S, D) = ∏_{i=1}^{n} [ P(π_i → x_i) ∏_{j=1}^{q_i} ( (r_i − 1)! / (N_ij + r_i − 1)! ) ∏_{k=1}^{r_i} N_ijk! ]    (4)

This algorithm requires the feature variables to be discrete-valued and complete. The meanings of the variables used in equation (4) are the same as defined in the next section. Network structures were determined using the algorithms above, and the conditional probability tables of each of the variables were then estimated using maximum likelihood estimation (using frequency counts) in all cases.
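As a concrete illustration of the maximum likelihood step just mentioned, the CPTs reduce to normalized frequency counts. A minimal sketch, assuming the dataset is held as a list of dictionaries of discrete values (all identifiers here are illustrative, not the authors' code):

from collections import Counter

def estimate_cpt(data, child, parents):
    """Maximum likelihood CPT p(child | parents) from frequency counts.

    data    : list of dicts mapping variable name -> discrete value
    child   : name of the child variable
    parents : list of parent variable names (may be empty)
    """
    joint = Counter()      # counts of (parent configuration, child value)
    marginal = Counter()   # counts of parent configuration alone
    for row in data:
        pa_cfg = tuple(row[p] for p in parents)
        joint[(pa_cfg, row[child])] += 1
        marginal[pa_cfg] += 1
    return {key: cnt / marginal[key[0]] for key, cnt in joint.items()}

# toy, made-up records: estimate p(fever | infection)
records = [{"infection": 1, "fever": 1}, {"infection": 1, "fever": 1},
           {"infection": 1, "fever": 0}, {"infection": 0, "fever": 0}]
print(estimate_cpt(records, "fever", ["infection"]))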

III. PROPOSED ALGORITHM

The monkey search meta-heuristic is a recent addition to the growing list of bio-inspired algorithms. It is inspired by the tree-climbing behaviour of monkeys in a forest while searching for good-quality food [5]. There are mainly three sub-processes, namely climbing, watch-jump and somersault. The algorithm iterates over these three steps until a maximum number of iterations or a k-persistent global maximum is achieved. For our BBN structure learning problem the sub-processes of the basic algorithm have been redesigned, and the resulting method is named Monkey Search Structure Learner (MS2L).

Representation: For n variables in the underlying application, a BBN DAG is represented by a connectivity matrix C of size n x n. A tree is a weighted graph T defined by the ordered set (V, E, w): V = {v1, v2, ..., vt}, vi ∈ C, is the set of vertices of the tree; E, the set of edges, is empty initially and is populated as the algorithm proceeds; w is the set of weights assigned to each edge according to the gradient of the objective function (defined below).

Perturbation: Single arc insertion and arc deletion are taken as the two possible perturbations, applied with equal probability. The operation is referred to as perturb(Node, dist), where Node ∈ V and dist is the number of single perturbations to be applied. Edges of the tree are created as a result of perturb(): starting from an initial tree with one seed DAG, an edge is added if perturb() chooses arc insertion and deleted otherwise.

Objective Function: The Bayesian Information Criterion (BIC) is used as the objective function, which is to be maximized for optimal results. The BIC score is a local and decomposable score formulated as

Max BIC(B) = Σ_{i=1}^{n} Σ_{j=1}^{q_i} Σ_{k=1}^{r_i} N_ijk log (N_ijk / N_ij) − log N · |B|    (5)

where B is the chosen BBN DAG whose score is computed on the underlying dataset D, N_ijk is the number of instances in D in which the variable X_i takes its k-th value x_ik and the variables in Parents(X_i) take their j-th configuration w_ij, N_ij = Σ_k N_ijk, q_i is the number of possible configurations of the parents of X_i, and r_i is the number of possible values of the variable X_i [3]. N is the total number of training instances. The network complexity is |B| = Σ_{i=1}^{n} q_i (r_i − 1). The objective is to maximize the BIC.
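A decomposable scorer of this kind can be sketched in a few lines of Python. The sketch below assumes the dataset is a list of dictionaries of discrete values; it illustrates equation (5) only and is not the authors' implementation:

import math
from collections import Counter

def bic_score(data, dag, states):
    """Decomposable BIC score of a candidate DAG in the sense of Eq. (5); larger is better.

    data   : list of dicts, variable name -> discrete value
    dag    : dict, variable name -> list of parent names (every variable listed)
    states : dict, variable name -> list of possible values
    """
    N, score = len(data), 0.0
    for xi, parents in dag.items():
        q_i = math.prod(len(states[p]) for p in parents)   # parent configurations
        r_i = len(states[xi])                               # values of X_i
        n_ij = Counter(tuple(row[p] for p in parents) for row in data)
        n_ijk = Counter((tuple(row[p] for p in parents), row[xi]) for row in data)
        score += sum(c * math.log(c / n_ij[cfg]) for (cfg, _), c in n_ijk.items())
        score -= math.log(N) * q_i * (r_i - 1)              # complexity penalty, log N * |B_i|
    return score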

The algorithm works in the following manner.

Initialization: The population size of monkeys is taken to be one initially, to work out the algorithm. In the actual implementation, if the population size is chosen as s, the same algorithm is initialized s times. A DAG returned by MWST with a random root is chosen for a monkey, with its connectivity matrix C flattened to a vector {C11, C12, ..., C1n, ..., Cn1, Cn2, ..., Cnn}.

A. Climb Process

The climb process of one monkey is given as Algorithm 1. The monkey starts creating a tree by building two branches in each step. Branches are the result of applying perturb() to the root node. The monkey chooses to climb up the branch leading to the better solution in terms of the objective function. The best solution obtained so far is maintained in the monkey's memory. The monkey continues to climb up until it reaches a maximum allowed height or the tree top. After that the monkey starts climbing down and explores random branches a fixed maxexplore number of times. The local maximum of that tree is then considered to be achieved. Each climb is decided by taking the following two steps:

a. Generate C_new = perturb(C, dist), where C is the connectivity matrix of the node to be perturbed and dist is the desired distance between C_new and C; repeat until no_perturbations < p. In the perturbation function a single arc is either deleted from or added to the current DAG, each with equal probability. A difference matrix Δ(C) = {ΔC11, ΔC12, ..., ΔC1n, ..., ΔCn1, ΔCn2, ..., ΔCnn} is generated, with entries

ΔCjk = +1 with probability 0.5, −1 with probability 0.5    (6)

b. Calculate BIC(C') = BIC(C + Δ(C)) and BIC(C'') = BIC(C − Δ(C)).

The following rules are then applied to choose the next DAG from which each monkey continues its exploration:
If BIC(C') > BIC(C'') and BIC(C') > BIC(C) then C = C';
If BIC(C'') > BIC(C') and BIC(C'') > BIC(C) then C = C''.
Updating the chosen connectivity matrix in this manner, the monkey gradually moves towards the best-scoring DAG of the current path. Once the monkey reaches the top, it climbs down to search for alternative paths and find the optimal solution on each path.

Algorithm 1: Climb_MS(maxexplore, maxheight)
# maxexplore = n, maxheight = n
# Repeat for each monkey
Initialize V0 = {C}; curr = C; currbest = curr;
curr_lvl = 0; curr_xplor = 0; finished = false;
while (finished == false) do                 # tree creation and exploration
    curr_xplor = curr_xplor + 1;
    while (curr_lvl < maxheight)
        curr' = curr + Δ(curr); curr'' = curr − Δ(curr);
        if (BIC(curr') > BIC(currbest)) currbest = curr';
        else if (BIC(curr'') > BIC(currbest)) currbest = curr'';
        V = V ∪ {curr', curr''};
        E = E ∪ {(curr, curr'), (curr, curr'')};
        W_curr-curr' = BIC(curr) − BIC(curr');
        W_curr-curr'' = BIC(curr) − BIC(curr'');
        y = choose_arc(max(W_curr-curr', W_curr-curr''));
        curr_lvl = curr_lvl + 1;
    end while
    if (curr_xplor < maxexplore)             # climb down the tree
        Wbest = W_curr_lvl-(curr_lvl−1);
        while (curr_lvl >= 0)
            curr_lvl = curr_lvl − 1;
            Wnow = W_curr_lvl-(curr_lvl−1);
            if (Wnow > Wbest) Wbest = Wnow;
            else W_curr_lvl-(curr_lvl−1) = Wbest;
        end while
    else
        finished = true
    end if
end while
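For illustration, one climb step built from equation (6) and the selection rules above could look roughly as follows in Python. Keeping the connectivity matrix binary by clipping and rejecting cyclic candidates are our assumptions; bic stands for any scorer such as the one sketched in the previous section:

import numpy as np

def is_acyclic(C):
    """Kahn-style check that the digraph encoded by the 0/1 matrix C has no cycle."""
    indeg = C.sum(axis=0).astype(int)
    queue = [i for i in range(len(C)) if indeg[i] == 0]
    seen = 0
    while queue:
        i = queue.pop()
        seen += 1
        for j in np.nonzero(C[i])[0]:
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == len(C)

def climb_step(C, bic):
    """One climb step in the spirit of Eq. (6): perturb C in both directions
    and keep whichever candidate improves the BIC. C is an n x n 0/1 matrix."""
    delta = np.random.choice([1, -1], size=C.shape)   # Delta C_jk = +1 or -1, p = 0.5 each
    c_plus = np.clip(C + delta, 0, 1)                 # plays the role of C'
    c_minus = np.clip(C - delta, 0, 1)                # plays the role of C''
    candidates = [c for c in (c_plus, c_minus) if is_acyclic(c)]
    return max(candidates + [C], key=bic)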

Fig. 1. Two positions of Monkey during Climb Process

The climb process is executed as Algorithm 1. An example of climbing up and climbing down is shown in Figure 1. After successful execution of Algorithm 1, currbest is returned as the DAG with the locally most optimal BIC score among those explored. The algorithm is run in the same way for each monkey i in the population.

B. Watch-Jump Process

After completion of the climb process, each monkey has obtained a sub-optimal solution within a tree. As the next step, an analogy is again taken from monkey behaviour: monkeys assume that good food, i.e. better solutions, may lie in the vicinity of each other. A monkey therefore explores nearby trees; if a tree with a better solution is visible, it jumps there and starts exploring its branches. Monkeys are assumed to be able to see only up to an eyesight defined as a positive integer b. Eyesight is chosen here as one fifth of the total number of variables.

Algorithm 2: Watch_Jump(maxtry)
# maxtry = n
Initialize b = 0.20 * n; tried = 0;
for each monkey i
    while (tried < maxtry)
        TRY_AGAIN: generate a candidate DAG B' within eyesight b of the current solution Bi;
        if (BIC(B') > BIC(Bi) and Acyclic(B'))
            Ci = B';
        else
            goto TRY_AGAIN;
        Climb_MS(B');
        tried = tried + 1;
    end while
    return B;
end for

The watch-jump process further improves the optimal solution obtained in the initial climb, but still only a subspace of all possible solutions has been explored. To truly come out of the local maxima and try an entirely new range of DAGs, the somersault process is undertaken.

C. Somersault Process

After the climb and watch-jump processes, each monkey has reached its maximally optimal point among nearby options. To achieve more optimized objective function values, the monkey somersaults to a previously unexplored domain. The somersault interval [c; d] is decided according to the search space size and governs the maximum distance a monkey can somersault. This is the strongest point of the monkey search heuristic and what makes it faster at finding a globally optimal solution than other heuristic algorithms. Specifically, each monkey i somersaults to the next point from its current best connectivity matrix Ccurr = CWJ-best, where CWJ-best is the best-scoring connectivity matrix of the monkey obtained from the watch-jump process.

Assuming a population of M = 3 monkeys and choosing the random number R = 3, the pivot is calculated as per the algorithm; an example is shown in Figure 2.

Algorithm 3: Somersault(maxtry)
# maxtry = n, no_of_monkeys = M
Initialize [c; d] = [−n/2, n/2]; tried = 0;
while (tried < maxtry) do
    for each monkey i
        Randomly generate a number R ∈ [c; d];
        new_ij = (x_ij + R * (pivot_j − x_ij)) mod 2, j = 1, 2, ..., n,
            where pivot_j = (1/M) Σ_{i=1}^{M} x_ij;
        if Acyclic(new_i)
            Ccurr_i = new_i;
        else
            goto step 2;
        Climb_MS(new_i);
    end for
    tried++;
end while

It can be seen that the somersault process takes the monkey into a newer domain than the one it occupied earlier. The objective function is then calculated over the newly discovered DAG and, if it is higher than the monkey's current best, that DAG is used as the initialization DAG to restart the climb process.

Fig. 2. Somersault of a Monkey guided by pivot (Assume R=3)

It can be seen that somersault process takes monkey in a newer domain as compared to where it earlier was. The objective function is then calculated over newly discovered DAG and if it is higher than current best of monkey, then it is used as initialization DAG to restart the Climb process. D. Termination Algorithm is terminated if at least one of the following conditions is achieved: (a) Solutions found by all monkeys have not improved since last k iterations (b) The algorithm has run k, the maximum permissible times. K and max iterations have been chosen to be ‘n’ by us. IV.

EVALUATION

The proposed and existing structure learning algorithms have been evaluated on two parameters. The first is the time taken to find the optimal structure; the second is the performance of the structure when it is used for classification of domain data. The classification task consists of classifying a variable C, called the class variable, given a set of variables A = a1, ..., an, called feature variables. The classifier Z: A → C is a function that maps an instance of A to a value of C. Given the dataset D, BBNs have been constructed as in the previous section. To use a BBN as a classifier, argmax_y P(C = y | A) is computed using the distribution P(U) represented by the BBN, i.e.

P(C | A) = P(U) / P(A)    (7)

The left-hand side is maximized by maximizing

P(U) = ∏_{i=1}^{n} p(a_i | pa(a_i))    (8)
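A BBN classifier built on equations (7)-(8) simply scores each candidate class value by the product of the relevant CPT entries and returns the argmax. A minimal illustrative sketch (the variable names are ours), assuming all feature values are observed and all referenced CPT entries are non-zero:

import math

def classify(instance, cpts, dag, class_var, class_values):
    """Pick argmax_c P(C = c | A) as in Eqs. (7)-(8): score each candidate class
    value by the product of CPT entries p(x_i | pa(x_i)) with C set to that value.

    instance     : dict of observed feature values (all features known)
    cpts         : dict var -> {(parent config tuple, value): probability}
    dag          : dict var -> list of parent names (class_var included as a node)
    class_var    : name of the class variable
    class_values : iterable of candidate class labels
    """
    def log_score(c):
        assignment = dict(instance, **{class_var: c})
        total = 0.0
        for var, parents in dag.items():
            pa_cfg = tuple(assignment[p] for p in parents)
            total += math.log(cpts[var][(pa_cfg, assignment[var])])  # assumes non-zero entries
        return total
    return max(class_values, key=log_score)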

If all variables in A are known, (8) can be used to find the posterior probability distribution over the class values in C. This is known as probabilistic inference and is used to estimate the probability of a set of query nodes given values for some evidence nodes; it is also called belief updating in BBNs. There are exact inference algorithms that perform inference by enumeration, but these work only for BBNs with a polytree structure. Up to an order of 35 nodes exact inference works well, but it becomes computationally intractable for larger networks, so approximate algorithms making use of randomized sampling, also called Monte Carlo algorithms, have to be used. Refer to [12] for details of the various inference algorithms. In disease-related BBNs, attributes are generally multiply connected. Hence a well-known algorithm for such networks, the Junction Tree (JTree) algorithm (clustering plus variable elimination), is used here [12]. JTree is an exact clustering algorithm that performs inference in two stages: 1. Transform the multiply connected network into a polytree. 2. Perform belief updating on the obtained polytree using a message-passing mechanism.

A. Datasets Used for Evaluation

The networks here are learnt for various medical datasets taken from the UCI machine learning repository [13]. These types of datasets were taken because of our larger context of sensor-based studies in the healthcare domain; using medical datasets is thus part of a concept-proving experiment in a related field. The characteristics of the variables in all datasets are described in Table I. The datasets differ in the number of labelled instances available for learning and testing the BBN-based classifiers, and in the number of feature variables of every disease. The original numbers of continuous-valued (C) and discrete-valued (D) variables are also shown. The extent of missing values varies from none to many across the datasets. Using these heterogeneous datasets enables us to capture their effects on classifier performance. The selected algorithms work on different principles and metrics, hence the resulting models may also differ. Out of the thirty DAGs generated in all, the DAG of each disease except D4 (not shown due to shortage of space) obtained by MS2L is represented in the various parts of Figure 3. Before presenting the datasets to the learning algorithms, they were preprocessed to impute missing values by the mode of the variable within the class of the current tuple and to discretize all the continuous features.

TABLE I. DATASETS OF DIFFERENT DISEASES AND THEIR DESCRIPTIONS

Disease No. | Disease Name             | No. of Features (D / C) | Cardinality of Class Variable | No. of Instances | Missing Values
D1          | Urinary System Diseases  | 6 / 1                   | 2                             | 120              | None
D2          | Pima-Indian Diabetes     | 0 / 8                   | 2                             | 768              | Few
D3          | Hepatitis                | 13 / 6                  | 2                             | 155              | Many
D4          | Dermatology              | 33 / 1                  | 6                             | 366              | Few
D5          | Thyroid                  | 19 / 7                  | 20                            | 9712             | Many

In Table II, the BIC scores of the DAGs obtained by the various algorithms are shown. MS2L performs better than the other algorithms on almost all datasets.

TABLE II. SCORES OF DIFFERENT BAYESIAN BELIEF NETWORK STRUCTURES USING BAYESIAN INFORMATION CRITERIA

Dataset | A1      | A2      | A3      | A4      | A5      | MS2L
D1      | -611    | -510    | -472    | -514    | -627    | -468
D2      | -2334   | -2346   | -2174   | -2191   | -2183   | -2143
D3      | -9504   | -9251   | -9272   | -24546  | -24485  | -9256
D4      | -13637  | -19225  | -13478  | -13435  | -13442  | -12267
D5      | -1.6964 | -1.5267 | -2.1572 | -1.8108 | -2.4355 | -1.4732

(a) Bayesian Belief Network for D1   (b) Bayesian Belief Network for D2   (c) Bayesian Belief Network for D3   (d) Bayesian Belief Network for D5
Fig. 3. Bayesian Belief Network structures of various diseases learnt with MS2L

Computational time (CT) taken for classification is important for any real-time classification. It is observed that variations in it are due to the increased number of features. CT is considerably higher for D5 due to the larger number of continuous attributes and instances. MS2L takes the least time compared with all the other tested algorithms, owing to the guided somersault process.

Fig. 4. Computational time taken by the different algorithms (Naïve, MWST, K2+MWST, K2-MWST, K2+Random and MS2L) on datasets D1-D5, on a machine with 2.4 GHz clock speed

Next, the evaluation was done on the discriminative power of the learnt structures. The datasets (D1-D5) underwent 5-fold, 10-times cross-validation for classification evaluation. The classifiers obtained are used for labeling test instances with class values, and the values assigned by the classifiers are compared against the actual values. Accuracy is defined as the percentage of instances that were labeled with the correct class value; the percentage of misclassified instances is termed the error rate. Given the evidence attributes, some instances were not classified by the classifier and are counted in the percentage of 'rejection': a test input is rejected if the underlying BBN yields the maximum a posteriori probability for more than one class. Any test instance contributes to exactly one of these percentages, and any classifier with higher accuracy is better. Table III summarizes the accuracy obtained when classifying the different diseases, which have multi-valued diagnoses. The algorithms are also to be compared on error rate versus rejection rate; an algorithm that rejects instances instead of misclassifying them is preferable, particularly in medical classification. The error and rejection rates are compared in Figure 5.

TABLE III. ACCURACY (IN %) OF PREDICTION OF MULTIVALUED DISEASES WITH DIFFERENT BBN CLASSIFIERS (95% CONFIDENCE INTERVALS)

Algorithm | D1          | D2          | D3          | D4           | D5
A1        | 97.6 ± 0.10 | 79.8 ± 0.25 | 83.6 ± 0.19 | 73.5 ± 0.09  | 86.6 ± 0.03
A2        | 98.6 ± 0.11 | 74 ± 0.26   | 82.9 ± 0.20 | 57.6 ± 0.10  | 85.7 ± 0.03
A3        | 98.6 ± 0.11 | 79.5 ± 0.23 | 88.1 ± 0.20 | 74.58 ± 0.09 | 84.6 ± 0.03
A4        | 98.5 ± 0.10 | 85.5 ± 0.24 | 88.4 ± 0.20 | 74.4 ± 0.09  | 83.1 ± 0.04
A5        | 100 ± 0.12  | 84.8 ± 0.25 | 87.4 ± 0.20 | 69.5 ± 0.09  | 81.6 ± 0.03
MS2L      | 100 ± 0.10  | 86.7 ± 0.22 | 90.4 ± 0.20 | 76.6 ± 0.04  | 88.1 ± 0.03

Fig. 5. Error and rejection rates of the various learning algorithms (A1-A5 and MS2L) on datasets D1-D5

The Naïve Bayes classifier gives low error rates but high rejection rates; MS2L gives the lowest error as well as the lowest rejection rates. In [14], a TAN-EM classifier was used to classify the hepatitis disease, with a maximum accuracy of 75.4%, and around 80% was reported in [15], whereas 86.7% accuracy is achieved here by the proposed algorithm. The diabetes dataset was contributed to UCI by the authors of [16]; they used the ADAP algorithm to implement a classifier and the maximum accuracy obtained was 76%, against our maximum of 78%. Moreover, given the computational complexity of the neural-network-based ADAP, the approach used here is acceptable. In [16], the same dataset is tested again with a hybrid Bayesian approach and the accuracy reported is 64-73%.

Effect of the number and type of attributes used: Interestingly, the performance of the classifiers varies not so much with the number of attributes as with the range of possible values of each feature, since the equal-frequency technique used for quantization creates a large range of values for continuous attributes. For example, for both the thyroid and diabetes diseases, which have the maximum number of continuous attributes, the computation times are very high compared with the others. For the diabetes disease, where 100% of the attributes had to be quantized, accuracy is also affected. Still, the accuracy obtained is a maximum of 75%, compared with the 73% obtained with the hybrid Bayes classifier proposed in [16].
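The equal-frequency quantization mentioned above can be sketched as follows; this is an illustrative helper, not the exact preprocessing code used in the experiments:

import numpy as np

def equal_frequency_bins(values, n_bins):
    """Equal-frequency discretization: each bin receives roughly the same number
    of training values. Returns the bin index (0 .. n_bins-1) of every value."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior cut points
    return np.searchsorted(edges, values, side="right")

# e.g. discretize hypothetical plasma-glucose readings into 4 bins
print(equal_frequency_bins([85, 98, 110, 122, 140, 155, 171, 190], 4))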

V. CONCLUSION

Bayesian Belief Networks are useful for compactly representing knowledge about an inherently uncertain domain. Structure learning of a BBN is an NP-hard problem. In this paper a novel meta-heuristic-based Monkey Search Structure Learner (MS2L) has been proposed. The algorithm takes less time to compute an optimal DAG than other popular existing algorithms, and the classification accuracy with the MS2L-based structure is better than with the others. The evaluation also shows that the range of values of the feature variables affects classification. MS2L can be experimented with further under different combinations of parameters such as the population size of monkeys, the eyesight and the somersault interval.

REFERENCES

[1] J. Pearl, "Bayesian networks: a model of self-activated memory for evidential reasoning," in Proc. 7th Conference of the Cognitive Science Society, 1985, pp. 329-334.
[2] D. Heckerman, A. Mamdani, and M. P. Wellman, "Real-world applications of Bayesian networks," Commun. ACM, vol. 38, no. 3, Mar. 1995.
[3] K. Korb, Bayesian Artificial Intelligence, CRC Press, 2004.
[4] S. Mittal and S. L. Maskara, "A review of some Bayesian Belief Network structure learning algorithms," in Proc. 8th International Conference on Information, Communications and Signal Processing (ICICS), Dec. 2011, pp. 1-5.
[5] R. Zhao and W. Tang, "Monkey algorithm for global numerical optimization," Journal of Uncertain Systems, vol. 2, no. 3, pp. 165-176, 2008.
[6] R. W. Robinson, "Counting unlabeled acyclic digraphs," in Combinatorial Mathematics V, Springer Berlin Heidelberg, 1977, pp. 28-43.
[7] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed., MIT Press, Cambridge, 2000.
[8] J. Cheng et al., "Learning Bayesian networks from data: an information theory based approach," Artificial Intelligence, vol. 137, pp. 43-90, 2002.
[9] J. Cheng and R. Greiner, "Learning Bayesian Belief Network classifiers: algorithms and system," Lecture Notes in Computer Science, vol. 2056, pp. 141-151, Springer Verlag, 2001.
[10] W. Lam and F. Bacchus, "Learning Bayesian belief networks: an approach based on the MDL principle," Computational Intelligence, vol. 10, no. 3, pp. 269-293, 1994.
[11] G. F. Cooper and E. Herskovits, "A Bayesian method for the induction of probabilistic networks from data," Machine Learning, vol. 9, no. 4, pp. 309-347, 1992.
[12] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, pp. 131-163, 1997.
[13] UCI repository of machine learning databases. Available at: http://archive.ics.uci.edu/ml/
[14] O. C. H. François and P. Leray, "Learning the tree augmented naive Bayes classifier from incomplete datasets," in Proc. Third European Workshop on Probabilistic Graphical Models (PGM'06), Prague, Czech Republic, Sep. 2006, pp. 91-98.
[15] J. Smith, J. Everhart, W. Dickson, W. Knowler, and R. Johannes, "Using the ADAP learning algorithm to forecast the onset of diabetes mellitus," in Proc. Symposium on Computer Applications and Medical Care, 1988, pp. 261-265.
[16] M. Raymer, T. Doom, L. Kuhn, and W. Punch, "Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 33, no. 5, pp. 802-813, 2003.
