
Inferring a Bayesian network for Content-based Image Classification

Shahriar Shariat 1, Hamid R. Rabiee 2, Mohammad Khansari 3

1, 2, 3 Sharif University of Tech., Computer Engineering Department, Digital Media Research Lab
[email protected]

Abstract

Bayesian networks are popular in the classification literature. The simplest kind of Bayesian network, the naïve Bayesian network, has attracted many researchers because of its fast learning and inference. However, when many classes must be inferred from a similar set of evidence, one may prefer a single, unified network. In this paper we present a new method for merging naïve networks in order to achieve a complete network, and we study the effect of this merging. The proposed method reduces the burden of learning a complete network. A simple measure is also introduced to assess the stability of the results after the classifiers are combined. The merging method is applied to the image classification problem. The results indicate that, in addition to the reduced computational burden of learning a complete network, the total precision increases, and the precision change for each individual class can be estimated using the measure.

Keywords: Bayesian network, naïve networks, learning, inference, image classification

1. Introduction

Classification has been a traditional problem in pattern recognition for over fifty years, and various classification algorithms have been reported in the literature, both for specific applications and for the general case [1, 2]. At the same time, the emergence of new technologies has produced a large amount of multimedia data. Among the multimedia formats, the image is the simplest yet a very important one. Image classification has attracted much research during the last two decades, since classification and clustering alleviate both the browsing and the retrieval burdens of image datasets.

One of the most popular classifiers is the Bayesian network. A Bayesian network is a graphical model for probabilistic relationships among a set of variables. Bayesian networks have several features that make them suitable for classification [3]. First, they can handle incomplete data. Second, they can combine domain knowledge and data in conjunction with Bayesian statistics. Third, they do not overfit the training data. These properties have encouraged many researchers in diverse areas to employ Bayesian networks for classification.

Bayesian approaches to image classification have been reported in numerous papers. In [4], the authors employ a Bayesian approach to categorize images using prior knowledge obtained from previous categories and estimate a PDF (Probability Density Function) for each parameter. Object categories are represented by probabilistic models, and prior knowledge is represented as a PDF on the parameters of these models. Vailaya et al. construct a Bayesian framework to semantically classify outdoor images [5]. They estimate the class-conditional density using a codebook extracted by a vector quantizer, and they use the MDL principle to find the optimal codebook size. Naphade and Huang [6] combine HMMs, EM, and Bayesian networks to first estimate the probability density for each category (multiject) and then connect the categories, both to enhance classification and to incorporate the inter-conceptual relations between the classes. Although they designed their framework for video, the idea can be employed for images as well.

In [7], Luo et al. utilize a Bayesian network to understand the image subject in addition to classifying it into the proper class. Most of that work depends on domain expert knowledge: even the conditional dependencies of the variables are extracted from semantic relations by the expert, and if a relation is intricate, Pearson's correlation coefficient is calculated for it.

In many real-world cases, however, experts prefer to use naïve Bayesian networks (NBN), and especially Gaussian naïve Bayesian networks (GNBN).

These networks, shown in fig. 1, consist of a parent node representing the classes (or clusters) and child nodes that stand for the evidence. Learning and inference in this simple type of network are easy, straightforward, and fast. In [8], naïve Bayesian networks are the basis of structural learning of a Bayesian network for classifying genetic abnormalities. The authors compare K2 with expert knowledge for learning the structure of each naïve classifier and show experimentally that the K2 algorithm is faster but delivers a less accurate network. A variation of the naïve Bayesian network, the semi-naïve Bayesian network, is studied in [9], where an approach based on finite mixture models is introduced to combine the attributes; the authors show that this type of combination and bounding model improves classification accuracy. In [10], a maximal covariance criterion is utilized to identify the necessary hidden nodes and insert them between the features of naïve Bayesian networks in order to classify textures of aerial images.

A common situation arises when there are many classes that cannot be combined in a single node while inference is based on a similar set of evidence nodes. In this case it is desirable to have one network containing all the classes: inferring from that network classifies all samples in a single inference, so merging the naïve networks reduces the computational burden of inference. Nevertheless, learning a complete network requires noticeable computation time. For Gaussian Bayesian networks (GBN), with which we deal throughout this paper, the EM algorithm can approximate the best MAP or ML estimate for each node and is widely used to learn the parameters [3]. Re-approximating the parameters of the complete network seems unnecessary, however, since the parameters of the naïve networks have already been learned and another learning process is not desirable. Therefore, a way to obtain accurate parameters without performing an additional learning process would be very beneficial.

In this paper we propose a new framework for combining naïve Bayesian classifiers in order to achieve a complete network whose parameters are approximated directly from the individual classifiers, without any extra learning for the combined network. In other words, our algorithm adjusts the parameters of each network and removes the need for additional parameter learning. We also study the effect of combining naïve networks and prove that the overall precision of the classifier can be improved, especially when a sample belongs to one and only one class. We further introduce a measure to estimate the stability of the classification precision after the merging. The merging algorithm is applied to the image classification problem to test it in practice.

The rest of the paper is organized as follows. Section two presents the network architecture. The third section gives details on learning naïve and general Bayesian networks. In the fourth section the inference of the networks and the effect of the combination are investigated. Sections five and six apply the algorithm to image classification and present the experimental results, respectively. Finally, section seven concludes the paper.

[Fig. 1: A naïve Bayesian network: class node x with child evidence nodes y1, y2, …, yn]


2. Network design

Each node in a Bayesian network can represent either a discrete or a continuous random variable, so a Bayesian network may consist of either type or a mixture of both. There are various methods for learning the parameters of a purely discrete or purely continuous network, but there is no common, proven method for a network of mixed variables. In that case, researchers or designers usually convert one type to the other so that the network becomes homogeneous and the learning process easier. In the image classification problem we have continuous evidence nodes (y1, y2, …, yn in fig. 1) and a multinomial class node (x in fig. 1). Although we could discretize the evidence nodes, we prefer to convert the multinomial node to a continuous one, thereby avoiding quantization noise and the diverse interpretations such a conversion allows. Moreover, converting x to a continuous variable enables the network to give a confidence probability for each class: the probability of each class given a certain sample states how strongly the datum belongs to that class, which recalls the philosophy of fuzzy logic. In effect, this approach gives us a fuzzy network. To simplify learning and inference we take x as binary rather than multinomial, so if x represents m classes, it is broken up into m-1 two-state class nodes, as illustrated by the sketch below. At this stage two strategies can be tried: first, a single network with m-1 class nodes and n evidence nodes connected to the class nodes; second, m-1 separate naïve Bayesian networks. In the rest of the paper we compare these strategies and study the advantages and drawbacks of each approach.
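As an illustration, the following Python sketch (ours, not from the paper) shows one plausible way to split an m-class label vector into m-1 two-state class nodes. The choice of leaving the last class implicit (all nodes zero) is our assumption; the paper does not specify which class is dropped.

```python
import numpy as np

def to_binary_class_nodes(labels, m):
    """Split an m-class label vector into m - 1 two-state class nodes.

    The last class (label m - 1) is left implicit: it corresponds to
    all m - 1 nodes being zero. This choice is an assumption; the
    paper does not say which class is dropped.
    """
    labels = np.asarray(labels)
    nodes = np.zeros((labels.size, m - 1))
    for k in range(m - 1):
        nodes[:, k] = (labels == k).astype(float)  # one column per retained class
    return nodes

# Example: four classes give three binary class nodes.
print(to_binary_class_nodes([0, 2, 3, 1], 4))
```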

3. Learning

The purpose of parameter learning in a continuous Bayesian network is to fit the network with a probability distribution. The most popular choice is the Gaussian mixture, in which each node is modeled by a separate Gaussian PDF; these separate PDFs are connected to each other as dictated by the network structure. The conditional density function of each node in such a Bayesian network is given by

$$\rho(x \mid pa_x) = N\Big(x;\ \sum_{z \in pa_x} b_{xz}\, z,\ \sigma_{w_x}^2\Big) \qquad (1)$$

where $pa_x$ denotes the parents of X, $b_{xz}$ the weight of the link between X and Z, and $\sigma_{w_x}^2$ the variance of X conditioned on its parents.
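As a concrete reading of eq. (1), the following sketch (our own; the function and variable names are hypothetical) evaluates the linear-Gaussian conditional density of a node given its parents' values:

```python
import numpy as np

def conditional_density(x, parent_values, link_weights, cond_var):
    """Evaluate eq. (1): N(x; sum_z b_xz * z, sigma_wx^2).

    parent_values : values of the node's parents pa_x.
    link_weights  : the corresponding link weights b_xz.
    cond_var      : variance of the node conditioned on its parents.
    """
    mean = float(np.dot(link_weights, parent_values))  # linear combination of parents
    return np.exp(-(x - mean) ** 2 / (2.0 * cond_var)) / np.sqrt(2.0 * np.pi * cond_var)

# Example: a node with two parents.
print(conditional_density(1.0, [0.5, 2.0], [0.8, 0.1], 0.25))
```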

Bayesian networks can deal with incomplete data and hidden variables [11], but in this paper, and especially for the learning procedure, we assume that complete data are available. Throughout the learning process the covariance matrix is computed by the following algorithm from [12]. Let

$$t_i = \frac{1}{\sigma_i^2} \qquad (2)$$

and

$$b_i = \begin{pmatrix} b_{i1} \\ \vdots \\ b_{i,i-1} \end{pmatrix} \qquad (3)$$

The algorithm in Table 1 then creates the precision matrix (the inverse of the covariance matrix).

Table 1: Precision matrix creation algorithm

$$T_1 = (t_1)$$
$$T_i = \begin{pmatrix} T_{i-1} + t_i b_i b_i^T & -t_i b_i \\ -t_i b_i^T & t_i \end{pmatrix}, \quad i = 2, \dots, n$$
$$T = T_n$$

For fig. 1, the b matrices, which encode the links between the nodes, are

$$b_{y_1} = [1] \qquad (5)$$

and thus the covariance matrix $\psi = T^{-1}$ is

$$\psi = \begin{pmatrix}
\sigma_x & b_{y_1 x}\sigma_x & b_{y_2 x}\sigma_x & \cdots & b_{y_n x}\sigma_x \\
b_{y_1 x}\sigma_x & \sigma_{y_1} + b_{y_1 x}^2\sigma_x & b_{y_2 x} b_{y_1 x}\sigma_x & \cdots & b_{y_n x} b_{y_1 x}\sigma_x \\
b_{y_2 x}\sigma_x & b_{y_2 x} b_{y_1 x}\sigma_x & \sigma_{y_2} + b_{y_2 x}^2\sigma_x & \cdots & \vdots \\
\vdots & & & \ddots & \\
b_{y_n x}\sigma_x & \cdots & & & \sigma_{y_n} + b_{y_n x}^2\sigma_x
\end{pmatrix}$$

where $b_{y_i x}$ is the weight of the link connecting node X and $y_i$. The mean values are simply the sample averages of each node.

Learning a naïve Bayesian network is not hard: inspection of this covariance matrix yields a very simple way to learn a naïve network. Merging naïve networks, however, produces a much more complex precision matrix that is burdensome to invert and complicated to solve. The following theorem provides a much simpler solution that nevertheless preserves the accuracy of the learned parameters. Since we combine several naïve networks to form a more general Bayesian network, some parameters must be adjusted; the adjustment concerns only the variances, because the size of the network changes.

Theorem 1: Under horizontal conjunction of two or more naïve Bayesian networks, provided that there is no connection between class nodes and no class node is a descendant of any other node, each class node's variance is multiplied by m:

$$\sigma_{x_t} = m\, \sigma_{x_n} \qquad (6)$$
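To make the recursion of Table 1 concrete, here is a small numpy sketch (ours; the paper gives no code). It assumes that each $b_i$ carries a zero entry for every earlier node that is not a parent, and the closing lines merely illustrate the variance scaling of eq. (6); the example weights are made up.

```python
import numpy as np

def precision_matrix(variances, b_vectors):
    """Table 1 recursion: build the precision matrix T.

    variances : conditional variances sigma_1^2, ..., sigma_n^2 (eq. 2).
    b_vectors : b_vectors[i] lists the link weights of node i to all
                earlier nodes (eq. 3); non-parents get weight 0.
    """
    t = 1.0 / np.asarray(variances, dtype=float)      # t_i = 1 / sigma_i^2, eq. (2)
    T = np.array([[t[0]]])                            # T_1 = (t_1)
    for i in range(1, len(t)):
        b = np.asarray(b_vectors[i], dtype=float).reshape(-1, 1)
        top = np.hstack([T + t[i] * (b @ b.T), -t[i] * b])
        bottom = np.hstack([-t[i] * b.T, np.array([[t[i]]])])
        T = np.vstack([top, bottom])                  # T_i from Table 1
    return T

# Fig. 1 naive network with node order (x, y1, y2); weights are illustrative.
T = precision_matrix([1.0, 0.5, 0.5], [[], [0.8], [0.3, 0.0]])
psi = np.linalg.inv(T)                                # covariance matrix, psi = T^{-1}

# Theorem 1, eq. (6): merging m naive networks scales the class-node variance.
m = 3
sigma_x_merged = m * psi[0, 0]
```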
