Cogn Comput (2015) 7:346–358 DOI 10.1007/s12559-014-9295-7

Uncertain Graph Classification Based on Extreme Learning Machine

Donghong Han • Yachao Hu • Shuangshuang Ai • Guoren Wang



Received: 22 August 2013 / Accepted: 7 July 2014 / Published online: 14 August 2014
© Springer Science+Business Media New York 2014

Abstract The problem of graph classification has attracted much attention in recent years. The existing work on graph classification has dealt only with precise and deterministic graph objects. However, the linkages between nodes in many real-world applications are inherently uncertain. In this paper, we focus on the classification of graph objects with uncertainty. The method we propose can be divided into three steps. First, we put forward a framework for classifying uncertain graph objects. Second, we extend the traditional algorithm used for extracting frequent subgraphs so that it can handle uncertain graph data. Third, a classifier is constructed based on the Extreme Learning Machine (ELM), which has a fast learning speed. Extensive experiments on uncertain graph objects show that our method achieves better efficiency and effectiveness than other methods.

Keywords Uncertain graph · Classification · Extreme Learning Machine

D. Han • Y. Hu • S. Ai (&) • G. Wang
Key Laboratory of Medical Image Computing (NEU), MOE, Shenyang, China
e-mail: [email protected]

D. Han
e-mail: [email protected]

D. Han • Y. Hu • S. Ai • G. Wang
College of Information Science and Engineering, Northeastern University, Shenyang, China

Introduction


Recently, cognitive science and information processing have attracted great attention in many research areas, including neurobiology, cognitive psychology, machine learning, and pattern recognition [1–5]. As the basis of cognitive science, cognitive computation is used to analyze and compute datasets in a real-time manner so that cognitive systems are capable of self-learning and cognition, similar to the human brain. Classification is an important task in machine learning and is also a key part of cognitive computation [6]. In the era of big data, analyzing massive data so that cognitive systems can learn the mechanism of the human cognitive process is an inevitable trend.

Nowadays, massive amounts of data are generated in applications such as bioinformatics, chemoinformatics, social network analysis, and web information management. As one of the most important data structures, the graph is suitable for describing and modeling problems in these fields. The problem of graph classification, an important branch of graph data mining, has therefore become a hot research domain. Graph classification aims to construct a classification prediction model by learning from graph data with category labels and then automatically classifying graph data without category labels. For example, in the field of bioinformatics, protein function can be predicted using protein secondary and tertiary structure. These structures are formed through the folding of the protein amino acid sequence; the secondary structure can be represented through planar labeled graphs, and the tertiary structure can be represented through geometric graphs carrying geometric information. Constructing a classification prediction model on protein secondary and tertiary structure can help biologists analyze and identify basic protein functions.

Existing methods of graph classification can be divided into two groups: kernel-based approaches [7–9] and methods based on frequent subgraphs [10–13].


A graph kernel is a similarity measure between two graphs, the two most popular being the cyclic-pattern graph kernel [8] and the walk-based graph kernel [9]. In contrast to approaches whose kernel functions are based on frequent patterns, the authors of [8] propose a kernel function based on a natural set of cyclic and tree patterns, independent of their frequency. Alternative graph kernels were proposed in [9]; the feature space of these kernels is conceptually based on the label sequences of all possible walks in the graphs.

Classification methods based on frequent subgraphs use the frequent subgraphs as classification features. These methods have good classification performance and scalability for any topological or geometric structure. Their main steps are: (1) mining frequent subgraphs, (2) extracting features, and (3) constructing the classifier. Pattern growth is one of the best strategies for mining frequent subgraphs. In this approach, one first mines a frequent subgraph p and then generates the children of p by extension. For each child of p (a supergraph pattern of p), the support is computed and the child is extended further in a depth-first way until all frequent subgraphs are found. Among existing algorithms for frequent subgraph mining [14–19], gSpan (graph-based substructure pattern mining) [15] is outstanding in terms of its mining efficiency, memory utilization, and scalability. Other studies replace frequent subgraphs with discriminative subgraph patterns. The GAIA algorithm [10] can discover discriminative subgraph patterns in large graph databases; it exploits a novel subgraph encoding approach to support an arbitrary subgraph pattern exploration order. The work in [11] proposes a method that optimizes a submodular quality criterion to select features among frequent subgraphs; this criterion can be integrated into gSpan to prune the search space for discriminative frequent subgraphs.

The above-mentioned methods can solve specific classification problems for graph patterns, but all of them classify certain graphs. However, graph objects are inherently uncertain in many real-world applications, such as sensor network monitoring and moving object detection. As we know, vertices and edges in biological or social networks usually exist only with some probability. The difference between uncertain and certain graphs is that an uncertain graph represents a probability distribution over all the certain graphs it implicates (introduced in the "Problem Definition" section). Classifying uncertain graph objects therefore faces important challenges. For example, in traditional algorithms for mining frequent subgraphs, the support of a subgraph is the proportion of graphs in a certain graph database in which it appears. Such a definition is unavailable for uncertain graphs, because it cannot be known whether a subgraph matches an uncertain graph or not.


As a result, existing methods for classifying certain graphs cannot effectively solve the classification problem on uncertain graphs. In this paper, we propose a classification algorithm based on ELM [20] to solve the classification problem on uncertain graphs. ELM [20] (introduced in the "ELM" section) is an emergent computational intelligence technique that can be used for prediction, regression, and classification problems. ELM was chosen to train our classifiers due to its good generalization performance and fast learning speed [21–23]. To the best of our knowledge, there is no work so far on classifying uncertain graph objects.

In this paper, we study the problem of uncertain graph classification. The main contributions of our work can be summarized as follows:
• A classification framework is put forward for uncertain graph objects;
• We extend the gSpan algorithm; in other words, we propose an algorithm to find all specific embedded graphs in uncertain graph sets, enabling the treatment of uncertain graph data;
• We provide a novel method using ELM based on equality constraints to train classifiers and solve the multiclassification problem on uncertain graphs;
• We conduct extensive experiments to evaluate the performance of our proposed classification framework; the results verify the efficiency and effectiveness of our proposed algorithms.

In the remainder of this paper, we define the problem in the "Problem Definition and Preliminaries" section. The "Uncertain Graph Classification Algorithm Based on ELM" section then discusses the algorithms proposed in this paper, and experimental results are given in the "Experiments and Performance Evaluation" section. Conclusions are presented in the "Conclusions" section.

Problem Definition and Preliminaries

Problem Definition

Definition 2.1 (Uncertain Graph): An uncertain graph is a tuple G = (V, E, Σ, L, P), where V is a set of vertices, E ⊆ V × V is a set of edges, Σ is a label set for vertices and edges, L is a function assigning labels to each of the vertices and edges, and P : E → (0, 1] is a function assigning conditional existence probability values to the edges.

In an uncertain graph, the probability of an edge e ∈ E is the probability that e exists in practice. If the probability of an edge e is 1, then e must exist in the graph.

A special uncertain graph with existence probability 1 on all edges is called a certain graph, denoted g = (V, E, Σ, L). V(G) and E(G) represent the sets of vertices and edges of a graph G, respectively. For the sake of simplicity, it is assumed that the existence probabilities of the edges of an uncertain graph are mutually independent. A certain graph g is implicated by an uncertain graph G if g and G have the same set of vertices and the set of edges of g is a subset of G's. Thus, an uncertain graph G with |E| edges implicates 2^|E| certain graphs g; umus(G) denotes the set of certain graphs implicated by G. The probability that an uncertain graph G = (V, E, Σ, L, P) implicates a certain graph g = (V', E', Σ', L') is given by

$$P(G \to g) = \prod_{e \in E'} P(e) \; \prod_{e \in (E - E')} \big(1 - P(e)\big). \qquad (1)$$
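As a concrete illustration of Eq. (1), the following Python sketch enumerates the certain graphs implicated by a small uncertain graph together with their probabilities. The dictionary-of-edges representation and the example probabilities are illustrative assumptions, not data from the paper.

```python
from itertools import combinations

# An uncertain graph as a mapping {edge: existence probability}; edges are (u, v) pairs.
G = {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.7, ("C", "D"): 0.2}

def implicated_graphs(G):
    """Yield every certain graph g implicated by G with its probability P(G -> g), Eq. (1)."""
    edges = list(G)
    for r in range(len(edges) + 1):
        for kept in combinations(edges, r):
            kept = set(kept)
            p = 1.0
            for e in edges:
                p *= G[e] if e in kept else 1.0 - G[e]
            yield kept, p

graphs = list(implicated_graphs(G))
print(len(graphs))                           # 2**4 = 16 implicated certain graphs
print(round(sum(p for _, p in graphs), 10))  # the probabilities sum to 1
```

A graph with four uncertain edges therefore implicates 16 certain graphs, matching the example of Fig. 2 discussed below.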

An uncertain graph dataset D is composed of multiple uncertain graphs, i.e., D = (G1, G2, ..., Gi, ...). As shown in Fig. 1, the uncertain graph dataset D is made up of the uncertain graphs G1, G2, G3. The uncertain graph G1 in D has four edges, so it implicates 2^4 = 16 certain graphs, as shown in Fig. 2.

Definition 2.2 (Graph Isomorphism): A graph g = {V, E, Σ, L} is said to be graph isomorphic to a graph g' = {V', E', Σ', L'} if there exists a bijective function f : V ↔ V' such that:
(1) ∀u ∈ V, L(u) = L'(f(u));
(2) ∀u, v ∈ V, ((u, v) ∈ E) ⇔ ((f(u), f(v)) ∈ E');
(3) ∀(u, v) ∈ E, L(u, v) = L'(f(u), f(v)).

Definition 2.3 (Subgraph Isomorphism): A graph g = {V, E, Σ, L} is said to be subgraph isomorphic to a graph g' = {V', E', Σ', L'}, denoted by g ⊆ g', if there exists an injective function f : V → V' such that:
(1) ∀u ∈ V, L(u) = L'(f(u));
(2) ∀u, v ∈ V, ((u, v) ∈ E) ⇒ ((f(u), f(v)) ∈ E');
(3) ∀(u, v) ∈ E, L(u, v) = L'(f(u), f(v)).

Fig. 1 A dataset of uncertain graph D


A certain graph s is a subgraph of an uncertain graph G only if s is subgraph isomorphic to at least one certain graph implicated by G; the probability of G containing the subgraph s is defined as

$$P(s, G) = \sum_{g \in umus(G) \,\wedge\, s \subseteq g} P(G \to g). \qquad (2)$$

Definition 2.4 (Embedded Graph): Given a certain graph s that is a subgraph of an uncertain graph G = (V, E, Σ, L, P), a graph s_i with V(s_i) ⊆ V(G) and E(s_i) ⊆ E(G) is an embedded graph of s in G only if s_i is isomorphic to s.

In traditional frequent subgraph mining, the absolute support of a certain graph s in a precise graph database d is sup(s, d) = |{g | s ⊆ g, g ∈ d}|, and the relative support is sup'(s, d) = |{g | s ⊆ g, g ∈ d}| / |d|. Unfortunately, these definitions are unavailable when mining uncertain graphs, because it cannot be known whether a subgraph matches an uncertain graph or not. Therefore, we need to redefine the concept of support.

Definition 2.5 (Support): The support of s in D is

$$sup(s, D) = \sum_{G \in D} P(s, G), \qquad (3)$$

where D is the uncertain graph database and s is a certain graph; s is a frequent subgraph of D only if the support of s is no less than a given threshold, minsup.
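Assuming a helper that tests subgraph isomorphism (Definition 2.3), Eqs. (2) and (3) can be computed by brute force over the implicated graphs. The sketch below reuses implicated_graphs from the previous example; the helper name is_sub_iso is a hypothetical placeholder, not an API from the paper.

```python
def containment_probability(s_edges, G, is_sub_iso):
    """P(s, G) of Eq. (2): total probability of the implicated certain graphs that contain s."""
    return sum(p for g_edges, p in implicated_graphs(G) if is_sub_iso(s_edges, g_edges))

def support(s_edges, D, is_sub_iso):
    """sup(s, D) of Eq. (3): sum of P(s, G) over every uncertain graph G in the database D."""
    return sum(containment_probability(s_edges, G, is_sub_iso) for G in D)

# s is a frequent subgraph of D whenever support(s, D, is_sub_iso) >= minsup.
```

This exhaustive enumeration is exponential in |E| and is shown only to fix the semantics; the inclusion-exclusion formula of Theorem 3.1 below avoids enumerating all implicated graphs.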

Preliminaries

ELM

ELM was originally proposed for single-hidden-layer feedforward neural networks and was then extended to generalized single-hidden-layer feedforward networks (SLFNs), in which the hidden layer need not be neuron-like [24–26]. In ELM, the input weights and the biases of the hidden nodes are randomly generated, and the output weights can be calculated without iterative tuning.

Fig. 2 The certain graphs implicated in the uncertain graph G1 and their probability

The output function of ELM for generalized SLFNs is

$$f_L(x) = \sum_{i=1}^{L} \beta_i g_i(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x), \quad x \in \mathbb{R}^d,\ \beta_i \in \mathbb{R}^m, \qquad (4)$$

where β_i = [β_{i1}, ..., β_{im}]^T denotes the vector of output weights between the ith hidden node and the output nodes, and g_i denotes the output function G(a_i, b_i, x) of the ith hidden node. For N distinct samples (x_i, t_i) ∈ R^d × R^m, SLFNs with L hidden-layer nodes are modeled as

$$\sum_{i=1}^{L} \beta_i g_i(x_j) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = o_j, \quad j = 1, \ldots, N. \qquad (5)$$

That SLFNs can approximate these N samples with zero error means that Σ_{j=1}^{N} ||o_j − t_j|| = 0; i.e., there exist (a_i, b_i) and β_i such that

$$\sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N. \qquad (6)$$

The above equations can be written compactly as

$$H\beta = T, \qquad (7)$$

where

$$H = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{pmatrix} = \begin{pmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{pmatrix}_{N \times L}, \quad \beta = \begin{pmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{pmatrix}_{L \times m}, \quad T = \begin{pmatrix} t_1^T \\ \vdots \\ t_N^T \end{pmatrix}_{N \times m}.$$


H is called the hidden layer output matrix of the network [27, 28]; the ith column of H is the ith hidden node's output with respect to the inputs x_1, x_2, ..., x_N. h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)] is called the hidden layer feature mapping, and the ith row of H is the hidden layer feature mapping with respect to the ith input x_i, namely h(x_i).

Since ELM can approximate any target continuous function and the output of the ELM classifier h(x)β can be made as close to the class labels in the corresponding regions as possible, the classification problem for the proposed constrained-optimization-based ELM with a single output node can be formulated as

$$\text{Minimize: } L_{P_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \xi_i^2$$
$$\text{Subject to: } h(x_i)\beta = t_i - \xi_i, \quad i = 1, \ldots, N, \qquad (8)$$

where C is a user-specified parameter. Based on the Karush–Kuhn–Tucker (KKT) theorem [29], training ELM is equivalent to solving the following dual optimization problem:

$$L_{D_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \xi_i^2 - \sum_{i=1}^{N} \alpha_i \big(h(x_i)\beta - t_i + \xi_i\big), \qquad (9)$$

where each Lagrange multiplier α_i corresponds to the ith training sample. An alternative approach for multiclass applications is to let ELM have multiple output nodes instead of a single output node: an m-class classifier has m output nodes. If the original class label is p, the expected output vector of the m output nodes is t_i = [0, ..., 0, 1, 0, ..., 0]^T with the 1 in the pth position; that is, only the pth element of t_i = [t_{i,1}, ..., t_{i,m}]^T is 1, while the rest of the elements are set to zero. The classification problem for ELM with multiple output nodes can be formulated as

$$\text{Minimize: } L_{P_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \|\xi_i\|^2$$
$$\text{Subject to: } h(x_i)\beta = t_i^T - \xi_i^T, \quad i = 1, \ldots, N, \qquad (10)$$

where ξ_i = [ξ_{i,1}, ..., ξ_{i,m}]^T is the training error vector of the m output nodes with respect to the training sample x_i. As in the single-output-node case, training ELM is equivalent to solving the following dual optimization problem:

$$L_{D_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \|\xi_i\|^2 - \sum_{i=1}^{N}\sum_{j=1}^{m} \alpha_{i,j}\big(h(x_i)\beta_j - t_{i,j} + \xi_{i,j}\big), \qquad (11)$$

where β_j is the vector of the weights linking the hidden layer to the jth output node and β = [β_1, ..., β_m]. From the above equation, we can see that the single-output-node case is a special case of multiple output nodes in which the number of output nodes is set to 1.

For the aforementioned optimization problem, two kinds of solutions can be obtained, depending on the size of the training dataset. In the first case, the number of training samples is not huge, and the output function of the ELM classifier is

$$f(x) = h(x)\beta = h(x)\,H^T\left(\frac{I}{C} + H H^T\right)^{-1} T. \qquad (12)$$

In the second case, the number of training samples is huge, and the output function of the ELM classifier is

$$f(x) = h(x)\beta = h(x)\left(\frac{I}{C} + H^T H\right)^{-1} H^T T. \qquad (13)$$
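To make Eqs. (4)–(13) concrete, the following NumPy sketch trains and applies an ELM with a sigmoid hidden layer. The random-weight ranges, the activation choice, the parameter defaults, and the function names are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def elm_train(X, T, n_hidden=60, C=100.0, seed=None):
    """Train an equality-constrained ELM (sketch of Eqs. (7)-(13)).

    X: (N, d) training inputs; T: (N, m) one-hot target matrix.
    Returns the random hidden-layer parameters (A, b) and the output weights beta.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(d, n_hidden))     # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)          # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))             # hidden layer output matrix (sigmoid G)
    N, L = H.shape
    if N <= L:
        # few training samples: Eq. (12), beta = H^T (I/C + H H^T)^{-1} T
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    else:
        # many training samples: Eq. (13), beta = (I/C + H^T H)^{-1} H^T T
        beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Return the predicted class index (largest output node) for each row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return np.argmax(H @ beta, axis=1)
```

The branch on N versus L mirrors the paper's distinction between the two closed-form solutions; either one yields the same classifier up to numerical conditioning.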

gSpan

This subsection provides a brief overview of gSpan. We first introduce several techniques developed in gSpan, including mapping each graph to a depth-first search (DFS) code (a sequence), building a novel lexicographic ordering among those codes, and constructing a search tree based on that lexicographic order.

When performing a depth-first search [30] in a graph, we construct a DFS tree. One graph can have several different DFS trees; for example, the graphs in Fig. 3b–d are isomorphic to that in Fig. 3a, and the thickened edges in Fig. 3b–d represent three different DFS trees for the graph in Fig. 3a. The depth-first discovery of the vertices forms a linear order. We use subscripts to label this order according to the discovery time [30], where i < j means v_i is discovered before v_j. We call v_0 the root and v_n the rightmost vertex. The straight path from v_0 to v_n is named the rightmost path. We denote such a subscripted G as G_T.

Forward Edge and Backward Edge. Given G_T, an ordered pair (i, j) represents an edge. If i < j, it is a forward edge; otherwise, it is a backward edge. A linear order ≺_T is built among all the edges in G by the following rules (assume e_1 = (i_1, j_1) and e_2 = (i_2, j_2)): (1) if i_1 = i_2 and j_1 < j_2, then e_1 ≺_T e_2; (2) if i_1 < j_1 and j_1 = i_2, then e_1 ≺_T e_2; and (3) if e_1 ≺_T e_2 and e_2 ≺_T e_3, then e_1 ≺_T e_3.

Definition 2.6 (DFS Code): Given a DFS tree T for a graph G, an edge sequence (e_i) can be constructed based on ≺_T such that e_i ≺_T e_{i+1}, where i = 0, ..., |E| − 1. (e_i) is called a DFS code, denoted as code(G, T). For simplicity, an edge can be represented by a 5-tuple (i, j, l_i, l_{i,j}, l_j), where l_i and l_j are the labels of v_i and v_j, respectively, and l_{i,j} is the label of the edge between them.


Fig. 3 DFS tree

Table 1 DFS codes for Fig. 3b–d

Edge   α (Fig. 3b)    β (Fig. 3c)    γ (Fig. 3d)
0      (0,1,X,a,Y)    (0,1,Y,a,X)    (0,1,X,a,X)
1      (1,2,Y,b,X)    (1,2,X,a,X)    (1,2,X,a,Y)
2      (2,0,X,a,X)    (2,0,X,b,Y)    (2,0,Y,b,X)
3      (2,3,X,c,Z)    (2,3,X,c,Z)    (2,3,Y,b,Z)
4      (3,1,Z,b,Y)    (3,0,Z,b,Y)    (3,0,Z,c,X)
5      (1,4,Y,d,Z)    (0,4,Y,d,Z)    (2,4,Y,d,Z)

For example, (v_0, v_1) in Fig. 3b is represented by (0, 1, X, a, Y). Table 1 presents the corresponding DFS codes for Fig. 3b–d.

Definition 2.7 (DFS Lexicographic Order): DFS lexicographic order is a linear order defined as follows. If α = code(G_α, T_α) = (a_0, a_1, ..., a_m) and β = code(G_β, T_β) = (b_0, b_1, ..., b_n), where α, β ∈ Z (the set of DFS codes), then α ≤ β if either of the following is true:
(1) ∃t, 0 ≤ t ≤ min(m, n), such that a_k = b_k for k < t and a_t ≺_e b_t;
(2) a_k = b_k for 0 ≤ k ≤ m, and n ≥ m.

For the graph in Fig. 3a, there exist different DFS codes. Three of them, based on the DFS trees in Fig. 3b–d, are listed in Table 1. According to the DFS lexicographic order, γ ≺ α ≺ β.

Definition 2.8 (Minimum DFS Code): Given a graph G, let Z(G) = {code(G, T) | T is a DFS tree of G}. Based on the DFS lexicographic order, the minimum element min(Z(G)) is called the minimum DFS code of G. It is also a canonical label of G.
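As a rough illustration, the sketch below compares the three DFS codes of Table 1 using plain element-wise tuple comparison. This is a simplification assumed for the example: the full DFS lexicographic order of Definition 2.7 also encodes the forward/backward-edge rules, but for these three codes the simplified comparison happens to agree with the paper's ordering γ ≺ α ≺ β.

```python
# DFS codes from Table 1 as sequences of 5-tuples (i, j, l_i, l_(i,j), l_j).
alpha = [(0, 1, 'X', 'a', 'Y'), (1, 2, 'Y', 'b', 'X'), (2, 0, 'X', 'a', 'X'),
         (2, 3, 'X', 'c', 'Z'), (3, 1, 'Z', 'b', 'Y'), (1, 4, 'Y', 'd', 'Z')]
beta  = [(0, 1, 'Y', 'a', 'X'), (1, 2, 'X', 'a', 'X'), (2, 0, 'X', 'b', 'Y'),
         (2, 3, 'X', 'c', 'Z'), (3, 0, 'Z', 'b', 'Y'), (0, 4, 'Y', 'd', 'Z')]
gamma = [(0, 1, 'X', 'a', 'X'), (1, 2, 'X', 'a', 'Y'), (2, 0, 'Y', 'b', 'X'),
         (2, 3, 'Y', 'b', 'Z'), (3, 0, 'Z', 'c', 'X'), (2, 4, 'Y', 'd', 'Z')]

# Element-wise tuple comparison picks gamma as the minimum code here.
print(min([alpha, beta, gamma]) == gamma)   # True: gamma < alpha < beta
```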

Theorem 2.1 Given two graphs G and G', G is isomorphic to G' if and only if min(G) = min(G'). (Proof omitted.)

Thus, the problem of mining frequent connected subgraphs is equivalent to mining their corresponding minimum DFS codes. This problem turns out to be a sequential pattern mining problem with a slight difference, which conceptually can be solved by existing sequential pattern mining algorithms. In fact, to construct a valid DFS code, the new edge must grow only from the vertices on the rightmost path. In Fig. 4, the graph shown in Fig. 4a has several potential children with one-edge growth, which are shown in Fig. 4b–f (assuming that the darkened vertices constitute the rightmost path). Among them, Fig. 4b–d grow from the rightmost vertex, while Fig. 4e, f grow from other vertices on the rightmost path. Fig. 4b.0–b.3 are children of Fig. 4b, and Fig. 4e.0–e.2 are children of Fig. 4e. Backward edges can only grow from the rightmost vertex, while forward edges can grow from any vertex on the rightmost path. The enumeration order of these children follows the DFS lexicographic order, i.e., the order of Fig. 4b–f. Algorithm 1 of gSpan is shown below.

Algorithm 1 gSpan
Input: s: a DFS code; D: a graph set; minsup: the minimum support threshold;
Output: S: a set of frequent graphs;
Call gSpan(s, D, minsup, S)
Procedure gSpan(s, D, minsup, S)
1: if s ≠ dfs(s) then
2:   return
3: end if
4: add s to S
5: C ← ∅
6: scan D once; find every edge e by which s can be rightmost-extended to s ◇r e; insert each s ◇r e into C and compute its support
7: sort C in DFS lexicographic order
8: for each frequent s ◇r e in C do
9:   gSpan(s ◇r e, D, minsup, S)
10: end for
11: return

Uncertain Graph Classification Algorithm Based on ELM

In this section, we introduce the classification algorithm proposed in this paper in detail. We propose a framework to address the problem of classifying uncertain graph objects, as shown in Fig. 5.

Fig. 4 DFS code/graph growth


The framework mainly contains three separate components: (1) mining frequent subgraphs, (2) extraction of features, and (3) construction of the classifier. The first step is to mine all frequent subgraphs in each class of uncertain graphs with the same label from the training set; these subgraphs form a candidate set of classification features. The second step is to extract a subset of these subgraphs as classification features through the feature extraction algorithm; the selected frequent subgraphs have very strong distinguishing ability. Next, each uncertain graph is mapped into the feature space; that is, we use a one-dimensional vector of length n (where n is the number of classification features) to represent an uncertain graph. Finally, a classifier is constructed with the ELM algorithm.

In this framework, mining frequent subgraphs and then extracting features are the basic work for constructing the uncertain graph classifier. Based on the discriminative features selected in the previous steps, we train the classifier using the ELM technique due to its high learning speed. These three components play equally important roles in the framework of uncertain graph classification. We explain the three steps in detail below.

Mining Frequent Subgraphs

In this work, we improve gSpan in two aspects: (1) we discover all embedded graphs of s in an uncertain graph G, rather than judging whether s exists in G or not; and (2) we calculate the support of the graph s in an uncertain graph G. Apart from these two improvements, there is no essential difference between the improved gSpan and the traditional one; the detailed process can be found in the "gSpan" subsection. Next, the improvements are discussed in detail.


Mining All Embedded Graphs (MAEG)

Given two graphs G and G', we need to discover all embedded graphs of G' in G. Assume a = |E(G)| and b = |E(G')|. In the algorithm, we use a one-dimensional vector F = (F_1, F_2, ..., F_i, ..., F_a) of length a to represent the usage of each edge in dfs(G): F_i = 1 means the ith edge has been used, while F_i = 0 indicates the opposite. Similarly, a one-dimensional vector H = (H_1, H_2, ..., H_i, ..., H_b) of length b represents the mapping from dfs(G') to dfs(G): H_i = j means that the ith edge in dfs(G') has been mapped to the jth edge in dfs(G), while H_i = 0 means that the ith edge in dfs(G') has not been mapped to any edge of dfs(G).

Algorithm 2 MAEG(G, G', F, H, d)
Input: G: the minimum DFS code of an uncertain graph G, i.e., dfs(G); G': dfs(G'); F: a one-dimensional vector of length a, initialized to all zeros; H: a one-dimensional vector of length b, initialized to all zeros; d: the depth of the search, initially zero;
Output: R: the set of embedded graphs found by the algorithm, initially empty;
1: if d ≥ b then
2:   add the embedded graph corresponding to F to R
3:   return
4: end if
5: for 1 ≤ k ≤ a do
6:   if F_k ≠ 0, or the vertex labels and the edge label of the kth edge in dfs(G) do not equal those of the dth edge in dfs(G') then
7:     goto 5 (try the next k)
8:   end if
9:   (the kth edge is an unused match for the dth edge of dfs(G'))
10:  H_d = k
11:  F_k = 1
12:  MAEG(G, G', F, H, d + 1)
13:  F_k = 0
14: end for


Fig. 5 Classification framework for uncertain graph data

Lines 1–4 of Algorithm 2 test whether an embedded graph of G' in G has been discovered; if so, the DFS code of the embedded graph is added to R. The next step is to enumerate all edges that have not yet been used but meet the matching conditions (lines 5–6) and then to set the vectors H and F (lines 10–11). Line 12 calls the algorithm recursively, entering the next round of the test. Line 13 undoes the assignment (backtracking) so that all embedded graphs can be found. Through Algorithm 2, we find all embedded graphs of G' in G and store them in the set R.

Calculating the Support

Calculating the support of a certain graph s in an uncertain graph G directly from Eqs. (1) and (2) requires listing all certain graphs implicated by G together with their existence probabilities and then judging whether s is subgraph isomorphic to each of them, which would greatly increase the complexity of the algorithm. Here, we simplify Eq. (2) through the following mathematical transformation.

Theorem 3.1 Assume that there are n embedded graphs (s_1, s_2, ..., s_n) of s in the uncertain graph G. Then

$$P(s, G) = \sum_{1 \le i \le n} \Big( \prod_{e \in E(S_i)} P(e) \Big) - \sum_{1 \le i < j \le n} \Big( \prod_{e \in E(S_i) \cup E(S_j)} P(e) \Big) + \cdots + (-1)^{t-1} \sum_{1 \le i_1 < \cdots < i_t \le n} \Big( \prod_{e \in E(S_{i_1}) \cup \cdots \cup E(S_{i_t})} P(e) \Big) + \cdots + (-1)^{n-1} \sum_{1 \le i_1 < \cdots < i_n \le n} \Big( \prod_{e \in E(S_{i_1}) \cup \cdots \cup E(S_{i_n})} P(e) \Big). \qquad (14)$$

According to Eq. (14), we present Algorithm 3 for calculating the probability of a certain graph s in an uncertain graph G:

Algorithm 3 CalPro(s, G)
Input: s: the subgraph to be queried; G: an uncertain graph;
Output: the probability of s in G;
1: find the embedded graph set R = {s_1, s_2, ..., s_i, ...} of s in G through Algorithm 2
2: p = 0
3: for 1 ≤ k ≤ |R| do
4:   p = p + (−1)^(k−1) Σ_{1 ≤ i_1 < ... < i_k ≤ |R|} Π_{e ∈ E(S_{i_1}) ∪ ... ∪ E(S_{i_k})} P(e)
5: end for
6: return p

Algorithm 4 judges whether a certain graph s is a frequent subgraph of an uncertain graph set D:

Algorithm 4 JudgeFreGra(s, D, minsup)
Input: s: the subgraph to be queried; D: an uncertain graph set; minsup: the minimum support threshold;
Output: Boolean: whether s is a frequent subgraph of D;
1: calculate P(s, G_i) for each uncertain graph G_i in D through Algorithm 3
2: sup(s, D) = Σ_{1 ≤ i ≤ |D|} P(s, G_i)
3: return (sup(s, D) ≥ minsup)
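A small Python sketch of the inclusion-exclusion computation of Eq. (14), the core of Algorithm 3, is given below. It assumes the embedded graphs of s in G have already been found (e.g., by Algorithm 2); the edge identifiers and probabilities in the usage example are illustrative.

```python
from itertools import combinations

def cal_pro(embedded_edge_sets, prob):
    """P(s, G) via the inclusion-exclusion formula of Eq. (14).

    embedded_edge_sets: list of edge sets, one per embedded graph s_1..s_n of s in G;
    prob: dict mapping each edge of G to its existence probability.
    """
    n = len(embedded_edge_sets)
    p = 0.0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        for subset in combinations(embedded_edge_sets, k):
            union = set().union(*subset)      # edges that must all exist for this term
            term = 1.0
            for e in union:
                term *= prob[e]
            p += sign * term
    return p

# Example: two embedded graphs sharing one edge.
prob = {1: 0.9, 2: 0.8, 3: 0.5}
print(cal_pro([{1, 2}, {2, 3}], prob))   # 0.72 + 0.40 - 0.36 = 0.76
```

The result is the probability that at least one embedded graph of s is fully present in an implicated certain graph, exactly the quantity P(s, G) of Eq. (2).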

Extraction of Features

Given a training database DS = {(G_i, C_i) | i = 1, 2, ..., n}, where G_i is a graph and C_i is the label of G_i, we apply the improved gSpan to each set D_i of uncertain graphs that share the same label C_i and discover all frequent subgraphs in D_i. However, this algorithm produces too many subgraphs. Such a large number of subgraphs is unacceptable for our classification algorithm and leads to three main problems: (1) it increases the training time of the classification model, (2) it reduces the accuracy of the classifier, and (3) it increases the processing time of the classification. To solve these problems, we adopt a scoring function to select a subset of all the found features while guaranteeing the accuracy of the classification model.



The principle of the scoring function is: the higher the frequency of a subgraph in a specific graph set D_i and the lower its frequency in the other graph sets D_j (j ≠ i), the higher its score. The scoring function of a subgraph s in D_i is defined as

$$Score(s, D_i) = \ln\!\left(\frac{sup(s, D_i)}{\sum_{j \ne i} sup(s, D_j)}\right). \qquad (15)$$

As can be seen, Eq. (15) has a singularity when Σ_{j≠i} sup(s, D_j) = 0; in this case, we take Score(s, D_i) to be positive infinity. Given another threshold minsco, for every class, if a graph s meets the condition Score(s, D_i) ≥ minsco, it is added to the final feature set C. When choosing minsco, we must ensure that, for every uncertain graph set D_i with the same class label, at least one frequent subgraph is selected and added to the final feature set C.

Construction of the Classifier

After the two steps above, we have obtained a feature set C. We map uncertain graphs into the feature space of C; that is, we use a one-dimensional vector F of length |C| to represent an uncertain graph G, where the value F_i of the ith component of F is the existence probability of the corresponding feature s_i, i.e., F_i = P(s_i, G). The vector F is then normalized. We convert all uncertain graphs in the training set into one-dimensional feature vectors of length |C| and combine these vectors to obtain a feature matrix.

Due to the characteristics of graph data and of the classification problem itself, not all classification algorithms are suitable for constructing the classifier. This paper chooses an equality-constrained optimization strategy based on ELM to construct the classifier for the following reasons: (1) the feature matrix is sparse, because most uncertain graphs contain only a small part of the feature set C; (2) the problem we deal with is multiclassification; (3) the learning speed of the ELM algorithm is very high, while its classification performance is not affected. In this algorithm, the number of features is much less than the number of training samples, so we adopt Eq. (13) to solve the problem.
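The mapping from uncertain graphs to feature vectors can be sketched as follows, reusing the hypothetical helpers from the earlier sketches: cal_pro for P(s_i, G), and an assumed embedded_sets_of(s, G) standing in for Algorithm 2. The resulting matrix is the input X to the ELM sketch above; none of these names come from the paper itself.

```python
import numpy as np

def feature_matrix(graphs, features, embedded_sets_of):
    """Map each uncertain graph G to a |C|-dimensional vector with F_i = P(s_i, G).

    graphs: list of uncertain graphs, each a dict {edge: probability};
    features: the selected feature subgraphs s_1, ..., s_|C|;
    embedded_sets_of(s, G): assumed helper returning the edge sets of the embedded
    graphs of s in G (the output of Algorithm 2 / MAEG).
    """
    X = np.array([[cal_pro(embedded_sets_of(s, G), G) for s in features] for G in graphs])
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)   # normalize each graph's feature vector

# The normalized matrix, together with one-hot class labels, is then passed to elm_train,
# and unlabeled graphs are classified with elm_predict on their feature vectors.
```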

Experiments and Performance Evaluation

Experimental Environment

We conducted a large number of experiments to evaluate the performance of the proposed algorithms. The algorithm for feature extraction was implemented in C++, whereas the classification algorithm was implemented through MATLAB simulations. All experiments were performed on a 2-GHz Intel Core2 Duo PC with 3 GB of memory, running Windows 7.

Data Selection

In our experiments, we used a protein database and a compound structure database. The protein structures of the protein database were obtained from the Protein Data Bank [31] and classified (listed in Table 2) using the Structural Classification of Proteins (SCOP) [32]. We randomly chose four superfamilies from the SCOP classes, which are, in order, All alpha proteins, All beta proteins, Alpha and beta proteins (a/b), and Alpha and beta proteins (a + b). We then chose six families from All alpha proteins, which are, in order, Bacillus cereus metalloprotein-like, C-terminal domain of alpha and beta subunits of F1 ATP synthase, Cellulases catalytic domain, Cytochrome c3-like, Calmodulin-like, and S100 proteins; five families from All beta proteins, which are, in order, NF-kappa-B/REL/DORSAL transcription factors C-terminal domain, E-set domains of sugar-utilizing enzymes, V set domains (antibody variable domain-like), C1 set domains (antibody constant domain-like), and I set domains; four families from Alpha and beta proteins (a/b), which are, in order, Adenosine/AMP deaminase, Class I aldolase, Class I DAHP synthetase, and Xylose isomerase; and three families from Alpha and beta proteins (a + b), which are, in order, Ubiquitin-related, First domain of FERM, and Double-stranded RNA-binding domain (dsRBD). The four selected superfamilies were named BASE1–BASE4. We carried out classification tests on the data of BASE1–BASE4, which are multiclassification problems with 6, 5, 4, and 3 classes, respectively.

As stated in [10], we obtained the protein graphs according to the following rules: each graph node denotes an amino acid, whose location is represented by the location of its alpha carbon. There is an edge between two nodes if the distance between the two alpha carbons is less than 11.5 angstroms. Nodes are labeled with their amino acid type, and edges are labeled with the distances between the alpha carbons. On average, each protein graph has 210 nodes and 2,200 edges.

The database of compound structures obtained from PubChem [33] was classified by compound activity (listed in Table 3). In this paper, we only chose biologically active compounds as our classification data. We randomly chose 11 groups of NCI cancer biology identification compounds, which were assigned to four databases (BASE5–BASE8). BASE5 contains six groups of compounds, which are, in order, Lung cancer (NCI11), Melanoma (NCI23), Prostate cancer (NCI41), Nervous sys. tumor (NCI45), Small cell lung cancer (NCI61), and Colon cancer (NCI81); BASE6 contains five groups of compounds, which are, in order, Lung cancer (NCI11), Nervous sys. tumor (NCI45), Breast cancer (NCI83), Ovarian tumor (NCI109), and Leukemia (NCI123); BASE7 contains four groups of compounds, which are, in order, Melanoma (NCI23), Breast cancer (NCI83), Renal cancer (NCI145), and Yeast anticancer drug (NCI161); BASE8 contains three groups of compounds, which are, in order, Colon cancer (NCI81), Leukemia (NCI123), and Yeast anticancer drug (NCI161). We carried out classification tests on the data of BASE5–BASE8, which are multiclassification problems with 6, 5, 4, and 3 classes, respectively. We obtained the compound graphs according to the following rules: each atom is represented by a node labeled with the atom type, and each chemical bond is represented by an edge labeled with the bond type. On average, each compound graph has 48 nodes and 54 edges. The data obtained as described above are certain graph data; to obtain uncertain graphs, we randomly added a probability p (0 < p ≤ 1) to each edge of all the graphs.

Parameter Selection

In this algorithm framework, there are two key parameters in the processes of frequent subgraph mining and feature extraction, named min_sup and minsco, respectively. In the frequent subgraph mining stage, the smaller the value of min_sup, the more frequent subgraphs are generated, which increases the processing time. In the feature extraction stage, the smaller the value of minsco, the more features are extracted, which helps classification but increases the computational complexity of the classifiers. Therefore, we need to choose appropriate values for these parameters to balance the conflict between processing time and classification efficiency. In our experiments, we adopted the following strategies to determine the values of min_sup and minsco. For minsco, we set it to 0, 1, 2, 4, 8, etc. in turn and chose the largest value for which at least one frequent subgraph is selected as a final feature in each class; this is called the best value of minsco. For min_sup, we set it to 0.1, 0.2, 0.4, 0.8, etc. in turn and chose the smallest value such that the final number of selected features does not affect the efficiency of the classifier; this is called the best min_sup. Tables 4 and 5 present the number of frequent subgraphs and features generated from the SCOP database and the compound database under the best min_sup and minsco.

Table 2 List of selected SCOP data

Database   Number of selected proteins   Classes
BASE1      338                           6
BASE2      309                           5
BASE3      242                           4
BASE4      148                           3

Table 3 List of selected compound data

Database   Number of selected compounds   Classes
BASE5      12,778                         6
BASE6      12,818                         5
BASE7      979                            4
BASE8      901                            3

ELM Classification Efficiency

For the classification strategy, we used fivefold cross-validation to evaluate the efficiency of the ELM classification. Tables 6 and 7 present the classification efficiency for the SCOP database and the compound database, respectively, with different excitation functions (Gaus, Sig, and the radial basis function [RBF]). We evaluate the classification capacity using the following measure:

$$\text{accuracy rate} = \frac{\text{the number of data classified correctly}}{\text{the total number of data in the database}}.$$
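A minimal sketch of the fivefold cross-validation and the accuracy-rate measure, built on the elm_train/elm_predict sketch from the ELM section, is shown below; the fold construction, the default parameters, and the assumption that y is an integer label array are illustrative choices, not the authors' experimental code.

```python
import numpy as np

def five_fold_accuracy(X, y, n_classes, n_hidden=60, C=100.0, seed=0):
    """Average accuracy rate = (# correctly classified) / (# of data), over 5 folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 5)
    accuracies = []
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        T = np.eye(n_classes)[y[train]]                  # one-hot targets, as in Eq. (10)
        A, b, beta = elm_train(X[train], T, n_hidden=n_hidden, C=C)
        predictions = elm_predict(X[test], A, b, beta)
        accuracies.append(float(np.mean(predictions == y[test])))
    return float(np.mean(accuracies))
```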

Table 4 Number of frequent subgraphs and features generated from the SCOP database

Database   Number of frequent subgraphs   Number of features
BASE1      260                            159
BASE2      223                            101
BASE3      165                            87
BASE4      109                            64

Table 5 Number of frequent subgraphs and features generated from the compound database

Database   Number of frequent subgraphs   Number of features
BASE5      848                            227
BASE6      694                            132
BASE7      473                            136
BASE8      255                            73


Table 6 Training time and accuracy rate of the SCOP database (ELM)

Database   Gaus Time (s)   Gaus Accuracy rate (%)   Sig Time (s)   Sig Accuracy rate (%)   RBF Time (s)   RBF Accuracy rate (%)
BASE1      0.0078          58.03                    0.0209         60.38                   0.0264         58.96
BASE2      0.0101          64.59                    0.0312         63.79                   0.0279         65.17
BASE3      0.0024          65.88                    0.0028         66.74                   0.0048         63.45
BASE4      0.0000          76.51                    0.0000         79.57                   0.0035         77.89

Table 7 Training time and accuracy rate of the compound database (ELM)

Database   Gaus Time (s)   Gaus Accuracy rate (%)   Sig Time (s)   Sig Accuracy rate (%)   RBF Time (s)   RBF Accuracy rate (%)
BASE5      0.0749          54.60                    0.1721         55.23                   0.1664         54.35
BASE6      0.0442          62.18                    0.1660         60.59                   0.1957         60.86
BASE7      0.0707          68.61                    0.1348         65.19                   0.1191         64.27
BASE8      0.0769          73.10                    0.1320         72.74                   0.0973         73.57

Table 8 Comparison of training time and accuracy rate of ELM, BP, and SVM on the SCOP database

Database   ELM Time (s)   ELM Accuracy rate (%)   BP Time (s)   BP Accuracy rate (%)   SVM Time (s)   SVM Accuracy rate (%)
BASE1      0.1092         63.4                    0.8736        61.8                   0.7956         66.4
BASE2      0.0780         64.3                    0.9048        62.1                   0.8236         57.2
BASE3      0.0780         66.7                    0.8736        70.9                   0.9516         72.8
BASE4      0.0936         75.4                    0.8580        77.6                   0.9672         79.2

Table 9 Comparison of training time and accuracy rate of ELM, BP, and SVM on the compound database

Database   ELM Time (s)   ELM Accuracy rate (%)   BP Time (s)   BP Accuracy rate (%)   SVM Time (s)   SVM Accuracy rate (%)
BASE5      0.2964         76.4                    52.71         73.4                   44.45          62.4
BASE6      0.2340         61.2                    42.99         77.6                   23.42          74.2
BASE7      0.2184         67.6                    38.52         77.6                   27.63          76.5
BASE8      0.1716         74.6                    33.48         71.3                   17.57          71.1

Table 6 presents the training time and testing accuracy rate of the SCOP database (BASE1–BASE4) using the algorithm based on ELM, with different excitation functions (Gaus, Sig, and RBF). Under the excitation function Sig, BASE2 has the longest training time, 0.0312 s, which is acceptable. Under random classification, the accuracy rates of BASE1–BASE4 would be, in order, 16.7, 20, 25, and 33.3 %. Using the algorithm proposed in this paper, BASE1 shows the largest improvement: its accuracy rate is 60.38 % under the excitation function Sig, which is 3.6 times the accuracy rate of random classification. Even in the worst case, the accuracy rate of BASE4 is 76.51 % under the excitation function Gaus, which is 2.3 times the accuracy rate of random classification.



Table 7 presents the training time and testing accuracy rate of the compound database (BASE5–BASE8) using the algorithm based on ELM, with different excitation functions (Gaus, Sig, and RBF). Under the excitation function RBF, BASE6 has the longest training time, 0.1957 s, which is acceptable. Under random classification, the accuracy rates of BASE5–BASE8 would be, in order, 16.7, 20, 25, and 33.3 %. Using the algorithm proposed in this paper, BASE5 shows the largest improvement: its accuracy rate is 55.23 % under the excitation function Sig, which is 3.3 times the accuracy rate of random classification. Even in the worst case, the accuracy rate of BASE8 is 72.74 % under the excitation function Sig, which is 2.2 times the accuracy rate of random classification.


Training Time and Accuracy Rate Comparison

To verify the effectiveness of the proposed algorithm, we adopted ELM, back-propagation (BP), and support vector machine (SVM) to construct the classifier, according to the proposed classification framework and feature extraction algorithm. BP and SVM were realized through MATLAB and the LIBSVM package, respectively. LIBSVM, designed by Chih-Jen Lin et al. at National Taiwan University, is an effective package for SVM pattern recognition and regression whose main advantage is its ease of use. The SVM uses a sigmoid kernel function. For the ELM algorithm, the number of hidden-layer nodes was set to 60; for BP, the number was 10; for the SVM algorithm, we set the penalty parameter C to 100.

Table 8 presents the training time and accuracy rate comparison for constructing the classifier on the SCOP database (BASE1–BASE4). The number of training data is 200; the difference in classification accuracy between the three algorithms is small, while the average training of ELM is more than 10 times faster than that of BP or SVM. Table 9 presents the training time and accuracy rate comparison for constructing the classifier on the compound database (BASE5–BASE8). In this experiment, we increase the training data to around 10,000. The superiority of the ELM algorithm is obvious: the difference in classification accuracy between the three algorithms is small, while the training of ELM is more than 100 times faster than that of BP or SVM.

Conclusions

With the use of advanced hardware and software technologies, uncertain big data are widely produced in many applications. As a basic data structure, the graph is suitable for modeling some kinds of uncertain big data. The task of classifying uncertain graph data, which is one of the most important components of cognitive computation, now faces huge challenges. In this paper, we investigate the problem of efficiently classifying uncertain graphs and propose a framework for classifying uncertain graph objects. In this framework, we first extend the gSpan algorithm to mine all frequent subgraphs from an uncertain graph database; these frequent subgraphs form a candidate set of features. Then, a subset of discriminative features is extracted from the candidate set as the final features. Finally, based on the ELM technique, we train the multiclass classifier over uncertain graph data objects. A large number of experiments were carried out to verify the efficiency and effectiveness of our proposed algorithms.

Acknowledgments This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182; New Century Excellent Talents in University (NCET-11-0085).

References

1. Taylor JG. Cognitive computation. Cogn Comput. 2009;1(1):4–16.
2. Wollmer M, Eyben F, Graves A, Schuller B, Rigoll G. Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cogn Comput. 2010;2(3):180–90.
3. Mital P, Smith T, Hill R, Henderson J. Clustering of gaze during dynamic scene viewing is predicted by motion. Cogn Comput. 2011;3(1):5–24.
4. Cambria E, Hussain A. Sentic computing: techniques, tools, and applications. SpringerBriefs in Cognitive Computation. Dordrecht: Springer; 2012.
5. Wang Q, Cambria E, Liu C, Hussain A. Common sense knowledge for handwritten Chinese recognition. Cogn Comput. 2013;5(2):234–42.
6. Xu Y, Guo R, Wang L. A twin multi-class classification support vector machine. Cogn Comput. 2013;5(4):580–8.
7. Tsivtsivadze E, Urban J, Geuvers H, Heskes T. Semantic graph kernels for automated reasoning. In: SDM; 2011. pp. 795–803.
8. Horvath T, Gartner T, Wrobel S. Cyclic pattern kernels for predictive graph mining. In: KDD; 2004. pp. 158–167.
9. Gartner T, Flach P, Wrobel S. On graph kernels: hardness results and efficient alternatives. In: COLT; 2003. pp. 129–143.
10. Jin N, Young C, Wang W. GAIA: graph classification using evolutionary computation. In: SIGMOD; 2010. pp. 879–890.
11. Thoma M, Cheng H, Gretton A, Han J, Kriegel H-P, Smola AJ, Song L, Yu PS, Yan X, Borgwardt KM. Near-optimal supervised feature selection among frequent subgraphs. In: SDM; 2009. pp. 1075–1086.
12. Jin N, Young C, Wang W. Graph classification based on pattern co-occurrence. In: CIKM; 2009. pp. 573–582.
13. Kong X, Yu PS. Semi-supervised feature selection for graph classification. In: SIGKDD; 2010. pp. 793–802.
14. Bifet A, Holmes G, Pfahringer B, Gavalda R. Mining frequent closed graphs on evolving data streams. In: SIGKDD; 2011. pp. 591–599.
15. Yan X, Han J. gSpan: graph-based substructure pattern mining. In: ICDM; 2002. pp. 721–724.
16. Parthasarathy S, Tatikonda S, Ucar D. A survey of graph mining techniques for biological datasets. In: Managing and mining graph data; 2010. pp. 547–580.
17. Jiang C, Coenen F, Zito M. A survey of frequent subgraph mining algorithms. Knowledge Engineering Review; 2013. pp. 75–105.
18. Thoma M, Cheng H, Gretton A, Han J, Kriegel HP, Smola A, Borgwardt KM. Discriminative frequent subgraph mining with optimality guarantees. Stat Anal Data Min. 2010;3(5):302–18.
19. Shelokar P, Quirin A, Cordon O. MOSubdue: a Pareto dominance-based multiobjective subdue algorithm for frequent subgraph mining. Knowl Inf Syst. 2013;34(1):75–108.
20. Huang G-B, Zhu QY, Siew CK. Extreme learning machine: theory and applications. Neurocomputing. 2006;70(1):489–501.


21. Zhao Z, Chen Z, Chen Y, Wang S, Wang H. A class incremental extreme learning machine for activity recognition. Cogn Comput. 2014. doi:10.1007/s12559-014-9259-y.
22. Wang X, Shao Q, Miao Q, Zhai J. Architecture selection for networks trained with extreme learning machine using localized generalization error model. Neurocomputing. 2013;102:3–9.
23. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B Cybern. 2012;42(2):513–29.
24. Miche Y, Sorjamaa A, Bas P, Simula O, Jutten C, Lendasse A. OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw. 2010;21(1):158–62.
25. Mishra A, Goel A, Singh R, Chetty G, Singh L. A novel image watermarking scheme using extreme learning machine. In: Neural Networks (IJCNN); 2012. pp. 1–6.
26. Huang G-B, Wang DH, Lan Y. Extreme learning machines: a survey. Int J Mach Learn Cybern. 2011;2(2):107–22.


27. Zong W, Huang G-B. Learning to rank with extreme learning machine. Neural Process Lett. 2013;39(2):1–12.
28. Zong W, Huang G-B, Chen Y. Weighted extreme learning machine for imbalance learning. Neurocomputing. 2013;101:229–42.
29. Fletcher R. Practical methods of optimization. John Wiley & Sons; 2013.
30. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms. 2nd ed. MIT Press; 2001.
31. The Protein Data Bank. Retrieved May 6, 2013 from http://www.rcsb.org/pdb/.
32. Structural Classification of Proteins. Retrieved May 10, 2013 from http://scop.mrc-lmb.cam.ac.uk/scop/.
33. The database of compound structures. Retrieved May 8, 2013 from http://pubchem.ncbi.nlm.nih.gov.