arXiv:1310.1545v1 [cs.LG] 6 Oct 2013
Learning Hidden Structures with Relational Models by Adequately Involving Rich Information in A Network
Abstract

Effectively modelling hidden structures in a network is very practical but theoretically challenging. Existing relational models involve only very limited information, namely the binary directional link data embedded in a network, to learn hidden networking structures. Other rich and meaningful information (e.g., various attributes of entities, and link data more granular than binary elements such as "like" or "dislike") is missed, although it plays a critical role in forming and understanding relations in a network. In this work, we propose an informative relational model (InfRM) framework to adequately involve rich information and its granularity in a network, including metadata information about each entity and various forms of link data. Firstly, an effective method for incorporating metadata information is employed on the priors of the relational models MMSB and LFRM; this encourages entities with similar metadata information to have similar hidden structures. Secondly, we propose various solutions to cater for alternative forms of link data. Substantial effort has been made towards modelling appropriateness and efficiency, for example, by using conjugate priors. We evaluate our framework and its inference algorithms on different datasets, which shows the generality and effectiveness of our models in capturing implicit structures in networks.
1. Introduction

Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.

Learning hidden structures within a network is an emergent topic in various areas, including social-media recommendation (Tang & Liu, 2010), customer partitioning, social network analysis, and partitioning protein interaction networks (Girvan & Newman, 2002;
Fortunato, 2010). Many models have been proposed in recent years to address this problem by using linkage information, such as a person's view towards others. Examples include the stochastic blockmodel (Nowicki & Snijders, 2001) and its infinite-communities case, the infinite relational model (IRM) (Kemp et al., 2006), both aiming at partitioning a network of entities into different groups based on their pairwise, directional binary observations.

The "inter-node" link data used in existing approaches contributes explicit insight into social structures. On the other hand, "intra-node" metadata information could complement the disclosure of hidden, implicit relational structures. Take the Lazega-lawfirm network (detailed in Section 6) as an example, which contains both link and metadata information. The metadata includes attributes such as office (Boston, Hartford or Providence), law school (Harvard, Yale, UConn or other) and age, associated with each of the entities (attorneys); naturally, people with similar attributes tend to have relationships with each other. The directional link data includes elements such as basic advice frequency, co-work time and friendship. These elements, however, may take different forms: the first two can be represented by integer values (i.e., count link data), while friendship may be better represented by a real value on the unit interval (i.e., unit link data). If they have to be "binarized", as in the existing models, the loss of granularity could lead to poorer utilization of the information.

While some of the most recent efforts are directed at involving more information, they all face shortcomings. For instance, in LFRM (Miller et al., 2009), combining metadata information with the feature matrix to generate the link data introduces ambiguity into the role of the latent feature matrix.
In terms of efficient inference, as shown in NMDR (Kim et al., 2012), a logistic-normal transform was employed to integrate the metadata information into each entity's membership distribution. However, this integration complicates the original structures and leads to non-conjugacy. In terms of link-data modelling, in order to use count link data as observations, (Yang et al., 2011) used the geometric
Title Suppressed Due to Excessive Size
distribution as its likelihood model. This implies that the probabilities of counts decrease monotonically, which may not be applicable in a general setting.
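To make the monotonicity point concrete, here is a small sketch (the parameter values are arbitrary illustrations, not from the paper): a geometric likelihood always places its mode at a count of zero, while a Poisson likelihood can peak at a nonzero count.

```python
# Geometric vs. Poisson pmf: the geometric is strictly decreasing in e,
# so its mode is forced to e = 0; the Poisson can peak near its mean.
from math import exp, factorial

def geometric_pmf(e, p):
    """P(E = e) = (1 - p)^e * p, for e = 0, 1, 2, ..."""
    return (1 - p) ** e * p

def poisson_pmf(e, lam):
    """P(E = e) = lam^e * exp(-lam) / e!"""
    return lam ** e * exp(-lam) / factorial(e)

geo = [geometric_pmf(e, 0.3) for e in range(10)]
poi = [poisson_pmf(e, 4.5) for e in range(10)]

# Geometric: strictly decreasing, mode at zero.
assert all(geo[e] > geo[e + 1] for e in range(9))
# Poisson with lam = 4.5: mode at e = 4, away from zero.
assert max(range(10), key=lambda e: poi[e]) == 4
```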
We propose a new informative relational model (InfRM) framework, which incorporates both rich metadata information and granular link data in a sensible and efficient way. To integrate the metadata information, a new transform is proposed, and the corresponding result is placed as the prior for the communities of each entity. This enables the similarity of metadata information between entities to be reflected in their corresponding hidden structures. Inspired by the existing benchmark relational models MMSB (Airoldi et al., 2008) and LFRM (Miller et al., 2009), we model the hidden structure as mixed memberships and latent features, respectively. This leads to the informative mixed membership model (InfMM) and the informative latent feature model (InfLF), which are the centerpieces of this paper. Methods akin to the stick-breaking process (Sethuraman, 1994; Teh et al., 2007) are proposed to model the unknown number of communities. In particular, our InfMM model gains the conjugate property. As discussed in Section 3.1, through these efforts, the existing models can be seen as special cases of our proposed models.

In addition, we design a set of solutions to model various forms of link data, including count and unit link data. We have chosen the likelihood and prior models carefully, for both their practical appropriateness and their computational efficiency; their effects are demonstrated in our experiments. As a result, our models capture much richer information embedded in a network, leading to the better performance in modelling hidden structures illustrated in Section 6.

The rest of the paper is organized as follows. Section 2 introduces the relational models and the necessary notation for our work. In Section 3, we describe the InfRM framework for integrating metadata information into each entity's hidden structure, including InfMM and InfLF. The generation distributions proposed for different forms of link data are provided in Section 4. Section 5 discusses the sampling methods and a computational complexity analysis. Experiments in Section 6 compare our methods with previous work and validate our models' performance. Conclusions and future work are in Section 7.

2. Relational Models & Notations

The stochastic blockmodel (Nowicki & Snijders, 2001) assumes that each entity has a latent variable that directly represents its community membership. Each of the fixed number of communities is associated with a weight, and the whole weight vector can be seen as a draw from a K-dimensional Dirichlet distribution. The community memberships are then realized from the multinomial distribution parameterized by this weight vector. The binary link data between two entities is determined by the communities they belong to. This model has been extended to infinitely many communities in the infinite relational model (IRM) (Kemp et al., 2006), where the Dirichlet distribution is replaced by a Dirichlet process.
2.1. Relational Models
Much recent work has been proposed to capture the complex interactions amongst entities based on the stochastic blockmodel; it can be categorized into two notable branches, both generalisations of the stochastic blockmodel. The first branch features the latent feature relational model (LFRM) (Miller et al., 2009): instead of associating an entity with only a single feature, i.e., its membership indicator, it allows a variable number of binary features to be associated with each entity. The second branch follows the mixed-membership stochastic blockmodel (MMSB), in which each entity has its own community distribution, hence having a "mixed" class of interactions with other entities. The LFRM-like work originated from (Hoff et al., 2002; Hoff, 2005), which assumes a latent real-valued feature vector for each entity. The LFRM in (Miller et al., 2009) uses a binary vector to represent the latent features of each entity, and the number of features across all entities can potentially be infinite using an Indian Buffet Process prior (Griffiths & Ghahramani, 2006; 2011). The work in (Palla et al., 2012) further uncovers the substructure within each feature and uses the "co-active" features of two entities when generating their link data. On the MMSB side, a few variants have subsequently been proposed, including (Koutsourelakis & Eliassi-Rad, 2008), which extends MMSB to the infinite-community case, and (Ho et al., 2012), which uses the nested Chinese Restaurant Process (Blei et al., 2010) to build a hierarchical structure of communities.

2.2. Notations

All notations in this paper are given in Table 1.
Table 1. Notations for our InfRM

n              number of entities
K              number of discovered communities
F              number of attributes in the metadata
φ              an n × F binary matrix; φ_{if} = 1 denotes that the ith entity possesses the fth attribute
η              an F × K positive matrix; η_{fk} indicates the importance of the fth attribute to the kth role
e_{ij}         directional, binary interactions
s_{ij}, r_{ij}  membership indicators of e_{ij} in MMSB
z_i            latent feature vector of entity i in LFRM
π_i            membership distribution for entity i; π_{ik} is the significance of community k for entity i
B              asymmetric role-compatibility matrix; B_{kl} indicates the compatibility of communities k and l
3. Involving Metadata Information

Figure 1 depicts the generative models of all the variables used in our work. In this paper, metadata information is incorporated into both branches of the stochastic blockmodel described earlier, i.e., MMSB and LFRM. Further, it can be applied to their base model IRM (Kemp et al., 2006) with a similar approach, which we do not elaborate here.
Figure 1. The generative model for several implementations. Observations are denoted in grey. The part above the dashed line corresponds to involving metadata information (Section 3); the part below corresponds to modelling link information (Section 4). C1 to C4 represent four conditional distributions in two different forms, as shown in Sections 3.1 and 3.2, respectively.
3.1. Informative Mixed Membership Model

The generative process for the informative mixed membership (InfMM) model is defined as follows (w.l.o.g., $\forall i, j = 1, \ldots, n$, $k \in \mathbb{N}^+$):

C1: $\psi_{ik} \sim \mathrm{Beta}\big(1, \prod_f \eta_{fk}^{\phi_{if}}\big)$;
C2: $\pi_{ik} = \psi_{ik} \prod_{l=1}^{k-1} (1 - \psi_{il})$;
C3: $s_{ij} \sim \mathrm{Multi}(\pi_i)$, $r_{ij} \sim \mathrm{Multi}(\pi_j)$;
C4: $e_{ij} \sim g(B_{s_{ij} r_{ij}})$.

Here C1 and C2 are the stick-breaking representation of our mixed membership distribution $\pi_i$; C3 and C4 correspond to the generation of the membership indicators and the link data, respectively. A detailed elaboration can be found in (Airoldi et al., 2008; Koutsourelakis & Eliassi-Rad, 2008). We leave equation C4 in its general form, $g(B_{s_{ij} r_{ij}})$, which may take a variety of forms, such as those described in Section 4.

We use the attribute age in the Lazega-lawfirm data to further explain the importance indicator $\eta_{fk}$ used in C1. W.l.o.g., let the $f_0$th column of the $\phi$ matrix denote the age attribute of all the entities; $\phi_{if_0} = 1$ implies that entity $i$ has age > 40 (in our experimental setting), and
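The steps C1-C3 above can be sketched as a small truncated simulation. This is a sketch only: the synthetic `phi`, `eta` and the truncation level `K` are our own illustrative choices, not values from the paper.

```python
import random
random.seed(0)

n, F, K = 5, 2, 10   # entities, attributes, stick truncation level (assumed)
phi = [[random.randint(0, 1) for _ in range(F)] for _ in range(n)]
eta = [[random.uniform(0.5, 2.0) for _ in range(K)] for _ in range(F)]

def beta_sample(a, b):
    # Beta(a, b) via two gamma draws (stdlib random.gammavariate)
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    return x / (x + y)

pi = []
for i in range(n):
    # C1: psi_ik ~ Beta(1, prod_f eta_fk^{phi_if})
    psi = []
    for k in range(K):
        b = 1.0
        for f in range(F):
            if phi[i][f]:
                b *= eta[f][k]
        psi.append(beta_sample(1.0, b))
    # C2: pi_ik = psi_ik * prod_{l<k} (1 - psi_il)
    weights, remaining = [], 1.0
    for k in range(K):
        weights.append(psi[k] * remaining)
        remaining *= 1.0 - psi[k]
    pi.append(weights)

# C3: membership indicators drawn from Multi(pi_i) (truncated stick)
s = [[random.choices(range(K), weights=pi[i])[0] for _ in range(n)]
     for i in range(n)]

assert all(0.0 < sum(row) < 1.0 for row in pi)  # truncated stick mass < 1
```

A smaller metadata product in C1 yields larger stick weights, which is how an influential attribute (small $\eta_{f_0 k}$) boosts a community.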
0 otherwise. From equation C1, one can easily see that when $\eta_{f_0 k} \ll 1$, age would largely increase the impact of the $k$th community. Likewise, $\eta_{f_0 k} \gg 1$ reduces the significance of the age attribute on the $k$th community, and $\eta_{f_0 k} = 1$ means that age has no influence on the $k$th community at all. When $\phi_{if_0} = 0$, the age of entity $i$ is neutral towards all communities.

Instead of C1, the NMDR model (Kim et al., 2012; Kim & Sudderth, 2011) uses the logistic normal distribution (with mean value the linear sum $\sum_f \phi_{if}\eta_{fk}$) to construct the stick-breaking weight $\psi_{ik}$. While that method successfully integrates the metadata information into the entity's membership distribution, it suffers from a lack of conjugacy, which makes inference inefficient. In our InfMM, we replace the logistic normal distribution with a beta distribution parameterised by $\prod_f \eta_{fk}^{\phi_{if}}$, where the positive importance indicator $\eta_{fk}$ is given a vague gamma prior $\eta_{fk} \sim \mathrm{Gamma}(\alpha_\eta, \beta_\eta)$. This leads to a conjugate property we can enjoy for both the importance indicator $\eta_{fk}$ and the stick-breaking weight $\psi_{ik}$. More specifically, the distributions of $\eta_{fk}$ and $\psi_{ik}$ are:

$p(\eta_{fk} \mid \alpha_\eta, \beta_\eta) \propto \eta_{fk}^{\alpha_\eta - 1} e^{-\beta_\eta \eta_{fk}}; \qquad p(\psi_{ik} \mid \eta_{\cdot k}, \phi_i) \propto \Big(\prod_f \eta_{fk}^{\phi_{if}}\Big) (1 - \psi_{ik})^{\prod_f \eta_{fk}^{\phi_{if}} - 1}.$   (1)

Thus, the posterior distribution of $\eta_{fk}$ becomes:

$\eta_{fk} \sim \mathrm{Gamma}\Big(\alpha_\eta + \sum_i \phi_{if},\; \beta_\eta - \sum_i \phi_{if} \Big(\prod_{F \neq f} \eta_{Fk}^{\phi_{iF}}\Big) \ln(1 - \psi_{ik})\Big).$   (2)

The joint probability of $\{s_{ij}\}_{j=1}^n, \{r_{ji}\}_{j=1}^n$ becomes:

$p(\{s_{ij}\}_{j=1}^n, \{r_{ji}\}_{j=1}^n \mid \psi_{i\cdot}) \propto \prod_{k=1}^K \Big[\psi_{ik}^{N_{ik}} (1 - \psi_{ik})^{\sum_{l=k+1}^K N_{il}}\Big],$   (3)

where $N_{ik} = \#\{j : s_{ij} = k\} + \#\{j : r_{ji} = k\}$. The posterior distribution of $\psi_{ik}$ becomes:

$\psi_{ik} \sim \mathrm{Beta}\Big(N_{ik} + 1,\; \sum_{l=k+1}^K N_{il} + \prod_f \eta_{fk}^{\phi_{if}}\Big).$   (4)

The posterior distribution in Eq. (4) is consistent with the results in (Ishwaran & James, 2001; Kalli et al., 2011), where the result is conditioned on a single concentration parameter $\alpha$ instead of $\prod_f \eta_{fk}^{\phi_{if}}$. Another interesting comparison is the placement of prior information for communities in different models. In iMMM, although different $\alpha_i$ are used to model the individual $\pi_i$, each stick-breaking weight $\psi_{ik}$ within one $\pi_i$ is generated identically, i.e., from $\mathrm{Beta}(1, \alpha_i)$. This is still insufficient for many practical applications. Accordingly, NMDR incorporates metadata information using the logistic normal function, as stated above; this further generalises the model so that each $\psi_{ik}$ differs in its distribution. Despite this relaxation, empirical results show that NMDR converges slowly. It is therefore imperative to search for a more efficient way of incorporating metadata information. Compared to iMMM, our InfMM replaces the unified $\{\alpha_i\}$ with $\prod_f \eta_{fk}^{\phi_{if}}$ for the generation of $\psi_{ik}$. Its conjugate property makes our model appealing in terms of mixing efficiency, which is confirmed by the results in Section 6.3. Moreover, our model can be seen as a natural extension of the popular iMMM: by letting $\eta_{fk} = \alpha^{1/F}$ and $\phi_{if} = 1$, we obtain the classical iMMM. This makes sense, as without the presence of metadata each feature is assumed to be counted equally, which implies that the model becomes the classical iMMM.

3.2. Informative Latent Feature Model

The generative process for the informative latent feature (InfLF) model is defined as follows:

C1: $\psi_{ik} \sim \mathrm{Beta}\big(\prod_f \eta_{fk}^{\phi_{if}}, 1\big)$;
C2: $\pi_{ik} = \prod_{l=1}^{k} \psi_{il}$;
C3: $z_{ik} \sim \mathrm{Bernoulli}(\pi_{ik})$;
C4: $e_{ij} \sim g(z_i B z_j^T)$.

Here C1 and C2 give the detailed construction of our specialized stick-breaking representation $\pi_i$. Similar to the traditional stick-breaking process (Sethuraman, 1994; Teh et al., 2007), they are used to generate the latent feature matrix $z$ in C3. C4 corresponds to the link data's generation in our model. Our work can be seen as an extension of the traditional LFRM (Miller et al., 2009).

However, our InfLF's hidden structure differs from that of LFRM. More specifically, the original LFRM uses one specialized beta process as the underlying representation for all $n$ entities' latent features $z$; this process can easily be marginalized out thanks to the Beta-Bernoulli conjugacy (Thibaux & Jordan, 2007). In our InfLF, each entity's latent features are motivated by its own stick-breaking representation $\pi_i$, i.e., there are $n$ representations in total. Thus, the individual metadata information is contained in each corresponding representation and reflected in the latent features. We use the new transform $\prod_f \eta_{fk}^{\phi_{if}}$ as the mass parameter (Thibaux & Jordan, 2007) in the construction of the stick-breaking representation, as stated in C1. The importance indicator $\eta$ here plays the opposite role to the one in InfMM, i.e., a larger value of $\eta_{fk}$ makes the presence of attribute $f$ promote the $k$th community. An interesting note is that the stick-breaking representations in our InfMM and InfLF are no longer a Dirichlet process and a beta process, respectively, as the single-valued $\alpha$ parameter is replaced by a set of individually different values $\{\prod_f \eta_{fk}^{\phi_{if}}\}$.
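As an illustration, C1-C3 of the InfLF process can be sketched with a truncated stick. The synthetic `phi`, `eta` and the truncation level are assumed for illustration only.

```python
import random
random.seed(1)

n, F, K = 4, 2, 15   # entities, attributes, truncation level (assumed)
phi = [[random.randint(0, 1) for _ in range(F)] for _ in range(n)]
eta = [[random.uniform(0.5, 2.0) for _ in range(K)] for _ in range(F)]

def beta_sample(a, b):
    # Beta(a, b) via two gamma draws (stdlib random.gammavariate)
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    return x / (x + y)

pi, Z = [], []
for i in range(n):
    # C1: psi_ik ~ Beta(prod_f eta_fk^{phi_if}, 1);  C2: pi_ik = prod_{l<=k} psi_il
    pis, running = [], 1.0
    for k in range(K):
        a = 1.0
        for f in range(F):
            if phi[i][f]:
                a *= eta[f][k]
        running *= beta_sample(a, 1.0)
        pis.append(running)
    pi.append(pis)
    # C3: z_ik ~ Bernoulli(pi_ik) gives entity i's binary latent feature vector
    Z.append([1 if random.random() < p else 0 for p in pis])

# pi_ik is non-increasing in k, so higher-indexed features are progressively rarer
assert all(row[k] >= row[k + 1] for row in pi for k in range(K - 1))
```

Note how a large metadata product enters as the first Beta parameter here (the mass), so, unlike in InfMM, a larger $\eta_{fk}$ promotes the $k$th feature.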
4. Modelling Link Data

As stated in the introduction, many real-world applications use directional count and unit link data instead
of binary link data. Thus, more appropriate generation distributions are needed to model them. We mainly discuss the MMSB case here; the LFRM case and its detailed derivations are included in the Supplementary Material.

4.1. Count Link Data

In iMMM, we propose to model the count link data with the following likelihood and prior distributions:

$e_{ij} \sim \mathrm{Poisson}(B_{s_{ij} r_{ij}}), \qquad B_{kl} \sim \mathrm{Gamma}(\alpha_B, \beta_B).$   (5)

The parameter $B_{kl}$ in the Poisson distribution reflects the compatibility between the two communities $k$ and $l$: a larger $B_{kl}$ encourages a larger $e_{ij}$. Further, since $B_{kl} \in \mathbb{R}^+$, we place a vague gamma prior on it. The resultant predictive distribution, when $B_{kl}$ is marginalized out, is the negative binomial distribution (Hilbe, 2011).

For the discovered communities $k, l$ in iMMM, we sample $B_{kl}$ as:

$B_{kl} \sim \mathrm{Gamma}\Big(\sum_{i'j'} e_{i'j'} + \alpha_B,\; m_{kl} + \beta_B\Big),$   (6)

where $i'j' = \{ij \mid s_{ij} = k, r_{ij} = l\}$ and $m_{kl} = \#\{i'j'\}$. When evaluating an undiscovered community $K+1$ in iMMM, we can marginalize the corresponding $B$ value out:

$\Pr(e_{ij} \mid \alpha_B, \beta_B) = \dfrac{\beta_B^{\alpha_B} \prod_{q=0}^{e_{ij}} (\alpha_B + q)}{e_{ij}!\, (\beta_B + 1)^{\alpha_B + e_{ij}}\, (\alpha_B + e_{ij})}.$   (7)

4.2. Unit Link Data

In modelling the unit link data, we use the beta distribution as the corresponding generation distribution. For the iMMM case, we have:

$e_{ij} \sim \mathrm{Beta}(B_{s_{ij} r_{ij}}, 1), \qquad B_{kl} \sim \mathrm{Gamma}(\alpha_B, \beta_B).$   (8)

Thus, the posterior distribution of $B_{kl}$ between the discovered communities $k, l$ is:

$B_{kl} \sim \mathrm{Gamma}\Big(m_{kl} + \alpha_B,\; \beta_B - \sum_{i'j'} \ln e_{i'j'}\Big).$   (9)

When involving an undiscovered community $K+1$, we have:

$p(e_{ij} \mid \alpha_B, \beta_B) = \dfrac{\alpha_B\, \beta_B^{\alpha_B}}{e_{ij}\, (\beta_B - \ln e_{ij})^{\alpha_B + 1}}.$   (10)

Table 2. Computational Complexity for Different Models

Models  Computational complexity
IRM     $O(K^2 n)$ (Palla et al., 2012)
LFRM    $O(K^2 n^2)$ (Palla et al., 2012)
MMSB    $O(K n^2)$ (Kim et al., 2012)
NMDR    $O(K n^2 + Kn + FKn)$
InfMM   $O(K n^2 + Kn + FKn)$
InfLF   $O(K^2 n^2 + Kn + FKn)$

5. Inference

5.1. Without Collapsing $\pi_i$

Due to the space limit, the detailed sampling procedures of both the InfMM and InfLF models are summarised in the Supplementary Material. Here we explicitly sample $\{\pi_i\}_{i=1}^n$; in fact, most of the conditional distributions used in the Gibbs sampler have been stated in the preceding sections.

5.2. $\pi_i$-Collapsed Sampling for InfMM

We have further improved the sampling strategy by collapsing the membership distributions $\pi_i$ for the finite-communities case of InfMM, and we have analysed its computational complexity. Under a finite number of communities, inferring the InfMM by collapsing the mixed-membership distributions $\{\pi_i\}_{i=1}^n$ is a promising solution. W.l.o.g., the membership indicators' joint probability for entity $i$ is:

$\Pr(\{s_{ij}\}_{j=1}^n, \{r_{ji}\}_{j=1}^n \mid \phi, \eta) \propto \dfrac{\prod_k \Gamma\big(N_{ik} + \prod_f \eta_{fk}^{\phi_{if}}\big)}{\Gamma\big(2n + \sum_k \prod_f \eta_{fk}^{\phi_{if}}\big)}.$   (11)

Thus, the conditional probability of the membership indicator $s_{ij}$ (or $r_{ij}$) is:

$\Pr(s_{ij} = k \mid z^{\setminus s_{ij}}, \phi, \eta) \propto N_{ik}^{\setminus s_{ij}} + \prod_f \eta_{fk}^{\phi_{if}}.$   (12)

Comparing to its counterpart in MMSB (Airoldi et al., 2008):

$\Pr(s_{ij} = k \mid z^{\setminus s_{ij}}) \propto N_{ik}^{\setminus s_{ij}} + \frac{\alpha}{K},$   (13)

our collapsed InfMM (cInfMM) replaces the term $\frac{\alpha}{K}$ in Eq. (13) with $\prod_f \eta_{fk}^{\phi_{if}}$. In fact, while MMSB generates the membership distribution $\pi_i$ from a Dirichlet distribution with parameters $(\frac{\alpha}{K}, \cdots, \frac{\alpha}{K})$, our cInfMM's parameter is a vector of unequal elements $(\prod_f \eta_{f1}^{\phi_{if}}, \cdots, \prod_f \eta_{fK}^{\phi_{if}})$. Due to the unknown information on undiscovered communities, we restrict our cInfMM model to the finite-communities case; the extension to the infinite-communities case remains future work.
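As a toy illustration of the collapsed update in Eq. (12), here is a single Gibbs step for one membership indicator. The counts and parameter values below are synthetic illustrations, not from the paper.

```python
import random
random.seed(3)

# Pr(s_ij = k | ...) ∝ N_ik^{\s_ij} + prod_f eta_fk^{phi_if}   (Eq. 12)
K, F = 3, 2
N_i = [5, 2, 0]              # counts N_ik with the current s_ij removed
phi_i = [1, 0]               # metadata row of entity i
eta = [[1.5, 0.7, 0.9],      # eta[f][k], importance of attribute f for community k
       [0.8, 1.2, 1.1]]

# Metadata prior term: prod over the attributes entity i possesses
prior = [1.0] * K
for k in range(K):
    for f in range(F):
        if phi_i[f]:
            prior[k] *= eta[f][k]

weights = [N_i[k] + prior[k] for k in range(K)]   # unnormalized Eq. (12)
total = sum(weights)
probs = [w / total for w in weights]
s_ij = random.choices(range(K), weights=weights)[0]

assert abs(sum(probs) - 1.0) < 1e-12
assert 0 <= s_ij < K
```

With `phi_i = [1, 0]`, only attribute 0 contributes, so the prior term is simply `eta[0][k]`; popular communities (large `N_ik`) and metadata-favoured communities both gain probability.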
5.2.1. Computational Complexity

We estimate the computational complexity of each model and present the results in Table 2. Our InfMM and InfLF are $O(Kn^2 + Kn + FKn)$ and $O(K^2n^2 + Kn + FKn)$ respectively, with $O(Kn)$ for the sampling of $\{\pi_i\}_{i=1}^n$ and $O(FKn)$ for incorporating the metadata information.

6. Experiments

We analyse the performance of our models (InfMM and InfLF) on three real-world datasets: the Lazega-lawfirm dataset (Lazega, 2001), the MIT Reality Mining dataset (Eagle & Sandy, 2006) and the NIPS Coauthoring dataset (Teh & Görür, 2009). Previous works, including IRM (Kemp et al., 2006), LFRM (Miller et al., 2009), iMMM (Koutsourelakis & Eliassi-Rad, 2008) (an infinite-community case of MMSB (Airoldi et al., 2008)), and NMDR (Kim et al., 2012), are used for comparison to validate our framework's behaviour. We have independently implemented the above benchmark algorithms to the best of our understanding. There is a slight variation to NMDR: we employ Gibbs sampling to sample the unknown cluster number, instead of the Retrospective MCMC (Papaspiliopoulos & Roberts, 2008) used in the original paper, because we have set conjugate priors on the corresponding generation distributions.

To validate the models' prediction accuracy, we use ten-fold cross-validation, where we randomly select one tenth of each entity's link data as testing data and use the rest as training data. The criteria for evaluating prediction capability are the training error (0-1 loss), the testing error (0-1 loss), the testing log likelihood, and the AUC (Area Under the ROC Curve) score; their detailed derivations can be found in the Supplementary Material. An extra study is also performed on learning the importance indicators of the metadata information in the Lazega-lawfirm dataset, as we have successfully inferred the corresponding $\eta$ values.

At the beginning of the learning process, we set a vague Gamma(1, 1) prior on all the hyperparameters, including $\alpha_\eta, \beta_\eta, \alpha_B, \beta_B$. The initial states are random guesses on the hidden labels (membership indicators in MMSB and latent features in LFRM). For all the experiments, we run chains of 10,000 MCMC samples 30 times, using the first 5,000 samples for burn-in. The average of the remaining 5,000 samples is reported.

6.1. Lazega-lawfirm Dataset

The Lazega-lawfirm dataset was collected from a corporate law firm in the northeastern U.S. during 1988-1991. It contains three different types of relations: the co-work network, the basic advice network, and the friendship network among 71 attorneys, in which each link is labelled as 1 (present) or 0 (absent). Apart from these three 71 × 71 binary asymmetric matrices, the dataset also provides metadata information on each attorney, including status (partner or associate), gender, office (Boston, Hartford or Providence), years with the firm, age, practice (litigation or corporate), and law school (Harvard, Yale, UConn or other). After binarizing these attributes, a 71 × 11 binary metadata matrix is obtained.

We conduct link prediction on the co-work network and show the results in Table 3. Notably, the performance of our implementation of the NMDR model is inferior to that reported in the original paper (Kim et al., 2012), possibly as a result of a sub-optimal metadata binarization process. However, we have shown that, with the same attributes, our InfMM performs better than NMDR in terms of training error, test capability, and convergence behaviour (including the number of burn-in samples needed and the mixing rate).

Another interesting topic is the learning of the importance indicators $\eta$ for the attributes in the metadata. We take the geometric mean of each attribute's $\eta$ values over the participating communities and show the detailed results in Table 4. As stated in Section 3.1, a smaller value indicates a larger influence.

Table 4. The recovered importance indicators $\eta$ in the Lazega-lawfirm dataset. The value of the attribute office is the smallest amongst all the values, indicating that it has the largest influence. This is consistent with common sense, as people in the same place are more likely to have a relation. The attribute practice is also more important than the others in forming the co-work link data.
office  years with the firm  status  gender  age  law school  practice
0.5596  0.4996  0.5258  0.8207  0.8371  0.7307  1.0802  0.8114  0.8268  1.1061  0.5585
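The aggregation behind Table 4 can be sketched as follows. `attribute_importance` is a hypothetical helper for the geometric-mean summary described in Section 6.1; which communities each attribute participates in is our own assumption for illustration.

```python
from math import exp, log

def attribute_importance(eta_row):
    """Geometric mean of eta_fk over the communities k that attribute f
    participates in (hypothetical aggregation following Section 6.1)."""
    return exp(sum(log(v) for v in eta_row) / len(eta_row))

# Values that balance above/below 1 give a neutral score of exactly 1 ...
assert abs(attribute_importance([0.5, 2.0]) - 1.0) < 1e-12
# ... while consistently small eta values give a small (influential) score.
assert attribute_importance([0.4, 0.6, 0.5]) < 1.0
```

Because smaller $\eta_{fk}$ boosts the stick weight in C1 of InfMM, a smaller geometric mean indicates a more influential attribute, as the text notes for office.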
6.2. MIT Reality Mining

Based on the MIT Reality Mining dataset, we obtain a proximity matrix describing each entity's proximity
Table 3. Performance on Real-world Datasets (Mean ± Standard Deviation)

Dataset        Model     Training error     Testing error      Testing log likelihood   AUC
Lazega         IRM       0.0987 ± 0.0003    0.1046 ± 0.0012    −201.7912 ± 3.3500       0.7056 ± 0.0167
Lazega         LFRM      0.0566 ± 0.0024    0.1051 ± 0.0064    −222.5924 ± 16.1985      0.8170 ± 0.0197
Lazega         iMMM      0.0487 ± 0.0068    0.1096 ± 0.0026    −202.7148 ± 5.3076       0.8074 ± 0.0141
Lazega         NMDR      0.0640 ± 0.0055    0.1133 ± 0.0018    −207.7188 ± 3.4754       0.8285 ± 0.0114
Lazega         InfMM     0.0334 ± 0.0056    0.1067 ± 0.0021    −196.0503 ± 4.3962       0.8369 ± 0.0122
Lazega         InfLF     0.0389 ± 0.0126    0.1012 ± 0.0034    −213.5246 ± 12.3249      0.8123 ± 0.0135
Lazega         cInfMM¹   0.0466 ± 0.0092    0.1119 ± 0.0020    −205.0673 ± 4.5321       0.8314 ± 0.0119
Reality        IRM       0.0627 ± 0.0002    0.0665 ± 0.0004    −133.8037 ± 1.1269       0.8261 ± 0.0047
Reality        LFRM      0.0397 ± 0.0017    0.0629 ± 0.0037    −143.6067 ± 10.0592      0.8529 ± 0.0179
Reality        iMMM      0.0297 ± 0.0055    0.0625 ± 0.0015    −126.7876 ± 3.4774       0.8617 ± 0.0124
Reality        NMDR      0.0386 ± 0.0040    0.0668 ± 0.0013    −139.5227 ± 2.9371       0.8569 ± 0.0138
Reality        InfMM     0.0269 ± 0.0047    0.0621 ± 0.0015    −127.7377 ± 3.1313       0.8507 ± 0.0134
Reality        InfLF     0.0379 ± 0.0046    0.0732 ± 0.0049    −131.0326 ± 9.4521       0.8645 ± 0.0139
Reality        cInfMM¹   0.0553 ± 0.0023    0.0641 ± 0.0011    −126.9091 ± 2.6459       0.8597 ± 0.0099
Reality        pInfMM²   0.0212 ± 0.0012    0.0601 ± 0.0012    −121.8172 ± 4.8434       0.8736 ± 0.0107
Reality        piMMM²    0.0223 ± 0.0011    0.0632 ± 0.0013    −131.8161 ± 5.6687       0.8631 ± 0.0098
NIPS Coauthor  IRM       0.0317 ± 0.0004    0.0423 ± 0.0014    −135.0467 ± 7.3816       0.8901 ± 0.0162
NIPS Coauthor  LFRM      0.0482 ± 0.0794    0.0239 ± 0.0735    −105.2166 ± 179.5505     0.9348 ± 0.0167
NIPS Coauthor  iMMM      0.0061 ± 0.0019    0.0253 ± 0.0035    −83.4264 ± 9.4293        0.9574 ± 0.0155
NIPS Coauthor  piMMM²    0.0097 ± 0.0048    0.0213 ± 0.0022    −81.49 ± 7.9465          0.9643 ± 0.0177

¹ cInfMM denotes the π_i-collapsed InfMM. ² piMMM and pInfMM denote the models using the Poisson distribution as the generation distribution.
towards the others, i.e., $e_{ij}$ represents the proximity from $i$ to $j$ based on participant $i$'s opinion. The link values indicate the average proximity from one subject to another, categorised into 5 values. When using the previous models (Koutsourelakis & Eliassi-Rad, 2008), we manually set proximity values larger than 10 minutes per day to 1, and 0 otherwise; we hence obtain a 73 × 73 asymmetric matrix. Depending on the generation distribution used, we can choose either the integer matrix or its binary version. Alongside this directional link data, we also have survey data on the entities serving as metadata, including the means of commuting to work, personal habits, social activities, communication methods, and satisfaction with university life. As we can see in Table 3, our InfMM's performance is similar to that of iMMM. The reason may be that the metadata information does not correlate with the link data. Our pInfMM and piMMM perform the best among all the models, which validates the necessity of using the Poisson distribution when encountering count link data.
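The preprocessing described above can be sketched as follows. The 10-minutes-per-day binarization threshold is from the text; the 5-category boundaries in `categorize` are our own illustrative assumptions, as the paper does not state the exact bins.

```python
def binarize(proximity, threshold=10.0):
    """Binary matrix: 1 if average daily proximity (minutes) exceeds threshold."""
    n = len(proximity)
    return [[1 if proximity[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]

def categorize(minutes, bounds=(1.0, 10.0, 30.0, 60.0)):
    """Ordinal value in {0,...,4}: the number of (assumed) thresholds exceeded."""
    return sum(minutes > b for b in bounds)

# A tiny 2x2 example proximity matrix (minutes per day, illustrative values)
P = [[0.0, 12.5],
     [3.0, 0.0]]

B = binarize(P)
C = [[categorize(v) for v in row] for row in P]

assert B == [[0, 1], [0, 0]]   # only the 12.5-minute entry passes 10 minutes
assert C == [[0, 2], [1, 0]]   # the count version keeps more granularity
```

The binary matrix feeds the Bernoulli-likelihood models, while the ordinal matrix feeds the Poisson-based pInfMM/piMMM variants.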
6.3. Convergence Behaviour

Trace plot for AUC. A trace plot of the AUC value versus iteration could help us choose an appropriate burn-in length. An earlier arrival at the stable status of an MCMC chain is desirable, as it indicates fast convergence. Figure 2 shows the detailed results.
Mixing rate for a stable MCMC. Besides the MCMC trace plot, another interesting observation is the mixing rate of the stable MCMC chains. We use the number of active communities $K$ as the monitored function of the updated variables to assess the mixing of the MCMC samples; the efficiency of the algorithms can be measured by estimating the integrated autocorrelation time $\tau$ and the Effective Sample Size (ESS) for $K$. $\tau$ is a good performance indicator, as it measures the statistical error of the Monte Carlo approximation of a target function $f$: the smaller $\tau$, the more efficient the algorithm. The ESS of the stable MCMC chains indicates the quality of the Markov chains, i.e., a larger ESS indicates more independent, useful samples, which is the desired property. Different approaches to estimating the integrated autocorrelation time are proposed in (Geyer, 1992). Here we use the estimator $\hat\tau$ of (Kalli et al., 2011), and the ESS
Title Suppressed Due to Excessive Size

Table 5. Mixing rate (mean ± standard deviation) for the different models, with the best value within each row in bold. As we can see, our model InfMM performs best among all the models (smallest τ̂ and largest ESS in every row).

Dataset  Criterion  iMMM             LFRM             NMDR             InfLF            InfMM
Lazega   τ̂         166.2 ± 90.37    310.6 ± 141.95   179.8 ± 156.96   149.2 ± 126.12   39.1 ± 40.58
Lazega   ESS        77.6 ± 38.71     40.7 ± 26.26     134.3 ± 133.12   61.2 ± 59.93     341.8 ± 132.00
Reality  τ̂         184.9 ± 78.88    113.4 ± 77.35    142.8 ± 129.99   134.2 ± 163.23   27.8 ± 22.49
Reality  ESS        62.5 ± 22.70     125.5 ± 71.93    185.0 ± 206.12   71.24 ± 48.74    449.7 ± 181.37

[Figure 2: two panels of test AUC (y-axis, 0.5–0.9) versus iterations (x-axis, 0–10,000) for iMMM, NMDR, InfMM, cInfMM, pInfMM, and piMMM.]

Figure 2. Left: Lazega dataset; Right: MIT Reality dataset. Trace plots of the AUC value versus iteration number for the different MMSB-type models. As we can see, except for NMDR, all the other models reach the stable status quite fast. On the Lazega dataset, our InfMM and cInfMM outperform all the others. On the MIT Reality dataset, our InfMM and cInfMM do not perform better than iMMM; however, our pInfMM and piMMM perform best compared to the other models.
value is calculated based on τ̂ as:

\hat{\tau} = 1 + 2\sum_{l=1}^{C-1} \hat{\rho}_l, \qquad \mathrm{ESS} = \frac{2M}{1+\hat{\tau}}. \qquad (14)

Here ρ̂_l is the estimated autocorrelation at lag l, and C is a cut-off point defined as C := min{l : |ρ̂_l| < 2/√M}, where M equals half of the original sample size, as the first half is treated as a burn-in phase. The detailed results are shown in Table 5.

6.4. NIPS Coauthor Dataset

We use co-authorship as the relation, gained from the proceedings of the Neural Information Processing Systems (NIPS) conference for the years 2000–2012. Due to the sparsity of co-authorship, we observe author activities across all 13 years (i.e., regardless of the time factor) and set the link data to 1 if the two corresponding authors have co-authored no fewer than 2 papers, which removes co-authoring randomness. Further, authors with fewer than 4 relationships to others are eliminated. We thus obtain a 92 × 92 symmetric, binary matrix. We focus on modelling the count link data in this dataset, where the actual link counts among these 92 entities are used. As the detailed results in Table 3 show, our piMMM performs better than the classical iMMM.
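Returning to the convergence diagnostics, the estimator τ̂ and the ESS of equation (14) can be computed from the sampled trace of the number of active communities K as sketched below. This is an illustration, not the authors' implementation; the function names and the synthetic trace are ours.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def iat_and_ess(k_trace):
    """tau-hat and ESS per equation (14); the first half of the chain is
    discarded as burn-in, so M is half of the original sample size."""
    x = np.asarray(k_trace, dtype=float)
    x = x[x.size // 2:]                  # keep the second half of the chain
    M = x.size
    threshold = 2.0 / np.sqrt(M)
    # Cut-off C := min{l : |rho_l| < 2/sqrt(M)}.
    C = next((l for l in range(1, M)
              if abs(autocorrelation(x, l)) < threshold), M)
    tau = 1.0 + 2.0 * sum(autocorrelation(x, l) for l in range(1, C))
    return tau, 2.0 * M / (1.0 + tau)
```

For a nearly independent chain, τ̂ is close to 1 and the ESS approaches M, matching the interpretation above: smaller τ̂ and larger ESS indicate better mixing.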
7. Conclusions & Future Work

Increasing numbers of applications with natural and social networking behaviours call for effective modelling of hidden relations and structures. This is beyond the currently available models, which involve only limited link information in binary settings. In this paper, we have proposed a unified approach to incorporating various kinds of information into relational models, including metadata information and different formats of link data. The proposed informative mixed membership (InfMM) model and informative latent feature (InfLF) model have been demonstrated to be effective in learning structure, and show advanced performance in learning implicit relations and structures. Moreover, our adaptive link data modelling method further boosts the capability of utilising rich link data in real-world scenarios. We are extending our work in the following directions: 1) how to integrate multi-relational networks and unify them into the InfMM framework to understand network structure more deeply; 2) when the link data is categorical, what
will be a proper generating distribution to describe the linkage; 3) as there are more advanced constructions for the beta process (Paisley et al., 2010; 2012), what are more flexible ways to incorporate metadata information into LFRM; and 4) when the metadata information goes beyond the binary scope and becomes continuous, how to effectively utilise such information.
References

Airoldi, Edoardo M., Blei, David M., Fienberg, Stephen E., and Xing, Eric P. Mixed membership stochastic blockmodels. The Journal of Machine Learning Research, 9:1981–2014, 2008.

Blei, David M., Griffiths, Thomas L., and Jordan, Michael I. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):7:1–7:30, February 2010. ISSN 0004-5411.

Eagle, Nathan and Pentland, Alex (Sandy). Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255–268, 2006.

Fortunato, S. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.

Geyer, Charles J. Practical Markov chain Monte Carlo. Statistical Science, 7(4):473–483, 1992.

Girvan, M. and Newman, M. E. J. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

Griffiths, Thomas L. and Ghahramani, Zoubin. The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12:1185–1224, 2011.

Griffiths, Thomas L. and Ghahramani, Zoubin. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18, 2006.

Hilbe, Joseph M. Negative Binomial Regression. Cambridge University Press, 2011.

Ho, Qirong, Parikh, Ankur P., and Xing, Eric P. A multiscale community blockmodel for network exploration. Journal of the American Statistical Association, 107(499):916–934, 2012.

Hoff, Peter D. Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association, 100(469):286–295, 2005.

Hoff, Peter D., Raftery, Adrian E., and Handcock, Mark S. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.

Ishwaran, Hemant and James, Lancelot F. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453), 2001.

Kalli, Maria, Griffin, Jim E., and Walker, Stephen G. Slice sampling mixture models. Statistics and Computing, 21(1):93–105, 2011.

Kemp, Charles, Tenenbaum, Joshua B., Griffiths, Thomas L., Yamada, Takeshi, and Ueda, Naonori. Learning systems of concepts with an infinite relational model. In Proceedings of the National Conference on Artificial Intelligence, volume 21, pp. 381. AAAI Press, 2006.

Kim, Dae Il and Sudderth, Erik B. The doubly correlated nonparametric topic model. In Neural Information Processing Systems, volume 24, 2011.

Kim, Dae Il, Hughes, Michael C., and Sudderth, Erik B. The nonparametric metadata dependent relational model. In ICML. Omnipress, 2012.

Koutsourelakis, P. S. and Eliassi-Rad, T. Finding mixed-memberships in social networks. In Proceedings of the 2008 AAAI Spring Symposium on Social Information Processing, 2008.

Lazega, E. The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. 2001.

Miller, Kurt, Griffiths, Thomas, and Jordan, Michael. Nonparametric latent feature models for link prediction. Advances in Neural Information Processing Systems, 22:1276–1284, 2009.

Nowicki, Krzysztof and Snijders, Tom A. B. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.

Paisley, John, Zaas, Aimee, Woods, Christopher W., Ginsburg, Geoffrey S., and Carin, Lawrence. A stick-breaking construction of the beta process. In International Conference on Machine Learning, 2010.

Paisley, John, Blei, David M., and Jordan, Michael I. Stick-breaking beta processes and the Poisson process. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2012.

Palla, Konstantina, Knowles, David A., and Ghahramani, Zoubin. An infinite latent attribute model for network data. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, GB, July 2012.

Papaspiliopoulos, Omiros and Roberts, Gareth O. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169–186, 2008.

Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

Tang, Lei and Liu, Huan. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1):1–137, 2010.
Teh, Y. W. and Görür, D. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems, 2009.

Teh, Y. W., Görür, D., and Ghahramani, Z. Stick-breaking construction for the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 11, 2007.

Thibaux, Romain and Jordan, Michael I. Hierarchical beta processes and the Indian buffet process. In International Conference on Artificial Intelligence and Statistics, pp. 564–571, 2007.

Yang, Tianbao, Chi, Yun, Zhu, Shenghuo, Gong, Yihong, and Jin, Rong. Detecting communities and their evolutions in dynamic social networks - a Bayesian approach. Machine Learning, 82(2):157–189, 2011.