Community-Based Network Alignment for Large Attributed Network
Zheng Chen¹, Xinli Yu², Bo Song¹, Jianliang Gao¹, Xiaohua Hu¹, Wei-Shih Yang²
¹College of Computing & Informatics, Drexel University, USA
²Department of Mathematics, Temple University, USA
Acknowledgement: this presentation is sponsored by the ACM SIG Travel Award.
Background
Community discovery: clustering nodes of a network based on their topology and features; a classic research topic with a long history.
Network alignment: matching nodes of two or more networks based on their topology and non-topological features (attributes); applications in fusing multi-source data.
Common ground: topological consistency and attribute consistency. Within a network, nodes with similar topology and attributes are clustered; across two networks, nodes with similar topology and attributes are matched.
Motivation: Large Network Alignment
1) Previous works on network alignment have quadratic O(n²) complexity, which is prohibitive for large networks.
2) Previous works can become inaccurate on large networks because of the increased number of potential "candidates".
3) Large networks have underlying structures. A global search wastes computational resources by not taking advantage of those structures.
Strategy: Divide-and-Conquer
Approach: Probabilistic Generative Model
1) Basis: an existing model for dividing the network, the Stochastic Block Model (SBM).
2) Feasibility: a generative model is relatively easy to extend.
3) Digestion: capable of digesting high-dimensional data, so we can go beyond topology and include node attributes.
4) Novelty: to the best of our knowledge, generative models are rarely used for the problem of network alignment.
Approach: Probabilistic Generative Model
Contribution: a novel design of the information flow between community discovery and network alignment, balancing intuition and mathematical convenience.
Approach: Probabilistic Generative Model
Main idea:
Meta-community: latent communities, represented by Dirichlet priors. Every inferred "observable" community is a mixture 𝚯 of one or more latent communities. This design is responsible for generating attributes.
Community alignment: inferred communities are "roughly" aligned before node alignment. Node alignment is then computed within the aligned communities to save time.
[Diagram: community discovery → community alignment → node alignment, where any algorithm can be plugged in for node alignment.]
Alignment feedback: any algorithm can be used for node alignment. The result feeds back to the mixture 𝚯, because intuitively a change in the alignment should affect what a community is.
Reduced Time Cost: Dilemma
Dilemma in the choice of k, the number of communities. It is unreasonable to assume k is constant; it should grow with |V|, yet we want the complexity to depend on k as little as possible.
1) The standard solution of the SBM is O(k²|V|), depending quadratically on the community number k. If we let k = √|V|, it becomes O(|V|²), which defeats all our efforts. (We want fewer communities.)
2) Put simply, if we only allow bijective community alignment (i.e., node alignment happens between one-to-one corresponding communities) and use a quadratic-complexity alignment algorithm, then the alignment complexity for the whole network is O(|V|²/k). (We want more communities.)
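The tension between the two costs can be quantified with a quick side calculation (an aside, not on the original slides): balancing the quadratic-in-k inference cost against the inverse-in-k alignment cost gives a sub-quadratic but still unsatisfying sweet spot.

```latex
% Illustrative balance of the two costs (an aside, not from the paper):
% community inference grows as k^2|V|, alignment shrinks as |V|^2/k.
\[
  k^{2}\,|V| \;=\; \frac{|V|^{2}}{k}
  \quad\Longrightarrow\quad
  k \;=\; |V|^{1/3},
  \qquad
  \text{total cost } O\!\big(|V|^{5/3}\big).
\]
```

This is why the slides that follow work to cut the k² dependence of the SBM solution down to a linear dependence on k, rather than just tuning k.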
Reduced Time Cost: Analysis
When updating the community label for a node v, the update formula is

$$\mathbf{z}^{\mathrm{new}}(v) \;=\; \arg\max_{\mathbf{z}(v)\in\{1,\dots,k\}} \big(\mathcal{L}_1 + \mathcal{L}_2\big)$$

(ℒ₂ takes constant time for one update and linear time for the whole network, so it is ignored below), where

$$\mathcal{L}_1\big(\mathbf{z}^{\mathrm{new}},\mathbf{P}\mid E\big) \;=\; \sum_{i,j\in\{1,\dots,k\}} \Big[\, e_{i,j}\ln e_{i,j} \;-\; n_{i,j}\ln n_{i,j} \;+\; (n_{i,j}-e_{i,j})\ln(n_{i,j}-e_{i,j}) \,\Big]$$

with
e_{i,j}: the number of edges between communities i and j
n_{i,j}: the maximum possible number of edges between communities i and j
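As a concreteness check, here is a minimal direct evaluation of ℒ₁ as defined above. The function name, the 0-indexed labels, and the ordered-pair edge-counting convention are illustrative assumptions, not the paper's code.

```python
import math
from collections import defaultdict

def l1_naive(edges, z, k):
    """Directly evaluate L1 = sum_{i,j} [ e_ij ln e_ij - n_ij ln n_ij
    + (n_ij - e_ij) ln(n_ij - e_ij) ] from an edge list and a
    community labeling z (node -> community label in 0..k-1)."""
    size = defaultdict(int)              # community sizes |V_i|
    for v in z:
        size[z[v]] += 1
    e = defaultdict(int)                 # e_ij over ordered pairs (i, j)
    for u, v in edges:
        e[(z[u], z[v])] += 1
        e[(z[v], z[u])] += 1
    total = 0.0
    for i in range(k):
        for j in range(k):
            # n_ij counts ordered node pairs, so it matches the ordered e_ij
            n_ij = size[i] * size[j] if i != j else size[i] ** 2 - size[i]
            e_ij = e[(i, j)]
            if n_ij == 0:
                continue
            term = -n_ij * math.log(n_ij)
            if e_ij > 0:                 # treat 0 ln 0 = 0
                term += e_ij * math.log(e_ij)
            if n_ij - e_ij > 0:
                term += (n_ij - e_ij) * math.log(n_ij - e_ij)
            total += term
    return total
```

A sweep of the update rule would evaluate this objective for each candidate label of v; the next slides analyze what that costs.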
Reduced Time Cost: Analysis
If one node v switches membership, only those e_{i,j} with i or j among v's adjacent communities change. Thus updating the e_{i,j} part of ℒ₁ costs O(k·deg(v)) per node, and O(k|E|) for one sweep of updating 𝐳.
Unfortunately, if v's membership switches from r′ to r, all of n_{r,j}, n_{i,r}, n_{r′,j}, n_{i,r′} need updating, which costs O(k²) time for ℒ₁, and O(k²|V|) for one sweep of updating 𝐳.
Reduced Time Cost: Math Technique
First try a Taylor expansion, inspired by "Kronecker Graphs: An Approach to Modeling Networks"*:

$$\ln(1-x) \;=\; -x + O(x^2), \quad \forall\, |x| < 1.$$

Then

$$(n_{i,j}-e_{i,j})\ln(n_{i,j}-e_{i,j}) \;=\; n_{i,j}\ln n_{i,j} \;-\; e_{i,j}\ln n_{i,j} \;-\; e_{i,j} \;+\; O\!\left(\frac{e_{i,j}^2}{n_{i,j}}\right)$$

$$\Rightarrow\quad \mathcal{L}_1 \;=\; \sum_{i,j\in\{1,\dots,k\}} \left[\, e_{i,j}\ln\frac{e_{i,j}}{n_{i,j}} + O\!\left(\frac{e_{i,j}^2}{n_{i,j}}\right) \right] \;-\; E.$$

*Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., & Ghahramani, Z. (2010). Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11, 985-1042.
Reduced Time Cost: Math Technique

$$\mathcal{L}_1 \;=\; \sum_{i,j\in\{1,\dots,k\}} \left[\, e_{i,j}\ln\frac{e_{i,j}}{n_{i,j}} + O\!\left(\frac{e_{i,j}^2}{n_{i,j}}\right) \right] - E, \qquad n_{i,j} = \begin{cases} |V_i||V_j| & i \neq j \\ |V_i|^2 - |V_i| & i = j \end{cases}$$

The ratio E/|V| is called the densification power. Many real-life networks have been shown to have relatively constant densification power*.
Charts are from https://view.officeapps.live.com/op/view.aspx?src=http://cs.stanford.edu/~jure/pubs/powergrowth-kdd05.ppt
*Leskovec, J., Kleinberg, J., & Faloutsos, C. (2005). Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining.
A network is asymptotically sparse if E/|V|² → 0 as |V| grows. A network with constant densification power is trivially asymptotically sparse. Thus, theoretically,

$$\mathcal{L}_1 \;\approx\; \sum_{i,j\in\{1,\dots,k\}} e_{i,j}\ln\frac{e_{i,j}}{n_{i,j}} \;-\; E$$

if the network is locally asymptotically sparse, i.e., e_{i,j}/n_{i,j} → 0 for every i, j. This is not a strong assumption: each community is a subnetwork that might come with its own constant densification power, and inter-community links are usually even sparser.
Reduced Time Cost: Math Technique

$$\mathcal{L}_1 \;\approx\; \sum_{i,j\in\{1,\dots,k\}} e_{i,j}\ln\frac{e_{i,j}}{n_{i,j}} \;-\; E$$

The "bad guy" n_{i,j} is still here. Now use n_{i,j} = |C_i||C_j|, where |C_i| is the size of community i; then

$$\sum_{i,j\in\{1,\dots,k\}} e_{i,j}\ln\frac{e_{i,j}}{n_{i,j}}
\;=\; \sum_{i,j\in\{1,\dots,k\}} e_{i,j}\ln e_{i,j}
\;-\; \sum_{i,j\in\{1,\dots,k\}} e_{i,j}\big(\ln|C_i| + \ln|C_j|\big)
\;=\; \sum_{i,j\in\{1,\dots,k\}} e_{i,j}\ln e_{i,j}
\;-\; \sum_{i=1}^{k} e_{i,*}\ln|C_i|
\;-\; \sum_{j=1}^{k} e_{*,j}\ln|C_j|$$

where e_{i,*} = Σ_j e_{i,j} and e_{*,j} = Σ_i e_{i,j}; no n_{i,j} remains.
Finally, the calculation of ℒ₁ runs in O(k|E|) time, depending linearly on k.
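A sketch of evaluating this n_{i,j}-free form from an edge list, under the same illustrative conventions as before (function name, 0-indexed labels, and ordered-pair counting are assumptions, not the paper's code):

```python
import math
from collections import defaultdict

def l1_fast(edges, z):
    """Evaluate the sparse approximation
        L1 ≈ sum_{i,j} e_ij ln e_ij - sum_i e_i* ln|C_i| - sum_j e_*j ln|C_j|
    in time linear in |E| (plus the non-empty community pairs); no n_ij
    factors are ever needed."""
    size = defaultdict(int)           # |C_i|
    for v in z:
        size[z[v]] += 1
    e = defaultdict(int)              # e_ij over ordered pairs (i, j)
    for u, v in edges:
        e[(z[u], z[v])] += 1
        e[(z[v], z[u])] += 1
    row = defaultdict(int)            # e_i* = sum_j e_ij
    for (i, j), c in e.items():
        row[i] += c
    total = sum(c * math.log(c) for c in e.values())
    # by symmetry of the ordered counts, e_*j = e_j*, so both
    # correction sums can use `row`
    total -= 2 * sum(row[i] * math.log(size[i]) for i in row)
    return total
```

Only the e_{i,j} entries touching a node's adjacent communities change on a label switch, which is what yields the O(k|E|) sweep claimed above.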
Reduced Time Cost: Heuristic Initialization
Community label initialization: a node of high degree is usually the center of a community. Start from high-degree nodes, and estimate community boundaries where the degree drops below a threshold.
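One plausible reading of this heuristic as code, assuming a BFS expansion from high-degree seeds and a degree `threshold` as the boundary test (the function, data layout, and stopping rule are illustrative assumptions, not the paper's exact procedure):

```python
from collections import deque

def init_labels(adj, threshold):
    """Heuristic initialization sketch: grow a community by BFS from each
    still-unlabeled node, visiting seeds in decreasing-degree order and
    stopping the expansion at nodes whose degree falls below `threshold`."""
    deg = {v: len(nb) for v, nb in adj.items()}
    order = sorted(adj, key=lambda v: -deg[v])   # high-degree seeds first
    label, next_label = {}, 0
    for seed in order:
        if seed in label:
            continue
        label[seed] = next_label
        q = deque([seed])
        while q:
            u = q.popleft()
            for w in adj[u]:
                # the community boundary: stop absorbing low-degree nodes
                if w not in label and deg[w] >= threshold:
                    label[w] = next_label
                    q.append(w)
        next_label += 1
    return label
```

Low-degree nodes left out of every expansion later become their own seeds, so the sketch always produces a complete labeling.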
Reduced Time Cost: Heuristic Faster Iteration
With the proposed initialization, it quickly becomes less likely for a node to switch to a non-adjacent community.

| Network (nodes⁽¹⁾) | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 | Iteration 5 |
|---|---|---|---|---|---|
| Amazon (77222) | 23677⁽²⁾/16100⁽³⁾ (20.8%⁽⁴⁾) | 5389/2458 (3.2%) | 1866/348 (0.45%) | 710/122 (0.15%) | /⁽⁵⁾ |
| PlosOne (53374) | 22096/19425 (36.4%) | 9974/3642 (6.8%) | 3161/1163 (2.2%) | 1129/373 (0.7%) | 512/143 (0.27%) |
| LinkedIn (34221) | 14020/8845 (25.8%) | 3316/1963 (5.7%) | 928/228 (0.7%) | 297/115 (0.3%) | / |
| AMiner (30045) | 15903/5642 (18.8%) | 3081/1274 (4.2%) | 866/210 (0.7%) | 282/108 (0.3%) | / |
| Fliker (21945) | 16422/10324 (47.0%) | 7798/2666 (12.1%) | 3596/503 (2.3%) | 1322/369 (1.7%) | 754/287 (1.3%) |
| Lastfm (26426) | 15650/11563 (43.8%) | 7985/2994 (11.3%) | 3297/620 (2.4%) | 1617/294 (1.1%) | 833/261 (1.0%) |

(1) number of nodes; (2)/(3) label switches / switches to a non-adjacent community; (4) non-adjacent switches as a percentage of the nodes; (5) "/" marks convergence.

After initialization and the first round, at most half of the nodes jump to a non-adjacent community.
Community Alignment
Main idea: communities form a weighted bipartite graph, with weights 𝐂_𝒾,𝒿 = (1/k) 𝚯_𝒾ᵀ 𝚯_𝒿. This satisfies 𝐂_𝒾,𝒿(i, j) = 𝐂_𝒿,𝒾(j, i), so the weighting is undirected. Cluster on this bipartite graph; each cluster is treated as one "rough" community alignment.
Reason: community alignment should not be too strict (such as one-to-one alignment), because strict alignment would introduce large errors. Community alignment should serve as guidance for the node alignment.
Method: use existing models:
Aicher, C., Jacobs, A. Z., & Clauset, A. (2013). Adapting the stochastic block model to edge-weighted networks. arXiv preprint arXiv:1305.5782.
Larremore, D. B., Clauset, A., & Jacobs, A. Z. (2014). Efficiently inferring community structure in bipartite networks. Physical Review E, 90(1), 012805.
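The bipartite weighting is just a scaled inner product of the two networks' mixture matrices. A tiny sketch (shapes, variable names, and the random 𝚯 values are illustrative; rows index meta-communities, columns index communities):

```python
import numpy as np

k = 3                                  # number of meta-communities
rng = np.random.default_rng(0)
theta_i = rng.random((k, 4))           # network 1: mixtures of 4 communities
theta_j = rng.random((k, 5))           # network 2: mixtures of 5 communities

# C = (1/k) Θ_i^T Θ_j : a 4 x 5 weighted bipartite graph over communities
C_ij = theta_i.T @ theta_j / k
C_ji = theta_j.T @ theta_i / k

# Undirected weighting: C_ij(i, j) == C_ji(j, i)
assert np.allclose(C_ij, C_ji.T)
```

The symmetry property falls out of the transpose identity (AᵀB)ᵀ = BᵀA, which is why the weighting needs no directionality convention.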
Alignment Feedback
Main idea: the alignment results should affect what a community is. In our model an inferred "observable" community is defined by the meta-community mixture 𝚯. Thus, the alignment results feed back to 𝚯.
Requirement: if the alignment changes a little, 𝚯 should not change a lot. This resembles the notion of "stability" in numerical analysis.
Method: truncated SVD.
1) Re-estimate the community-pair weights 𝐂_𝒾,𝒿(i, j) from the node-alignment scores 𝐕_𝒾,𝒿: sum the scores 𝐕_𝒾,𝒿(u, v) over u ∈ C_i, v ∈ C_j (with add-one smoothing) and normalize by the alignment mass of the rows 𝐕_𝒾,𝒿(u, ·) and columns 𝐕_𝒾,𝒿(·, v) of the two communities. The re-estimate satisfies 𝐂_𝒾,𝒿(i, j) = 𝐂_𝒿,𝒾(j, i); the weighting is undirected.
2) Decompose 𝐂_𝒾,𝒿 = 𝓤_𝒾,𝒿 𝚲_𝒾,𝒿 𝓥_𝒾,𝒿ᵀ by truncated SVD; then
𝚯_𝒾 = √(k𝚲_𝒾,𝒿) 𝓤_𝒾,𝒿ᵀ and 𝚯_𝒿 = √(k𝚲_𝒾,𝒿) 𝓥_𝒾,𝒿ᵀ.
SVD is stable, so it satisfies the requirement.
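The decomposition step can be sketched directly with NumPy. The rank r of the truncation and the random stand-in for 𝐂 are assumptions; the point is that the recovered mixtures reproduce 𝐂 under the (1/k) 𝚯_𝒾ᵀ 𝚯_𝒿 weighting:

```python
import numpy as np

k, r = 3, 2                                  # meta-communities, truncation rank
rng = np.random.default_rng(1)
C = rng.random((4, 5))                       # stand-in re-estimated weights

U, s, Vt = np.linalg.svd(C, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]        # keep the top-r singular triplets

# Θ_i = sqrt(k Λ) U^T  and  Θ_j = sqrt(k Λ) V^T
theta_i = np.sqrt(k) * np.diag(np.sqrt(s)) @ U.T   # r x 4
theta_j = np.sqrt(k) * np.diag(np.sqrt(s)) @ Vt    # r x 5

# (1/k) Θ_i^T Θ_j recovers the best rank-r approximation of C
C_rec = theta_i.T @ theta_j / k
assert np.allclose(C_rec, (U * s) @ Vt)
```

Because the truncated SVD is the best low-rank approximation and depends continuously on 𝐂, small alignment changes perturb 𝚯 only slightly, which is the stability requirement stated above.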
Experiment Setup
Environment: Matlab, 8-core Intel i7 3.00GHz machine with 32GB memory.
Compared with (each also used as a plugin in our framework):
Cosine similarity (SimAlign)
BigAlign: Koutra, D., Tong, H., & Lubensky, D. (2013). Big-Align: Fast bipartite graph alignment. In Proceedings of the 13th IEEE International Conference on Data Mining (ICDM).
FINAL: Zhang, S., & Tong, H. (2016). FINAL: Fast attributed network alignment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Network attributes:
1) Topological: degree, h-index (node significance), number of adjacent triangles, clustering coefficient, and neighborhood average degree (local topology).
2) Non-topological: lemmatized words, extracted terms, etc.
Subnetwork sampling: by edges, which better preserves topological information; the subnetwork size is determined by the minimum capacity among the algorithms being compared.
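Edge-based subnetwork sampling can be sketched in a few lines (the function name, the uniform-sampling choice, and the seed parameter are illustrative assumptions; the slides do not specify the exact sampler):

```python
import random

def sample_by_edges(edges, m, seed=0):
    """Subnetwork sampling by edges: pick m edges uniformly at random;
    the sampled node set is whatever those edges touch. Sampling edges
    rather than nodes better preserves local topology, since every
    sampled node keeps at least one of its original links."""
    rng = random.Random(seed)
    sub = rng.sample(list(edges), m)
    nodes = {u for edge in sub for u in edge}
    return nodes, sub
```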
Experiment Results: Homogeneous Networks
Data sets:
1) Amazon co-purchase network (https://snap.stanford.edu/data/): if a product is frequently co-purchased with another product, there is a link between them; title words are the non-topo attributes.
2) PlosOne co-author network (http://journals.plos.org): if two scholars co-authored a publication, there is an edge between them; extracted terms are the non-topo attributes.
Two overlapping subnetworks of about 40K and 20K nodes, respectively, are extracted from the above networks with node identities masked.
Experiment Results: Homogeneous Networks
[Accuracy comparison table; check marks indicate the better results.]
Pre-alignment: some ground-truth nodes are aligned and fixed throughout the experiment.
Accuracy: measured on the remaining ground-truth nodes.
Conclusion: all the alignment algorithms perform better in our community-based approach than in their original global search. Divide-and-conquer not only saves time, but can also improve accuracy for big data.
Experiment Results: Homogeneous Networks
Higher and leftward points indicate better time-accuracy balance.
Conclusion: The community-based approach saves time, just as expected.
Experiment Results: Homogeneous Networks
[Memory usage comparison; check marks indicate the better results.]
Conclusion: the memory usage is much better than that of the other two models, UniAlign and FINAL-N. An analysis of linear memory usage is in our paper.
Experiment Results: Heterogeneous Networks
Data sets (from AMiner, https://aminer.org/data-sna):
1) ArnetMiner-LinkedIn: the first is a co-author network, the second a co-view network; words extracted from profile data are the non-topo attributes.
2) Fliker-Last.fm: the first is a friendship network, the second a follower network; username and gender are the non-topo attributes.
Conclusion: in accuracy our approach is better or competitive. In some experiments our approach requires more time, because the densification power is larger in these networks and our approach then needs more "unit" computation time.
Experiment Results: Parallelization
Simple parallelization: each thread is responsible only for the parameter updates of some of the communities, and one thread handles the alignment.
Conclusion: Stable accuracy, sub-quadratic time use.
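The partitioning scheme above can be sketched as follows. The original implementation is in Matlab; this Python outline with a placeholder update function only illustrates the "one worker per community share" idea, not the actual model updates:

```python
from concurrent.futures import ThreadPoolExecutor

def update_community_params(comm_ids):
    # Placeholder for the per-community parameter updates; the real
    # work would update the SBM / mixture parameters for each community.
    return {c: f"updated-{c}" for c in comm_ids}

def parallel_sweep(k, n_threads=4):
    """One sweep: split the k communities round-robin across workers,
    run the updates concurrently, and merge the results."""
    chunks = [list(range(i, k, n_threads)) for i in range(n_threads)]
    out = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(update_community_params, chunks):
            out.update(part)
    return out
```

Because communities rarely share parameters, the workers need little synchronization, which is what keeps the accuracy stable under this simple scheme.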
Experiment Results: Parallelization
Conclusion: Needed rounds of iterations grow slowly with the number of nodes.
Experiment Results: Parallelization
Conclusion: our approach is capable of processing millions of nodes on a single modern computer.
Future Work
1) Bipartite networks.
2) Non-parametric methods.
3) Experiments on community discovery.
4) Application in fusing multi-source data.
Thank You!