cores fast and finding the maximum triangle core number of the graph. The experiments demonstrate the scalability and effectiveness of our ap- proach on 150+ ...... including social, facebook, and technological networks. Further, Table 3 ...
the fields of automatic clustering in multiparametric space and detection of time ... Keywords: Neural networks, Fuzzy Similarity, Data Mining, Data Visualization.
artificial intelligence (A.I.) tools based on neural networks, fuzzy-C sets or genetic algo- ... the analysis of photometric redshifts of galaxies in the Sloan Early Data .... density function of the data in the training set (id est prototype vectors
Jan 31, 2012 - N Hotel,Radiohead,Beatles. 28776 .04 1.04. Table 3: LastFm - Top support (Ï), str. correlation (ϵ), and normalized str. correlation (δlb) attribute ...
Given a network, we want to automatically capture the structural behavior (or function) of nodes via roles. Exam- ... no
Data mining queries: retrieve sets of data, called ... Constraint-based pattern mining framework. Topological patterns i
Data mining queries: retrieve sets of data, called ... Constraint-based pattern mining framework. Topological ..... Mini
Jan 1, 2009 - at providing a general framework on mining Web graphs for recommendations, (1) we first ..... the simple diffusion kernel on the hypercube can result in good ...... We use Flickr, a popular image hosting Web site and online.
Feb 1, 2009 - 4.2 Two graphs with the same number of nodes . .... people, with about 7 billion nodes. ⢠Biology contributes ..... A graph parameter is a real valued function defined on isomorphism types of graphs (including the graph K0 with ......
referred to as data mining techniques, and usually include statistical and machine ... Existing Data Mining algorithms applied in Bioinformatics are either (1).
leading open source machine learning and data mining tools, Weka and RapidMiner. Keywords: Arabic ... ambiguous and more phonetic in Arabic, certain.
RapidMiner, Weka, KEEL, KNIME, Orange, and SPSS. We also .... RapidMiner has an extensive set of tutorials which are very useful in learning how to use the.
integrate Arabic morphological analysis tools into the leading open source machine learning and data mining tools, Weka and RapidMiner. Keywords: Arabic ...
In recent years the educational data mining (EDM) and learning analytics (LA) ... to as knowledge discovery in databases (KDD), involves methods that .... Much of the automated feature distillation functionality of EDM Workbench is .... http://www.cs
[email protected] .... gorithms, though very fast, often do not give good drawings for real world ...... developer of interactive information visualization applications.
Eldar Fischer â¡. Michael Krivelevich §. Mario Szegedy ¶. Abstract. Let P be a property of graphs. An ϵ-test for P is a randomized algorithm which, given the ability.
at the Department of Mathematics, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel. Aviv, Israel, and supported by the Fritz ...
Nov 11, 2014 - ... be viewed as the cele- brated âNetflixâ challenge [4], or more in general as data prediction ... IT] 11 Nov 2014 ..... Cup and Workshop, 2007.
Dynamic graph mining is the task of searching for subgraph patterns that capture the evolution .... One of the most ac- knowledged ..... computing all occurrences of P in all target graphs Gi is O(m â mi) (which is bounded later by ..... trajectori
serve as a very useful feedback to the web site designer. Finally .... may indicate the variation of kind of graph such as scale-free graphs versus random graphs.
the graph are partial orders, represented as Message Se- ...... (API) [11]. The precision and recall of automaton min- ing is improved by a trace filtering and ...
Call the mining algorithm to obtain F. Estimate θlk ..... an advanced graph mining method with the taxonomy of labels .
Aug 23, 2006 - republish, to post on servers or to redistribute to lists, requires prior specific ..... t-frequent cycles as follows: We list the cycles of G for every.
NetMine: Mining Tools for Large Graphs - users.cs.umn.edu
NetMine: Mining Tools for. Large Graphs ... They are needed to build/test realistic graph generators ... User â Website Clickstream Graph: 222,704 nodes and ...
NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch
Introduction
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
► Graphs are ubiquitious Friendship Network [Moody ’01] 2
Graph “Patterns” Given a large graph dataset, what do we focus on? Patterns Aspects of graphs that show up frequently, in datasets from diverse domains. Degree distributions
Power Laws
Count vs Outdegree 3
Graph “Patterns” Given a large graph dataset, what do we focus on? Patterns Aspects of graphs that show up frequently, in datasets from diverse domains. Degree distributions Hop-plots “Scree” plots and others…
Effective Diameter
Hop-plot 4
Graph “Patterns” Why do we like them? They capture interesting properties of graphs. They provide “condensed information” about the graph. They are needed to build/test realistic graph generators ( useful for simulation studies). They help detect abnormalities and outliers.
5
Our Work The NetMine toolkit contains all the patterns mentioned before, and adds: The “min-cut” plot a novel pattern which carries interesting information about the graph.
A-plots a tool to quickly find suspicious subgraphs/nodes. 6
Size of mincut = 2 Two partitions of almost equal size 8
“Min-cut” plot Do min-cuts recursively. log (mincut-size / #edges)
Mincut size = sqrt(N) log (# edges)
N nodes 9
“Min-cut” plot Do min-cuts recursively. New min-cut
log (mincut-size / #edges)
log (# edges)
N nodes 10
“Min-cut” plot Do min-cuts recursively. New min-cut
log (mincut-size / #edges)
Slope = -0.5
log (# edges)
N nodes
For a d-dimensional grid, the slope is -1/d 11
“Min-cut” plot Min-cut sizes have important effects on graph properties, such as efficiency of divide-and-conquer algorithms compact graph representation difference of the graph from well-known graph types for example, slope = 0 for a random graph
12
Experiments Datasets: Google Web Graph: 916,428 nodes and 5,105,039 edges Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges User Website Clickstream Graph: 222,704 nodes and 952,580 edges
A-plots How can we find abnormal nodes or subgraphs? Visualization but most graph visualization techniques do not scale to large graphs!
18
A-plots However, humans are pretty good at “eyeballing” data ☺ Our idea: Sort the adjacency matrix in novel ways and plot the matrix so that patterns become visible to the user
We will demonstrate this on the Lucent Router graph (112,969 nodes and 181,639 edges) 19
A-plots Three types of such plots for undirected graphs… RV-RV (RankValue vs RankValue) Sort nodes based on their “network value” (~first eigenvector) RD-RD (RankDegree vs RankDegree) Sort nodes based on their degree D-RV (Degree vs RankValue) Sort nodes according to “network value”, and show their corresponding degree 20
We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal
Rank of Network Value
RV-RV plot (RankValue vs RankValue) Stripes
(even though there are no self-loops)! Rank of Network Value 21
The “teardrop” structure can be explained by degree-1 and degree-2 nodes
1
2
NV1 = 1/λ * NV2
Rank of Network Value
RV-RV plot (RankValue vs RankValue)
Degree−1 nodes
Degree−2 nodes Rank of Network Value 22
Strong diagonal nodes are more likely to connect to “similar” nodes
Rank of Network Value
RV-RV plot (RankValue vs RankValue)
Degree−1 nodes
Degree−2 nodes Rank of Network Value 23
RD-RD (RankDegree vs RankDegree) dots due to 2-node isolated components
Rank of Degree of node
• Isolated
Rank of Degree of node 24
D-RV (Degree vs RankValue) 2000 1800
Spikes
Degree Degree
1600 1400 1200 1000 800 600 400 200 0 0
50000
100000 150000 200000 250000 300000
Rank of Network Value Rank of Network Value
25
Explanation of “Spikes” and “Stripes” RV-RV plot had stripes; D-RV plot shows spikes. Why? 2−degree“Stripe” nodes nodes (forming Stripe)
“Spike” nodes high degree, but all edges to “Stripe” nodes Spike1
degree-2 nodes connecting only to the “Spike” nodes
Spike2
Stripe
2000 1800
Spikes
1600
Degree
1400 1200 1000 800 600
External connections
400 200 0 0
50000
100000 150000 200000 250000 300000 Rank of Network Value
26
A-plots They helped us detect a buried abnormal subgraph in a large real-world dataset which can then be taken to the domain experts.