Data mining queries: retrieve sets of data, called ... Constraint-based pattern mining framework. Topological patterns i
Céline ROBARDET
Mon, 29.09. Resource-aware Machine Learning
1 / 78
Different kinds of torrents of data .
Resource-aware Machine Learning
2 / 78
Potential increase of our knowledge .
Resource-aware Machine Learning
3 / 78
Viewed as attributed dynamic graphs .
Resource-aware Machine Learning
4 / 78
Mining network data .
Network data brings several questions: • Working with network data is messy • Not just “wiring diagrams” but also dynamics and data (features, attributes) on nodes and edges • Computational challenges • Large scale network data • Algorithmic models as vocabulary for expressing complex
scientific questions • Social science, physics, biology +
Understanding how network structure and node attribute values relate and affect each other.
Resource-aware Machine Learning
5 / 78
Database versus Inductive queries Database queries: retrieve transactions that match a search criteria
+
. Data mining queries: retrieve sets of data, called patterns, that match some criteria
the criteria is computed on individual data or on sets of data
Resource-aware Machine Learning
6 / 78
Constraint-based pattern mining .
Developed for extracting itemsets in transaction databases
Resource-aware Machine Learning
7 / 78
Can be used to analyze graphs . Graphs that are often • Dynamic: when nodes and edges appear/disappear through time • Attributed: when another relation describes the nodes or the
edges themselves Relational graph +
Nodes are uniquely identified through time
Resource-aware Machine Learning
8 / 78
Some inductive queries on graphs . • What are the vertex
attributes that strongly co-vary with the graph structure?
Co-authors that published at ICDE with a high degree and a low clustering coefficient Resource-aware Machine Learning
• What are the sub-graphs
whose vertex attributes evolve similarly?
Airports whose arrival delays increased over the three weeks following Katrina hurricane
9 / 78
Outline of the talk . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives
Resource-aware Machine Learning
10 / 78
.
Constraint-based pattern mining framework
Resource-aware Machine Learning
Constraint-based pattern mining framework
11 / 78
Pattern mining .
A Pattern φ describes a subgroup of the data D • observed several times • or associated with characteristic properties
The pattern shape is fixed: φ ∈ L +
whose cartinality is exponential or infinite in the size of the data
C evaluates the adequacy of the pattern to the data C(φ, D) → Boolean
Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ ∈ L | C(φ, D) is true } Resource-aware Machine Learning
Constraint-based pattern mining framework
12 / 78
Pattern constraints .
Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of
the data • making the extraction feasible
Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü
They are defined up to the partial order ⪯ used for listing the patterns
Resource-aware Machine Learning
Constraint-based pattern mining framework
13 / 78
Pattern constraints .
Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of
the data • making the extraction feasible
Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü
They are defined up to the partial order ⪯ used for listing the patterns
Resource-aware Machine Learning
Constraint-based pattern mining framework
13 / 78
Pattern constraints .
Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of
the data • making the extraction feasible
Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü
They are defined up to the partial order ⪯ used for listing the patterns
Resource-aware Machine Learning
Constraint-based pattern mining framework
13 / 78
Constraint properties - 1 .
Monotone constraint
Anti-monotone constraint
∀φ1 ⪯ φ2 , C(φ1 , D) ⇒ C(φ2 , D)
∀φ1 ⪯ φ2 , C(φ2 , D) ⇒ C(φ1 , D)
abcde
abcde
abcd abce abde acde bcde
abcd abce abde acde bcde
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
ab
ac
ad
ae
bc
bd
be
cd
ce
de
ab
ac
ad
ae
bc
bd
be
cd
ce
de
a
b
c
d
e
C(φ, D) ≡ b ∈ φ ∨ c ∈ φ Resource-aware Machine Learning
a
b
c
d
e
C(φ, D) ≡ a ̸∈ φ ∧ c ̸∈ φ
Constraint-based pattern mining framework
14 / 78
Constraint properties - 2 .
Loose AM constraints
Convertible constraints
⪯ is extended to the prefix order ≤ so C(φ, D) ⇒ ∃e ∈ φ : C(φ \ {e}, D) that ∀φ1 ≤ φ2 , C(φ2 , D) ⇒ C(φ1 , D) abcde
abcde
abcd abce abde acde bcde
abcd abce abde acde bcde
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
ab
ac
ad
ae
bc
bd
be
cd
ce
de
a
b
c
d
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
ab
ac
ad
ae
bc
bd
be
cd
ce
de
C(φ, w) ≡ avg(w(φ)) > σ w(a) ≥ w(b) ≥ w(c) ≥ w(d) ≥ w(e) Resource-aware Machine Learning
a
e
b
c
d
e
C(φ, w) ≡ var(w(φ)) ≤ σ
Pei and Han - 2000 Constraint-based pattern mining framework
Bonchi and Lucchese - 2007 15 / 78
Enumeration strategy .
Binary partition: the element 'a' is enumerated abcde
abcd abce abde acde bcde
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
ab
ac
ad
ae
bc
bd
be
cd
ce
de
a
Resource-aware Machine Learning
b
c
d
e
Constraint-based pattern mining framework
16 / 78
Enumeration strategy .
Binary partition: the element 'a' is enumerated R∨
R∨
.
a ∈ R∨ \ R∧
R∧ ∪ {a} R∨ \ {a}
R∧
R∧
Resource-aware Machine Learning
Constraint-based pattern mining framework
16 / 78
Constraint evaluation .
Monotone constraint R∨ C(R∨ , D) is false . empty
R∧
Resource-aware Machine Learning
Anti-monotone constraint R∨
. empty C(R∧ , D) is false R∧
Constraint-based pattern mining framework
17 / 78
Constraint evaluation .
Monotone constraint R∨ C(R∨ , D) is false . empty
R∧
Resource-aware Machine Learning
Anti-monotone constraint R∨
. empty C(R∧ , D) is false R∧
Constraint-based pattern mining framework
17 / 78
Constraint evaluation .
Monotone constraint R∨ C(R∨ , D) is false . empty
R∧
Resource-aware Machine Learning
Anti-monotone constraint R∨
. empty C(R∧ , D) is false R∧
Constraint-based pattern mining framework
17 / 78
Constraint evaluation .
Monotone constraint R∨ C(R∨ , D) is false . empty
R∧
Resource-aware Machine Learning
Anti-monotone constraint R∨
. empty C(R∧ , D) is false R∧
Constraint-based pattern mining framework
17 / 78
A new class of constraints .
Piecewise monotone and anti-monotone constraints 1.
C involves p times the pattern φ: C(φ, D) = f(φ1 , · · · φp , D)
2.
fi,φ (x) = (φ1 , · · · φi−1 , x, φi+1 , · · · , φp , D)
3.
∀i = 1 . . . p, fi,φ is either monotone or anti-monotone: { fi,φ (x) ⇒ fi,φ (y) iff fi,φ is monotone ∀ x ⪯ y, fi,φ (y) ⇒ fi,φ (x) iff fi,φ is anti-monotone
Resource-aware Machine Learning
L. Cerf, J. Besson, C. Robardet, J-F. Boulicaut - 2009 Constraint-based pattern mining framework 18 / 78
An example . • ∀e, w(e) ≥ 0 • C(φ, w) ≡ avg(w(φ)) > σ ≡
∑
w(e) |φ|
e∈φ
> σ.
C(φ, D) is piecewise monotone and anti-monotone with ∑ e∈φ1 w(e) f(φ1 , φ2 , D) = |φ2 | ∀x ⪯ y, • f1,φ is monotone: f(x, φ2 , D) = • f2,φ is anti-monotone: ∑
f(φ1 , y, D) =
e∈φ1
Resource-aware Machine Learning
|y|
w(e)
∑
e∈x w(e) |φ2 |
∑
>σ⇒
e∈φ1
|x|
w(e)
∑
>σ⇒
w(e) |φ2 |
e∈y
>σ
>σ
Constraint-based pattern mining framework
19 / 78
Piecewise constraint exploitation . R∨
Evaluation
If f(R∨ , R∧ , D) = then R is empty.
∑
e∈R∨ w(e) |R∧ |
≤σ
. empty
R∧
Propagation • ∃e ∈ R∨ \ R∧ such that f(R∨ \ {e}, R∧ , D) ≤ σ, then e is moved in
R∧
• ∃e ∈ R∨ \ R∧ such that f(R∨ , R∧ ∪ {e}, D) ≤ σ, then e is removed
from R∨
Resource-aware Machine Learning
Constraint-based pattern mining framework
20 / 78
Algorithmic principles . Function Generic_CBPM_enumeration(R∨ , R∧ ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
if Check_constraints(R∧ , R∨ ) then (R∧ , R∨ )← Constraint_Propagation(R∧ , R∨ ) if R∧ = R∨ then output R∧ else for all e ∈ R∨ \ R∧ do Generic_CBPM_Enumeration(R∧ ∪ {e}, R∨ ) Generic_CBPM_Enumeration(R∧ , R∨ \ {e}) end for end if end if
Resource-aware Machine Learning
Constraint-based pattern mining framework
21 / 78
Case studies . Mining of • Formal concepts [IDA journal 05] • Fault-tolerant patterns [KDID 05, ICCS 06] • Closed patterns in n-ary relations [SDM 08] • Parallel episodes [SDM 09] • Subspace clustering [SDM 09] • Topological patterns in static attributed graphs [TKDE 13] • Evolution patterns in dynamic graphs [ICDM 09] • Trend dynamic sub-graphs [DS 12, PKDD 13] • Triggering patterns [ASONAM 14]
Resource-aware Machine Learning
Constraint-based pattern mining framework
22 / 78
.
Topological patterns in static attributed graphs
Resource-aware Machine Learning
Topological patterns in static attributed graphs
23 / 78
Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Topological properties Rank correlation constraint Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning
Topological patterns in static attributed graphs
24 / 78
Mining attributed static graphs .
• Composed by two relations • Are these two relations
linked? + Do the attribute values depend on the role played by the vertex in the graph?
New pattern domain that identifies sets of node attributes that co-vary with the graph structure +
the graph structure is captured by topological properties
Resource-aware Machine Learning
A. Prado, M. Plantevit, C. Robardet, J-F. Boulicaut - 2013 Topological patterns in static attributed graphs 25 / 78
Topological node properties .
Topological properties are derived from the edges of the graph • Direct neighborhood ▶
degree, clustering
• Connection to all other
vertices ▶ Centrality measures Betw. Cent. EigenVector Cent. Size of the community Clust. Coeff. Degree cent. Microscopic View
Resource-aware Machine Learning
Macroscopic View Closeness Cent.
PageRank Size of the largest quasi-cliques #quasi-cliques involving v
Vertex attributes
Topological patterns in static attributed graphs
26 / 78
Topological patterns . h i t 2 5.1 10
A h 9
Topological pattern language
i 3.5
t 11
h i t 25 1.5 18
0 h
i 2 5.2
h 17
ms ≡ (m, s) ∈ M × {+, −}
• L = 2M×{+,−}
F h 13
1
C
i 2
t 9
t 6
N
166
E
h 47
i 0
t 8
h 32
i 1
t 12
P i 7
i 2.5
0
i 6
O t 1
t 15
h 3
i 5
t 3
I K
i 5.3
t 0
27 / 78
14
h 10
12
Betweenness centrality
Topological patterns in static attributed graphs
i 4
0
J
M h 5
0
h 8
0
t 2 h 1
Resource-aware Machine Learning
G
t 7
91
54
h 0
t 14
0
52
73
D
• M × {+, −}
i 4.25
H
B
• M: set of all properties +
h 7
0
L
i 3
1
h i 6 4.5
t 4
t 5
Pattern examples .
Co-authorship network • Node attributes: number of publications in some conferences
(KDD, ICDM, VLDB, etc.).
Patterns that can be discovered • The higher the number of publications at VLDB and SIGMOD, the
higher the centrality value. • The higher the number of publications at SAC • The lower the number of publications at KDD and ICDM. • The lower the centrality value.
Resource-aware Machine Learning
Topological patterns in static attributed graphs
28 / 78
How to extract such patterns? . “Propositionalization": the topological properties are associated to each node.
+
Tabular data
Resource-aware Machine Learning
Topological patterns in static attributed graphs
29 / 78
Topological constraint .
Sets of properties that behave in a similar manner for a large number of vertex pairs • Kendall tau rank correlation coefficient + +
based on the ranks of pattern property values counts the number of vertex pairs that are ordered similarly on all pattern properties
Age .
. Income
.
.
.
Resource-aware Machine Learning
.
.
.
.
.
.
.
.
.
.
Topological patterns in static attributed graphs
.
.
.
.
.
.
30 / 78
.
Example .
P = {KDD+ , ClusCoef− } • Supported by 5 pairs • Ex: (A2,A1) • suppτ (P) =
5
5 2
Resource-aware Machine Learning
=
1 2
Topological patterns in static attributed graphs
31 / 78
Constraint properties .
• Let P ∈ L, Suppτ (P) = {
▷s ≡
< when s is + > when s is −
|{(u,v)∈V2 | ∀ms ∈P:m(u)▷s m(v)}|
(2n)
with
and n is the number of vertices
Ctopo (P, D) ≡ Suppτ (P) ≥ σ
• Ctopo is anti-monotone • Redundant symmetrical patterns
Suppτ (A+ , B− ) = Suppτ (A− , B+ ) • Support computation quadratic on the number of vertices: + +
An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices
Resource-aware Machine Learning
Topological patterns in static attributed graphs
32 / 78
Constraint properties .
• Let P ∈ L, Suppτ (P) = {
▷s ≡
< when s is + > when s is −
|{(u,v)∈V2 | ∀ms ∈P:m(u)▷s m(v)}|
(2n)
with
and n is the number of vertices
Ctopo (P, D) ≡ Suppτ (P) ≥ σ
• Ctopo is anti-monotone • Redundant symmetrical patterns
Suppτ (A+ , B− ) = Suppτ (A− , B+ ) • Support computation quadratic on the number of vertices: + +
An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices
Resource-aware Machine Learning
Topological patterns in static attributed graphs
32 / 78
Two case studies .
Movie Graph
DBLP Graph
• Netflix and IMDb (1998-2005). • Users rate movies (1 to 5 stars). • 5,972 nodes (movies). • 64 338 edges (common actor). • 5 local attributes: Release year, num_users (that rate the movie), avg_rating, stdev_rating, et num_actors • 9 topological properties
Resource-aware Machine Learning
• 42 252 nodes. • 210 320 edges (common
publication). • 29 local attributes: Nb of publications (1990-2013) in 29 journals and conference venues • 9 topological properties
Topological patterns in static attributed graphs
33 / 78
Some results . ä
{avg_rating+ , num_customers+ }
“People rate the films they like." ä
{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks )
ä
{stdev_rating+ , PageRank− } “Controversial movies are isolated."
Resource-aware Machine Learning
Topological patterns in static attributed graphs
34 / 78
Some results . ä
{avg_rating+ , num_customers+ }
“People rate the films they like." ä
{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks )
ä
{stdev_rating+ , PageRank− } “Controversial movies are isolated."
Resource-aware Machine Learning
Topological patterns in static attributed graphs
34 / 78
Some results ä
{avg_rating+ , num_customers+ }
.
“People rate the films they like." ä ä
{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks ) {stdev_rating+ , PageRank− } “Controversial movies are isolated."
Resource-aware Machine Learning
Topological patterns in static attributed graphs
34 / 78
Results on DBLP - 1 .
Is publishing at SAC penalizing? • {SAC+ , ECML/PKDD− }, {SAC+ , KDD− }, {SAC+ , VLDB− } • {SAC+ , PageRank− } • Of course not! • Bias in the data (SAC has a much wider spectrum than databases
and data mining).
Resource-aware Machine Learning
Topological patterns in static attributed graphs
35 / 78
Results on DBLP - 1 .
Is publishing at SAC penalizing? • {SAC+ , ECML/PKDD− }, {SAC+ , KDD− }, {SAC+ , VLDB− } • {SAC+ , PageRank− } • Of course not! • Bias in the data (SAC has a much wider spectrum than databases
and data mining).
Resource-aware Machine Learning
Topological patterns in static attributed graphs
35 / 78
Results on DBLP - 2 .
{Prk+ , Deg+ , Betw+ , ClusCoef− } • Dense part ≡ database • Other parts ≡ NLP, ML
Conclusion: These behaviors are not specific to a community!
Resource-aware Machine Learning
Topological patterns in static attributed graphs
36 / 78
PageRank and conferences . Top 5 publications related to the emergence of {Deg+ } and {Betw+ } for Prk+ (A) and the top 5 authors (B) (A) Rank 1 2 3 4 5
Deg Publication ECML/PKDD+ IEEE TKDE+ PAKDD+ DASFAA+ ICDM+
Factor 2.5 2.28 2.21 2.09 1.95
Resource-aware Machine Learning
Between Publication Factor PVLDB+ 5.67 EDBT+ 5.11 VLDB J.+ 4.35 SIGMOD+ 4.25 ICDE+ 3.42
(B) Prk+ Deg+ Prk+ Between+ + + ECML/PKDD PVLDB Christos Faloutsos Gerhard Weikum Jiawei Han Jiawei Han Philip S. Yu David Maier Bing Liu Philip S. Yu C. Lee Giles Hector GarciaMolina
Topological patterns in static attributed graphs
37 / 78
.
Mining dynamic graphs
Resource-aware Machine Learning
Mining dynamic graphs
38 / 78
Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Constrained subgraphs in static graphs Temporal relationships Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning
Mining dynamic graphs
39 / 78
Objective .
• Study the evolution of a
graph over time • Capture strong interactions
in the graph and their evolution over time + Evolving communities of social dynamic graphs
Resource-aware Machine Learning
Mining dynamic graphs
C. Robardet - ICDM 2009 40 / 78
The data . Time 1
Time 2
Time 3
Focus on the microscopic level and propose a constraint-based mining approach to uncover evolving patterns.
Resource-aware Machine Learning
Mining dynamic graphs
41 / 78
Mine subgraphs in each static graph . Time 1
Time 2
Time 3
⇓
⇓
⇓
Compute highly connected subgraphs that are also isolated.
Resource-aware Machine Learning
Mining dynamic graphs
42 / 78
Pseudo-clique constraint . Subgraphs of interest are usually those made of vertices that have a high density of edges. ⇒ Cliques are subgraphs with maximal density ⇒ Pseudo-cliques relax this strong property using a user-defined threshold σ ∈ [0, 1]: S is a pseudo-clique iff
2|ES | ≥σ |S|(|S| − 1)
Properties The pseudo-clique constraint is not anti-monotonic: Expanding a subgraph by adding a vertex could make the density increase or decrease. Resource-aware Machine Learning
Mining dynamic graphs
43 / 78
Loose anti-monotone constraint .
Pseudo-cliques can always be grown from a smaller pseudo-clique with one vertex less [F. Zhu et al. (PAKDD 2007)] • Let S be a pseudo-clique • Let v⋆ be a vertex of S having the smallest degree on S • Thus S \ {v⋆ } is also a pseudo-clique σ=
2 3
Resource-aware Machine Learning
S \ {v⋆ }
S
Mining dynamic graphs
44 / 78
Other useful constraints .
Maximality and isolated constraint Not all the pseudo-cliques of a graph are of importance: • Some are redundant (because non maximal) • Others have many links to external vertices
The isolation constraint imposes a maximum to the average number of external links per vertex: ∑ u∈S (deg(u) − degS (u)) ≤γ |S|
Resource-aware Machine Learning
Mining dynamic graphs
45 / 78
Other useful constraints . σ = 0.7
Without Isolated constraint
9 patterns
Resource-aware Machine Learning
With γ = 1
1 pattern
Mining dynamic graphs
46 / 78
Temporal relationships among subgraphs .
Time 1
Time 2
Time 3
⇓
⇓
⇓
⇒
Combine patterns from consecutive time stamps to construct a global model of the dynamic of the graphs.
Resource-aware Machine Learning
Mining dynamic graphs
47 / 78
Global model of dynamic graph .
Objective
Structured the numerous pseudo-cliques obtained to answer the following questions: • Do the strong interactions growth, diminish or remain stable over
time? • When do the change occur?
Temporal relationships between time consecutive subgraphs • Stability: S remains the same between t − 1 and t • Growth: S enlarges at time t • Diminution: S shrinks at time t • Extinction: S disappears at time t • Emergence: S appears at time t Resource-aware Machine Learning
Mining dynamic graphs
48 / 78
Lyon's shared bicycle system Velo'v .
Velo'v system • 340 stations spread in Lyon • 4000 bikes available in those stations • rental at any station, return it at any other one
Velo'v data • More than 13 millions of bicycle trips • Time-stamps of the trip • Rent and return station IDs Resource-aware Machine Learning
Mining dynamic graphs
49 / 78
Results for Velo'v graph .
Evolving patterns (σ = 0.8 and γ = 5) of Velo'v stations and their localization in Lyon. Patterns are shadowed on the map. Resource-aware Machine Learning
Mining dynamic graphs
50 / 78
.
Trend dynamic sub-graphs
Resource-aware Machine Learning
Trend dynamic sub-graphs
51 / 78
Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Definition of patterns Definition of constraints Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning
Trend dynamic sub-graphs
52 / 78
Dynamic attributed graph .
The data {Gt = (V, Et ), t = 1, · · · , tmax } and A a set of ordinal attributes: ai : V × T → Di , with Di the domain of ai
Resource-aware Machine Learning
Trend dynamic sub-graphs
53 / 78
Trend Dynamic Sub-graphs .
What are the sub-graphs whose vertex attributes evolve similarly? Mining maximal sub-graphs that satisfy some constraints on the graph topology and on the attribute values • The connectivity of the dynamic subgraphs is constrained by a
maximum diameter value. • To be more robust towards intrinsic inter-individual variability, we
do not compare raw numerical values, but their trends.
Resource-aware Machine Learning
E. Desmier, M. Plantevit, C. Robardet, J-F. Boulicaut - PKDD 2013 Trend dynamic sub-graphs 54 / 78
Language of patterns . L = {(U, S, Ω) | U ⊂ V, S = ⟨t1 , · · · , ts ⟩, Ω ⊂ A × {+, −}} − Example: ({A, C, D}, ⟨t0 , t1 ⟩, {a+ 1 ,a3 }) ∈ L
a1 0
a2 2
D
a2 1
t0
Resource-aware Machine Learning
a3 4
a1 2
a2 0
C
a1 3
a2 2
a3 4
a1 1
a2 3
.
a3 2
F
C E
D
a3 0
a1 4
a2 2
t1
Trend dynamic sub-graphs
a3 1
a1 3
a2 1
a2 1
a3 0
.
.
. F E
a1 2
a1 2 B
A
.
a3 4
.
a1 a2 1 3
a3 0
.
.
a3 5
.
C
a2 1
a1 a2 6 3
a3 0
B
A
a1 2
a2 1
.
E
a1 2
a1 5
a2 4
D .
a3 1
.
.
a3 1
a2 2
.
.
. .
F
a1 5
.
a3 5
.
a2 3
a3 5
B
A
a1 0
a2 4
.
a1 2
.
a3 3
.
a1 a2 3 5
a3 3
a1 5 t2
55 / 78
a2 1
a3 0
a3 3
Definition of constraints .
A pattern (U, S, Ω) satisfies
• Cdiam ((U, S, Ω), D) ≡ diameterGt (U) ≤ k, ∀t ∈ S
...
1
...
...
Diameter
• Ctrend ((U, S, Ω), D) ≡ ∀u ∈ U, ∀t ∈ S, ∀(a, m) ∈ Ω
{
a(u, t) < a(u, t + 1), if m = + a(u, t) > a(u, t + 1), if m = −
• Cmax ((U, S, Ω), D) ≡ U, S and Ω cannot be enlarged without
invalidating one or both of the above constraints.
Resource-aware Machine Learning
Trend dynamic sub-graphs
56 / 78
Constraint properties .
• Ctrend ((U, S, Ω), D) is anti-monotone • Cdiam ((U, S, Ω), D) is piecewise monotone:
diameterGt (U) ≤ k ≡ max dGt (U) (v, w) ≤ k v,w∈U
f(U1 , U2 , D) = max dGt (U2 ) (v, w) v,w∈U1
• f1,U is anti-monotone i.e. if it is satisfied on U1 , it is also satisfied for
any of its subsets; • f2,U is monotone i.e. if it is satisfied on Gt (U2 ), then, adding some
vertices and edges to Gt (U2 ) will not increase its value +
We can use the propagation mechanisms
• Cmax ((U, S, Ω), D) is pushed using specific mechanisms Resource-aware Machine Learning
Trend dynamic sub-graphs
57 / 78
Katrina hurricane results .
Top pattern w.r.t. time specificity (in red)
Top pattern w.r.t. trend specificity (Yellow)
• 71 airports whose arrival delays increase over 3 weeks.
• 5 airports whose number of flights increased over 3 weeks
• Arrival delays never increased in these airports during another week.
• Substitutions flights were provided from these airports during this period.
• The hurricane strongly influenced the domestic flight punctuality.
• This behavior is rather rare in the rest of the graph
Resource-aware Machine Learning
Trend dynamic sub-graphs
58 / 78
.
Triggering patterns of topology changes
Resource-aware Machine Learning
Triggering patterns of topology changes
59 / 78
Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Triggering patterns Conclusions and perspectives
Resource-aware Machine Learning
Triggering patterns of topology changes
60 / 78
Context & motivation . Networks structurally change over time: How to describe these dynamics? a: 4 b: 1 c: 5 deg: 1
deg +
c-
a+ b + u2
u2
u1
u3
u5
u4
a: 2 b: 0 c: 6 deg: 2
u2
u2
a: 7 b: 4 u1 c: 5 deg: 1
u3
u5
u4
a: 2 b: 1 c: 6 deg: 2
a: 7 b: 5 c: 2 deg: 1
u1
u3
u5
u4
a: 5 b: 4 c: 5 deg: 2
a: 8 b: 5 c: 1 deg: 4
u3
u5
u4
2
a: 5 b: 5 c: 5 deg: 2
3
4
u2
a: 9 b: 6 u1 c: 1 deg: 4
u3
u5
u4
c-
a+ b +
1
u2
u1
a: 6 b: 5 c: 2 deg: 2
a: 9 b: 6 u1 c: 0 deg: 4
u3
u5
u4
a: 7 b: 5 c: 1 deg: 4
deg +
5
6
Intuition Consider attributed graphs evolving through time The variation of some attribute values (nodes attributes) of a node can lead in several cases to a structural change (topological properties).
Resource-aware Machine Learning
M. Kaytoue, Y. Pitarch, M. Plantevit, C. Robardet -- ASONAM 2014 Triggering patterns of topology changes 61 / 78
Context & motivation .
a: 4 b: 1 c: 5 deg: 1
deg +
c-
a+ b + u2
u2
u1
u3
u5
u4
a: 2 b: 0 c: 6 deg: 2
u2
u2
a: 7 b: 4 u1 c: 5 deg: 1
u3
u5
u4
a: 2 b: 1 c: 6 deg: 2
a: 7 b: 5 c: 2 deg: 1
u1
u3
u5
u4
a: 5 b: 4 c: 5 deg: 2
a: 8 b: 5 c: 1 deg: 4
u2
u1
u3
u5
u4
a: 5 b: 5 c: 5 deg: 2
2
u3
u5
u4
a: 6 b: 5 c: 2 deg: 2
a: 9 b: 6 u1 c: 0 deg: 4
u3
u5
u4
c-
a+ b +
1
u2
a: 9 b: 6 u1 c: 1 deg: 4
3
4
deg +
6
5
Triggering pattern example: ⟨{a+ , b+ }, {c− }, {deg+ }⟩ a+ b+ c− deg+
Updating his status more often giving positive opinions about others receiving less negative opinions from the others is often followed by an increase of user's popularity
Resource-aware Machine Learning
Triggering patterns of topology changes
62 / 78
a: 7 b: 5 c: 1 deg: 4
Dynamic attributed graphs . Let G = {G1 , . . . , Gt } be a sequence of t static attributed graphs • Gi = (V, Ei , F) with T = {1, . . . , t} • F the set of numerical attributes that map each vertex-time pair to
a real value: ∀f ∈ F, f : V × T → R. ä F gathers the node attributes and the node topological properties
a: 4 b: 1 c: 5 deg: 1
deg +
c-
a+ b + u2
u2
u1
u3
u5
u4
a: 2 b: 0 c: 6 deg: 2
u2
u2
a: 7 b: 4 u1 c: 5 deg: 1
u3
u5
u4
a: 2 b: 1 c: 6 deg: 2
a: 7 b: 5 c: 2 deg: 1
u1
u3
u5
u4
a: 5 b: 4 c: 5 deg: 2
a: 8 b: 5 c: 1 deg: 4
u2
u1
u3
u5
u4
a: 5 b: 5 c: 5 deg: 2
2
Resource-aware Machine Learning
u3
u5
u4
a: 6 b: 5 c: 2 deg: 2
a: 9 b: 6 u1 c: 0 deg: 4
u3
u5
u4
c-
a+ b +
1
u2
a: 9 b: 6 u1 c: 1 deg: 4
3
4
Triggering patterns of topology changes
deg +
6
5
63 / 78
a: 7 b: 5 c: 1 deg: 4
Characterizing vertex behaviors Vertex descriptive sequence
.
• A discretization function gives a variation symbol to a
vertex/attribute/time triple, e.g. discr(v, f, i) =
+ if f(v, i) − f(v, i − 1) ≥ 2 and i > 1 − if f(v, i) − f(v, i − 1) ≤ −2 and i > 1 ∅ otherwise
• A vertex v is described by a sequence of itemsets
δ(v) = ⟨{discr(v, f, 1) | f ∈ F}, . . . , {discr(v, f, t) | f ∈ F}⟩ • ∆ = {δ(v) | v ∈ V} is the set of all sequences.
Example δ(u1 ) = ⟨{a+ , b+ }, {c− }, {deg+ }⟩ ∆ = {δ(u1 ), δ(u2 ), δ(u3 ), δ(u4 ), δ(u5 )} Resource-aware Machine Learning
Triggering patterns of topology changes
64 / 78
Triggering patterns . A triggering pattern is a sequence P = ⟨L, R⟩ with • L a sequence of node attribute variations • R a single topological property variation
Support measure supp(P, ∆) = {v ∈ V | P ⪯ δ(v)} where p ⪯ q means that p is a super-sequence of q
Example • L = ⟨{a+ , b+ }, {c− }⟩ • R = ⟨{deg+ }⟩ • supp(⟨L, R⟩, ∆) =
{u1 , u3 } Resource-aware Machine Learning
a: 4 b: 1 c: 5 deg: 1
deg +
c-
a+ b + u2
u2
u1
u3
u5
u4
a: 2 b: 0 c: 6 deg: 2
u2
u2
a: 7 b: 4 u1 c: 5 deg: 1
u3
u5
u4
a: 2 b: 1 c: 6 deg: 2
a: 7 b: 5 c: 2 deg: 1
u1
u3
u5
u4
a: 5 b: 4 c: 5 deg: 2
a: 8 b: 5 c: 1 deg: 4
u3
u5
u4
2
a: 5 b: 5 c: 5 deg: 2
u2
a: 9 b: 6 u1 c: 1 deg: 4
u3
u5
u4
c-
a+ b +
1
u2
u1
3
Triggering patterns of topology changes
4
a: 6 b: 5 c: 2 deg: 2
a: 9 b: 6 u1 c: 0 deg: 4
u3
u5
u4
deg +
5
65 / 78
6
a: 7 b: 5 c: 1 deg: 4
Assessing the strength of a pattern .
Triggering pattern growth rate Let P = ⟨L, R⟩, we denote by ∆R ⊆ ∆ the set of vertex descriptive sequences that contain R. The growth rate of P is given by: GR(P, ∆R ) =
|∆ \ ∆R | |supp(L, ∆R )| × |∆R | |supp(L, ∆ \ ∆R )|
G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In KDD, pages 43--52, 1999.
Resource-aware Machine Learning
Triggering patterns of topology changes
66 / 78
Quantitative results .
(i) Runtimes and number of patterns, (ii) Distribution of execution times, (iii) Support distribution, (iv) Growth rate vs support, (v) Scalability. Resource-aware Machine Learning
Triggering patterns of topology changes
67 / 78
The DBLP data Detecting asynchronous events
.
++ , degree++ } → {degree++ } {eigenvector++ 1 }, {VLDB 2 3
Resource-aware Machine Learning
Triggering patterns of topology changes
68 / 78
US flights data . Synchronous events • RITA1: daily in September 2001 {#Canceled+ }{Deg− , Close− , NbCliq− , Prk− , Betw− } → Deg+
Airports that absorb the traffic two days after • RITA2: monthly in (sept. 2000 ; Sept. 2002 ) {#Canceled+ }{#Canceled− }, {nbCliq− , Betw+ } → nbCliq+
A "back to normal" around March 2002 • RITA3: Aug./Sept. 2005 (Katrina Hurricane) {#Canceled+ , DelayAtDep+ }, {#Diverted− , #Depar− , #Arrival− } → {close− }
All the airports supporting this pattern are located in the US West coast where Katrina raged.
Resource-aware Machine Learning
Triggering patterns of topology changes
69 / 78
.
Conclusions and perspectives
Resource-aware Machine Learning
Conclusions and perspectives
70 / 78
Conclusion .
• Constraint-based pattern mining framework can be used to
analyze dynamic graphs • Patterns combine information about the topological structure, the
vertex attribute values and their tendencies. + A wide variety of data can be mined + Provide new insights on the techniques that can be used to analyze dynamic graphs
Resource-aware Machine Learning
Conclusions and perspectives
71 / 78
Challenges and perspectives - 1 .
Enhancing the quality of the extracted information • by capturing changes in the graph structure • while allowing changes in trend direction +
Combining sub-graph and sequence mining
Resource-aware Machine Learning
Conclusions and perspectives
72 / 78
Challenges and perspectives - 2 .
Avoiding the known drawbacks of pattern mining approaches • How to get rid of redundant patterns? • How to fix the threshold values?
Returning patterns whose constraint values are surprising • with respect to what could be expected from randomized data • with respect to what happened in the past (real-time data mining) +
Trigger alerts when changes in behavior
Resource-aware Machine Learning
Conclusions and perspectives
73 / 78
Challenges and perspectives - 2 .
Avoiding the known drawbacks of pattern mining approaches • How to get rid of redundant patterns? • How to fix the threshold values?
Returning patterns whose constraint values are surprising • with respect to what could be expected from randomized data • with respect to what happened in the past (real-time data mining) +
Trigger alerts when changes in behavior
Resource-aware Machine Learning
Conclusions and perspectives
73 / 78
Challenges and perspectives - 3 .
Pattern without threshold values: Skyline patterns Retrieves the patterns that are not dominated by any other pattern Tid t1 t2 t3 t4 t5 t6 Patterns ABCDEF AB AC A ABCD C
Items A B C D E F A B C D E F A B D A C E freq 2 3 3 4 2 3
length 6 2 2 1 4 1
Resource-aware Machine Learning
area 12 6 6 4 8 3
Conclusions and perspectives
A. Soulet, C. Raissi, M. Plantevit, B. Cremilleux - 2011 74 / 78
Ongoing research projects .
Vél'innov - ANR INOV 2012 - labex IMU project • Constraint-based pattern mining of Vélo'V data • Characterizing the usage of the bicycle sharing system • Modeling the system to simulate the impacts of local changes (in the city or in the system) • Pitfall: • Vertex attributes are static (INSEE data) or derived from the edges (e.g., number of users of a considered category) + Searching for relationships between attributes and the graph structure does not make sense • Envisioned solutions: • Considering the user trajectories and looking for paths that characterize a user group • Considering an attributed static graph with temporal usage vectors associated to edges and seeking for sub-graphs homogeneous on vertex and edges attributes Resource-aware Machine Learning
Conclusions and perspectives
75 / 78
Ongoing research projects - 1 .
GRAISearch: enhancing a location-based social media search engine • FP7-PEOPLE-2013-IAPP: Industry-Academia Partnerships and
Pathways • Benefits for DM2L team • 40 researcher months • Access to data with high added value • An industrial cooperation with critical potential benefits for the
company
Resource-aware Machine Learning
Conclusions and perspectives
76 / 78
Ongoing research projects - 2 .
GRAISearch: Research tasks • Geo-localized demographic flow prediction using geo-tagged
social media uploaded data
• Geo-localized event detection algorithm • Integration into a geo-located social media recommender system
Resource-aware Machine Learning
Conclusions and perspectives
77 / 78
Acknowledgments .
Resource-aware Machine Learning
Conclusions and perspectives
78 / 78