Representation Learning and Sampling for Networks

Tanay Kumar Saha

May 13, 2018

Outline

1. About Me
2. Introduction and Motivation
3. Properties of Algorithm for Learning Representation
4. Representation Learning of Nodes in an Evolving Network
5. Representation Learning of Sentences
6. Substructure Sampling
7. Total Recall
8. Name Disambiguation

About Me

- Defended PhD thesis (Advisors: Mohammad Al Hasan and Jennifer Neville)
- Problems/areas worked on: (1) Latent Representation in Networks, (2) Network Sampling, (3) Total Recall, (4) Name Disambiguation
- Already published: ECML/PKDD (1), CIKM (1), TCBB (1), SADM (1), SNAM (1), IEEE Big Data (1), ASONAM (1), CompleNet (1), IEEE CNS (1), BIOKDD (1)
- Poster presentations: RECOMB (1), IEEE Big Data (1)
- Papers under review: KDD (1), JBHI (1), TMC (1)
- In preparation: ECML/PKDD (1), CIKM (1)
- Reproducible research: released code for all the works related to the thesis
- Served as a reviewer: TKDE, TOIS
- Provisional patent applications (3):
  - Apparatus and Method of Implementing Batch-Mode Active Learning for Technology-Assisted Review (iControlESI)
  - Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents (iControlESI)
  - Method and System for Log-Based Computer Server Failure Diagnosis (NEC Labs)



Data Representation

- For machine learning algorithms, we may need to represent data (i.e., learn a feature-set X) in a d-dimensional space (learn d factors of variation)
  - For classification, we learn a function F that maps from a feature-set X to the corresponding label Y, i.e., F : X ↦ Y
  - For clustering, we learn a function F that maps from a feature-set X to an unknown label Z, i.e., F : X ↦ Z
- Representation learning: learn a function that converts the raw data into a suitable feature representation, i.e., F : D [, Y] ↦ X
  - task-agnostic vs. task-sensitive
  - localist vs. distributed
- How do we define the suitability of a feature-set X (quantification/qualification)?
  - Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse

- Disentangled Representations in Neural Models by Tenenbaum et al.
- Representation Learning: A Review and New Perspectives by Bengio et al.
- www.cs.toronto.edu/~bonner/courses/2016s/csc321/webpages/lectures.htm

Data Representation

- What are good criteria for learning a representation (learning X)?
  - There is no clearly defined objective
    - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
  - A good representation must disentangle the underlying factors of variation in the training data?
    - How do we translate this objective into appropriate training criteria?
    - Is it even necessary to do anything but maximize likelihood under a good model?

- Representation Learning: A Review and New Perspectives by Bengio et al.
- (Introduced bias) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
- How do we even decide how many factors of variation are best for an application?

Data Representation

- For link prediction in a network, we may represent edges in the space of the total number of nodes in the network (d = |V|)

  Figure: a toy five-node network (nodes 1-5)

  Representation of nodes        Representation of the edge V1-V2
  id    V1   V2   ...            id    V1-V2
  V1    0    1    ...            V1    0
  V2    1    0    ...            V2    0
  V3    1    1    ...            V3    1
  ...   ...  ...  ...            ...   ...

- Edge features: common neighbors, Adamic-Adar (a small sketch follows below)
- For document summarization, we may represent a particular sentence in the space of the vocabulary (d = |W|)

  Sentence                                Representation
  Sent id   Content                       w1   w2   w3   ...
  S1        This place is nice            1    0    1    ...
  S2        This place is beautiful       1    1    0    ...

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
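To make the hand-crafted edge features above concrete, here is a minimal Python sketch that computes common-neighbor and Adamic-Adar scores for every node pair of a small graph. The edge list is an illustrative assumption, not the exact toy network drawn on the slide.

```python
# Minimal sketch: hand-crafted edge features for link prediction.
# The 5-node edge list below is illustrative only, not the exact toy graph.
import math
from itertools import combinations

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]  # assumed toy network

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(u, v):
    return adj[u] & adj[v]

def adamic_adar(u, v):
    # Sum of 1/log(degree) over shared neighbors; rarer neighbors count more.
    return sum(1.0 / math.log(len(adj[w]))
               for w in common_neighbors(u, v) if len(adj[w]) > 1)

for u, v in combinations(sorted(adj), 2):
    print(u, v, len(common_neighbors(u, v)), round(adamic_adar(u, v), 3))
```

Adamic-Adar down-weights shared neighbors that are themselves highly connected, which is why the 1/log(degree) term appears.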

Data Representation in the Latent Space

- Capture syntactic (homophily) and semantic (structural equivalence) properties of textual units (words, sentences) and network units (nodes, edges)
- For link prediction in a network, we may represent edges as fixed-length vectors (the edge vector can be composed from the node vectors, as sketched below)

  Figure: a toy five-node network (nodes 1-5)

  Representation of nodes        Representation of the edge V1-V2
  id    V1    V2                 id    V1-V2
  a1    0.2   0.1                a1    0.02
  a2    0.3   0.2                a2    0.06
  a3    0.1   0.3                a3    0.03

- Also for document summarization, we may represent a particular sentence as a fixed-length vector (say, in a 3-dimensional space)

  Sentence                                Representation
  Sent id   Content                       a1    a2    a3
  S1        This place is nice            0.2   0.3   0.4
  S2        This place is beautiful       0.2   0.3   0.4

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
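The toy numbers above are consistent with composing an edge vector as the element-wise (Hadamard) product of the two node vectors (0.2 x 0.1 = 0.02, 0.3 x 0.2 = 0.06, 0.1 x 0.3 = 0.03). The sketch below shows that composition; treating Hadamard as just one of several common operators (average, L1) is an assumption rather than a claim about the exact method used here.

```python
import numpy as np

# Node embeddings from the toy table (3-dimensional latent space).
phi = {
    "V1": np.array([0.2, 0.3, 0.1]),
    "V2": np.array([0.1, 0.2, 0.3]),
}

def edge_embedding(u, v, op="hadamard"):
    """Compose two node vectors into an edge vector."""
    if op == "hadamard":
        return phi[u] * phi[v]            # element-wise product
    if op == "average":
        return (phi[u] + phi[v]) / 2.0
    if op == "l1":
        return np.abs(phi[u] - phi[v])
    raise ValueError(op)

print(edge_embedding("V1", "V2"))  # -> approximately [0.02 0.06 0.03], as in the table
```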

Data Representation (Higher-order feature/substructure as a feature)

Figure: a graph database of three labeled toy graphs G1, G2, and G3 (node labels A-F)

- Given a set of networks, such as G1, G2, and G3
- Find frequent induced subgraphs of different sizes and use them as features (a minimal sketch for the 2-node case follows below)

  Figure: the frequent induced 2-node, 3-node, and 4-node subgraphs of G1, G2, and G3

- Similar to learning compositional semantics (learning representations for phrases, sentences, paragraphs, or documents) in the text domain
- A graph can have cycles, so a Tree-LSTM-style recursive structure is not an option
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
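For the simplest case above, a minimal sketch of mining frequent 2-node subgraphs (labeled edges) over a set of labeled graphs is shown below. The three toy graphs are assumed stand-ins for G1, G2, and G3, whose exact edges are not reproduced here; larger patterns require canonical labeling and subgraph-isomorphism checks (gSpan-style), which is what motivates the sampling approach later in the talk.

```python
from collections import defaultdict

# Each graph: list of (label_u, label_v) edges. These graphs are illustrative
# stand-ins for G1, G2, G3; the exact toy graphs are not reproduced here.
graphs = {
    "G1": [("A", "B"), ("B", "C"), ("B", "D")],
    "G2": [("A", "B"), ("B", "D"), ("D", "E"), ("B", "E")],
    "G3": [("B", "D"), ("D", "E"), ("B", "E"), ("B", "C")],
}

min_support = 2  # an edge pattern is frequent if it occurs in >= 2 graphs

support = defaultdict(set)
for gid, edges in graphs.items():
    for u, v in edges:
        pattern = tuple(sorted((u, v)))   # canonical form of a 2-node subgraph
        support[pattern].add(gid)

frequent = {p: s for p, s in support.items() if len(s) >= min_support}
for pattern, occurs_in in sorted(frequent.items()):
    print(pattern, "support:", sorted(occurs_in))
```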

Data Representation (Higher-order feature/substructure as a feature)

- Given a single large undirected network, find the concentration of 3-, 4-, and 5-node graphlets

  Figure: all 3-, 4-, and 5-node topologies

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed

Data Representation (Higher-order feature/substructure as a feature)

- Given a single large directed network, find the concentration of 3-, 4-, and 5-node directed graphlets (a brute-force sketch for the undirected 3-node case follows below)

  Figure: the 13 unique 3-node directed graphlet types ω_{3,i} (i = 1, 2, ..., 13)

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
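As a concrete (brute-force) illustration of graphlet concentration, the sketch below enumerates all 3-node induced subgraphs of a small undirected graph and reports the relative frequency of the two connected 3-node graphlets (wedge and triangle). The toy edge list is an assumption, and the actual counting in this work is done by sampling rather than enumeration.

```python
from itertools import combinations

# Illustrative undirected graph as an adjacency set.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

counts = {"wedge": 0, "triangle": 0}
for a, b, c in combinations(sorted(adj), 3):
    # The number of edges among the three nodes determines the induced pattern.
    m = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
    if m == 2:
        counts["wedge"] += 1
    elif m == 3:
        counts["triangle"] += 1

total = sum(counts.values())
concentration = {g: n / total for g, n in counts.items()}
print(counts, concentration)
```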


Properties of Algorithm for Learning Representation

- What are good criteria for learning representations (learning X)?
  - Representation learning: F : D [, Y] ↦ X
  - There is no clearly defined objective
    - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
  - A good representation must disentangle the underlying factors of variation in the training data?
    - How do we translate this objective into appropriate training criteria?
    - Is it even necessary to do anything but maximize likelihood under a good model?

- Representation Learning: A Review and New Perspectives by Bengio et al.
- Disentangled Representation for Manipulation of Sentiment in Text


Static vs Dynamic (Evolving) Network

- Static: a single snapshot of a network at a particular time-stamp
- Evolving: multiple snapshots of a network at various time-stamps

Figure: A toy evolving network. G1, G2, and G3 are three snapshots of the network.

Latent Representation of Nodes in a Static Network

- Create a corpus: perform truncated random walks over the network and record the visited node sequences (e.g., 3 4 5; 2 3 4; 1 3 2; 3 4 5)
- Learn the representation: train a skip-gram ("skipped") version of a language model on this corpus
  - Minimize the negative log-likelihood of a node's walk context (e.g., -log P(4 | 3), -log P(5 | 4))
  - Usually solved using negative sampling instead of a full softmax (a small sketch follows below)

- Is it even necessary to do anything but maximize likelihood under a good model?
- word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, node2vec || Speech2Vec
- Manifold learning: LLE, ISOMAP; dimensionality reduction: PCA, SVD
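A minimal DeepWalk-style sketch of the two steps above: generate truncated random walks as the corpus, then train a skip-gram model with negative sampling over it. The edge list is illustrative, and the example assumes gensim >= 4 is available (older versions name the vector_size argument size); it is a sketch of the general recipe, not the exact configuration used in this work.

```python
import random
from gensim.models import Word2Vec  # assumes gensim >= 4 (older versions use `size`)

# Toy undirected network (illustrative edge list for the 5-node example).
edges = [(1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def random_walk(start, length, rng):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

# 1. Create the corpus: several truncated random walks per node (DeepWalk-style).
rng = random.Random(0)
walks = [[str(n) for n in random_walk(v, length=5, rng=rng)]
         for v in adj for _ in range(10)]

# 2. Learn representations with a skip-gram model trained by negative sampling,
#    i.e. maximize the likelihood of a node's walk context (terms like -log P(4 | 3)).
model = Word2Vec(sentences=walks, vector_size=16, window=2, min_count=0,
                 sg=1, negative=5, epochs=20, seed=0, workers=1)
print(model.wv["3"][:4])                   # first entries of node 3's learned vector
print(model.wv.most_similar("3", topn=2))  # nearest nodes in the latent space
```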

Latent Representation of Nodes in a Static Network

- Most of the existing works cannot capture structural equivalence as advertised
- Lyu et al. show that external information, such as the orbit participation of nodes, may be helpful in this regard

- Is it even necessary to do anything but maximize likelihood under a good model?
- word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, node2vec || Speech2Vec
- Enhancing the Network Embedding Quality with Structural Similarity

Latent Representation in an Evolving Network

Figure: snapshots G1, G2, G3, and G4 of a toy evolving network, with corresponding embeddings φ1, φ2, φ3, and φ4

- The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot
- It should also not move far away from its position in the previous time-step

- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Our Solution

Figure: Our expectation — snapshots G1, G2, G3, and G4 of the toy evolving network with embeddings φ1, φ2, φ3, and φ4

Figure: Toy illustration of our method — (a) RET model (retrofitting DeepWalk embeddings with temporal smoothing), (b) Homo LT model (a single linear transformation W shared across all time-steps), (c) Heter LT model (separate transformations W1, W2 per time-step)

Solution Sketch

Figure: A conceptual sketch of retrofitting (top) and linear transformation (bottom) based temporal smoothness.

Mathematical Formulation

- Mathematical formulation for the retrofitted models (a sketch of the corresponding update follows below):

  J(\phi_t) = \underbrace{\sum_{v \in V} \alpha_v \, \|\phi_t(v) - \phi_{t-1}(v)\|^2}_{\text{Temporal Smoothing}} + \underbrace{\sum_{(v,u) \in E_t} \beta_{u,v} \, \|\phi_t(u) - \phi_t(v)\|^2}_{\text{Network Proximity}}    (1)

- Mathematical formulation for the homogeneous transformation models:

  J(W) = \|WX - Z\|^2, \quad \text{where } X = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{T-1} \end{bmatrix}; \; Z = \begin{bmatrix} \phi_2 \\ \phi_3 \\ \vdots \\ \phi_T \end{bmatrix}    (2)
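A minimal sketch of how Equation (1) can be minimized: setting the gradient with respect to φ_t(v) to zero gives a closed-form coordinate update that pulls each node toward its previous-snapshot position and its current neighbors, and this update can be iterated Jacobi-style. Uniform α and β and the toy inputs are assumptions made for brevity.

```python
import numpy as np

def retrofit_snapshot(phi_prev, edges, alpha=1.0, beta=1.0, iters=20):
    """Temporal retrofitting (Eq. 1): pull phi_t(v) toward phi_{t-1}(v)
    and toward its neighbors in the current snapshot E_t.
    phi_prev: dict node -> vector from the previous time-step.
    edges:    list of (u, v) pairs in the current snapshot."""
    nbrs = {v: set() for v in phi_prev}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    phi = {v: vec.copy() for v, vec in phi_prev.items()}  # initialize at phi_{t-1}
    for _ in range(iters):
        new_phi = {}
        for v in phi:
            num = alpha * phi_prev[v] + beta * sum((phi[u] for u in nbrs[v]),
                                                   np.zeros_like(phi[v]))
            den = alpha + beta * len(nbrs[v])
            new_phi[v] = num / den   # closed-form minimizer of Eq. (1) in phi_t(v)
        phi = new_phi
    return phi

# Usage: 2-d toy embeddings for a 5-node snapshot, illustrative edge list.
phi1 = {v: np.random.RandomState(v).rand(2) for v in range(1, 6)}
phi2 = retrofit_snapshot(phi1, edges=[(1, 3), (2, 3), (3, 4), (4, 5)])
```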

Mathematical Formulation

- Heterogeneous transformation models (a sketch of the smoothing computations follows below):

  J(W_t) = \|W_t \phi_t - \phi_{t+1}\|^2, \quad \text{for } t = 1, 2, \ldots, (T-1)    (3)

  (a) Uniform smoothing: we weight all projection matrices equally and linearly combine them:

  W^{(avg)} = \frac{1}{T-1} \sum_{t=1}^{T-1} W_t    (4)

  (b) Linear smoothing: we increment the weights of the projection matrices linearly with time:

  W^{(linear)} = \sum_{t=1}^{T-1} \frac{t}{T-1} W_t    (5)

  (c) Exponential smoothing: we increase the weights exponentially, using an exponential operator (exp) and a weighted-collapsed tensor (wct):

  W^{(exp)} = \sum_{t=1}^{T-1} \exp^{\frac{t}{T-1}} W_t    (6)

  W^{(wct)} = \sum_{t=1}^{T-1} (1-\theta)^{T-1-t} W_t    (7)
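The four smoothing rules in Equations (4)-(7) are simple weighted combinations of the per-step matrices W_t. The sketch below computes them with numpy, assuming the W_t have already been fit (e.g., by least squares) and reading exp(t/(T-1)) literally as written in Equation (6).

```python
import numpy as np

def smooth_projections(W_list, theta=0.3):
    """Combine per-step transformations W_1..W_{T-1} (Eqs. 4-7)."""
    T_minus_1 = len(W_list)
    t = np.arange(1, T_minus_1 + 1)

    W_avg = sum(W_list) / T_minus_1                                            # Eq. (4)
    W_linear = sum((ti / T_minus_1) * W for ti, W in zip(t, W_list))           # Eq. (5)
    W_exp = sum(np.exp(ti / T_minus_1) * W for ti, W in zip(t, W_list))        # Eq. (6)
    W_wct = sum((1 - theta) ** (T_minus_1 - ti) * W
                for ti, W in zip(t, W_list))                                   # Eq. (7)
    return W_avg, W_linear, W_exp, W_wct

# Usage: three random 4x4 transformations standing in for W_1, W_2, W_3.
rng = np.random.default_rng(0)
W_avg, W_lin, W_exp, W_wct = smooth_projections([rng.random((4, 4)) for _ in range(3)])
```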

Similarity in Other Modality

- Exploiting Similarities among Languages for Machine Translation
- Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?
- Retrofitting Word Vectors to Semantic Lexicons
- (vision, text, knowledge graph) Cross-modal Knowledge Transfer: Improving the Word Embeddings of Apple by Looking at Oranges
- Disentangled Representations for Manipulation of Sentiment of Text
  - unfortunately, this is a bad movie that is just plain bad
  - overall, this is a good movie that is just good
- Deep Manifold Traversal: Changing Labels with Convolutional Features
  - (vision) Transform a smiling portrait into an angry one, and make one individual look more like someone else without changing clothing and background
- Controllable Text Generation
- Learning to Generate Reviews and Discovering Sentiment
- A Neural Algorithm of Artistic Style


Motivation (Latent Representation of Sentences)

- Most existing Sen2Vec methods disregard the context of a sentence
- The meaning of one sentence depends on the meaning of its neighbors
  - I eat my dinner. Then I take some rest. After that I go to bed.
- Our approach: incorporate extra-sentential context into Sen2Vec
- We propose two methods: regularization and retrofitting
- We experiment with two types of context: discourse and similarity

- Regularized and Retrofitted Models for Learning Sentence Representation with Context
- CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Motivation (Discourse is Important)

- A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance
- A novel strategy of multi-encoding and decoding of two sentences leads to the best performance
- Target-side context is important in translation

- Evaluating Discourse Phenomena in Neural Machine Translation

Our Approach

- Consider the content as well as the context of a sentence
- Treat the context sentences as atomic linguistic units
  - Similar in spirit to (Le & Mikolov, 2014)
  - Efficient to train compared to compositional methods like encoder-decoder models (e.g., SDAE, Skip-Thought)

- Sen2Vec, SDAE, SAE, FastSent, Skip-Thought, w2v-avg, c-phrase

Content Model (Sen2Vec)

- Treats sentences and words similarly
- Both are represented by vectors in a shared embedding matrix (a look-up φ : V → R^d)
- Example sentence v: "I eat my dinner"

Figure: Distributed bag of words, or DBOW (Le & Mikolov, 2014)

Regularized Models (Reg-dis, Reg-sim)

- Incorporate the neighborhood directly into the objective function of the content-based model (Sen2Vec) as a regularizer
- Objective function (a sketch of the corresponding SGD step follows below):

  J(\phi) = \sum_{v \in V} \Big[ L_c(v) + \beta \, L_r(v, N(v)) \Big]
          = \sum_{v \in V} \Big[ \underbrace{L_c(v)}_{\text{Content loss}} + \beta \underbrace{\sum_{(v,u) \in E} \|\phi(u) - \phi(v)\|^2}_{\text{Graph smoothing}} \Big]    (8)

- Train with SGD
- Regularization with discourse context ⇒ Reg-dis
- Regularization with similarity context ⇒ Reg-sim

- CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
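A minimal sketch of the SGD step implied by Equation (8): for a sentence v, the graph-smoothing term contributes the gradient 2β Σ_u (φ(v) − φ(u)), which is added to the content-loss (DBOW) gradient. The content part is omitted here, and the toy vectors and names are assumptions.

```python
import numpy as np

def regularizer_sgd_step(phi, v, neighbors, beta=1.0, lr=0.05):
    """One SGD step on the graph-smoothing term of Eq. (8) for sentence v.
    Gradient wrt phi(v) of beta * sum_u ||phi(u) - phi(v)||^2 is
    2 * beta * sum_u (phi(v) - phi(u)); the DBOW content-loss gradient,
    which Con-S2V adds to this, is omitted in this sketch."""
    grad = 2.0 * beta * sum((phi[v] - phi[u] for u in neighbors),
                            np.zeros_like(phi[v]))
    phi[v] = phi[v] - lr * grad
    return phi

# Usage: three toy 4-d sentence vectors; v's discourse context is {u, y}.
rng = np.random.default_rng(1)
phi = {s: rng.normal(size=4) for s in ["u", "v", "y"]}
regularizer_sgd_step(phi, "v", neighbors=["u", "y"])
```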

Pictorial Depiction

Figure: (a) A sequence of sentences — u: "And I was wondering about the GD LEV.", v: "Is it reusable?", y: "Or is it discarded to burn up on return to LEO?"; (b) Sen2Vec (DBOW), which predicts the words "is", "it", "reusable" of v through the content loss L_c over the shared look-up φ; (c) Reg-dis, which adds the regularization loss L_r pulling v toward its discourse neighbors u and y.

Retrofitted Model (Ret-dis, Ret-sim)

- Retrofit the vectors learned from Sen2Vec such that the revised vector φ(v) is:
  - similar to the prior vector, φ_0(v)
  - similar to the vectors of its neighboring sentences, φ(u)
- Objective function:

  J(\phi) = \underbrace{\sum_{v \in V} \alpha_v \, \|\phi(v) - \phi_0(v)\|^2}_{\text{close to prior}} + \underbrace{\sum_{(v,u) \in E} \beta_{u,v} \, \|\phi(u) - \phi(v)\|^2}_{\text{graph smoothing}}    (9)

- Solve using the Jacobi iterative method (the same closed-form coordinate update sketched earlier for Eq. (1))
- Retrofitting with discourse context ⇒ Ret-dis
- Retrofitting with similarity context ⇒ Ret-sim


Frequent Subgraph Mining (Sampling Substructure)

- Perform a first-order random walk over the fixed-size substructure space
- The MH algorithm calculates the acceptance probability using the following equation (a minimal sketch of this step follows below):

  \alpha(x, y) = \min\left( \frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}, 1 \right)    (10)

- For mining frequent substructures from a set of graphs, we use the average support (s1) and the set-intersection support (s2) as the target distribution, i.e., π = s1 or π = s2
- For collecting statistics from a single large graph, we use the uniform probability distribution as our target distribution
- In both cases, we use the uniform distribution over neighbor states as our proposal distribution, i.e., q(x, y) = 1/d_x

- F-S-Cube: A Sampling-Based Method for Top-k Frequent Subgraph Mining
- Finding Network Motifs Using MCMC Sampling
- Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins Using Frequent Subgraph Mining
- ACTS: Extracting Android App Topological Signature through Graphlet Sampling
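A minimal sketch of one Metropolis-Hastings transition under Equation (10) with the uniform proposal q(x, y) = 1/d_x: the acceptance ratio reduces to (π(y) · d_x) / (π(x) · d_y), and with a uniform target it reduces further to d_x / d_y. The d, score, and sample_neighbor callables are placeholders for the actual FS3 neighbor-generation machinery.

```python
import random

def mh_step(x, d, score, sample_neighbor, rng=random):
    """One Metropolis-Hastings transition over the fixed-size subgraph space (Eq. 10).
    d(x): number of neighbor states of x, so the proposal is q(x, y) = 1 / d(x).
    score(x): target weight pi(x) (e.g. s1/s2 support, or 1.0 for uniform sampling)."""
    y = sample_neighbor(x)   # propose y uniformly from x's neighbor states
    accept = min(1.0, (score(y) * d(x)) / (score(x) * d(y)))
    return y if rng.random() <= accept else x

# Example with a uniform target (pi = 1), where acceptance reduces to d(x)/d(y):
# next_state = mh_step(x,
#                      d=lambda s: len(neighbors(s)),
#                      score=lambda s: 1.0,
#                      sample_neighbor=lambda s: random.choice(list(neighbors(s))))
```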

Data Representation (Frequent Subgraph Mining)

Figure: a graph database of three labeled toy graphs G1, G2, and G3, and their frequent induced 2-node and 3-node subgraphs

- We find the support sets of the edges BD, BE, and DE of g_BDE, which are {G1, G2, G3}, {G2, G3}, and {G2, G3}, respectively
- So, for g_BDE, s1(g_BDE) = (3 + 2 + 3)/3 = 2.67, and s2(g_BDE) = 2 (a small sketch of this computation follows below)
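A small sketch of computing the two support measures from per-edge support sets, using the sets listed above: s1 is the average size of the sets and s2 is the size of their intersection.

```python
def supports(edge_support_sets):
    """s1: average size of the per-edge support sets; s2: size of their intersection."""
    sizes = [len(s) for s in edge_support_sets]
    s1 = sum(sizes) / len(sizes)
    s2 = len(set.intersection(*map(set, edge_support_sets)))
    return s1, s2

# Usage with the support sets listed above for the three edges of g_BDE:
print(supports([{"G1", "G2", "G3"}, {"G2", "G3"}, {"G2", "G3"}]))
```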

Frequent Subgraph Mining (Sampling Substructure)

Figure: Neighbor generation mechanism. (a) Left: a graph G with the current state of the random walk (the subgraph induced by nodes 1, 2, 3, 4); Right: neighborhood information of the current state. (b) Left: the state of the random walk on G after one transition; Right: the updated neighborhood information.

- For this example, d_x = 21 and d_y = 13

Frequent Subgraph Mining (Sampling Substructure)

Algorithm 1: SampleIndSubGraph pseudocode
Input: graph G_i; size of subgraph, ℓ

x ← state saved at G_i
d_x ← neighbor count of x
a_sup_x ← score of graph x
while a neighbor state y is not found do
    y ← a random neighbor of x
    d_y ← neighbor count of y
    a_sup_y ← score of graph y
    accp_val ← (d_x · a_sup_y) / (d_y · a_sup_x)
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
        return y

Frequent Subgraph Mining (Sampling Substructure)

Algorithm 2: SampleIndSubGraph pseudocode (uniform target distribution)
Input: graph G_i; size of subgraph, ℓ

x ← state saved at G_i
d_x ← neighbor count of x
a_sup_x ← score of graph x
while a neighbor state y is not found do
    y ← a random neighbor of x
    d_y ← neighbor count of y
    a_sup_y ← score of graph y
    accp_val ← d_x / d_y
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
        return y

- Motif Counting Beyond Five Nodes

Frequent Subgraph Mining (Sampling Substructure)

- The random walks are ergodic
- The walk satisfies the reversibility condition, so it achieves the target distribution
- We use the spectral gap (λ = 1 − max{λ_1, |λ_{m−1}|}) to measure the mixing rate of our random walk
- We compute the mixing time (the inverse of the spectral gap) for size-6 subgraphs of the Mutagen dataset and find that it is approximately 15 units
- We suggest using multiple chains along with a suitable distance measure (for example, Jaccard distance) for choosing a suitable iteration count
- We show that the acceptance probability of our technique is quite high (a large number of rejected moves would indicate a poorly designed proposal distribution)

Table: Probability of acceptance of FS3 for the Mutagen and PS datasets

                                   Mutagen                                      PS
                          ℓ=8            ℓ=9            ℓ=10           ℓ=6            ℓ=7            ℓ=8
Acceptance (%), s1        82.70 ± 0.04   83.89 ± 0.03   81.66 ± 0.03   91.08 ± 0.01   92.23 ± 0.02   93.08 ± 0.01
Acceptance (%), s2        75.27 ± 0.05   76.74 ± 0.03   75.20 ± 0.03   85.08 ± 0.05   87.46 ± 0.06   89.41 ± 0.07


The problem of Total Recall

- Vanity search: find out everything about me
- Fandom: find out everything about my hero
- Research: find out everything about my PhD topic
- Investigation: find out everything about something or some activity
- Systematic review: find all published studies evaluating some method or effect
- Patent search: find all prior art
- Electronic discovery: find all documents responsive to a request for production in a legal matter
- Creating archival collections: label all relevant documents, for posterity, future IR evaluation, etc.

- Batch-Mode Active Learning for Technology-Assisted Review
- A Large-Scale Study of SVM-Based Methods for Abstract Screening in Systematic Reviews

An Active Learning Algorithm

Algorithm 3: SelectABatch
Input: h_c, current hyperplane; D, available instances; k, batch size; t, similarity threshold
Output: a batch B_c of k documents to be included in training

B_c ← EmptySet()
if Strategy is DS then
    I ← ArgSort(Distance(h_c, D), order = increasing)
    while Size(B_c) < k do
        Insert(B_c, I[1])
        S ← GetSimilar(I[1], I, D, t, similarity = cosine)
        I ← Remove(I, S)
else if Strategy is BPS then
    w ← 1.0 / (Distance(h_c, D))^2
    w ← Normalize(w)
    I ← List(D)
    while Size(B_c) < k do
        c ← Choose(I, prob = w, num = 1)
        Insert(B_c, c)
        S ← GetSimilar(c, I, D, t, similarity = cosine)
        I ← Remove(I, S)
        w ← Normalize(w[I])
return B_c
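A compact numpy sketch of the two batch-selection strategies in Algorithm 3. The helper routines (Distance, GetSimilar, Choose) are re-implemented here as plain margin and cosine computations, which is an assumption about their behavior rather than the original implementation.

```python
import numpy as np

def cosine_sim(A, b):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    return A @ (b / np.linalg.norm(b))

def select_a_batch(w, b, D, k, t, strategy="DS", rng=None):
    """Select a diverse batch of k documents near the decision boundary.
    w, b: current linear hyperplane; D: (n, d) feature matrix;
    t: cosine-similarity threshold used to drop near-duplicates of a pick."""
    rng = rng or np.random.default_rng()
    dist = np.abs(D @ w + b) / np.linalg.norm(w)     # distance to the hyperplane
    remaining = list(np.argsort(dist))               # closest (most uncertain) first
    if strategy == "BPS":
        weight = 1.0 / (dist ** 2 + 1e-12)           # bias toward the boundary

    batch = []
    while len(batch) < k and remaining:
        if strategy == "DS":
            c = remaining[0]                          # deterministic: closest remaining doc
        else:  # BPS
            p = weight[remaining] / weight[remaining].sum()
            c = rng.choice(remaining, p=p)            # probabilistic: sample by weight
        batch.append(c)
        sims = cosine_sim(D[remaining], D[c])
        remaining = [i for i, s in zip(remaining, sims) if s < t and i != c]
    return batch

# Usage: 100 random 5-d documents, a random hyperplane, batch of 10.
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))
print(select_a_batch(rng.normal(size=5), 0.0, D, k=10, t=0.9, rng=rng))
```

DS is deterministic (always take the closest remaining document to the hyperplane), while BPS samples proportionally to 1/distance², trading some uncertainty for diversity.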


Name Disambiguation

- The graph in this figure corresponds to the ego network of u, G_u
- We also assume that u is a multi-node consisting of two name entities
- So the removal of the node u (along with all of its incident edges) from G_u creates two disjoint clusters

Figure: A toy example of clustering-based entity disambiguation

- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Name Disambiguation

- Calculate the normalized-cut score (a minimal sketch of this computation follows below):

  NC = \sum_{i=1}^{k} \frac{W(C_i, \bar{C_i})}{W(C_i, C_i) + W(C_i, \bar{C_i})}    (11)

- Modeling temporal mobility

  Figure: number of papers per year (2005-2013) for Cluster1, Cluster2, and Cluster3

- Calculate the temporal-mobility score:

  \text{TM-score} = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j) \cdot \big( D(Z_i \,\|\, Z_j) + D(Z_j \,\|\, Z_i) \big)}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j)}    (12)
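A minimal sketch of Equation (11): given a symmetric weight matrix and a cluster assignment, it sums, over clusters, the weight leaving each cluster divided by the cluster's total associated weight, so well-separated clusters give a low score. The toy weight matrix is an assumption, and counting each internal edge once in W(C_i, C_i) is a convention choice.

```python
import numpy as np

def normalized_cut(W, labels):
    """Eq. (11): sum over clusters of W(Ci, Ci_bar) / (W(Ci, Ci) + W(Ci, Ci_bar)),
    where W(A, B) adds the weights of all edges with one end in A and the other in B."""
    labels = np.asarray(labels)
    score = 0.0
    for c in np.unique(labels):
        inside = labels == c
        w_in = W[np.ix_(inside, inside)].sum() / 2.0   # each internal edge counted once
        w_cut = W[np.ix_(inside, ~inside)].sum()       # edges leaving the cluster
        score += w_cut / (w_in + w_cut)
    return score

# Usage: two dense 3-node groups joined by a single weak edge -> low NC score.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(normalized_cut(W, [0, 0, 0, 1, 1, 1]))   # 2 * 1/(3+1) = 0.5
```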

Thanks!

- Sen2Vec code and datasets: https://github.com/tksaha/con-s2v/tree/jointlearning
- Temporal node2vec code: https://gitlab.com/tksaha/temporalnode2vec.git
- Motif finding code: https://github.com/tksaha/motif-finding
- Frequent subgraph mining code: https://github.com/tksaha/fs3-graph-mining
- Functional motif finding code: https://gitlab.com/tksaha/func_motif
