Representation Learning and Sampling for Networks

Tanay Kumar Saha

May 13, 2018

Outline

1. About Me
2. Introduction and Motivation
3. Properties of Algorithm for Learning Representation
4. Representation Learning of Nodes in an Evolving Network
5. Representation Learning of Sentences
6. Substructure Sampling
7. Total Recall
8. Name Disambiguation

About Me

- Defended PhD thesis (Advisors: Mohammad Al Hasan and Jennifer Neville)
- Problems/areas worked on: (1) Latent Representation in Networks, (2) Network Sampling, (3) Total Recall, (4) Name Disambiguation
- Already published: ECML/PKDD (1), CIKM (1), TCBB (1), SADM (1), SNAM (1), IEEE Big Data (1), ASONAM (1), CompleNet (1), IEEE CNS (1), BIOKDD (1)
- Poster presentations: RECOMB (1), IEEE Big Data (1)
- Papers under review: KDD (1), JBHI (1), TMC (1)
- In preparation: ECML/PKDD (1), CIKM (1)
- Reproducible research: released code for all the works related to the thesis
- Served as a reviewer: TKDE, TOIS
- Provisional patent applications (3):
  - Apparatus and Method of Implementing Batch-Mode Active Learning for Technology-Assisted Review (iControlESI)
  - Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents (iControlESI)
  - Method and System for Log-Based Computer Server Failure Diagnosis (NEC Labs)



Data Representation

- For machine learning algorithms, we may need to represent data (i.e., learn a feature-set X) in a d-dimensional space (learn d factors of variation)
  - For classification, we learn a function F that maps from a feature-set X to the corresponding label Y, i.e., F : X ↦ Y
  - For clustering, we learn a function F that maps from a feature-set X to an unknown label Z, i.e., F : X ↦ Z
- Representation learning: learn a function that converts the raw data into a suitable feature representation, i.e., F : D [, Y] ↦ X
  - task-agnostic vs. task-sensitive
  - localist vs. distributed
- How do we define the suitability of a feature-set X (quantification/qualification)?
  - Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse

- Disentangled Representations in Neural Models by Tenenbaum et al.
- Representation Learning: A Review and New Perspectives by Bengio et al.
- www.cs.toronto.edu/~bonner/courses/2016s/csc321/webpages/lectures.htm

Data Representation

- What are good criteria for learning a representation (learning X)?
  - There is no clearly defined objective
    - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
  - A good representation must disentangle the underlying factors of variation in the training data?
    - How do we translate this objective into appropriate training criteria?
    - Is it even necessary to do anything but maximize likelihood under a good model?

- Representation Learning: A Review and New Perspectives by Bengio et al.
- (Introduced bias) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
- How do we even decide how many factors of variation are best for an application?

Data Representation

- For link prediction in a network, we may represent edges in the space of the total number of nodes in the network (d = |V|)

  Figure: a toy five-node network (nodes 1-5)

  Representation of nodes        Representation of the edge V1-V2
  id    V1   V2   ...            id    V1-V2
  V1    0    1    ...            V1    0
  V2    1    0    ...            V2    0
  V3    1    1    ...            V3    1
  ...   ...  ...  ...            ...   ...

- Edge features: common neighbors, Adamic-Adar (a small sketch follows below)
- For document summarization, we may represent a particular sentence in the space of the vocabulary (d = |W|)

  Sentence                                Representation
  Sent id   Content                       w1   w2   w3   ...
  S1        This place is nice            1    0    1    ...
  S2        This place is beautiful       1    1    0    ...

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
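To make the hand-crafted edge features above concrete, here is a minimal Python sketch that computes common-neighbor and Adamic-Adar scores for every node pair of a small graph. The edge list is an illustrative assumption, not the exact toy network drawn on the slide.

```python
# Minimal sketch: hand-crafted edge features for link prediction.
# The 5-node edge list below is illustrative only, not the exact toy graph.
import math
from itertools import combinations

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]  # assumed toy network

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(u, v):
    return adj[u] & adj[v]

def adamic_adar(u, v):
    # Sum of 1/log(degree) over shared neighbors; rarer neighbors count more.
    return sum(1.0 / math.log(len(adj[w]))
               for w in common_neighbors(u, v) if len(adj[w]) > 1)

for u, v in combinations(sorted(adj), 2):
    print(u, v, len(common_neighbors(u, v)), round(adamic_adar(u, v), 3))
```

Adamic-Adar down-weights shared neighbors that are themselves highly connected, which is why the 1/log(degree) term appears.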

Data Representation in the Latent Space

- Capture syntactic (homophily) and semantic (structural equivalence) properties of textual units (words, sentences) and network units (nodes, edges)
- For link prediction in a network, we may represent edges as fixed-length vectors (the edge vector can be composed from the node vectors, as sketched below)

  Figure: a toy five-node network (nodes 1-5)

  Representation of nodes        Representation of the edge V1-V2
  id    V1    V2                 id    V1-V2
  a1    0.2   0.1                a1    0.02
  a2    0.3   0.2                a2    0.06
  a3    0.1   0.3                a3    0.03

- Also for document summarization, we may represent a particular sentence as a fixed-length vector (say, in a 3-dimensional space)

  Sentence                                Representation
  Sent id   Content                       a1    a2    a3
  S1        This place is nice            0.2   0.3   0.4
  S2        This place is beautiful       0.2   0.3   0.4

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
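The toy numbers above are consistent with composing an edge vector as the element-wise (Hadamard) product of the two node vectors (0.2 x 0.1 = 0.02, 0.3 x 0.2 = 0.06, 0.1 x 0.3 = 0.03). The sketch below shows that composition; treating Hadamard as just one of several common operators (average, L1) is an assumption rather than a claim about the exact method used here.

```python
import numpy as np

# Node embeddings from the toy table (3-dimensional latent space).
phi = {
    "V1": np.array([0.2, 0.3, 0.1]),
    "V2": np.array([0.1, 0.2, 0.3]),
}

def edge_embedding(u, v, op="hadamard"):
    """Compose two node vectors into an edge vector."""
    if op == "hadamard":
        return phi[u] * phi[v]            # element-wise product
    if op == "average":
        return (phi[u] + phi[v]) / 2.0
    if op == "l1":
        return np.abs(phi[u] - phi[v])
    raise ValueError(op)

print(edge_embedding("V1", "V2"))  # -> approximately [0.02 0.06 0.03], as in the table
```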

Data Representation (Higher-order feature/substructure as a feature)

Figure: a graph database of three labeled toy graphs G1, G2, and G3 (node labels A-F)

- Given a set of networks, such as G1, G2, and G3
- Find frequent induced subgraphs of different sizes and use them as features (a minimal sketch for the 2-node case follows below)

  Figure: the frequent induced 2-node, 3-node, and 4-node subgraphs of G1, G2, and G3

- Similar to learning compositional semantics (learning representations for phrases, sentences, paragraphs, or documents) in the text domain
- A graph can have cycles, so a Tree-LSTM-style recursive structure is not an option
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
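For the simplest case above, a minimal sketch of mining frequent 2-node subgraphs (labeled edges) over a set of labeled graphs is shown below. The three toy graphs are assumed stand-ins for G1, G2, and G3, whose exact edges are not reproduced here; larger patterns require canonical labeling and subgraph-isomorphism checks (gSpan-style), which is what motivates the sampling approach later in the talk.

```python
from collections import defaultdict

# Each graph: list of (label_u, label_v) edges. These graphs are illustrative
# stand-ins for G1, G2, G3; the exact toy graphs are not reproduced here.
graphs = {
    "G1": [("A", "B"), ("B", "C"), ("B", "D")],
    "G2": [("A", "B"), ("B", "D"), ("D", "E"), ("B", "E")],
    "G3": [("B", "D"), ("D", "E"), ("B", "E"), ("B", "C")],
}

min_support = 2  # an edge pattern is frequent if it occurs in >= 2 graphs

support = defaultdict(set)
for gid, edges in graphs.items():
    for u, v in edges:
        pattern = tuple(sorted((u, v)))   # canonical form of a 2-node subgraph
        support[pattern].add(gid)

frequent = {p: s for p, s in support.items() if len(s) >= min_support}
for pattern, occurs_in in sorted(frequent.items()):
    print(pattern, "support:", sorted(occurs_in))
```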

Data Representation (Higher-order feature/substructure as a feature)

- Given a single large undirected network, find the concentration of 3-, 4-, and 5-node graphlets

  Figure: all 3-, 4-, and 5-node topologies

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed

Data Representation (Higher-order feature/substructure as a feature)

- Given a single large directed network, find the concentration of 3-, 4-, and 5-node directed graphlets (a brute-force sketch for the undirected 3-node case follows below)

  Figure: the 13 unique 3-node directed graphlet types ω_{3,i} (i = 1, 2, ..., 13)

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
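As a concrete (brute-force) illustration of graphlet concentration, the sketch below enumerates all 3-node induced subgraphs of a small undirected graph and reports the relative frequency of the two connected 3-node graphlets (wedge and triangle). The toy edge list is an assumption, and the actual counting in this work is done by sampling rather than enumeration.

```python
from itertools import combinations

# Illustrative undirected graph as an adjacency set.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

counts = {"wedge": 0, "triangle": 0}
for a, b, c in combinations(sorted(adj), 3):
    # The number of edges among the three nodes determines the induced pattern.
    m = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
    if m == 2:
        counts["wedge"] += 1
    elif m == 3:
        counts["triangle"] += 1

total = sum(counts.values())
concentration = {g: n / total for g, n in counts.items()}
print(counts, concentration)
```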


Properties of Algorithm for Learning Representation

- What are good criteria for learning representations (learning X)?
  - Representation learning: F : D [, Y] ↦ X
  - There is no clearly defined objective
    - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
  - A good representation must disentangle the underlying factors of variation in the training data?
    - How do we translate this objective into appropriate training criteria?
    - Is it even necessary to do anything but maximize likelihood under a good model?

- Representation Learning: A Review and New Perspectives by Bengio et al.
- Disentangled Representation for Manipulation of Sentiment in Text


Static vs Dynamic (Evolving) Network

- Static: a single snapshot of a network at a particular time-stamp
- Evolving: multiple snapshots of a network at various time-stamps

Figure: A toy evolving network. G1, G2, and G3 are three snapshots of the network.

Latent Representation of Nodes in a Static Network

- Create a corpus: perform truncated random walks over the network and record the visited node sequences (e.g., 3 4 5; 2 3 4; 1 3 2; 3 4 5)
- Learn the representation: train a skip-gram ("skipped") version of a language model on this corpus
  - Minimize the negative log-likelihood of a node's walk context (e.g., -log P(4 | 3), -log P(5 | 4))
  - Usually solved using negative sampling instead of a full softmax (a small sketch follows below)

- Is it even necessary to do anything but maximize likelihood under a good model?
- word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, node2vec || Speech2Vec
- Manifold learning: LLE, ISOMAP; dimensionality reduction: PCA, SVD
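A minimal DeepWalk-style sketch of the two steps above: generate truncated random walks as the corpus, then train a skip-gram model with negative sampling over it. The edge list is illustrative, and the example assumes gensim >= 4 is available (older versions name the vector_size argument size); it is a sketch of the general recipe, not the exact configuration used in this work.

```python
import random
from gensim.models import Word2Vec  # assumes gensim >= 4 (older versions use `size`)

# Toy undirected network (illustrative edge list for the 5-node example).
edges = [(1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def random_walk(start, length, rng):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

# 1. Create the corpus: several truncated random walks per node (DeepWalk-style).
rng = random.Random(0)
walks = [[str(n) for n in random_walk(v, length=5, rng=rng)]
         for v in adj for _ in range(10)]

# 2. Learn representations with a skip-gram model trained by negative sampling,
#    i.e. maximize the likelihood of a node's walk context (terms like -log P(4 | 3)).
model = Word2Vec(sentences=walks, vector_size=16, window=2, min_count=0,
                 sg=1, negative=5, epochs=20, seed=0, workers=1)
print(model.wv["3"][:4])                   # first entries of node 3's learned vector
print(model.wv.most_similar("3", topn=2))  # nearest nodes in the latent space
```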

Latent Representation of Nodes in a Static Network

- Most of the existing works cannot capture structural equivalence as advertised
- Lyu et al. show that external information, such as the orbit participation of nodes, may be helpful in this regard

- Is it even necessary to do anything but maximize likelihood under a good model?
- word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, node2vec || Speech2Vec
- Enhancing the Network Embedding Quality with Structural Similarity

Latent Representation in an Evolving Network

Figure: snapshots G1, G2, G3, and G4 of a toy evolving network, with corresponding embeddings φ1, φ2, φ3, and φ4

- The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot
- It should also not move far away from its position in the previous time-step

- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Our Solution

Figure: Our expectation — snapshots G1, G2, G3, and G4 of the toy evolving network with embeddings φ1, φ2, φ3, and φ4

Figure: Toy illustration of our method — (a) RET model (retrofitting DeepWalk embeddings with temporal smoothing), (b) Homo LT model (a single linear transformation W shared across all time-steps), (c) Heter LT model (separate transformations W1, W2 per time-step)

Solution Sketch

Figure: A conceptual sketch of retrofitting (top) and linear transformation (bottom) based temporal smoothness.

Mathematical Formulation

- Mathematical formulation for the retrofitted models (a sketch of the corresponding update follows below):

  J(\phi_t) = \underbrace{\sum_{v \in V} \alpha_v \, \|\phi_t(v) - \phi_{t-1}(v)\|^2}_{\text{Temporal Smoothing}} + \underbrace{\sum_{(v,u) \in E_t} \beta_{u,v} \, \|\phi_t(u) - \phi_t(v)\|^2}_{\text{Network Proximity}}    (1)

- Mathematical formulation for the homogeneous transformation models:

  J(W) = \|WX - Z\|^2, \quad \text{where } X = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{T-1} \end{bmatrix}; \; Z = \begin{bmatrix} \phi_2 \\ \phi_3 \\ \vdots \\ \phi_T \end{bmatrix}    (2)
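A minimal sketch of how Equation (1) can be minimized: setting the gradient with respect to φ_t(v) to zero gives a closed-form coordinate update that pulls each node toward its previous-snapshot position and its current neighbors, and this update can be iterated Jacobi-style. Uniform α and β and the toy inputs are assumptions made for brevity.

```python
import numpy as np

def retrofit_snapshot(phi_prev, edges, alpha=1.0, beta=1.0, iters=20):
    """Temporal retrofitting (Eq. 1): pull phi_t(v) toward phi_{t-1}(v)
    and toward its neighbors in the current snapshot E_t.
    phi_prev: dict node -> vector from the previous time-step.
    edges:    list of (u, v) pairs in the current snapshot."""
    nbrs = {v: set() for v in phi_prev}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    phi = {v: vec.copy() for v, vec in phi_prev.items()}  # initialize at phi_{t-1}
    for _ in range(iters):
        new_phi = {}
        for v in phi:
            num = alpha * phi_prev[v] + beta * sum((phi[u] for u in nbrs[v]),
                                                   np.zeros_like(phi[v]))
            den = alpha + beta * len(nbrs[v])
            new_phi[v] = num / den   # closed-form minimizer of Eq. (1) in phi_t(v)
        phi = new_phi
    return phi

# Usage: 2-d toy embeddings for a 5-node snapshot, illustrative edge list.
phi1 = {v: np.random.RandomState(v).rand(2) for v in range(1, 6)}
phi2 = retrofit_snapshot(phi1, edges=[(1, 3), (2, 3), (3, 4), (4, 5)])
```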

Mathematical Formulation

- Heterogeneous transformation models (a sketch of the smoothing computations follows below):

  J(W_t) = \|W_t \phi_t - \phi_{t+1}\|^2, \quad \text{for } t = 1, 2, \ldots, (T-1)    (3)

  (a) Uniform smoothing: we weight all projection matrices equally and linearly combine them:

  W^{(avg)} = \frac{1}{T-1} \sum_{t=1}^{T-1} W_t    (4)

  (b) Linear smoothing: we increment the weights of the projection matrices linearly with time:

  W^{(linear)} = \sum_{t=1}^{T-1} \frac{t}{T-1} W_t    (5)

  (c) Exponential smoothing: we increase the weights exponentially, using an exponential operator (exp) and a weighted-collapsed tensor (wct):

  W^{(exp)} = \sum_{t=1}^{T-1} \exp^{\frac{t}{T-1}} W_t    (6)

  W^{(wct)} = \sum_{t=1}^{T-1} (1-\theta)^{T-1-t} W_t    (7)
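The four smoothing rules in Equations (4)-(7) are simple weighted combinations of the per-step matrices W_t. The sketch below computes them with numpy, assuming the W_t have already been fit (e.g., by least squares) and reading exp(t/(T-1)) literally as written in Equation (6).

```python
import numpy as np

def smooth_projections(W_list, theta=0.3):
    """Combine per-step transformations W_1..W_{T-1} (Eqs. 4-7)."""
    T_minus_1 = len(W_list)
    t = np.arange(1, T_minus_1 + 1)

    W_avg = sum(W_list) / T_minus_1                                            # Eq. (4)
    W_linear = sum((ti / T_minus_1) * W for ti, W in zip(t, W_list))           # Eq. (5)
    W_exp = sum(np.exp(ti / T_minus_1) * W for ti, W in zip(t, W_list))        # Eq. (6)
    W_wct = sum((1 - theta) ** (T_minus_1 - ti) * W
                for ti, W in zip(t, W_list))                                   # Eq. (7)
    return W_avg, W_linear, W_exp, W_wct

# Usage: three random 4x4 transformations standing in for W_1, W_2, W_3.
rng = np.random.default_rng(0)
W_avg, W_lin, W_exp, W_wct = smooth_projections([rng.random((4, 4)) for _ in range(3)])
```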

Similarity in Other Modality

- Exploiting Similarities among Languages for Machine Translation
- Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?
- Retrofitting Word Vectors to Semantic Lexicons
- (vision, text, knowledge graph) Cross-modal Knowledge Transfer: Improving the Word Embeddings of Apple by Looking at Oranges
- Disentangled Representations for Manipulation of Sentiment of Text
  - unfortunately, this is a bad movie that is just plain bad
  - overall, this is a good movie that is just good
- Deep Manifold Traversal: Changing Labels with Convolutional Features
  - (vision) Transform a smiling portrait into an angry one, and make one individual look more like someone else without changing clothing and background
- Controllable Text Generation
- Learning to Generate Reviews and Discovering Sentiment
- A Neural Algorithm of Artistic Style


Motivation (Latent Representation of Sentences)

- Most existing Sen2Vec methods disregard the context of a sentence
- The meaning of one sentence depends on the meaning of its neighbors
  - I eat my dinner. Then I take some rest. After that I go to bed.
- Our approach: incorporate extra-sentential context into Sen2Vec
- We propose two methods: regularization and retrofitting
- We experiment with two types of context: discourse and similarity

- Regularized and Retrofitted Models for Learning Sentence Representation with Context
- CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Motivation (Discourse is Important)

- A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance
- A novel strategy of multi-encoding and decoding of two sentences leads to the best performance
- Target-side context is important in translation

- Evaluating Discourse Phenomena in Neural Machine Translation

Our Approach

- Consider the content as well as the context of a sentence
- Treat the context sentences as atomic linguistic units
  - Similar in spirit to (Le & Mikolov, 2014)
  - Efficient to train compared to compositional methods like encoder-decoder models (e.g., SDAE, Skip-Thought)

- Sen2Vec, SDAE, SAE, FastSent, Skip-Thought, w2v-avg, c-phrase

Content Model (Sen2Vec)

- Treats sentences and words similarly
- Both are represented by vectors in a shared embedding matrix (a look-up φ : V → R^d)
- Example sentence v: "I eat my dinner"

Figure: Distributed bag of words, or DBOW (Le & Mikolov, 2014)

Regularized Models (Reg-dis, Reg-sim)

- Incorporate the neighborhood directly into the objective function of the content-based model (Sen2Vec) as a regularizer
- Objective function (a sketch of the corresponding SGD step follows below):

  J(\phi) = \sum_{v \in V} \Big[ L_c(v) + \beta \, L_r(v, N(v)) \Big]
          = \sum_{v \in V} \Big[ \underbrace{L_c(v)}_{\text{Content loss}} + \beta \underbrace{\sum_{(v,u) \in E} \|\phi(u) - \phi(v)\|^2}_{\text{Graph smoothing}} \Big]    (8)

- Train with SGD
- Regularization with discourse context ⇒ Reg-dis
- Regularization with similarity context ⇒ Reg-sim

- CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
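A minimal sketch of the SGD step implied by Equation (8): for a sentence v, the graph-smoothing term contributes the gradient 2β Σ_u (φ(v) − φ(u)), which is added to the content-loss (DBOW) gradient. The content part is omitted here, and the toy vectors and names are assumptions.

```python
import numpy as np

def regularizer_sgd_step(phi, v, neighbors, beta=1.0, lr=0.05):
    """One SGD step on the graph-smoothing term of Eq. (8) for sentence v.
    Gradient wrt phi(v) of beta * sum_u ||phi(u) - phi(v)||^2 is
    2 * beta * sum_u (phi(v) - phi(u)); the DBOW content-loss gradient,
    which Con-S2V adds to this, is omitted in this sketch."""
    grad = 2.0 * beta * sum((phi[v] - phi[u] for u in neighbors),
                            np.zeros_like(phi[v]))
    phi[v] = phi[v] - lr * grad
    return phi

# Usage: three toy 4-d sentence vectors; v's discourse context is {u, y}.
rng = np.random.default_rng(1)
phi = {s: rng.normal(size=4) for s in ["u", "v", "y"]}
regularizer_sgd_step(phi, "v", neighbors=["u", "y"])
```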

Pictorial Depiction

Figure: (a) A sequence of sentences — u: "And I was wondering about the GD LEV.", v: "Is it reusable?", y: "Or is it discarded to burn up on return to LEO?"; (b) Sen2Vec (DBOW), which predicts the words "is", "it", "reusable" of v through the content loss L_c over the shared look-up φ; (c) Reg-dis, which adds the regularization loss L_r pulling v toward its discourse neighbors u and y.

Retrofitted Model (Ret-dis, Ret-sim)

- Retrofit the vectors learned from Sen2Vec such that the revised vector φ(v) is:
  - similar to the prior vector, φ_0(v)
  - similar to the vectors of its neighboring sentences, φ(u)
- Objective function:

  J(\phi) = \underbrace{\sum_{v \in V} \alpha_v \, \|\phi(v) - \phi_0(v)\|^2}_{\text{close to prior}} + \underbrace{\sum_{(v,u) \in E} \beta_{u,v} \, \|\phi(u) - \phi(v)\|^2}_{\text{graph smoothing}}    (9)

- Solve using the Jacobi iterative method (the same closed-form coordinate update sketched earlier for Eq. (1))
- Retrofitting with discourse context ⇒ Ret-dis
- Retrofitting with similarity context ⇒ Ret-sim


Frequent Subgraph Mining (Sampling Substructure)

- Perform a first-order random walk over the fixed-size substructure space
- The MH algorithm calculates the acceptance probability using the following equation (a minimal sketch of this step follows below):

  \alpha(x, y) = \min\left( \frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}, 1 \right)    (10)

- For mining frequent substructures from a set of graphs, we use the average support (s1) and the set-intersection support (s2) as the target distribution, i.e., π = s1 or π = s2
- For collecting statistics from a single large graph, we use the uniform probability distribution as our target distribution
- In both cases, we use the uniform distribution over neighbor states as our proposal distribution, i.e., q(x, y) = 1/d_x

- F-S-Cube: A Sampling-Based Method for Top-k Frequent Subgraph Mining
- Finding Network Motifs Using MCMC Sampling
- Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins Using Frequent Subgraph Mining
- ACTS: Extracting Android App Topological Signature through Graphlet Sampling
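A minimal sketch of one Metropolis-Hastings transition under Equation (10) with the uniform proposal q(x, y) = 1/d_x: the acceptance ratio reduces to (π(y) · d_x) / (π(x) · d_y), and with a uniform target it reduces further to d_x / d_y. The d, score, and sample_neighbor callables are placeholders for the actual FS3 neighbor-generation machinery.

```python
import random

def mh_step(x, d, score, sample_neighbor, rng=random):
    """One Metropolis-Hastings transition over the fixed-size subgraph space (Eq. 10).
    d(x): number of neighbor states of x, so the proposal is q(x, y) = 1 / d(x).
    score(x): target weight pi(x) (e.g. s1/s2 support, or 1.0 for uniform sampling)."""
    y = sample_neighbor(x)   # propose y uniformly from x's neighbor states
    accept = min(1.0, (score(y) * d(x)) / (score(x) * d(y)))
    return y if rng.random() <= accept else x

# Example with a uniform target (pi = 1), where acceptance reduces to d(x)/d(y):
# next_state = mh_step(x,
#                      d=lambda s: len(neighbors(s)),
#                      score=lambda s: 1.0,
#                      sample_neighbor=lambda s: random.choice(list(neighbors(s))))
```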

Data Representation (Frequent Subgraph Mining)

Figure: a graph database of three labeled toy graphs G1, G2, and G3, and their frequent induced 2-node and 3-node subgraphs

- We find the support sets of the edges BD, BE, and DE of g_BDE, which are {G1, G2, G3}, {G2, G3}, and {G2, G3}, respectively
- So, for g_BDE, s1(g_BDE) = (3 + 2 + 3)/3 = 2.67, and s2(g_BDE) = 2 (a small sketch of this computation follows below)
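A small sketch of computing the two support measures from per-edge support sets, using the sets listed above: s1 is the average size of the sets and s2 is the size of their intersection.

```python
def supports(edge_support_sets):
    """s1: average size of the per-edge support sets; s2: size of their intersection."""
    sizes = [len(s) for s in edge_support_sets]
    s1 = sum(sizes) / len(sizes)
    s2 = len(set.intersection(*map(set, edge_support_sets)))
    return s1, s2

# Usage with the support sets listed above for the three edges of g_BDE:
print(supports([{"G1", "G2", "G3"}, {"G2", "G3"}, {"G2", "G3"}]))
```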

Frequent Subgraph Mining (Sampling Substructure)

Figure: Neighbor generation mechanism. (a) Left: a graph G with the current state of the random walk (the subgraph induced by nodes 1, 2, 3, 4); Right: neighborhood information of the current state. (b) Left: the state of the random walk on G after one transition; Right: the updated neighborhood information.

- For this example, d_x = 21 and d_y = 13

Frequent Subgraph Mining (Sampling Substructure)

Algorithm 1: SampleIndSubGraph pseudocode
Input: graph G_i; size of subgraph, ℓ

x ← state saved at G_i
d_x ← neighbor count of x
a_sup_x ← score of graph x
while a neighbor state y is not found do
    y ← a random neighbor of x
    d_y ← neighbor count of y
    a_sup_y ← score of graph y
    accp_val ← (d_x · a_sup_y) / (d_y · a_sup_x)
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
        return y

Frequent Subgraph Mining (Sampling Substructure)

Algorithm 2: SampleIndSubGraph pseudocode (uniform target distribution)
Input: graph G_i; size of subgraph, ℓ

x ← state saved at G_i
d_x ← neighbor count of x
a_sup_x ← score of graph x
while a neighbor state y is not found do
    y ← a random neighbor of x
    d_y ← neighbor count of y
    a_sup_y ← score of graph y
    accp_val ← d_x / d_y
    accp_probability ← min(1, accp_val)
    if uniform(0, 1) ≤ accp_probability then
        return y

- Motif Counting Beyond Five Nodes

Frequent Subgraph Mining (Sampling Substructure)

- The random walks are ergodic
- The walk satisfies the reversibility condition, so it achieves the target distribution
- We use the spectral gap (λ = 1 − max{λ_1, |λ_{m−1}|}) to measure the mixing rate of our random walk
- We compute the mixing time (the inverse of the spectral gap) for size-6 subgraphs of the Mutagen dataset and find that it is approximately 15 units
- We suggest using multiple chains along with a suitable distance measure (for example, Jaccard distance) for choosing a suitable iteration count
- We show that the acceptance probability of our technique is quite high (a large number of rejected moves would indicate a poorly designed proposal distribution)

Table: Probability of acceptance of FS3 for the Mutagen and PS datasets

                                   Mutagen                                      PS
                          ℓ=8            ℓ=9            ℓ=10           ℓ=6            ℓ=7            ℓ=8
Acceptance (%), s1        82.70 ± 0.04   83.89 ± 0.03   81.66 ± 0.03   91.08 ± 0.01   92.23 ± 0.02   93.08 ± 0.01
Acceptance (%), s2        75.27 ± 0.05   76.74 ± 0.03   75.20 ± 0.03   85.08 ± 0.05   87.46 ± 0.06   89.41 ± 0.07


The problem of Total Recall

- Vanity search: find out everything about me
- Fandom: find out everything about my hero
- Research: find out everything about my PhD topic
- Investigation: find out everything about something or some activity
- Systematic review: find all published studies evaluating some method or effect
- Patent search: find all prior art
- Electronic discovery: find all documents responsive to a request for production in a legal matter
- Creating archival collections: label all relevant documents, for posterity, future IR evaluation, etc.

- Batch-Mode Active Learning for Technology-Assisted Review
- A Large-Scale Study of SVM-Based Methods for Abstract Screening in Systematic Reviews

An Active Learning Algorithm

Algorithm 3: SelectABatch
Input: h_c, current hyperplane; D, available instances; k, batch size; t, similarity threshold
Output: a batch B_c of k documents to be included in training

B_c ← EmptySet()
if Strategy is DS then
    I ← ArgSort(Distance(h_c, D), order = increasing)
    while Size(B_c) < k do
        Insert(B_c, I[1])
        S ← GetSimilar(I[1], I, D, t, similarity = cosine)
        I ← Remove(I, S)
else if Strategy is BPS then
    w ← 1.0 / (Distance(h_c, D))^2
    w ← Normalize(w)
    I ← List(D)
    while Size(B_c) < k do
        c ← Choose(I, prob = w, num = 1)
        Insert(B_c, c)
        S ← GetSimilar(c, I, D, t, similarity = cosine)
        I ← Remove(I, S)
        w ← Normalize(w[I])
return B_c
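A compact numpy sketch of the two batch-selection strategies in Algorithm 3. The helper routines (Distance, GetSimilar, Choose) are re-implemented here as plain margin and cosine computations, which is an assumption about their behavior rather than the original implementation.

```python
import numpy as np

def cosine_sim(A, b):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    return A @ (b / np.linalg.norm(b))

def select_a_batch(w, b, D, k, t, strategy="DS", rng=None):
    """Select a diverse batch of k documents near the decision boundary.
    w, b: current linear hyperplane; D: (n, d) feature matrix;
    t: cosine-similarity threshold used to drop near-duplicates of a pick."""
    rng = rng or np.random.default_rng()
    dist = np.abs(D @ w + b) / np.linalg.norm(w)     # distance to the hyperplane
    remaining = list(np.argsort(dist))               # closest (most uncertain) first
    if strategy == "BPS":
        weight = 1.0 / (dist ** 2 + 1e-12)           # bias toward the boundary

    batch = []
    while len(batch) < k and remaining:
        if strategy == "DS":
            c = remaining[0]                          # deterministic: closest remaining doc
        else:  # BPS
            p = weight[remaining] / weight[remaining].sum()
            c = rng.choice(remaining, p=p)            # probabilistic: sample by weight
        batch.append(c)
        sims = cosine_sim(D[remaining], D[c])
        remaining = [i for i, s in zip(remaining, sims) if s < t and i != c]
    return batch

# Usage: 100 random 5-d documents, a random hyperplane, batch of 10.
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))
print(select_a_batch(rng.normal(size=5), 0.0, D, k=10, t=0.9, rng=rng))
```

DS is deterministic (always take the closest remaining document to the hyperplane), while BPS samples proportionally to 1/distance², trading some uncertainty for diversity.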


Name Disambiguation

- The graph in this figure corresponds to the ego network of u, G_u
- We also assume that u is a multi-node consisting of two name entities
- So the removal of the node u (along with all of its incident edges) from G_u creates two disjoint clusters

Figure: A toy example of clustering-based entity disambiguation

- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?

Name Disambiguation

- Calculate the normalized-cut score (a minimal sketch of this computation follows below):

  NC = \sum_{i=1}^{k} \frac{W(C_i, \bar{C_i})}{W(C_i, C_i) + W(C_i, \bar{C_i})}    (11)

- Modeling temporal mobility

  Figure: number of papers per year (2005-2013) for Cluster1, Cluster2, and Cluster3

- Calculate the temporal-mobility score:

  \text{TM-score} = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j) \cdot \big( D(Z_i \,\|\, Z_j) + D(Z_j \,\|\, Z_i) \big)}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} w(Z_i, Z_j)}    (12)
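A minimal sketch of Equation (11): given a symmetric weight matrix and a cluster assignment, it sums, over clusters, the weight leaving each cluster divided by the cluster's total associated weight, so well-separated clusters give a low score. The toy weight matrix is an assumption, and counting each internal edge once in W(C_i, C_i) is a convention choice.

```python
import numpy as np

def normalized_cut(W, labels):
    """Eq. (11): sum over clusters of W(Ci, Ci_bar) / (W(Ci, Ci) + W(Ci, Ci_bar)),
    where W(A, B) adds the weights of all edges with one end in A and the other in B."""
    labels = np.asarray(labels)
    score = 0.0
    for c in np.unique(labels):
        inside = labels == c
        w_in = W[np.ix_(inside, inside)].sum() / 2.0   # each internal edge counted once
        w_cut = W[np.ix_(inside, ~inside)].sum()       # edges leaving the cluster
        score += w_cut / (w_in + w_cut)
    return score

# Usage: two dense 3-node groups joined by a single weak edge -> low NC score.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(normalized_cut(W, [0, 0, 0, 1, 1, 1]))   # 2 * 1/(3+1) = 0.5
```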

Thanks!

- Sen2Vec code and datasets: https://github.com/tksaha/con-s2v/tree/jointlearning
- Temporal node2vec code: https://gitlab.com/tksaha/temporalnode2vec.git
- Motif finding code: https://github.com/tksaha/motif-finding
- Frequent subgraph mining code: https://github.com/tksaha/fs3-graph-mining
- Functional motif finding code: https://gitlab.com/tksaha/func_motif
